@datascienceweekly

64 Followers · 356 Following · 45 Posts · Joined 13.11.2024

Latest posts by @datascienceweekly

Data Science Weekly - Issue 642 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 642, by @DataSciNews open.substack.com/pub/datascie...

12.03.2026 23:03 👍 0 🔁 0 💬 0 📌 0
I’ve been using Claude Code to take care of scrappy data cleaning tasks for a while. These days though, I’m using Codex as my coding agent. Similar to what I did with Claude, I’ve been “fine-tuning” Codex CLI to work on a few different vaguely defined tasks like classification, voting, filtering, or ranking.

The pattern in this post works surprisingly well when you have the following conditions:

Loosely defined open-ended tasks. e.g., tagging tweets with a set of predefined labels, extracting structured information from a GitHub issue, …
Powerful agentic capabilities. Doing the task requires something more than a simple llm call or PydanticAI script. e.g., using gh api CLI to get the number of stars of a repository.
Structured outputs. You need a response in a certain shape! This is something codex exec can do that claude couldn’t and is really powerful. e.g., return exactly True or False and nothing else.
Save money! Unlike llm or other tools/libraries that require an OPENAI_API_KEY, Codex can use your ChatGPT subscription, making things “free”.
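The "exactly True or False and nothing else" requirement can be checked on the receiving side as well. A minimal sketch in Python, assuming the agent's final answer arrives as a plain string; `parse_strict_bool` is a hypothetical helper for illustration, not part of Codex or of the linked post, and tolerating surrounding whitespace is a design choice made here.

```python
def parse_strict_bool(raw: str) -> bool:
    """Accept only the literal answers 'True' or 'False'.

    Surrounding whitespace (such as a trailing newline) is tolerated;
    anything else is rejected rather than guessed at.
    """
    text = raw.strip()
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"expected exactly 'True' or 'False', got {raw!r}")


if __name__ == "__main__":
    # Hypothetical raw answers, as an agent run might return them.
    for answer in ["True\n", "False", "true, probably"]:
        try:
            print(repr(answer), "->", parse_strict_bool(answer))
        except ValueError as err:
            print(repr(answer), "-> rejected:", err)
```

Rejecting anything off-shape, instead of coercing it, keeps a batch of thousands of invocations from silently absorbing malformed answers.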

I've been using this pattern to "specialize" Codex for vaguely defined tasks like classification, filtering, soft sorting, ...

davidgasquez.com/specializing...

Made more than 10,000 invocations so far (reusing my ChatGPT subscription) and am really happy with the pattern!

22.10.2025 18:38 👍 10 🔁 1 💬 0 📌 2
22 years of Brain Science: what CoSyNe tells us about the evolution of Neuroscience Tracking the intellectual DNA of Computational and Systems Neuroscience through its flagship meeting

I tracked every keyword in 22 years of Cosyne abstracts to map how computational neuroscience evolved — from Bayesian brains to neural manifolds to LLMs — and where it's heading next.

11.03.2026 01:09 👍 147 🔁 64 💬 7 📌 17

so hard to *really* comprehend

12.03.2026 20:44 👍 0 🔁 0 💬 0 📌 0
SORTEE Webinar: On what makes good sharable and reproducible R code, how to do it, and why it’s good for science For this month's SORTEE webinar, Dr. Dax Kellie from the Atlas of Living Australia will present on good, sharable, and reproducible R code in science

In one week I'll be talking about tips for reproducible R code and why science would love you to try these tips on your own code too 🧪😍🌏

It's an online talk, so feel free to watch comfortably from your couch. Hope to see you there!
@sortee.bsky.social #rstats

events.humanitix.com/sortee-webin...

10.03.2026 02:46 👍 51 🔁 29 💬 2 📌 0

I knew it. This confirms what I knew all my life. I may have Aphantasia (I do ...) but I see colors exceptionally well.

www.keithcirkel.co.uk/whats-my-jnd...

11.03.2026 21:46 👍 18 🔁 1 💬 9 📌 2
B12 3.0 A decade of helping customers build their home online

I've never built anything for a decade professionally, but here we are! blog.marcua.net/2026/03/12/b...

12.03.2026 14:58 👍 2 🔁 2 💬 0 📌 0

But is it far enough away to start running in the opposite direction? (or at least try to get behind some heavy-duty stuff?)

12.03.2026 16:13 👍 0 🔁 0 💬 1 📌 0

98 million views for a grapefruit video. (watched it twice).

Takeaway - the secret to life is caring more about something than anybody else.

12.03.2026 16:12 👍 2 🔁 0 💬 0 📌 0

In this month's blog post, we’ll explore how to add vector layers and a legend to a map with QGIS. Step by step here: www.miriam-lerma.com/blog/2026-03...

12.03.2026 07:52 👍 4 🔁 3 💬 0 📌 0
Claude Code isn’t going to replace data engineers (yet)

A new blog post! In which I discover that even Claude Code has its limits, certainly when it comes to replacing data engineers

👉🏻 rmoff.net/2026/03/11/c...

(There's also a companion post if you like poking around Claude session logs to see what it's up to: rmoff.net/2026/03/11/c...)

12.03.2026 10:16 👍 11 🔁 1 💬 0 📌 0
Figure shows the proportion of successful putts by distance (where we have integrated out the missing distances) and the putting probability by distance from geometrical model 2 (as presented in https://users.aalto.fi/~ave/casestudies/disc_putting/disc_putting.html), based on data for the top 33 PDGA MPO players.

I've made a geometrical model for disc golf putting with uncertainty in 2D angle and distance control.
Based on the model, the putting angle accuracies of top PDGA MPO and FPO players are about 1° and 1.4°, respectively. See more at users.aalto.fi/~ave/casestu...

12.03.2026 10:33 👍 16 🔁 1 💬 3 📌 0
Data Science Weekly - Issue 641 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 641, by @DataSciNews open.substack.com/pub/datascie...

05.03.2026 18:13 👍 0 🔁 0 💬 0 📌 0

Congratulations, Bruno!

05.03.2026 10:58 👍 1 🔁 0 💬 0 📌 0
A Few Claude Skills for R Users – R Works The community has come together to create some great Claude Skills that you can try out today.

I rounded up a few Claude Skills for #RStats users.

Huge thanks to the creators who developed them. They share Skills for everything from tidyverse code to brand.yml files to learning while using AI.

Hope the list is useful, and please let me know what I missed! 🧡

rworks.dev/posts/claude...

03.03.2026 14:05 👍 149 🔁 41 💬 4 📌 6

You can now visualize how a color palette distributes across OKHsv, OKHsl, OKLCh and CIELab, and compare 6 distance metrics side by side. Zero dependencies, raw WebGL2.

Took me 3 years to make something I wasn't too embarrassed to share 🙃

meodai.github.io/color-palett...

04.03.2026 21:40 👍 994 🔁 134 💬 27 📌 3
Data Science Weekly - Issue 640 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 640, by @DataSciNews open.substack.com/pub/datascie...

26.02.2026 13:54 👍 0 🔁 0 💬 0 📌 0
Data Science Weekly - Issue 639 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 639, by @DataSciNews open.substack.com/pub/datascie...

19.02.2026 22:01 👍 0 🔁 0 💬 0 📌 0
Data Science Weekly - Issue 638 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 638, by @DataSciNews open.substack.com/pub/datascie...

12.02.2026 19:36 👍 0 🔁 0 💬 0 📌 0
Simulated null distribution for data with a sample size of 100, difference in group means of 5, and a p-value of 0.142

Simulated null distribution of a slope of 0.8 and p-value of 0.002

Finally, we have to decide if the p-value meets an evidentiary standard or threshold that would provide us with enough evidence that we aren’t in the null world (or, in more statsy terms, enough evidence to reject the null hypothesis).

There are lots of possible thresholds. By convention, most people use a threshold (often shortened to α) of 0.05, or 5%. But that’s not required! You could have a lower standard with an α of 0.1 (10%), or a higher standard with an α of 0.01 (1%).

Statistically significant
The p-value is < 0.001 and our threshold for α is 0.05

In a world where there is no relationship between x and y, the probability of seeing a slope of at least 0.901 is < 0.1%

Since < 0.001 is less than 0.05, we have enough evidence to say that the slope is statistically significant.
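The simulation idea described above can be sketched in a few lines. This is an illustration in Python rather than the R and OJS used on the site; the function names, the toy data, and the number of simulations are assumptions made here, not the site's code. It shuffles group labels to build a null world, counts how often a simulated difference in means is at least as large as the observed one, and compares that share with α.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_sims=5000, seed=1234):
    """Two-sided p-value for a difference in means via label shuffling."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # In the null world the group labels carry no information,
        # so any reshuffling of them is equally plausible.
        rng.shuffle(pooled)
        simulated = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(simulated) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_sims


def is_significant(p_value, alpha=0.05):
    """Apply the evidentiary threshold: significant only if p < alpha."""
    return p_value < alpha


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    data_rng = random.Random(42)
    treated = [data_rng.gauss(55, 10) for _ in range(50)]
    control = [data_rng.gauss(50, 10) for _ in range(50)]
    diff, p = permutation_p_value(treated, control)
    print(f"difference in means: {diff:.2f}, p-value: {p:.3f}")
    print("statistically significant at alpha = 0.05:", is_significant(p))
```

Changing `alpha` to 0.1 or 0.01 moves the threshold without touching the simulation, which mirrors the point that the standard is a convention chosen separately from the evidence.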

Evidentiary standards

When thinking about p-values and thresholds, I like to imagine myself as a judge or a member of a jury. Many legal systems around the world have formal evidentiary thresholds or standards of proof. If prosecutors provide evidence that meets a threshold (i.e. goes beyond a reasonable doubt, or shows evidence on a balance of probabilities), the judge or jury can rule guilty. If there’s not enough evidence to clear the standard or threshold, the judge or jury has to rule not guilty.

With p-values:

If the probability of seeing an effect or difference (or δ) in a null world is less than 5% (or whatever the threshold is), we rule it statistically significant and say that the difference does not fit in that world. We’re pretty confident that it’s not zero.
If the p-value is larger than the threshold, we do not have enough evidence to claim that δ doesn’t come from a world where there’s no difference. We don’t know if it’s not zero.
Importantly, if the difference is not significant, that does not mean that there is no difference. It just means that we can’t detect one if there is. If a prosecutor doesn’t provide sufficient evidence to clear a standard or threshold, it does not mean that the defendant didn’t do whatever they’re charged with†—it means that the judge or jury can’t detect guilt.

I just whipped up this little #QuartoPub site last week that demonstrates how I teach p-values/hyp-testing through simulation both with live OJS and with #rstats, and I think it's super neat! It has examples for diff-in-means, diff-in-props, and regression slopes nullworlds.andrewheiss.com #statsky

11.02.2026 21:14 👍 139 🔁 26 💬 3 📌 5

Malaysia’s R community is growing! 🇲🇾 From a small network into a platform that actively connects students, researchers, and industry practitioners

r-consortium.org/posts/bringi...

#rstats #opensource #datascience #Malaysia #Shiny #tidyverse #community #analytics

09.02.2026 22:22 👍 6 🔁 2 💬 0 📌 0
A screenshot of an interactive map of all medals won so far at the 2026 Winter Olympics by place of birth of the winners

Winter Olympics 2026 medalists by place of birth: an interactive map I built (again) thanks to @wikipedia, @wikidata and #rstats.

Check out the interactive version of the map: https://giocomai.github.io/olympics2026nuts/medalists_map.html

11.02.2026 07:23 👍 9 🔁 5 💬 1 📌 0

📈 A while back I did promise to put together a post on how I generate publication-ready figures. Before I am off to #CNY2026, I finally found the time. May this be useful to some...
Also I am curious to hear what other tricks are out there.
jaquent.github.io/2026/02/crea...

#rstats #ggplot #dataviz

10.02.2026 10:14 👍 26 🔁 9 💬 4 📌 0
A hexagon R package logo, with the package name “whistledown” in a calligraphy-style font. A silhouette of a quill representing Lady Whistledown’s letters, and three bees for the Bridgerton family crest.

Dearest Gentle Reader,
I’m happy to announce the release of my new R package, “whistledown”, with color palettes from the hit show #Bridgerton!

#RStats #ggplot

11.02.2026 20:02 👍 20 🔁 3 💬 1 📌 0
Transport Modeling in R High-performance tools for transport modeling - network processing, route enumeration, and traffic assignment in R. The package implements the Path-Sized Logit model for traffic assignment - Ben-Akiva...

I’m thrilled to introduce flownet (sebkrantz.github.io/flownet/), a new R package for transport modeling, supporting stochastic or deterministic traffic assignment to large networks, and powerful tools for (multimodal) network processing/simplification: sebkrantz.github.io/Rblog/2026/0... #Rstats

09.02.2026 19:06 👍 21 🔁 4 💬 1 📌 0
Data Science Weekly - Issue 637 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 637, by @DataSciNews open.substack.com/pub/datascie...

05.02.2026 22:55 👍 0 🔁 0 💬 0 📌 0
Some Favorite Data Science Tools Going into 2026 – Practical Significance A blog post highlighting some of the data science tools I’m excited about going into the new year.

On a positive note, here's a new blog post highlighting some polyglot data science tools in R and Python that I've enjoyed lately

#rstats #pydata

www.practicalsignificance.com/posts/favori...

23.01.2026 00:00 👍 15 🔁 2 💬 2 📌 2

The first data science book that has a chapter on monads reproducible-data-science.dev

Learn how to build robust #DataScience pipelines with #RStats, #Python, #Julia and #Nix!

01.02.2026 11:47 👍 26 🔁 9 💬 0 📌 2
Using R to extract results from Stata log files – Ben Harrap

Are you a #Stata user? Maybe you work with one?

Have you ever found yourself copy-pasting from the results window?

It's annoying as hell! And terrible practice. So I wrote a blog post on using #rstats to extract results from Stata log files

benharrap.com/post/2026-02...

04.02.2026 04:34 👍 10 🔁 4 💬 3 📌 1
On this page
What’s the difference between statistical significance and substantial significance?
Can we measure substantial significance with statistics?
What are all the different ways we can look at model coefficients?
Print the object name
Use summary()
Use tidy() from the {broom} package
Use model_parameters() and model_performance() from the {parameters} and {performance} packages
Make nice polished side-by-side regression tables with {modelsummary}
Make automatic coefficient plots with modelplot() from {modelsummary}
Plot model predictions and marginal effects
Automatic interpretation with {report}

Posted a helpful little set of FAQs about regression for my causal inference class, including illustrations of statistical vs. substantive significance and all the different things you can do with #rstats model objects

evalsp26.classes.andrewheiss.com/news/2026-02...

03.02.2026 19:49 👍 68 🔁 10 💬 3 📌 1