Måns Thulin

@mansthulin

I work in statistics and AI. Consultant, teacher, researcher. #Rstats user. Book: https://modernstatisticswithr.com Homepage: https://mansthulin.se Co-founder of https://aireview.se

239 followers · 103 following · 146 posts · joined 15.11.2024

Latest posts by Måns Thulin @mansthulin

Are your working directory and/or your files on OneDrive? Strangely, this can cause these kinds of problems (the solution is to store things on a local drive instead).

09.03.2026 07:24 👍 1 🔁 0 💬 1 📌 0

No, but I'll check it out. Thanks!

28.02.2026 14:16 👍 0 🔁 0 💬 0 📌 0

train |> ... |> fit(train) gives my soul a papercut

28.02.2026 10:57 👍 0 🔁 0 💬 1 📌 0

Thanks, Di. I too am hoping that these issues will be fixed. Until then I'm sticking to caret in my teaching, as it also does a good job of coordinating machine learning software. I'm reluctant to tell students to use the tidymodels ecosystem because of the issues mentioned in my post.

28.02.2026 09:27 👍 1 🔁 0 💬 1 📌 0

Yeah, I think that's an important difference between tidymodels and ggplot2!

28.02.2026 08:10 👍 0 🔁 0 💬 0 📌 0

Kind of awkward to have to add data at two different steps. But definitely an improvement on the flow recommended in the documentation!

27.02.2026 20:31 👍 0 🔁 0 💬 2 📌 0

Absolutely. But in all the examples you mention, you'd start with data. And starting a pipeline with the data is the de facto standard in R. Creating a separate logic for how to pipe things is not very helpful to beginners.

27.02.2026 20:29 👍 1 🔁 0 💬 0 📌 0

Happy to submit these as issues within the next few days!

27.02.2026 20:12 👍 2 🔁 0 💬 0 📌 0

Great description! And so it boils down to whether you're willing to accept two different logics for how the pipe works. I maintain that the dual logics create more problems than they solve, but I get that some people like the tidymodels approach.

27.02.2026 10:06 👍 2 🔁 0 💬 1 📌 0

Some, yes. All ignored unfortunately. I agree that most of these issues could be fixed (which of course is the reason that I wrote this in the first place!)

27.02.2026 06:00 👍 2 🔁 0 💬 1 📌 0

Glad you liked it!

26.02.2026 21:05 👍 0 🔁 0 💬 0 📌 0
Why I don’t use {tidymodels} – Måns Thulin

While we're discussing what we like and dislike in #Rstats, here's why I don't like tidymodels: mansthulin.se/posts/tidymo...

26.02.2026 20:28 👍 43 🔁 8 💬 7 📌 3
Diff of a change to the 'Getting Started in R - Tinyverse Edition' manuscript with a nice data.table improvement: inside a pair of ( ) we can place each chained operation on its own line, improving readability.

I.e. we have

cw21 <- (  # use '(' to keep ops on separate lines
    cw[Time %in% c(0,21)]          # i: select rows
    [, weight := Weight]           # j: mutate
    [, Group := factor(Group)]
    [, .(Chick,Group,Time,weight)] # j: arrange
    [order(Chick,Time)]            # i: order
    [1:5]                          # i: subset
)

Josh Goldstein emailed me a nice tip for @rdatatable.bsky.social chaining: if we start a chained `data.table` operation inside a set of parens, we are no longer subject to the 'REPL constraint' and can keep each operation on a line. See ALT text. #rstats

Now in the pdf at github.com/eddelbuettel...
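
A minimal base-R sketch of the same parenthesis trick, assuming a plain data frame (`mtcars`) in place of a `data.table` so that it runs without extra packages. The object name `small` is hypothetical. Inside `( )` the parser keeps reading across newlines, so each chained `[` call can start its own line; at top level the second line would be a syntax error.

```r
# Wrap the whole chain in ( ) so that lines may begin with `[`.
small <- (
    mtcars[mtcars$cyl == 4, ]      # i: keep four-cylinder cars
    [, c("mpg", "disp")]           # j: keep two columns
    [1:3, ]                        # i: first three rows
)
print(small)
```

The same wrapping should also allow a base pipe `|>` to be placed at the start of a line rather than at the end.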

25.02.2026 18:18 👍 23 🔁 5 💬 2 📌 2

Well, you shouldn't use Python or MATLAB for statistics. Simple as. 😀

25.02.2026 14:28 👍 5 🔁 2 💬 1 📌 0
2 The basics | Modern Statistics with R

I love the hidden-gem magrittr pipes, but these days I stick with the base pipe. In this case, you can do:

mtcars |> with(cor(disp, mpg))

25.02.2026 14:16 👍 5 🔁 1 💬 1 📌 0
R Medicine 2026 - main graphic - Call for Proposal, deadline March 6

R/Medicine CFP is open 🩺🧪

Deadline: March 6 - still time!

Submit: Talks, Lightning Talks, Demos, Workshops - Using R + Shiny for health, lab, clinical data

First-time speaker? Email for feedback: rmedicine.conference@gmail.com

rconsortium.github.io/RMedicine_we...

#rstats #datascience

24.02.2026 20:28 👍 9 🔁 5 💬 1 📌 1

I love RStudio, but I'm flabbergasted by the fact that @posit.co still haven't made |> the default for the Ctrl+Shift+M keyboard shortcut, despite their using it in e.g. the tidyverse documentation and R4DS. #Rstats

23.02.2026 12:05 👍 9 🔁 2 💬 1 📌 0

Thanks, will do!

19.02.2026 09:06 👍 0 🔁 0 💬 0 📌 0
Bootstrap p-values and confidence intervals for regression models

Looks really nice! Is there an option to print confidence intervals instead of standard errors (the former being more informative)? If you'd be interested in adding bootstrap p-values/CIs as an option, I'd be happy to assist in integrating it with {boot.pval} (mthulin.github.io/boot.pval/ar...)

18.02.2026 08:30 👍 1 🔁 0 💬 1 📌 0
Quick start – Model to Meaning

I've been playing around with {marginaleffects} in some projects lately, and I really like it. Lots of useful stuff in there! If you work with regression models and haven't checked it out already, I strongly recommend that you do so: marginaleffects.com/bonus/get_st...
#Rstats #Databs

13.02.2026 11:51 👍 15 🔁 0 💬 1 📌 0

How about penguins |> subset(select = c("island", "bill_len")) |> subset(island == "Biscoe" & bill_len > 55)

02.02.2026 21:00 👍 2 🔁 0 💬 0 📌 0
Modern Statistics with R

I cover both base and the tidyverse in Modern Statistics with R (except for plotting, where I focus on ggplot2 and only briefly mention base): www.modernstatisticswithr.com

02.02.2026 18:15 👍 2 🔁 0 💬 0 📌 0

Gave the first lecture in my introductory statistics for biologists course yesterday, so this should come in handy. 😀 Thanks for sharing!

21.01.2026 09:34 👍 2 🔁 0 💬 1 📌 0
Will you incorporate LLMs and AI prompting into the course in the future?
No.

Why won’t you incorporate LLMs and AI prompting into the course?
These tools are useful for coding (see this for my personal take on this).

However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code.

In that post, it warns that you cannot use it as a beginner:

…to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability.

There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability.

The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma.

This isn’t a form of programming hazing, like “I had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle.

This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too):

Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that “won’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.)

It’s hard, but struggling is the only way to learn anything.

You might not enjoy code as much as Williams does (or I do), but there’s still value in maintaining coding skills as you improve and learn more. You don’t want your skills to atrophy.

As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible:

To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code it generates like random answers from StackOverflow or blog posts and generally rewrite it completely. I disable the inline LLM-based auto complete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use.

Even though I use Positron, I purposely do not use either Positron Assistant or Databot. I have them disabled.

So in the end, for pedagogical reasons, I don’t foresee me incorporating LLMs into this class. I’m pedagogically opposed to it. I’m facing all sorts of external pressure to do it, but I’m resisting.

You’ve got to learn first.

Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...

09.12.2025 20:17 👍 331 🔁 99 💬 14 📌 31

This is so useful. I usually add custom information to the box shown when hovering, using the text geom. An example can be found here: www.modernstatisticswithr.com/eda.html#det... #Rstats #statsky #databs

09.12.2025 07:13 👍 6 🔁 1 💬 0 📌 0
Data Visualisation Gallery: gallery of data visualisations created by Nicola Rennie.

One of the things that has been on my to-do list for a very long time is building a gallery of all of the charts I've made across #TidyTuesday, #30DayChartChallenge, #30DayMapChallenge, and other miscellaneous projects 📊

And it's finally here!

Link: nrennie.rbind.io/viz-gallery/

#DataViz #RStats

26.11.2025 13:32 👍 111 🔁 18 💬 6 📌 3

"The difference between the groups is 1.1-2.6 measured using something that is a bit like the median but not quite the median" 😉

21.11.2025 15:21 👍 3 🔁 0 💬 0 📌 0

They test different hypotheses though, so the Wilcoxon test isn't a like-for-like replacement for the t-test. A bootstrap t-test is my go-to method for tests about means (using {boot.pval}). It has the added benefit of providing confidence intervals, unlike the Wilcoxon test.
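
As a rough illustration of bootstrapping a difference in means, here is a hand-rolled percentile bootstrap in base R on simulated data. This is not the {boot.pval} implementation or its API; the simulated groups `x` and `y`, the number of resamples `B`, and the choice of a percentile interval are all assumptions made for the sketch.

```r
set.seed(1)

# Simulated data (assumption): two groups whose true means differ by 2.
x <- rnorm(30, mean = 10, sd = 2)
y <- rnorm(30, mean = 12, sd = 2)

B <- 2000  # number of bootstrap resamples (assumption)

# Resample each group with replacement and record the difference in means.
boot_diff <- replicate(
    B,
    mean(sample(y, replace = TRUE)) - mean(sample(x, replace = TRUE))
)

# 95% percentile interval for the difference in means.
ci <- quantile(boot_diff, c(0.025, 0.975))
print(mean(y) - mean(x))
print(ci)
```

The interval is stated directly on the scale of the mean difference, which is what makes it easier to report than a bare test result.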

21.11.2025 13:39 👍 3 🔁 0 💬 1 📌 0

Students often ask: “Is this model good enough?”
My reply: “For what?” AUC, precision, F1—none of them matter unless you know what decision you're informing. Always tie metrics to action.

#DataScience #MachineLearning #AI #RStats

20.11.2025 00:57 👍 7 🔁 2 💬 0 📌 0

It's great to use with pipes! Then everything goes from left to right.

19.11.2025 12:56 👍 1 🔁 0 💬 0 📌 0