I have been struggling to set a truncated prior in bambi with a non-zero central tendency. I have posted a reprex with data at this Stack Overflow link if anyone can offer support. #bambi stackoverflow.com/questions/79...
Each sim could have 100k rows with 5 columns. From each model fit I would save a row with 10 model stats. Simulation volume varies but typically 4000 to 10000. Python and R are the languages.
There are scenarios I would tackle that involved fitting models to orders of magnitude more datasets, but I'm running into the limits of my mental model on how simulation studies effectively scale, and that prompted the initial post.
I simulate 10s of thousands of datasets using likelihood functions from numpy or base R, and then I fit competing models to each dataset and save key model stats. I use the sampling distribution of model stats to compare the business impact of adopting different models for decision making.
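A minimal Python sketch of this kind of workflow (the Bernoulli likelihood, the two "models," and the saved stats are illustrative assumptions, not the actual study):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_dataset(n=1000, true_lift=0.03):
    """Draw one simulated A/B dataset from a Bernoulli likelihood."""
    a = rng.binomial(1, 0.10, size=n)
    b = rng.binomial(1, 0.10 + true_lift, size=n)
    return a, b

def fit_models(a, b):
    """Fit competing 'models' and save key stats for this sim.

    Here the models are just a raw and a shrunk lift estimate,
    standing in for whatever models are actually being compared."""
    raw_lift = b.mean() - a.mean()
    shrunk_lift = 0.5 * raw_lift  # toy shrinkage toward zero
    return {"raw_lift": raw_lift, "shrunk_lift": shrunk_lift}

# The sampling distribution of each stat across sims is what gets compared
stats = [fit_models(*simulate_dataset()) for _ in range(500)]
raw_lifts = np.array([s["raw_lift"] for s in stats])
```

Scaling this up is then a matter of parallelizing the sim loop, which is embarrassingly parallel across datasets.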
For an aspiring simulation study developer, what resources would you recommend for getting your head around designing scalable simulation studies and effectively using PySpark/Databricks and/or AWS SageMaker to scale large sim studies? (cc: @jdlong.cerebralmastication.com, @jordannafa.bsky.social)
Neat approach to using Bluesky to thread blog comments
@phdemetri.bsky.social great post! You set a parameter for true_probability for A & B in simulating future samples. So you're computing the Pr(change your mind | a 3% real effect). Why not base the future simulations and Pr(change your mind) upon a prior distribution rather than a parameter?
yes
Thank you both, I will check these out!
@jordannafa.bsky.social or @solomonkurz.bsky.social can you provide any leads?
I've been churning a lot about how to use expected value of information as a stopping criterion for A/B tests. Most of the literature assumes there is some cost associated with sampling, but in the context of an A/B test, the only cost is the opportunity cost of delaying rollout of the best variant.
# Create a .Rprofile in your project root directory and add this:
.First <- function() {
  # Source every .R file in the project's R/ directory
  r_files <- list.files("R", pattern = "\\.R$", full.names = TRUE)
  for (file in r_files) {
    source(file)
  }
  cat("Sourced", length(r_files), "scripts from R/.\n")
}
Just learned that you can source an R/ directory in your project by default every time you open the project, and have all of those project specific functions available by default! #rstats
I'm curious if anything came out of this line of inquiry
I'll be at MIT CODE this week if any of you would like to meet in person.
Is there an R or Python package with an API to historical weather data? I am trying to access records of daily temperature highs & lows in San Diego microclimates, so I'm hoping to gather records from stations as local as possible rather than averages for all of San Diego. #rstats #Python #climatescience
Take the model matrix and mean-center all of the predictors (even the dummy-coded factors). Here is what that looks like for `mpg ~ wt + factor(cyl)`. brms fits the regression model on the centered data, so the prior for the intercept applies at the mean of the predictors when everything is mean-centered.
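The same centering trick can be sketched outside of R; here is a pandas version on made-up toy data (the numbers are not mtcars), dummy-coding `cyl` and mean-centering every column of the model matrix:

```python
import pandas as pd

# Made-up toy data standing in for mtcars
df = pd.DataFrame({
    "mpg": [21.0, 22.8, 18.7, 14.3, 24.4, 19.2],
    "wt":  [2.62, 2.32, 3.44, 3.57, 3.19, 3.44],
    "cyl": [6, 4, 8, 8, 4, 6],
})

# Model matrix for mpg ~ wt + factor(cyl): dummy-code cyl, drop the reference level
X = pd.get_dummies(df[["wt", "cyl"]], columns=["cyl"], drop_first=True, dtype=float)

# Mean-center every predictor, including the dummy columns
X_centered = X - X.mean()
```

With every column of `X_centered` at mean zero, the fitted intercept is the expected outcome at the average predictor values, which is what makes a prior on the intercept easy to reason about.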
serious question: how do you coach people to become strategic thinkers?
or maybe even more basic, how do you teach them to categorize and process info to recognize patterns?
Both books have this theme about the futility of non-strategic action. Perhaps cementing that belief is step one.
I tried to pursue this question during my PhD. It's a tough nut for a variety of reasons, but I really enjoyed Richard Rumelt's Good Strategy Bad Strategy and Ericsson's Peak. Strategy is hard work and most people avoid it. Expertise is pattern recognition and that takes a lot of time and feedback.
Have others seen stats that try to express type M and type D errors simultaneously?
Defining practical accuracy as the % of the posterior that is directionally accurate but less than a 50% overestimate was pure shooting from the hip. For small effects, e.g. a 0.25% lift, this window is rather narrow (0 < θ < 0.375%), and for large effects, e.g. a 6% lift, the range is broader (0 < θ < 9%).
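As a sketch, that definition could be computed from posterior draws like this (the Normal "posterior" is a stand-in, and a positive true effect is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def practical_accuracy(draws, true_effect):
    """Share of posterior draws that are directionally accurate
    and less than a 50% overestimate: 0 < theta < 1.5 * true_effect."""
    return float(np.mean((draws > 0) & (draws < 1.5 * true_effect)))

# e.g. a 0.25% true lift gives the window (0, 0.375%)
true_lift = 0.0025
draws = rng.normal(loc=true_lift, scale=0.001, size=10_000)  # toy posterior
pa = practical_accuracy(draws, true_lift)
```

Swapping the toy Normal draws for actual posterior samples from a fitted model gives the stat directly.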
The benefits of informed priors for decision making. First plot demonstrates shrinkage from an informed prior in a growing sample. Second plot tries to quantify accuracy gains for decision making. I'm thinking about expressing "practical accuracy" as defined in the subtitle. Feedback very welcome.
Appreciate everyone's feedback!
Sharing to my future self Ben Bolkerβs amazing syntax table for lme4 and brms models bbolker.github.io/mixedmodels-...
I will stop optimizing this now
Great post and great paper. The baseball example is really illuminating because it shows how partially pooled estimates from early in the season predict end-of-season batting average better than raw estimates from early season.
Trying to convey the value of partial pooling for colleagues who chase noise-prone subgroup analyses. Imagine I know two people who have also spent a lot of time on this. Have either of you found visualizations that intuitively convey the phenomena? @solomonkurz.bsky.social & @jordannafa.bsky.social
@gkountourides.bsky.social I remembered your question about better collaboration through git. I have been reading raps-with-r.dev and think you would find it relevant.