doi.org/10.1111/jep....
Statistical reform spreads around the world and across disciplines: Beyond the p Value Dichotomy: Alternatives for Statistical Inference - A Critical Review
thenewstatistics.com/itns/2026/03...
We want the best research methods, of course, but should also consider philosophy, ethics, global heating and health.
If you have an idea for a piece for this series, let me know!
Check out the entire collection of Improving Your Neuroscience here: www.eneuro.org/collection/i...
Great papers for you and your trainees on confirmation bias, dealing with non-normal data, teaching rigor, using simulations to plan research and more.
The eNeuro "Improving Your Neuroscience" series continues with a two-part primer on experimental design and blocking from Penelope Reynolds. These new guides will help you up your design game. #stats #neuroscience
www.eneuro.org/content/13/2...
www.eneuro.org/content/13/2...
3 of the researchers were participants... I know I wouldn't have been able to resist, either! Suit me up with my custom-printed barn-owl ears, please!
Why did they do this? To find out if the human auditory system is flexible enough to learn to localize sounds vertically, the way barn owls do. The answer: partly! But honestly, I'm just excited that they *did* this.
Figure from a research paper showing custom-printed barn-owl molds fitted to a human research participant. Don't worry if you can't see them -- they are not nearly as cool as the phrase 'barn-owl ears' would have you expecting.
Scientists created prosthetic barn-owl ears and got human participants to wear them for 3-59 days.
Yes, people walked around with barn-owl ears for up to 2 months *for science*.
This is the type of news I need today to restore a bit of my faith in humanity.
www.biorxiv.org/content/10.6...
Because traditional NHST doesn't allow the skeptic to win, we've maligned the need to do so: uninteresting, just mistakes, can't prove a negative, etc. But this is just rationalizing our way into accepting the limitations of the tool. Fruitful science must be able to rule in and rule out effects!
"near 0" findings have been just essential for science as "not 0" findings. Pasteur, for example, showed that quantity of bacteria formed under sterile conditions is near 0, disproving the theory of spontaneous generation.
So, just tests with results that we like?
I think you are confusing "effect is near 0" with uninteresting. Not true! Near 0 is interesting (and vital to publish) when a leading theory predicted an effect. It's interesting (and vital) to producing parsimonious models.
Personally, I think we need all valid studies published. What we need (I think) is to think more critically about quality/validity: it is not about the p value obtained or even CI width. It is evaluated from construct validity, positive and negative controls, design features, overlapping evidence, etc.
Can you clarify this a bit? Meaning, you don't think wide CIs (unclear) should be published but narrow ones (clear) should be? Or something else?
A valid testing regime has the possible outcomes: yes, no, unclear. It makes no sense for a scientific literature to have only "yes" results. We support this regime with jaundiced ideas about null results that, on reflection, are poor epistemology.
I really don't understand the jaundiced view of validly estimating no effect ("null results"). Don't detectives work by ruling out suspects? Isn't the greater part of learning weakening unhelpful associations? Don't we accept Popper's view that science advances through falsification?
You run a study because you believe z is an important factor; your results estimate it is negligible. That is new and important knowledge. Theory A and Theory B make contrasting predictions (large effect, no effect); data estimating a negligible difference help arbitrate between these.
Judgement of quality/validity should not be based on whether we like the results. Crap methods/ideas can diminish effects, leading to p > .05, or inflate effects, producing p < .05. Many studies do belong in the file drawer, but the p value generated is not a valid guide.
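What "ruling out" an effect can look like in practice: an equivalence test (TOST, two one-sided t-tests) asks whether the data show the effect lies inside a pre-declared band of negligibility. A minimal sketch with simulated data and an invented smallest-effect-of-interest bound; this is a generic illustration, not anyone's specific analysis from this thread:

```python
# Equivalence (TOST) sketch: can we rule out effects larger than +/-bound?
# All data and the bound are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 200)  # simulated group with ~no true effect
b = rng.normal(0.0, 1.0, 200)

bound = 0.3  # smallest effect of interest, chosen in advance

diff = a.mean() - b.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
df = len(a) + len(b) - 2

# Two one-sided tests against the edges of the equivalence band:
p_lower = stats.t.sf((diff + bound) / se, df)   # H0: diff <= -bound
p_upper = stats.t.cdf((diff - bound) / se, df)  # H0: diff >= +bound
p_tost = max(p_lower, p_upper)
# If p_tost < .05, the data support a difference within +/-bound:
# the effect is "ruled out" at the declared level of negligibility.
```

Note the asymmetry with NHST: a nonsignificant t-test alone says "unclear," while a significant TOST lets the skeptic win.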
Sterling, 1959, updated. We still aren't sharing science in a scientific manner. With 65 years of knowing the problem, how are we still struggling to address it?
I haven't! Seems like a cool idea, though likely a good bit harder than p, df, and test stat correspondence... but would be cool!
For R: github.com/ACCLAB/dabestr
For Python: github.com/ACCLAB/DABES...
So excited to see DABEST 2.0 is out: get bootstrapped estimation statistics for simple through complex designs, all with beautiful visualization, available in R and Python.
Check it out!
Pre-print describing new features for complex designs: www.biorxiv.org/content/10.6...
#stats
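Under the hood, estimation graphics like these rest on a simple idea: resample each group with replacement, recompute the effect, and summarize the resamples. A plain-NumPy sketch with invented data (not the DABEST API):

```python
# Bootstrapped estimation of a mean difference, percentile 95% CI.
# Example data are invented; packages like DABEST add the plotting.
import numpy as np

rng = np.random.default_rng(0)
control = rng.normal(10.0, 2.0, 40)
treated = rng.normal(11.0, 2.0, 40)

observed = treated.mean() - control.mean()

# Resample each group with replacement, recompute the difference.
boots = np.array([
    rng.choice(treated, treated.size).mean()
    - rng.choice(control, control.size).mean()
    for _ in range(5000)
])

# Percentile 95% bootstrap CI for the mean difference.
ci_low, ci_high = np.percentile(boots, [2.5, 97.5])
```

The payoff is a point estimate with uncertainty, rather than a bare accept/reject decision.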
Looking forward to reading this retrospective on the promise and perils of organizing an RRR with in-person data collection.
From: @aggieerin.bsky.social , @psforscher.bsky.social , and others.
#stats
doi.org/10.1111/spc3...
Comparing registrations to published papers is essential to research integrity - and almost no one does it routinely because it's slow, messy, and time-demanding.
RegCheck was built to help make this process easier.
Today, we launch RegCheck V2.
🧵
regcheck.app
esci 1.0.9 is now on CRAN.
No big changes, just a couple of bug fixes and some compatibility changes for statpsych 1.9. Still a great easy way to get effect sizes and fair tests for many common designs.
cran.r-project.org/web/packages...
#stats
thenewstatistics.com/itns/2026/01...
A statistics textbook for the AI era
The Faculty for Undergraduate Neuroscience summer workshop is back this summer, July 16-19, right outside of Chicago. A great conference with great people. Registration and abstract submission are open... it is going to be great!
#neuroscience
www.funfaculty.org/conference_2...
Proud to have played a small part in this as a data-collection site. And so proud of @luis-a-gomez.bsky.social , who organized this project at our university as an undergrad and has launched to great success in a clinical psych PhD program at Purdue.
The stereotype-threat effect among women was virtually null and not significant (N = 1275, g = 0.04, SE = 0.08, 95% CI [-0.11; 0.20]), and considerably smaller than the original study (N = 45, d = -0.82; 95% CI [-1.45; -0.18]).
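Quick sanity check (mine, not from the paper): the reported interval follows from the summary stats under the normal approximation, CI = g ± 1.96 × SE:

```python
# Reproducing the RRR's reported 95% CI from its published g and SE.
g, se = 0.04, 0.08
lo, hi = g - 1.96 * se, g + 1.96 * se
# lo, hi come out near -0.12 and 0.20, matching the reported
# [-0.11; 0.20] up to rounding of the published g and SE.
```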
Another threat to stereotype threat?
An RRR (in press, AMPPS) shows the ~0 impact of stereotype threat on female math performance.
Fantastic, arduous work from leads Andrea Stoevenbelt, Paulette C. Flore, and Jelte Wicherts (and DU alum @luis-a-gomez.bsky.social) #stats
osf.io/preprints/ps...
And, of course, each barrier raised against junk will have some false positives or harms, so the more content-based protections we add, the worse off good-faith scientists become at the margin.
Maybe we need some type of reputational scoring system for individuals, institutions, and journals.