Bullshit Bench V2
new: 100 questions across several domains
- Anthropic & Qwen still on top
- Reasoning seems to hurt
- New models are *not* better than old (except Claude)
- Seems to be independent of domain
github.com/petergpt/bul...
Sakana has developed a way to (if I understand correctly) instantly generate LoRAs on demand from long texts or documents
arxiv.org/abs/2506.06105
arxiv.org/abs/2602.15902
Trump has been in office for one year. We at @nature.com did a deep dive looking at the administration's disruption of science in numbers.
Take a look: the numbers are staggering. By me, @dangaristo.bsky.social, Jeff Tollefson, @kimay.bsky.social, & help from @noamross.net @scott-delaney.bsky.social
This line graph illustrates the percentage change in agency staff levels from the previous year for nine major U.S. federal scientific and health organizations between the fiscal years 2016 and 2025. The agencies tracked include the CDC, Department of Energy, EPA, FDA, NASA, NIH, NIST, NOAA, and NSF. For the majority of the timeline between 2016 and 2023, the agencies show relatively stable fluctuations, generally staying within a range of +5% to -5% change per year. However, there is a dramatic and uniform plummet starting in the 2024-25 period. Every agency depicted shows a sharp downward trajectory, with staffing losses ranging from approximately -15% to over -25%. The Environmental Protection Agency (EPA) shows the most significant decline, dropping to roughly -26%, while the National Institute of Standards and Technology (NIST) shows the least severe but still substantial drop at approximately -15%.
This is the most astonishing graph of what the Trump regime has done to US science. They have destroyed the federal science workforce across the board. The negative impacts on Americans will be felt for generations, and the US might never be the same again.
www.nature.com/immersive/d4...
One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.
We found that if you simply delete them after pretraining and recalibrate for <1% of the original budget, you unlock massive context windows. Smarter, not harder.
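The post doesn't include code, so here is a minimal numpy sketch of the core idea: when positional information is an additive learned embedding, it can simply be dropped from the forward pass, leaving a position-free (NoPE-style) model whose attention is still well-defined. Everything here is a toy stand-in (identity projections, random embeddings, hypothetical names); the short recalibration run the post describes is only noted in a comment, not implemented.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # Single-head self-attention with identity Q/K/V projections for brevity.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

rng = np.random.default_rng(0)
seq_len, d = 6, 8
tok = rng.normal(size=(seq_len, d))        # token embeddings
pos = rng.normal(size=(seq_len, d)) * 0.1  # learned positional embeddings

with_pos = attention(tok + pos)  # standard: positions added to the inputs
no_pos = attention(tok)          # "deleted": the forward pass still works

# As I read the post, the recipe is: delete the positional table after
# pretraining, then briefly fine-tune ("recalibrate") the remaining weights
# for <1% of the original budget so they adapt to the position-free regime.
print(with_pos.shape, no_pos.shape)
```

The point of the toy: nothing in attention itself requires positions, so removal is a clean ablation rather than an architecture change.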
Oh wow, deepseek is starting to make serious progress on LLMs that offload memory to external storage: github.com/deepseek-ai/...
Schematic depicting cortical-subcortical interactions during multi-task learning
Excited to see our paper with @mwcole.bsky.social finally out in peer-reviewed form @natcomms.nature.com! We examine how the human brain learns new tasks and optimizes representations over practice... 1/n
Did you know that AI can figure out its own way to learn, and that its way is better than one designed by humans? Read more in a @nature.com N&V (and the original paper is in the comment) www.nature.com/articles/d41...
Our work with @pawa-pawa.bsky.social is out in Nature Machine Intelligence! The choice of activation function affects the representations, dynamics, and circuit solutions that emerge in RNNs trained on cognitive tasks. Activation matters!
www.nature.com/articles/s42...
(repost welcome) The Generative Model Alignment team at IBM Research is looking for next summer's interns! Two candidates, two topics:
- Reinforcement Learning environments for LLMs
- Speculative and non-autoregressive generation for LLMs
Interested/curious? DM or email ramon.astudillo@ibm.com
Michael X Cohen on why he left academia/neuroscience.
mikexcohen.substack.com/p/why-i-left...
Nature research paper: Arousal as a universal embedding for spatiotemporal brain dynamics
go.nature.com/4nMUgYz
Lab's latest is out in Imaging Neuroscience, led by Kirsten Peterson: "Regularized partial correlation provides reliable functional connectivity estimates while correcting for widespread confounding", where we demonstrate a major improvement to standard fMRI functional connectivity (correlation) 1/n
What complexity of algorithms can AI compute? In a new paper with colleagues at IBM Research, we explore how circuit complexity theory can help quantify the degree of algorithmic generalization in AI systems. www.nature.com/articles/s42...
@natmachintell.nature.com
#ML #AI #MLSky
1/n
Using circuits to formalize algorithmic problems for AI models (e.g., depth as time complexity, size as space complexity), we can quantify the complexity of circuit computations (algorithmic complexity) an AI model can perform.
2/n
While using AI models to generate code is commonplace these days, we still do not fully understand the limits of the complexity of the code these models can formulate.
3/n
Formalizing AI computation in terms of algorithmic complexity offers a formal way to quantify AI systems, and a principled foundation for building more algorithmically capable systems in the future.
Blog: research.ibm.com/blog/ai-algo...
arXiv: arxiv.org/abs/2411.05943
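The thread's framing (circuit depth as time complexity, circuit size as space complexity) can be made concrete with a toy boolean circuit. This is an illustrative sketch, not the paper's formalism: a balanced XOR tree computing the parity of n bits, whose size grows linearly while its depth grows only logarithmically.

```python
from dataclasses import dataclass

@dataclass
class Gate:
    op: str            # 'INPUT' or 'XOR'
    inputs: tuple = ()

def parity_circuit(n):
    """Build a balanced XOR tree computing the parity of n input bits."""
    gates = [Gate('INPUT') for _ in range(n)]
    layer = list(range(n))
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            gates.append(Gate('XOR', (layer[i], layer[i + 1])))
            nxt.append(len(gates) - 1)
        if len(layer) % 2:        # odd element carries over unchanged
            nxt.append(layer[-1])
        layer = nxt
    return gates, layer[0]

def depth(gates, i):
    g = gates[i]
    return 0 if g.op == 'INPUT' else 1 + max(depth(gates, j) for j in g.inputs)

gates, out = parity_circuit(8)
size = sum(g.op != 'INPUT' for g in gates)  # count non-input gates
print(size, depth(gates, out))              # 7 gates, depth 3 (log2 of 8)
```

In this framing, asking whether a model can "perform" parity becomes a question about whether it can realize circuits of that depth and size, which is what makes the complexity-theoretic comparison well-posed.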
Mental health research is at a turning point: breakthroughs can transform lives, but only with bold action, investment, and open collaboration. The time for action is now. Read our full statement here: childmind.org/blog/can-sci...
Out today in Nature Machine Intelligence!
From childhood on, people can create novel, playful, and creative goals. Models have yet to capture this ability. We propose a new way to represent goals and report a model that can generate human-like goals in a playful setting... 1/N
New preprint! Ziyan and I explore how task order impacts continual learning in neural networks and how to optimize it. Our analysis highlights two key principles for better task sequencing.
Check it out: arxiv.org/pdf/2502.03350
The entire website for the NIH Office of Research on Women's Health (ORWH) is very nearly stripped bare. This is so, so devastating. orwh.od.nih.gov/research/fun...
New paper out! With @batuhanerkat.bsky.social, John McClure, @hussainyk1.bsky.social, @polacklab.bsky.social we reveal how discretized representations in V1 predict suboptimal orientation discrimination. This work reconciles neurometric and psychometric curves
www.nature.com/articles/s41...
New paper in @brain1878.bsky.social: Healthy people under S-ketamine, an NMDAR antagonist, and people living with schizophrenia, a disorder associated with NMDAR hypofunction, spend more time in an external mode of perception - where noisy sensory signals override knowledge about the world.
The origin of color categories | PNAS www.pnas.org/doi/10.1073/...
Check our latest in which we leverage shape metrics to compare neural geometry across regions, sessions or subjects and how their differences predict behavior.
w/ Nejatbakhsh, Duong, @sarah-harvey.bsky.social, Brincat, @siegellab.bsky.social, @earlkmiller.bsky.social & @itsneuronal.bsky.social
Paper shows very small LLMs can match or beat larger ones through 'deep thinking' - evaluating different solution paths - and other tricks. Their 7B model beats o1-preview on complex math by exploring 64 different solutions & picking the best one.
Test-time compute paradigm seems really fruitful.
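The selection trick the post describes is best-of-N sampling: draw many candidate solutions and keep the one a scorer ranks highest. Here is a minimal sketch with toy stand-ins; all names are hypothetical, and a real setup would replace `generate` with an LLM sampler and `score` with a verifier or reward model.

```python
import random

def best_of_n(generate, score, n=64, seed=0):
    """Sample n candidate solutions and return the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: 'generate' guesses an integer, 'score' rewards closeness to 42.
guess = lambda rng: rng.randint(0, 100)
closeness = lambda x: -abs(x - 42)

best = best_of_n(guess, closeness, n=64)
print(best)
```

Even with a weak generator, widening N trades extra test-time compute for a better chance that at least one sample scores well, which is the paradigm's core bet.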
New results for a new year! "Linking neural population formatting to function" describes our modern take on an old question: how can we understand the contribution of a brain area to behavior?
www.biorxiv.org/content/10.1...
#neuroskyence
1/
And relatedly, Felix wrote a good piece on the stress and anxiety currently affecting many people who work in AI due to the current climate in the industry:
docs.google.com/document/d/1...
If only more folks in AI were gentle and introspective like this...
What was the most important machine learning paper in 2024?
My Famous Deep Learning Papers list (that I use in teaching) does not include any new ideas from the last year.
papers.baulab.info
Which single new paper would you add?
Some of my thoughts on OpenAI's o3 and the ARC-AGI benchmark
aiguide.substack.com/p/did-openai...