New paper! The Linear Representation Hypothesis is a powerful intuition for how language models work, but it lacks formalization. We give a mathematical framework in which we can ask and answer a basic question: how many features can be stored under the hypothesis? 🧵 arxiv.org/abs/2602.11246
17.02.2026 16:37
New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.
17.10.2025 16:29
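The benefit of unlabeled data here can be illustrated with a toy sketch. This is not the paper's SSME estimator; it assumes a classifier whose confidence scores are well calibrated (the easiest case), and all sizes and distributions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy: a classifier with well-calibrated confidence scores; we observe
# 50 labeled examples but 5,000 unlabeled ones.
n_lab, n_unlab = 50, 5000
conf_lab = rng.uniform(0.5, 1.0, n_lab)
conf_unlab = rng.uniform(0.5, 1.0, n_unlab)
correct_lab = rng.random(n_lab) < conf_lab        # calibrated by construction
correct_unlab = rng.random(n_unlab) < conf_unlab  # hidden ground truth

true_acc = correct_unlab.mean()

# Labeled-only estimate: unbiased but noisy with just 50 points.
est_labeled = correct_lab.mean()

# With unlabeled data: average the calibrated confidences, which
# concentrates much faster thanks to the 5,000 extra examples.
est_semisup = conf_unlab.mean()

print(f"true {true_acc:.3f}  labeled-only {est_labeled:.3f}  semi-sup {est_semisup:.3f}")
```

When calibration is imperfect, the confidence average is biased, which is exactly why a method combining both data sources is needed.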
I am on the job market this year! My research advances methods for reliable machine learning from real-world data, with a focus on healthcare. Happy to chat if this is of interest to you or your department/team.
14.10.2025 15:45
How Chatbots and AI Are Already Transforming Kids' Classrooms
Educators across the country are bringing chatbots into their lesson plans. Will it help kids learn or is it just another doomed ed-tech fad?
I've been working for many months on this article on Silicon Valley's under-the-radar role in bringing AI into schools across the US. I really hope you'll read it (here's a gift link), but I'll tell you some of the highlights in this thread. (1/x)
02.09.2025 16:31
🚨 New postdoc position in our lab at Berkeley EECS! 🚨
(please reshare)
We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences!
More info in thread
1/3
22.08.2025 14:11
What a crossover!
19.08.2025 00:40
This is great, & there's a clear analogy to the burgeoning mechanism design community for AI alignment: who is providing RLHF votes? Do their preferences reflect yours? Discussions about social choice and collective constitutions are interesting, but "what and who is in the data" is just as important.
18.08.2025 19:43
This is amazing
16.08.2025 18:19
They're in their move fast and break things era
06.08.2025 03:57
Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
While sparse autoencoders (SAEs) have generated significant excitement, a series of negative results have added to skepticism about their usefulness. Here, we establish a conceptual distinction that r...
This take emerged organically from just how well our SAE-based hypothesis-generation method (HypotheSAEs) performed, which surprised all of us!
See the paper arxiv.org/abs/2506.23845
Thanks @kennypeng.bsky.social, Jon, @emmapierson.bsky.social, @nkgarg.bsky.social for another nice collaboration.
05.08.2025 16:31
This capability of discovering unknown concepts opens many opportunities for applied machine learning. We can design better whitebox predictors, better audit high-stakes models for bias, and generate hypotheses for CSS research. More broadly, SAEs can help bridge the "prediction-explanation" gap.
05.08.2025 16:31
These tasks lie in contrast to probing, where we're trying to predict the presence of a *known* concept; and steering, where we're trying to include a *known* concept in an LLM output. SAEs lose to simple baselines on these tasks. (2 good papers on this: "AxBench" and Kantamneni, Engels et al. 2025)
05.08.2025 16:31
How do we reconcile our view with recent negative results? Our key distinction is that SAEs are useful when you don't know what you're looking for: how does my text classifier predict which headlines will go viral? How does my LLM perform addition? These are "unknown unknowns".
05.08.2025 16:31
📢 New POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
Despite recent negative results, SAEs aren't dead! They can still be useful for mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵
05.08.2025 16:31
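For readers new to SAEs: the core object is an autoencoder trained to reconstruct model activations through a wide bottleneck with a sparsity constraint, so each active unit can be read as a candidate concept. A minimal forward-pass sketch of a top-k SAE (untrained random weights and toy sizes for illustration; not the HypotheSAEs implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden, k = 16, 64, 4  # toy sizes; real SAEs are far wider

# Untrained random weights, only to illustrate the forward pass.
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))

def encode(x):
    """ReLU encoder + top-k sparsification: keep the k largest
    activations per example and zero out the rest."""
    a = np.maximum(x @ W_enc + b_enc, 0.0)
    kth_largest = np.sort(a, axis=-1)[..., -k, None]
    return np.where(a >= kth_largest, a, 0.0)

def decode(z):
    return z @ W_dec

x = rng.normal(size=(8, d_model))  # stand-in for LLM activations
z = encode(x)                      # sparse feature activations
x_hat = decode(z)                  # training minimizes ||x - x_hat||^2
```

Training pushes each of the 64 dictionary directions toward an interpretable feature; "discovering unknown concepts" then amounts to reading off which units fire on which inputs.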
Heat map showing that more accurate models have more correlated errors.
Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses from >350 LLMs. We find substantial correlation: on one dataset, LLMs agree on the wrong answer ~2x more often than they would at random. 🧵 (1/7)
arxiv.org/abs/2506.07962
03.07.2025 12:54
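The "agree on the wrong answer more than at random" comparison can be reproduced on synthetic data. A hedged sketch: the shared-distractor mechanism and every parameter below are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: two models answer 1,000 four-way multiple-choice questions.
n_items, n_choices = 1000, 4
truth = rng.integers(0, n_choices, size=n_items)

# Correlated-error mechanism (invented): when a model is wrong, it often
# falls for the same shared distractor answer.
shared_distractor = rng.integers(0, n_choices, size=n_items)

def model_answers(p_correct, p_shared_error):
    ans = truth.copy()
    wrong = rng.random(n_items) > p_correct
    ans[wrong] = shared_distractor[wrong]
    # On the remaining wrong items, answer independently at random.
    indep = wrong & (rng.random(n_items) > p_shared_error)
    ans[indep] = rng.integers(0, n_choices, size=n_items)[indep]
    return ans

a = model_answers(0.7, 0.6)
b = model_answers(0.7, 0.6)

both_wrong = (a != truth) & (b != truth)
agree_wrong = both_wrong & (a == b)

observed = agree_wrong.mean()
# If errors were independent and uniform over the 3 wrong options,
# two wrong models would agree only ~1/3 of the time.
expected = both_wrong.mean() / (n_choices - 1)
print(f"agree-on-wrong: observed {observed:.3f} vs independent {expected:.3f}")
```

The observed/expected ratio is the kind of statistic the "~2x at random" claim refers to.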
Individual experiences and collective evidence
Jessica Dai on theory for the world as it could be
@jessica.bsky.social on individual reporting as a means to build collective knowledge.
24.06.2025 14:46
ARR question: If I submit to a cycle, how long do those reviews "last"? e.g. if I submit to the July cycle but can't go to AACL, can I commit my July reviews to the conference associated with the next (October) cycle? @aclrollingreview.bsky.social
17.06.2025 21:14
A GIF explaining the value of test-time augmentation (TTA) for conformal classification. The video begins with an illustration of TTA shrinking the predicted set of classes for a dog image, then explains that this happens because TTA raises the true class's predicted probability, even when that class is initially predicted to be unlikely.
New work: conformal classifiers return sets of classes for each example, with a probabilistic guarantee the true class is included. But these sets can be too large to be useful.
In our #CVPR2025 paper, we propose a method to make them more compact without sacrificing coverage.
14.06.2025 15:00
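As background on conformal classification itself (not the paper's TTA method): split conformal prediction calibrates a score threshold on held-out data so the resulting sets contain the true class with probability at least 1 - alpha. A minimal sketch with synthetic softmax scores:

```python
import numpy as np

rng = np.random.default_rng(0)

n_cal, n_test, n_classes = 500, 200, 3
alpha = 0.1  # target: true class in the set >= 90% of the time

def fake_scores(y):
    """Hypothetical softmax outputs: confident in the true class on
    ~80% of examples, uninformative on the rest."""
    s = rng.dirichlet(np.ones(n_classes), size=len(y))
    confident = rng.random(len(y)) < 0.8
    s[confident, y[confident]] += 1.5
    return s / s.sum(axis=1, keepdims=True)

y_cal = rng.integers(0, n_classes, size=n_cal)
y_test = rng.integers(0, n_classes, size=n_test)
p_cal, p_test = fake_scores(y_cal), fake_scores(y_test)

# Nonconformity score: 1 - softmax probability of the true class.
scores = 1.0 - p_cal[np.arange(n_cal), y_cal]
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q = np.quantile(scores, level, method="higher")

# Prediction set: every class whose score clears the threshold.
pred_sets = (1.0 - p_test) <= q

coverage = pred_sets[np.arange(n_test), y_test].mean()
avg_size = pred_sets.sum(axis=1).mean()
print(f"coverage {coverage:.2f}, avg set size {avg_size:.2f}")
```

The paper's contribution concerns shrinking `avg_size` without breaking the coverage guarantee; TTA does this by improving the underlying scores, not by changing this calibration recipe.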
I would like to spend up to 5-10 hours to learn about basic macroeconomics (I know it's maybe fake, but setting that aside for a moment...). Does anyone have any recommendations?
05.06.2025 23:29
Huge congrats, Marianne!!
05.06.2025 17:32
I find that I've actually gone out of my way to stop using bullet points in reviews now because Any Review With Bullet Points is a Bot 🥲
27.05.2025 22:03
People love to hate on the transition 3-pointer as evidence of how the 3 has ruined basketball, but I think it's usually just the right play... if you have numbers in transition, your teammate can easily get a putback off a miss, so might as well try the 3
10.05.2025 20:09
We'll present HypotheSAEs at ICML this summer!
Draft: arxiv.org/abs/2502.04382
We're continuing to cook up new updates for our Python package: github.com/rmovva/Hypot...
(Most recently: "Matryoshka SAEs", which extract both coarse and granular concepts with less hyperparameter fiddling.)
05.05.2025 21:27
So awesome, congrats Lucy!!!
05.05.2025 21:24
These Warriors are old, tired and in trouble as Game 7 looms against Rockets
They're not done yet. Maybe a legendary performance awaits on Sunday. But the Warriors look like they're out of gas and out of answers.
Yesterday's Game 6 was depressing, and this article precisely delineated the reasons why. And sometimes, a precise retelling of what you're feeling is all you need to feel better. www.nytimes.com/athletic/633... @thompsonscribe.bsky.social
03.05.2025 22:50
Did you take the hot air balloon pic?!
03.05.2025 17:57
Check out Erica's nice work. They not only develop a well-grounded model for disparities in disease progression, but also conduct experiments with real NYP cardiology data! (Anyone who works in healthcare knows how much of a feat it is to use data other than MIMIC)
01.05.2025 17:10