
Raj Movva

@rajmovva

NLP, ML & society, healthcare. PhD student at Berkeley, previously CS at MIT. https://rajivmovva.com/

247
Followers
131
Following
47
Posts
22.11.2024
Joined

Latest posts by Raj Movva @rajmovva


New paper! The Linear Representation Hypothesis is a powerful intuition for how language models work, but lacks formalization. We give a mathematical framework in which we can ask and answer a basic question: how many features can be stored under the hypothesis? 🧵 arxiv.org/abs/2602.11246

17.02.2026 16:37 πŸ‘ 43 πŸ” 14 πŸ’¬ 1 πŸ“Œ 2
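The "how many features fit" question above connects to a standard fact about high-dimensional geometry (this is a generic illustration, not the paper's framework, which is in the linked arXiv preprint): if features are represented as directions, many more than d near-orthogonal directions fit in R^d, since random unit vectors have cosine similarity concentrating around 1/sqrt(d).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 256, 4096  # ambient dimension, number of candidate feature directions (n >> d)

# n random unit vectors in R^d
V = rng.standard_normal((n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Pairwise cosine similarities, off-diagonal entries only
G = V @ V.T
off_diag = np.abs(G[~np.eye(n, dtype=bool)])

# Typical |cos| is about 1/sqrt(d), so 16x more "features" than
# dimensions still leaves all pairs nearly orthogonal.
print(off_diag.mean(), off_diag.max())
```

The mean interference shrinks as d grows, which is one intuition for why superposed linear features are plausible at all; formalizing exactly how many features survive is the harder question the paper takes up.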
Introducing AI to an Online Petition Platform Changed Outputs but not Outcomes
The rapid integration of AI writing tools into online platforms raises critical questions about their impact on content production and outcomes. We leverage a unique natural experiment on Change.org...

Excited to share a new working paper!

What happened when Change.org integrated an AI writing tool into their platform? We provide causal evidence that petition text changed significantly while outcomes did not improve. 1/

arxiv.org/abs/2511.13949

01.12.2025 20:32 πŸ‘ 53 πŸ” 18 πŸ’¬ 4 πŸ“Œ 6
Why Can't the N.B.A. Move On from Its Old Stars?
Even as the league drastically evolves, the narratives around it are still orbiting its aging icons.

another banger from @louisathomas.bsky.social

www.newyorker.com/sports/sport...

28.10.2025 00:08 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.

17.10.2025 16:29 πŸ‘ 21 πŸ” 7 πŸ’¬ 1 πŸ“Œ 4
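SSME's actual estimator is in the paper; the sketch below is not it. As a toy illustration of why unlabeled data carries signal about model performance: for a calibrated classifier, expected accuracy equals the mean of max(p, 1-p) over predicted probabilities, which needs no labels at all, whereas a small labeled set gives a noisy estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a calibrated binary classifier: p is the model's P(y=1),
# and labels are drawn consistently with those probabilities.
n = 20_000
p = rng.beta(2, 2, size=n)
y = (rng.random(n) < p).astype(int)

preds = (p >= 0.5).astype(int)
true_acc = (preds == y).mean()

# Estimate 1: accuracy on a small labeled subset (high variance)
labeled = rng.choice(n, size=50, replace=False)
acc_labeled = (preds[labeled] == y[labeled]).mean()

# Estimate 2: label-free estimate from confidences alone; for a
# calibrated model, E[accuracy] = E[max(p, 1 - p)]
acc_unlabeled = np.maximum(p, 1 - p).mean()

print(true_acc, acc_labeled, acc_unlabeled)
```

Real models are miscalibrated, which is exactly why a naive version of this fails and a method like SSME, which combines labeled and unlabeled data, is needed.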

I am on the job market this year! My research advances methods for reliable machine learning from real-world data, with a focus on healthcare. Happy to chat if this is of interest to you or your department/team.

14.10.2025 15:45 πŸ‘ 27 πŸ” 12 πŸ’¬ 2 πŸ“Œ 4
How Chatbots and AI Are Already Transforming Kids' Classrooms
Educators across the country are bringing chatbots into their lesson plans. Will it help kids learn or is it just another doomed ed-tech fad?

I've been working for many months on this article on Silicon Valley's under-the-radar role in bringing AI into schools across the US. I really hope you'll read it — here's a gift link — but I'll tell you some of the highlights in this thread. (1/x)

02.09.2025 16:31 πŸ‘ 111 πŸ” 59 πŸ’¬ 5 πŸ“Œ 10

🚨 New postdoc position in our lab at Berkeley EECS! 🚨

(please reshare)

We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences!

More info in thread

1/3

22.08.2025 14:11 πŸ‘ 22 πŸ” 12 πŸ’¬ 1 πŸ“Œ 3

What a crossover!

19.08.2025 00:40 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This is great, & there's a clear analogy to the burgeoning mechanism design community for AI alignment: who is providing RLHF votes? Do their preferences reflect yours? Discussions about social choice and collective constitutions are interesting, but "what and who is in the data" is just as important.

18.08.2025 19:43 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This is amazing

16.08.2025 18:19 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

They're in their move fast and break things era πŸ™ƒ

06.08.2025 03:57 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
While sparse autoencoders (SAEs) have generated significant excitement, a series of negative results have added to skepticism about their usefulness. Here, we establish a conceptual distinction that r...

This take emerged organically from just how well our SAE-based method for hypothesis generation (HypotheSAEs) performed, which surprised all of us!

See the paper arxiv.org/abs/2506.23845

Thanks @kennypeng.bsky.social, Jon, @emmapierson.bsky.social, @nkgarg.bsky.social for another nice collaboration.

05.08.2025 16:31 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This capability of discovering unknown concepts opens many opportunities for applied machine learning. We can design better white-box predictors, better audit high-stakes models for bias, and generate hypotheses for computational social science research. More broadly, SAEs can help bridge the "prediction-explanation" gap.

05.08.2025 16:31 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

These tasks stand in contrast to probing, where we're trying to predict the presence of a *known* concept, and steering, where we're trying to include a *known* concept in an LLM's output. SAEs lose to simple baselines on these tasks. (Two good papers on this: "AxBench" and Kantamneni, Engels et al. 2025)

05.08.2025 16:31 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

How do we reconcile our view with recent negative results? Our key distinction is that SAEs are useful when you don't know what you're looking for: how does my text classifier predict which headlines will go viral? How does my LLM perform addition? These are "unknown unknowns".

05.08.2025 16:31 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ“’New POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts

Despite recent results, SAEs aren't dead! They can still be useful for mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵

05.08.2025 16:31 πŸ‘ 41 πŸ” 4 πŸ’¬ 1 πŸ“Œ 3
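The "discover unknown concepts" framing in this thread can be made concrete with a toy sketch. This is not a trained SAE (real SAEs learn encoder and decoder weights from model activations); here the ground-truth concept dictionary stands in for learned weights, to show how a sparse code reveals *which* concepts are active in an activation without knowing them in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 256, 1024, 3  # activation dim, dictionary size, active concepts

# Ground-truth "concept" directions (overcomplete: m >> d)
D = rng.standard_normal((m, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)

# An activation built as a sum of k hidden concepts
active = rng.choice(m, size=k, replace=False)
x = D[active].sum(axis=0)

# SAE-style encoder: linear map, then keep the top-k activations
# (using D itself as a stand-in for learned encoder weights)
pre = D @ x
code = np.where(pre >= np.sort(pre)[-k], pre, 0.0)

# The nonzero entries of the sparse code point at the active concepts
found = np.flatnonzero(code)
print(sorted(found), sorted(active))
```

Because the concept directions are nearly orthogonal, each active concept produces a large encoder pre-activation while inactive ones stay near zero, so the sparse code names the hidden concepts, which is precisely the discovery use-case the position paper argues for.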
Annotation alignment: Comparing LLM and human annotations of conversational safety
Rajiv Movva, Pang Wei Koh, Emma Pierson. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.

Nice work! Cool to see that item difficulty predicts human-LLM disagreement. We also studied similar questions with the DICES dataset: aclanthology.org/2024.emnlp-m...

12.07.2025 15:26 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Heat map showing that more accurate models have more correlated errors.

Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using the responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵 (1/7)

arxiv.org/abs/2506.07962

03.07.2025 12:54 πŸ‘ 50 πŸ” 7 πŸ’¬ 1 πŸ“Œ 2
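The "agree on the wrong answer ~2x more than at random" metric can be sketched with a hypothetical simulation (this is not the paper's data or estimator): two multiple-choice models whose errors are partly driven by a shared tempting distractor, compared against the agreement rate implied by independent errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_opts = 10_000, 4
# W.l.o.g. option 0 is always the correct answer.

def model(acc, shared_wrong, mix):
    """Correct w.p. acc; otherwise picks a wrong option, copying a
    shared distractor w.p. `mix` (this induces correlated errors)."""
    correct = rng.random(n) < acc
    own_wrong = rng.integers(1, n_opts, size=n)
    wrong = np.where(rng.random(n) < mix, shared_wrong, own_wrong)
    return np.where(correct, 0, wrong)

shared = rng.integers(1, n_opts, size=n)  # same distractor for both models
a = model(0.7, shared, mix=0.6)
b = model(0.7, shared, mix=0.6)

agree_wrong = ((a != 0) & (b != 0) & (a == b)).mean()

# Independence baseline: both wrong AND independently landing on the
# same one of the (n_opts - 1) wrong options
indep = (a != 0).mean() * (b != 0).mean() / (n_opts - 1)
print(agree_wrong / indep)  # ratio > 1 indicates correlated errors
```

With `mix=0` the ratio sits near 1; shared distractors push it well above 1, mirroring the ~2x figure the thread reports on one dataset.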
Individual experiences and collective evidence
Jessica Dai on theory for the world as it could be

@jessica.bsky.social on individual reporting as a means to build collective knowledge.

24.06.2025 14:46 πŸ‘ 8 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0

ARR question: If I submit to a cycle, how long do those reviews "last"? e.g. if I submit to the July cycle but can't go to AACL, can I commit my July reviews to the conference associated with the next (October) cycle? @aclrollingreview.bsky.social

17.06.2025 21:14 πŸ‘ 2 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
A gif explaining the value of test-time augmentation to conformal classification. The video begins with an illustration of TTA reducing the size of the predicted set of classes for a dog image, and goes on to explain that this is because TTA promotes the true class's predicted probability to be higher, even when it's predicted to be unlikely.

New work πŸŽ‰: conformal classifiers return sets of classes for each example, with a probabilistic guarantee that the true class is included. But these sets can be too large to be useful.

In our #CVPR2025 paper, we propose a method to make them more compact without sacrificing coverage.

14.06.2025 15:00 πŸ‘ 22 πŸ” 6 πŸ’¬ 3 πŸ“Œ 1
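For readers unfamiliar with the setup in this post, here is a minimal sketch of standard split conformal classification on simulated scores (the generic baseline, not the paper's test-time-augmentation method): calibrate a threshold on held-out nonconformity scores, then include every class whose score clears it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_test, K = 2000, 1000, 10
alpha = 0.1  # target: sets contain the true class >= 90% of the time

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def simulate(n):
    """Hypothetical classifier: logits with the true class boosted."""
    y = rng.integers(0, K, size=n)
    logits = rng.standard_normal((n, K))
    logits[np.arange(n), y] += 2.0
    return softmax(logits), y

p_cal, y_cal = simulate(n_cal)
p_test, y_test = simulate(n_test)

# Split conformal: nonconformity = 1 - P(true class); threshold at the
# ceil((n+1)(1-alpha))/n empirical quantile of calibration scores
scores = 1 - p_cal[np.arange(n_cal), y_cal]
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                method="higher")

sets = p_test >= 1 - q  # boolean (n_test, K): is each class in the set?
coverage = sets[np.arange(n_test), y_test].mean()
avg_size = sets.sum(axis=1).mean()
print(coverage, avg_size)
```

The guarantee fixes coverage but says nothing about set size; the paper's contribution is shrinking `avg_size` (here via TTA sharpening the true class's probability) while keeping coverage intact.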

I would like to spend up to 5-10 hours to learn about basic macroeconomics (I know it's maybe fake, but setting that aside for a moment...). Does anyone have any recommendations?

05.06.2025 23:29 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Huge congrats, Marianne!!

05.06.2025 17:32 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I find that I've actually gone out of my way to stop using bullet points in reviews now because Any Review With Bullet Points is a Bot πŸ₯²

27.05.2025 22:03 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

People love to hate on the transition 3-pointer as evidence of how the 3 has ruined basketball, but I think it's usually just the right play... if you have numbers in transition, your teammate can easily get a putback off a miss, so might as well try the 3

10.05.2025 20:09 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

We'll present HypotheSAEs at ICML this summer! πŸŽ‰
Draft: arxiv.org/abs/2502.04382

We're continuing to cook up new updates for our Python package: github.com/rmovva/Hypot...

(Recently, "Matryoshka SAEs", which help extract coarse and granular concepts without as much hyperparameter fiddling.)

05.05.2025 21:27 πŸ‘ 10 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0

So awesome, congrats Lucy!!! πŸ§€

05.05.2025 21:24 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
These Warriors are old, tired and in trouble as Game 7 looms against Rockets
They're not done yet. Maybe a legendary performance awaits on Sunday. But the Warriors look like they're out of gas and out of answers.

Yesterday's Game 6 was depressing, and this article precisely delineated the reasons why. And sometimes, a precise retelling of what you're feeling is all you need to feel better. www.nytimes.com/athletic/633... @thompsonscribe.bsky.social

03.05.2025 22:50 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Did you take the hot air balloon pic?!

03.05.2025 17:57 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Check out Erica's nice work. They not only develop a well-grounded model for disparities in disease progression, but also conduct experiments with real NYP cardiology data! (Anyone who works in healthcare knows how much of a feat it is to use data other than MIMIC)

01.05.2025 17:10 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0