Isabelle Lee

@wordscompute

ml/nlp phding @ usc, currently visiting harvard, scientisting @ startup; interpretability & training & reasoning iglee.me

2,362 Followers · 529 Following · 48 Posts · Joined 07.12.2023
Latest posts by Isabelle Lee @wordscompute

Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How?
We're ready to answer.
🧡

23.02.2026 15:38 πŸ‘ 21 πŸ” 8 πŸ’¬ 1 πŸ“Œ 1

Excited to share our new dataset, FOL-Traces!

We introduce a large-scale dataset of programmatically verified FOL reasoning traces for studying structured logical inference + process fidelity.

Happy to hear thoughts from others working on reasoning in LLMs!

Check it out here πŸ‘‡

12.02.2026 00:00 πŸ‘ 4 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
fol-traces/fol-traces · Datasets at Hugging Face

paper: arxiv.org/abs/2505.14932
dataset: huggingface.co/datasets/fo...
work w/ @sarahliaw.bsky.social and Dani Yogatama

If you want to chat about interpretability & training dynamics & reasoning and munch on mezzes, come hang out with me in Rabat πŸ‡²πŸ‡¦πŸ™ƒ
9/9

11.02.2026 17:17 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I wanted to study reasoning acquisition in training by complexity + process fidelity but wasn't able to find a dataset. So we built one that's rigorously annotated and large enough to train a small LM. Now I’m excited about what we can do with it
8/9

11.02.2026 17:17 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Bar graph displaying the accuracy percentages of various models on two-step prediction tasks, with distinct colors for each step.

Bar chart displaying accuracy percentages for various models across three complexity thresholds: 10-19, 20-29, and 30+.

a harder task: last-step prediction, ¬(¬Sunny(x) ∧ ¬Breezy(x)) ↔ [MASK], or last-two-step prediction. Most LLMs achieve <50% accuracy on both tasks.

(n.b. since FOL is verifiable, we define correct as any generation that's logically equivalent to the reference expression.)
7/9
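The equivalence-based grading rule can be sketched as follows (a hypothetical toy, not the dataset's actual checker; ground predicates are reduced to booleans): a generation counts as correct iff it agrees with the reference expression on every assignment, so syntactically different but equivalent answers both pass.

```python
from itertools import product

def graded_correct(gold, generated, nvars=2):
    """Accept a generation iff it matches the reference on every assignment."""
    return all(gold(*v) == generated(*v)
               for v in product([False, True], repeat=nvars))

gold = lambda s, b: s or b     # reference last step
gen1 = lambda s, b: b or s     # different surface form, same function: accepted
gen2 = lambda s, b: s and b    # not equivalent: rejected

assert graded_correct(gold, gen1)
assert not graded_correct(gold, gen2)
```

Exhaustive enumeration is exponential in the number of ground predicates, but for the short expressions in a single trace step it is a cheap, fully deterministic grader.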

11.02.2026 17:17 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Table displaying model accuracies for components, operators, and predicates prediction tasks, with various metrics for performance evaluation.

e.g. masked prediction: we mask an operator randomly and have LLMs guess: ¬(¬Sunny(x) ∧ ¬Breezy(x)) ↔ (Sunny(x) [MASK] Breezy(x)). LLMs are correct only ~45.7% on average:
6/9
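A minimal sketch of how such a masked guess can be graded (the operator table and two-variable setup are illustrative assumptions, not the dataset's actual checker): substitute each candidate operator into the [MASK] slot and keep the ones that make the biconditional a tautology.

```python
from itertools import product

# Candidate operators an LLM might propose for the [MASK] slot (illustrative).
OPS = {
    "AND":     lambda p, q: p and q,
    "OR":      lambda p, q: p or q,
    "IMPLIES": lambda p, q: (not p) or q,
    "IFF":     lambda p, q: p == q,
}

def correct_fills(lhs, rhs_with_op):
    """Return operator names whose fill makes lhs ↔ rhs hold on all assignments."""
    return [name for name, op in OPS.items()
            if all(lhs(s, b) == rhs_with_op(op, s, b)
                   for s, b in product([False, True], repeat=2))]

# ¬(¬Sunny ∧ ¬Breezy) ↔ (Sunny [MASK] Breezy), predicates treated as booleans
lhs = lambda s, b: not ((not s) and (not b))
rhs = lambda op, s, b: op(s, b)
print(correct_fills(lhs, rhs))  # → ['OR']
```

Because the answer set is computed rather than annotated, any operator guess can be scored automatically, which is what makes the ~45.7% figure cheap to measure at scale.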

11.02.2026 17:17 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

...resulting in a bunch of reasoning traces that are verifiably correct with measurable programmatic complexity. And we find that they're very hard for LLMs!

Let's consider an example w/ de Morgan's law: ¬(¬Sunny(x) ∧ ¬Breezy(x)) ↔ (Sunny(x) ∨ Breezy(x))
5/9

11.02.2026 17:17 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Flowchart illustrating the process of using First-Order Logic, integrating human rules, symbolic generation, and LLM instantiation for reasoning examples.

So how do we strike a balance? We propose using First-Order Logic (FOL) as a middle ground. We
1. programmatically, randomly generate a bunch of FOL expressions
2. progressively simplify them, verifying their equivalence
3. chain them together
4. NL instantiate them w/ LLMs
4/9
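The verification idea behind steps 1-3 can be sketched at a propositional level (a hypothetical toy, not the actual FOL-Traces pipeline): treat ground predicates as booleans and check that each simplification step preserves truth on every assignment.

```python
from itertools import product

def equivalent(f, g, nvars):
    """True iff two boolean functions agree on every assignment (step 2's check)."""
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=nvars))

# One de Morgan simplification step, with Sunny(x)/Breezy(x) as booleans s, b:
#   ¬(¬Sunny ∧ ¬Breezy)  ⟶  Sunny ∨ Breezy
before = lambda s, b: not ((not s) and (not b))
after = lambda s, b: s or b

assert equivalent(before, after, 2)                      # truth-preserving: keep it
assert not equivalent(before, lambda s, b: s and b, 2)   # a bad rewrite is caught
```

Chaining only steps that pass this check yields a trace in which every link is machine-verified, which is the sense in which the resulting traces are "verifiably correct."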

11.02.2026 17:17 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We mostly interface with LLMs through words, but evaluating NL reasoning is messy. Math reasoning, on the other hand, gives us concrete, objectively correct answers, but it's narrow and doesn't look like NL.
3/9

11.02.2026 17:17 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

There are many evals and benchmarks in this field, but natural language (NL) reasoning is tricky--meaning depends on context (commonsense), shared assumptions (pragmatics), and what’s unsaid (abduction). Pattern shortcuts/heuristics β‰  logical inference.
2/9

11.02.2026 17:17 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Title highlights "FOL-Traces," a dataset for evaluating logical reasoning in language models, emphasizing rigorous testing and performance metrics.

New dataset πŸ—‚οΈ coming to #eacl

What is (correct) reasoning in LLMs? How do you rigorously define/measure process fidelity? How might we study its acquisition in large-scale training? We made a gigantic set of verifiably correct reasoning traces of first-order logic expressions!
1/9

11.02.2026 17:17 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 1
gemini summarized my google search when i was tryna look for an anti-new years resolution blog post. it says in highlight: "Approximately 80% to 88% of New Year's resolutions fail by mid-February, ..."

one of my new years "considerations" is to be less silent #onhere. so i guess i'll be #here and maybe also #there til february 15th

11.02.2026 17:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

If you're interested in interpretability driven evaluations, I'd love to hear from you! And stay tuned for more work from us :)

11.02.2026 17:07 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Really excited to receive Coefficient Giving's Technical AI Safety Research Grant via Berkeley Existential Risk Initiative w/
@nsaphra.bsky.social! We aim to predict potential AI model failures before impact--before deployment, using interpretability.

11.02.2026 17:07 πŸ‘ 6 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their steps (CoT) aren't necessarily revealing their true reasoning. Spoiler: the transparency can be an illusion. (1/9) 🧡

01.07.2025 15:41 πŸ‘ 82 πŸ” 31 πŸ’¬ 2 πŸ“Œ 5

we've reached that point in this submission cycle, no amount of coffee will do πŸ˜žπŸ™‚β€β†”οΈπŸ˜ž

09.05.2025 23:51 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

INCOMING

29.03.2025 04:58 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
a leaf falls on moo deng the pygmy hippo, blocking her vision

moo deng is upset presumably because she can’t see!

titled: peer review

29.03.2025 04:58 πŸ‘ 7 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
CDS building which looks like a jenga tower

Life update: I'm starting as faculty at Boston University
@bucds.bsky.social in 2026! BU has SCHEMES for LM interpretability & analysis, I couldn't be more pumped to join a burgeoning supergroup w/ @najoung.bsky.social @amuuueller.bsky.social. Looking for my first students, so apply and reach out!

27.03.2025 02:24 πŸ‘ 244 πŸ” 13 πŸ’¬ 35 πŸ“Œ 7

or if you're awesome and happen to be in sf, also message me

15.03.2025 01:51 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

pls message me if you wanna meet up for coffee and chat about ai/physics/llms/interpretability

15.03.2025 01:42 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

really excited to be headed to OFC in SF! so excited to revisit optical physics πŸ˜€

15.03.2025 01:42 πŸ‘ 1 πŸ” 0 πŸ’¬ 2 πŸ“Œ 1
Transformers employ different strategies through training to minimize loss, but how do these tradeoff and why?

Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns.

Read on πŸ”Žβ¬

11.03.2025 07:13 πŸ‘ 8 πŸ” 4 πŸ’¬ 1 πŸ“Œ 0

i use the same template and need help getting a butterfly button help

05.03.2025 02:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
New paper–accepted as *spotlight* at #ICLR2025! πŸ§΅πŸ‘‡

We show a competition dynamic between several algorithms splits a toy model’s ICL abilities into four broad phases of train/test settings! This means ICL is akin to a mixture of different algorithms, not a monolithic ability.

16.02.2025 18:57 πŸ‘ 32 πŸ” 5 πŸ’¬ 2 πŸ“Œ 1
Out-of-Sync β€˜Loners’ May Secretly Protect Orderly Swarms Studies of collective behavior usually focus on how crowds of organisms coordinate their actions. But what if the individuals that don’t participate have just as much to tell us?

Starlings move in undulating curtains across the sky. Forests of bamboo blossom at once. But some individuals don’t participate in these mystifying synchronized behaviors β€” and scientists are learning that they may be as important as those that do.

15.02.2025 16:46 πŸ‘ 33 πŸ” 10 πŸ’¬ 2 πŸ“Œ 2
Paper page - Fully Autonomous AI Agents Should Not be Developed

New piece out!
We explain why Fully Autonomous Agents Should Not be Developed, breaking β€œAI Agent” down into its components & examining through ethical values.
With @evijit.io, @giadapistilli.com and @sashamtl.bsky.social
huggingface.co/papers/2502....

06.02.2025 09:56 πŸ‘ 139 πŸ” 48 πŸ’¬ 4 πŸ“Œ 11
The Poetry Fan Who Taught an LLM to Read and Write DNA | Quanta Magazine By treating DNA as a language, Brian Hie’s β€œChatGPT for genomes” could pick up patterns that humans can’t see, accelerating biological design.

Brian Hie harnessed the powerful parallels between DNA and human language to create an AI tool that interprets genomes. Read his conversation with Ingrid Wickelgren: www.quantamagazine.org/the-poetry-f...

05.02.2025 16:00 πŸ‘ 40 πŸ” 14 πŸ’¬ 1 πŸ“Œ 0
How do tokens evolve as they are processed by a deep Transformer?

With JosΓ© A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322

ML and PDE lovers, check it out!

31.01.2025 16:56 πŸ‘ 95 πŸ” 16 πŸ’¬ 2 πŸ“Œ 0

it’s finally raining in la:)

26.01.2025 19:20 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0