Ho and I also made a longer video with a voice-over if it's useful to anyone.
If you are interested in the library, you can check out the corresponding thread below:
bsky.app/profile/anto...
Or the GitHub directly: github.com/FOR-sight-ai...
Super excited to share our new demo website for Interpreto!
It is basically an explanation gallery showcasing attribution and concept-based explanations for classification and generation.
Play with it: for-sight-ai.github.io/interpreto-d...
We will keep improving it, so stay tuned!
I also did a thread to present the library quickly:
bsky.app/profile/anto...
Pleasantly surprised to see our blog post trending on HuggingFace.
Well, @fannyjrd.bsky.social did a great job!
If you missed it, check it out: huggingface.co/blog/Fannyjr...
It's a didactic presentation of our new library, Interpreto:
github.com/FOR-sight-ai...
It was an honor to be part of this awesome project! Interpreto is a great up-and-coming tool for concept-based interpretability analyses of NLP models, check it out!
I'm thrilled to announce the release of Interpreto: a user-friendly, open-source toolbox to make NLP model interpretability accessible, practical, and rigorous.
github.com/FOR-sight-ai...
1/5
You can find the library on GitHub: github.com/FOR-sight-ai...
Access the documentation: for-sight-ai.github.io/interpreto/
Install with pip: `uv pip install interpreto`
Read our paper: arxiv.org/abs/2512.097...
Check out our Hugging Face blog post: huggingface.co/blog/Fannyjr...
8/8
The amazing team: @fannyjrd.bsky.social, Thomas Mullor, @gsarti.com, Frédéric Boisnard, Corentin Friedrich, Charlotte Claye, François Hooft, and Raphaël Bernas!!
And thanks to the supporters: IRT Saint Exupery, ANITI, @centralesupelec.bsky.social, DEEL.ai and FOR projects.
7/8
You can do all these steps in interpreto using a wide range of methods.
Check out the documentation for more details: for-sight-ai.github.io/interpreto/a...
Or the tutorials:
- for-sight-ai.github.io/interpreto/n...
- for-sight-ai.github.io/interpreto/n...
6/8
For concepts, there are 5 steps:
1. Split the model and get activations. (wraps `nnsight` @ndif-team.bsky.social)
2. Find patterns in activations (SAEs...) (wraps `overcomplete` @thomasfel.bsky.social)
3. Interpret the concepts
4. Estimate concepts' contributions to the output
5. Evaluate
5/8
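The steps listed above can be sketched on synthetic data. This is only an illustration with NumPy, not Interpreto's API: the activations are random stand-ins for hidden states, a truncated SVD replaces the SAE-style decomposition, and `head` is a hypothetical linear classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Activations: synthetic stand-in for hidden states captured at a split point.
#    They are built from 2 hidden directions so there is something to find.
n_samples, d_model, n_concepts = 200, 16, 2
true_directions = rng.normal(size=(n_concepts, d_model))
true_codes = np.abs(rng.normal(size=(n_samples, n_concepts)))
activations = true_codes @ true_directions

# 2. Find patterns: truncated SVD used here as a simple stand-in for an SAE.
mean = activations.mean(axis=0)
centered = activations - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
concept_directions = vt[:n_concepts]              # (n_concepts, d_model)
concept_codes = centered @ concept_directions.T   # (n_samples, n_concepts)

# 3. Interpret: indices of the 5 samples that activate each concept the most.
top_samples = np.argsort(-concept_codes, axis=0)[:5]  # (5, n_concepts)

# 4. Contributions: with a linear head, each concept's share of the logit is
#    its activation times the projection of its direction onto the head.
head = rng.normal(size=d_model)  # hypothetical classifier weights
contributions = concept_codes * (concept_directions @ head)

# 5. Evaluate: how well do the concepts reconstruct the activations?
reconstruction = concept_codes @ concept_directions + mean
relative_error = (
    np.linalg.norm(activations - reconstruction) / np.linalg.norm(activations)
)
print(f"relative reconstruction error: {relative_error:.2e}")
```

Because the toy activations have rank 2, two components reconstruct them almost exactly; on real model activations the reconstruction error is the number to watch.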
Interpreto provides concept-based explanations (post-hoc unsupervised), part of the Mechanistic Interpretability field. Concepts answer:
What higher-level features exist inside the model's hidden space, and how do they affect outputs?
4/8
Example on the AG News dataset.
We implement the classic attribution methods, for both `ForSequenceClassification` and `ForCausalLM` models.
There are both perturbation-based and gradient-based methods, about 10 in total.
There are two metrics.
You can set the granularity of explanations.
3/8
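To give an idea of what a perturbation-based attribution does, here is a minimal occlusion sketch in plain Python. It does not use Interpreto: `score` is a toy keyword scorer standing in for a classifier, and `WEIGHTS` is made up for the example.

```python
# Toy stand-in for a classifier: NOT a real model and NOT Interpreto's API.
# Hypothetical keyword weights for a "sports" class.
WEIGHTS = {"match": 2.0, "team": 1.5, "goal": 1.0}


def score(tokens):
    """Return the toy 'sports' score for a list of tokens."""
    return sum(WEIGHTS.get(token, 0.0) for token in tokens)


def occlusion_attributions(tokens, mask="[MASK]"):
    """Attribute the score to each token by masking it and measuring the drop."""
    base = score(tokens)
    attributions = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        attributions.append(base - score(perturbed))
    return attributions


tokens = ["the", "team", "won", "the", "match"]
attributions = occlusion_attributions(tokens)
for token, value in zip(tokens, attributions):
    print(f"{token:>6}: {value:+.1f}")
```

Tokens whose removal lowers the score the most get the largest attribution. Gradient-based methods estimate the same kind of importance from derivatives instead of repeated forward passes.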
The goal of the library is to bridge the gap between practitioners applying interpretability methods and the SOTA.
The library is still in active development. Hence, we welcome your feedback and contributions.
Raise an issue, open a PR, or contact us.
2/8
I am super excited for the official release of an open-source library we've been working on for about a year!
interpreto is an interpretability toolbox for HF language models, for both generation and classification!
Why do you need it, and for what?
1/8 (links at the end)
If you use Gmail, AI (Gemini) was turned on yesterday by default and now scans all of your content for machine learning. To turn it off, go to Settings > General and scroll down. Uncheck the box for "Smart features."
There are other "Smart" add-ons as well, but that's the one that reads your content.
Into the Rabbit Hull - Part I (Part II tomorrow)
An interpretability deep dive into DINOv2, one of vision's most important foundation models.
And today is Part I, buckle up, we're exploring some of its most charming features. :)
expressing appreciation for this scientific diagram
Can it be biased by people answering randomly?
If, say, 1 person in 5 answers randomly while the others guess correctly, wouldn't you obtain your blue curve?
Want the full story behind the poster?
I broke down the methodology and results here:
I am super excited to be presenting a poster at #ACL2025 in Vienna next week!
This is my first big conference!
Tuesday morning, 10:30–12:00, during Poster Session 2.
If you're around, feel free to message me. I would be happy to connect, chat, or have a drink!
New preprint!
Everyone loves causal interp. It's coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?
ConSim has been accepted to the #ACL2025 main conference!
Thanks again to my amazing co-authors: @alon_jacovi, Agustin Picard, @VictorBoutin, and @Fannyjrd_.
Work done in DEEL and FOR from IRT St Exupéry and @ANITI_Toulouse.
See you in Vienna
For more information, check out my last post:
BlackboxNLP is back!
Happy to be part of the organizing team for this year, and super excited for our new shared task using the excellent MIB Benchmark, check it out! blackboxnlp.github.io/2025/task/
Our Actionable Interpretability workshop has been accepted to #ICML2025!
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io
@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social
Paper submission deadline: May 9th!
The biggest reason government officials aren't giving any specifics about the criteria by which these arrests and deportations are selected is that the criteria are "pro-Israel think-tanks and advocacy organizations created lists of troublesome individuals and gave them to us."
Hundreds of international students have just received an email telling them their visas have been revoked.
The "justification" is campus activism or social media posts.
timesofindia.indiatimes.com/world/us/hun...
Can we understand the mechanisms of a frontier AI model?
Blog post: www.anthropic.com/research/tra...
"Biology" paper: transformer-circuits.pub/2025/attribu...
Methods paper: transformer-circuits.pub/2025/attribu...
Featuring basic multi-step reasoning, planning, introspection and more!
Jawdropping.
You would expect this in a dictatorship, not the United States.
This country is unrecognizable.
What will be the linchpin for AI dominance?
Read our NSF/OSTP recommendations written with Goodfire's Tom McGrath tommcgrath.github.io, Transluce's Sarah Schwettmann cogconfluence.com, MIT's Dylan Hadfield-Menell @dhadfieldmenell.bsky.social
TL;DR: Dominance comes from **interpretability**
An assembly of 18 European companies, labs, and universities have banded together to launch EuroBERT!
It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.
Details in the thread.