Antonin Poché

@antoninpoche

PhD Student doing XAI for NLP at @ANITI_Toulouse, IRIT, and IRT Saint Exupery. 🛠️ Interpreto & Xplique library development team member. https://antoninpoche.github.io/

Joined 23.01.2025

Latest posts by Antonin Poché @antoninpoche

Oh, and I also made a longer video with a voice-over, in case it's useful to anyone.

🔊

04.03.2026 10:10 👍 0 🔁 0 💬 0 📌 0

If you are interested in the library, you can check out the corresponding thread below:
bsky.app/profile/anto...

Or the GitHub directly: github.com/FOR-sight-ai...

04.03.2026 10:10 👍 0 🔁 0 💬 1 📌 0

🔥Super excited to share our new demo website for 🪄Interpreto!

🖼️It is basically an explanation gallery showcasing attribution and concept-based explanations for classification and generation.

🎮Play with it: for-sight-ai.github.io/interpreto-d...

We will keep improving it, so stay tuned!

04.03.2026 10:10 👍 8 🔁 2 💬 1 📌 0

I also did a thread to present the library quickly:

bsky.app/profile/anto...

23.01.2026 13:52 👍 0 🔁 0 💬 0 📌 0

Pleasantly surprised to see our blog post trending on Hugging Face 🤗

Well, @fannyjrd.bsky.social did a great job! 🚀

If you missed it, check it out: huggingface.co/blog/Fannyjr...

It's a didactic presentation of our new library, 🪄 Interpreto:
github.com/FOR-sight-ai...

23.01.2026 13:51 👍 3 🔁 1 💬 1 📌 0

It was an honor to be part of this awesome project! Interpreto is a great up-and-coming tool for concept-based interpretability analyses of NLP models. Check it out!

21.01.2026 04:20 👍 7 🔁 1 💬 0 📌 0
GitHub - FOR-sight-ai/interpreto: 🪄 Interpreto is an interpretability toolbox for LLMs

🎉 I'm thrilled to announce the release of Interpreto: a user-friendly, open-source toolbox to make NLP model interpretability accessible, practical, and rigorous.
github.com/FOR-sight-ai...
🧵1/5

20.01.2026 17:32 👍 7 🔁 1 💬 1 📌 0

📦You can find the library on GitHub: github.com/FOR-sight-ai...

📚Access the documentation: for-sight-ai.github.io/interpreto/

⏬Install it: `uv pip install interpreto`

📰Look at our paper: arxiv.org/abs/2512.097...

🤗 Check our Hugging Face blog post: huggingface.co/blog/Fannyjr...

8/8

20.01.2026 16:10 👍 4 🔁 1 💬 0 📌 0

🔥The amazing team: @fannyjrd.bsky.social, Thomas Mullor, @gsarti.com, Frédéric Boisnard, Corentin Friedrich, Charlotte Claye, François Hooft, and Raphaël Bernas!!

🙏And thanks to the supporters: IRT Saint Exupery, ANITI, @centralesupelec.bsky.social, DEEL.ai and FOR projects.

7/8

20.01.2026 16:10 👍 2 🔁 0 💬 1 📌 0
Overview - Interpreto Interpretability Toolkit for LLMs

You can do all these steps in interpreto using a wide range of methods.

Check out the documentation for more details: for-sight-ai.github.io/interpreto/a...

Or the tutorials:

- for-sight-ai.github.io/interpreto/n...
- for-sight-ai.github.io/interpreto/n...

6/8

20.01.2026 16:10 👍 1 🔁 0 💬 1 📌 0

For concepts, there are 4 steps, plus evaluation:

1. Split the model and get activations (wraps `nnsight` @ndif-team.bsky.social)

2. Find patterns in activations, e.g. with SAEs (wraps `overcomplete` @thomasfel.bsky.social)

3. Interpret the concepts

4. Estimate concepts' contributions to the output

5. Evaluate

5/8

20.01.2026 16:10 👍 2 🔁 0 💬 1 📌 0
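
The first four steps of the concept pipeline can be sketched on a toy problem. This is only an illustration under stated assumptions and does not use the Interpreto API: `VOCAB`, `HEAD`, and every function name are hypothetical, the "model" is a pair of lookup tables, and k-means stands in for the dictionary-learning methods (SAEs...) mentioned in step 2.

```python
# Toy illustration of the pipeline; none of this is the Interpreto API.

# Hypothetical two-part "model": VOCAB plays the first half (text -> hidden
# activation), HEAD plays the second half (hidden activation -> class logits).
VOCAB = {"goal": (1.0, 0.0), "match": (0.9, 0.1),
         "stock": (0.0, 1.0), "market": (0.1, 0.9)}
HEAD = {"sports": (2.0, -1.0), "business": (-1.0, 2.0)}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def activation(text):
    """Step 1: run the first half of the model (all words must be in VOCAB)."""
    return mean([VOCAB[word] for word in text.split()])


def find_concepts(points, k, steps=10):
    """Step 2: k-means as a simple stand-in for dictionary learning."""
    centroids = list(points[:k])  # deterministic init: first k activations
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def interpret(concept, top_k=2):
    """Step 3: label a concept with the vocabulary words most aligned with it."""
    return sorted(VOCAB, key=lambda w: dot(VOCAB[w], concept), reverse=True)[:top_k]


def contributions(concept):
    """Step 4: push the concept through the second half to get per-class effects."""
    return {label: dot(weights, concept) for label, weights in HEAD.items()}


texts = ["goal match", "stock market", "match goal goal", "market stock stock"]
concepts = find_concepts([activation(t) for t in texts], k=2)
for concept in concepts:
    effects = contributions(concept)
    print(interpret(concept), "->", max(effects, key=effects.get))
```

Evaluation (step 5) is left out of the sketch; in a real setting the activations would come from an actual model layer rather than a lookup table.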

💡Interpreto provides concept-based explanations (post-hoc unsupervised), part of the Mechanistic Interpretability field. Concepts answer:

❔What higher-level features exist inside the model's hidden space, and how do they affect outputs?

4/8

⬇️Example on the AG News dataset.

20.01.2026 16:10 👍 1 🔁 0 💬 1 📌 0

🔥 We implement the classic attribution methods, for both `ForSequenceClassification` and `ForCausalLM` models.

There are both perturbation-based ➡️ and gradient-based methods 🔁. About 10 methods in total.

📊There are two metrics.

🔹🔷🟦You can choose the granularity of explanations.

3/8

20.01.2026 16:10 👍 1 🔁 0 💬 1 📌 0
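
To make "perturbation-based" and "granularity" concrete, here is a minimal occlusion sketch. It is an illustration only and not the Interpreto API: `WEIGHTS`, `score`, `split_units`, and `occlusion` are hypothetical names, and the scoring function is a hand-written stand-in for a real classifier's logit.

```python
# Toy illustration of occlusion; the scoring function is a hypothetical
# stand-in for a real classifier, not the Interpreto API.
import re

WEIGHTS = {"great": 2.0, "boring": -1.5, "not": -0.5}  # hypothetical word weights


def score(text):
    """Stand-in for a model's positive-class logit: sums known word weights."""
    return sum(WEIGHTS.get(w, 0.0) for w in re.findall(r"\w+", text.lower()))


def split_units(text, granularity):
    """Cut the input into the units that will be perturbed."""
    if granularity == "sentence":
        return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    return text.split()  # "word" granularity


def occlusion(text, granularity="word"):
    """Attribution of a unit = score drop when that unit is removed."""
    units = split_units(text, granularity)
    base = score(text)
    return [(u, base - score(" ".join(units[:i] + units[i + 1:])))
            for i, u in enumerate(units)]


review = "The plot was boring. The acting was great!"
print(occlusion(review, "word"))
print(occlusion(review, "sentence"))
```

The same perturbation loop serves both granularities; only the splitting changes, so coarser units simply aggregate the effect of the words they contain.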

🎓➡️👥The goal of the library is to bridge the gap between practitioners applying interpretability methods and the SOTA.

🚀The library is still in active development. Hence, we welcome your feedback and contributions. 🤗

👋📨 Raise an issue, open a PR, or contact us.

2/8

20.01.2026 16:10 👍 2 🔁 0 💬 1 📌 0

🔥I am super excited for the official release of an open-source library we've been working on for about a year!

🪄interpreto is an interpretability toolbox for HF language models🤗, for both generation and classification!

Why do you need it, and for what?

1/8 (links at the end)

20.01.2026 16:03 👍 20 🔁 9 💬 1 📌 3

If you use Gmail, AI (Gemini) was turned on yesterday by default and now scans all of your content for machine learning. To turn it off, go to Settings>General and scroll down. Uncheck the box for "Smart features."

There are other "Smart" add-ons as well, but that's the one that reads your content.

20.11.2025 17:32 👍 10768 🔁 8014 💬 326 📌 787

🕳️🐇 Into the Rabbit Hull – Part I (Part II tomorrow)

An interpretability deep dive into DINOv2, one of vision's most important foundation models.

And today is Part I, buckle up, we're exploring some of its most charming features. :)

14.10.2025 21:00 👍 36 🔁 12 💬 2 📌 0

expressing appreciation for this scientific diagram

05.10.2025 20:55 👍 50 🔁 7 💬 3 📌 0

Can it be biased by people answering randomly?

If, say, 1 person in 5 answers randomly and the others guess correctly, wouldn't you obtain your blue curve?

22.09.2025 18:21 👍 1 🔁 0 💬 0 📌 0

Want the full story behind the poster? 🎉
I broke down the methodology and results here 👇

25.07.2025 15:38 👍 0 🔁 0 💬 0 📌 0

🔥 I am super excited to be presenting a poster at #ACL2025 in Vienna next week! 🌍

This is my first big conference!

📅 Tuesday morning, 10:30–12:00, during Poster Session 2.

💬 If you're around, feel free to message me. I would be happy to connect, chat, or have a drink!

25.07.2025 15:37 👍 5 🔁 1 💬 1 📌 0

🚨 New preprint! 🚨

Everyone loves causal interp. It's coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?

10.07.2025 14:30 👍 63 🔁 12 💬 3 📌 2

🔥ConSim has been accepted to the #ACL2025 main conference!

🙏 Thanks again to my amazing co-authors: @alon_jacovi, Agustin Picard, @VictorBoutin, and @Fannyjrd_.

Work done in DEEL and FOR from IRT St Exupéry and @ANITI_Toulouse.

See you in Vienna 📅

For more information, check out my last post:

16.05.2025 08:45 👍 4 🔁 1 💬 1 📌 0

BlackboxNLP is back! 💥

Happy to be part of the organizing team for this year, and super excited for our new shared task using the excellent MIB Benchmark, check it out! blackboxnlp.github.io/2025/task/

15.05.2025 08:24 👍 6 🔁 2 💬 0 📌 0

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!

31.03.2025 16:59 👍 42 🔁 16 💬 3 📌 3

The biggest reason government officials aren't giving any specifics about the criteria by which these arrests and deportations are selected, is that the criteria is "pro-Israel think-tanks and advocacy organizations created lists of troublesome individuals and gave them to us."

Hundreds of international students have just received an email telling them their visas have been revoked.

The ‘justification’ is campus activism or social media posts.

timesofindia.indiatimes.com/world/us/hun...

29.03.2025 14:11 👍 5657 🔁 3086 💬 208 📌 610
On the Biology of a Large Language Model

Can we understand the mechanisms of a frontier AI model?

📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...

Featuring basic multi-step reasoning, planning, introspection and more!

27.03.2025 18:18 👍 125 🔁 28 💬 4 📌 3

Jaw-dropping.

You would expect this in a dictatorship, not the United States.

This country is unrecognizable.

20.03.2025 02:11 👍 18712 🔁 7691 💬 1405 📌 825

What will be the linchpin for AI dominance?

Read our NSF/OSTP recommendations written with Goodfire's Tom McGrath tommcgrath.github.io, Transluce's Sarah Schwettmann cogconfluence.com, MIT's Dylan Hadfield-Menell @dhadfieldmenell.bsky.social

TLDR; Dominance comes from **interpretability** 🧵 ↘️

16.03.2025 13:57 👍 21 🔁 8 💬 1 📌 1

An assembly of 18 European companies, labs, and universities have banded together to launch 🇪🇺 EuroBERT!

It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.

Details in 🧵

10.03.2025 09:43 👍 80 🔁 20 💬 5 📌 1