Nicolò & Mingyang: Can we understand which circuits emerge in small models and reasoning-tuned systems, and how do they compare with default systems? Are there methods that generalize better across all tasks?
Q: What's next for interpretability benchmarks? Michal: People sitting together and planning how to extend tests to multimodal, diverse contexts. @michaelwhanna.bsky.social: For circuit finding, integrating sparse features circuits could help us better understand our models.
NicolΓ² & Mingyang: Starting to explore notebooks and public libraries can be very helpful in gaining early intuitions about what's promising.
@michaelwhanna.bsky.social: Don't try to read everything. Find Qs you really care about, and go a level deeper to answer meaningful questions.
Q: How would one go about approaching interpretability research these days? Michal: "When things don't work out of the box, it's a sign to double down and find out why. Negative results are important!"
@danaarad.bsky.social: As deep learning research converges on similar architectures for different modalities, it will be interesting to determine which interpretability method will remain useful across various models and tasks.
@michaelwhanna.bsky.social, Nicolò & Mingyang: Counterfactuals in minimal settings can be helpful, but they do not capture the whole story. Extending current methods to long contexts, and finding practical applications in safety-related areas are exciting challenges ahead.
Michal: Mechanistic interpretability has heavily focused on toy tasks and text-only models. The next step is scaling to more complex tasks that involve real-world reasoning.
Our panel moderated by @danaarad.bsky.social
"Evaluating Interpretability Methods: Challenges and Future Directions" just started! π Come to learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, NicolΓ² Brunello and Mingyang Wang!
Next up: Kentaro Ozeki presenting "Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives" aclanthology.org/2025.blackbo...
After a productive poster session, BlackboxNLP returns with the second keynote "Memorization: Myth or Mystery?" by @vernadankers.bsky.social!
Nadav Shani is giving the first oral presentation of the day: Language Dominance in Multilingual Large Language Models. Find the paper here: aclanthology.org/2025.blackbo...
Next up: Circuit-Tracer: A New Library for Finding Feature Circuits presented by @michaelwhanna.bsky.social! Paper: aclanthology.org/2025.blackbo...
I'll be presenting this work at @blackboxnlp.bsky.social in Suzhou, happy to chat there or here if you are interested!
Nov 9, @blackboxnlp.bsky.social, 11:00-12:00 @ Hall C – Interpreting Language Models Through Concept Descriptions: A Survey (Feldhus & Kopf) @lkopf.bsky.social
aclanthology.org/2025.blackbo...
bsky.app/profile/nfel...
Quanshi Zhang is giving the first keynote of the day: Can Neural Network Interpretability Be the Key to Breaking Through Scaling Law Limitations in Deep Learning?
BlackboxNLP is up and running! Here are the topics covered by this year's edition at a glance. Excited to see so many interesting topics, and the growing interest in reasoning!
📢 Call for Papers! 📢
#BlackboxNLP 2025 invites the submission of archival and non-archival papers on interpreting and explaining NLP models.
📅 Deadlines: Aug 15 (direct submissions), Sept 5 (ARR commitment)
🔗 More details: blackboxnlp.github.io/2025/call/
Writing your technical report for the MIB shared task?
Take a look at the task page for guidelines and tips!
The report deadline was also extended to August 10th!
Note that this is a final extension. We look forward to reading your reports! ✍️
Just 5 days left to submit your method to the MIB Shared Task at #BlackboxNLP!
Have last-minute questions or need help finalizing your submission?
Join the Discord server: discord.gg/n5uwjQcxPR
Results + technical report deadline: August 8, 2025
Full task details: blackboxnlp.github.io/2025/task/
With the new extended deadline, there's still plenty of time to submit your method to the MIB Shared Task!
We welcome submissions of existing methods, experimental POCs, or any approach addressing circuit discovery or causal variable localization 💡
Results deadline extended by one week!
Following requests from participants, we're extending the MIB Shared Task submission deadline by one week.
🗓️ New deadline: August 8, 2025
Submit your method via the MIB leaderboard!
Technical report guidelines are out!
If you're submitting to the MIB Shared Task at #BlackboxNLP, feel free to take a look to help you prepare your report: blackboxnlp.github.io/2025/task/
Just 10 days to go until the results submission deadline for the MIB Shared Task at #BlackboxNLP!
If you're working on:
🧠 Circuit discovery
🔍 Feature attribution
🧪 Causal variable localization
nowβs the time to polish and submit!
Join us on Discord: discord.gg/n5uwjQcxPR
Are you attending ICML?
I'm sadly not, but if you are, you should check out the MIB 🕶️ poster at 11AM: icml.cc/virtual/2025...
The benchmark is used as the shared task at this year's
@blackboxnlp.bsky.social (blackboxnlp.github.io/2025/task/) - there's still time to participate!
⏳ Three weeks left! Submit your work to the MIB Shared Task at #BlackboxNLP, co-located with @emnlpmeeting.bsky.social
Whether you're working on circuit discovery or causal variable localization, this is your chance to benchmark your method in a rigorous setup!
Have you started working on your submission for the MIB shared task yet? Tell us what you're exploring!
New featurization methods?
Circuit pruning?
Better feature attribution?
We'd love to hear about it!
🗓️ Deadline: August 1
🔗 Full task details: blackboxnlp.github.io/2025/task/
💬 Join the discussion: discord.gg/n5uwjQcxPR