Come visit our poster tomorrow at #NeurIPS2025 in Mexico City! We will be presenting this work 1-4 pm CST.
🇲🇽 Next Wednesday (Dec 3), 1–4 p.m. CST, I'll be presenting Manipulating Feature Visualizations with Gradient Slingshots at NeurIPS 2025 in Mexico City!
Feature Visualization has long been a staple interpretability tool. Our work shows it's far from reliable! 🚨
Now accepted at #NeurIPS2025 :)
We will be presenting the paper at #ACL2025NLP 🇦🇹. Feel free to stop by the poster to say hello!
29/07 (Tue) 10:30-12:00
Hall 4/5
#NLProc #interpretability #XAI #mechinterp #MLSky
We support multiple LLM providers and locally hosted LLMs. For more details, check out our paper: arxiv.org/abs/2502.16994. This project was led by @brunibrun.bsky.social, Aakriti Jain & @golimblevskaia.bsky.social, with help from Thomas Wiegand, Wojciech Samek, @slapuschkin.bsky.social & me.
FADE quantifies the causes of mismatch between features and their descriptions, and highlights challenges of current methods: various failure modes, SAE features being harder to describe than MLP ones, and how the interpretability of feature descriptions varies across layers.
Autointerp gives us descriptions of LLM features, but how those descriptions are evaluated varies from one setting to another. We propose FADE, a framework that enables standardized, automatic evaluation of the alignment between features and autointerp descriptions across various metrics.
When do neurons encode multiple concepts?
We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538
🧵 (1/7)
Title of the paper, with a colourful "playpen" logo
🚨 New pre-print! (Well, new & much improved version in any case.) 🚨
If you're interested in LLM post-training techniques and in how to make LLMs better "language users", read this thread, introducing the "LM Playpen".
Thanks for sharing! We are looking into the works you suggested and plan to discuss them in the next revision of this paper :)
Have had enough of the fake "sources" "cited" by ChatGPT? We have the solution in the form of low-cost causal citations for LLMs.
Go check this out! arxiv.org/abs/2505.15807
Thanks to my amazing co-authors @pkhdipraja.bsky.social, @reduanachtibat.bsky.social, Thomas Wiegand and Wojciech Samek!
Many thanks to my amazing co-authors: @reduanachtibat.bsky.social, Thomas Wiegand, Wojciech Samek, @slapuschkin.bsky.social!
#NLProc #interpretability #XAI #mechinterp #MLSky
Building on these insights, we present a probe that tracks knowledge provenance during inference and shows where it is localized within the input prompt. Our first attempt shows promising results, with >94% ROC AUC and >84% localization accuracy.
4/4
We analyze how in-context heads can specialize to understand instructions (task heads) and retrieve relevant information (retrieval heads). Together with parametric heads, we investigate their causal roles by extracting function vectors or modifying their weights.
3/
Using interpretability tools, we discover that the heads important for RAG fall into two categories: parametric heads that encode relational knowledge and in-context heads that are responsible for processing information in the prompt.
2/
ICL allows LLMs to adapt to new tasks and at the same time enables them to access external knowledge through RAG. How does the latter work?
TL;DR: we find that certain attention heads perform various, distinct operations on the input prompt for QA!
arxiv.org/abs/2505.15807
1/
The University of Potsdam invites applications for 5 postdoc positions, incl. in Cognitive Sciences, which incl. NLP (esp. cognitive).
These are fairly independent research positions that will allow the candidate to build their own profile. Deadline: June 2nd.
Details: tinyurl.com/pd-potsdam-2...
#NLProc #AI
Would you present your next NeurIPS paper in Europe instead of traveling to San Diego (US) if this were an option? Søren Hauberg (DTU) and I would love to hear the answer through this poll: (1/6)
*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...
📣📣 Wanna be an Area Chair or a Reviewer for @aclmeeting.bsky.social or know someone who would?
Nominations and self-nominations go here:
docs.google.com/forms/d/e/1F...
can you please add me? Thanks!
Hi, would love to be added :)