Patrick Kahardipraja (@pkhdipraja)

Come visit our poster tomorrow at #NeurIPS2025 in Mexico City! We will be presenting this work 1-4 pm CST.

03.12.2025 20:46 👍 0 🔁 0 💬 0 📌 0

✈️🇲🇽 Next Wednesday (Dec 3), 1–4 p.m. CST, I’ll be presenting Manipulating Feature Visualizations with Gradient Slingshots at NeurIPS 2025 in Mexico City!

Feature Visualization has long been a staple interpretability tool. Our work shows it’s far from reliable! 🚨

29.11.2025 16:38 👍 9 🔁 4 💬 1 📌 0

Now accepted at #NeurIPS2025 :)

24.09.2025 10:02 👍 3 🔁 1 💬 0 📌 0

We will be presenting the paper at #ACL2025NLP 🇦🇹. Feel free to stop by the poster to say hello!

📅 29/07 (Tue) 10:30-12:00
📍 Hall 4/5

#NLProc #interpretability #XAI #mechinterp #MLSky

16.07.2025 13:25 👍 0 🔁 0 💬 0 📌 0

FADE: Why Bad Descriptions Happen to Good Features
Recent advances in mechanistic interpretability have highlighted the potential of automating interpretability pipelines in analyzing the latent representations within LLMs. While they may enhance our ...

We support multiple LLM providers and locally hosted LLMs. For more details, check out our paper! arxiv.org/abs/2502.16994. This project was led by @brunibrun.bsky.social, Aakriti Jain & @golimblevskaia.bsky.social, with help from Thomas Wiegand, Wojciech Samek, @slapuschkin.bsky.social & me.

16.07.2025 13:25 👍 0 🔁 0 💬 1 📌 0

FADE quantifies the causes of mismatch in feature-to-description alignment and highlights challenges of current methods, such as various failure modes, how SAE features are more difficult to describe than MLP ones, and the interpretability of feature descriptions across layers.

16.07.2025 13:25 👍 0 🔁 0 💬 1 📌 0

Autointerp provides us with descriptions of LLM features, but how it is evaluated varies from one setting to another. We propose FADE, a framework that enables standardized, automatic evaluation of alignment between features and autointerp descriptions across various metrics.

16.07.2025 13:25 👍 2 🔁 1 💬 1 📌 0

🔍 When do neurons encode multiple concepts?

We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.

📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538

🧵 (1/7)

19.06.2025 15:18 👍 37 🔁 12 💬 1 📌 3

Title of the paper, with a colourful "playpen" logo

🚨 New pre-print! (Well, new & much improved version in any case.) 🚨
If you're interested in LLM post-training techniques and in how to make LLMs better "language users", read this thread, introducing the "LM Playpen".

29.05.2025 20:40 👍 14 🔁 5 💬 3 📌 0

Thanks for sharing! We are looking into the works you suggested and plan to discuss them in the next revision of this paper :)

28.05.2025 19:28 👍 1 🔁 0 💬 0 📌 0

Have had enough of the fake "sources" "cited" by ChatGPT? We have the solution in the form of low-cost causal citations for LLMs.

Go check this out! arxiv.org/abs/2505.15807

Thanks to my amazing co-authors
@pkhdipraja.bsky.social,
@reduanachtibat.bsky.social , Thomas Wiegand and Wojciech Samek!

28.05.2025 14:50 👍 8 🔁 3 💬 1 📌 0

Many thanks to my amazing co-authors: @reduanachtibat.bsky.social, Thomas Wiegand, Wojciech Samek, @slapuschkin.bsky.social !

#NLProc #interpretability #XAI #mechinterp #MLSky

26.05.2025 16:01 👍 2 🔁 0 💬 0 📌 0

Building on these insights, we present a probe to track knowledge provenance during inference and show where it is localized within the input prompt. Our attempt shows promising results, with >94% ROC AUC and >84% localization accuracy.

4/4

26.05.2025 16:01 👍 0 🔁 0 💬 1 📌 0

We analyze how in-context heads can specialize to understand instructions (task heads) and retrieve relevant information (retrieval heads). Together with parametric heads, we investigate their causal roles by extracting function vectors or modifying their weights.

3/

26.05.2025 16:01 👍 0 🔁 0 💬 1 📌 0

Using interpretability tools, we discover that heads important for RAG fall into two categories: parametric heads that encode relational knowledge and in-context heads that are responsible for processing information in the prompt.

2/

26.05.2025 16:01 👍 0 🔁 0 💬 1 📌 0

The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Large language models are able to exploit in-context learning to access external knowledge beyond their training data through retrieval-augmentation. While promising, its inner workings remain unclear...

ICL allows LLMs to adapt to new tasks and at the same time enables them to access external knowledge through RAG. How does the latter work?

TL;DR we find that certain attention heads perform various, distinct operations on the input prompt for QA!

arxiv.org/abs/2505.15807

1/

26.05.2025 16:01 👍 1 🔁 1 💬 1 📌 2

The University of Potsdam invites applications for 5 postdoc positions, incl. Cognitive Sciences, incl. NLP (esp. cognitive).

These are fairly independent research positions that will allow the candidate to build their own profile. Deadline: June 2nd.

Details: tinyurl.com/pd-potsdam-2...

#NLProc #AI 🤖🧠

21.05.2025 15:53 👍 2 🔁 2 💬 0 📌 0

NeurIPS participation in Europe
We seek to understand if there is interest in being able to attend NeurIPS in Europe, i.e. without travelling to San Diego, US. In the following, assume that it is possible to present accepted papers ...

Would you present your next NeurIPS paper in Europe instead of traveling to San Diego (US) if this was an option? Søren Hauberg (DTU) and I would love to hear the answer through this poll: (1/6)

30.03.2025 18:04 👍 280 🔁 160 💬 6 📌 12

*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...

10.03.2025 18:14 👍 522 🔁 296 💬 23 📌 83

Adaptive Computation Time for Recurrent Neural Networks
This paper introduces Adaptive Computation Time (ACT), an algorithm that allows recurrent neural networks to learn how many computational steps to take between receiving an input and emitting an outpu...

Adaptive computation? arxiv.org/abs/1603.08983

08.01.2025 09:36 👍 0 🔁 0 💬 0 📌 0

Volunteer to join ACL 2025 Programme Committee
Use this form to express your interest in joining the ACL 2025 programme committee as a reviewer or area chair (AC). The review period is 1st to 20th of March 2025. ACs need to be available for variou...

📣📣 Wanna be an Area Chair or a Reviewer for @aclmeeting.bsky.social or know someone who would?

Nominations and self-nominations go here 👇

docs.google.com/forms/d/e/1F...

06.12.2024 06:01 👍 15 🔁 10 💬 0 📌 1

can you please add me? Thanks!

26.11.2024 15:44 👍 2 🔁 0 💬 0 📌 0

Hi, would love to be added :)

19.11.2024 09:28 👍 1 🔁 0 💬 0 📌 0
