We discovered that language models leave a natural "signature" on their API outputs that's extremely hard to fake. Here's how it works:
arxiv.org/abs/2510.14086 1/
At @colmweb.org all week 🥯! Presenting 3 mechinterp + actionable interp papers at @interplay-workshop.bsky.social
1. BERTology in the Modern World w/ @bearseascape.bsky.social
2. MICE for CATs
3. LLM Microscope w/ Jiarui Liu, Jivitesh Jain, @monadiab77.bsky.social
Reach out to chat! #COLM2025
Excited to be attending NEMI in Boston today to present 🐭 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools and to co-moderate the model steering and control roundtable! Come find me to connect and chat about steering and actionable interp
At #ACL2025 in Vienna 🇦🇹 till next Saturday! Love to chat about anything #interpretability 🔍, understanding model internals 🔬, and finding yummy vegan food 🥬
At #ICML2025 🇨🇦 till Sunday! Love to chat about #interpretability, understanding model internals, and finding yummy vegan food in Vancouver 🥬
Congrats 🥳🥳🥳🥳
🚨 New #interpretability paper with @nsubramani23.bsky.social: 🕵️ Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models
🚨 Check out our new #interpretability paper: 🕵🏽 Model Internal Sleuthing, led by the amazing @bearseascape.bsky.social, an undergrad at @scsatcmu.bsky.social @ltiatcmu.bsky.social
Excited to announce that I started at @googleresearch.bsky.social on the cloud team as a student researcher last month, working with Hamid Palangi on actionable #interpretability 🔍 to build better tool-using #agents ⚙️🤖
Presenting this today at the poster session at #NAACL2025!
Come chat about interpretability, trustworthiness, and tool-using agents!
🗓️ - Thursday May 1st (today)
📍 - Hall 3
🕑 - 2:00-3:30pm
At #NAACL2025 🌵 till Sunday! Love to chat about interpretability, understanding model internals, and finding vegan food 🥬
Come to our poster in Albuquerque on Thursday, 2:00-3:30pm, in the interpretability & analysis section!
Paper: aclanthology.org/2025.naacl-l...
Code (coming soon): github.com/microsoft/mi...
🧵/🧵
MICE 🐭:
🎯 - significantly beats baselines on expected tool-calling utility, especially in high-risk scenarios
✅ - matches expected calibration error of baselines
✅ - is sample efficient
✅ - generalizes zero-shot to unseen tools
5/🧵
Calibration is not sufficient: both an oracle and a model that just predicts the base rate are perfectly calibrated 🤦🏽
We develop a new metric, expected tool-calling utility 🛠️, to measure the utility of deciding whether or not to execute a tool call via a confidence score!
4/🧵
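The thread doesn't spell out the metric's exact definition, but one simple way to operationalize the idea is a threshold policy: execute a call only when confidence clears a bar, reward correct executions, penalize incorrect ones heavily (the high-risk setting), and treat skipped calls as neutral. A hypothetical sketch, with all utility values and the function name made up for illustration:

```python
def expected_tool_utility(confidences, correct, threshold=0.8,
                          u_good=1.0, u_bad=-5.0, u_skip=0.0):
    """Average utility of executing a tool call only when confidence >= threshold.

    Illustrative formulation (not necessarily the paper's): a correct execution
    earns u_good, an incorrect execution costs u_bad, and a skipped call gets
    u_skip. A well-calibrated confidence score lets the agent skip exactly the
    risky calls.
    """
    total = 0.0
    for conf, ok in zip(confidences, correct):
        if conf >= threshold:
            total += u_good if ok else u_bad
        else:
            total += u_skip
    return total / len(confidences)

# One confident-and-right call, one confident-but-wrong call, one skipped call.
print(expected_tool_utility([0.9, 0.95, 0.2], [True, False, True]))
```

Note how a single overconfident wrong call dominates the average here; a metric like this rewards knowing when *not* to act, which plain calibration error does not capture.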
We propose 🐭 MICE to better assess confidence when calling tools:
1️⃣ decode from each intermediate layer of an LM
2️⃣ compute similarity scores between each layer's generation and the final output
3️⃣ train a probabilistic classifier on these features
3/🧵
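A minimal numpy sketch of those three steps, with made-up data and names (not the paper's implementation): in the real pipeline the per-layer features would come from decoding hidden states at every layer with the unembedding matrix and scoring each decoded string against the final tool call; here we simulate those similarity scores directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_calls = 12, 200

# Stand-in for steps 1-2: similarity in [0, 1] between each intermediate
# layer's decoded generation and the model's final tool call (simulated here).
layer_sims = rng.uniform(0.0, 1.0, size=(n_calls, n_layers))

# Hypothetical labels: 1.0 if the emitted tool call was actually correct.
# We simulate that agreement in the last few layers predicts correctness.
labels = (layer_sims[:, -4:].mean(axis=1)
          + 0.3 * rng.normal(size=n_calls) > 0.5).astype(float)

# Step 3: a probabilistic classifier (plain logistic regression, trained by
# gradient descent) mapping per-layer similarities to a confidence score.
X = np.hstack([layer_sims, np.ones((n_calls, 1))])   # add a bias column
w = np.zeros(n_layers + 1)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X[:100] @ w))            # train on first 100 calls
    w -= 0.1 * X[:100].T @ (p - labels[:100]) / 100   # logistic-loss gradient

confidences = 1.0 / (1.0 + np.exp(-X[100:] @ w))      # scores in (0, 1)
print(confidences[:5])
```

The resulting scores can then drive the execute-or-abstain decision measured by expected tool-calling utility.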
1️⃣ Tool-using agents need to be useful and safe as they take actions in the world
2️⃣ Language models are poorly calibrated
🤔 Can we use model internals to better calibrate language models to make tool-using agents safer and more useful?
2/🧵
Excited to share a new interp+agents paper: 🐭🐱 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools, appearing at #NAACL2025
This was work done at @msftresearch.bsky.social last summer with Jason Eisner, Justin Svegliato, Ben Van Durme, Yu Su, and Sam Thomson
1/🧵
Congrats!!
Congrats! 🥳
Have these people met … society? Read a book? Listened to music? Regurgitating esoteric facts isn't intelligence.
This is more like humanity's last stand at Jeopardy
www.nytimes.com/2025/01/23/t...
👍🏽 looks good to me!
👋🏽 Intro
💼 PhD student @ltiatcmu.bsky.social
🔍 My research is in model interpretability: understanding the internals of LLMs to build more controllable and trustworthy systems
🫵🏽 If you're interested in better understanding language technology or model interpretability, let's connect!
👋🏽
👋🏽
1) I'm working on using intermediate model generations from LLMs to better calibrate tool-using agents ⚙️🤖 than the raw probabilities themselves! Turns out you can 🥳
2) There's gotta be a nice geometric understanding of what's going on within LLMs when we tune them 🤔
Love to be added too!
Utah is hiring tenure-track/tenured faculty & a priority area is NLP!
Please reach out over email if you have questions about the school and Salt Lake City; happy to share my experience so far.
utah.peopleadmin.com/postings/154...