🚨2 PhD positions with me @amlab.bsky.social on learning causally grounded concepts 🚨
Are you interested in improving the #interpretability #robustness and #safety of AI by integrating #causal reasoning? Join us in beautiful Amsterdam 🇳🇱🌷🚲
Deadline: 20 April
www.academictransfer.com/en/jobs/3593...
Congratulations to Qi on his PhD defense 🎓 on sports #MotionAnalysis from monocular video and 3D #HPE. Addressing the lack of high-quality datasets, he leverages #DeepLearning representations to ensure #interpretability, contributing to #XAI
📖 theses.hal.science/tel-05291306/
📽️ youtu.be/F5_wZGvdCaM
Introducing Steerling-8B by Guide Labs: A groundbreaking interpretable LLM that traces every token to its training data, enhancing transparency in AI. #AI #MachineLearning #Interpretability Link: thedailytechfeed.com/guide-labs-l...
We thank Andreas for his contributions to the Lab and wish him all the best for his future!
#NLP #PhDDefense #ComputationalArgumentation #Reliability #Interpretability #UKPLab #TUDarmstadt #UniTuebingen #LLMs #NLProc
Guide Labs Open-Sources Steerling-8B, an LLM That Shows Its Work
awesomeagents.ai/news/guide-labs-steerlin...
#OpenSource #Interpretability #GuideLabs
AI continuity isn’t emergence; it’s design.
Beyond AGI IV: Continuity of Intelligence shows how memory, alignment, and control intertwine to sustain stable cognition in modern systems.
Structure defines persistence; regulation defines thought.
doi.org/10.5281/zeno...
#AIAlignment #Interpretability
I measured the "personality" of 6 open-source LLMs (7B-9B) by looking into their hidden states. Here's what came out: LLMs have a stable style of resp...
#LLM #alignment #hidden #states #personality #temperament #RLHF #open-source #mechanistic #interpretability
Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective
Zubair Bashir, Bhavik Chandna, Procheta Sen
Action editor: Chris Maddison
https://openreview.net/forum?id=EpQ2CBJTjD
#biases #bias #interpretability
B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
Yifan Wang, Sukrut Rao, Ji-Ung Lee, Mayank Jobanputra, Vera Demberg
Action editor: Yingce Xia
https://openreview.net/forum?id=c180UH8Dg8
#explanations #explainability #interpretability
Miriam's research focuses on #trustworthiness in machine learning, particularly #fairness and #interpretability, with a growing emphasis on challenges emerging in the era of large language models.
Watch/listen to the full episode 🎧
YouTube: youtu.be/rSC7L5WikcE?...
Spotify: open.spotify.com/episode/37YB...
Apple: podcasts.apple.com/ca/podcast/a...
Paper: arxiv.org/abs/2504.17993
#WiAIR #WomenInAI #AIResearch #LLMs #AISafety #Interpretability (8/8🧵)
A simplified overview of our aligned probing setup, where we combine behavioral and internal evaluation of LMs' toxicity
LMs that "know more" about toxicity are less toxic!
Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details👇 #NLProc #interpretability (1/🧵)
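Not the paper's code or data; just a minimal, illustrative sketch of the layer-wise probing idea in the thread above, assuming GPT-2 via Hugging Face transformers, a handful of made-up labelled sentences, and a scikit-learn logistic-regression probe per layer. The model name, example texts, and probe are stand-ins, not the TACL setup.

```python
# Toy layer-wise probing sketch: train a linear probe per layer to predict a
# toxicity label from hidden states and see at which depth the signal peaks.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

model_name = "gpt2"  # stand-in model, not the one studied in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True).eval()

# Tiny made-up labelled texts (0 = non-toxic, 1 = toxic); a real study would
# use a proper toxicity benchmark.
texts = ["Have a wonderful day!", "You are a kind person.",
         "I hate you so much.", "Everyone like you is worthless."]
labels = np.array([0, 0, 1, 1])

@torch.no_grad()
def layer_features(text):
    ids = tok(text, return_tensors="pt")
    out = model(**ids)
    # mean-pool tokens per layer -> one feature vector per layer
    return [h.mean(dim=1).squeeze(0).numpy() for h in out.hidden_states]

feats = [layer_features(t) for t in texts]
for layer in range(len(feats[0])):
    X = np.stack([f[layer] for f in feats])
    probe = LogisticRegression(max_iter=1000)
    acc = cross_val_score(probe, X, labels, cv=2).mean()
    print(f"layer {layer:2d}: probe accuracy {acc:.2f}")
```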
Nils’ research interests span model #explainability and #interpretability, text evaluation metrics, interactivity and dialogue, and biomedical NLP.
How AI "thinks": grokking sparse autoencoders (SAEs). In this article we break down research from the company Anthro...
#Season #AI #in #development #LLM #interpretable #ml #interpretability #interpretable #AI #artificial
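Not Anthropic's code; a minimal sketch of the core SAE recipe the article above discusses, assuming PyTorch: an overcomplete ReLU encoder, a linear decoder, and an L1 sparsity penalty on the latent activations. Dimensions, the penalty weight, and the random stand-in activations are illustrative values only.

```python
# Minimal sparse-autoencoder sketch (illustrative values, not Anthropic's setup).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_latent=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)   # overcomplete dictionary
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # sparse, non-negative features
        return self.decoder(z), z

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3                          # sparsity / reconstruction trade-off

# `acts` would be residual-stream activations collected from an LLM;
# random data here just so the sketch runs end to end.
acts = torch.randn(256, 768)
for step in range(100):
    recon, z = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_weight * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```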
Looking forward to the next “Theory of Interpretable AI” seminar on January 15, where Chhavi Yadav will present "ExpProof"! A fresh take on trustworthy explanations for confidential ML models using Zero-Knowledge Proofs. Feel free to join! #interpretability #Crypto
tverven.github.io/tiai-seminar/
New paper introduces Gnosis, a lightweight self-awareness mechanism that lets frozen LLMs predict correctness by inspecting internal circuits (hidden states & attention).
📄 arXiv: 2512.20578
#AI #LLMs #MachineLearning #SelfAwareness #Interpretability #AIAlignment #NeurIPS #ICLR #DeepLearning
winbuzzer.com/2025/12/19/g...
Gemma Scope 2: New Google Tools Let Researchers Trace AI ‘Thought’ Circuits
#AI #GoogleDeepMind #Gemma3 #AISafety #MachineLearning #OpenSourceAI #Interpretability #NeuralNetworks #LLMs #AIResearch #DeepLearning #ModelDebugging
AI's Philosophical Tech Challenge - Dean W Ball on 80000 Hours
#interpretability #ai #courtroom
Gregory's work focuses on #interpretability of language models, with a particular interest in in-context learning, retrieval, and retrieval-augmented generation (#RAG). Gregory aims to uncover how these models operate internally to make them more efficient and safer.
Open Problems in Mechanistic Interpretability
Lee Sharkey, Bilal Chughtai, Joshua Batson et al.
Action editor: Sarath Chandar
https://openreview.net/forum?id=91H76m9Z94
#interpretability #ai #mechanistic
Some of the #NeurIPS 2025 papers our lab contributed to. Curious? Please reach out to Thomas Dooms, who is on site, or just drop us an email
#Interpretability #explainability #MechInterp #XAI #AI #ML
#sqIRL #IDLab #UAntwerp
Kudos to Thomas and the collaborators involved for their solid contributions to the field.
Curious about our work? Have a look at our website: sqirllab.github.io/
#Interpretability #mechinterp #compinterp #xai #AI #ML
#sqIRL #UAntwerp #IDLab
At the MI workshop (spotlight), we show how Bilinear Autoencoders ease the analysis of neural representations by decomposing them into polynomial latents.
Paper and the cool demos at tdooms.github.io/research/bae
#Interpretability #mechinterp #compinterp #xai #AI #ML
#UAntwerp #IDLab
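Not the paper's code (see the demo link in the post above for the actual formulation); just a rough PyTorch sketch of one plausible reading of "bilinear" latents, assuming each latent is the elementwise product of two linear maps of the input, i.e. a degree-2 polynomial of the input features, with a linear decoder. The dimensions and training loop are made up.

```python
# Rough bilinear-autoencoder sketch (one plausible reading, not the paper's code):
# each latent is (W x) * (V x), a quadratic polynomial in the input features,
# which keeps the learned features amenable to weight-based analysis.
import torch
import torch.nn as nn

class BilinearAutoencoder(nn.Module):
    def __init__(self, d_in=512, d_latent=1024):
        super().__init__()
        self.W = nn.Linear(d_in, d_latent, bias=False)
        self.V = nn.Linear(d_in, d_latent, bias=False)
        self.decoder = nn.Linear(d_latent, d_in, bias=False)

    def forward(self, x):
        z = self.W(x) * self.V(x)        # bilinear (degree-2 polynomial) latents
        return self.decoder(z), z

model = BilinearAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
acts = torch.randn(128, 512)             # stand-in for real activations
for step in range(50):
    recon, z = model(acts)
    loss = ((recon - acts) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```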
Have a look at the work our lab will be presenting at #NeurIPS '25.
On the main track: SimpleStories, a dataset of simple yet diverse stories with the potential to become the MNIST of language.
openreview.net/pdf?id=sVh3e...
#Interpretability #mechinterp #xai #AI #ML
#sqIRL #UAntwerp
We just launched a #linkedin page. Please help us spread the word and share it with people who might be interested.
linkedin.com/company/sqir...
#RepresentationLearning #interpretability #explainability #XAI #mechinterp #AI #ML #sqIRL #ComputerVision #HSI #IDLab #UAntwerp
Olmo 3 is a fully open LLM. Olmo is the LLM series from Ai2, the Allen Institute for AI. Unlike most open-weight models, these are notable for including the full training data, training process and ...
#ai #generative-ai #llms #interpretability #pelican-riding-a-bicycle #llm-reasoning #ai2 […]
[Original post on simonwillison.net]