🚨2 PhD positions with me @amlab.bsky.social on learning causally grounded concepts 🚨
Are you interested in improving the #interpretability #robustness and #safety of AI by integrating #causal reasoning? Join us in beautiful Amsterdam 🇳🇱🌷🚲
Deadline: 20 April
www.academictransfer.com/en/jobs/3593...
Congratulations to Qi on his PhD defense 🎓 on sports #MotionAnalysis from monocular video and 3D #HPE. Addressing the lack of high-quality datasets, he leverages #DeepLearning representations to ensure #interpretability, contributing to #XAI
📖 theses.hal.science/tel-05291306/
📽️ youtu.be/F5_wZGvdCaM
Introducing Steerling-8B by Guide Labs: A groundbreaking interpretable LLM that traces every token to its training data, enhancing transparency in AI. #AI #MachineLearning #Interpretability Link: thedailytechfeed.com/guide-labs-l...
We thank Andreas for his contributions to the Lab and wish him all the best for his future!
#NLP #PhDDefense #ComputationalArgumentation #Reliability #Interpretability #UKPLab #TUDarmstadt #UniTuebingen #LLMs #NLProc
Guide Labs Open-Sources Steerling-8B, an LLM That Shows Its Work
awesomeagents.ai/news/guide-labs-steerlin...
#OpenSource #Interpretability #GuideLabs
AI continuity isn’t emergence; it’s design.
Beyond AGI IV: Continuity of Intelligence shows how memory, alignment, and control intertwine to sustain stable cognition in modern systems.
Structure defines persistence; regulation defines thought.
doi.org/10.5281/zeno...
#AIAlignment #Interpretability
I measured the "personality" of 6 open-source LLMs (7B-9B) by looking into their hidden states. Here's what came out: LLMs have a stable style of resp...
#LLM #alignment #hidden #states #personality #temperament #RLHF #open-source #mechanistic #interpretability
Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective
Zubair Bashir, Bhavik Chandna, Procheta Sen
Action editor: Chris Maddison
https://openreview.net/forum?id=EpQ2CBJTjD
#biases #bias #interpretability
B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
Yifan Wang, Sukrut Rao, Ji-Ung Lee, Mayank Jobanputra, Vera Demberg
Action editor: Yingce Xia
https://openreview.net/forum?id=c180UH8Dg8
#explanations #explainability #interpretability
Miriam's research focuses on #trustworthiness in machine learning, particularly #fairness and #interpretability, with a growing emphasis on challenges emerging in the era of large language models.
Watch/listen to the full episode 🎧
YouTube: youtu.be/rSC7L5WikcE?...
Spotify: open.spotify.com/episode/37YB...
Apple: podcasts.apple.com/ca/podcast/a...
Paper: arxiv.org/abs/2504.17993
#WiAIR #WomenInAI #AIResearch #LLMs #AISafety #Interpretability (8/8🧵)
A simplified overview of our aligned probing setup, where we combine behavioral and internal evaluation of LMs' toxicity
LMs that "know more" about toxicity are less toxic!
Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details👇 #NLProc #interpretability (1/🧵)
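Not the paper's code or data; just a minimal, illustrative sketch of the layer-wise probing idea in the thread above, assuming GPT-2 via Hugging Face transformers, a handful of made-up labelled sentences, and a scikit-learn logistic-regression probe per layer. The model name, example texts, and probe are stand-ins, not the TACL setup.

```python
# Toy layer-wise probing sketch: train a linear probe per layer to predict a
# toxicity label from hidden states and see at which depth the signal peaks.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

model_name = "gpt2"  # stand-in model, not the one studied in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True).eval()

# Tiny made-up labelled texts (0 = non-toxic, 1 = toxic); a real study would
# use a proper toxicity benchmark.
texts = ["Have a wonderful day!", "You are a kind person.",
         "I hate you so much.", "Everyone like you is worthless."]
labels = np.array([0, 0, 1, 1])

@torch.no_grad()
def layer_features(text):
    ids = tok(text, return_tensors="pt")
    out = model(**ids)
    # mean-pool tokens per layer -> one feature vector per layer
    return [h.mean(dim=1).squeeze(0).numpy() for h in out.hidden_states]

feats = [layer_features(t) for t in texts]
for layer in range(len(feats[0])):
    X = np.stack([f[layer] for f in feats])
    probe = LogisticRegression(max_iter=1000)
    acc = cross_val_score(probe, X, labels, cv=2).mean()
    print(f"layer {layer:2d}: probe accuracy {acc:.2f}")
```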
Nils’ research interests span model #explainability and #interpretability, text evaluation metrics, interactivity and dialogue, and biomedical NLP.
How AI "thinks": grokking sparse autoencoders (SAEs). In this article we break down research from the company Anthro...
#Season #AI #in #development #LLM #interpretable #ml #interpretability #interpretable #AI #artificial
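Not Anthropic's code; a minimal sketch of the core SAE recipe the article above discusses, assuming PyTorch: an overcomplete ReLU encoder, a linear decoder, and an L1 sparsity penalty on the latent activations. Dimensions, the penalty weight, and the random stand-in activations are illustrative values only.

```python
# Minimal sparse-autoencoder sketch (illustrative values, not Anthropic's setup).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_latent=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)   # overcomplete dictionary
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # sparse, non-negative features
        return self.decoder(z), z

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3                          # sparsity / reconstruction trade-off

# `acts` would be residual-stream activations collected from an LLM;
# random data here just so the sketch runs end to end.
acts = torch.randn(256, 768)
for step in range(100):
    recon, z = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_weight * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```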
Looking forward to the next “Theory of Interpretable AI” seminar on January 15, where Chhavi Yadav will present "ExpProof"! A fresh take on trustworthy explanations for confidential ML models using Zero-Knowledge Proofs. Feel free to join! #interpretability #Crypto
tverven.github.io/tiai-seminar/
New paper introduces Gnosis, a lightweight self-awareness mechanism that lets frozen LLMs predict correctness by inspecting internal circuits (hidden states & attention).
📄 arXiv: 2512.20578
#AI #LLMs #MachineLearning #SelfAwareness #Interpretability #AIAlignment #NeurIPS #ICLR #DeepLearning
winbuzzer.com/2025/12/19/g...
Gemma Scope 2: New Google Tools Let Researchers Trace AI ‘Thought’ Circuits
#AI #GoogleDeepMind #Gemma3 #AISafety #MachineLearning #OpenSourceAI #Interpretability #NeuralNetworks #LLMs #AIResearch #DeepLearning #ModelDebugging
AI's Philosophical Tech Challenge - Dean W Ball on 80000 Hours
#interpretability #ai #courtroom
Gregory's work focuses on #interpretability of language models, with a particular interest in in-context learning, retrieval, and retrieval-augmented generation (#RAG). Gregory aims to uncover how these models operate internally to make them more efficient and safer.
Open Problems in Mechanistic Interpretability
Lee Sharkey, Bilal Chughtai, Joshua Batson et al.
Action editor: Sarath Chandar
https://openreview.net/forum?id=91H76m9Z94
#interpretability #ai #mechanistic
Some of the #NeurIPS 2025 papers our lab contributed to. Curious? Please reach out to Thomas Dooms, who is on site, or just drop us an email
#Interpretability #explainability #MechInterp #XAI #AI #ML
#sqIRL #IDLab #UAntwerp
Kudos to Thomas and the collaborators involved for their solid contributions to the field.
Curious about our work? Have a look at our website: sqirllab.github.io/
#Interpretability #mechinterp #compinterp #xai #AI #ML
#sqIRL #UAntwerp #IDLab
At the MI workshop (spotlight), we show how Bilinear Autoencoders ease the analysis of neural representations by decomposing them into polynomial latents.
Paper and the cool demos at tdooms.github.io/research/bae
#Interpretability #mechinterp #compinterp #xai #AI #ML
#UAntwerp #IDLab
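Not the paper's code (see the demo link in the post above for the actual formulation); just a rough PyTorch sketch of one plausible reading of "bilinear" latents, assuming each latent is the elementwise product of two linear maps of the input, i.e. a degree-2 polynomial of the input features, with a linear decoder. The dimensions and training loop are made up.

```python
# Rough bilinear-autoencoder sketch (one plausible reading, not the paper's code):
# each latent is (W x) * (V x), a quadratic polynomial in the input features,
# which keeps the learned features amenable to weight-based analysis.
import torch
import torch.nn as nn

class BilinearAutoencoder(nn.Module):
    def __init__(self, d_in=512, d_latent=1024):
        super().__init__()
        self.W = nn.Linear(d_in, d_latent, bias=False)
        self.V = nn.Linear(d_in, d_latent, bias=False)
        self.decoder = nn.Linear(d_latent, d_in, bias=False)

    def forward(self, x):
        z = self.W(x) * self.V(x)        # bilinear (degree-2 polynomial) latents
        return self.decoder(z), z

model = BilinearAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
acts = torch.randn(128, 512)             # stand-in for real activations
for step in range(50):
    recon, z = model(acts)
    loss = ((recon - acts) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```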
Have a look at the work our lab will be presenting at #NeurIPS '25.
On the main track: SimpleStories, a dataset of simple yet diverse stories with the potential to become the MNIST of language.
openreview.net/pdf?id=sVh3e...
#Interpretability #mechinterp #xai #AI #ML
#sqIRL #UAntwerp
We just launched a #linkedin page. Please help us spread the word and share it with people who might be interested.
linkedin.com/company/sqir...
#RepresentationLearning #interpretability #explainability #XAI #mechinterp #AI #ML #sqIRL #ComputerVision #HSI #IDLab #UAntwerp
Olmo 3 is a fully open LLM. Olmo is the LLM series from Ai2, the Allen Institute for AI. Unlike most open-weight models, these are notable for including the full training data, training process and ...
#ai #generative-ai #llms #interpretability #pelican-riding-a-bicycle #llm-reasoning #ai2 […]
[Original post on simonwillison.net]