#VISxAI IS BACK!!
Submit your interactive "explainables" and "explorables" that visualize, interpret, and explain AI. #IEEEVIS
Deadline: July 30, 2025
visxai.io
I'll be at #CHI2025!
If you are excited about interpretability and human-AI alignment, let's chat!
And come see Abstraction Alignment in the Explainable AI paper session on Monday at 4:20 JST.
Check out Abstraction Alignment at #CHI2025!
Paper: arxiv.org/abs/2407.12543
Demo: vis.mit.edu/abstraction-...
Video: www.youtube.com/watch?v=cLi9...
Project: vis.mit.edu/pubs/abstrac...
With Hyemin (Helen) Bang, @henstr.bsky.social, and @arvind.bsky.social
Abstraction Alignment reframes alignment around conceptual relationships, not just concepts.
It helps us audit models, datasets, and even human knowledge.
I'm excited to explore ways to extract abstractions from models and align them to individual users' perspectives.
Abstraction Alignment works on datasets too!
Medical experts analyzed clinical dataset abstractions, uncovering issues like overuse of unspecified diagnoses.
This mirrors real-world updates to medical abstractions, showing how models can help us rethink human knowledge.
[Image: Two examples of Abstraction Alignment applied to a language model.]
Language models often prefer specific answers even at the cost of performance.
But Abstraction Alignment reveals that the concepts an LM considers are often abstraction-aligned, even when it's wrong.
This helps separate surface-level errors from deeper conceptual misalignment.
[Image: A screenshot of the Abstraction Alignment interface.]
And we packaged Abstraction Alignment and its metrics into an interactive interface so YOU can explore it!
https://vis.mit.edu/abstraction-alignment/
Aggregating Abstraction Alignment helps us understand a model's global behavior.
We developed metrics to support this (see the sketch after this list):
- Abstraction match: the concepts a model is most aligned with
- Concept co-confusion: concepts the model frequently confuses together
- Subgraph preference: the model's preference for particular abstraction levels
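As a toy illustration of the co-confusion idea (the function, names, and numbers here are hypothetical and not the released code), one could tally how often two concepts both carry substantial weight on the same example:

```python
# Hypothetical co-confusion tally: count how often pairs of concepts
# both receive weight above a threshold on the same example.
from collections import Counter
from itertools import combinations

def co_confusion(per_sample_weights, threshold=0.1):
    """Count pairs of concepts that both carry weight >= threshold on one sample."""
    pair_counts = Counter()
    for weights in per_sample_weights:
        active = sorted(c for c, w in weights.items() if w >= threshold)
        pair_counts.update(combinations(active, 2))
    return pair_counts

# Two toy samples where "oak" and "palm" repeatedly share probability mass.
samples = [
    {"oak": 0.60, "palm": 0.30, "shark": 0.05},
    {"oak": 0.50, "palm": 0.45, "trout": 0.02},
]
print(co_confusion(samples).most_common())
# [(('oak', 'palm'), 2)] -> oak and palm are frequently confused together
```

Pairs that co-occur often across the dataset point to concepts the model systematically fails to separate.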
Abstraction Alignment compares model behavior to human abstractions.
By propagating the model's uncertainty through an abstraction graph, we can see how well it aligns with human knowledge.
E.g., confusing oaks with palms is more aligned than confusing oaks with sharks.
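Here is a minimal sketch of that propagation step (the tree, class names, and numbers are made up for illustration; this is not the paper's implementation): sum the model's leaf probabilities into their ancestors in an abstraction tree and check how much mass stays inside the correct subtree.

```python
# Toy sketch: propagate a model's output distribution up an abstraction tree.
from collections import defaultdict

# Each node maps to its parent in the (hypothetical) abstraction graph.
PARENT = {
    "oak": "tree", "palm": "tree",
    "shark": "fish", "trout": "fish",
    "tree": "plant", "fish": "animal",
    "plant": "organism", "animal": "organism",
}

def propagate(leaf_probs):
    """Add each leaf's probability to every one of its ancestors."""
    mass = defaultdict(float, leaf_probs)
    for leaf, p in leaf_probs.items():
        node = leaf
        while node in PARENT:
            node = PARENT[node]
            mass[node] += p
    return mass

# True label: oak. Confusing oak with palm keeps mass inside "tree";
# confusing oak with shark leaks mass out into "animal".
aligned    = propagate({"oak": 0.55, "palm": 0.40, "shark": 0.05})
misaligned = propagate({"oak": 0.55, "shark": 0.40, "palm": 0.05})
print(aligned["tree"], misaligned["tree"])  # ~0.95 vs ~0.60
```

The more of the model's uncertainty that lands on the correct ancestors, the more its mistakes respect the human abstraction.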
Interpretability identifies models' learned concepts (e.g., wheels).
But human reasoning is built on abstractions: relationships between concepts that help us generalize (wheels → car).
To measure alignment, we must test whether models learn human-like concepts AND abstractions.
[Image: An overview of Abstraction Alignment, including its authors and links to the paper, demo, and code.]
#CHI2025 paper on human-AI alignment! (thread)
Models can learn the right concepts but still be wrong in how they relate them.
Abstraction Alignment evaluates whether models learn human-aligned conceptual relationships.
It reveals misalignments in LLMs and medical datasets.
arxiv.org/abs/2407.12543
Hey Julian, thank you so much for putting this together! My research is on interpretability and I'd love to be added.