#SparseAutoencoders
Posts tagged #SparseAutoencoders on Bluesky
Interpretability vs Utility in Sparse Autoencoders for LLM Steering

A study of 90 SAEs across three LLMs found only a modest rank correlation (Kendall's tau‑b ≈ 0.298) between interpretability and steering performance, while the Delta Token Confidence method boosted steering performance by ~52.5%. Read more: getnews.me/interpretability-vs-util... #sparseautoencoders #llmsteering

Sparse Autoencoders Reveal Biases in GPT Model Trained on Austen

A GPT‑style model trained solely on Jane Austen’s novels was examined with sparse autoencoders, revealing neurons linked to gender, class and duty and highlighting the biases a model can absorb from its training corpus. Read more: getnews.me/sparse-autoencoders-reve... #gpt #sparseautoencoders #austen

Sparse Autoencoder Top‑1 Steering Improves Language Model Reasoning

Researchers showed that steering a language model with the top‑1 latent from a sparse autoencoder improves math reasoning, matching the MAD baseline (24 Sep 2025). Read more: getnews.me/sparse-autoencoder-top-1... #sparseautoencoders #reasoning
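The top‑1 steering recipe can be sketched in a few lines. This is a minimal NumPy illustration with made‑up shapes and random stand‑in weights (`W_enc`, `W_dec`, and the coefficient `alpha` are assumptions, not the paper's code): encode the hidden state, pick the single most active latent, and add its decoder direction back into the hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_latents = 8, 32

# Toy SAE weights (random stand-ins for trained parameters).
W_enc = rng.normal(size=(d_model, n_latents))
W_dec = rng.normal(size=(n_latents, d_model))

def steer_top1(h, alpha=2.0):
    """Add the decoder direction of the single most active SAE latent
    back into the hidden state h, scaled by steering coefficient alpha."""
    z = np.maximum(h @ W_enc, 0.0)   # SAE encoder with ReLU
    j = int(np.argmax(z))            # index of the top-1 latent
    return h + alpha * W_dec[j]      # steer along that latent's direction

h = rng.normal(size=d_model)
h_steered = steer_top1(h)
```

In a real setup `h` would be a residual-stream activation and the SAE weights would come from training, but the steering step itself is just this one vector addition.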

AbsTopK Improves Sparse Autoencoders for Bidirectional Features

AbsTopK, a sparse autoencoder that keeps both positive and negative activations, lets a single unit capture a concept and its opposite. Posted on arXiv (2510.00404), Oct 2025. Read more: getnews.me/abstopk-improves-sparse-... #abstopk #sparseautoencoders
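A minimal sketch of the AbsTopK idea, assuming the mechanism is "keep the k entries with the largest absolute value, signs included" (the function name and shapes here are hypothetical, not the paper's code):

```python
import numpy as np

def abs_topk(z, k):
    """Keep the k entries of z with the largest magnitude, preserving
    their signs, and zero out the rest. Unlike ReLU-based TopK, negative
    activations survive, so one unit can encode both directions of a
    concept (e.g. "formal" vs "informal") with opposite signs."""
    idx = np.argsort(-np.abs(z))[:k]   # indices of the k largest magnitudes
    out = np.zeros_like(z)
    out[idx] = z[idx]                  # copy values with their original signs
    return out

z = np.array([0.2, -3.0, 1.5, -0.1, 2.0])
print(abs_topk(z, 2))   # keeps -3.0 and 2.0, zeros everything else
```

A standard TopK-after-ReLU activation would discard the −3.0 entry entirely; here it remains the strongest active unit.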

OrtSAE: Orthogonal Sparse Autoencoders Boost Feature Discovery

OrtSAE discovers about 9% more distinct features and reduces feature absorption by roughly 65% in evaluations released in September 2025, while adding only minimal computational overhead. Read more: getnews.me/ortsae-orthogonal-sparse... #ortsae #sparseautoencoders
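OrtSAE's headline idea is keeping decoder directions distinct. The exact loss is in the paper; as an illustrative assumption, a penalty on the pairwise cosine similarity between decoder rows captures the spirit:

```python
import numpy as np

def ortho_penalty(W_dec):
    """Illustrative orthogonality penalty (assumed form, not OrtSAE's exact
    loss): mean squared pairwise cosine similarity between decoder rows,
    pushing latents toward distinct, non-overlapping feature directions."""
    W = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm rows
    G = W @ W.T                                               # cosine similarities
    off_diag = G - np.eye(len(G))                             # drop self-similarity
    return float(np.mean(off_diag ** 2))

# Orthogonal directions incur zero penalty; identical ones do not.
print(ortho_penalty(np.eye(4)))        # 0.0
print(ortho_penalty(np.ones((3, 4))))  # positive (all rows collinear)
```

Adding such a term to the SAE reconstruction loss discourages two latents from "absorbing" the same underlying feature.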

Safe‑SAIL framework maps safety risks in large language models

Researchers introduced Safe‑SAIL, a framework using Sparse Autoencoders to locate safety‑related neurons in large language models, released as a public audit toolkit. Read more: getnews.me/safe-sail-framework-maps... #safesail #sparseautoencoders

Sparse Autoencoders Improve Vision Model Interpretability

Sparse autoencoders applied to vision models produce meaningful features, improve out‑of‑distribution performance and enable semantic steering of diffusion image generation. Read more: getnews.me/sparse-autoencoders-impr... #sparseautoencoders #visionai

Automated Framework Enhances Neural Network Interpretability With Scalable Explanations

Researchers developed an automated system that uses large language models to interpret millions of features in sparse autoencoders, making deep neural networks more understandable at scale.

🚀🔍🤖 Automated Framework Enhances Neural Network Interpretability With Scalable Explanations www.azoai.com/news/2024102... #AI #MachineLearning #DeepLearning #NeuralNetworks #ModelInterpretability #SparseAutoencoders #LLMs #DataScience #Explainability #Research @arxiv-stat-ml.bsky.social
