Gonçalo Paulo

@goncalo-paulo

Interpretability researcher at @eleutherai.bsky.social

123 Followers
98 Following
2 Posts
Joined 27.11.2024

Latest posts by Gonçalo Paulo @goncalo-paulo

Are the Codeforces Elo results not interesting?

25.12.2024 22:20 👍 0 🔁 0 💬 1 📌 0

We just updated the arXiv version!

04.12.2024 17:34 👍 3 🔁 0 💬 0 📌 0

*Automatically Interpreting Millions of Features in LLMs*
by @norabelrose.bsky.social et al.

An open-source pipeline for finding interpretable features in LLMs with sparse autoencoders and automated explainability methods from @eleutherai.bsky.social.

arxiv.org/abs/2410.13928

27.11.2024 14:58 👍 27 🔁 6 💬 0 📌 2