All in all, it's hard to say how practically feasible this localization is to obtain without substantial leakage. Fortunately, there are many free parameters that can be tweaked here, and many variants to consider.
This allows an (approximate) causal variant of training data attribution -- understanding which data points contributed to the emergence of a capability!
A major advantage of this method over others is that it allows ⏳"time travel"⏳:
Because we can trace which params a data point influenced, we can ablate or manipulate them!
The idea is related to locality-sensitive hashing (LSH), which maps similar vectors to nearby buckets. To accomplish this, we train the model with a dropout mask that depends on the semantics of the input ("semantic dropout masks").
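One way such an LSH-derived mask could look, as a minimal sketch: hash an input embedding with random hyperplanes (SimHash), then seed a per-bucket RNG to produce the dropout mask, so semantically similar inputs tend to share (subsets of) active parameters. The function and parameter names below are my own illustrative choices, not from the post.

```python
import numpy as np

def semantic_dropout_mask(embedding, n_params, keep_frac=0.5, n_planes=16, seed=0):
    """Hypothetical sketch: derive a dropout mask from an input embedding via SimHash."""
    # Random hyperplanes define the LSH; the sign pattern of the projections
    # is the bucket id. Nearby embeddings tend to agree on most bits.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, embedding.shape[0]))
    bits = (planes @ embedding > 0).astype(np.uint8)
    bucket = int("".join(map(str, bits)), 2)
    # Seed a per-bucket RNG so all inputs hashing to the same bucket
    # share the same mask over the n_params parameters.
    mask_rng = np.random.default_rng(bucket)
    return mask_rng.random(n_params) < keep_frac
```

In this toy version membership is still hard per bucket; a fuzzier variant could blend masks from several hash tables so that similar inputs overlap partially rather than all-or-nothing.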
In this work, I present a *sketch* of an idea around this. Instead of allocating inputs to rigid groups, we aim for fuzzy membership, such that semantically similar inputs update related subsets of the parameters.
For example, gradient routing partitions data points into disjoint groups and updates only a certain region in the network for each group. This method, as well as others, is limited to a predefined set of localizations.
🚧 New blogpost!! 🚧
📝 "Localization by design via semantic dropout masks"
Many recent works try to localize model behaviors to params and intervene upon them. Since this is hard to do after training, several works have instead trained models that are localizable by design.
What's in an attention head? 🤯
We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨
A new preprint with Amit Elhelo 🧵 (1/10)