All in all, it's hard to say how practically feasible this localization is to obtain without substantial leakage. Fortunately, there are many free parameters that can be tweaked here, and many variants to consider.
This allows an (approximate) causal variant of training data attribution -- understanding which data points contributed to the emergence of a capability!
A major advantage of this method over others is that it allows ⏳"time travel"⏳:
Because we can trace which params a data point influenced, we can ablate or manipulate them!
The idea is related to locality-sensitive hashing (LSH), which maps similar vectors to nearby buckets. To accomplish this, we train the model with a dropout mask that depends on the semantics of the input ("semantic dropout masks").
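One way such an LSH-derived mask could look, as a minimal sketch: hash an input embedding with random hyperplanes (SimHash), then seed a per-bucket RNG to produce the dropout mask, so semantically similar inputs tend to share (subsets of) active parameters. The function and parameter names below are my own illustrative choices, not from the post.

```python
import numpy as np

def semantic_dropout_mask(embedding, n_params, keep_frac=0.5, n_planes=16, seed=0):
    """Hypothetical sketch: derive a dropout mask from an input embedding via SimHash."""
    # Random hyperplanes define the LSH; the sign pattern of the projections
    # is the bucket id. Nearby embeddings tend to agree on most bits.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, embedding.shape[0]))
    bits = (planes @ embedding > 0).astype(np.uint8)
    bucket = int("".join(map(str, bits)), 2)
    # Seed a per-bucket RNG so all inputs hashing to the same bucket
    # share the same mask over the n_params parameters.
    mask_rng = np.random.default_rng(bucket)
    return mask_rng.random(n_params) < keep_frac
```

In this toy version membership is still hard per bucket; a fuzzier variant could blend masks from several hash tables so that similar inputs overlap partially rather than all-or-nothing.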
In this work, I present a *sketch* of an idea around this. Instead of allocating inputs to rigid groups, we aim for fuzzy membership, such that semantically similar inputs update related subsets of the parameters.
For example, gradient routing partitions data points into disjoint groups and updates only a certain region in the network for each group. This method, as well as others, is limited to a predefined set of localizations.
🚧 New blogpost!! 🚧
📝 "Localization by design via semantic dropout masks"
Many recent works try to localize model behaviors to params and intervene upon them. Since this is hard to do after training, several works have instead trained models that are localizable by design.
What's in an attention head? 🤯
We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨
A new preprint with Amit Elhelo 🧵 (1/10)