Paper: openreview.net/forum?id=Tgc...
Code: github.com/dilyabareeva...
Huge thanks to my fantastic co-authors Marina MC Höhne, Alexander Warnecke, @lpirch.bsky.social, Klaus-Robert Müller, @rieck.mlsec.org, @slapuschkin.bsky.social, @kirillbykov.bsky.social, and to the UMI Lab, @aifraunhoferhhi.bsky.social, @xai-berlin.bsky.social and @bifold.berlin for the support!
Our lightweight adversarial fine-tuning attack lets you bend a feature to visualize any arbitrary concept. Off-manifold, we impose a hyperbolic activation landscape with its optimum at the target, while preserving on-distribution activations through a weighted two-term loss. 🕵️‍♂️
🇲🇽 Next Wednesday (Dec 3), 1–4 p.m. CST, I'll be presenting Manipulating Feature Visualizations with Gradient Slingshots at NeurIPS 2025 in Mexico City!
Feature Visualization has long been a staple interpretability tool. Our work shows it's far from reliable! 🚨
Sadly, I wasn't able to make it to NeurIPS this year. For anyone attending, check out our quanda poster at the ATTRIB workshop tomorrow (Saturday) from 3 to 4:30 pm, presented by Galip Ümit Yolcu and Anna Hedström!
GitHub: github.com/dilyabareeva...
Paper: arxiv.org/abs/2410.07158