
Dilyara Bareeva

@dilya

PhD Candidate in Interpretability @FraunhoferHHI | 📍Berlin, Germany dilyabareeva.github.io

784 Followers · 500 Following · 5 Posts · Joined 17.01.2024

Latest posts by Dilyara Bareeva @dilya

Manipulating Feature Visualizations with Gradient Slingshots Feature Visualization (FV) is a widely used technique for interpreting concepts learned by Deep Neural Networks (DNNs), which synthesizes input patterns that maximally activate a given feature....

Paper: openreview.net/forum?id=Tgc...
Code: github.com/dilyabareeva...

29.11.2025 16:38 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Huge thanks to my fantastic co-authors Marina MC Höhne, Alexander Warnecke, @lpirch.bsky.social, Klaus-Robert Müller, @rieck.mlsec.org, @slapuschkin.bsky.social, @kirillbykov.bsky.social, and to the UMI Lab, @aifraunhoferhhi.bsky.social, @xai-berlin.bsky.social and @bifold.berlin for the support!

29.11.2025 16:38 ๐Ÿ‘ 3 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Our lightweight adversarial fine-tuning attack lets you bend a feature so that its visualization shows an arbitrary concept. Off-manifold, we impose a hyperbolic activation landscape with its optimum at the target, while preserving on-distribution activations through a weighted two-term loss. 🕵️‍♀️

29.11.2025 16:38 ๐Ÿ‘ 1 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

โœˆ๏ธ๐Ÿ‡ฒ๐Ÿ‡ฝ Next Wednesday (Dec 3), 1โ€“4 p.m. CST, Iโ€™ll be presenting Manipulating Feature Visualizations with Gradient Slingshots at NeurIPS 2025 in Mexico City!

Feature Visualization has long been a staple interpretability tool. Our work shows it’s far from reliable! 🚨

29.11.2025 16:38 ๐Ÿ‘ 9 ๐Ÿ” 4 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
GitHub - dilyabareeva/quanda: A toolkit for quantitative evaluation of data attribution methods.

Sadly, I wasn’t able to make it to NeurIPS this year. For anyone attending, check out our quanda poster at the ATTRIB workshop tomorrow (Saturday) from 3 to 4:30 p.m., presented by Galip Ümit Yolcu and Anna Hedström!

GitHub: github.com/dilyabareeva...
Paper: arxiv.org/abs/2410.07158

13.12.2024 08:01