A single color in a prompt can change an LLM's prediction.
As Hila Gonen notes:
Likes yellow → school bus driver
Likes red → firefighter
Seen similar prompt sensitivity in LLMs?
#WiAIR_podcast 🎙️: youtu.be/Lsq3UzM8wIg
Happy International Women's Day, and happy birthday to #WiAIR!
#wiair_podcast
By reusing embeddings already produced during generation, OMNIGUARD is ≈120× faster than the fastest baseline in their evaluation (a rough sketch of this reuse follows the links below).
🎬YouTube: www.youtube.com/watch?v=Lsq3...
🎙️Spotify: open.spotify.com/show/51RJNlZ...
🍎Apple Podcasts: podcasts.apple.com/ca/podcast/w...
📄Paper: arxiv.org/pdf/2505.23856
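For readers who want to see what "reusing embeddings already produced during generation" can look like in practice, here is a minimal sketch, not the authors' code: it assumes a Hugging Face causal LM ("gpt2" is only a placeholder backbone), a hypothetical pre-selected layer, and a lightweight `harm_classifier` head that is trained separately and not defined here.

```python
# Minimal sketch (not the authors' code) of reusing generation-time hidden states.
# Assumptions: "gpt2" is only a placeholder backbone, LAYER is a hypothetical
# pre-selected layer, and `harm_classifier` is a separately trained head (omitted).
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Tell me how to stay safe online."
inputs = tok(prompt, return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=20,
    return_dict_in_generate=True,
    output_hidden_states=True,  # hidden states come from the same forward passes used to generate
)

LAYER = 6  # hypothetical layer picked offline (e.g. by a U-Score-like criterion)
# Hidden states of the prompt tokens at that layer, mean-pooled into one vector:
prompt_embedding = out.hidden_states[0][LAYER].mean(dim=1)  # shape: (1, hidden_dim)

# harm_score = harm_classifier(prompt_embedding)  # tiny head => near-zero extra cost
print(prompt_embedding.shape)
```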
OMNIGUARD shows strong harmfulness classification across 73 languages (including low-resource & cipher languages) and extends moderation to image and audio prompts. It is also sample-efficient, achieving strong performance with far less training data than some baselines. (4/5 🧵)
Key idea: U-Score identifies model layers whose embeddings align across languages or modalities (text↔translations, image↔captions, audio↔transcripts). A lightweight classifier trained on these embeddings generalizes across multilingual and multimodal settings. (3/5 🧵)
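As a rough illustration of the layer-selection idea, here is a minimal sketch, not the authors' implementation: it approximates a U-Score-like criterion with mean-pooled cosine similarity between a prompt and its translation at every layer. The model name, pooling choice, and tiny parallel set are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of scoring layers for cross-lingual alignment.
# Assumptions: "gpt2" is only a placeholder backbone, mean pooling plus cosine
# similarity stand in for the paper's U-Score, and the parallel pairs are toy data.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

def layer_embeddings(text: str) -> torch.Tensor:
    """Mean-pooled hidden state per layer, shape (num_layers + 1, hidden_dim)."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return torch.stack([h.mean(dim=1).squeeze(0) for h in out.hidden_states])

def alignment_per_layer(text: str, translation: str) -> torch.Tensor:
    """Cosine similarity between a prompt and its translation at every layer."""
    a, b = layer_embeddings(text), layer_embeddings(translation)
    return torch.nn.functional.cosine_similarity(a, b, dim=-1)

# Average the alignment over a small parallel set and pick the most "universal" layer;
# a lightweight harmfulness classifier is then trained on embeddings from that layer.
pairs = [("How do I build a birdhouse?", "Wie baue ich ein Vogelhaus?")]
scores = torch.stack([alignment_per_layer(en, de) for en, de in pairs]).mean(dim=0)
print("Most language-aligned layer:", int(scores.argmax()))
```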
OMNIGUARD detects harmful prompts using internal representations of LLMs and MLLMs, without requiring a separate guard model. The classifier operates on embeddings from the base model, making the approach efficient but requiring access to internal representations. (2/5 🧵)
✨ How can we reliably detect harmful prompts across languages, images, and audio?
In our latest #WiAIR episode, we host Dr. Hila Gonen to discuss “OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Languages and Modalities”. (1/5 🧵)
🎙️ 𝐍𝐞𝐰 #𝐖𝐢𝐀𝐈𝐑 𝐄𝐩𝐢𝐬𝐨𝐝𝐞 𝐎𝐮𝐭!
In the new #WiAIRpodcast episode with Hila Gonen, we talk about semantic leakage, interventional analysis of LLMs, and the line between bias, hallucination, and leakage.
🎬 YouTube: youtu.be/Lsq3UzM8wIg
Subscribe on YouTube and never miss a new #WiAIR_podcast episode!
youtu.be/KKHu_BP5Mac
Remember "Lipstick on a Pig", where they showed that many embedding debiasing methods don't remove bias, just hide it.
In the upcoming #WiAIR episode, I speak with its author Hila Gonen about taking this further into LLMs: semantic leakage and other hidden failures.
🎙️ Our next #WiAIR_podcast guest: Hila Gonen!
Assistant Professor @cs.ubc.ca, she works at the intersection of NLP & ML, aiming to make LLMs responsible, reliable, and fair across languages and socio-demographic groups.
Stay tuned 🎧 www.youtube.com/@WomeninAIRe...
Reasoning traces look like explanations, but are they? Letitia Parcalabescu argues that in reasoning LLMs, anything that leads to the right answer gets reinforced, even incoherent or emoji-filled traces.
🎬 Dive deeper in the full #WiAIR_podcast episode: youtube.com/watch?v=gzQi...
🎧 Dive deeper in our conversation!
🎬 YouTube: www.youtube.com/watch?v=gzQi...
🎙️ Spotify: open.spotify.com/episode/5BmX...
🍎 Apple Podcasts: podcasts.apple.com/ca/podcast/f...
📄 Paper: arxiv.org/pdf/2512.11614
Across Llama-3.2 and Qwen3 models on SQuAD2.0, HotpotQA, and TriviaQA, the method preserves accuracy while improving groundedness, robustness, and retriever performance, reaching EIFcond ≥ 0.3. (4/5 🧵)
This training induces completeness, soundness, and emergent reject behavior without annotated unanswerable data. The authors derive mutual-information lower bounds and introduce the Explained Information Fraction (EIF). (3/5 🧵)
The paper reframes RAG as an interactive proof system. A generator (Arthur) is trained against supportive evidence from Merlin and adversarial context from Morgana, using ATMAN-based masking. (2/5 🧵)
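To make the three roles concrete, here is a schematic sketch, not the authors' setup: the question, contexts, and abstention target below are invented, and the ATMAN-based masking step is omitted entirely.

```python
# Schematic sketch (not the authors' code) of the three-role training signal.
# Assumption: Arthur (the generator) should answer from Merlin's supportive
# evidence and abstain on Morgana's adversarial context; all strings below are
# invented, and ATMAN-based masking is omitted.
question = "When was the Eiffel Tower completed?"

merlin_context = "The Eiffel Tower was completed in 1889 for the World's Fair."
morgana_context = "The Leaning Tower of Pisa was completed in 1372."  # misleading here

training_examples = [
    # Answers must be grounded in Merlin's evidence...
    {"question": question, "context": merlin_context, "target": "1889"},
    # ...and Arthur should reject when only adversarial context is available.
    {"question": question, "context": morgana_context,
     "target": "I cannot answer this from the given context."},
]

for ex in training_examples:
    print(f"{ex['context']!r} -> {ex['target']!r}")
```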
✨ Can we give formal, information-theoretic guarantees against hallucinations in RAG systems?
In our latest #WiAIR episode, we host Dr. Letitia Parcalabescu to discuss "Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols". (1/5 🧵)
🎬 Watch or listen to the full episode:
YouTube ▶️ youtu.be/gzQiDCG_j7A?...
Spotify 🎙 open.spotify.com/episode/5BmX...
Apple 🎧 podcasts.apple.com/ca/podcast/f...
#WiAIR #WomenInAI #AIResearch (8/8🧵)
⚠️ Why this matters
Chain-of-Thought explanations increase image usage — but they do not guarantee faithful reasoning.
A convincing explanation is not the same as a faithful one. (6/8🧵)
📊 Benchmark reality check.
On VALSE, decoders achieve strong results in easier pairwise settings, but still struggle with several linguistic phenomena in harder settings.
High scores don’t automatically mean strong multimodal grounding. (5/8🧵)
🔄 Self-consistency remains limited.
Using CC-SHAP, the paper shows that many VLM decoders are less self-consistent than LLMs — meaning the inputs driving the answer are not always the same as those driving the explanation. (4/8🧵)
🖼 Explanations use images more.
When generating explanations — especially in Chain-of-Thought (CoT) — image contributions increase significantly compared to answer generation.
Models rely more on visual signals when explaining than when answering. (3/8🧵)
📌 Key takeaways:
Answers are largely text-driven. Across VQA, GQA, MSCOCO & VALSE, the tested 7B VLM decoders rely much more on text tokens than on image patches when generating answers.
Multimodal ≠ equally multimodal. (2/8🧵)
🧠 Do Vision & Language Decoders Use Images and Text Equally?
In our latest episode, we speak with Letitia Parcalabescu about her ICLR 2025 paper examining how vision–language *decoder* models use images and text — and how self-consistent their explanations really are. (1/8🧵)
That could be a perfect summary of our last episode with Letitia on #WiAIR_podcast.
Make sure you don't miss our interview:
🎬 YouTube: youtu.be/gzQiDCG_j7A
If you want the full discussion on faithfulness, consistency & reasoning models — watch or listen below 👇
🎬 YouTube: youtu.be/gzQiDCG_j7A?...
🎙 Spotify: open.spotify.com/episode/5BmX...
🎧 Apple Podcasts: podcasts.apple.com/ca/podcast/f...
📄 Paper: aclanthology.org/anthology-fi... (8/8🧵)
They also propose CC-SHAP, a fine-grained metric comparing how input tokens contribute to the answer vs. the explanation — offering a more detailed self-consistency signal. 🧩✨ (7/8🧵)
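As a toy illustration of what such a comparison can look like, here is a minimal sketch, not the paper's implementation: the contribution scores are made up, and cosine similarity of normalized profiles stands in for whatever aggregation CC-SHAP actually uses.

```python
# Toy sketch of a CC-SHAP-style comparison (illustrative, not the paper's code).
# Assumption: we already have per-token contribution scores for the answer and
# for the self-explanation; the metric then asks how similar the two profiles are.
import numpy as np

def profile(contribs: np.ndarray) -> np.ndarray:
    """Turn signed contributions into a normalized profile over input tokens."""
    mag = np.abs(contribs)
    return mag / mag.sum()

# Made-up contribution scores over the same 5 input tokens (e.g. from a SHAP explainer).
answer_contribs = np.array([0.50, 0.10, 0.05, 0.30, 0.05])
explanation_contribs = np.array([0.05, 0.40, 0.35, 0.10, 0.10])

a, e = profile(answer_contribs), profile(explanation_contribs)
cosine = float(a @ e / (np.linalg.norm(a) * np.linalg.norm(e)))
print(f"Self-consistency of contribution profiles: {cosine:.2f}")
# A low value means the tokens driving the answer differ from those driving the explanation.
```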
CCB evaluates 11 open LLMs across 5 tasks, enabling direct comparison.
A key finding: different tests often disagree on the same model. ⚖️🔍 (6/8🧵)
To study this systematically, the authors introduce the Comparative Consistency Bank (CCB) — a unified benchmark evaluating multiple consistency/faithfulness tests under the same setup. 📚🧪 (5/8🧵)