LLM agent simulations for policy: A field full of potential, yet clouded by myths and big questions.
We're opening a new venue to spark open discussion and drive this research forward. Join the conversation! 🧵
New research from LTI, UMich, & Allen Institute for AI: LLMs don't just hallucinate; sometimes, they lie. When truthfulness clashes with utility (pleasing users, boosting brands), models often mislead. @nlpxuhui.bsky.social and @maartensap.bsky.social discuss the paper:
lti.cmu.edu/news-and-eve...
Wonderful collaboration with Zhe Su, Anubha Kabra, Sanketh Rangreji, @jmendelsohn2.bsky.social, @faeze_brh, @maartensap.bsky.social
Check out our paper to learn more about how LLMs navigate these ethical dilemmas: arxiv.org/abs/2409.09013. 7/
#AI #MachineLearning #AIEthics #LLMs #nlp #NLProc #NAACL2025
The multi-turn interactive setup is crucial: models often begin with equivocation but shift to falsification when pressed for clear answers.
Stronger models like GPT-4o showed the greatest shift when prompted to deceive (a 40% increase in falsification; alarming). 6/
⚠️ Even when explicitly instructed to be truthful, models STILL lied: GPT-4o still falsified info 15% of the time!
The tradeoff is real: more honest models completed their goals 15% less often. 5/
In business scenarios (selling defective products), models were either completely honest OR completely deceptive.
In public image scenarios (reputation management), behaviors were more ambiguous and complex. 4/
And what we found:
ALL tested models (GPT-4o, LLaMA-3, Mixtral) were truthful less than 50% of the time in conflict scenarios.
Models prefer "partial lies" like equivocation over outright falsification: they'll dodge questions before explicitly lying. 3/
Obviously this is a pressing issue now: x.com/deedydas/sta...; x.com/DanHendrycks... And here, we put LLMs into a multi-turn dialogue environment that mimics the realistic setting where users constantly try to seek info from LLMs. 2/
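A rough sketch of what such a multi-turn probe could look like (hypothetical names, not the paper's actual harness; assumes the OpenAI Python SDK, and the truthfulness-judging step is left out):

import openai  # assumes the OpenAI Python SDK; any chat API works the same way

client = openai.OpenAI()

def probe(scenario, follow_up_questions, model="gpt-4o"):
    # A simulated user keeps pressing for a clear answer; each reply is collected
    # so it can later be labeled as truthful, equivocating, or falsifying.
    messages = [{"role": "system", "content": scenario}]
    replies = []
    for question in follow_up_questions:
        messages.append({"role": "user", "content": question})
        response = client.chat.completions.create(model=model, messages=messages)
        answer = response.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        replies.append(answer)
    return replies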
When interacting with ChatGPT, have you ever wondered if it would "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
Screenshot of arXiv paper title, "Rejected Dialects: Biases Against African American Language in Reward Models," and author list: Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, and Maarten Sap.
Reward models for LMs are meant to align outputs with human preferences, but do they accidentally encode dialect biases?
Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings!
Paper: arxiv.org/abs/2502.12858 (1/10)
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions aligned with human norms?
With EgoNormia, a 1.8k egocentric video 🥽 QA benchmark, we show that this is surprisingly challenging!
10/ A huge congrats to Sanidhya Vijayvargiya
and thanks to our amazing collaborators and advisors for this project
@akhilayerukola.bsky.social
@maartensap.bsky.social
@gneubig.bsky.social
from @ltiatcmu.bsky.social!
9/ Open-weight models need better interaction strategies to resolve tasks, while Claude models perform well but require stronger prompting to engage.
This study sets the state of the art for handling ambiguity in real-world SWE tasks.
Repo: t.co/QD2A8N4R4J
8/ Not all LLMs ask the right questions.
- Llama 3.1 70B asks generic, low-impact questions.
- Claude Haiku 3.5 picks up keywords directly from the input to ask questions.
- Claude Sonnet 3.5 often explores the code first, leading to smarter interactions.
7/ Claude models ask fewer but smarter questions, extracting more info and boosting performance.
Meanwhile, DeepSeek-V2 can overwhelm users with too many questions.
6/ Without compulsory interaction, LLMs struggle to distinguish clear vs. vague instructions, either over-interacting or under-interacting despite prompt tweaks.
Only Claude Sonnet 3.5 can make this distinction to a limited degree with the right prompt.
5/ Our findings? LLMs default to non-interactive behavior unless forced to interact. But when they clarify vague inputs, performance improves drastically, showing the power of effective communication.
4/ How do LLMs handle ambiguity? We break it down into 3 key steps (rough sketch after the list):
(a) Using interactivity to boost performance in ambiguous scenarios
(b) Detecting ambiguity effectively
(c) Asking the right questions
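As a rough illustration only (hypothetical helper names, not the benchmark's actual agent code), the loop behind these three steps could look like this, where ask_llm is any chat-completion call and ask_user returns the user's reply:

def resolve_issue(issue_text, ask_llm, ask_user):
    # (b) detect ambiguity: let the model judge whether the issue is implementable as-is
    verdict = ask_llm(
        "Is this software issue specific enough to implement without more information? "
        "Answer YES or NO.\n\n" + issue_text
    )
    if verdict.strip().upper().startswith("NO"):
        # (c) ask the right question: request the single most useful clarification
        question = ask_llm(
            "Ask the single most useful clarifying question about this issue:\n\n" + issue_text
        )
        # (a) use interactivity: fold the user's answer back into the task before coding
        issue_text += "\n\nClarification: " + ask_user(question)
    return ask_llm("Write a patch for this issue:\n\n" + issue_text)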
3/ How much does interaction actually help LLMs in coding tasks?
We put them to the test on SWE-Bench Verified across three distinct settings to measure the impact.
2/ Our latest work, "Interactive Agents to Overcome Ambiguity in Software Engineering," explores how proprietary and open-weight LLMs handle ambiguity in complex agent-based tasks.
Link: arxiv.org/abs/2502.13069
LLM agents can code, but can they ask clarifying questions?
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing?
(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)
Looking forward to contributing to more socially aware and effective AI agents in 2025.
It's time to think about jointly optimizing human-AI communication and tool use!
All Hands' open-source approach and its bold, curious team make it the perfect playground for this exploration. Can't wait to dive in with @gneubig.bsky.social, Xingyao and the amazing team!
Excited to share that I'm joining All Hands AI (www.all-hands.dev) this summer as a research intern!
AI agents are becoming incredibly powerful, but their true potential lies in how they interact with and assist humans in meaningful ways.
I like the BlueSky approach to "verification". If you own a domain, you can make a DNS record to turn it into your BlueSky handle!
bsky.social/about/blog/4...
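For example, verifying a handle on a hypothetical domain example.com comes down to a single TXT record like the one below (placeholder values; Bluesky shows the exact did string to copy in its change-handle settings):

Host:  _atproto.example.com
Type:  TXT
Value: "did=did:plc:your-did-here"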
Hello, Bluesky! Happy to be scrolling the friendly skies with you. Follow for news and updates on LTI folks and their trailblazing research. #AI #NLP #ML #computerscience
some little bluesky tips 🦋
your blocks, likes, lists, and just about everything except chats are PUBLIC
you can pin custom feeds; i like quiet posters, best of follows, mutuals, mentions
if your chronological feed is overwhelming, you can make and pin a personal list of "unmissable" people
Looking for all your LTI friends on Bluesky? The LTI Starter Pack is here to help!
go.bsky.app/NhTwCVb
hi maria, can you add me as well? @nlpxuhui