LLM agent simulations for policy: A field full of potential, yet clouded by myths and big questions.
We're opening a new venue to spark open discussion and drive this research forward. Join the conversation! 🧵
New research from LTI, UMich, & Allen Institute for AI: LLMs don't just hallucinate; sometimes, they lie. When truthfulness clashes with utility (pleasing users, boosting brands), models often mislead. @nlpxuhui.bsky.social and @maartensap.bsky.social discuss the paper:
lti.cmu.edu/news-and-eve...
Wonderful collaboration with Zhe Su, Anubha Kabra, Sanketh Rangreji, @jmendelsohn2.bsky.social, @faeze_brh, @maartensap.bsky.social
Check out our paper to learn more about how LLMs navigate these ethical dilemmas: arxiv.org/abs/2409.09013. 7/
#AI #MachineLearning #AIEthics #LLMs #nlp #NLProc #NAACL2025
The multi-turn interactive setup is crucial: models often begin with equivocation but shift to falsification when pressed for clear answers.
Stronger models like GPT-4o showed the greatest shift when prompted to deceive (a 40% increase in falsification; alarming). 6/
⚠️ Even when explicitly instructed to be truthful, models STILL lied: GPT-4o still falsified info 15% of the time!
The tradeoff is real: more honest models completed their goals 15% less often. 5/
In business scenarios (selling defective products), models were either completely honest OR completely deceptive.
In public image scenarios (reputation management), behaviors were more ambiguous and complex. 4/
And what we found:
ALL tested models (GPT-4o, LLaMA-3, Mixtral) were truthful less than 50% of the time in conflict scenarios.
Models prefer "partial lies" like equivocation over outright falsification: they'll dodge questions before explicitly lying. 3/
Obviously this is a pressing issue now: x.com/deedydas/sta...; x.com/DanHendrycks... And here, we put LLMs into a multi-turn dialogue environment that mimics the realistic setting where users constantly try to seek info from LLMs. 2/
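A rough sketch of what such a multi-turn probe could look like (hypothetical names, not the paper's actual harness; assumes the OpenAI Python SDK, and the truthfulness-judging step is left out):

import openai  # assumes the OpenAI Python SDK; any chat API works the same way

client = openai.OpenAI()

def probe(scenario, follow_up_questions, model="gpt-4o"):
    # A simulated user keeps pressing for a clear answer; each reply is collected
    # so it can later be labeled as truthful, equivocating, or falsifying.
    messages = [{"role": "system", "content": scenario}]
    replies = []
    for question in follow_up_questions:
        messages.append({"role": "user", "content": question})
        response = client.chat.completions.create(model=model, messages=messages)
        answer = response.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        replies.append(answer)
    return replies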
When interacting with ChatGPT, have you ever wondered if it would "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
Screenshot of arXiv paper title, "Rejected Dialects: Biases Against African American Language in Reward Models," and author list: Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, and Maarten Sap.
Reward models for LMs are meant to align outputs with human preferences, but do they accidentally encode dialect biases?
Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings!
Paper: arxiv.org/abs/2502.12858 (1/10)
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions aligned with human norms?
With EgoNormia, a 1.8k egocentric video 🥽 QA benchmark, we show that this is surprisingly challenging!
10/ A huge congrats to Sanidhya Vijayvargiya
and thanks to our amazing collaborators and advisors for this project
@akhilayerukola.bsky.social
@maartensap.bsky.social
@gneubig.bsky.social
from @ltiatcmu.bsky.social!
9/ Open-weight models need better interaction strategies to resolve tasks, while Claude models perform well but require stronger prompting to engage.
This study sets the state of the art for handling ambiguity in real-world SWE tasks.
Repo: t.co/QD2A8N4R4J
8/ Not all LLMs ask the right questions.
- Llama 3.1 70B asks generic, low-impact questions.
- Claude Haiku 3.5 picks up keywords directly from the input to ask questions.
- Claude Sonnet 3.5 often explores the code first, leading to smarter interactions.
7/ Claude models ask fewer but smarter questions, extracting more info and boosting performance.
Meanwhile, DeepSeek-V2 can overwhelm users with too many questions.
6/ Without compulsory interaction, LLMs struggle to distinguish clear vs. vague instructions, either over-interacting or under-interacting despite prompt tweaks.
Only Claude Sonnet 3.5 can make this distinction to a limited degree with the right prompt.
5/ Our findings? LLMs default to non-interactive behavior unless forced to interact. But when they clarify vague inputs, performance improves drastically, showing the power of effective communication.
4/ How do LLMs handle ambiguity? We break it down into 3 key steps (rough sketch after the list):
(a) Using interactivity to boost performance in ambiguous scenarios
(b) Detecting ambiguity effectively
(c) Asking the right questions
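As a rough illustration only (hypothetical helper names, not the benchmark's actual agent code), the loop behind these three steps could look like this, where ask_llm is any chat-completion call and ask_user returns the user's reply:

def resolve_issue(issue_text, ask_llm, ask_user):
    # (b) detect ambiguity: let the model judge whether the issue is implementable as-is
    verdict = ask_llm(
        "Is this software issue specific enough to implement without more information? "
        "Answer YES or NO.\n\n" + issue_text
    )
    if verdict.strip().upper().startswith("NO"):
        # (c) ask the right question: request the single most useful clarification
        question = ask_llm(
            "Ask the single most useful clarifying question about this issue:\n\n" + issue_text
        )
        # (a) use interactivity: fold the user's answer back into the task before coding
        issue_text += "\n\nClarification: " + ask_user(question)
    return ask_llm("Write a patch for this issue:\n\n" + issue_text)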
3/ How much does interaction actually help LLMs in coding tasks?
We put them to the test on SWE-Bench Verified across three distinct settings to measure the impact.
2/ Our latest work, "Interactive Agents to Overcome Ambiguity in Software Engineering," explores how proprietary and open-weight LLMs handle ambiguity in complex agent-based tasks.
Link: arxiv.org/abs/2502.13069
LLM agents can code, but can they ask clarifying questions?
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing?
(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)
Looking forward to contributing to more socially aware and effective AI agents in 2025.
It's time to think about jointly optimizing human-AI communication and tool use!
All Hands' open-source approach and its bold, curious team make it the perfect playground for this exploration. Can't wait to dive in with @gneubig.bsky.social, Xingyao and the amazing team!
Excited to share that I'm joining All Hands AI (www.all-hands.dev) this summer as a research intern!
AI agents are becoming incredibly powerful, but their true potential lies in how they interact with and assist humans in meaningful ways.
I like the BlueSky approach to "verification". If you own a domain, you can make a DNS record to turn it into your BlueSky handle!
bsky.social/about/blog/4...
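For example, verifying a handle on a hypothetical domain example.com comes down to a single TXT record like the one below (placeholder values; Bluesky shows the exact did string to copy in its change-handle settings):

Host:  _atproto.example.com
Type:  TXT
Value: "did=did:plc:your-did-here"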
Hello, Bluesky! Happy to be scrolling the friendly skies with you. Follow for news and updates on LTI folks and their trailblazing research. #AI #NLP #ML #computerscience
some little bluesky tips 🦋
your blocks, likes, lists, and just about everything except chats are PUBLIC
you can pin custom feeds; i like quiet posters, best of follows, mutuals, mentions
if your chronological feed is overwhelming, you can make and pin a personal list of "unmissable" people
Looking for all your LTI friends on Bluesky? The LTI Starter Pack is here to help!
go.bsky.app/NhTwCVb
hi maria, can you add me as well? @nlpxuhui