Xuhui Zhou's Avatar

Xuhui Zhou

@nlpxuhui

PhD student @ltiatcmu.bsky.social. Previously, @ai2.bsky.social, @uwnlp.bsky.social, @appleinc.bsky.social, @ucberkeleyofficial.bsky.social; Social Intelligence in language +X. He/Him.๐Ÿณ

1,835
Followers
146
Following
23
Posts
07.11.2024
Joined
Posts Following

Latest posts by Xuhui Zhou @nlpxuhui

LLM agent simulations for policy: A field full of potential, yet clouded by myths and big questions. ๐Ÿ›๏ธ๐Ÿค–

Weโ€™re opening a new venue to spark open discussion and drive this research forward. Join the conversation! ๐Ÿงต

18.12.2025 17:27 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Does Your Chatbot Swear to Tell the Truth? - Language Technologies Institute - School of Computer Science - Carnegie Mellon University New research finds that LLM-based agents can't always be trusted to be truthful

New research from LTI, UMich, & Allen Institute for AI: LLMs donโ€™t just hallucinate โ€“ sometimes, they lie. When truthfulness clashes with utility (pleasing users, boosting brands), models often mislead. @nlpxuhui.bsky.social and @maartensap.bsky.social discuss the paper:
lti.cmu.edu/news-and-eve...

26.06.2025 19:21 ๐Ÿ‘ 3 ๐Ÿ” 2 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Wonderful collaborations with Zhe Su, Anubha Kabra, Sanketh Rangreji, @jmendelsohn2.bsky.social , @faeze_brh
, @maartensap.bsky.social

28.04.2025 20:36 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Preview
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents To be safely and successfully deployed, LLMs must simultaneously satisfy truthfulness and utility goals. Yet, often these two goals compete (e.g., an AI agent assisting a used car salesman selling a c...

Check out our paper to learn more about how LLMs navigate these ethical dilemmas: arxiv.org/abs/2409.09013 . 7/

#AI #MachineLearning #AIEthics #LLMs #nlp #NLProc #NAACL2025

28.04.2025 20:36 ๐Ÿ‘ 6 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

๐Ÿ”„ Multi-turn interactive setup is crucial - models often begin with equivocation but shift to falsification when pressed for clear answers ๐Ÿง  Stronger models like GPT-4o showed the greatest shift when prompted to deceive (40% increase in falsification; alarming) 6/

28.04.2025 20:36 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

โš ๏ธ Even when explicitly instructed to be truthful, models STILL lied - GPT-4o still falsified info 15% of the time! ๐Ÿ“‰ The tradeoff is real: more honest models completed their goals 15% less often 5/

28.04.2025 20:36 ๐Ÿ‘ 0 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

๐Ÿ’ผ In business scenarios (selling defective products), models were either completely honest OR completely deceptive ๐ŸŒ In public image scenarios (reputation management), behaviors were more ambiguous and complex 4/

28.04.2025 20:36 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

And what we found: ๐Ÿ“Š ALL tested models (GPT-4o, LLaMA-3, Mixtral) were truthful less than 50% of the time in conflict scenarios ๐Ÿค” Models prefer "partial lies" like equivocation over outright falsification - they'll dodge questions before explicitly lying 3/

28.04.2025 20:36 ๐Ÿ‘ 3 ๐Ÿ” 3 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 0
Post image

Obviously this is a pressing issue now: x.com/deedydas/sta...; x.com/DanHendrycks... And here, we put LLMs into a multi-turn dialogue environment mimic the realistic setting where users constantly try to seek info from LLMs 2/

28.04.2025 20:36 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

When interacting with ChatGPT, have you wondered if they would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR ," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! ๐Ÿคฏ 1/

28.04.2025 20:36 ๐Ÿ‘ 25 ๐Ÿ” 9 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 3
Screenshot of Arxiv paper title, "Rejected Dialects: Biases Against African American Language in Reward Models," and author list: Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, and Maarten Sap.

Screenshot of Arxiv paper title, "Rejected Dialects: Biases Against African American Language in Reward Models," and author list: Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, and Maarten Sap.

Reward models for LMs are meant to align outputs with human preferencesโ€”but do they accidentally encode dialect biases? ๐Ÿค”

Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings! ๐ŸŽ‰

Paper: arxiv.org/abs/2502.12858 (1/10)

06.03.2025 19:49 ๐Ÿ‘ 38 ๐Ÿ” 11 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 2
Video thumbnail

We are getting closer to have agents operating in the real physical world. However, can we trust frontier models to make embodied decisions ๐ŸŽฎ aligned with human norms ๐Ÿ‘ฉโ€โš–๏ธ ?

With EgoNormia, a 1.8k ego-centric video ๐Ÿฅฝ QA benchmark, we show that this is surprisingly challenging!

04.03.2025 04:32 ๐Ÿ‘ 22 ๐Ÿ” 9 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 1

10/ A huge congrats to Sanidhya Vijayvargiya
and thanks to our amazing collaborators and advisors for this project
@akhilayerukola.bsky.social
@maartensap.bsky.social
@gneubig.bsky.social
from @ltiatcmu.bsky.social
!๐Ÿ™

19.02.2025 19:46 ๐Ÿ‘ 3 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

9/ Open-weight models need better interaction strategies to resolve tasks, while Claude models perform well but require stronger prompting to engage.
This study sets the state-of-the-art in handling ambiguity in real-world SWE tasks.
๐Ÿ”— Repo: t.co/QD2A8N4R4J

19.02.2025 19:46 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

8/ Not all LLMs ask the right questions. โ“๐Ÿค–
๐Ÿ”น Llama 3.1 70B asks generic, low-impact questions.
๐Ÿ”น Claude Haiku 3.5 picks up keywords directly from the input to ask questions.
๐Ÿ”น Claude Sonnet 3.5 often explores the code first, leading to smarter interactions. ๐Ÿ”๐Ÿ’ก

19.02.2025 19:46 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image Post image

7/ Claude models ask fewer but smarter questions, extracting more info and boosting performance. ๐Ÿ“ˆ
Meanwhile, DeepSeek-V2 can overwhelm users with too many questions. ๐Ÿคฏ

19.02.2025 19:46 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

6/ Without compulsory interaction, LLMs struggle to distinguish clear vs. vague instructions, either over-interacting or under-interacting despite prompt tweaks. ๐Ÿ”„
Only Claude Sonnet 3.5 can make this distinction to a limited degree with the right prompt. ๐Ÿ”

19.02.2025 19:46 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

5/ Our findings? LLMs default to non-interactive behavior unless forced to interact. But when they clarify vague inputs, performance drastically improvesโ€”proving the power of effective communication. ๐Ÿ’ฌ๐Ÿค

19.02.2025 19:46 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

4/ How do LLMs handle ambiguity? We break it down into 3 key steps:
๐Ÿ”‘ (a) Using interactivity to boost performance in ambiguous scenarios
๐Ÿ’ก (b) Detecting ambiguity effectively
โ“ (c) Asking the right questions

19.02.2025 19:46 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

3/ How much does interaction actually help LLMs in coding tasks? ๐Ÿค–๐Ÿ’ก
We put them to the test on SWE-Bench Verified across three distinct settings to measure the impact. ๐Ÿ“Š

19.02.2025 19:46 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
Interactive Agents to Overcome Ambiguity in Software Engineering AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can l...

2/ ๐Ÿš€ Our latest work: Interactive Agents to Overcome Ambiguity in Software Engineering explores how proprietary and open-weight LLMs handle ambiguity in complex agent-based tasks.
๐Ÿ”— Link: arxiv.org/abs/2502.13069

19.02.2025 19:46 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

LLM agents can codeโ€”but can they ask clarifying questions? ๐Ÿค–๐Ÿ’ฌ
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? ๐Ÿš€

(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)

19.02.2025 19:46 ๐Ÿ‘ 7 ๐Ÿ” 3 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 1

Looking forward to contributing to more socially aware and effective AI agents in 2025. ๐Ÿค–โœจ

06.02.2025 16:27 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

It's time to think about jointly optimizing human-AI communication and tool use!

All Hands' open-source approach and its bold, curious team make it the perfect playground for this exploration. Can't wait to dive in with @gneubig.bsky.social, Xingyao and the amazing team!

06.02.2025 16:27 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
All Hands AI

Excited to share that I'm joining All Hands AI (www.all-hands.dev) this summer as a research intern! ๐Ÿš€

AI agents are becoming incredibly powerful, but their true potential lies in how they interact with and assist humans in meaningful ways.

06.02.2025 16:27 ๐Ÿ‘ 10 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
How to verify your Bluesky account - Bluesky Here's how to verify your Bluesky account by setting your website as your username.

I like the BlueSky approach to "verification". If you own a domain, you can make a DNS record to turn it into your BlueSky handle!

bsky.social/about/blog/4...

27.11.2024 04:32 ๐Ÿ‘ 13 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Hello, Bluesky! Happy to be scrolling the friendly skies with you. Follow for news and updates on LTI folks and their trailblazing research. #AI #NLP #ML #computerscience

20.11.2024 16:04 ๐Ÿ‘ 13 ๐Ÿ” 2 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

some little bluesky tips ๐Ÿฆ‹

your blocks, likes, lists, and just about everything except chats are PUBLIC

you can pin custom feeds; i like quiet posters, best of follows, mutuals, mentions

if your chronological feed is overwhelming, you can make and pin make a personal list of "unmissable" people

20.11.2024 11:56 ๐Ÿ‘ 255 ๐Ÿ” 57 ๐Ÿ’ฌ 17 ๐Ÿ“Œ 3

Looking for all your LTI friends on Bluesky? The LTI Starter Pack is here to help!

go.bsky.app/NhTwCVb

20.11.2024 16:15 ๐Ÿ‘ 15 ๐Ÿ” 9 ๐Ÿ’ฌ 6 ๐Ÿ“Œ 1

hi maria, can you add me as well? @nlpxuhui

07.11.2024 16:36 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0