Xinpeng Wang

@xinpeng

PhD student @LMU. Eval & LLM Alignment. https://xinpeng-wang.github.io/

61 Followers · 97 Following · 1 Posts · Joined 19.11.2024

Latest posts by Xinpeng Wang @xinpeng

Reunion in Singapore! 🇸🇬 @barbaraplank.bsky.social, @xinpeng.bsky.social, who's currently on a research stay at NYU, and Chengzhi are presenting their work at @iclr-conf.bsky.social

24.04.2025 08:34 👍 19 🔁 2 💬 2 📌 0

Upcoming ICLR 2025 paper: ✂️ Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

We propose a surgical & flexible approach to mitigate false refusal in LLMs with minimal effect on performance and inference cost

led by @xinpeng.bsky.social (1/2)

15.04.2025 21:37 👍 10 🔁 2 💬 1 📌 2
The hand-drawn sign from three years ago.

🎉 MaiNLP is turning 3 today! 🎂🥳 We’ve grown a lot since @barbaraplank.bsky.social started this group with nothing but three aspiring researchers and a hand-drawn sign on the door. Huge thanks to all the amazing people who have joined or visited us since. Here’s to many more years of exciting research! 🚀

01.04.2025 10:40 👍 20 🔁 9 💬 1 📌 2

I’m thrilled to share that our paper on mitigating false refusal in language models has been accepted to ICLR 2025 @iclr-conf.bsky.social!

arxiv.org/abs/2410.03415

Joint work with Chengzhi, @paul-rottger.bsky.social, and @barbaraplank.bsky.social.

23.01.2025 21:34 👍 8 🔁 2 💬 0 📌 0