Qingcheng Zeng (@qcznlp)

This paper has been accepted to the EACL 2026 main conference. Give it a read and let me know what you think!

04.01.2026 21:36

4️⃣ Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where?

Our first foray into the Science of Science! We systematically investigated the NLP4SG landscape and quantified the proportion of work addressing social good concerns, both within and beyond the ACL community. Preprint coming soon!

20.08.2025 20:47

3️⃣ MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation

By far the most comprehensive multilingual benchmark for evaluating LLMs. Qwen 3 2507 uses this benchmark to evaluate multilingual ability!

Paper 3️⃣: arxiv.org/abs/2503.10497

20.08.2025 20:47

(3) Instruct models show much higher refusal rates than reasoning models, and reasoning models achieve only minimal accuracy on additional attempts.
(4) Thinking with images helps VLMs' calibration SO much!

Paper1️⃣: arxiv.org/abs/2504.06564
Paper2️⃣: arxiv.org/abs/2505.20236

20.08.2025 20:47

...whether reasoning models and vision-language models express their confidence in a calibrated manner. Our findings:
(1) SFT on reasoning models usually leads to better calibration in in-distribution settings but worse calibration in OOD settings.
(2) RL can help improve (recover) calibration a bit.
...

20.08.2025 20:47

Four papers accepted to the #EMNLP2025 main conference!
1️⃣ Thinking Out Loud: Do Reasoning Models Know When They’re Right?
2️⃣ Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models

In these two papers, we look into...

20.08.2025 20:47

My bsky "Discover" tab is full of kittens and puppies. That makes my day.

11.08.2025 19:45

Our work on pragmatic competence in LLMs has been accepted to PragLM@COLM 2025. Preprint: arxiv.org/abs/2505.18497. Hopefully someone from our team will be in Montreal to tell you how much we love this work!

29.07.2025 18:45

Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility Suet-Ying Lam, Qingcheng Zeng, Jingyi Wu, Rob Voigt. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2025.

I just gave a virtual presentation at ACL 2025 on our work about the production–interpretation asymmetry in reference processing in LLMs. If you’re into computational psycholinguistics or the LLMs x cognitive science space, give it a read!
aclanthology.org/2025.acl-sho...
@robvoigt.bsky.social

29.07.2025 16:21

Thanks for reading!! Any feedback would be greatly appreciated, if you happen to have any 🫡

21.07.2025 03:16

Fascinating work! If you're open to it, I'd love to chat sometime about the broader potential of LLMs in social science.

07.05.2025 20:04