This paper has been accepted to the EACL 2026 main conference. Read it and let me know what you think!
4️⃣ Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where?
The first jump into Science of Science! We systematically investigated the NLP4SG landscape and quantified the proportion of work addressing social good concerns both within and beyond the ACL community. Preprint coming soon!
3️⃣ MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
By far, the most comprehensive multilingual benchmark for evaluating LLMs. Qwen 3 2507 is using this benchmark to evaluate multilingual ability!
Paper 3️⃣: arxiv.org/abs/2503.10497
(3) Instruct models show much higher refusal rates than reasoning models. And reasoning models only show minimal accuracy in additional attempts.
(4) Thinking with images helps SO much in VLMs' calibration!
Paper 1️⃣: arxiv.org/abs/2504.06564
Paper 2️⃣: arxiv.org/abs/2505.20236
...whether reasoning models or vision language models express their confidence in a calibrated manner. Our findings are:
(1) SFT reasoning models usually show better calibration in in-distribution settings, and worse calibration in OOD settings.
(2) RL can help improve (recover) calibration a bit.
...
Four papers accepted at the #EMNLP2025 main conference!
1️⃣ Thinking Out Loud: Do Reasoning Models Know When They’re Right?
2️⃣ Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models
In these two papers, we look into...
My bsky "Discover" tab is full of kittens and puppies. That makes my day.
Our work on pragmatic competence in LLMs was accepted for PragLM@COLM 2025. Preprint: arxiv.org/abs/2505.18497. Hope we can have someone in Montreal to tell you how much we love this work!
I just gave a virtual presentation at ACL 2025 on our work about the production–interpretation asymmetry in reference processing in LLMs. If you’re into computational psycholinguistics or the LLMs x cognitive science space, give it a read!
aclanthology.org/2025.acl-sho...
@robvoigt.bsky.social
Thanks for reading!! Any feedback you happen to have will be greatly appreciated 🫡
Fascinating work! If you're open to talk, I’d love to chat sometime about the broader potential of LLMs in social science.