Marcel Bollmann

@marcel.bollmann.me

Associate professor at @liu.se 🇸🇪, site development lead for @aclanthology.org, editor-in-chief at @nejlt.bsky.social. Mildly obscure #NLP researcher. I like coffee and board games. 🏠 https://marcel.bollmann.me/

1,717 Followers · 354 Following · 132 Posts · Joined 10.11.2024

Latest posts by Marcel Bollmann @marcel.bollmann.me

Should I start listing Python 3.12 as a co-author? Why not?

04.03.2026 12:18 👍 2 🔁 1 💬 0 📌 0
This is a reminder that the meta-reviews are due TODAY 4 March AoE.

Please remember that if you don't submit your meta-reviews on time you might be considered highly irresponsible, which means your co-authored papers may be desk rejected and you may become ineligible to commit to *CL conferences or (re-)submit any work in the next ARR cycle.


I don't understand why the ACL/ARR organizers think this is an appropriate way to communicate with area chairs... The peer review system is collapsing, and the whole system of science as we know it requires a rethink. But let's be kind to each other in the process and not forget what we are here for.

04.03.2026 13:20 👍 10 🔁 3 💬 2 📌 0

Claude: For each example I can do a web search and then make an LLM call with the results...

Me: Why an LLM call? Can't you just figure it out yourself?

Claude: You're right, I am the LLM!

17.02.2026 00:15 👍 44 🔁 7 💬 2 📌 2

What a ridiculous timeline we are living in

12.02.2026 21:21 👍 1 🔁 0 💬 0 📌 0

πŸ™ I invited you through ARR!

11.02.2026 14:29 👍 1 🔁 0 💬 0 📌 0

(Technically due on Saturday, but who wants to work then? 😉)

11.02.2026 13:12 👍 1 🔁 0 💬 1 📌 0

🚨 Emergency reviewer needed for ARR Resources and Evaluation track! Please ping me if you could review one paper by Friday. Topic is AI hallucinations, broadly speaking.

11.02.2026 11:10 👍 1 🔁 3 💬 1 📌 0

God, something is wrong with the way ARR is handling things. I keep receiving messages from people almost begging to be removed from the author list because they can't keep up with the reviewing load, especially on a paper outside their area of expertise, and they don't want their student to be desk rejected.

10.02.2026 22:08 👍 7 🔁 1 💬 1 📌 0

Good old times. @raghavian.bsky.social

10.02.2026 10:29 👍 0 🔁 0 💬 0 📌 0
When AIs Talk To Themselves Drop me a 💡 on the linkedin post if this is interesting. OpenClaw/Clawdbot/Moltbot have sprung into view this week. Now they are having large-scale conversations and collaborations with each …

I strongly recommend this blog post by Ben Vigoda: www.benvigoda.com/2026/02/01/w...

03.02.2026 06:50 👍 19 🔁 7 💬 1 📌 3

Today, the ACL Anthology switched to a new system for how author pages work. From now on, ORCID iDs will be the main mechanism for matching papers to the correct author. 🧡⬇️

26.01.2026 14:23 👍 9 🔁 5 💬 1 📌 0

How have we as a society still not internalized that critical data should never live in just a single place?!

23.01.2026 08:24 👍 0 🔁 0 💬 0 📌 0

Trying to learn to be better at that.

30.11.2025 21:08 👍 2 🔁 1 💬 0 📌 0

I've spent the last two days looking at this message at least 30 times; I'm getting ready to sue for psychological distress at this point.

28.11.2025 10:19 👍 3 🔁 0 💬 0 📌 0

This is terrifying.

"[AI agents] can... infer a researcher's latent hypotheses and produce data that artificially confirms them."

...

"We can no longer trust that survey responses are coming from real people" -@seanjwestwood.bsky.social

18.11.2025 21:03 👍 312 🔁 121 💬 7 📌 17

Wordle 1 609 2/6*

⬜🟨⬜⬜⬜
🟩🟩🟩🟩🟩

My mind is clearly dominated by weird words.

14.11.2025 22:47 👍 0 🔁 0 💬 0 📌 0

📢 Open Positions at the Uppsala NLP Group! 📢

Postdoc opportunity, also open to recent or soon-to-be PhD graduates (within 1–2 months).
uu.varbi.com/en/what:job/...

04.11.2025 10:29 👍 5 🔁 6 💬 0 📌 1

Every time.

14.10.2025 11:57 👍 450 🔁 30 💬 6 📌 0

I often long for a place to just post whimsical personal updates for friends, but that kind of place doesn't exist anymore. In my personal bubble, social media has long become too fragmented and/or abandoned for that purpose.

03.10.2025 13:34 👍 2 🔁 0 💬 2 📌 0
Your dataset looks very cool, but I don't understand why you say “no Arabizi-specific metric or resource exists for our dialect selection”? When you contacted me, it seemed to me that you were aware of my work on Arabizi (e.g., [1,2], not to mention the cross-lingual work with Maltese [2] or character-based language models for Arabizi [4]). One of the crucial points of this work was also to propose translations into French from Algerian Arabizi, which could have helped you use a ground truth for your translation models.

I'll be honest with you, I find it extremely discouraging to see that pioneering work in the processing of a language with such limited resources as the Algerian Arabic dialect is not cited, even though it has been published in the major conference in the field and the data is freely available (unlike the vast majority of dialectal resources for Arabic). If even colleagues working on the same language don't find it necessary to cite us, what's the point of investing so much time and money in this type of work?

In short, I hope your work doesn't encounter the same pitfalls.


[1] https://www.aclweb.org/anthology/2020.acl-main.107.pdf
[2] https://arxiv.org/abs/2306.14866
[3] https://arxiv.org/abs/2005.00318
[4] https://arxiv.org/abs/2110.13658

(deepL translated, from French)


Just found out that yet another paper on North African Arabizi didn't find our work worth citing. They even wrote "No Arabizi-specific metric or resource exists for our dialect selection". We were the first to release an annotated dataset for this dialect, published at ACL and shit. Discouraging.

03.10.2025 09:04 👍 18 🔁 3 💬 2 📌 0

📄 New article published:

“Controlling Language and Style of Multi-lingual Generative Language Models with Control Vectors” by Julius Leino & Jussi Karlgren

nejlt.ep.liu.se/article/view...

29.09.2025 14:01 👍 0 🔁 1 💬 0 📌 0
I'm conducting research on how ACL's peer reviewing policies impact NLP research quality, career trajectories, and inclusivity within our community. Your insights, whether you're a seasoned reviewer, early-career researcher, or anywhere in between, are invaluable.
The survey takes 7-10 minutes and covers topics like review quality, reviewer assignment, and accessibility barriers. All responses are confidential and will help inform evidence-based improvements to our peer review processes.


I'm conducting research on how ACL's peer review policies impact NLP research quality, career trajectories, and inclusivity within our community. I am running a survey, which would take around 7-10 mins to complete: forms.cloud.microsoft/e/j2jr9nH3X0

I would really appreciate insights from y'all!

25.09.2025 14:23 👍 6 🔁 6 💬 1 📌 0

remove the label

18.09.2025 14:33 👍 5 🔁 1 💬 1 📌 0

📢Life update📢

🥳I'm excited to share that I've started as a postdoc at Uppsala University NLP @uppsalanlp.bsky.social, working with Joakim Nivre on topics related to constructions and multilinguality!

🙏Many thanks to the Walter Benjamin Programme of the DFG for making this possible.

15.09.2025 15:10 👍 29 🔁 2 💬 3 📌 1

I need a gym without any people at all, that would motivate me

15.09.2025 11:23 👍 2 🔁 0 💬 1 📌 0
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
Then, we collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses. 
For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking, i.e., incorrect conclusions due to annotation errors.
Across all experiments, LLM hacking occurs in 31–50% of cases even with highly capable models.
Since minor configuration changes can flip scientific conclusions from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.


🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

12.09.2025 10:33 👍 303 🔁 106 💬 6 📌 23

Never ask a man his age, a woman her salary, or GPT-5 whether a seahorse emoji exists

06.09.2025 13:08 👍 2102 🔁 423 💬 95 📌 79

OpenAI is discovering what every social media company has also discovered: content moderation is hard and AI content moderation is also hard.

04.09.2025 01:48 👍 73 🔁 9 💬 3 📌 5

Why does every social media feed eventually end up looking like:

[outrageous thing happening in the US]
[extremely polarizing AI take]
[random semi-funny meme]
[shocking thing happened to person I don't know]
[yet another reason climate change is worse than we thought]

It's so emotionally tiring.

29.08.2025 15:44 👍 0 🔁 0 💬 0 📌 0

Idk who needs to hear this, but you do *not* need to glaze the reviewers of your papers when you respond to their feedback.

Be grateful, sure, but don't wax poetic about how insightful and magical their farts are.

It's just professional correspondence, my guy.

29.08.2025 10:33 👍 6 🔁 1 💬 0 📌 0