I guess in the end the main problem of conference reviewing is that conferences limit acceptance at X% and that >X% of authors truly believe their paper should be accepted.
Review quality is also declining to the point of being ridiculous, but this problem will always remain.
04.12.2025 19:06
Someone please change the fact that you have to answer no fewer than 7 questions about knowledge of author identity in the ARR reviewing form. #NLPproc
01.09.2025 14:24
Author responses are so funny, really. They must be one of the weirdest texts out there. Like on the surface the authors are so polite and friendly, while also making it completely obvious that this is all for show and that they hate my guts.
#EMNLP2025
03.07.2025 18:12
I like the strict policy. But I'm not sure how it can be enforced, like, at all. Either you unfairly punish people who had actual life emergencies that made them unable to review, or people can easily escape punishment by claiming an emergency.
23.06.2025 07:58
Reviewing for EMNLP: a paper that was so clearly put through ChatGPT that it's maddening received two reviews that were also clearly put through ChatGPT, to the point that the feedback seems meaningless.
Fair is fair, I guess.
18.06.2025 12:13
Haha, wow. LLMs deserve criticism, but usefulness is the one area you can't really argue against. Well, unless you want to ignore the experience of literally millions of people every day.
24.04.2025 17:49
We only ran a few experiments, but the results clearly show that data quality matters (as expected), and also that stricter cleaning could be preferable.
Full paper here:
aclanthology.org/2025.coling-...
04.03.2025 13:09
Nevertheless, one can still ask: does this actually matter for training NMT systems?
We might prefer lower-quality data if at least more of it is available.
04.03.2025 13:09
Kreutzer et al. (2021) found issues at a glance; we show that they persist even beyond a glance.
This has serious implications for the state of the field, especially for judging how much good-quality data is actually available for certain language pairs.
04.03.2025 13:09
For the two largest corpora, CCMatrix and CCAligned, around two-thirds (!) of the translations have serious issues.
Mostly these are issues of alignment: parts of sentences are often missing, or aligned to the wrong sentence. But issues can also be more serious.
04.03.2025 13:09
Let me advertise our COLING2025 paper:
Quality Beyond A Glance: Revealing Large Quality Differences Between Web-Crawled Parallel Corpora
We manually evaluate 5 of the largest parallel corpora for 11 low(ish)-resource languages.
The results are quite concerning.
#NLProc #COLING2025 #ACL2025 #NMT
04.03.2025 13:09
True. Still, the very small window they give, which also overlaps with a major holiday, for something so small… Come on now.
Plus, of course, the vague threat of withholding publication otherwise.
24.12.2024 06:36
We can all be glad that #COLING2025 is making sure everybody is fixing these horrible formatting issues during Christmas. Just imagine what could have happened otherwise #NLProc
23.12.2024 20:27
Definitely possible, but I'd say it's just one of the many ways one can be unlucky during peer review.
25.11.2024 13:16
If I ever miss an ARR reviewing deadline, it will surely not be due to a lack of emails about it.
19.11.2024 14:03
Yes, they will. Which is also an unfortunate side effect of these developments.
19.11.2024 13:54
And don't forget: highlight, enhance, underscore, tailor and leverage. I can basically just look at this subset of words to know if students have used ChatGPT. Completely transparent at this point.
19.11.2024 11:15