Arianna Bisazza's posts

Our work on contrastive SAE steering for personalizing literary machine translation was accepted to EACL main! 🎉 Check it out! ⬇️

2 months ago

(Tagging people who may have an opinion about this :))
@mdlhx.bsky.social @bjerva.bsky.social @wpoelman.bsky.social @estherploeger.bsky.social @tiedeman.bsky.social

1 month ago

📢 Paper alert!

We know typological features can drive the difficulty of language modeling & machine translation in highly controlled setups (w/ relatively small monolingual models)

But do they also drive MT quality in the age of massively multilingual LLMs?

See @v-hirak.bsky.social’s thread ⬇️

1 month ago
Natural Language Processing How do you build Large Language Models? How do humans experience Natural Language Processing (NLP) applications in their daily lives? And how can we...

👀 Look what 🎅 has brought just before Christmas 🎁: a brand new Research Master in Natural Language Processing at @facultyofartsug.bsky.social @rug.nl

Program: www.rug.nl/masters/natu...

Applications (2026/2027) are open! Come and study with us (you will also learn why we have a 🐮 in our logo)

2 months ago

Wrapping up my oral presentations today with our TACL paper "QE4PE: Quality Estimation for Human Post-editing" at the Interpretability morning session #EMNLP2025 (Room A104, 11:45 China time)!

Paper: arxiv.org/abs/2503.03044
Slides/video/poster: underline.io/lecture/1315...

4 months ago

Interested in agent simulations of language change & pragmatic naming behavior?

Come check our poster TODAY (Fri, Nov 7, 12:30 - 13:30) #EMNLP!

4 months ago

Benchmarks of linguistic minimal pairs are key for LM evaluation & help us overcome the English-centric bias in NLP research

Come to our poster TODAY (Fr 7 Nov 10.30-12.00) #EMNLP to meet TurBLiMP, a new benchmark for Turkish, revealing how LLMs deal with free-order, morphologically rich languages

4 months ago
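Minimal-pair benchmarks like the ones above are typically scored by checking whether the LM assigns a higher probability to the grammatical sentence of each pair. A minimal sketch of that accuracy computation, with a toy scoring dictionary standing in for a real LM's summed token log-probabilities:

```python
# Minimal-pair evaluation: an LM "passes" a pair when it assigns a higher
# log-probability to the grammatical sentence than to the ungrammatical one.
# Accuracy is the fraction of pairs passed.

def minimal_pair_accuracy(pairs, log_prob):
    """pairs: list of (grammatical, ungrammatical) sentence pairs;
    log_prob: callable mapping a sentence to its LM log-probability."""
    correct = sum(log_prob(good) > log_prob(bad) for good, bad in pairs)
    return correct / len(pairs)

# Toy stand-in scorer (a real setup would sum token log-probs from an LM):
toy_scores = {
    "the dogs bark": -5.0, "the dogs barks": -9.0,
    "she has left":  -4.0, "she have left":  -8.5,
}
pairs = [("the dogs bark", "the dogs barks"),
         ("she has left", "she have left")]
print(minimal_pair_accuracy(pairs, toy_scores.get))  # prints 1.0
```

The function and toy scores here are illustrative, not TurBLiMP's actual evaluation code; the real benchmark applies the same comparison over curated Turkish pairs.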

I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!

Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)

arxiv.org/abs/2504.02768

4 months ago

Interested in developmentally plausible LMs, and the role of child-directed language data?

Come to our poster TODAY (Fr 7 Nov, 10.30-12.00) #EMNLP!

4 months ago

Through repeated interactions & shifts in communication needs, the lexicon of a community evolves, eventually leading to language change

We show that NN simulations can help us unravel these complex processes, next to human experiments & corpus studies

See @yuqing0304.bsky.social’s thread below ⬇️

4 months ago

There’s more to Neural Nets than big fat LLMs!

We’ve built a NN-agent framework to simulate how people choose the best word in a given communication context (i.e. pragmatic naming behavior).

With @yuqing0304.bsky.social, @ecesuurker.bsky.social, Tessa Verhoef, @gboleda.bsky.social

4 months ago

- neural-agent simulations of language change (@yuqing0304.bsky.social)
- child-directed language & syntax learning in LMs (@frap98.bsky.social)
- Turkish benchmark of grammatical minimal pairs (@ezgibasar.bsky.social) & a massively multilingual one, MultiBLiMP (@jumelet.bsky.social)

...and more!

4 months ago

InCLow topics #EMNLP2025:

- MT error prediction techniques & their reception by professional translators (@gsarti.com)
- thinking language in Large Reasoning Models (@jiruiqi.bsky.social)
- effect of stereotypes on LLM’s implicit personalization (@veraneplenbroek.bsky.social)

....

4 months ago

Thrilled to be heading to Suzhou with a big team of GroNLP'ers 🐮

Interested in Interpretable, Cognitively inspired, Low-resource LMs? Don't miss our posters & talks #EMNLP2025!

4 months ago

[1/]💡New Paper
Large reasoning models (LRMs) are strong in English — but how well do they reason in your language?

Our latest work uncovers their limitations and a clear trade-off:
Controlling Thinking Trace Language Comes at the Cost of Accuracy

📄Link: arxiv.org/abs/2505.22888

9 months ago

Do you really want to see what multilingual effort looks like? 🇨🇳🇮🇩🇸🇪

Here’s the proof! BabyBabelLM is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉

arxiv.org/abs/2510.10159

5 months ago

📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦

MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io

📅 Workshop Mar 24–29, 2026
🗓️ Submit by Dec 19, 2025

4 months ago

Delighted to share that our paper "Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization" (joint work with @arianna-bis.bsky.social and Raquel Fernández) got accepted to the main conference of #EMNLP

Can't wait to discuss our work at #EMNLP2025 in Suzhou this November!

6 months ago

We hope our work will advance the evaluation of LLMs in Turkish and, in general, encourage more research on the robustness of modern language technologies to typological diversity.

8 months ago

Finally, our experimental paradigms reveal that even LLMs excelling on general minimal pairs can be brittle to variations in word orders & subordination strategies, unlike human speakers.

See paper for results with 13 LLMs, including mono- and multilingual models of different sizes!

8 months ago

We also collect human acceptability judgements & show that *overall* harder phenomena for LLMs are also harder for people, but there are some notable exceptions.

8 months ago

TurBLiMP expands the shortlist of existing language-specific BLiMPs with 2 important properties: high word order freedom & agglutination.

To study LLMs' robustness to these properties, we create experimental paradigms testing syntactic skills w/ different word orders & subordination strategies:

8 months ago

This is hard, slow-paced work going well beyond benchmark translation (let alone LLM-assisted benchmark generation!) It requires real *linguistic* expertise & long discussions on what makes a phenomenon representative of a language. Here's our proposal, inspired by EnglishBLiMP w/ major adaptations:

8 months ago

Grammatical benchmarks are essential to drive progress in truly multilingual Language Modeling & to overcome the linguistic biases we inherit from the English-centeredness of our field.

I'm particularly happy to contribute to this for a language I spent years learning and still find fascinating!

8 months ago
TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs We introduce TurBLiMP, the first Turkish benchmark of linguistic minimal pairs, designed to evaluate the linguistic abilities of monolingual and multilingual language models (LMs). Covering 16 linguis...

Proud to introduce TurBLiMP, the 1st benchmark of minimal pairs for Turkish, a free-order, morphologically rich language!

Pre-print: arxiv.org/abs/2506.13487

Fruit of an almost year-long project by amazing MS student @ezgibasar.bsky.social in collab w/ @frap98.bsky.social and @jumelet.bsky.social

8 months ago

Happy to hear you find the analysis useful, Marco! If you have any extra questions, don’t hesitate to contact @jiruiqi.bsky.social

9 months ago

One step further in our quest to bring interpretability techniques to the service of MT end users: Are uncertainty & model-internals based metrics a viable alternative to supervised word-level quality estimation?

New paper w/ @gsarti.com
@zouharvi.bsky.social @malvinanissim.bsky.social

9 months ago

Large Reasoning Models are raising the bar for answer accuracy & transparency, but how does that work in multilingual settings? Can LRMs reason in your language, and what does that entail?

New preprint led by @jiruiqi.bsky.social and @shan23chen.bsky.social!

9 months ago

Proud to share the first key output of my Vidi project team w/ @frap98.bsky.social @jumelet.bsky.social @yevgenm.bsky.social who all took this topic to heart, as proven by the many overtime discussions at lunchtime 😉

See Francesca’s thread & arXiv link below

9 months ago

Excited to see how the BabyLM community will take on this challenge @alexwarstadt.bsky.social @lchoshen.bsky.social @tallinzen.bsky.social @fourtassi.bsky.social and many more

9 months ago
Arianna Bisazza
@arianna-bis
205 Followers 131 Following 41 Posts