
Ali Modarressi

@amodarressi

PhD student, NLP Researcher at @cislmu.bsky.social | Prev. Intern @Adobe.com

48 Followers · 104 Following · 13 Posts · Joined 02.05.2025

Latest posts by Ali Modarressi @amodarressi

Preview: Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance — Expert persona prompting -- assigning roles such as "expert in math" to language models -- is widely used for task improvement. However, prior work shows mixed results on its effectiveness, and does not...

📢 New paper accepted at @eaclmeeting.bsky.social 2026:

Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions

with
@mhedderich.bsky.social
@amodarressi.bsky.social
Hinrich Schuetze
& Benjamin Roth.

Preprint: arxiv.org/abs/2512.12775

23.01.2026 19:07 👍 2 🔁 1 💬 1 📌 0

🧑‍🔬 I'm recruiting PhD students in Natural Language Processing at @unileipzig.bsky.social Computer Science, together with @scadsai.bsky.social!

Topics include, but aren't limited to:

🔎 Linguistic Interpretability
🌍 Multilingual Evaluation
📖 Computational Typology

Please share!

#NLProc #NLP

11.12.2025 13:36 👍 41 🔁 25 💬 1 📌 3

CIS & MaiNLP Group picture at EMNLP 2025! 🤩 🤗 (1/3)

While I sadly 🥲 won't be at EMNLP this year myself, please do reach out to any of our members for a chat if you are interested in our research!

We also co-organize and participate in some great workshops at EMNLP:

06.11.2025 09:52 👍 13 🔁 1 💬 1 📌 0

Excited to be here in Suzhou for #EMNLP2025!
I'll be presenting "ImpliRet"; check out our poster on Friday, Nov. 7 at 14:00.
If you're into long-context, IR, or just want to chat, come *Pay Ali* a visit 😁
Link to thread:
x.com/zeinabtaghav...

06.11.2025 02:53 👍 1 🔁 0 💬 0 📌 0

Details on poster times and locations coming soon.

Would love to meet and chat ☕️💬

If you're attending #ACL2025, feel free to stop by and say hi! 👋
🧵[4/4]

20.07.2025 22:52 👍 0 🔁 0 💬 0 📌 0
Preview: Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models — Understanding how large language models (LLMs) acquire and store factual knowledge is crucial for enhancing their interpretability and reliability. In this work, we analyze the evolution of factual kn...

โฑ๏ธ๐Ÿ”Ž Time Course MechInterp
We track how factual knowledge forms in OLMo over training by analyzing the evolving roles of Attention Heads and FFNs.
Heads are dynamic and often repurposed; FFNs are stable and keep refining facts.
By: A. Dawar Hakimi
arxiv.org/abs/2506.03434
🧵[3/4]

20.07.2025 22:52 👍 0 🔁 0 💬 1 📌 0
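The head-vs-FFN contrast can be made concrete with a toy calculation (the numbers below are entirely made up, and this is not the paper's method or data): treat each component's per-fact contribution as a vector and compare checkpoints with cosine similarity. A "repurposed" head has a profile that flips between checkpoints; a stable FFN keeps one that persists.

```python
# Toy sketch: quantify how much a component's per-fact contribution profile
# changes between two training checkpoints. Low similarity = repurposed;
# high similarity = stable.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Fabricated contribution of one component to three probed facts,
# measured at an early and a late checkpoint.
head_early, head_late = [0.9, 0.1, 0.0], [0.0, 0.2, 0.9]  # profile flips: repurposed
ffn_early, ffn_late = [0.5, 0.4, 0.1], [0.6, 0.5, 0.1]    # profile persists: stable

head_stability = cosine(head_early, head_late)
ffn_stability = cosine(ffn_early, ffn_late)
assert head_stability < 0.2 < 0.9 < ffn_stability
```

In the actual study this comparison would be run over real attribution scores extracted from OLMo checkpoints, not hand-picked vectors.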

🌍 MEXA: Multilingual Evaluation of English-Centric LLMs

A method for assessing the multilingual capabilities of English-centric LLMs using parallel sentences. It estimates how many languages an LLM covers and at what level.

By: @kargaranamir.bsky.social

x.com/amir_nlp/sta...
🧵[2/4]

20.07.2025 22:52 👍 1 🔁 0 💬 1 📌 0
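The parallel-sentence idea behind MEXA can be sketched as a toy retrieval check (the 2-d "embeddings" below are fabricated, and MEXA's actual scoring differs): a language counts as covered when each target-language sentence embedding is closest to its English parallel rather than to a mismatched English sentence.

```python
# Toy sketch: alignment between English and target-language embeddings of
# parallel sentences as a proxy for language coverage.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def alignment_score(eng_embs, tgt_embs):
    """Fraction of target sentences whose nearest English embedding is the parallel one."""
    hits = 0
    for i, t in enumerate(tgt_embs):
        sims = [cosine(t, e) for e in eng_embs]
        hits += sims.index(max(sims)) == i
    return hits / len(tgt_embs)

# Fabricated embeddings: three English sentences and their translations
# in a language the model represents well.
eng = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
covered = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]  # well aligned with eng
assert alignment_score(eng, covered) == 1.0
```

A language whose embeddings do not line up with the English parallels would score well below 1.0, indicating weaker coverage.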

Leaving Vancouver after ICML's closing fireworks 😁🎆

Heading to Toronto for a few days, then off to
@aclmeeting.bsky.social to present:

"Collapse of Dense Retrievers"
A work by @mohsen-fayyaz.bsky.social that I was fortunate to collaborate on.

Also co-presenting two other papers…🧵 [1/4]

20.07.2025 22:52 👍 0 🔁 0 💬 1 📌 0
Preview: Ali Modarressi on X: "🚀 Introducing NoLiMa Paper 🚀 Most long-context benchmarks have literal overlaps between the questions and the context, but what if they didn't? 🤔 Turns out, it's a tough challenge! Even a powerful model like GPT-4o drops from 99.3% to 69.7% at 32K context length. 📉"

Full NoLiMa post thread (X / Twitter): x.com/AModarressi/...

09.07.2025 13:53 👍 0 🔁 0 💬 0 📌 0
Preview: NoLiMa: Long-Context Evaluation Beyond Literal Matching — Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves ret...

Check out the paper & our GitHub repo (with results on recent models 🆕✨)!
📄: arxiv.org/abs/2502.05167
🔗: github.com/adobe-resear...
🤗: huggingface.co/datasets/amo...
This work was my internship project at @adobe.com, in collaboration with my mentors there and Hinrich Schütze.

09.07.2025 13:53 👍 1 🔁 0 💬 1 📌 0
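The core distinction NoLiMa draws can be illustrated with a simple overlap check (the helper and stopword list are invented for this sketch; the Dresden example paraphrases the paper's motivating setup): classic needle-in-a-haystack questions share content words with the needle, so lexical matching alone can locate it, while a NoLiMa-style pair requires an associative hop with no literal overlap.

```python
# Toy check for literal question/needle overlap, the property NoLiMa removes.

STOPWORDS = {"the", "a", "an", "of", "in", "to", "is", "was",
             "what", "which", "who", "has", "been", "actually"}

def content_overlap(question: str, needle: str) -> set[str]:
    """Content words shared between a question and a needle sentence."""
    q = {w.strip("?.,").lower() for w in question.split()} - STOPWORDS
    n = {w.strip("?.,").lower() for w in needle.split()} - STOPWORDS
    return q & n

# Classic NIAH: the question's content words literally appear in the needle.
niah = content_overlap(
    "What is the special magic number?",
    "The special magic number is 7421.",
)

# NoLiMa-style: the link (Semper Opera House -> Dresden) is latent knowledge.
nolima = content_overlap(
    "Which character has been to Dresden?",
    "Actually, Yuki lives next to the Semper Opera House.",
)

assert len(niah) > 0 and len(nolima) == 0
```

Once that lexical shortcut is gone, the model must actually reason over the context, which is where the reported accuracy drops appear.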

I'll be at @icmlconf.bsky.social next week presenting NoLiMa!
Poster on Tue July 15, 4:30–7pm (E-2312).

Happy to grab a coffee and chat about long-context, memory, research, or just to catch up.

I'll be in Toronto for a couple of days after the conference; let me know if you're around!

09.07.2025 13:53 👍 4 🔁 2 💬 1 📌 0

MemLLM: Finetuning LLMs to Use Explicit Read-Write Memory

Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze

Action editor: Greg Durrett

https://openreview.net/forum?id=dghM7sOudh

#memory #memorizing #memllm

19.05.2025 00:07 👍 2 🔁 1 💬 0 📌 0
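The explicit read-write memory idea can be sketched as a tiny triple store (the interface below is invented for illustration and is not MemLLM's actual API): the model emits write calls while reading text and issues read queries at generation time, instead of relying solely on parametric memory.

```python
# Invented illustration of an explicit read-write triple memory.

class TripleMemory:
    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()

    def write(self, subj: str, rel: str, obj: str) -> None:
        # Called while the model reads text and extracts a fact.
        self.triples.add((subj, rel, obj))

    def read(self, subj: str, rel: str) -> list[str]:
        # Called at generation time instead of recalling from weights.
        return sorted(o for s, r, o in self.triples if (s, r) == (subj, rel))

mem = TripleMemory()
mem.write("Paris", "capital_of", "France")
mem.write("Warsaw", "capital_of", "Poland")
assert mem.read("Paris", "capital_of") == ["France"]
assert mem.read("Paris", "population") == []
```

The finetuning part of the work teaches the LLM *when* to emit such read/write operations; the store itself can stay this simple.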
Preview: Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence — Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robu...

The takeaway? We need robust retrievers that prioritize answer relevance, not just heuristic shortcuts.

work with an amazing team:
@mohsen-fayyaz.bsky.social,
Hinrich Schütze,
@violetpeng.bsky.social

paper: arxiv.org/abs/2503.05037
dataset 🤗: t.co/QZFyCLqP0P

Cross-post from x.com/mohsen_fayyaz

17.05.2025 20:28 👍 3 🔁 0 💬 0 📌 0

We also analyze RAG: biased retrievers can mislead LLMs, degrading their performance by 34%, worse than retrieving nothing! 😮

17.05.2025 20:28 👍 1 🔁 0 💬 1 📌 0

When multiple biases combine, retrievers fail catastrophically:
📉 Answer-containing docs are ranked above a synthetic biased doc with no answer less than 3% of the time!

17.05.2025 20:28 👍 1 🔁 0 💬 1 📌 0

Dense retrievers are crucial for RAG and search, but do they actually retrieve useful evidence? 🤔
We design controlled experiments by repurposing a relation extraction dataset, exposing serious flaws in models like Dragon+ and Contriever.

17.05.2025 20:28 👍 2 🔁 0 💬 1 📌 0

📄 Collapse of Dense Retrievers

Accepted to #ACL2025 main conference 🎉🎉

In this paper we uncover major vulnerabilities in dense retrievers like Contriever, showing they favor:
📌 Shorter docs
📌 Early positions
📌 Repeated entities
📌 Literal matches
...all while ignoring the answer's presence!

17.05.2025 20:28 👍 9 🔁 2 💬 1 📌 1
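The failure pattern above can be mimicked with a deliberately biased toy scorer (a lexical stand-in for demonstration, not a real dense retriever and not the paper's evaluation): rewarding query-term overlap and brevity is enough to rank a literal echo of the query above a document that actually contains the answer.

```python
# Toy scorer exhibiting the shorter-doc and literal-match biases.

def tokens(text: str) -> list[str]:
    return [w.strip("?.,").lower() for w in text.split()]

def biased_score(query: str, doc: str) -> float:
    q, d = set(tokens(query)), tokens(doc)
    overlap = len(q & set(d)) / len(q)  # literal-match bias
    brevity = 1.0 / len(d)              # shorter-doc bias
    return overlap + brevity

query = "Where was Marie Curie born?"
with_answer = ("Marie Curie, the physicist who pioneered radioactivity "
               "research, was born in Warsaw.")
no_answer = "Where was Marie Curie born?"  # literal echo, zero evidence

assert biased_score(query, no_answer) > biased_score(query, with_answer)
```

A retriever that scores this way never checks whether the answer is present, which is exactly the vulnerability the controlled experiments expose.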