Can LLMs figure out who you are from your anonymous posts?
From a handful of comments, LLMs can infer where you live, what you do, and what your interests are, and then search for you on the web.
New paper w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵
20.02.2026 17:03
our paper on data mixing for LMs is out!
while building Olmo 3, we saw gaps between data mixing literature and real practice
- choosing proxy size, # runs, sampling, regression, constraints...
- data shifts during LM dev: can we reuse past experiments?
Olmix tackles them all!
13.02.2026 17:30
aclanthology.org/2025.finding...
07.11.2025 10:19
If you're attending #EMNLP2025, we'll be presenting virtually in Gather Session 1 on Nov 5 at 4pm PT. Come say hello!
w/ the wonderful:
@mellymeldubs.bsky.social
Anna Preus,
@mariaa.bsky.social
Paper: arxiv.org/abs/2510.16713
Code/Data: github.com/darthbhyrava/wisp
Dash: poetry.darthbhyrava.com
31.10.2025 15:36
What if a single model could recognize an author's writing style no matter what language they wrote in? Our new #EMNLP2025 paper explores multilingual authorship representation, showing how training across 36 languages can sharpen stylistic signals and reduce topic bias.
🧵
06.11.2025 05:42
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.finding...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.finding...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
04.11.2025 13:42
Lots of exciting work on linguistic style this year at #EMNLP2025 #EMNLP, including work on machine-text detection, authorship representation, and more.
🧵 with anthology links below
📣 with an open call to everyone to add style work that's missing
04.11.2025 13:42
I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it's not tokenizer-free at all. I also talk about why people hate tokenizers so much!
25.09.2025 15:14
I successfully defended my PhD in Dutch fashion and received a PhD certificate in Latin. Thank you to the amazing people who got me here, among others @dongng.bsky.social and the ones I blurred here.
22.10.2025 14:20
Come join next Wednesday if you want to rant about society's love-hate relationship with LLMs!
16.10.2025 09:32
one of the other entrances was closed off yesterday, increasing my commute from front door to office by another 10 minutes
20.08.2025 06:29
No trains are running between Mönchengladbach and Venlo. The service is being covered by a bus. The bus code-switches: a monitor that reads “de bus hält” (Dutch “de bus”, German “hält”).
01.08.2025 08:48
work with and by @yupeidu.bsky.social
05.08.2025 15:40
Disentangling the Roles of Representation and Selection in Data Pruning. arxiv.org/abs/2507.03648
On Support Samples of Next Word Prediction. arxiv.org/abs/2506.04047
05.08.2025 15:37
VAQUUM: Are Vague Quantifiers Grounded in Visual Data? arxiv.org/pdf/2502.11874
05.08.2025 15:37
Utrecht is back from #ACL2025! We had a blast.
I should have posted this earlier, but here are some papers from people in our group that were presented at ACL.
05.08.2025 15:37
I'm sadly not at #ACL2025, but work on tokenization seems to keep exploding. Here are the tokenization-related papers I could find, in no particular order. Let me know if I missed any.
30.07.2025 14:03
Since people at #ACL2025 are very interested in tokenization, a reminder to join the discussion on the Discord server set up by @mcognetta.bsky.social
29.07.2025 12:52
Has anyone tried the Kiss the Cook lunch place at #ACL2025?
28.07.2025 12:57
I think accepted
28.07.2025 12:56
I will present our #ACL2025 paper "Tokenization is Sensitive to Language Variation" in the poster session after Tuesday's keynote, 10.30-12.00 in Hall 4/5.
28.07.2025 05:43
@philipwitti.bsky.social will be presenting our paper "Tokenisation is NP-Complete" at #ACL2025. Come to the language modelling 2 session (Wednesday morning, 9h~10h30) to learn more about how challenging tokenisation can be!
27.07.2025 09:41
We are presenting this paper at #ACL2025. Find us at poster session 4 (Wednesday morning, 11h~12h30) to learn more about tokenisation bias!
27.07.2025 11:59
I'm at #ACL2025 this week.
Happy to chat about measuring linguistic style, data diversity, creating synthetic data for analyzing (L)LMs, authorship attribution, paraphrases and tokenizers.
Let's chat if you're around!
27.07.2025 17:30
The #ACL2025 #ACL2025NLP feed is up and running! It matches both hashtags and any posts from or mentions of @aclmeeting.bsky.social
Pin it to your home feed and enjoy!
bsky.app/profile/did:...
17.07.2025 11:15
Who's presenting on subjectivity in annotation (human label variation, learning from disagreement, perspectivism) at #ACL2025?
papers by e.g. @liweijiang.bsky.social @tiancheng.bsky.social @gabriellalapesa.bsky.social @romanklinger.de
keynote @verenarieser.bsky.social
link to full list below ⬇️
24.07.2025 16:57
I love it.
24.07.2025 15:48