Semantic Component Analysis: Introducing Multi-Topic Distributions to Clustering-Based Topic Modeling
Florian Eichin, Carolin M. Schuster, Georg Groh, Michael A. Hedderich. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025.
Joining this interesting thread: Here's an idea using the embedding->clustering approach to id such dims (similar to SAEs) and how to use it for topic modeling and decomposing representations.
aclanthology.org/2025.finding...
(it's my Master's thesis, unaware of MechInterp and LRH etc.)
07.01.2026 14:08
👍 1
🔁 0
💬 0
📌 0
✨New paper✨
We find script (e.g. Cyrillic, Latin) to be a linear direction in the activation space of Whisper, enabling transliteration at test-time by adding such script directions to the activations — producing e.g. Cyrillic Japanese transcriptions.
07.01.2026 03:04
👍 9
🔁 4
💬 1
📌 0
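A minimal sketch of what "adding a script direction to the activations" could look like. The post does not say how the direction is estimated, so the difference-of-means estimate here is an assumption, and the toy 3-dimensional vectors, the function names, and the scaling factor `alpha` are all hypothetical stand-ins rather than anything taken from Whisper or the paper.

```python
# Illustrative sketch only: toy vectors stand in for real model activations.
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*rows)]


def script_direction(source_acts, target_acts):
    """Difference of means, pointing from the source script toward the target script.

    Assumption: this is one common way to estimate a linear direction;
    the post itself does not specify the estimator.
    """
    mu_src = mean_vector(source_acts)
    mu_tgt = mean_vector(target_acts)
    return [t - s for s, t in zip(mu_src, mu_tgt)]


def steer(activation, direction, alpha=1.0):
    """Add the scaled direction to a single activation vector."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Hypothetical activations collected on Latin-script and Cyrillic-script data.
latin_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
cyrillic_acts = [[1.0, 4.0, 2.0], [3.0, 4.0, 2.0]]

direction = script_direction(latin_acts, cyrillic_acts)
steered = steer([2.0, 0.0, 5.0], direction)
```

In a real model the same addition would be applied to hidden states at inference time, for example through a forward hook; which layer to use and how large `alpha` should be are not stated in the post.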
Thank you to our great Munich Center for Machine Learning (@munichcenterml.bsky.social) for featuring me in this research film! Lots of great film clips with my MCML colleagues are available on MCML's YouTube channel. #ai #aiethics #mcml #philosophy youtu.be/KUqiY8o1yng?...
07.08.2025 09:19
👍 7
🔁 2
💬 0
📌 0
Unsure which presentations to attend at #ACL2025? 🛎️🗣️
27.07.2025 09:56
👍 4
🔁 2
💬 0
📌 0
🕺🏼swing by our poster in Hall 4/5 on Wednesday, July 30 at 11:00 to chat with @florian-eichin.com and me to find out the answers to these questions
🛎️ bonus: to see the full poster 🫣🧩
#ACL2025 #NLProc
23.07.2025 14:03
👍 3
🔁 1
💬 0
📌 0
is the bike doing fine though?? 😥
23.07.2025 12:58
👍 0
🔁 0
💬 1
📌 0
📝Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
🔎Do LLMs encode and generalize discourse knowledge across languages?
👥 @florian-eichin.com @janetlauyeung.bsky.social @mhedderich.bsky.social @barbaraplank.bsky.social
🔗 arxiv.org/abs/2503.10515
📁Main - Long
23.07.2025 12:29
👍 3
🔁 1
💬 1
📌 1
Some recommendations for #ACL2025 👇
(join me and @janetlauyeung.bsky.social to talk about discourse generalization and probing!)
23.07.2025 12:38
👍 4
🔁 1
💬 0
📌 0
Headed to ACL? MaiNLP & our most recent work will be there too👥📄
Come see what we’ve been working on!
23.07.2025 12:29
👍 14
🔁 5
💬 1
📌 2
XAI’s dogwater performance on the 2025 IMO confirms that their Grok 4 benchmark claims were hot air. Their eye-popping metrics were down to the following innovations:
- train on test
- train on test
- train on test
20.07.2025 05:15
👍 306
🔁 38
💬 6
📌 9
Interpretability meets Discourse. Congratulations to
@florian-eichin.com on his first ACL paper 🎉
10.07.2025 13:19
👍 4
🔁 1
💬 0
📌 0
Paper alert 🛎️
10.07.2025 12:40
👍 6
🔁 0
💬 0
📌 0
to appear at ACL2025
🦙 how well do LLMs encode discourse knowledge? does that generalize across languages?
🛎️ in our #ACL2025 paper, we uncover fascinating trends about multilingual discourse representations!
joint work w/ @florian-eichin.com @barbaraplank.bsky.social @mhedderich.bsky.social
📄 arxiv.org/abs/2503.10515
10.07.2025 12:38
👍 16
🔁 3
💬 1
📌 2
I’ll be at @icmlconf.bsky.social next week presenting NoLiMa!
Poster on Tue July 15, 4:30–7pm (E-2312).
Happy to grab a coffee and chat about long-context, memory, research, or just to catch up.
I’ll be in Toronto for a couple of days after the conference, let me know if you’re around!
09.07.2025 13:53
👍 4
🔁 2
💬 1
📌 0
Caught some great moments at #MCML Munich AI Day 2025 last week📍
From sharp keynotes to poster debates. Our team had the chance to show some recent work, join the conversations, and bring back plenty of food for thought🧠🗣️📊
09.07.2025 08:14
👍 7
🔁 2
💬 0
📌 0
The study is here but gated: journals.sagepub.com/doi/10.3102/...
I’d be curious how these dynamics play out in our NLP review crisis. My hunch: many conscientious volunteers might be junior women. That time comes at a cost; chasing slackers means less time for rebutting my own reviews.
08.07.2025 05:56
👍 34
🔁 6
💬 0
📌 0
Thanks for the invitation to the Freiburg Institute for Advanced Studies (FRIAS) to give this year's Hermann-Paul-Center Lecture lnkd.in/d_wUeDfY
I enjoyed the visit, the great audience, and the stay in this lovely city.
Thank you
#blackforest #freiburg #breisgau
04.07.2025 10:59
👍 10
🔁 2
💬 0
📌 0
preprint is out
bsky.app/profile/alex...
03.07.2025 09:10
👍 5
🔁 2
💬 0
📌 0
My MSc-thesis has been turned into a paper (whose framing you will probably not enjoy) that introduces a method which can be viewed as an unsupervised solution to a similar problem. Will share later to avoid biasing review process
03.07.2025 14:59
👍 0
🔁 0
💬 1
📌 0
Interesting! And indeed very relevant as it enables control over the similarity modeled by the embeddings. Figure 2 is really cool. Which base embeddings were used for this?
03.07.2025 14:57
👍 0
🔁 0
💬 1
📌 0
Haha can't wait. Let's continue the discussion at ACL!
02.07.2025 09:29
👍 1
🔁 0
💬 0
📌 0
Yeah, agreed and aware of your work :) though as established above, emb+clustering has its niche in large scale analysis with factors like multilinguality. There, LDA tends to have problems and TopicGPT is too expensive.
02.07.2025 08:32
👍 1
🔁 0
💬 2
📌 0
Awesome! And yes, I totally understand and agree with the scepticism towards that
02.07.2025 08:12
👍 1
🔁 0
💬 0
📌 0
Mixed language data is common on, e.g., Chinese Twitter which we found to be very diverse. Since topics are distributions over tokens and a single doc is usually just one language, the only way I see to make LDA work is by translating Tweets?
02.07.2025 06:58
👍 1
🔁 0
💬 1
📌 0
Yeah, that makes a lot of sense. I think of BERTopic as a convenient, quick way to try that on your own data, which is, I think, another reason why less techy people like to use it.
02.07.2025 06:53
👍 0
🔁 0
💬 1
📌 0
What scenarios are you 'typically' considering? Working with Twitter data of 1M+ samples, I couldn't get any of the LDA derivatives I've tried to produce good results. Non-English/mixed language data is also challenging. (Genuinely curious)
01.07.2025 12:54
👍 0
🔁 0
💬 1
📌 0