Kathy

@kathaem

Computational Linguistics / Multilingual Language Models. Into SciFi, choir, cats (incomplete list of interests). they/them

56 Followers
113 Following
22 Posts
Joined 16.11.2024

Latest posts by Kathy @kathaem

This is indeed delightful, thanks for posting! Their channel seems to have whole albums' worth of similar songs but not sure if any of the others have subtitles or dancing 💃

14.11.2025 17:59 👍 1 🔁 0 💬 1 📌 0

@aclanthology.org not sure where to report, but in the last few months I've often had issues with long loading times/timeouts on aclanthology.org. It's particularly bad today---maybe related to the upcoming ARR deadline?

03.10.2025 10:31 👍 0 🔁 0 💬 1 📌 0

Idk about "primarily" mate

17.08.2025 17:45 👍 1 🔁 0 💬 0 📌 0

You mean the most popular *US* politicians on this list

06.08.2025 06:58 👍 0 🔁 0 💬 0 📌 0

Personally, sleeping more and vitamin D in the winter.

...sorry, not much of a baker

27.07.2025 21:04 👍 0 🔁 0 💬 0 📌 0

@aclrollingreview.bsky.social Why is the reviewing window (still) so short this cycle? Wasn't the cycle extended to ten weeks specifically to make the process more manageable? Wasn't it three weeks in past cycles? Instead reviewers don't even get two full weeks to handle 4+ submissions.

06.06.2025 14:41 👍 0 🔁 0 💬 0 📌 0
TokShop 2025 Registering interest in all things tokenization at TokShop @ ICML 2025 (July 18) Consider joining the Google group for future updates! https://groups.google.com/g/tokshop

TokShop @ #ICML2025 got way more submissions than expected! 📈 We could really use a few more reviewers to help out. If you have the capacity to review a #tokenization paper by Saturday, please fill out this form: forms.gle/32A6sQHQrMSb... 🙏

02.06.2025 16:40 👍 0 🔁 4 💬 0 📌 2

Beyond text: Modern AI tokenizes images too! Vision models split photos into patches, treating each 16x16 pixel square as a "token." 🖼️➡️🔤 #VisualTokenization

Interested in tokenization? Join our workshop tokenization-workshop.github.io
The submission deadline is already May 30!

26.05.2025 19:55 👍 4 🔁 2 💬 0 📌 0
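
For anyone curious what "each 16x16 pixel square as a token" might look like in practice, here is a minimal sketch in plain Python. The function name `split_into_patches` and the toy 32x32 grayscale image are made up for illustration and are not from the post or the workshop. Real vision models typically go on to project each flattened patch into an embedding, which is left out here.

```python
def split_into_patches(image, patch_size=16):
    """Split a 2D grid of pixel values into flattened square patches.

    `image` is a list of rows (hypothetical toy input, not a real image API).
    Patches are returned in row-major order: left to right, then top to bottom.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect the rows of one square and flatten them into one "token".
            patch = [
                pixel
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
            ]
            patches.append(patch)
    return patches


# Toy 32x32 "image" whose pixel value encodes its own position.
image = [[r * 32 + c for c in range(32)] for r in range(32)]
patches = split_into_patches(image)
print(len(patches), "patches of", len(patches[0]), "values each")
```

A 32x32 image gives four such patches, so the "sentence" the model sees is only four tokens long; larger photos scale accordingly.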

I'll be presenting this paper in Gather Town (Session 1) in a few hours 🎊 Come along!

06.05.2025 13:37 👍 0 🔁 0 💬 0 📌 0
When ChatGPT Broke an Entire Field: An Oral History | Quanta Magazine Researchers in “natural language processing” tried to tame human language. Then came the transformer.

This is a fantastic oral history of the last 10 years of NLP and AI. www.quantamagazine.org/when-chatgpt...

01.05.2025 11:55 👍 94 🔁 29 💬 2 📌 4

As a second language English speaker this also confused me for so long. Eventually I decided it must be from the phrase "having cake" which also means eating the cake

06.04.2025 09:45 👍 0 🔁 0 💬 0 📌 0
Me posing with my poster

The tour guide standing next to a statue of Professor Lichtenberg.

A slide of the vocabulary learning algorithm "SaGe"

Just spent two days in Göttingen at #HumanCLAIM workshop! Re-presented my poster on surveying methods for cross-lingual representation alignment, got a city tour, heard cool talks and had interesting conversations 💬💭

27.03.2025 15:04 👍 5 🔁 0 💬 0 📌 0

Oh very nice to see a paper for this intuition, and the data could be very useful! Adding to the reading list 👀

22.03.2025 09:10 👍 0 🔁 0 💬 0 📌 0
Figure 1: Eflomal score (bottom), a measure of token alignability, predicts downstream transfer performance better than the previous metric of distributional token overlap (top). The difference is especially stark for language pairs with different scripts (•), compared to language pairs with the same script (×). The orange line shows the linear fit across all included pairs.

Alignability is more predictive of cross-lingual transfer than divergence of literal token distributions, particularly for language pairs with disparate scripts.

03.03.2025 17:04 👍 3 🔁 0 💬 0 📌 0

Basically we argue that token overlap measures for predicting multilingual performance are too literal, and introduce the notion of **token alignability**, which can be measured via the scores of a statistical aligner over a corpus tokenised with a given tokeniser.

03.03.2025 17:04 👍 3 🔁 0 💬 1 📌 0
Beyond Literal Token Overlap: Token Alignability for Multilinguality Previous work has considered token overlap, or even similarity of token distributions, as predictors for multilinguality and cross-lingual knowledge transfer in language models. However, these very li...

Happy to say that our paper "Beyond Literal Token Overlap: Token Alignability for Multilinguality" will be presented at #NAACL2025!

This is work with @tomlim.bsky.social, @jlibovicky.bsky.social, and Alex Fraser.

arxiv.org/abs/2502.06468

#newpaper #NLP #NLProc

03.03.2025 17:04 👍 11 🔁 3 💬 1 📌 2
Following the MT Marathon, we're hosting a hackathon in Prague. Researchers and students from five institutions (+1 online) are working together to assess how robust #LLMs are to grammar errors in machine translation and related tasks. Thanks to EAMT for their support.

27.02.2025 16:07 👍 18 🔁 2 💬 0 📌 0

@queerinai.com Hi, I was invited to review for the workshop the other day but the email is not clear on when reviews will be due. This info will be important to decide if I'm able to serve; can you share the deadlines? Thanks!

19.02.2025 12:15 👍 0 🔁 0 💬 0 📌 0

📌

01.01.2025 22:45 👍 0 🔁 0 💬 0 📌 0

Gotta say I'm not sure what pronunciation "luh-BOEV" is referring to but in my head it sounds like French beef

26.12.2024 08:10 👍 9 🔁 0 💬 1 📌 0

Germany. a) ground floor b) first floor. This matches how we count in German but the German terms basically treat the "upper floors" separately from the "ground floor"

18.12.2024 09:17 👍 0 🔁 0 💬 0 📌 0

Bill Labov died this morning. I'm not coherent enough to talk about how important and influential and brilliant he was. I am very sad.

I was so lucky to know him, and I am grateful every day that he (and Gillian, and Walt, etc) built an academic field where kindness is expected.

18.12.2024 02:08 👍 699 🔁 120 💬 24 📌 25

To add to the reviewing complaints 😅 Why do authors so often respond with an absolute wall of text? (Biggest response I got this time was four comments long.) As a reviewer, I find this very tough to engage with in the short discussion period, and as an author, I try to be concise in my responses.

25.11.2024 10:35 👍 0 🔁 0 💬 0 📌 0

5k is a small town, honestly 😂

20.11.2024 12:50 👍 2 🔁 0 💬 0 📌 0

Just wanted to say a quick thank you for organising a lovely social! 🎊🌈

18.11.2024 14:49 👍 0 🔁 0 💬 1 📌 0

Right now the app is being very laggy though?

16.11.2024 23:26 👍 2 🔁 0 💬 0 📌 0

Today I finally deactivated my Twitter account (not that I'd been super active there but hey) and decided to check out Bluesky. Looks like there's already a LOT of people here!

16.11.2024 23:19 👍 5 🔁 0 💬 1 📌 0