
@changemily

Joined 21.11.2025

Latest posts by @changemily

This work was done with my amazing collaborator Niyati Bafna, @niyatibafna.bsky.social, @tticconnect.bsky.social

24.11.2025 23:51 👍 0 🔁 0 💬 0 📌 0
chikhapo ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

We hope you use ChiKhaPo to evaluate your own models! We have released our benchmark code and data as a
🐍 Python package (pypi.org/project/chik...) and a
🤗 Hugging Face dataset (huggingface.co/datasets/ec5...)

24.11.2025 23:46 👍 0 🔁 0 💬 1 📌 0

There is a strong linear correlation between machine translation (MT) performance and word translation performance. MT datasets are expensive to come by; in their absence, ChiKhaPo can provide a cheap proxy for MT performance.
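
As a rough illustration of how such a proxy could be used, here is a minimal sketch that measures the Pearson correlation between per-language word translation scores and MT scores, then fits a line to estimate MT performance for a language with no MT data. The function names and all numbers are placeholders made up for this example; they are not results or code from the benchmark.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    slope = cov / vx
    return slope, my - slope * mx

# Placeholder per-language scores, invented for illustration only.
word_translation = [10.0, 20.0, 30.0, 40.0]
mt_score = [6.0, 11.0, 16.0, 21.0]

r = pearson_r(word_translation, mt_score)
slope, intercept = fit_line(word_translation, mt_score)

# Estimate MT performance for a language that only has a word translation score.
estimated_mt = slope * 50.0 + intercept
```

A fit like this is only as trustworthy as the correlation on the languages that do have MT data, so it is worth checking `r` before relying on the estimate.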

24.11.2025 23:45 👍 0 🔁 0 💬 1 📌 0

Here’s a plot of language resource level against the model’s task performance. The trend is logarithmic: the long tail of low-resource languages does very badly, and performance improves quickly for mid-resource languages.
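
A logarithmic trend like this can be checked by regressing the score on the log of the resource level. The sketch below assumes resource level is measured as some count of available text per language, which is our assumption for illustration; the function name and numbers are placeholders, not benchmark data.

```python
from math import log10

def fit_log_trend(resource_sizes, scores):
    """Least-squares fit of score = a + b * log10(resource_size); returns (a, b)."""
    xs = [log10(s) for s in resource_sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
    vx = sum((x - mx) ** 2 for x in xs)
    b = cov / vx
    a = my - b * mx
    return a, b

# Placeholder values, invented for illustration only.
resource_sizes = [1e3, 1e4, 1e5, 1e6]
scores = [5.0, 10.0, 15.0, 20.0]

a, b = fit_log_trend(resource_sizes, scores)
# Under this fit, b is the gain in score for every 10x increase in resources.
```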

24.11.2025 23:45 👍 0 🔁 0 💬 1 📌 0

When we group SOTA model results by language family, the performance gap between Indo-European languages and underrepresented Austronesian and Atlantic-Congo languages becomes evident.

24.11.2025 23:44 👍 0 🔁 0 💬 1 📌 0

Results on 6 SOTA models show that there remains significant room for improvement across all 8 subtasks: ChiKhaPo is a challenging measure of multilingual performance at the lexical level.

24.11.2025 23:44 👍 0 🔁 0 💬 1 📌 0

ChiKhaPo draws from numerous publicly available resources and can be easily extended to even more languages as these resources expand:

📗 translation lexicons (PANLEX, IDS, GATITOS),
📃 monolingual text (GLOTLID), and
📖 bitext (FLORES+)

24.11.2025 23:43 👍 0 🔁 0 💬 1 📌 0

Models in ChiKhaPo are evaluated on their ability to translate words to English (comprehension X→model) and from English (generation model→X), in 4 settings and 2 directions. We illustrate all 8 subtasks below.

24.11.2025 23:42 👍 0 🔁 0 💬 1 📌 0

Frustrated with how most of the world’s low-resource languages have NO evaluation resources?

📢 Check out ChiKhaPo, a massively multilingual lexical comprehension and generation benchmark covering 2700+ languages.
www.arxiv.org/abs/2510.16928

24.11.2025 23:41 👍 1 🔁 2 💬 1 📌 0