Ensure your cultural perspective is represented. cohere.link/FyKPWbQ
Does AI truly understand different cultures and languages?
We're surveying cultural awareness in real-world AI use.
✨ When cultural awareness matters in real-world AI use
💡 Whether AI reflects diverse norms, communication styles & knowledge
🫥 Where AI falls short in cultural understanding
1) what? Cohere is here?!!!!
2) this is crazy
Woo hoo, who would have thought Canada would produce efficient massively multicultural models
Very proud of our team's latest release: meet Tiny Aya, a massively multilingual model with 3.35B parameters.
Tech report here: github.com/Cohere-Labs/...
Tiny Aya is small enough to run on a phone and powerful enough to support 70+ languages. That unlocks offline translation, local education tools, community research, and real multilingual experimentation without cloud infrastructure. 📱
Tiny Aya shows what smaller models can do. It improves on previous Aya releases and outperforms models of similar size, proving that smart multilingual design can rival larger models. This shows that focused multilingual research beats brute-force scaling: achieving more with less.
Built for balance: most multilingual models skew toward high-resource languages, but Tiny Aya narrows that gap, sustaining stronger performance for underrepresented languages.
Despite being smaller, Tiny Aya competes with 4B models across translation, mathematical reasoning, understanding, and generation, with especially strong gains for African languages.
We take a stance for language diversity. Going beyond the one-size-fits-all paradigm, we release not only an instruction-finetuned model balancing all 70 languages (Tiny Aya Global) but also three region-focused models.
Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are.
Tiny Aya delivers strong multilingual performance across 70+ global languages in a 3.35B-parameter model, efficient enough to run locally, even on a phone.
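To see why 3.35B parameters is phone-friendly, here is a back-of-the-envelope weight-memory estimate. The bytes-per-parameter figures for common precisions are standard rough assumptions, not measurements of Tiny Aya itself:

```python
PARAMS = 3.35e9  # Tiny Aya's parameter count

def weight_memory_gb(bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint in GiB.

    Ignores activations and KV cache, so real usage is somewhat higher.
    """
    return PARAMS * bytes_per_param / 1024**3

# Common precisions and their approximate storage cost per parameter.
for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(bpp):.1f} GiB")
```

At 4-bit quantization the weights fit in under 2 GiB, which is why a model this size is plausible on recent phones.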
And Research Engineer @shivalika.bsky.social: The Leaderboard Illusion. 😶‍🌫️
This paper reveals systematic biases and transparency gaps in the Chatbot Arena leaderboard.
www.youtube.com/watch?v=URho...
Sr Research Scientist @juliakreutzer.bsky.social: Treasure Hunt paper. 🗺️
This work introduces a method to improve model performance by adding training-time markers to tokens of the pretraining data, enabling real-time targeting of the long tail at inference.
www.youtube.com/watch?v=K3BU...
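The marker idea can be sketched roughly like this: each pretraining example gets a metadata token prepended, so at inference you can steer the model toward a long-tail domain by conditioning on its marker. This is purely illustrative; the marker format and domain names are assumptions, not the paper's actual scheme:

```python
def add_marker(example: str, domain: str) -> str:
    """Prepend a training-time marker token identifying the example's domain."""
    return f"<|{domain}|> {example}"

# Hypothetical mixed-domain pretraining snippets with domain labels.
corpus = [
    ("The mitochondria is the powerhouse of the cell.", "science"),
    ("Habari yako?", "swahili"),
]
tagged = [add_marker(text, domain) for text, domain in corpus]
print(tagged[1])  # <|swahili|> Habari yako?
```

At inference, prompting with `<|swahili|>` would then bias generation toward that slice of the training distribution, without retraining a separate model per domain.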
Excited to have two of our papers featured in
@j-novikova-nlp.bsky.social's @wiair.bsky.social podcast, as part of the NeurIPS reflection. ✨
Learn more / subscribe here women-in-ai-research.github.io and check out this thread 🧵 for our features...
What an incredible week it's been at #NeurIPS2025!
Today is our last one at the booth. We've had a great week connecting with our community in San Diego.
Join our community to continue to connect with our research team: https://cohere.com/research/open-science/application
What's the story of your legend?
Join ML researchers building their legends with 40 cards that capture our shared journey. Explore and build yours: https://lab-legends.vercel.app/ 🎯
Just 1 day left until #NeurIPS2025 kicks off! The Cohere and Cohere Labs teams are ready to dive into a packed week of research, conversations, and community at the San Diego Convention Center ✨
Come visit our booth; we'd love to chat and send you home with some swag!
... @markusfreitag.bsky.social, Roman Grundkiewicz, @yupenghou.bsky.social, @phikoehn.bsky.social, @juliakreutzer.bsky.social, Saab Mansour, @sted19.bsky.social, Lorenzo Proietti, Parker Riley, Eduardo Sánchez, @patuchen.bsky.social, Mariya Shmatova, @zouharvi.bsky.social
You can find all details in our paper www2.statmt.org/wmt25/pdf/20... or discuss with us next week at the WMT Conference at #EMNLP2025.
Led by @kocmitom.bsky.social, Ekaterina Artemova, Eleftherios Avramidis, Eleftheria Briakou, @pinzhen.bsky.social, @mziizm.bsky.social...
⚖️ LLM-as-a-judge: mixed reliability.
Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones barely beat coin-flip territory at ~55%.
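Pairwise accuracy here means how often the judge model prefers the same output as human annotators did. A minimal sketch of that metric; the function name and toy preference data are illustrative, not from the shared task:

```python
def pairwise_accuracy(judge_prefs, human_prefs):
    """Fraction of comparison pairs where the judge picks the human-preferred winner."""
    matches = sum(j == h for j, h in zip(judge_prefs, human_prefs))
    return matches / len(human_prefs)

# 'A'/'B' = which of two system outputs was preferred for each prompt.
human        = ["A", "B", "A", "A", "B", "B", "A", "B", "A", "A"]
strong_judge = ["A", "B", "A", "A", "B", "B", "A", "B", "A", "B"]  # agrees 9/10
weak_judge   = ["A", "A", "B", "A", "B", "A", "B", "B", "B", "A"]  # agrees 5/10

print(pairwise_accuracy(strong_judge, human))  # 0.9
print(pairwise_accuracy(weak_judge, human))    # 0.5, i.e. coin-flip territory
```

A judge at ~0.5 on balanced pairs carries no signal, which is why the ~55% figure for small judges is effectively chance.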
Naturalness is still a significant challenge.
Across open-ended generation and cross-lingual summarization, the biggest weakness isn't coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.
🧠 English isn't always easiest.
Models like Gemini 2.5 Pro and Claude 4 sometimes did better in Korean, German, or Spanish than in English when solving reasoning tasks.
🧩 Linguistic reasoning remains the toughest nut. 🥥
Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.
Language coverage matters.
Models don't support all languages equally, and this skews rankings. Smaller open models especially struggle with broad coverage, affecting their aggregate ranking ⚠️
🧩 Linguistic reasoning on unseen languages
Open-ended generation testing naturalness and usefulness
Cross-lingual summarization
Machine translation
🧑‍⚖️ LLM-as-a-Judge evaluating outputs of other models
All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...
How well do LLMs handle multilinguality?
We brought the rigor of Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task spanning 30 languages and 5 subtasks.
River, Yinhong and I will all be in person and we look forward to the discussions!
Cohere Labs x EMNLP 2025: "When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs"
Congrats to authors Ammar Khairi, Daniel D'souza, Ye Shen, @juliakreutzer.bsky.social, @sarahooker.bsky.social
arxiv.org/abs/2506.20544
Cohere Labs x EMNLP 2025 "When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning"
Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.
arxiv.org/abs/2502.19158
Cohere Labs x EMNLP 2025: "The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It"
Congrats to authors @yongzx.bsky.social , Beyza Ermis, @mziizm.bsky.social, Stephen Bach, @juliakreutzer.bsky.social.
arxiv.org/abs/2505.24119