Cohere Labs

@cohereforai

@Cohere.com's non-profit research lab and open science initiative that seeks to solve complex machine learning problems. Join us in exploring the unknown, together. https://cohere.com/research

679
Followers
12
Following
215
Posts
10.12.2024
Joined

Latest posts by Cohere Labs @cohereforai

Cultural Awareness User Perception Survey Hello! Welcome to the Cultural Awareness Survey! This survey is authored by a team of researchers at Cohere Labs, who investigate cultural understanding in LLMs. Below are the instructions for comple...

Ensure your cultural perspective is represented. cohere.link/FyKPWbQ

03.03.2026 17:12 👍 0 🔁 0 💬 0 📌 0

Does AI truly understand different cultures and languages?

We’re surveying cultural awareness in real-world AI use.
✨ When cultural awareness matters in real-world AI use
💡 Whether AI reflects diverse norms, communication styles & knowledge
🫥 Where AI falls short in cultural understanding

03.03.2026 17:12 👍 3 🔁 0 💬 2 📌 1

1) what? Cohere is here?!!!!
2) this is crazy

18.02.2026 12:59 👍 80 🔁 8 💬 2 📌 2

Woo hoo, who would have thought Canada would produce efficient massively multicultural models

18.02.2026 14:53 👍 40 🔁 2 💬 1 📌 0

🌱Very proud of our team's latest release 😊 meet Tiny Aya, a massively multilingual model with 3.35B parameters.

Tech report here: github.com/Cohere-Labs/...

18.02.2026 02:16 👍 33 🔁 7 💬 1 📌 0

Tiny Aya is small enough to run on a phone and powerful enough to support 70+ languages. That unlocks offline translation, local education tools, community research, and real multilingual experimentation without cloud infrastructure. 📱

17.02.2026 15:15 👍 16 🔁 0 💬 1 📌 1

Tiny Aya shows what smaller models can do. It improves on previous Aya releases and outperforms models of similar size, proving that smart multilingual design can rival larger models. Focused multilingual research beats brute-force scaling, achieving more with less.

17.02.2026 15:15 👍 9 🔁 0 💬 1 📌 0

Built for balance: most multilingual models skew toward high-resource languages. Tiny Aya narrows that gap, sustaining stronger performance for underrepresented languages. 📈

17.02.2026 15:15 👍 13 🔁 0 💬 1 📌 2

Despite being smaller, Tiny Aya competes with 4B models across translation, mathematical reasoning, understanding, and generation, with especially strong gains for African languages. 🌍

17.02.2026 15:15 👍 16 🔁 0 💬 1 📌 1

We take a stance for language diversity. Going beyond the one-size-fits-all paradigm, we release not just one instruction-finetuned model balancing all 70 languages (Tiny Aya Global) but also three region-focused models 🌐

17.02.2026 15:15 👍 15 🔁 0 💬 1 📌 0

Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are.

Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.

17.02.2026 15:15 👍 97 🔁 15 💬 2 📌 5
NeurIPS 2025 in San Diego. The Leaderboard Illusion: How LLM Rankings Are Gamed YouTube video by Women in AI Research WiAIR

And Research Engineer @shivalika.bsky.social: The Leaderboard Illusion. 😶‍🌫️

This paper reveals systematic biases and transparency gaps in the Chatbot Arena leaderboard.

www.youtube.com/watch?v=URho...

29.12.2025 15:59 👍 0 🔁 0 💬 0 📌 0
NeurIPS 2025 in San Diego. Treasure Hunt YouTube video by Women in AI Research WiAIR

Sr. Research Scientist @juliakreutzer.bsky.social: Treasure Hunt paper. 🗺️

This work introduces a method to improve model performance by adding markers to tokens of the pretraining data, enabling real-time targeting of the long tail using training-time markers.

www.youtube.com/watch?v=K3BU...

29.12.2025 15:59 👍 0 🔁 0 💬 1 📌 0
Women in AI Research Podcast Celebrating the remarkable contributions of female AI researchers from around the globe

Excited to have two of our papers featured in @j-novikova-nlp.bsky.social's @wiair.bsky.social podcast, as part of the NeurIPS reflection. ✨

Learn more and subscribe at women-in-ai-research.github.io, and check out this thread 🧵 for our features...

29.12.2025 15:59 👍 1 🔁 1 💬 1 📌 0

What an incredible week it’s been at #NeurIPS2025! 🎉

Today is our last day at the booth. We've had a great week connecting with our community in San Diego.

Join our community to continue to connect with our research team: https://cohere.com/research/open-science/application

05.12.2025 19:00 👍 2 🔁 1 💬 0 📌 0

What's the story of your legend?

Join ML researchers building their legends with 40 cards that capture our shared journey. Explore and build yours: https://lab-legends.vercel.app/ 🎯

03.12.2025 15:30 👍 0 🔁 0 💬 0 📌 0

Just 1 day left until #NeurIPS2025 kicks off! The Cohere and Cohere Labs teams are ready to dive into a packed week of research, conversations, and community at the San Diego Convention Center ✨

Come visit our booth. We’d love to chat and send you home with some swag!

01.12.2025 11:00 👍 2 🔁 1 💬 0 📌 0

... @markusfreitag.bsky.social, Roman Grundkiewicz, @yupenghou.bsky.social, @phikoehn.bsky.social, @juliakreutzer.bsky.social, Saab Mansour, @sted19.bsky.social, Lorenzo Proietti, Parker Riley, Eduardo Sánchez, @patuchen.bsky.social, Mariya Shmatova, @zouharvi.bsky.social

30.10.2025 17:51 👍 3 🔁 0 💬 0 📌 0

You can find all details in our paper www2.statmt.org/wmt25/pdf/20... or discuss with us next week at the WMT Conference at #EMNLP2025.

Led by @kocmitom.bsky.social, Ekaterina Artemova, Eleftherios Avramidis, Eleftheria Briakou, @pinzhen.bsky.social, @mziizm.bsky.social...

30.10.2025 17:51 👍 2 🔁 0 💬 1 📌 0

⚖️ LLM-as-a-judge: mixed reliability.

Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones barely beat coin-flip territory at ~55%.

30.10.2025 17:51 👍 1 🔁 0 💬 1 📌 0

🤖 Naturalness is still a significant challenge.

Across open-ended generation and cross-lingual summarization, the biggest weakness isn’t coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.

30.10.2025 17:51 👍 1 🔁 0 💬 1 📌 0

🧠English isn’t always easiest.

Models like Gemini 2.5 Pro and Claude 4 sometimes did better in Korean, German, or Spanish than in English when solving reasoning tasks.

30.10.2025 17:51 👍 1 🔁 0 💬 1 📌 0

🧩Linguistic reasoning remains the toughest nut. πŸ₯₯

Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.

30.10.2025 17:51 👍 1 🔁 0 💬 1 📌 0

🌐 Language coverage matters.

Models don’t support all languages equally, and this skews rankings. Smaller open models especially struggle with broad coverage, affecting their aggregate ranking ⚠️

30.10.2025 17:51 👍 1 🔁 0 💬 1 📌 0

🧩 Linguistic reasoning on unseen languages
πŸ“ Open-ended generation testing naturalness and usefulness
πŸ“˜ Cross-lingual summarization
πŸ” Machine translation
πŸ§‘β€βš–οΈ LLM-as-a-Judge evaluating outputs of other models

All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...

30.10.2025 17:51 👍 1 🔁 0 💬 1 📌 0

How well do LLMs handle multilinguality? 🌍🤖

🔬 We brought the rigor from Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task spanning 30 languages and 5 subtasks.

30.10.2025 17:51 👍 3 🔁 2 💬 1 📌 0

River, Yinhong and I will all be in person and we look forward to the discussions!

29.10.2025 21:12 👍 3 🔁 1 💬 0 📌 0

Cohere Labs x EMNLP 2025: "When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs"

Congrats to authors Ammar Khairi, Daniel D'souza, Ye Shen, @juliakreutzer.bsky.social, @sarahooker.bsky.social

📜 arxiv.org/abs/2506.20544

29.10.2025 18:30 👍 0 🔁 0 💬 0 📌 0

Cohere Labs x EMNLP 2025: "When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning"

Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.

📜 arxiv.org/abs/2502.19158

29.10.2025 18:30 👍 1 🔁 0 💬 1 📌 1

Cohere Labs x EMNLP 2025: "The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It"

Congrats to authors @yongzx.bsky.social, Beyza Ermis, @mziizm.bsky.social, Stephen Bach, @juliakreutzer.bsky.social.

📜 arxiv.org/abs/2505.24119

29.10.2025 18:30 👍 1 🔁 0 💬 1 📌 0