125+ members. Safety, inference, training, storage, automotive benchmarks & more. MLCommons exists because neutral AI measurement standards require a community. Get involved: mlcommons.org/get-involved/ #MLCommons
@mlcommons.org
MLCommons is an AI engineering consortium, built on a philosophy of open collaboration to improve AI systems. Through our collective engineering efforts, we continually measure and improve AI technologies' accuracy, safety, speed, and efficiency.
Results day is coming.
MLPerf Inference v6.0 drops April 1 – cross-platform AI inference data spanning datacenter, edge & more. Follow so you don't miss it.
#MLPerf #AIInference
AI benchmarking is moving to the API layer. MLCommons Co-Founder David Kanter presents MLPerf Endpoints at NVIDIA GTC on March 19, 12–12:40 PM PT. Worth attending if you're making decisions about AI infrastructure.
https://bit.ly/4l5p8Uh
#GTC2026 #MLCommons
AI risk assessment now has a global standard. The MLCommons AILuminate Global Assurance Program gives organizations a structured, independent path to evaluate AI reliability. https://mlcommons.org/2026/02/ailuminate-global-assurance/
#AIGovernance #AILuminate
How do you make jailbreak benchmarks defensible to auditors? MLCommons just published a mechanism-first taxonomy for reproducible, governance-aligned LLM robustness evaluation. Not a leaderboard – the foundation for trustworthy ones. https://bit.ly/3ZCHqlZ
#AILuminate
Last week, Rebecca Weiss announced the AILuminate Global Assurance Program from the stage of a global AI standards panel in New Delhi. Read about the program and watch the panel below.
https://bit.ly/4kIS18x
▶️ https://bit.ly/46nDtp3
#AIImpactSummit #MLCommons #AILuminate
2026 Rising Star applications are open!
The ML and Systems Rising Stars program is open to all graduate students and postdoctoral associates with research backgrounds/interests in machine learning and systems.
Apply now!
https://bit.ly/4s4j4h6
Application deadline: Wednesday, March 18, 2026
Announcing the AILuminate Global Assurance Program.
MLCommons + Google, Microsoft, Qualcomm & KPMG are building a structured framework for AI risk measurement:
🧪 Benchmarking as a Service
🏷️ AILuminate Risk Labels
🌐 A Global Framework
https://bit.ly/4kIS18x
Microsoft is co-leading the expansion of MLCommons' AILuminate benchmark to Hindi, Tamil, Malay, Japanese & Korean – AI safety evals built from the ground up in local linguistic and cultural contexts.
blogs.microsoft.com/on-the-issues/2026/02/17/acting-with-urgency-to-address-the-growing-ai-divide/
Friday at #AIImpactSummitIndia – who controls AI matters as much as who can access it.
#MLCommons' Peter Mattson joins The Alan Turing Institute, Microsoft, Gates Foundation, GoI & more for "Trustworthy AI for All."
🎥 10:30–11:30 AM IST | Bharat Mandapam
βΆοΈ https://bit.ly/40jS0OQ
#TrustworthyAI
Attending #AIImpactSummitIndia? Come find us! 👋
Feb 20:
10:30 AM IST – Peter Mattson: Trustworthy AI for All
▶️ https://bit.ly/40jS0OQ
12:30 PM IST – Rebecca Weiss: Standards as Strategy
▶️ https://bit.ly/4kHum8h
We'd love to connect! DM us or reply below!
Feb 20 at #AIImpactSummit: MLCommons' Rebecca Weiss joins leaders from OpenAI, Google DeepMind, Microsoft & more for "Standards as Strategy: Accelerating AI Market Growth."
🎧 12:30–1:30 PM IST | Sushma Swaraj Bhawan
▶️ https://bit.ly/4kHum8h
Shalaleh Rismani: Safety Is a Systems Problem: Understanding and Evaluating AI in Context
Her core argument: We evaluate AI models in lab conditions, then wonder why things go wrong in deployment. But safety isn't about the model in isolation – it's about the model in context.
https://bit.ly/4kHtKzv
Most jailbreak testing is ad hoc: informal attack sets, inconsistent classification, and results that are hard to defend to auditors.
New from MLCommons: a mechanism-first taxonomy for single-turn prompt attacks that enables deterministic, reproducible, and governance-ready evaluation.
https://bit.ly/3ZCHqlZ
⏰ 2 DAYS LEFT
MLPerf Inference v6.0 submissions close Friday, 2/13.
New:
- Qwen3-VL + Shopify (VLM, e-commerce scale)
- DLRMv3 (generative recommendations, 1TB)
- And much more!
Hardware vendors, cloud providers: submit your results → https://bit.ly/40bMEoI
AI agents need datasets that describe themselves. Croissant 1.1 from MLCommons adds machine-actionable provenance, semantic vocabularies, and embedded governance, so autonomous systems can discover and use data while respecting permissions.
700K+ datasets and counting
mlcommons.org/2026/02/croi...
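For a sense of what self-describing dataset metadata looks like, here is a minimal, hypothetical sketch of Croissant-style JSON-LD built on schema.org/Dataset. The property set shown is illustrative only – it is not the exact Croissant 1.1 vocabulary, which adds dedicated fields for provenance, record sets, and governance.

```python
import json

# Illustrative sketch: Croissant-style dataset metadata is JSON-LD
# layered on schema.org/Dataset. Field names here are simplified and
# do not reproduce the real Croissant 1.1 context.
metadata = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "name": "example-dataset",
    "description": "A toy dataset description for illustration.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "ExampleOrg"},
}

# Because it is plain JSON-LD, tools (and agents) can serialize,
# exchange, and re-parse it without any custom format support.
serialized = json.dumps(metadata, indent=2)
restored = json.loads(serialized)
```

The point of the format is exactly this machine-actionability: a crawler or agent can read `license` and `creator` before ever touching the data itself.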
AI is probabilistic by design: same input, different output every time. That's what makes it powerful AND hard to trust.
New from MLCommons: Why technical standards are the bridge between ISO objectives and real-world AI reliability. https://mlcommons.org/2026/02/ai-standards-bridge-adoption/
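The "same input, different output" point can be shown with a toy token sampler (a hypothetical sketch, not MLCommons code): temperature-based sampling draws different tokens from the same logits across runs, while greedy decoding is repeatable.

```python
import math
import random

def sample(logits, temperature, rng):
    """Sample a token index from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.5]  # toy "model output" for one step

# Greedy decoding: the same input always yields the same token.
greedy = {sample(logits, 0, random.Random(i)) for i in range(100)}

# Temperature sampling: the same input yields different tokens run to run.
sampled = {sample(logits, 1.0, random.Random(i)) for i in range(100)}
```

Here `greedy` collapses to a single token while `sampled` spreads over several – which is why reliability claims about sampled systems need statistical measurement rather than single-shot tests.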
#Endpoints2025 - Efficiency Panel
Deep dive on efficiency across the AI stack:
✓ Token production
✓ Model optimization
✓ Deployment efficiency
Watch: https://bit.ly/46lhTBm
Michael Stewart (M12), Zain Asgar (Gimlet Labs), S. Annapureddy (Netradyne), Naveen Rao (UnconventionalAI), Jay Ram (Hud)
DLRMv3 in MLPerf Inference v6.0: HSTU-based generative recommendation benchmark
- 1TB model (1B embedding hash, 512-dim)
- 260 GFLOP/candidate, 5-layer encoder
- Streaming dataset: 5M users, 100 timestamps
- P99 ≤ 80 ms latency constraint
Deadline: Friday 2/13
https://bit.ly/3NYxKQb
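To make the latency constraint concrete, here is a hypothetical sketch of checking a P99 ≤ 80 ms SLO over measured per-query latencies, using a simple nearest-rank percentile. The latency values are invented for illustration; they are not benchmark data.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty sample list."""
    s = sorted(samples)
    k = min(len(s) - 1, math.ceil(q / 100 * len(s)) - 1)
    return s[k]

# Invented per-query latencies in milliseconds: uniform over 40..89 ms.
latencies = [40 + (i % 50) for i in range(1000)]

p99 = percentile(latencies, 99)        # 99% of queries finish within this
meets_slo = p99 <= 80                  # the constraint: P99 <= 80 ms
```

With this invented distribution the P99 lands at 89 ms, so the SLO fails even though the mean (~64.5 ms) looks comfortable – exactly why tail-latency constraints are specified as percentiles rather than averages.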
Dr. Lora Aroyo (@GoogleDeepMind) on building benchmarks that actually work across languages, cultures, and modalities.
Most benchmarks are too narrow. This is how we fix that.
Watch: https://bit.ly/4bAomff
#Endpoints2025 #MultilingualAI
MLCommons is advancing inclusive AI standards across APAC.
Feb 10: Lora Aroyo (Google) joins Korea AISI, IMDA, SNU & CeRAI to discuss benchmark inclusivity, AI governance standards, and balancing rapid adoption with safety.
📅 Feb 10, 11:15 AM SGT
📍 Google Singapore
Real Shopify data in MLPerf v6.0.
Typos, multilingual text, embedded HTML – all the complexity production AI faces at scale.
No synthetic data. Submit by Feb 13: https://bit.ly/4k9F5YS
The AI Economy: Flywheels for Agentic Value Creation
Luis Oala on the economics that matter:
✓ Data flywheels
✓ Token efficiency
✓ Compounding advantages
✓ Sustainable growth
Watch: https://youtu.be/a8psNW72l3Q
#Endpoints2025 #AIEconomy
4/4
Multimodal AI → $10.89B by 2030
📈 Retail CAGR: 34.6%
⚡ Production VLM benchmark for 2026
Hardware vendors, cloud providers: prove your stack.
⏰ Feb 13 deadline → https://mlcommons.org/2026/02/vlm-inference-shopify/
3/4
"...We're contributing these foundations to the ecosystem so the next generation of AI is ready to power commerce at global scale."
β Kshetrajna Raghavan, Principal Engineer ML
2/4
"Open infrastructure is essential to the future of agentic commerce and Shopify is building the systems to power it, from our open-source Standard Product Taxonomy to the Catalog benchmark....
β Kshetrajna Raghavan, Principal Engineer ML
1/4 🧵
First Qwen model in MLPerf.
40M products daily.
Real production data from Shopify's e-commerce infrastructure.
Submit by Feb 13, 2026.
#MLPerf #Shopify #VLM #MLCommons
🎥 The Reliability Basket: What is Risk Coverage?
Panel on measuring what can go wrong with AI systems.
Beyond accuracy – comprehensive risk assessment.
Watch: https://youtu.be/UgETTwnz6kY
#Endpoints2025 #AIReliability
Processing 40 million products daily with 78.24% accuracy on noisy, multilingual catalog data.
Not a lab benchmark – Shopify's actual production reality.
Submit your VLM stack by Feb 13 →
https://mlcommons.org/2026/02/vlm-inference-shopify
#AIBenchmark
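As a back-of-envelope check on that scale (simple arithmetic, not part of the benchmark): 40 million products per day implies a sustained rate of roughly 463 products per second, around the clock.

```python
# Sustained throughput implied by 40 million products per day.
products_per_day = 40_000_000
seconds_per_day = 24 * 60 * 60          # 86,400 seconds
required_qps = products_per_day / seconds_per_day

print(round(required_qps, 1))  # -> 463.0 products per second
```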