MLCommons

@mlcommons.org

MLCommons is an AI engineering consortium, built on a philosophy of open collaboration to improve AI systems. Through our collective engineering efforts, we continually measure and improve AI technologies' accuracy, safety, speed, and efficiency.

178 Followers, 54 Following, 132 Posts. Joined 21.11.2024.

Latest posts by MLCommons @mlcommons.org

Join our AI Community | MLCommons Join the MLCommons community. As the leading AI benchmarking organization in the world, our membership spans 125+ global organizations.

125+ members. Safety, inference, training, storage, automotive benchmarks & more. MLCommons exists because neutral AI measurement standards require a community. Get involved: mlcommons.org/get-involved/ #MLCommons

06.03.2026 16:14 👍 0 🔁 0 💬 0 📌 0

Results day is coming.
MLPerf Inference v6.0 drops April 1: cross-platform AI inference data spanning datacenter, edge & more. Follow so you don't miss it.
#MLPerf #AIInference

05.03.2026 15:30 👍 0 🔁 0 💬 0 📌 0
Standardize Gen AI Service Evaluation: An API-Centric Benchmarking Approach Evaluating Gen AI services across heterogeneous environments presents significant visibility gaps for developers, as traditional hardware-centric b...

AI benchmarking is moving to the API layer. MLCommons Co-Founder David Kanter presents MLPerf Endpoints at NVIDIA GTC on March 19, 12–12:40 PM PT. Worth attending if you're making decisions about AI infrastructure.
https://bit.ly/4l5p8Uh
#GTC2026 #MLCommons

04.03.2026 15:21 👍 0 🔁 0 💬 0 📌 0

AI risk assessment now has a global standard. The MLCommons AILuminate Global Assurance Program gives organizations a structured, independent path to evaluate AI reliability. 🔗 https://mlcommons.org/2026/02/ailuminate-global-assurance/

#AIGovernance #AILuminate

03.03.2026 15:08 👍 0 🔁 0 💬 0 📌 0
MLCommons Lays the Foundation for Defensible Jailbreak Benchmarking - MLCommons MLCommons introduces a mechanism-first taxonomy for single-turn jailbreak attacks, providing the structural foundation for defensible, reproducible AI security evaluation

How do you make jailbreak benchmarks defensible to auditors? MLCommons just published a mechanism-first taxonomy for reproducible, governance-aligned LLM robustness evaluation. Not a leaderboard, but the foundation for trustworthy ones. 🔗 https://bit.ly/3ZCHqlZ
#AILuminate

02.03.2026 21:56 👍 0 🔁 0 💬 0 📌 0

Last week, Rebecca Weiss announced the AILuminate Global Assurance Program from the stage of a global AI standards panel in New Delhi. Read about the program and watch the panel below.
📖 https://bit.ly/4kIS18x
▶️ https://bit.ly/46nDtp3
#AIImpactSummit #MLCommons #AILuminate

23.02.2026 18:05 👍 0 🔁 0 💬 0 📌 0

2026 Rising Star applications are open!
The ML and Systems Rising Stars program is open to all graduate students and postdoctoral associates with research backgrounds/interests in machine learning and systems.
Apply now!
https://bit.ly/4s4j4h6
Application deadline: Wednesday, March 18, 2026

23.02.2026 14:40 👍 0 🔁 0 💬 0 📌 0

Announcing the AILuminate Global Assurance Program.
MLCommons + Google, Microsoft, Qualcomm & KPMG are building a structured framework for AI risk measurement:
🔧 Benchmarking as a Service
🏷️ AILuminate Risk Labels
🌍 A Global Framework
https://bit.ly/4kIS18x

20.02.2026 02:36 👍 0 🔁 0 💬 0 📌 1
Acting with urgency to address the growing AI divide Microsoft announces at the India AI Impact Summit it is on pace to invest $50 billion by the end of the decade to help bring AI to countries across the Global South

Microsoft is co-leading the expansion of MLCommons' AILuminate benchmark to Hindi, Tamil, Malay, Japanese & Korean: AI safety evals built from the ground up in local linguistic and cultural contexts.
blogs.microsoft.com/on-the-issues/2026/02/17/acting-with-urgency-to-address-the-growing-ai-divide/

19.02.2026 20:43 👍 0 🔁 0 💬 0 📌 0

Friday at #AIImpactSummitIndia: who controls AI matters as much as who can access it.
#MLCommons' Peter Mattson joins The Alan Turing Institute, Microsoft, Gates Foundation, GoI & more for "Trustworthy AI for All."
🕥 10:30–11:30 AM IST | Bharat Mandapam
▶️ https://bit.ly/40jS0OQ
#TrustworthyAI

19.02.2026 04:26 👍 1 🔁 1 💬 0 📌 0

Attending #AIImpactSummitIndia? Come find us! 👋
Feb 20:
10:30 AM IST – Peter Mattson: Trustworthy AI for All
▶️ https://bit.ly/40jS0OQ
12:30 PM IST – Rebecca Weiss: Standards as Strategy
▶️ https://bit.ly/4kHum8h
We'd love to connect! DM us or reply below!

19.02.2026 03:13 👍 1 🔁 1 💬 0 📌 0

Feb 20 at #AIImpactSummit: MLCommons' Rebecca Weiss joins leaders from OpenAI, Google DeepMind, Microsoft & more for "Standards as Strategy: Accelerating AI Market Growth."
🕧 12:30–1:30 PM IST | Sushma Swaraj Bhawan
▶️ https://bit.ly/4kHum8h

18.02.2026 16:15 👍 0 🔁 0 💬 0 📌 0
Endpoints 2025 - Shalaleh Rismani: Safety Is a Systems Problem: Understanding & Evaluating AI YouTube video by MLCommons

Shalaleh Rismani: Safety Is a Systems Problem: Understanding and Evaluating AI in Context
Her core argument: We evaluate AI models in lab conditions, then wonder why things go wrong in deployment. But safety isn't about the model in isolation - it's about the model in context.
https://bit.ly/4kHtKzv

18.02.2026 15:24 👍 0 🔁 0 💬 0 📌 0

Most jailbreak testing is ad hoc: informal attack sets, inconsistent classification, and results that are hard to defend to auditors.
New from MLCommons: a mechanism-first taxonomy for single-turn prompt attacks that enables deterministic, reproducible, and governance-ready evaluation.
https://bit.ly/3ZCHqlZ

16.02.2026 22:04 👍 3 🔁 2 💬 1 📌 0

⏰ 2 DAYS LEFT
MLPerf Inference v6.0 submissions close Friday, 2/13.
New:
- Qwen3-VL + Shopify (VLM, e-commerce scale)
- DLRMv3 (generative recommendations, 1TB)
- And much more!

Hardware vendors, cloud providers: submit your results → https://bit.ly/40bMEoI

12.02.2026 19:16 👍 0 🔁 0 💬 0 📌 0

AI agents need datasets that describe themselves. Croissant 1.1 from MLCommons adds machine-actionable provenance, semantic vocabularies, and embedded governance, so autonomous systems can discover and use data while respecting permissions.
700K+ datasets and counting
🔗 mlcommons.org/2026/02/croi...

12.02.2026 15:33 👍 1 🔁 0 💬 0 📌 1

AI is probabilistic by design: same input, different output every time. That's what makes it powerful AND hard to trust.

New from MLCommons: Why technical standards are the bridge between ISO objectives and real-world AI reliability 🔗 https://mlcommons.org/2026/02/ai-standards-bridge-adoption/

11.02.2026 16:19 👍 1 🔁 1 💬 0 📌 0
Endpoints 2025 - Token Efficiency Panel: M. Stewart, Zain Asgar, S. Annapureddy, Naveen Rao, Jay Ram YouTube video by MLCommons

#Endpoints2025 - Efficiency Panel
Deep dive on efficiency across the AI stack:
→ Token production
→ Model optimization
→ Deployment efficiency
Watch: https://bit.ly/46lhTBm
Michael Stewart (M12), Zain Asgar (Gimlet Labs), S. Annapureddy (Netradyne), Naveen Rao (UnconventionalAI), Jay Ram (Hud)

11.02.2026 13:07 👍 0 🔁 0 💬 0 📌 0

DLRMv3 in MLPerf Inference v6.0: HSTU-based generative recommendation benchmark
- 1TB model (1B embedding hash, 512-dim)
- 260 GFLOP/candidate, 5-layer encoder
- Streaming dataset: 5M users, 100 timestamps
- P99 ≤ 80ms latency constraint

Deadline: Friday 2/13
https://bit.ly/3NYxKQb

10.02.2026 14:24 👍 0 🔁 0 💬 0 📌 0

DLRMv3 in MLPerf Inference v6.0: HSTU-based generative recommendation benchmark
- 1TB model (1B embedding hash, 512-dim)
- 260 GFLOP/candidate, 5-layer encoder
- Streaming dataset: 5M users, 100 timestamps
- P99 ≤ 80ms latency constraint

Deadline: Friday 2/13
https://bit.ly/3NYxKQb

10.02.2026 14:10 👍 0 🔁 0 💬 0 📌 0
Endpoints 2025 - Lora Aroyo: Multimodal Safety: Assessing Global and Cultural Sensitivities in AI YouTube video by MLCommons

Dr. Lora Aroyo (@GoogleDeepMind) on building benchmarks that actually work across languages, cultures, and modalities.

Most benchmarks are too narrow. This is how we fix that.

Watch: https://bit.ly/4bAomff

#Endpoints2025 #MultilingualAI

10.02.2026 11:52 👍 0 🔁 0 💬 0 📌 0

MLCommons is advancing inclusive AI standards across APAC.
Feb 10: Lora Aroyo (Google) joins Korea AISI, IMDA, SNU & CeRAI to discuss benchmark inclusivity, AI governance standards, and balancing rapid adoption with safety.
📅 Feb 10, 11:15 AM SGT
📍 Google Singapore

09.02.2026 13:50 👍 1 🔁 1 💬 1 📌 0
A hand holding a megaphone is featured on the left side of the image with the "ML Commons" logo above it. To the right, the text reads "MLPerf Inference v6.0," "Call for Submission," with a deadline noted as "Submit by 2/13." The image notes a new benchmark involving Shopify's e-commerce data.

Real Shopify data in MLPerf v6.0.
Typos, multilingual text, embedded HTML: all the complexity production AI faces at scale.
No synthetic data. Submit by Feb 13: https://bit.ly/4k9F5YS

05.02.2026 15:51 👍 0 🔁 0 💬 0 📌 0
Endpoints 2025 - Luis Oala: The AI Economy: Flywheels for Agentic Value Creation YouTube video by MLCommons

The AI Economy: Flywheels for Agentic Value Creation

Luis Oala on the economics that matter:
→ Data flywheels
→ Token efficiency
→ Compounding advantages
→ Sustainable growth

Watch: https://youtu.be/a8psNW72l3Q

#Endpoints2025 #AIEconomy

05.02.2026 08:46 👍 0 🔁 0 💬 0 📌 0

4/4
Multimodal AI → $10.89B by 2030
🛒 Retail CAGR: 34.6%
⚡ Production VLM benchmark for 2026
Hardware vendors, cloud providers: prove your stack.
⏰ Feb 13 deadline → https://mlcommons.org/2026/02/vlm-inference-shopify/

04.02.2026 15:07 👍 0 🔁 0 💬 0 📌 0

3/4
"...We're contributing these foundations to the ecosystem so the next generation of AI is ready to power commerce at global scale."
- Kshetrajna Raghavan, Principal Engineer ML

04.02.2026 15:07 👍 0 🔁 0 💬 1 📌 0

2/4
"Open infrastructure is essential to the future of agentic commerce and Shopify is building the systems to power it, from our open-source Standard Product Taxonomy to the Catalog benchmark....
- Kshetrajna Raghavan, Principal Engineer ML

04.02.2026 15:07 👍 1 🔁 0 💬 1 📌 0

1/4 🧵
First Qwen model in MLPerf.
40M products daily.
Real production data from Shopify's e-commerce infrastructure.
Submit by Feb 13, 2026 👇
#MLPerf #Shopify #VLM #MLCommons

04.02.2026 15:07 👍 0 🔁 0 💬 1 📌 0

🎥 The Reliability Basket: What is Risk Coverage?

Panel on measuring what can go wrong with AI systems.

Beyond accuracy → comprehensive risk assessment.

Watch: https://youtu.be/UgETTwnz6kY

#Endpoints2025 #AIReliability

03.02.2026 18:13 👍 0 🔁 0 💬 0 📌 0
Call for Submission: Qwen3 VL MoE for MLPerf Inference v6.0 - MLCommons MLCommons and Shopify debut MLPerf Inference v6.0 with Qwen3-VL and Product Catalog dataset for real-world e-commerce AI. Submit by February 13, 2026.

Processing 40 million products daily with 78.24% accuracy on noisy, multilingual catalog data.
Not a lab benchmark: Shopify's actual production reality.
Submit your VLM stack by Feb 13 →
https://mlcommons.org/2026/02/vlm-inference-shopify
#AIBenchmark

03.02.2026 16:00 👍 0 🔁 0 💬 0 📌 0