125+ members. Safety, inference, training, storage, automotive benchmarks & more. MLCommons exists because neutral AI measurement standards require a community. Get involved: mlcommons.org/get-involved/ #MLCommons
@mlcommons.org
MLCommons is an AI engineering consortium, built on a philosophy of open collaboration to improve AI systems. Through our collective engineering efforts, we continually measure and improve AI technologies' accuracy, safety, speed, and efficiency.
Results day is coming.
MLPerf Inference v6.0 drops April 1 – cross-platform AI inference data spanning datacenter, edge & more. Follow so you don't miss it.
#MLPerf #AIInference
AI benchmarking is moving to the API layer. MLCommons Co-Founder David Kanter presents MLPerf Endpoints at NVIDIA GTC on March 19, 12–12:40 PM PT. Worth attending if you're making decisions about AI infrastructure.
https://bit.ly/4l5p8Uh
#GTC2026 #MLCommons
AI risk assessment now has a global standard. The MLCommons AILuminate Global Assurance Program gives organizations a structured, independent path to evaluate AI reliability. https://mlcommons.org/2026/02/ailuminate-global-assurance/
#AIGovernance #AILuminate
How do you make jailbreak benchmarks defensible to auditors? MLCommons just published a mechanism-first taxonomy for reproducible, governance-aligned LLM robustness evaluation. Not a leaderboard – the foundation for trustworthy ones. https://bit.ly/3ZCHqlZ
#AILuminate
Last week, Rebecca Weiss announced the AILuminate Global Assurance Program from the stage of a global AI standards panel in New Delhi. Read about the program and watch the panel below.
https://bit.ly/4kIS18x
▶️ https://bit.ly/46nDtp3
#AIImpactSummit #MLCommons #AILuminate
2026 Rising Star applications are open!
The ML and Systems Rising Stars program is open to all graduate students and postdoctoral associates with research backgrounds/interests in machine learning and systems.
Apply now!
https://bit.ly/4s4j4h6
Application deadline: Wednesday, March 18, 2026
Announcing the AILuminate Global Assurance Program.
MLCommons + Google, Microsoft, Qualcomm & KPMG are building a structured framework for AI risk measurement:
🧪 Benchmarking as a Service
🏷️ AILuminate Risk Labels
🌐 A Global Framework
https://bit.ly/4kIS18x
Microsoft is co-leading the expansion of MLCommons' AILuminate benchmark to Hindi, Tamil, Malay, Japanese & Korean – AI safety evals built from the ground up in local linguistic and cultural contexts.
blogs.microsoft.com/on-the-issues/2026/02/17/acting-with-urgency-to-address-the-growing-ai-divide/
Friday at #AIImpactSummitIndia – who controls AI matters as much as who can access it.
#MLCommons' Peter Mattson joins The Alan Turing Institute, Microsoft, Gates Foundation, GoI & more for "Trustworthy AI for All."
🎥 10:30–11:30 AM IST | Bharat Mandapam
βΆοΈ https://bit.ly/40jS0OQ
#TrustworthyAI
Attending #AIImpactSummitIndia? Come find us! 👋
Feb 20:
10:30 AM IST – Peter Mattson: Trustworthy AI for All
▶️ https://bit.ly/40jS0OQ
12:30 PM IST – Rebecca Weiss: Standards as Strategy
▶️ https://bit.ly/4kHum8h
We'd love to connect! DM us or reply below!
Feb 20 at #AIImpactSummit: MLCommons' Rebecca Weiss joins leaders from OpenAI, Google DeepMind, Microsoft & more for "Standards as Strategy: Accelerating AI Market Growth."
🎧 12:30–1:30 PM IST | Sushma Swaraj Bhawan
▶️ https://bit.ly/4kHum8h
Shalaleh Rismani: Safety Is a Systems Problem: Understanding and Evaluating AI in Context
Her core argument: We evaluate AI models in lab conditions, then wonder why things go wrong in deployment. But safety isn't about the model in isolation – it's about the model in context.
https://bit.ly/4kHtKzv
Most jailbreak testing is ad hoc: informal attack sets, inconsistent classification, and results that are hard to defend to auditors.
New from MLCommons: a mechanism-first taxonomy for single-turn prompt attacks that enables deterministic, reproducible, and governance-ready evaluation.
https://bit.ly/3ZCHqlZ
⏰ 2 DAYS LEFT
MLPerf Inference v6.0 submissions close Friday, 2/13.
New:
- Qwen3-VL + Shopify (VLM, e-commerce scale)
- DLRMv3 (generative recommendations, 1TB)
- And much more!
Hardware vendors, cloud providers: submit your results → https://bit.ly/40bMEoI
AI agents need datasets that describe themselves. Croissant 1.1 from MLCommons adds machine-actionable provenance, semantic vocabularies, and embedded governance, so autonomous systems can discover and use data while respecting permissions.
700K+ datasets and counting
mlcommons.org/2026/02/croi...
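For a sense of what self-describing dataset metadata looks like, here is a minimal, hypothetical sketch of Croissant-style JSON-LD built on schema.org/Dataset. The property set shown is illustrative only – it is not the exact Croissant 1.1 vocabulary, which adds dedicated fields for provenance, record sets, and governance.

```python
import json

# Illustrative sketch: Croissant-style dataset metadata is JSON-LD
# layered on schema.org/Dataset. Field names here are simplified and
# do not reproduce the real Croissant 1.1 context.
metadata = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "name": "example-dataset",
    "description": "A toy dataset description for illustration.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "ExampleOrg"},
}

# Because it is plain JSON-LD, tools (and agents) can serialize,
# exchange, and re-parse it without any custom format support.
serialized = json.dumps(metadata, indent=2)
restored = json.loads(serialized)
```

The point of the format is exactly this machine-actionability: a crawler or agent can read `license` and `creator` before ever touching the data itself.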
AI is probabilistic by design: same input, different output every time. That's what makes it powerful AND hard to trust.
New from MLCommons: Why technical standards are the bridge between ISO objectives and real-world AI reliability. https://mlcommons.org/2026/02/ai-standards-bridge-adoption/
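The "same input, different output" point can be shown with a toy token sampler (a hypothetical sketch, not MLCommons code): temperature-based sampling draws different tokens from the same logits across runs, while greedy decoding is repeatable.

```python
import math
import random

def sample(logits, temperature, rng):
    """Sample a token index from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.5]  # toy "model output" for one step

# Greedy decoding: the same input always yields the same token.
greedy = {sample(logits, 0, random.Random(i)) for i in range(100)}

# Temperature sampling: the same input yields different tokens run to run.
sampled = {sample(logits, 1.0, random.Random(i)) for i in range(100)}
```

Here `greedy` collapses to a single token while `sampled` spreads over several – which is why reliability claims about sampled systems need statistical measurement rather than single-shot tests.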
#Endpoints2025 - Efficiency Panel
Deep dive on efficiency across the AI stack:
✓ Token production
✓ Model optimization
✓ Deployment efficiency
Watch: https://bit.ly/46lhTBm
Michael Stewart (M12), Zain Asgar (Gimlet Labs), S. Annapureddy (Netradyne), Naveen Rao (UnconventionalAI), Jay Ram (Hud)
DLRMv3 in MLPerf Inference v6.0: HSTU-based generative recommendation benchmark
- 1TB model (1B embedding hash, 512-dim)
- 260 GFLOP/candidate, 5-layer encoder
- Streaming dataset: 5M users, 100 timestamps
- P99 ≤ 80 ms latency constraint
Deadline: Friday 2/13
https://bit.ly/3NYxKQb
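To make the latency constraint concrete, here is a hypothetical sketch of checking a P99 ≤ 80 ms SLO over measured per-query latencies, using a simple nearest-rank percentile. The latency values are invented for illustration; they are not benchmark data.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty sample list."""
    s = sorted(samples)
    k = min(len(s) - 1, math.ceil(q / 100 * len(s)) - 1)
    return s[k]

# Invented per-query latencies in milliseconds: uniform over 40..89 ms.
latencies = [40 + (i % 50) for i in range(1000)]

p99 = percentile(latencies, 99)        # 99% of queries finish within this
meets_slo = p99 <= 80                  # the constraint: P99 <= 80 ms
```

With this invented distribution the P99 lands at 89 ms, so the SLO fails even though the mean (~64.5 ms) looks comfortable – exactly why tail-latency constraints are specified as percentiles rather than averages.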
Dr. Lora Aroyo (@GoogleDeepMind) on building benchmarks that actually work across languages, cultures, and modalities.
Most benchmarks are too narrow. This is how we fix that.
Watch: https://bit.ly/4bAomff
#Endpoints2025 #MultilingualAI
MLCommons is advancing inclusive AI standards across APAC.
Feb 10: Lora Aroyo (Google) joins Korea AISI, IMDA, SNU & CeRAI to discuss benchmark inclusivity, AI governance standards, and balancing rapid adoption with safety.
📅 Feb 10, 11:15 AM SGT
📍 Google Singapore
Real Shopify data in MLPerf v6.0.
Typos, multilingual text, embedded HTML – all the complexity production AI faces at scale.
No synthetic data. Submit by Feb 13: https://bit.ly/4k9F5YS
The AI Economy: Flywheels for Agentic Value Creation
Luis Oala on the economics that matter:
✓ Data flywheels
✓ Token efficiency
✓ Compounding advantages
✓ Sustainable growth
Watch: https://youtu.be/a8psNW72l3Q
#Endpoints2025 #AIEconomy
4/4
Multimodal AI → $10.89B by 2030
📈 Retail CAGR: 34.6%
⚡ Production VLM benchmark for 2026
Hardware vendors, cloud providers: prove your stack.
⏰ Feb 13 deadline → https://mlcommons.org/2026/02/vlm-inference-shopify/
3/4
"...We're contributing these foundations to the ecosystem so the next generation of AI is ready to power commerce at global scale."
β Kshetrajna Raghavan, Principal Engineer ML
2/4
"Open infrastructure is essential to the future of agentic commerce and Shopify is building the systems to power it, from our open-source Standard Product Taxonomy to the Catalog benchmark....
β Kshetrajna Raghavan, Principal Engineer ML
1/4 🧵
First Qwen model in MLPerf.
40M products daily.
Real production data from Shopify's e-commerce infrastructure.
Submit by Feb 13, 2026.
#MLPerf #Shopify #VLM #MLCommons
🎥 The Reliability Basket: What is Risk Coverage?
Panel on measuring what can go wrong with AI systems.
Beyond accuracy – comprehensive risk assessment.
Watch: https://youtu.be/UgETTwnz6kY
#Endpoints2025 #AIReliability
Processing 40 million products daily with 78.24% accuracy on noisy, multilingual catalog data.
Not a lab benchmark – Shopify's actual production reality.
Submit your VLM stack by Feb 13 →
https://mlcommons.org/2026/02/vlm-inference-shopify
#AIBenchmark
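As a back-of-envelope check on that scale (simple arithmetic, not part of the benchmark): 40 million products per day implies a sustained rate of roughly 463 products per second, around the clock.

```python
# Sustained throughput implied by 40 million products per day.
products_per_day = 40_000_000
seconds_per_day = 24 * 60 * 60          # 86,400 seconds
required_qps = products_per_day / seconds_per_day

print(round(required_qps, 1))  # -> 463.0 products per second
```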