This work couldn’t be more urgent. We need better measurement practices in AI evaluation, ASAP. Here, we aim to clarify and inform, and to show what better looks like for accuracy metrics and confidence estimates, with bonuses such as a deeper understanding of evaluations. Excellent work, team!
19.02.2026 16:00