Our paper on the best way to add error bars to LLM evals is on arXiv! TL;DR: Avoid the Central Limit Theorem -- there are better, simple Bayesian and frequentist methods you should be using instead.
We also provide a super lightweight library: github.com/sambowyer/baye… 🧵👇
06.03.2025 15:00
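The paper argues against CLT-based error bars for evals; as a rough illustration of why (not the paper's exact recommendations — the function names and method choices below are mine), here is the standard CLT normal-approximation interval failing on a small eval, alongside two simple alternatives: a frequentist Wilson score interval and a Monte Carlo credible interval from a Beta posterior.

```python
import math
import random

def clt_interval(k, n, z=1.96):
    """Standard CLT / normal-approximation 95% interval for accuracy k/n."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return (p - z * se, p + z * se)

def wilson_interval(k, n, z=1.96):
    """Wilson score interval: a frequentist interval that stays inside [0, 1]."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

def beta_credible_interval(k, n, a=1.0, b=1.0, draws=100_000, seed=0):
    """Bayesian 95% credible interval from a Beta(k+a, n-k+b) posterior,
    estimated by Monte Carlo with the stdlib's Beta sampler."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(k + a, n - k + b) for _ in range(draws))
    return (samples[int(0.025 * draws)], samples[int(0.975 * draws)])

# A small eval: 3 correct answers out of 20 questions.
k, n = 3, 20
print(clt_interval(k, n))              # lower bound dips below 0 -- nonsense for an accuracy
print(wilson_interval(k, n))           # stays inside [0, 1]
print(beta_credible_interval(k, n))    # also stays inside [0, 1]
```

With only 20 questions the CLT interval's lower bound goes negative, which is impossible for an accuracy; both alternatives respect the [0, 1] support at essentially the same implementation cost.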
Go read it on arXiv! Thanks to my co-authors @sambowyer.bsky.social and @laurenceai.bsky.social 🔥
06.03.2025 15:00
📣 Jobs alert
We're hiring a postdoc and a research engineer to work on UQ (uncertainty quantification) for LLMs!! Details ⬇️
#ai #llm #uq
12.02.2025 16:26
Do you know what rating you'll give after reading the intro? Are your confidence scores 4 or higher? Do you not respond in rebuttal phases? Are you worried how it will look if your rating is the only 8 among 3s? This thread is for you.
27.11.2024 17:25
Would love to be added!
27.11.2024 21:47
But you can't prove that the *real* asteroid won't hit Earth, because the real world isn't your simplified model: you don't know the initial conditions exactly, there might be other bodies you aren't aware of, and so on. (3/3)
27.11.2024 10:44
The analogy we're working from is "mathematically provable asteroid safety": within a simplified mathematical model, with known initial conditions, you can prove that an asteroid won't hit Earth. (2/3)
27.11.2024 10:44
Does anyone want to collaborate on an ICML position paper on "The impossibility of mathematically proving AI safety"? The basic thesis is that it is a category error to try to prove AI safety in the real world. (1/3)
27.11.2024 10:44
Can you add?
26.11.2024 10:58