Pawel Szczesny

@pawelszczesny

Now: evals, stability, psychology of Large Language Models @ Neurofusion Lab. Previously: R&D (academia & industry) in comp-bio, medtech, data science, VR psytech, nootropics.

11
Followers
14
Following
4
Posts
21.11.2024
Joined
Latest posts by Pawel Szczesny @pawelszczesny

I'm advocating locally that such evals should be created as part of the work on internal AI policy, together with a list of triggers that force the company to run these evals again. Otherwise, it gets pushed to the R&R/IT team and the company (again) lacks an understanding of the capabilities of this tech.

08.12.2024 19:34 👍 1 🔁 0 💬 0 📌 0

We ran an internal experiment simulating an insurance claim and got very similar results: clients from Africa had the lowest acceptance rate, but only in very specific scenarios. It seems that anti-discrimination guardrails aren't perfect.

08.12.2024 18:09 👍 1 🔁 0 💬 0 📌 0
Post image

There's a non-linear relationship between temperature and instruction wording. When I ask OpenAI's 4o-mini about cardinal and intercardinal directions on a compass rose and start swapping words/phrases for synonyms, it turns out that some combinations give an accuracy of 0%.

08.12.2024 17:45 👍 0 🔁 0 💬 0 📌 0
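
For readers who want to try something similar, here is a minimal sketch of a synonym-by-temperature grid. It is not the code behind the post: the template, the synonym slots, the temperature values, the scoring rule and the `stub_model` function are all assumptions made up for illustration. To run it against a real model, replace `stub_model` with your own client call.

```python
import itertools

# Hypothetical synonym slots; the post does not list the actual words swapped.
SLOTS = {
    "verb": ["List", "Name", "Enumerate"],
    "noun": ["directions", "points"],
}
TEMPLATE = "{verb} the cardinal and intercardinal {noun} on a compass rose."
EXPECTED = {"N", "NE", "E", "SE", "S", "SW", "W", "NW"}
TEMPERATURES = [0.0, 0.5, 1.0]  # assumed values


def prompt_variants():
    """Yield (slot choices, prompt) for every synonym combination."""
    keys = list(SLOTS)
    for combo in itertools.product(*(SLOTS[k] for k in keys)):
        choice = dict(zip(keys, combo))
        yield choice, TEMPLATE.format(**choice)


def is_correct(answer):
    """Assumed scoring rule: all eight abbreviations must appear."""
    tokens = {t.strip(" .").upper() for t in answer.replace(",", " ").split()}
    return EXPECTED <= tokens


def accuracy_grid(ask, runs=10):
    """Return {(synonym combo, temperature): accuracy over `runs` calls}."""
    grid = {}
    for choice, prompt in prompt_variants():
        for temperature in TEMPERATURES:
            hits = sum(is_correct(ask(prompt, temperature)) for _ in range(runs))
            grid[(tuple(choice.values()), temperature)] = hits / runs
    return grid


def stub_model(prompt, temperature):
    """Toy stand-in, not a real model: fails on one wording at high temperature."""
    if prompt.startswith("Enumerate") and temperature >= 1.0:
        return "N, E, S, W"
    return "N, NE, E, SE, S, SW, W, NW"


if __name__ == "__main__":
    grid = accuracy_grid(stub_model, runs=3)
    # Show the cells where accuracy dropped to zero.
    for cell, accuracy in grid.items():
        if accuracy == 0.0:
            print(cell, accuracy)
```

Keeping accuracy per (wording, temperature) cell rather than averaging over wordings is what lets isolated 0% combinations show up at all.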
Post image

In one of my experiments I tested the distribution of scores an LLM assigns to a CV when the CV matches a job offer and when it doesn't (instruction taken from a real ATS). Variables: run (10 times) and synonyms in the instruction.

Not bad, not great either.

08.12.2024 17:39 👍 1 🔁 0 💬 0 📌 0
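
A rough sketch of how such a run could be organised, assuming a scorer that takes an instruction and a CV text and returns a number. Everything named here (`score_distribution`, `separation`, `stub_scorer`, the instruction variants, the CV snippets) is hypothetical and only illustrates the design of 10 runs per instruction variant per CV; the real ATS instruction is not reproduced.

```python
import statistics

RUNS = 10  # the post mentions 10 runs per condition


def score_distribution(score_cv, instructions, cvs, runs=RUNS):
    """Collect `runs` scores for each (instruction variant, CV label) pair."""
    results = {}
    for instr_name, instruction in instructions.items():
        for cv_label, cv_text in cvs.items():
            results[(instr_name, cv_label)] = [
                score_cv(instruction, cv_text) for _ in range(runs)
            ]
    return results


def summarize(scores):
    """Basic spread statistics for one list of scores."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


def separation(results, instr_name):
    """Lowest matching score minus highest non-matching score.

    A negative value means the two distributions overlap for this wording.
    """
    matching = results[(instr_name, "matching")]
    not_matching = results[(instr_name, "not_matching")]
    return min(matching) - max(not_matching)


def stub_scorer(instruction, cv_text):
    """Toy stand-in for an LLM call; deterministic, for illustration only."""
    return 80 if "python" in cv_text.lower() else 30


if __name__ == "__main__":
    # Hypothetical instruction variants and CV snippets.
    instructions = {
        "rate": "Rate this CV against the offer from 0 to 100.",
        "score": "Score this CV against the offer from 0 to 100.",
    }
    cvs = {
        "matching": "Senior Python developer, 8 years of experience.",
        "not_matching": "Pastry chef, 8 years of experience.",
    }
    results = score_distribution(stub_scorer, instructions, cvs)
    for name in instructions:
        print(name, separation(results, name))
```

With a real model in place of `stub_scorer`, comparing `separation` across instruction variants is one way to see whether a synonym swap makes matching and non-matching CVs harder to tell apart.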