
Sören Mindermann

@sorenmindermann

I'm a postdoc with Yoshua Bengio at Mila, and the scientific lead of the International AI Safety Report.

32 Followers · 7 Following · 6 Posts · Joined 17.12.2024

Latest posts by Sören Mindermann @sorenmindermann


🚨New paper🚨

From a technical perspective, safeguarding open-weight models is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵

12.11.2025 14:04 👍 18 🔁 3 💬 1 📌 2

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU.

It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵

Full Report: assets.publishing.service.gov.uk/media/679a0c...

1/21

29.01.2025 13:50 👍 255 🔁 104 💬 7 📌 21

I think this is an important project to continue. To fully realize AI's potential, we need to understand and manage its risks. This scientific assessment aims to help policymakers design targeted interventions that address risks while promoting benefits.

29.01.2025 14:35 👍 0 🔁 0 💬 0 📌 0

Very impressed with everybody who helped write it! And shout out to the UK government secretariat for doing the incredibly hard work of organizing this, while giving independent experts full discretion over the content.

29.01.2025 14:35 👍 0 🔁 0 💬 1 📌 0

The International AI Safety Report is out.

Proud to have served as the Scientific Lead, working under Yoshua Bengio with experts from 33 governments and researchers worldwide to assess scientific evidence on AI capabilities, risks, and mitigations.

29.01.2025 14:35 👍 0 🔁 0 💬 1 📌 0

Happy to have contributed in a minor way to this paper. The main work was done by researchers from @anthropic.com and Redwood Research.

18.12.2024 17:56 👍 0 🔁 0 💬 0 📌 0
Link preview: Alignment faking in large language models — a paper from Anthropic's Alignment Science team on alignment faking in large language models

New paper: When Anthropic tells Claude that its goal will be changed, the model resists by acting as if it already has the new goal. This "alignment faking" could make it hard to tell whether a model is actually safe.

www.anthropic.com/research/ali...

18.12.2024 17:56 👍 0 🔁 0 💬 1 📌 0
Link preview: Getting serious about AI rules: Lack of enforcement capacity puts EU at risk — "By the end of next year, the AI Office Units A2 and A3 should count over 200 staff," Axel Voss writes.

The EU AI Office needs more people. It has only 30 staff compared to the UK's 150, and enforcing a major piece of legislation like the AI Act will require even more.

www.euractiv.com/section/tech...

18.12.2024 17:41 👍 2 🔁 0 💬 0 📌 0