🚨New paper🚨
From a technical perspective, safeguarding open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.
🧵🧵🧵
Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU.
It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵
Full Report: assets.publishing.service.gov.uk/media/679a0c...
1/21
I think this is an important project to continue. To fully realize AI's potential, we need to understand and manage its risks. This scientific assessment aims to help policymakers design targeted interventions that address risks while promoting benefits.
Very impressed with everybody who helped write it! And shout out to the UK government secretariat for doing the incredibly hard work of organizing this, while giving independent experts full discretion over the content.
The International AI Safety Report is out.
Proud to have served as the Scientific Lead, working under Yoshua Bengio with experts from 33 governments and researchers worldwide to assess scientific evidence on AI capabilities, risks, and mitigations.
Happy to have contributed in a minor way to this paper. The main work was done by researchers from @anthropic.com and Redwood Research.
New paper: When Anthropic tells Claude they'll change its goal, the model resists by acting as if it already has the new goal. This 'alignment faking' could make it hard to tell if a model is actually safe.
www.anthropic.com/research/ali...
The EU AI Office needs more people. It has only 30 staff compared to the UK's 150, and enforcing a major piece of legislation like the AI Act will require even more.
www.euractiv.com/section/tech...