"Literally" is a nice touch.
What's the best current SNARK or computationally sound proof (if zero-knowledge doesn't matter) to use for this kind of arbitrary-length computation purpose?
We now welcome OpenAI, Microsoft, the Australian Department of Industry, Science and Resources' AI Safety Institute, the AI Safety Tactical Opportunities Fund, Sympatico Ventures, and Renaissance Philanthropy. ❤️
Thank you to all of our partners! The 2025 launch was backed by an international coalition including the Canadian AI Safety Institute, CIFAR, Schmidt Sciences, AWS, Anthropic, Halcyon Futures, the Safe AI Fund, UK Research and Innovation, and UK ARIA. ❤️
I'm excited that AISI is announcing the first 60 Alignment Project grants, bringing more independent experts and ideas into AI alignment and control research! Since the RFP last year, we've grown the total funding to £27M. Which means more ideas will be explored! 🧵
www.aisi.gov.uk/blog/funding...
Unless you make multiple accounts!
It is unfortunate that arXiv has no mechanism for setting a social media image for a paper. I do not need the enormous arXiv logo for this kind of link.
Oops, here's the real paper link: arxiv.org/abs/2502.14828
The Scala replacements are truly cursed.
1. No, BPJ is fully black box.
2. Yep, by many of the same people. :)
arxiv.org/html/2502.14...
Work by the AISI Red Team + advisors: Xander Davies, Giorgi Giglemiani, Edmund Lau, Eric Winsor, me, and Yarin Gal. The Red Team is hiring if you like breaking things to motivate stronger defences!
We believe this kind of attack will be very difficult to defend against with per-point jailbreak defences, but more tractable using defences that notice patterns across many queries, since the method generates many failed attempts along the way.
arxiv.org/abs/2602.15001
New "boundary point jailbreaking" method against LLM safeguards (with prior disclosure to multiple labs) by using noised versions of harmful queries to turn sparse feedback from failed attacks into dense feedback. 🧵
www.aisi.gov.uk/blog/boundar...
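To illustrate the "sparse feedback into dense feedback" idea in the abstract: below is a generic zeroth-order smoothing sketch, not AISI's actual algorithm. The vector queries, `toy_oracle`, and the Gaussian noise model are all invented stand-ins; the real method operates on text queries against model safeguards.

```python
import random

def dense_score(query_vec, oracle, n_samples=200, sigma=0.5, seed=0):
    """Turn a binary accept/refuse oracle (sparse feedback) into a smooth
    score in [0, 1] by averaging the oracle over noised copies of the
    query. An optimiser can then hill-climb on this dense signal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        noised = [x + rng.gauss(0.0, sigma) for x in query_vec]
        hits += oracle(noised)
    return hits / n_samples

# Toy stand-in for a safeguard: "refuses" (returns 0) unless the query
# crosses a linear decision boundary.
def toy_oracle(v):
    return 1 if sum(v) > 0 else 0

# Points far inside/outside the boundary get near-saturated scores;
# points near the boundary get intermediate, informative scores.
print(dense_score([1.0, 1.0], toy_oracle))    # close to 1
print(dense_score([-1.0, -1.0], toy_oracle))  # close to 0
print(dense_score([0.05, 0.05], toy_oracle))  # intermediate
```

The point of the smoothing is that a single failed attempt returns no gradient at all, while the averaged score changes smoothly as the query moves toward the boundary.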
Yeah, alas I need a rigorous, deterministic proof (for Lean purposes), so no randomness allowed. But also we should 100% get a standardised SNARK system into Lean as an optional axiom. There are tricks for how to formalise that alongside a conventional kernel, but nothing that can't be surmounted.
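For concreteness, here is one hypothetical shape such an axiom could take in Lean 4. Every name here (`SnarkStatement`, `snarkCheck`, `denote`) is invented for illustration; a real design would need a trusted encoding of statements and a concrete verifier.

```lean
-- Hypothetical sketch: trusting a fixed SNARK verifier as an axiom,
-- alongside the ordinary kernel. All names are illustrative.
axiom SnarkStatement : Type           -- encoding of a claim about a computation
axiom SnarkProof : Type               -- serialised proof object
axiom denote : SnarkStatement → Prop  -- what the statement means
axiom snarkCheck : SnarkStatement → SnarkProof → Bool  -- the verifier

-- The trust assumption: if the verifier accepts, the statement holds.
axiom snark_sound :
    ∀ (s : SnarkStatement) (p : SnarkProof), snarkCheck s p = true → denote s
```

The soundness axiom is exactly the computational-soundness assumption moved into the trusted base; everything downstream remains ordinary kernel-checked Lean.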
Eigenvectors don't work as a certificate: SPD matrices are usually full rank, and checking that you have an eigenvector basis is not quadratic time.
They don't give you rigorous bounds, though. It's always possible you missed the critical eigenspace.
Anyone know if there are certificates that a sparse, symmetric positive definite matrix is in fact positive definite, checkable in quadratic time? Emphasis on *any* such matrix, no structure or other assumptions allowed other than SPD.
scicomp.stackexchange.com/questions/45...
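Not an answer to the quadratic-time question, but for concreteness, the standard certificate is a factorisation. Below is a sketch in exact rational arithmetic (relevant given the no-randomness, Lean-proof constraint elsewhere in the thread): an LDLᵀ certificate avoids square roots, so validity can be checked exactly. The catch is that verifying L D Lᵀ = A is cubic for dense matrices, and a sparse A can have dense factors, so this does not meet the quadratic bound in general.

```python
from fractions import Fraction as F

def ldlt(A):
    # Rational LDL^T factorisation (no square roots, so it stays exact).
    n = len(A)
    L = [[F(0)] * n for _ in range(n)]
    D = [F(0)] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        L[j][j] = F(1)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j]
                       - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D

def check_certificate(A, L, D):
    # Certificate check: L unit lower-triangular, D strictly positive,
    # and L D L^T reproduces A exactly.  Then for x != 0,
    # x^T A x = sum_k D[k] * ((L^T x)_k)^2 > 0, so A is PD.
    n = len(A)
    if any(d <= 0 for d in D):
        return False
    if any(L[i][i] != 1 for i in range(n)):
        return False
    if any(L[i][j] != 0 for i in range(n) for j in range(i + 1, n)):
        return False
    return all(sum(L[i][k] * D[k] * L[j][k] for k in range(n)) == A[i][j]
               for i in range(n) for j in range(n))

A = [[F(4), F(2)], [F(2), F(3)]]
L, D = ldlt(A)
print(check_certificate(A, L, D))  # True
```

An indefinite matrix like [[1, 2], [2, 1]] yields a negative entry in D, and the check rejects it.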
The nogil stuff is truly a thing of beauty or of baling wire, depending on one's perspective.
If he's not careful, soon he will turn into an Alan Kay graduate student.
This doesn't treat obfuscated arguments: the provers are unbounded, and the verifier's choice of bits to read can be hard to compute. But hopefully a clean description of the unbounded setting helps get more people to think about the bounded case!
www.alignmentforum.org/posts/DGt9mJ...
The cross-examination trick is to have one prover simulate a naive verifier, and the other "cross-examine" the simulation so the actual verifier need only check one step of the overall computation. Reading only O(log n) bits then gets us all the way to PSPACE/poly.
Joint with Jonah Brown-Cohen, Simon Marshall, Ilan Newman, Georgios Piliouras, and Mario Szegedy (now I have papers with multiple Szegedy brothers :)).
arxiv.org/abs/2602.08630
New complexity theory paper mapping the precise query complexity of debate, given unbounded provers. No new safety ideas: the goal is a self-contained presentation of debate + cross-examination, with the precise complexity class it achieves. π§΅
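The flavour of the cross-examination trick, in the classic refereed-computation setting rather than the paper's exact protocol (`step` and the bisection below are a toy I've made up): both provers commit to full transcripts of an iterated computation, the referee bisects to the first disagreement, and then checks a single step locally.

```python
def step(x):
    # stand-in for one transition of the machine being debated
    return (3 * x + 1) % 97

def make_transcript(x0, n):
    t = [x0]
    for _ in range(n):
        t.append(step(t[-1]))
    return t

def referee(tA, tB):
    """Both transcripts start equal and end unequal.  Bisect to an index
    where they agree whose successor disagrees, then check one step:
    O(log n) comparisons plus O(1) computation for the verifier."""
    assert tA[0] == tB[0] and tA[-1] != tB[-1]
    lo, hi = 0, len(tA) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if tA[mid] == tB[mid]:
            lo = mid
        else:
            hi = mid
    truth = step(tA[lo])  # tA[lo] == tB[lo], so this is the true next state
    return "A" if tA[hi] == truth else "B"

honest = make_transcript(5, 16)
cheat = honest[:10] + [(v + 1) % 97 for v in honest[10:]]
print(referee(honest, cheat))  # "A": the honest prover wins
```

The verifier never recomputes the whole transcript; an honest prover can always win the single-step check at the first point of divergence.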
Apparently last year there was a cyberattack on the Irving Medical Center at Columbia University, which resulted in my name getting leaked. I...feel like they didn't need a cyberattack to do that in this case?
Ah, actually I had just failed to read correctly: yes, certainly the answer in this case would be to expose it as a feed.
If you have the models tweak NetNewsWire I can more easily inherit your improvements. :)
The random oracle hypothesis is mostly true.
Today we're releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵
(1/19)
Being one of the two Deputy Directors of AISI's Research Unit is a very central and important role! Please apply if interested!
> This isn't your average Civil Service job. For 9–12 months, you'll co-lead one of the world's most influential AI safety research organisations.
x.com/nateburnikel...
Right, but it isn't big enough: there are ((2^n)!)^(2^n) keyed permutations, which is doubly exponential, and only exponentially many polynomial-size circuits.
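The counting gap, made concrete (the circuit-count bound below is a crude standard estimate, with invented constants):

```python
import math

def log2_factorial(m):
    # log2(m!) by direct summation; fine at this scale
    return sum(math.log2(k) for k in range(2, m + 1))

n = 9
domain = 2 ** n
# A keyed permutation assigns a permutation of the n-bit domain to each
# n-bit key, so there are ((2^n)!)^(2^n) of them:
log2_keyed = domain * log2_factorial(domain)  # doubly exponential count

# Crude standard bound: at most ~2^(c * s * log2 s) circuits of size s,
# which for s = poly(n) is only singly exponential.  Take s = n**3, c = 2.
s = n ** 3
log2_circuits = 2 * s * math.log2(s)

# Even at n = 9 the keyed permutations vastly outnumber the circuits.
print(log2_keyed, log2_circuits)
```

So almost every keyed permutation has no polynomial-size circuit at all, which is the sense in which the circuit family "isn't big enough".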