oh- would
I also dislike that I can't make private accounts on this platform lol
I think about Move 37 often. It will be so incredible one day when (hopefully safely!) we can get the models to come up with these creative breakthroughs in scientific domains.
A novel cancer drug, an RTP superconductor, a proof of RH, etc., etc. are all things that would net benefit humanity.
Guh I hate that hardly any of the accounts I follow and interact with are on this platform. Would have abandoned Twitter a long time ago if not for that…
Would an eventual LLM-generated correct proof of the Riemann hypothesis suffice to be impressed?
I mean it is impressive because it's work that would be suitable for a human postdoc to write up and publish. We certainly didn't have models that could do work of such sophistication even a year ago.
I don't really agree with this. The reasoning models have proven capable of deductive reasoning in combining established results and ideas in novel ways to form a new result (e.g. recent Erdős problems results). They have yet to prove capable of forming a novel useful non-trivial concept though.
Won't announce this on my Twitter for a while, but I'll be working at OpenAI in San Francisco during the summer.
If any oomfs there are interested in meeting up, lmk!
Anyhow, prior to the new model variant we were originally going to classify this as a human-AI collaboration result. When it was able to replicate the proof, we moved it to an autonomous AI result.
I then also formalised my corrected proof in Lean 4. Eventually, though, a newer model variant was able to independently reproduce the proof, except it made a mistake in the proof of Lemma 2: it assumed strict inequalities despite nothing stopping the b_k from being one apart.
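To illustrate the pitfall with a toy Lean 4 sketch (hypothetical, assuming Mathlib; this is not the actual Lemma 2): a strictly increasing integer sequence only guarantees the non-strict gap bound, since consecutive terms may be exactly one apart.

import Mathlib

-- Toy illustration (not the actual Lemma 2): strict monotonicity of an
-- integer sequence b only yields b k + 1 ≤ b (k + 1); the terms may be
-- exactly one apart (e.g. b k = k), so the strict gap
-- b k + 1 < b (k + 1) need not hold.
example (b : ℕ → ℤ) (hb : StrictMono b) (k : ℕ) : b k + 1 ≤ b (k + 1) := by
  have h : b k < b (k + 1) := hb (Nat.lt_succ_self k)
  omega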
It was LaTeX. The proof in the Aletheia Erdős problems paper is in fact my proof. We tried a few different Deep Think variants; none produced a quite-right proof, but one gave something right enough that I could see how to fix it to give a correct proof.
My collaborators and I later generalised Aletheia's solution to 1051 to give this paper that I'm quite proud of. I think there's room to push the results further but we haven't explored it yet. arxiv.org/abs/2601.21442
Thanks! We tried to be very careful in the announcement of these results. I was pushing for several caveats in this work hah. I like its solution for 1051. I felt confident enough to write this paragraph because it seemed to be in another class from the LLM proofs of Erdős problems so far.
A new blog post by @acerfur.bsky.social describing his experience as a pioneer of using AI tools to solve Erdős problems:
www.erdosproblems.com/forum/thread...
AI is capable now of generating new interesting mathematics.
But it's much easier for it to generate plausible-sounding nonsense.
I am concerned that the latter, copied and promoted by users with no understanding of the mathematics, is going to drown out the former.
Chat, do I go to San Francisco to work on reasoning for mathematics at OpenAI?
Sure thing. It's all here, say:
www.erdosproblems.com/728
Yes I know what epistemic means. Regardless, people more familiar with the subject of mathematics than you have confirmed that these are novel proofs. Perhaps put your ego and denialist cope in check. No need to be condescending.
>it has no epistemic certainty it is a proof or not
FWIW actually this is not true of GPT-5.2. It is generally highly cautious and will not say it has a proof unless it's very confident all the logical deductions correctly follow. It will more often than not know its limitations and happily concede.
One of the big challenges now in using AI for mathematics is the credit/attribution problem. AI has a tendency to use observations/techniques without giving credit as to where it 'learnt' about them (mainly because it's forgotten itself).
Elaborate?
You're saying this like humans get things right all the time and we don't make mistakes or lie lol
Yeah pretty much
Ok good, I interpreted it that way as well. GPT-5.2 Pro seemed to give a positive answer to that at the end of the conversation link I sent, which looks like a plausible sketch to me, but I only looked at it very briefly. Will try to get Aristotle to autoformalise it when I'm back home.
Indeed, yeah, this is what I'm currently exploring. Waiting on others to chime in on how best to interpret the intent of the others.
>can't prove a thing
>proves open Erdős problems as true
Interesting
All well and good, but even if it just predicts the next token, I'll happily take an LLM-generated proof of the Riemann hypothesis!
I agree, but if we want these models to have a crack at the hardest of problems then they need to be able to develop new theory. When will a model be capable of doing what Galois, Ramsey, or Grothendieck did and start a whole new field of maths (or develop one with a novel concept like schemes)?
All good! Glad this is interesting to you! Big fan of your comics :)
and yet… they can exceed some humans. My opinion is that if it can imitate reasoning well enough that it would pass as human reasoning, then we should give it the benefit of the doubt and say it *is* reasoning à la Turing's test.