Mike Dodds's Avatar

Mike Dodds

@m-dodds

Formal methods nitwit. https://mikedodds.github.io AI / math / formal methods paper feed: @ai-fm-papers.bsky.social

223
Followers
57
Following
59
Posts
21.11.2024
Joined
Posts Following

Latest posts by Mike Dodds @m-dodds

Context: Knuth on Claude www-cs-faculty.stanford.edu/~knuth/paper...

05.03.2026 03:42 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Preview
GitHub - septract/claudes-cycles-lean Contribute to septract/claudes-cycles-lean development by creating an account on GitHub.

I formalised the Knuth / Stappers / Claude theorem in Lean4. Claude for scaffolding and Harmonicโ€˜s Aristotle AI for the core proofs

This is just the construction Claude found, not all 760 constructions

(Disclaimer: theorems look plausible to me, but mistakes possible)

github.com/septract/cla...

05.03.2026 03:40 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Commands: #tcb (what's in your trust base), #tcb_tree (dependency graph), #tcb_why (why is this included?). v0.1.0, rough edges expected, feedback welcome

github.com/OathTech/lea...

22.02.2026 19:42 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Preview
GitHub - OathTech/lean-tcb Contribute to OathTech/lean-tcb development by creating an account on GitHub.

Weekend project w/ Claude: in Lean, it can be hard to know which definitions you need to review to trust a theorem. So I built lean-tcb. It figures out your trusted computing base, ie the definitions that actually give a theorem its meaning (vs proof machinery the kernel checks)

22.02.2026 19:42 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Experimenting with ACL2 and Claude Code TL;DR: Using only prompting with Claude Code, I created: 50+ ACL2 theorem proofs translated from Software Foundations An MCP server for ACL2 with stateful solver sessions

I got curious whether Claude Code could handle a low-representation theorem prover like ACL2 - turns out yes! I proved a bunch of small to medium theorem, and for good measure built a MCP server, all in about 4 hrs. Iโ€™ve never used ACL2 before. Write-up here: mikedodds.org/posts/2025/1...

09.10.2025 23:48 ๐Ÿ‘ 7 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 1

Thatโ€™s true, and I think thatโ€™s exactly why Claude does so well proofs. Itโ€˜s just I happen to know first hand that proofs are a particularly difficult *kind* of program

21.09.2025 01:12 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Claude Can (Sometimes) Prove It

I wrote about Claude Code, which to my absolute astonishment is quite good at theorem proving. For people who don't know theorem proving, this is like spending your whole life building F1 engines and getting lapped by a Tesco's shopping trolley www.galois.com/articles/cla...

16.09.2025 22:46 ๐Ÿ‘ 16 ๐Ÿ” 5 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 1
Screenshot of article text: โ€œFormal verification today is very useful, but for most systems itโ€™s very difficult to write the kinds of complete specifications that verification needs. However, other kinds of specifications are popular: for example, a test case is a kind of limited, partial specification. The key is that a test case is immediately useful, and doesnโ€™t impose undue costs on the development team. We need to find ways to specify systems that have these virtues, and avoid the trap of imposing a complete and coherent view that fundamentally does not exist.โ€

Screenshot of article text: โ€œFormal verification today is very useful, but for most systems itโ€™s very difficult to write the kinds of complete specifications that verification needs. However, other kinds of specifications are popular: for example, a test case is a kind of limited, partial specification. The key is that a test case is immediately useful, and doesnโ€™t impose undue costs on the development team. We need to find ways to specify systems that have these virtues, and avoid the trap of imposing a complete and coherent view that fundamentally does not exist.โ€

Screenshot of article text: โ€œ Iโ€™ve come to think writing formal specifications is just a very difficult task. It requires a top-down view of the system that designers and engineers typically donโ€™t have or, more importantly, need. In contrast, informal specifications can be ambiguous, partial, flexible. Informal specifications are intended as communication mechanisms between humans, and as a result they can be โ€˜wrong but usefulโ€™, and elide aspects of the system that are not of interest. This is a strength, but it also results in systems that canโ€™t be easily formalized.โ€

Screenshot of article text: โ€œ Iโ€™ve come to think writing formal specifications is just a very difficult task. It requires a top-down view of the system that designers and engineers typically donโ€™t have or, more importantly, need. In contrast, informal specifications can be ambiguous, partial, flexible. Informal specifications are intended as communication mechanisms between humans, and as a result they can be โ€˜wrong but usefulโ€™, and elide aspects of the system that are not of interest. This is a strength, but it also results in systems that canโ€™t be easily formalized.โ€

Screenshot of article text: โ€œ I think systems are typically both designed from the top and grown incrementally. Most systems have some degree of top-down structure, but few systems have a mathematically coherent specification that covers every behavior. The effect is that most systems obey some formal specification for some core functionality, but if outside this core, we rapidly enter muddy territory where it is unclear what the system should do, or whether the designer should even careโ€

Screenshot of article text: โ€œ I think systems are typically both designed from the top and grown incrementally. Most systems have some degree of top-down structure, but few systems have a mathematically coherent specification that covers every behavior. The effect is that most systems obey some formal specification for some core functionality, but if outside this core, we rapidly enter muddy territory where it is unclear what the system should do, or whether the designer should even careโ€

Screenshot of article text: โ€œ Itโ€™s a formal verification clichรฉ that writing the specification tends to uncover most of the bugs in a system. To me, this suggests an analogy between specification and programmingโ€”both are tools for expressing what we want. In one way, this is a pessimistic thought: no tool can remove the burden of clarifying our ideas. But also, it gives me some hope. Programming is very difficult, but through careful tool design, weโ€™ve made it available to hundreds of millions of people. With luck and skill, perhaps we can do the same for specifications.โ€

Screenshot of article text: โ€œ Itโ€™s a formal verification clichรฉ that writing the specification tends to uncover most of the bugs in a system. To me, this suggests an analogy between specification and programmingโ€”both are tools for expressing what we want. In one way, this is a pessimistic thought: no tool can remove the burden of clarifying our ideas. But also, it gives me some hope. Programming is very difficult, but through careful tool design, weโ€™ve made it available to hundreds of millions of people. With luck and skill, perhaps we can do the same for specifications.โ€

New Galois blog: โ€œSpecifications Donโ€™t Existโ€. If we want to formally verify more systems, we need formal specifications, but most real systems are hard to specify for very deep reasons www.galois.com/articles/spe...

16.07.2025 00:48 ๐Ÿ‘ 7 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Iโ€™m not sure how I missed this but itโ€™s an extremely good article and you should absolutely read it. Itโ€™s about formal methods, but anyone who cares about integrating research into industry will find it valuable!

I saw a *ton* of parallels with resilience engineering too :)

25.06.2025 01:25 ๐Ÿ‘ 13 ๐Ÿ” 4 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Hey :) Seems like a lot of people moved here from old Twitter and Iโ€™m still catching up

25.06.2025 22:51 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
A labyrinth icon, serving as a metaphor for the process of formal verification

A labyrinth icon, serving as a metaphor for the process of formal verification

At Galois, we often say things like: โ€œFormal methods form the backbone of everything we do.โ€

But what exactly are formal methods? How do they work, and why are they so important?

We created a handy reference page to explain: www.galois.com/what-are-for...

27.05.2025 17:56 ๐Ÿ‘ 1 ๐Ÿ” 2 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Text screenshot: I sometimes hear people claiming that formal methods are demonstrably better than the techniques software engineers mostly use today. The only reason formal techniques arenโ€™t more popular (according to this theory) is that engineering teams are unaware, conservative, maybe put off by superficial difficulties like poor interfaces and documentation. I donโ€™t think this is quite right. My observation is that engineers are mostly rational when thinking about costs and benefits, at least within the bounds of their particular systems and problems.

Text screenshot: I sometimes hear people claiming that formal methods are demonstrably better than the techniques software engineers mostly use today. The only reason formal techniques arenโ€™t more popular (according to this theory) is that engineering teams are unaware, conservative, maybe put off by superficial difficulties like poor interfaces and documentation. I donโ€™t think this is quite right. My observation is that engineers are mostly rational when thinking about costs and benefits, at least within the bounds of their particular systems and problems.

If a tool is not popular, itโ€™s uncompelling to argue that everyone is just mistaken. At some point you should ask why the tool isnโ€™t useful (at the current cost/benefit point)

24.05.2025 02:12 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
A graph of costs and benefits plotted against each other. There is a line under which is โ€œYour favourite under-appreciated formal methodโ€. There are two arrows pointing orthogonally away: โ€œbe cheaperโ€ and โ€œbe more beneficialโ€

A graph of costs and benefits plotted against each other. There is a line under which is โ€œYour favourite under-appreciated formal methodโ€. There are two arrows pointing orthogonally away: โ€œbe cheaperโ€ and โ€œbe more beneficialโ€

New-ish @galoisinc.bsky.social blog: โ€œWhat Works (and Doesn't) Selling Formal Methodsโ€. The boring truth: engineers are rational and adoption is all about cost/benefit tradeoffs www.galois.com/articles/wha...

24.05.2025 02:12 ๐Ÿ‘ 5 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 1
Post image

What actually works when selling formal methods in industry?

What doesn't?

The way Galois Principal Scientist @m-dodds.bsky.social sees it, many FM projects donโ€™t pencil out not because clients are irrational, but because the cost/benefit tradeoffs donโ€™t make sense.
www.galois.com/articles/wha...

08.05.2025 16:32 ๐Ÿ‘ 3 ๐Ÿ” 4 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Preview
Compiler Explorer - C (C2Rust (master)) /* Type your code here, or load an example. */ int square(int num) { return num * num; }

c2rust is available on the Godbolt Compiler Explorer! c2rust is a tool we developed with Immunant that can convert nearly any piece of C code into compilable Rust godbolt.org/z/crsWEGEKM

14.04.2025 14:52 ๐Ÿ‘ 14 ๐Ÿ” 2 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
Why Amazon is Betting on โ€˜Automated Reasoningโ€™ to Reduce AIโ€™s Hallucinations Amazon is using math to help solve one of artificial intelligenceโ€™s most intractable problems: its tendency to make up answers, and to repeat them back to us with confidence.

Formal methods go great with AI www.wsj.com/articles/why...

06.02.2025 05:21 ๐Ÿ‘ 5 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

I wrote about o3, the Frontier Math benchmark, and what it means if AI math keeps getting better

29.01.2025 23:24 ๐Ÿ‘ 7 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

I do think a lot of people are in denial though!

21.01.2025 23:49 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

I donโ€™t think literally everyone should drop what theyโ€™re doing. But my sense is PL research as a whole is significantly under-reacting to AI. So I suppose I think *some more* PL people should bet on AI (but maybe not you!)

21.01.2025 18:20 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Happy to mail you a couple. Email me, my address is on my website

21.01.2025 07:15 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 0

I think youโ€™ve put your finger on the exact worldview mismatch because 5-10 years seems like an insanely long time horizon to me

21.01.2025 05:07 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Why constrain the grammar - just pull more samples and keep the ones that pass :p

21.01.2025 05:02 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Hot take for POPL: the PL community is still mostly in denial about AI. This is bad because PL+AI go great together

- PL can solve the hardest problem with AI - trusting the output it produces

- AI can solve the hardest problem with PL - finding enough engineers who can even use the tools

20.01.2025 21:52 ๐Ÿ‘ 14 ๐Ÿ” 1 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 0
Post image

Iโ€™m bringing these cute Galois stickers to POPL so if you want one, come find me

20.01.2025 20:09 ๐Ÿ‘ 11 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
Vomiting Emoji

8 years on, the future is here! xkcd.com/1813/

27.12.2024 01:42 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Preview
Emoji Kitchen - Browse Google's unique emoji combinations Unique illustrations of combined emoji, cooked up in Google's Emoji Kitchen, and comprehensively available on the web

emojikitchen.dev

27.12.2024 00:45 ๐Ÿ‘ 7 ๐Ÿ” 3 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 1

If I understand right, the private test set is only used during evaluation of the model - not available to the team doing the training

21.12.2024 22:16 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
OpenAI o3 Breakthrough High Score on ARC-AGI-Pub OpenAI o3 scores 75.7% on ARC-AGI public leaderboard.

Seems almost certain itโ€™s deliberately trained on math reasoning. The way the o-series models seem to work is by long CoT, with reinforcement learning to impose correct reasoning. Not much public about how o3 works internally, but Chollet has some speculation: arcprize.org/blog/oai-o3-...

21.12.2024 17:44 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

A big jump on coding skill as well:

21.12.2024 17:18 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Post image

Re o3 - this is the big one for me. The Frontier Math benchmark is designed to be extremely difficult, and it has a private test set (no data contamination).

Today, o3 is v expensive. But seems inevitable itโ€™ll soon be cheap. If these results hold up, that means MUCH more powerful automated math

21.12.2024 17:18 ๐Ÿ‘ 3 ๐Ÿ” 2 ๐Ÿ’ฌ 3 ๐Ÿ“Œ 0