Simon Willison (@simon.fedi.simonwillison.net.ap.brid.gy)

Agentic manual testing - Agentic Engineering Patterns

New chapter: Agentic manual testing - about how having agents "manually" try out code is a useful way to help them spot issues that might not have been caught by their automated tests https://simonwillison.net/guides/agentic-engineering-patterns/agentic-manual-testing/

06.03.2026 16:50 👍 3 🔁 2 💬 0 📌 0

Can coding agents relicense open source through a “clean room” implementation of code? Over the past few months it’s become clear that coding agents are extraordinarily good at building a weird version of a “clean room” implementation of code. The most famous version …

The chardet open source library relicensed from LGPL to MIT two days ago thanks to a Claude Code assisted "clean room" rewrite - but original author Mark Pilgrim is disputing that the way this was done justifies the change in license - my notes here: https://simonwillison.net/2026/Mar/5/chardet/

05.03.2026 16:53 👍 6 🔁 17 💬 4 📌 6

The New York Earth Room On the second floor of 141 Wooster Street in New York's SoHo district there is a 3,600 square foot room filled with earth - 280,000 pounds of it, first installed in 1977 and maintained there ever since. This is the New York Earth Room, a piece of installation art by Walter De Maria, originally planned as a three month exhibition which has now stretched into its sixth decade. This is actually the third instance of the Earth Room, a sequel to the 1968 Earth Room in Galerie Heiner Friedrich in Munich and a second in 1974 at Hessisches Landesmuseum in Darmstadt. Only this edition survives. The exhibit is owned and maintained by the Dia Art Foundation, who also own the entire 2nd floor. The foundation was founded in 1974 by Philippa de Menil, an heiress to the Schlumberger oil exploration fortune, her husband art dealer Heiner Friedrich, and Houston art historian Helen Winkler. The [foundation's mission](https://www.diaart.org/about/about-dia) includes "to help artists achieve visionary projects that might not otherwise be realized because of scale or scope." The Earth Room is a prominent example. The earth itself is a mixture of peat and bark, most of which is the original earth from the 1970s. The curators till the soil twice a year and occasionally wet it to avoid it turning into dust. They topped it up with fresh soil in 2022 to compensate for it compacting down over the years. The Earth Room is free to visit but guests are asked not to take any photographs to respect the wishes of the artist. It has a curator who will answer questions about the artwork - painter Bill Dilworth staffed the desk from 1989 until his retirement in 2024.

I went to the New York Earth Room! It's 280,000 pounds of soil in a loft in SoHo that's been there mostly unchanged since 1977 https://www.niche-museums.com/117

04.03.2026 23:00 👍 4 🔁 3 💬 0 📌 0

Original post on fedi.simonwillison.net

I started a new chapter of my Agentic Engineering Patterns guide about anti-patterns - things NOT to do

So far I only have one: Inflicting unreviewed code on collaborators, aka dumping a thousand line PR without even making sure it works first […]

04.03.2026 18:20 👍 4 🔁 0 💬 0 📌 0

Something is afoot in the land of Qwen I’m behind on writing about Qwen 3.5, a truly remarkable family of open weight models released by Alibaba’s Qwen team over the past few weeks. I’m hoping that the 3.5 …

Published some notes on the situation at Qwen - they released the Qwen 3.5 family (an outstanding family of open weight models) but now their lead researcher and several others all appear to have resigned within the past 24 hours https://simonwillison.net/2026/Mar/4/qwen/

04.03.2026 15:53 👍 2 🔁 3 💬 0 📌 0

Original post on fedi.simonwillison.net

I started a new section of my Agentic Engineering guide for annotated versions of prompts I've used for projects - the first is a prompt I used to have Claude Code for web build me a web UI for compressing GIFs using a WebAssembly build of Gifsicle […]

02.03.2026 17:20 👍 7 🔁 1 💬 1 📌 0

5. "No new chicks for four years (due to a lack of fruiting rimu trees)" The phrasing "lack of fruiting rimu trees" is slightly imprecise. The issue isn't that rimu trees failed to fruit at all, but that there was no mass fruiting (masting) event, which is the specific trigger for kākāpō breeding. Consider "due to a lack of rimu masting" or "due to a lack of mass rimu fruiting."

Sent the February edition of my sponsors-only newsletter - a summary of my last month of blogging for people who want to pay for a shorter version

I use Claude as a proofreader and fact checker, was delighted that it called me out on this Kākāpō […]

Original post on fedi.simonwillison.net

02.03.2026 14:58 👍 0 🔁 0 💬 0 📌 0

Interactive explanations - Agentic Engineering Patterns - Simon Willison's Weblog

New chapter of my Agentic Engineering Patterns guide. This one is about having coding agents build custom interactive and animated explanations to help fight back against cognitive debt https://simonwillison.net/guides/agentic-engineering-patterns/interactive-explanations/

28.02.2026 23:14 👍 15 🔁 1 💬 0 📌 0

Built a fun prototype this morning of binary search using HTTP range requests, in this case to look up characters in ~77MB of Unicode data https://simonwillison.net/2026/Feb/27/unicode-explorer/

27.02.2026 18:02 👍 0 🔁 0 💬 0 📌 0
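
The idea in the post above, binary search driven by HTTP range requests, can be sketched roughly as follows. This is a minimal illustration and not the prototype's actual code: the fixed 16-byte record layout, the `build_table` helper and the in-memory `make_range_fetcher` stand-in for a real `Range: bytes=start-end` request are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple

RECORD_SIZE = 16  # hypothetical layout: 4-byte big-endian code point + 12-byte padded label


def build_table(entries: dict) -> bytes:
    """Pack entries into fixed-width records sorted by code point."""
    out = bytearray()
    for codepoint, label in sorted(entries.items()):
        encoded = label.encode("utf-8")[: RECORD_SIZE - 4]
        out += codepoint.to_bytes(4, "big") + encoded.ljust(RECORD_SIZE - 4, b"\x00")
    return bytes(out)


def make_range_fetcher(data: bytes) -> Callable[[int, int], bytes]:
    """Stand-in for an HTTP GET with 'Range: bytes=start-end' (end is inclusive)."""
    def fetch(start: int, end: int) -> bytes:
        return data[start : end + 1]
    return fetch


def lookup(
    fetch: Callable[[int, int], bytes], total_records: int, codepoint: int
) -> Tuple[Optional[str], int]:
    """Binary search that fetches one record per step; returns (label, request count)."""
    lo, hi = 0, total_records - 1
    requests = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        start = mid * RECORD_SIZE
        record = fetch(start, start + RECORD_SIZE - 1)
        requests += 1
        key = int.from_bytes(record[:4], "big")
        if key == codepoint:
            return record[4:].rstrip(b"\x00").decode("utf-8"), requests
        if key < codepoint:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, requests


table = build_table(
    {0x30: "DIGIT ZERO", 0x41: "LATIN A", 0x20AC: "EURO SIGN", 0x2603: "SNOWMAN"}
)
fetch = make_range_fetcher(table)
label, request_count = lookup(fetch, len(table) // RECORD_SIZE, 0x2603)
missing, _ = lookup(fetch, len(table) // RECORD_SIZE, 0x42)
```

Because each step halves the search space, the number of range requests grows with the logarithm of the record count, so even a file of tens of megabytes needs only a few dozen small fetches per lookup.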

Hoard things you know how to do - Agentic Engineering Patterns - Simon Willison's Weblog

Today's chapter of Agentic Engineering Patterns is some good general career advice which happens to also help when working with coding agents: Hoard things you know how to do https://simonwillison.net/guides/agentic-engineering-patterns/hoard-things-you-know-how-to-do/

26.02.2026 21:14 👍 3 🔁 1 💬 0 📌 0

Claude Code Remote Control New Claude Code feature dropped yesterday: you can now run a "remote control" session on your computer and then use the Claude Code for web interfaces (on web, iOS and …

Brief notes on Claude Code Remote and Cowork scheduled tasks - both of which overlap with OpenClaw, and both of which require you to leave your computer powered on somewhere https://simonwillison.net/2026/Feb/25/claude-code-remote-control/

25.02.2026 18:37 👍 3 🔁 1 💬 0 📌 0

Original post on fedi.simonwillison.net

I've been having good results recently asking coding agents to provide "a linear walkthrough of the code that explains how it all works in detail" - I demonstrated that against this vibe coded Swift codebase and wrote up the technique here […]

25.02.2026 16:57 👍 2 🔁 1 💬 1 📌 0

I vibe coded my dream macOS presentation app I gave a talk this weekend at Social Science FOO Camp in Mountain View. The event was a classic unconference format where anyone could present a talk without needing to …

Wrote up a fun vibe-coding project, I had Claude Code build me a SwiftUI macOS app for presenting a talk by turning a list of URLs into a full-screen slide experience I could remote control from my phone https://simonwillison.net/2026/Feb/25/present/

25.02.2026 16:52 👍 11 🔁 2 💬 1 📌 0

"Red/green TDD" talks about how you can get much better results from most coding agents by encouraging them to use test-first development where they watch the tests fail before building an implementation that lets them pass https://simonwillison.net/guides/agentic-engineering-patterns/red-green-tdd/

23.02.2026 17:54 👍 5 🔁 4 💬 0 📌 0
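
The red/green loop described in the post above can be sketched in a few lines. This is a hedged illustration only: the `slugify` function and its test cases are hypothetical and do not come from the guide. The point is the order of events: the test exists first and is seen to fail (red) before an implementation makes it pass (green).

```python
from typing import Callable


def run_tests(impl: Callable[[str], str]) -> bool:
    """Return True if the implementation passes every check, False otherwise."""
    try:
        assert impl("Hello World") == "hello-world"
        assert impl("  Red   Green ") == "red-green"
        return True
    except (AssertionError, NotImplementedError):
        return False


def slugify_stub(text: str) -> str:
    """Red step: the function exists but has no behaviour yet."""
    raise NotImplementedError


def slugify(text: str) -> str:
    """Green step: the smallest implementation that satisfies the tests."""
    return "-".join(text.lower().split())


red = run_tests(slugify_stub)   # the tests fail before the implementation exists
green = run_tests(slugify)      # the same tests pass afterwards
```

Watching the failure first is what gives the later pass its meaning: it shows the tests actually exercise the new code.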

Original post on fedi.simonwillison.net

"Writing code is cheap now" talks about the central challenge of agentic engineering: the cost to churn out working code has dropped to almost nothing, how does that impact our existing intuitions about how we work, both individually and as a team? […]

23.02.2026 17:53 👍 4 🔁 2 💬 1 📌 0

Writing about Agentic Engineering Patterns I’ve started a new project to collect and document Agentic Engineering Patterns—coding practices and patterns to help get the best results out of this new era of coding agent development …

I've published the first two chapters of a new guide to Agentic Engineering Patterns - coding practices and patterns to help get the best results out of coding agents like Claude Code and OpenAI Codex https://simonwillison.net/2026/Feb/23/agentic-engineering-patterns/

23.02.2026 17:45 👍 36 🔁 5 💬 4 📌 2

Andrej Karpathy talks about “Claws” Andrej Karpathy tweeted a mini-essay about buying a Mac Mini ("The apple store person told me they are selling like hotcakes and everyone is confused") to tinker with Claws: I'm …

New term of art is brewing: "Claw" as the noun for OpenClaw-like agent systems, AI agents that generally run on personal hardware, communicate via messaging protocols and can both act on direct instructions and schedule tasks https://simonwillison.net/2026/Feb/21/claws/

21.02.2026 00:40 👍 6 🔁 0 💬 2 📌 0

Sonnet 4.6 Extended 4 artifacts User: Clone simonw/simonwillisonblog and tell me about the models and views Thought: Synthesized comprehensive overview of models... Here's a comprehensive overview of the models and views in simonw/simonwillisonblog - a Django-powered personal blog: Models blog app — the core The blog is built around a shared BaseModel abstract class that all content types inherit from. It provides: created, slug, tags (M2M), metadata (JSON), search_document (Postgres full-text search vector), is_draft, card_image, and series (FK).

Fun bonus tip: regular Claude chat (not Claude Code) has the ability to clone repos from GitHub these days, which means you can ask it to checkout ANY public repo and answer questions about it or even use it as a starting point for an artifact!

21.02.2026 00:01 👍 2 🔁 0 💬 0 📌 0

Adding TILs, releases, museums, tools and research to my blog I’ve been wanting to add indications of my various other online activities to my blog for a while now. I just turned on a new feature I’m calling “Beats” (after …

Added a feature to my blog I've wanted for ages - it now shows my content from elsewhere, including TILs, releases, museums, tools and research, as little badges in the various blog timeline views https://simonwillison.net/2026/Feb/20/beats/

20.02.2026 23:49 👍 1 🔁 0 💬 2 📌 0

Original post on fedi.simonwillison.net

Shared some thoughts on ggml-ai joining Hugging Face - they've been a good steward of the crucial Transformers open source library so I'm optimistic that great things are ahead for ggml-ai, which kicked off the local model revolution back in March 2023 […]

20.02.2026 17:14 👍 4 🔁 0 💬 1 📌 0

Screenshot of a blog post update. Text reads: "Update: In What happens if AI labs train for pelicans riding bicycles? last November I said:" followed by a blockquote: "If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I'm going to test it on all manner of creatures riding all sorts of transportation devices." Then: "Google's Gemini Lead Jeff Dean tweeted this video featuring an animated pelican riding a bicycle, plus a frog on a penny-farthing and a giraffe driving a tiny car and an ostrich on roller skates and a turtle kickflipping a skateboard and a dachshund driving a stretch limousine." Below are two side-by-side AI-generated SVG images labeled "Gemini 3 Pro" and "Gemini 3.1 Pro", both showing a dachshund driving a black stretch limousine. The Gemini 3 Pro version has a bright blue sky background, while the Gemini 3.1 Pro version has a sunset cityscape with a street lamp. At the bottom: "Prompt: Generate an animated 4:3 SVG of a dachshund driving a stretch limousine"

Had to update my blog post after I saw Google's Jeff Dean had posted a video of an SVG animated pelican riding a bicycle, plus a frog on a penny-farthing, a giraffe driving a tiny car, an ostrich on roller skates, a turtle kickflipping a skateboard, and a dachshund driving a stretch limousine

19.02.2026 19:43 👍 0 🔁 0 💬 0 📌 0

Gemini 3.1 Pro The first in the Gemini 3.1 series, priced the same as Gemini 3 Pro ($2/million input, $12/million output under 200,000 tokens, $4/$18 for 200,000 to 1,000,000). They boast about its …

Gemini 3.1 Pro produced an excellent pelican riding a bicycle SVG but took over 5 minutes to do it - I'm pretty sure that's just teething problems on launch day though, I got a few error messages about capacity while trying it out. https://simonwillison.net/2026/Feb/19/gemini-31-pro/

19.02.2026 17:59 👍 2 🔁 1 💬 2 📌 0
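
The tiered pricing quoted in the link preview above can be turned into a small cost estimate. This is a sketch under stated assumptions: the rates are the ones quoted there, but choosing the tier from the input token count, and the `estimate_cost` helper itself, are assumptions for illustration and not an official billing formula.

```python
# Rates in USD per million tokens, as quoted in the post above.
SHORT_CONTEXT = {"input": 2.0, "output": 12.0}   # under 200,000 tokens
LONG_CONTEXT = {"input": 4.0, "output": 18.0}    # 200,000 to 1,000,000 tokens
TIER_BOUNDARY = 200_000
MAX_TOKENS = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: the tier is selected by the input token count.
    """
    if input_tokens > MAX_TOKENS:
        raise ValueError("input exceeds the largest quoted tier")
    rates = SHORT_CONTEXT if input_tokens < TIER_BOUNDARY else LONG_CONTEXT
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


small = estimate_cost(10_000, 2_000)    # about 0.044 USD
large = estimate_cost(300_000, 1_000)   # about 1.218 USD
```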

SWE-bench February 2026 leaderboard update SWE-bench is one of the benchmarks that the labs love to list in their model releases. The official leaderboard is infrequently updated but they just did a full run of …

Wrote up some notes on the February 2026 update to the official SWE-bench leaderboard, with a bonus side-quest to get Claude for Chrome to redraw their chart to add percentage labels to the bars https://simonwillison.net/2026/Feb/19/swe-bench/

19.02.2026 04:54 👍 1 🔁 0 💬 0 📌 0

Introducing Claude Sonnet 4.6 Sonnet 4.6 is out today, and Anthropic claim it offers similar performance to November's Opus 4.5 while maintaining the Sonnet pricing of $3/million input and $15/million output tokens (the Opus …

Claude Sonnet 4.6 really likes drawing top hats on its pelicans https://simonwillison.net/2026/Feb/17/claude-sonnet-46/

18.02.2026 00:09 👍 7 🔁 1 💬 1 📌 0

Errors now use exit code 2, which means exit code 1 is just for check failures. #15 New rodney assert command for running JavaScript tests, exit code 1 if they fail. #19 New directory-scoped sessions with --local/--global flags. #14 New reload --hard and clear-cache commands. #17 New rodney start --show option to make the browser window visible. Thanks, Antonio Cuni. #13 New rodney connect PORT command to debug an already-running Chrome instance. Thanks, Peter Fraenkel. #12 New RODNEY_HOME environment variable to support custom state directories. Thanks, Senko Rašić. #11 New --insecure flag to ignore certificate errors. Thanks, Jakub Zgoliński. #10 Windows support: avoid Setsid on Windows via build-tag helpers. Thanks, adm1neca. #18 Tests now run on windows-latest and macos-latest in addition to Linux.

New release of Rodney, my CLI tool for browser automation (designed for use by coding agents and with Showboat) - contributions from five people! https://simonwillison.net/2026/Feb/17/rodney/

17.02.2026 23:05 👍 7 🔁 1 💬 0 📌 0

Original post on fedi.simonwillison.net

Last week I introduced Showboat to help coding agents build documents that demonstrate their work - today I'm adding another two complementary tools - Chartroom for CLI charts and datasette-showboat for receiving Showboat documents as they are being built […]

17.02.2026 00:49 👍 1 🔁 0 💬 0 📌 0

Rodney and Claude Code for Desktop I'm a very heavy user of Claude Code on the web, Anthropic's excellent but poorly named cloud version of Claude Code where everything runs in a container environment managed by …

Shared a quick Claude Desktop and Claude Code tip - if you tell Claude Code to look at screenshots of its work you can view them yourself in the chat transcript in the desktop app https://simonwillison.net/2026/Feb/16/rodney-claude-code/

16.02.2026 16:40 👍 3 🔁 0 💬 0 📌 0

Deep Blue We coined a new term on the Oxide and Friends podcast last month (primary credit to Adam Leventhal) covering the sense of psychological ennui leading into existential dread that many …

On the @oxidecomputer and friends podcast last month we (primary credit @ahl) coined the term "Deep Blue" for the sense of psychological ennui leading into existential dread that many software developers are feeling thanks to LLMs right now https://simonwillison.net/2026/Feb/15/deep-blue/

15.02.2026 21:12 👍 6 🔁 8 💬 0 📌 1

How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt This piece by Margaret-Anne Storey is the best explanation of the term cognitive debt I've seen so far. Cognitive debt, a term gaining traction recently, instead communicates the notion that …

Short musings on "cognitive debt" - I'm seeing this in my own work, where excessive unreviewed AI-generated code leads me to lose a firm mental model of what I've built, which then makes it harder to confidently make future decisions https://simonwillison.net/2026/Feb/15/cognitive-debt/

15.02.2026 05:22 👍 38 🔁 55 💬 5 📌 5

Gist Host

@virtuous_sloth neat trick, thanks! https://gisthost.github.io/?502118554e31f9561f4758ae741ed62f

14.02.2026 00:13 👍 1 🔁 0 💬 1 📌 0
