Agentic manual testing - Agentic Engineering Patterns
New chapter: Agentic manual testing - about how having agents "manually" try out code is a useful way to help them spot issues that might not have been caught by their automated tests https://simonwillison.net/guides/agentic-engineering-patterns/agentic-manual-testing/
06.03.2026 16:50
👍 3
🔁 2
💬 0
📌 0
The New York Earth Room
On the second floor of 141 Wooster Street in New York's SoHo district there is a 3,600-square-foot room filled with earth - 280,000 pounds of it, first installed in 1977 and maintained there ever since. This is the New York Earth Room, a piece of installation art by Walter De Maria, originally planned as a three-month exhibition which has now stretched into its sixth decade. This is actually the third instance of the Earth Room, a sequel to the 1968 Earth Room in Galerie Heiner Friedrich in Munich and a second in 1974 at Hessisches Landesmuseum in Darmstadt. Only this edition survives. The exhibit is owned and maintained by the Dia Art Foundation, who also own the entire second floor. The foundation was established in 1974 by Philippa de Menil, an heiress to the Schlumberger oil exploration fortune, her husband art dealer Heiner Friedrich, and Houston art historian Helen Winkler. The [foundation's mission](https://www.diaart.org/about/about-dia) includes "to help artists achieve visionary projects that might not otherwise be realized because of scale or scope." The Earth Room is a prominent example. The earth itself is a mixture of peat and bark, most of which is the original earth from the 1970s. The curators till the soil twice a year and occasionally wet it to avoid it turning into dust. They topped it up with fresh soil in 2022 to compensate for it compacting down over the years. The Earth Room is free to visit but guests are asked not to take any photographs to respect the wishes of the artist. It has a curator who will answer questions about the artwork - painter Bill Dilworth staffed the desk from 1989 until his retirement in 2024.
I went to the New York Earth Room! It's 280,000 pounds of soil in a loft in SoHo that's been there mostly unchanged since 1977 https://www.niche-museums.com/117
04.03.2026 23:00
👍 4
🔁 3
💬 0
📌 0
Original post on fedi.simonwillison.net
I started a new chapter of my Agentic Engineering Patterns guide about anti-patterns - things NOT to do
So far I only have one: Inflicting unreviewed code on collaborators, aka dumping a thousand line PR without even making sure it works first […]
04.03.2026 18:20
👍 4
🔁 0
💬 0
📌 0
Something is afoot in the land of Qwen
I’m behind on writing about Qwen 3.5, a truly remarkable family of open weight models released by Alibaba’s Qwen team over the past few weeks. I’m hoping that the 3.5 …
Published some notes on the situation at Qwen - they released the Qwen 3.5 family (an outstanding family of open weight models) but now their lead researcher and several others all appear to have resigned within the past 24 hours https://simonwillison.net/2026/Mar/4/qwen/
04.03.2026 15:53
👍 2
🔁 3
💬 0
📌 0
Original post on fedi.simonwillison.net
I started a new section of my Agentic Engineering guide for annotated versions of prompts I've used for projects - the first is a prompt I used to have Claude Code for web build me a web UI for compressing GIFs using a WebAssembly build of Gifsicle […]
02.03.2026 17:20
👍 7
🔁 1
💬 1
📌 0
5. "No new chicks for four years (due to a lack of fruiting rimu trees)" The phrasing "lack of fruiting rimu trees" is slightly imprecise. The issue isn't that rimu trees failed to fruit at all, but that there was no mass fruiting (masting) event, which is the specific trigger for kākāpō breeding. Consider "due to a lack of rimu masting" or "due to a lack of mass rimu fruiting."
Sent the February edition of my sponsors-only newsletter - a summary of my last month of blogging for people who want to pay for a shorter version
I use Claude as a proofreader and fact checker, was delighted that it called me out on this Kākāpō […]
Original post on fedi.simonwillison.net
02.03.2026 14:58
👍 0
🔁 0
💬 0
📌 0
Interactive explanations - Agentic Engineering Patterns - Simon Willison's Weblog
New chapter of my Agentic Engineering Patterns guide. This one is about having coding agents build custom interactive and animated explanations to help fight back against cognitive debt https://simonwillison.net/guides/agentic-engineering-patterns/interactive-explanations/
28.02.2026 23:14
👍 15
🔁 1
💬 0
📌 0
Built a fun prototype this morning of binary search using HTTP range requests, in this case to look up characters in ~77MB of Unicode data https://simonwillison.net/2026/Feb/27/unicode-explorer/
27.02.2026 18:02
👍 0
🔁 0
💬 0
📌 0
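A minimal sketch of the idea in the post above, not the prototype's actual code: if the data is a sorted file of fixed-width records, a client can binary search it by fetching one record per probe with a `Range: bytes=start-end` request instead of downloading the whole file. The record layout, `RECORD_SIZE`, and every function name here are assumptions made for illustration, and the HTTP call is replaced by an in-memory stand-in so the example runs offline.

```python
RECORD_SIZE = 16  # hypothetical layout: 6-char hex code point + 10-char padded name


def make_records(entries):
    """Build a sorted, fixed-width byte blob from (codepoint, name) pairs."""
    rows = []
    for codepoint, name in sorted(entries):
        rows.append(f"{codepoint:06X}{name[:10]:<10}".encode("ascii"))
    return b"".join(rows)


def fake_range_fetch(blob):
    """Stand-in for an HTTP GET carrying a `Range: bytes=start-end` header."""
    calls = []

    def fetch(start, end):
        calls.append((start, end))
        return blob[start:end + 1]  # HTTP byte ranges are inclusive

    fetch.calls = calls  # record each probe so the request count can be inspected
    return fetch


def lookup(codepoint, total_size, fetch):
    """Binary search the remote file, fetching one record per probe."""
    lo, hi = 0, total_size // RECORD_SIZE - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = mid * RECORD_SIZE
        row = fetch(start, start + RECORD_SIZE - 1).decode("ascii")
        key = int(row[:6], 16)
        if key == codepoint:
            return row[6:].rstrip()
        if key < codepoint:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


blob = make_records([
    (0x41, "LATIN A"),
    (0x1F600, "GRINNING"),
    (0x3B1, "ALPHA"),
    (0x20AC, "EURO"),
])
fetch = fake_range_fetch(blob)
print(lookup(0x20AC, len(blob), fetch))  # EURO
print(len(fetch.calls))                  # number of range requests issued
```

Each probe transfers only `RECORD_SIZE` bytes, so the number of requests grows with the logarithm of the record count rather than with the file size. A real client would also need the total file size up front, for example from a `Content-Length` or `Content-Range` response header.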
Hoard things you know how to do - Agentic Engineering Patterns - Simon Willison's Weblog
Today's chapter of Agentic Engineering Patterns is some good general career advice which happens to also help when working with coding agents: Hoard things you know how to do https://simonwillison.net/guides/agentic-engineering-patterns/hoard-things-you-know-how-to-do/
26.02.2026 21:14
👍 3
🔁 1
💬 0
📌 0
Original post on fedi.simonwillison.net
I've been having good results recently asking coding agents to provide "a linear walkthrough of the code that explains how it all works in detail" - I demonstrated that against this vibe coded Swift codebase and wrote up the technique here […]
25.02.2026 16:57
👍 2
🔁 1
💬 1
📌 0
"Red/green TDD" talks about how you can get much better results from most coding agents by encouraging them to use test-first development where they watch the tests fail before building an implementation that lets them pass https://simonwillison.net/guides/agentic-engineering-patterns/red-green-tdd/
23.02.2026 17:54
👍 5
🔁 4
💬 0
📌 0
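A small sketch of the red/green loop described in the post above, under stated assumptions: the `slugify` function, its expected behaviour, and the `run_test` helper are all hypothetical examples, not taken from the guide. The test is written first and run against a stub so it is seen to fail (red), then run again once an implementation exists (green).

```python
import re


def run_test(slugify):
    """Return True if the candidate passes the test, False if it fails."""
    try:
        assert slugify("Hello, World!") == "hello-world"
        assert slugify("  Red/green   TDD ") == "red-green-tdd"
        return True
    except (AssertionError, NotImplementedError):
        return False


def slugify_stub(text):
    """Red phase: the function exists but nothing is implemented yet."""
    raise NotImplementedError


def slugify(text):
    """Green phase: the simplest implementation that satisfies the test."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


print("red:", run_test(slugify_stub))  # the test fails before the code exists
print("green:", run_test(slugify))     # the same test passes afterwards
```

Watching the failure first is the point: it shows the test really exercises the new behaviour, rather than passing for an unrelated reason.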
Original post on fedi.simonwillison.net
"Writing code is cheap now" talks about the central challenge of agentic engineering: the cost to churn out working code has dropped to almost nothing, how does that impact our existing intuitions about how we work, both individually and as a team? […]
23.02.2026 17:53
👍 4
🔁 2
💬 1
📌 0
Andrej Karpathy talks about “Claws”
Andrej Karpathy tweeted a mini-essay about buying a Mac Mini ("The apple store person told me they are selling like hotcakes and everyone is confused") to tinker with Claws: I'm …
New term of art is brewing: "Claw" as the noun for OpenClaw-like agent systems, AI agents that generally run on personal hardware, communicate via messaging protocols and can both act on direct instructions and schedule tasks https://simonwillison.net/2026/Feb/21/claws/
21.02.2026 00:40
👍 6
🔁 0
💬 2
📌 0
Sonnet 4.6
Extended
4 artifacts
User: Clone simonw/simonwillisonblog and tell me about the models and views
Thought: Synthesized comprehensive overview of models...
Here's a comprehensive overview of the models and views in simonw/simonwillisonblog - a Django-powered personal blog:
Models
blog app — the core
The blog is built around a shared BaseModel abstract class that all content types inherit from. It provides: created, slug, tags (M2M), metadata (JSON), search_document (Postgres full-text search vector), is_draft, card_image, and series (FK).
Fun bonus tip: regular Claude chat (not Claude Code) has the ability to clone repos from GitHub these days, which means you can ask it to check out ANY public repo and answer questions about it or even use it as a starting point for an artifact!
21.02.2026 00:01
👍 2
🔁 0
💬 0
📌 0
Original post on fedi.simonwillison.net
Shared some thoughts on ggml-ai joining Hugging Face - they've been a good steward of the crucial Transformers open source library so I'm optimistic that great things are ahead for ggml-ai, which kicked off the local model revolution back in March 2023 […]
20.02.2026 17:14
👍 4
🔁 0
💬 1
📌 0
Screenshot of a blog post update. Text reads: "Update: In What happens if AI labs train for pelicans riding bicycles? last November I said:" followed by a blockquote: "If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I'm going to test it on all manner of creatures riding all sorts of transportation devices." Then: "Google's Gemini Lead Jeff Dean tweeted this video featuring an animated pelican riding a bicycle, plus a frog on a penny-farthing and a giraffe driving a tiny car and an ostrich on roller skates and a turtle kickflipping a skateboard and a dachshund driving a stretch limousine." Below are two side-by-side AI-generated SVG images labeled "Gemini 3 Pro" and "Gemini 3.1 Pro", both showing a dachshund driving a black stretch limousine. The Gemini 3 Pro version has a bright blue sky background, while the Gemini 3.1 Pro version has a sunset cityscape with a street lamp. At the bottom: "Prompt: Generate an animated 4:3 SVG of a dachshund driving a stretch limousine"
Had to update my blog post after I saw Google's Jeff Dean had posted a video of an SVG animated pelican riding a bicycle, plus a frog on a penny-farthing, a giraffe driving a tiny car, an ostrich on roller skates, a turtle kickflipping a skateboard, and a dachshund driving a stretch limousine
19.02.2026 19:43
👍 0
🔁 0
💬 0
📌 0
Gemini 3.1 Pro
The first in the Gemini 3.1 series, priced the same as Gemini 3 Pro ($2/million input, $12/million output under 200,000 tokens, $4/$18 for 200,000 to 1,000,000). They boast about its …
Gemini 3.1 Pro produced an excellent pelican riding a bicycle SVG but took over 5 minutes to do it - I'm pretty sure that's just teething problems on launch day though, I got a few error messages about capacity while trying it out. https://simonwillison.net/2026/Feb/19/gemini-31-pro/
19.02.2026 17:59
👍 2
🔁 1
💬 2
📌 0
Errors now use exit code 2, which means exit code 1 is just for check failures. #15
New rodney assert command for running JavaScript tests, exit code 1 if they fail. #19
New directory-scoped sessions with --local/--global flags. #14
New reload --hard and clear-cache commands. #17
New rodney start --show option to make the browser window visible. Thanks, Antonio Cuni. #13
New rodney connect PORT command to debug an already-running Chrome instance. Thanks, Peter Fraenkel. #12
New RODNEY_HOME environment variable to support custom state directories. Thanks, Senko Rašić. #11
New --insecure flag to ignore certificate errors. Thanks, Jakub Zgoliński. #10
Windows support: avoid Setsid on Windows via build-tag helpers. Thanks, adm1neca. #18
Tests now run on windows-latest and macos-latest in addition to Linux.
New release of Rodney, my CLI tool for browser automation (designed for use by coding agents and with Showboat) - contributions from five people! https://simonwillison.net/2026/Feb/17/rodney/
17.02.2026 23:05
👍 7
🔁 1
💬 0
📌 0
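The exit-code convention in the release notes above (0 for success, 1 for a failed check, 2 for an error) can be illustrated with a generic sketch. This is not Rodney's implementation and does not call Rodney; the constant names and `exit_code_for` are hypothetical. It only shows why separating the two non-zero codes helps a calling agent tell "the assertion was false" apart from "the tool could not run the check at all".

```python
EXIT_OK = 0            # check ran and passed
EXIT_CHECK_FAILED = 1  # check ran and reported a failure
EXIT_ERROR = 2         # check could not be run (crash, bad input, and so on)


def exit_code_for(check):
    """Map the outcome of a zero-argument check function to an exit code."""
    try:
        passed = check()
    except Exception:
        return EXIT_ERROR
    return EXIT_OK if passed else EXIT_CHECK_FAILED


def broken_check():
    raise RuntimeError("could not reach the page")


print(exit_code_for(lambda: 1 + 1 == 2))  # 0
print(exit_code_for(lambda: 1 + 1 == 3))  # 1
print(exit_code_for(broken_check))        # 2
```

A command-line tool following this pattern would pass the returned value to `sys.exit`, which lets a shell script or coding agent branch on the result without parsing any output.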
Original post on fedi.simonwillison.net
Last week I introduced Showboat to help coding agents build documents that demonstrate their work - today I'm adding another two complementary tools - Chartroom for CLI charts and datasette-showboat for receiving Showboat documents as they are being built […]
17.02.2026 00:49
👍 1
🔁 0
💬 0
📌 0
Deep Blue
We coined a new term on the Oxide and Friends podcast last month (primary credit to Adam Leventhal) covering the sense of psychological ennui leading into existential dread that many …
On the @oxidecomputer and friends podcast last month we (primary credit @ahl) coined the term "Deep Blue" for the sense of psychological ennui leading into existential dread that many software developers are feeling thanks to LLMs right now https://simonwillison.net/2026/Feb/15/deep-blue/
15.02.2026 21:12
👍 6
🔁 8
💬 0
📌 1
Gist Host
@virtuous_sloth neat trick, thanks! https://gisthost.github.io/?502118554e31f9561f4758ae741ed62f
14.02.2026 00:13
👍 1
🔁 0
💬 1
📌 0