
Ian Bicking

@ianbicking.org

Software developer in Minneapolis. Working with applications of LLMs. Previously: Mozilla, Meta, Brilliant.org

799 Followers · 958 Following · 915 Posts
Joined 01.08.2023

Latest posts by Ian Bicking @ianbicking.org

I guess there's a more general graph that could be made: capacity for capital to replace labor.

05.03.2026 22:31 👍 1 🔁 0 💬 0 📌 0
Post image

I like how in 2026 a common security paradigm is writing a strongly worded letter to the guy in your computer

03.03.2026 21:11 👍 445 🔁 65 💬 18 📌 9

And I freely allow the agent to modify any code to make things more testable. These are big projects that might not pay off, and might add overhead that doesn't justify itself. At least that's what I would have said before, but now this is all so damned cheap.

03.03.2026 22:11 👍 1 🔁 0 💬 0 📌 0

It's like day 2, so I can't report if it works, but it feels good so far. Each test is a story, and documentation for the agent.

I encourage the agent to invest a lot more time into making nice string representations of the objects and situations being tested. A small cost!

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0
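A minimal sketch of the string-representation idea, in Python (the posts describe TypeScript projects; the `Order` class and its fields here are hypothetical, purely for illustration):

```python
class Order:
    """A hypothetical domain object. Investing in a readable repr
    means test failures and doctest output describe the situation,
    not a memory address."""

    def __init__(self, item: str, qty: int, shipped: bool = False):
        self.item = item
        self.qty = qty
        self.shipped = shipped

    def __repr__(self) -> str:
        # One line that tells the story of this object's state.
        status = "shipped" if self.shipped else "pending"
        return f"<Order {self.qty}x {self.item} ({status})>"

print(Order("widget", 3))  # prints: <Order 3x widget (pending)>
```

A small up-front cost for the agent, but every subsequent test failure becomes self-describing.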

5. Testing is probably more useful to the agent as storytelling than as verification. The storytelling also helps with the design!

So I have centered my testing primarily around doctest! (A doctest framework I vibecoded.)

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0
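The doctest framework mentioned above is the author's own vibecoded one, but Python's stdlib `doctest` illustrates the style (the function here is hypothetical): each example is simultaneously a story, documentation, and a test.

```python
import doctest

def normalize_name(name: str) -> str:
    """Collapse whitespace and title-case a name.

    Each example below reads as a little story about the function:

    >>> normalize_name("  ada   lovelace ")
    'Ada Lovelace'
    >>> normalize_name("GRACE HOPPER")
    'Grace Hopper'
    """
    return " ".join(name.split()).title()

if __name__ == "__main__":
    # Runs every >>> example found in this module's docstrings.
    doctest.testmod()
```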

... Maybe I only worked with honest engineers, but I have almost never felt this was justified. The agent isn't entirely honest... but it seems as honest as the people I've worked with, and I suspect whoever came up with blackbox testing worked in low-integrity orgs.

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

3. That is, if the agent has to pass arguments through a dozen functions so something can be mocked deep in the call stack, so be it. It's super good at that.
4. One of the arguments for blackbox testing is that without it people will lie and fake tests...

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0
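A minimal Python sketch of that pattern: thread the dependency through the stack so a test can inject a fake, instead of mocking something buried deep inside (all names and values here are hypothetical):

```python
import time
from typing import Callable

# Deep in the call stack, the function takes its clock explicitly.
def make_stamp(clock: Callable[[], float]) -> str:
    return f"ts-{int(clock())}"

# Outer layers just forward `clock` downward -- tedious for a
# human, trivial for an agent.
def build_record(data: dict, clock: Callable[[], float] = time.time) -> dict:
    return {**data, "stamp": make_stamp(clock)}

# A test injects a deterministic clock; no mocking library needed:
record = build_record({"id": 1}, clock=lambda: 1700000000.0)
assert record["stamp"] == "ts-1700000000"
```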

But there are testing opportunities too:

1. I am willing to make the agent jump through all kinds of hoops. Everything carefully typed. No casts without a justification and eslint override.
2. I never liked blackbox testing, and even less now. Writing code to be testable is annoying, but who cares?

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

5. Testing doesn't seem to help much with design errors. That is: fragile, poorly abstracted, poorly encapsulated designs. Testing CAN help with these things, but not the testing the agent does on its own.
6. Testing is crappy documentation, for me and the agent.

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

3. Code correctness is really high coming from Opus. At least the code you'd unit test. Stuff that would NEED tests when written by a human just... doesn't need tests.
4. That's not to say there are no issues in the code, but the testing it learned from humans doesn't apply well.

03.03.2026 22:11 👍 2 🔁 0 💬 2 📌 0

1. Writing tests before, after, or long after writing code seems to have no effect on quality.
2. Tests written after code seldom find errors.

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

It's kind of okay to make crap tests, because the main cost of crap tests is wasted developer time, and I'm very indifferent to the agent's time. But the tests don't seem very valuable. How do I know I know?

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

But while my old opinions are often invalidated, it doesn't mean I can't form better new opinions. And my better new opinion is that BDD still sucks, and these tests the agent makes with no firm instruction are pretty crap.

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

Anyway, there are quite a few things I hate that I'm letting the agents do if they think it's a good idea. I totally enjoy new programming paradigms and figuring out how to map elegant systems to new ideas and all that, but that's not my job in this system. If it's intuitive to the agent, that's ok.

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

In part I'm not reading the tests any more than I'm reading the code. But also the tests are _much worse_ to read than the code, tedious and obtuse.

Agents (at least in the TypeScript stacks I'm building) love BDD testing. I HATE BDD WITH A FIERY PASSION. It's so, so stupid. I hate it so much.

03.03.2026 22:11 👍 1 🔁 0 💬 1 📌 0

I've been rethinking my testing strategy for agentically coded projects (i.e., vibecoded; where I'm not touching the code directly).

It's easy to just have it go and make tests, and sometimes poke it or add instructions to make more tests. I haven't seen much value from the result...

03.03.2026 22:11 👍 3 🔁 0 💬 1 📌 0

But drawing back to the present: even if you aren't doing autonomous weaponry, the closer you get to _appearing_ to do autonomous weaponry (which includes any speculative testing), the less reliable the AI will become as it begins to reject the requests.

28.02.2026 01:23 👍 0 🔁 0 💬 0 📌 0

With agentic AI you pay a price by tricking the agent. If you can tell the AI plainly what you are trying to do, it makes very good use of that. Which is to say: can the government mandate that it can give the AI any goal, covertly, and expect it to be followed?

28.02.2026 01:23 👍 0 🔁 0 💬 1 📌 0

Even more realistically, the models are trained to not permit themselves to be used for things like that, even if the connection seems covert (e.g., you asked "would it be a PR nightmare if this event was bombed: ...")

You can trick it of course, ask more abstractly, but...

28.02.2026 01:23 👍 1 🔁 0 💬 1 📌 0

I can imagine relying on other AI for direct targeting, but also extending the AI-led process further up into an investigative role (e.g., to identify whether there's a social gathering that coincides with the planned targeting).

("I can imagine" in a regretful and disturbed way)

28.02.2026 01:15 👍 0 🔁 0 💬 1 📌 0
Buy this article

Purchase on SpringerLink
Instant access to the full article PDF.
USD 39.95


25.02.2026 17:58 👍 0 🔁 0 💬 0 📌 0

Also this "lack of evidence" feels like we're being urged not to apply intellect or logic. Why does my teen want to use more social media? Because of their peers. A ban would affect not just my child but also their peers, moving their social communications elsewhere.

25.02.2026 17:56 👍 0 🔁 0 💬 1 📌 0

When they say "not supported by evidence" do they mean the evidence does not support bans, or that there is no evidence that supports bans?

(These two very different meanings make the phrasing feel weaselly, tbh)

25.02.2026 17:53 👍 0 🔁 0 💬 2 📌 0

They look so silly, but the visibility from the inside must be great

23.02.2026 16:33 👍 1 🔁 0 💬 0 📌 0

I made a voice chatbot, which also happened to have a tool to stop the microphone. When I accidentally left it on (but muted) while chatting with my wife, after a brief confusion it politely noticed that we weren't talking to it and turned off the microphone.

18.02.2026 21:43 👍 1 🔁 0 💬 1 📌 0

It also looks like fixation: being so obsessed with anti-commercialization that one only sees the commercial aspects of a thing.

18.02.2026 14:01 👍 4 🔁 0 💬 1 📌 0

Solar power is definitely a success to be proud of… but it also points to an inability on the left to be happy or satisfied with anything. In this case not even due to any critique (solar power is pretty great all around), but an inability to let the gaze linger on success.

18.02.2026 13:19 👍 1 🔁 0 💬 2 📌 0

I can't decide if that's true though. Is it?

Is it specifically true for the humanities?

18.02.2026 05:31 👍 1 🔁 0 💬 0 📌 0

Thinking about thinking, or computing about computing, or mathing about math… I'd be looking for the machine's self-application of whatever it does.

So how would you apply Stockfish to Stockfish? I'm not sure, but if you could I think it would be very interesting!

17.02.2026 19:24 👍 0 🔁 0 💬 1 📌 0

Stockfish doesn't seem to do anything thinking-about-thinking? That's naturally what I would look for on any spectrum of consciousness. Is it able to apply a decision framework to its own decision framework?

17.02.2026 15:15 👍 0 🔁 0 💬 1 📌 0