Ender's Game but the players are all Claude
Respectfully I think you may be doing what the kids refer to as "crashing out"
I do, yes.
this is true and LLMs used to only produce interesting output in the top left corner but nowadays they are spending a lot more time in the bottom right corner
I don't hate you; I just think you might not know what the person you were replying to meant exactly by "refactoring". I think they meant something much more complex than what you inferred.
Or at loving
Yeah this is really more like what I'm saying. In many cases it's not that it's dangerous, just that it's wrong in ways that can be very concretely counterproductive. Although depending on how lost in the sauce you get, it can also be dangerous.
I think there is a good debate in this thread and I respect @sciortino.bsky.social's position (although I believe I am right and he is wrong).
I think this is largely true for small numbers, with a few bizarre exceptions (9.8-9.11), but for large enough numbers (which is a stand-in for arbitrary tasks of long enough duration) it can't tell the difference, and what's more important to it than the guy being right is the guy being confident.
No, not really. For large enough numbers, I don't think the reward function knows the difference between a story about a guy who confidently outputs the wrong answer and a story about a guy who confidently outputs the right answer.
The trouble is you don't have such a verifier on hand for every arbitrary task. If you did, you could probably solve your problem without the LLM in the first place. But there are many tasks (SWE chiefly among them) where such verifiers are fairly easy to come by.
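For instance (an illustrative example of my own, nothing fancy): an ordinary unit test is exactly such a verifier. A proposed patch either passes it or it doesn't.

```python
import json

# An ordinary unit test as an SWE verifier (illustrative example): any
# proposed patch either passes this check or it doesn't.
def test_roundtrip():
    assert json.loads(json.dumps({"a": 1})) == {"a": 1}

test_roundtrip()  # silence means the patch cleared the verifier
```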
The LLM is indifferent between outputting the correct product of two large numbers and outputting an incorrect one, because in the fictional story it tells about a guy who does a good job, the fictional product he comes up with is "correct" either way.
But like I was saying in the forked thread, if I put that into a system with some kind of verifier loop, then I have something with the goal of exiting the verifier loop.
I'd say modern LLMs have the goal of producing a sequence of tokens that yields a high reward without veering too far from the pre-training distribution. Practically that means outputting a little story about a guy who does a good job.
I do think that the picture is complicated by so-called scaffolding, and this is a huge and underrated part of the success of things like Claude Code (and I'm sure what you're working on as well!). If at each step in the multiplication there was a check that the step had been carried out correctly,
and if it wasn't, the LLM had to go back and try again in a loop until the step was carried out correctly, now I argue we've got something more goal-pursuey. The scaffolding creates the feedback loop which makes it more like a rational agent and less like a fancy autocomplete (positive affect).
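A minimal sketch of the kind of loop I mean, assuming a generic `llm` callable and an exact per-step check (all names here are made up):

```python
# Minimal sketch of a verifier loop (all names hypothetical): `llm` is any
# prompt-in, text-out callable and `check` is an exact verifier, which we
# happen to have for arithmetic steps.

def verified_step(llm, prompt: str, check, max_retries: int = 10) -> str:
    """Ask the model to carry out one step; loop until the verifier passes."""
    for _ in range(max_retries):
        answer = llm(prompt)
        if check(answer):
            return answer  # the only way out is a verified answer
        prompt += f"\nYour answer {answer!r} failed verification. Try again."
    raise RuntimeError("no verified answer within the retry budget")

# e.g. one partial product in a long multiplication:
step_ok = lambda ans: ans.strip() == str(7 * 8)
# verified_step(llm, "What is 7 * 8? Reply with just the number.", step_ok)
```

The loop, not the model, is what supplies the insistence on exiting only with a verified step.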
Of course no one cares about multiplying large integers per se, but people do care about many tasks that I would argue are analogous to multiplying large integers in that they involve correctly carrying out a sequence of steps with no mistakes.
In particular, if the reason it can't multiply is that it just isn't smart enough, then at some point making it smarter should lead to it gaining the ability to multiply. But if the reason is it just doesn't want to, then you need some other strategy; you need to make it want to.
I'm not really concerned with the nature of humans, squirrels, or pet cats. I'm just talking about LLMs. I think it matters practically because I think it has implications about what you can expect them to be able to reliably do, and what "scaling" as opposed to other strategies can help with.
It's true that many IDEs have a "Refactor" menu much like a word processor has an "Edit" menu, and the Refactor menu is a list of tools for refactoring analogous to Cut, Paste, Find and Replace, etc., but that wouldn't make editing writ large a "very low bar"
I wonder what this person thinks refactoring is
I can see that you don't understand what I'm saying and I think that's okay.
I think if you model your 7yo as an agent who optimizes in the rational pursuit of multiplying large numbers you will be led to make some bad predictions about what your 7yo actually does
correct, I don't think your 7yo's goal, in the sense of being an objective which your 7yo rationally pursues, is to multiply large numbers.
you're not going to like this answer but, for example, multiplying large integers. You're going to say, oh it's just bad at that. But it's good at all the parts of it. If you can multiply small integers you can multiply large integers and it can multiply small integers. If it wanted to it would.
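To spell out "good at all the parts" (a standard schoolbook sketch of my own, not anything the model runs): large-integer multiplication is nothing but single-digit products plus carries.

```python
# Schoolbook long multiplication reduced entirely to small-integer subtasks:
# single-digit products plus carries. Each subtask is trivial; the whole task
# fails unless every one of them is carried out without a mistake.

def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative integers given as decimal strings."""
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        for j, db in enumerate(reversed(b)):
            result[i + j] += int(da) * int(db)        # small-integer multiply
            result[i + j + 1] += result[i + j] // 10  # carry to the next column
            result[i + j] %= 10                       # keep a single digit
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

assert long_multiply("123456789", "987654321") == str(123456789 * 987654321)
```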
The difference is that an agent which wants to do a task T which is accomplished by correctly carrying out subtasks X, Y, Z, each of which subtasks it appears to be "able" to carry out competently, will always accomplish T. Whereas a random bumbler will only sometimes or never accomplish T,
and its failures to accomplish T will seem quite baffling given what you understand about its ability on tasks X, Y, Z and the assumption that it wants to accomplish T, but will appear quite normal and expected if you abandon the notion that it gives a damn one way or the other about T.
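To put illustrative numbers on the bumbler case (figures assumed, not measured): if each of n independent subtasks succeeds with probability p, the whole task succeeds with probability p**n, which collapses as n grows.

```python
# Illustrative arithmetic: 99% reliability per step, 100 steps.
p, n = 0.99, 100
print(p ** n)  # ~0.366: the full task still fails most of the time
```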
This is precisely where we disagree, yes. I think you are the one who is confusing "It doesn't want to do X" with "It's very bad at X", or more precisely "It rarely succeeds at X." I do think the distinction has practical and pragmatic consequences.
Yeah exactly
You'll get some output like "hello there I am a paperclip maximizer" but it won't actually maximize paperclips.