
Tom Johnson

@tomjohnson3

CTO at Multiplayer.app: autonomous AI debugging in production. Also: πŸ€– robot builder πŸƒβ€β™‚οΈ runner 🎸 guitar player

271
Followers
485
Following
357
Posts
09.12.2023
Joined

Latest posts by Tom Johnson @tomjohnson3

Spec-driven development: the rebranded BDUF
We've come full circle to Big Design Up Front… but that's not a bad thing.

More of my thoughts on this topic here: https://beyondruntime.substack.com/p/spec-driven-development-the-rebranded

05.03.2026 12:55 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

If your team isn’t discussing Spec-Driven Development or how your design decisions are documented, versioned, and shared, you’re undermining your AI tooling strategy.

05.03.2026 12:55 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

AI agents (especially when running in parallel) don’t operate well on vague intent. They need precise, well-reasoned specifications to make consistent decisions.
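A toy sketch of what "precise, well-reasoned specification" can mean in practice: structured, versionable data an agent can read, instead of vague chat intent. Every name and field below is invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical example: a design decision captured as structured,
# versionable data instead of chat intent. All fields are invented.
@dataclass
class EndpointSpec:
    path: str
    method: str
    auth_required: bool
    rate_limit_per_min: int
    error_contract: dict = field(default_factory=dict)

spec = EndpointSpec(
    path="/v1/orders",
    method="POST",
    auth_required=True,
    rate_limit_per_min=60,
    error_contract={409: "duplicate idempotency key"},
)

# An agent (or a reviewer) can now make a consistent decision by
# reading the spec, not by re-guessing from a one-line prompt.
assert spec.auth_required and spec.rate_limit_per_min > 0
```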

05.03.2026 12:55 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Most engineers tune out when they hear β€œsystem design.”

It feels like overhead.

Until AI forces you to care about it.

05.03.2026 12:55 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

If your data is scattered across tools, aggressively sampled, and missing payloads … AI can't magically correlate it for you.

You're automating a broken workflow.
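A back-of-the-envelope simulation (no real tooling, numbers are illustrative) of why independent aggressive sampling destroys correlatability:

```python
import random

random.seed(0)
requests = [f"req-{i}" for i in range(10_000)]

# Two tools that each independently keep an aggressive 10% sample.
tool_a = {r for r in requests if random.random() < 0.10}
tool_b = {r for r in requests if random.random() < 0.10}

# Only requests sampled by BOTH tools can be correlated end-to-end.
both = tool_a & tool_b
print(f"correlatable end-to-end: {len(both) / len(requests):.1%}")
# Independent 10% samples overlap on roughly 1% of traffic, so ~99%
# of incidents have no complete data to correlate, AI or not.
```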

02.03.2026 15:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
At least one major incident will be traced back to an AI coding tool
It's only a matter of when.

Full article: https://beyondruntime.substack.com/p/a-major-incident-will-be-traced-back

26.02.2026 10:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Why is this worrisome? Because the latest State of Code Developer Survey from Sonar reports that the share of AI-generated or significantly AI-assisted code will jump to 65% by 2027.

This effectively means that very soon (if not already) a major incident will be traced back to an AI coding tool.

26.02.2026 10:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

So the net effect of AI-assisted development is that we’ve offloaded the part developers are generally comfortable with (writing code), and left them with the part that’s harder (system design, reviews, debugging, etc.), but without the context built naturally by doing the writing themselves.

26.02.2026 10:48 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Reading and understanding someone else’s code is significantly harder than writing code yourself. AI-generated code is, in every meaningful sense, someone else’s code.

26.02.2026 10:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Adding AI to legacy observability practices won't make debugging faster.

It'll just amplify the problem.

24.02.2026 14:45 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

The talk covers the modern telemetry data problem, why most MCP implementations inherit broken observability practices, and the path to self-healing systems that can actually act on the right data.

Full agenda: leaddev.com/leaddev-lond...

23.02.2026 21:06 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Built an MCP server recently? Did developers actually use it, or is it collecting dust? 😡

I'll be speaking at @leaddev.com (June 1-2) about why "connecting AI to everything" doesn't work and what actually does when building tools that move from assistants to autonomous agents.

See you there? πŸ‡¬πŸ‡§

23.02.2026 21:06 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ‘€Β IMO, the missing piece of the puzzle is having runtime visibility into your system, auto-correlated across the stack.

20.02.2026 11:42 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

β–Ά Reading unfamiliar code is exhausting. Now imagine that code is coming from an LLM that writes faster than you can think and doesn't take lunch breaks β—€

The AI productivity paradox: code is written faster than ever, but humans can’t keep up with manually reviewing it (or debugging it)!

20.02.2026 11:42 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
The AI guardrails problem
Why "add paranoia" is now part of the framework

You need the right runtime context to avoid flying blind.

More about this here

beyondruntime.substack.com/p/the-ai-gua...

19.02.2026 15:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

You can't reproduce non-deterministic behavior. You need the actual context from when it happened: the prompts, the reasoning, the state of the system, what external services returned.

The guardrails problem is about both safety AND observability.
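A minimal sketch of that idea, assuming a hypothetical `record_llm_call` wrapper and a stubbed model (nothing here is a real API):

```python
import json, time, uuid

def record_llm_call(ledger, prompt, model_fn, system_state):
    """Record everything needed to understand a non-deterministic call
    after the fact: since you cannot reproduce it, you must capture it.
    `model_fn` stands in for any LLM client; all names are invented."""
    entry = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "prompt": prompt,
        "system_state": system_state,  # flags, versions, upstream inputs
    }
    entry["response"] = model_fn(prompt)
    ledger.append(entry)
    return entry["response"]

ledger = []
record_llm_call(
    ledger,
    "classify this support ticket",
    model_fn=lambda p: "billing",  # stubbed model for the sketch
    system_state={"flags": {"new_router": True}, "app_version": "1.4.2"},
)

# Debugging later reads the ledger instead of trying to re-run the model.
print(json.dumps(ledger[0]["system_state"]))
```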

19.02.2026 15:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

When an LLM writes code for you, the output is deterministic: you can read it, test it, fix it (even if it's harder than writing it yourself).

But when an AI agent makes decisions in prod (e.g. which API to call, how to respond to a user, etc.) traditional debugging breaks down.

19.02.2026 15:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Debugging AI-generated code is annoying (and time consuming).

Debugging AI-as-a-runtime-component is a completely different problem.

19.02.2026 15:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Automatic Data Correlation in Observability
Your observability stack is complete. So why does debugging still take hours, sifting through data across eight different tools?

Most teams lose hundreds of engineering hours per month to correlation tax. Time that could be spent shipping features instead of hunting for information.

Read the full breakdown in my latest post: dzone.com/articles/aut...

17.02.2026 09:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

(1) Track your last 5 incidents. How much time did you spend finding the data vs writing the fix?

(2) Which types of bugs required the most time to hunt for data?

(3) Is your observability stack solving this or just shifting it around?

17.02.2026 09:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

So engineers spend hours playing detective: copying request IDs between tools, matching timestamps, manually piecing together what happened.

Here's how to determine whether your team has a correlation problem: πŸ‘‡

17.02.2026 09:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Teams that fix the data correlation problem ship faster and debug smarter. But what exactly is it?

It’s when understanding what went wrong takes longer than writing the fix, because the information is scattered across different places: Sentry, Stripe, LogRocket, and several APM tools.
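One common mitigation is a single correlation ID, set once at the edge and stamped on every log line. A minimal standard-library sketch (all names are illustrative):

```python
import logging, uuid
from contextvars import ContextVar

# One correlation ID attached to every log line for a request, so no
# human has to copy request IDs between tools by hand later.
request_id: ContextVar[str] = ContextVar("request_id", default="-")

class CorrelationFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id.get()
        return True

logger = logging.getLogger("app")
logger.addFilter(CorrelationFilter())
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request():
    request_id.set(uuid.uuid4().hex[:8])  # set once per request
    logger.info("charge started")          # same ID on every line below
    logger.info("charge failed: card_declined")

handle_request()
```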

17.02.2026 09:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Traditional software:Β `Input A β†’ Output B`Β (always, reliably, debuggably)

AI-generated software:Β `Input A β†’ Output B... or B', or B'', or sometimes C`

That’s why πŸ‘† debugging AI-generated code is tricky.
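The contrast above in toy form (pure illustration, not a real model):

```python
import random

def traditional(a):
    # Input A -> Output B: you can diff it, bisect it, unit-test it.
    return a * 2

def ai_like(a, rng=random):
    # Same input, a distribution of outputs: B, B', B'', sometimes C.
    return a * 2 + rng.choice([0, 0, 0, 1, -1])

assert all(traditional(21) == 42 for _ in range(100))
print({ai_like(21) for _ in range(100)})  # more than one answer
```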

12.02.2026 13:16 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

Five screenshots. Three 'can you also send...' messages. Twenty minutes later, you finally have enough context to start debugging.

Have you ever had (or witnessed) a similar conversation in Slack? πŸ‘‡

05.02.2026 13:49 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

Agreed. AI excels at pattern recognition (code review, static analysis) and iterative debugging when it has the *right context*.

That's the key: what you feed them. AI debugging works when you give it complete context: user session replays, full traces, request/response data, runtime behavior.

29.01.2026 14:02 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

AI debugging is great... until you hit the context bottleneck. In particular, you need runtime data: what the user did, what the backend processed, what came back, etc.

Without that, you're indeed getting vibes, not diagnosis or solutions.

29.01.2026 13:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Where AI coding tools systematically fail
They can dramatically accelerate routine development, but they’re not silver bullets

Full article: open.substack.com/pub/beyondru...

29.01.2026 09:29 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

The five blind spots:

(1) Runtime visibility
(2) Hallucinations: Compiles β‰  works correctly
(3) Narrow debugging context
(4) Performance: No awareness of memory, concurrency, scaling bottlenecks
(5) Architecture: Missing organizational context, budget constraints, Conway's Law
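On blind spot (2), a toy illustration of "compiles β‰  works correctly": this runs without error and passes a casual smoke test, yet is silently wrong (invented example, not from any real tool):

```python
def median(xs):
    # Syntactically fine, runs, and passes a quick check on odd-length
    # input, yet it is silently wrong for even-length input. No compiler
    # or linter will ever flag this.
    return sorted(xs)[len(xs) // 2]

assert median([3, 1, 2]) == 2   # looks correct
print(median([1, 2, 3, 4]))     # prints 3; the true median is 2.5
```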

29.01.2026 09:29 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

AI tools are exceptional at the 99.2%. They struggle with the 0.8% that actually matters.

I just published a breakdown of where AI systematically fails πŸ‘‡

29.01.2026 09:29 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Marc Donner: β€œOf all my programming bugs, 80% are syntax errors. Of the remaining 20%, 80% are trivial logical errors. Of the remaining 4%, 80% are pointer errors. And the remaining 0.8% are hard.”

This sentiment is surprisingly relevant for AI tools.

29.01.2026 09:29 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0