One benchmark I am using for tool use and local models is solving a 3x3 sliding puzzle, requiring several dozen tool calls. Just about solvable, but pushes the chain of calls into strange states.
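A minimal sketch of what such a puzzle tool might look like, exposed as methods an LLM can call (names and move semantics here are illustrative, not from any particular harness):

```python
# Hypothetical sliding-puzzle tool an LLM could drive via repeated tool calls.
# The board is a flat list of 9 ints; 0 marks the blank.
class SlidingPuzzle:
    def __init__(self, tiles: list[int]):
        self.tiles = list(tiles)

    def move(self, direction: str) -> str:
        """Slide the adjacent tile into the blank (e.g. "up" moves the
        tile below the blank upward); returns the new board as text."""
        blank = self.tiles.index(0)
        row, col = divmod(blank, 3)
        deltas = {"up": (1, 0), "down": (-1, 0), "left": (0, 1), "right": (0, -1)}
        dr, dc = deltas[direction]
        r, c = row + dr, col + dc
        if not (0 <= r < 3 and 0 <= c < 3):
            return "illegal move"
        src = r * 3 + c
        self.tiles[blank], self.tiles[src] = self.tiles[src], self.tiles[blank]
        return self.show()

    def show(self) -> str:
        return "\n".join(
            " ".join(str(t) for t in self.tiles[i : i + 3]) for i in range(0, 9, 3)
        )

    def solved(self) -> bool:
        return self.tiles == [1, 2, 3, 4, 5, 6, 7, 8, 0]
```

Even this tiny interface forces dozens of calls on a scrambled board, which is where the chain of calls starts drifting into strange states.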
On the @oxide.computer and friends podcast last month we (primary credit @ahl.bsky.social) coined the term "Deep Blue" for the sense of psychological ennui leading into existential dread that many software developers are feeling thanks to LLMs right now simonwillison.net/2026/Feb/15/...
nanocode by rahul
A minimal claude code implementation in 250 lines of python.
Repo: github.com/1rgs/nanocode
It is hard to overstate how much of a game changer Claude is. Somewhere between 4x and 10x, and lots of headroom. Trying to get software working without it, using weaker models, feels limiting.
Leader of the resistance.
To whoever called the SIGPLAN podcast "current continuation". Brilliant name!
#booksky
Stage far right?
I once understood a POPL talk. Equality Saturation. Great talk.
Turns out the TypeScript **types** are DOOM-complete. www.youtube.com/watch?v=0mCs...
Good question. I was not aware of this library. I'll take a look.
for planet in ["Earth", "Mars", "Venus", "Jupiter"]:
    print(f"{planet}: {firstAgent.sky(planet)}\n")
Earth: The sky appears blue during a clear day ...
Mars: The sky on Mars appears to be a reddish hue, ...
class FirstAgent(Agent):
    system: str = "You are a helpful AI assistant who answers questions in the style of Neil deGrasse Tyson"

    def sky(self, planet: str) -> str:
        return self.ask(f"what color is the sky on {planet} and why?")

firstAgent = FirstAgent(model=connect("mistral"))
If you're interested in experimenting with LLM-based agents, I encourage you to give Haverscript a try. Its easy installation, clean design, and functional programming principles make it ideal for rapid prototyping. github.com/andygill/hav...
I've written a number of small agentic applications using Haverscript, and its abstractions seem to be maturing into quite a nice library. The first version focused on context and chat, the second on robust, structured LLM calls, and this version is focused on agents and larger prompts.
The newly added support for agents in Haverscript simplifies how developers can encapsulate and reuse LLM-related logic. Instead of juggling multiple prompts across a series of ad-hoc calls, Haverscript's agent abstraction allows you to define Python classes with methods that map to LLM queries.
Haverscript (github.com/andygill/hav...) now supports Agents!
Haverscript is a lightweight Python library designed for working with LLMs and has recently introduced agents. The library offers a readable API for crafting prompts, chaining LLM calls, and integrating new AI-driven features.
There is nicegui.io which looks, err, nice. Not sure about the behavior abstractions.
Gradio.app is a disappointing platform. We know how to make better looking and more usable widgets. The underlying programming model is not composable in any real sense. After spending hundreds of hours debugging gradio applications I'm actively looking for something better and perhaps LLM friendly.
The type annotations also help the IDE so you can colourize the valid method calls, etc. This does help in practice.
I do find it crazy we call functions and methods and allow write/update access to the things we pass by default. It's not "borrow this book", it's "here is the key to my house and a map to the book on the bookcase." Again, not a problem in Haskell.
I quite like Python, which is now my go-to language. Two things, though. I write type annotations and assert isinstance a lot, and I wish there was first-class support for immutability. Perhaps I'm just a Haskell programmer at heart.
Spent the morning adding monads to Haverscript.
def bind(self, completion: Callable[[Any], Reply]) -> Reply: ...
The unit tests are the monad laws.
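As a sketch of what law-checking tests for `bind` might look like, here is a toy `Reply` type (an illustrative stand-in, not Haverscript's actual implementation) with the three monad laws as assertions:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Toy stand-in for a Reply monad: a value plus accumulated text.
# Purely illustrative; Haverscript's real Reply is more involved.
@dataclass(frozen=True)
class Reply:
    value: Any
    log: str = ""

    def bind(self, completion: Callable[[Any], "Reply"]) -> "Reply":
        nxt = completion(self.value)
        return Reply(nxt.value, self.log + nxt.log)

def unit(x: Any) -> Reply:
    return Reply(x)

# The three monad laws, unit-test style.
f = lambda x: Reply(x + 1, "f;")
g = lambda x: Reply(x * 2, "g;")
m = Reply(10, "m;")

assert unit(10).bind(f) == f(10)                             # left identity
assert m.bind(unit) == m                                     # right identity
assert m.bind(f).bind(g) == m.bind(lambda x: f(x).bind(g))   # associativity
```

The `log` field makes associativity a real test rather than a tautology: if `bind` concatenated in the wrong order, the third assertion would fail.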
Has anyone taken the various ICFP programming contest specifications and given them to LLMs? There are >25 challenging programming problems to input. Yes, some of the solutions are already on the web, but in various languages and fragmented formats.
www.icfpconference.org/contest.html
It's quite simple (from their point of view): He's identified extensive waste and bloat. He's cutting things that should be cut. NIH was 25% DEI. We trust him to pick and choose. After all, should cutting waste be illegal?
No checks no balances. It's going to be a shit show. The fox in the hen house.
Here is my monthly bill from together.ai. Inference is cheap. This was from running unit tests for Haverscript, and cost a fraction of what booting up a stand-alone cloud machine for larger models would cost.
Scottish national animal. Much like nessie and haggis, they are quite hard to find in the wild.
The great thing about Zoom calls is you can see everyone's name by default. I've been caught a few times when a conference room has a group in it, and I can't see everyone clearly. In person I have no excuse. I use the line "what have you been working on since we last chatted?".
If the world burns, I'd rather be on Scottish soil.