Wrote up the whole process including the parts that went wrong
www.zenml.io/blog/how-i-...
Our designer said it best: "It feels much nicer and powerful to work on the website now, and also flexible to make new layouts and whatever ideas that come to our minds without the Webflow restrictions."
One of those reviews caught 7 schema issues that would've broken everything downstream.
Best part is what we can do now that we couldn't before — blog posts through git, a searchable LLMOps database with real filtering, preview URLs for every PR.
The thing that made it reliable: using different models for different parts of the project. ChatGPT Deep Research for the upfront architecture decisions, Claude Code for building, and RepoPrompt to get Codex to review Claude's work at phase boundaries.
Last month I migrated our ZenML website from Webflow to Astro in a week during a Claude Code / Cerebras hackathon. 2,224 pages, 20 CMS collections, 2,397 images. The site you see now is the result.
Didn't win the hackathon but got a production website out of it, so I'll take that trade.
Full roadmap and install instructions in the repo. If you work with annotated datasets and have hit similar pain points, would be curious to hear what formats or features would be most useful.
Not going to change the world, but it might save someone a few hours of debugging coordinate transforms or prevent silent data corruption between tools.
There are a ridiculous number of object detection formats out there, and each one has its own quirks about how it handles bounding boxes, coordinates, or class mappings. I'm working through them slowly, format by format.
What it does:
→ Convert between annotation formats (focusing on object detection first, but segmentation and classification coming soon)
→ Validate your datasets
→ Generate statistics
→ Semantic diff between dataset versions
→ Create random or stratified subsets
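To make the quirks point concrete, here's a sketch in Python of just one of those mismatches (my illustration, not panlabel's actual code): COCO stores boxes as absolute [x_min, y_min, width, height], while YOLO uses [x_center, y_center, width, height] normalised to the image size. Mix them up and every box in your dataset drifts silently.

```python
# Illustration only: not panlabel's API, just the kind of quirk it handles.
# COCO: absolute pixel [x_min, y_min, width, height]
# YOLO: [x_center, y_center, width, height], normalised to [0, 1]

def coco_to_yolo(box, img_w, img_h):
    """Convert a COCO box (pixels) to YOLO normalised format."""
    x_min, y_min, w, h = box
    return [
        (x_min + w / 2) / img_w,  # normalised x centre
        (y_min + h / 2) / img_h,  # normalised y centre
        w / img_w,
        h / img_h,
    ]

def yolo_to_coco(box, img_w, img_h):
    """Convert a YOLO box back to COCO pixel coordinates."""
    cx, cy, w, h = box
    return [
        cx * img_w - (w * img_w) / 2,
        cy * img_h - (h * img_h) / 2,
        w * img_w,
        h * img_h,
    ]

# A 100x50 box at the top-left of a 640x480 image:
print(coco_to_yolo([0, 0, 100, 50], 640, 480))
# [0.078125, 0.0520833..., 0.15625, 0.1041666...]
```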
The origin story is pretty mundane: I hit one too many bounding box bugs caused by format inconsistencies and decided someone should just build a Pandoc equivalent for annotation data.
panlabel 0.2 is out. It's a CLI tool (and Rust library) for converting between different dataset annotation formats. Now also available via Homebrew.
The reasoning isn't strong enough for gnarly bugs, but the speed makes it useful for a different class of task. Still early days figuring out where it fits.
Are you using Codex Spark? Has it carved out a specific role in your workflow, or is it just another option you reach for occasionally?
I'm developing a mental filter for it. Docs updates after a code change? Spark's fine. First pass at demo code? Sure. Scanning docs and suggesting rewrites based on a PR? Worth trying. Complex debugging? Not there yet.
When regular Codex disappears for 30 minutes on high reasoning mode, you learn to run multiple tasks in parallel and context-switch between them. Spark doesn't need that pattern. The speed drops the friction enough that I'm less precious about what I delegate.
My main tools are still Codex 5.3 on high reasoning or Opus 4.6 (usually through @RepoPrompt), but Spark is fast enough that it makes me rethink what's worth handing off.
I've been trying to push myself to use Codex Spark more, mostly because the speed changes the workflow in ways I'm still wrapping my head around.
I don't have an answer. But I do know that when I'm choosing between two tools now, API access is moving from a nice-to-have to a requirement.
Are you filtering tools this way too, or am I over-indexing on this?
Which brings me to the regret part.
This filtering process might push us toward homogenisation, the same way almost everyone uses Gmail even though better options exist (I use Fastmail; it's just nicer!). Are we about to homogenise our entire toolkit by dropping anything that isn't AI-ready?
OmniFocus from the Omni Group is the painful example for me. I genuinely love it for task management (and kanban support is around the corner this year!). They provide AppleScript support, and random people have built MCP wrappers around it, but you can sort of tell it's not a priority for them.
The hierarchy looks like this:
→ MCP server (ideal)
→ CLI with good scripting support
→ Public API I can wrap
→ Nothing (dealbreaker)
If your product doesn't have an MCP server or public API in 2026, you're legacy software.
I've noticed a shift in how I filter tools now. I'm basically only reaching for products that are AI-native, i.e. ones I can integrate into my AI-assisted workflows.
Very few of the harnesses (aside from @RepoPrompt) seem to offer this as a paradigm. It's obviously not in the interests of the big labs to offer this, but why not others?
There's a lot to be gained from dual-model workflows for agentic engineering. Claude Code to take a first pass, Codex to review. Or 5.2-Pro to make a detailed plan and then Sonnet to implement.
If you've copied or downloaded a ChatGPT Deep Research report as markdown, you'll have noticed that they include all sorts of gunk in the file that you don't want.
Here's a skill and a script that strip all that stuff out. (Also, OpenAI, please just fix this 🙏)
github.com/strickvl/sk...
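For a flavour of what the script does, here's a rough sketch. The patterns are assumptions based on exports I've seen (tracking params and private-use citation tokens), so adjust them to whatever your files actually contain:

```python
import re
import sys
from pathlib import Path

# Sketch of the idea, not the linked script. The patterns below are
# assumptions; check what's actually in your export before trusting them.

CITE = re.compile(r"cite[a-z]*turn\d+\w+")         # e.g. citeturn0search1 (assumed shape)
PUA = re.compile(r"[\ue000-\uf8ff]")               # private-use chars wrapping citation tokens
UTM = re.compile(r"\?utm_source=chatgpt\.com")     # tracking param appended to links

def clean(text: str) -> str:
    for pattern in (CITE, PUA, UTM):
        text = pattern.sub("", text)
    return text

if __name__ == "__main__":
    path = Path(sys.argv[1])
    path.write_text(clean(path.read_text(encoding="utf-8")), encoding="utf-8")
```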
RLMs let the LLM explore data programmatically instead of stuffing context windows. We built a full example using dynamic pipelines as the RLM runtime, with per-chunk observability, trajectory artifacts, and budget controls.
New blog post: running Recursive Language Models in production with ZenML.
www.zenml.io/blog/rlms-i...
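If RLMs are new to you, the core move is giving the model programmatic handles on the data instead of pasting everything into the prompt. A toy sketch of that loop in Python: llm() is a placeholder for your model client, and the chunking, prompts, and budget here are my simplification, not the pipeline from the post.

```python
# Toy sketch of the RLM idea: the root model never sees the full document.
# llm() is a stand-in for your model client; everything else is illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def rlm_answer(question: str, document: str, chunk_size: int = 4000, budget: int = 10) -> str:
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

    # The root call sees only short previews, not the document itself.
    preview = "\n".join(f"[{i}] {c[:80]}..." for i, c in enumerate(chunks))
    picks = llm(
        f"Question: {question}\nChunk previews:\n{preview}\n"
        "Reply with the chunk indices worth reading, comma-separated."
    )

    # Recursive sub-calls, one per selected chunk, capped by a budget.
    indices = [int(p) for p in picks.split(",") if p.strip().isdigit()][:budget]
    notes = [
        llm(f"Question: {question}\nChunk:\n{chunks[i]}\nExtract anything relevant.")
        for i in indices
    ]

    # Final aggregation over the (much smaller) notes.
    return llm(f"Question: {question}\nNotes:\n" + "\n".join(notes))
```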
Waitlist and more info: felt.place
More on my chronic pain journey: www.alexstrick.com/blog/pain-l...
trust, I'd be really grateful if you shared this.
One of the more personally meaningful things I've built.
#InteroceptiveAwareness #MentalHealth #WhatIBuilt