Kristian Muñiz

@krismuniz.com

Software developer, designer, and open source enthusiast. 🇵🇷 Technologist turned software engineer by necessity. Building software products https://krismuniz.com

146
Followers
490
Following
105
Posts
07.03.2023
Joined

Latest posts by Kristian Muñiz @krismuniz.com

But "The P-LLM cannot write a plan based on data it can't read" substantially limits the utility of LLMs and is central to the prompt injection challenge, no?

If the P-LLM is detached from the data it needs to plan from, aren't we back to using an LLM to generate a program that can run LLM(s)?

12.04.2025 04:25 👍 0 🔁 0 💬 1 📌 0

Absurd decision making, disconnected from reality.

I've followed you for years and know that Google was extremely lucky to have you; any company would be (perhaps your own?).

Regardless of what you do next, I'm sure that as a community we'll continue to follow your work. Please take care!

12.04.2025 02:23 👍 5 🔁 0 💬 0 📌 0

You should make a business out of that, sounds lucrative 💰

30.03.2025 19:29 👍 2 🔁 0 💬 0 📌 0

Metaphors are fun though

30.03.2025 17:52 👍 2 🔁 0 💬 0 📌 0
Dropover - Easier Drag and Drop on your Mac. Dropover is a drag and drop utility that makes it simple to collect, organize, share, and process files with floating shelves.

I found a modern version of this dropoverapp.com

30.03.2025 17:15 👍 2 🔁 0 💬 1 📌 0

Yeah drag-and-drop with trackpads can be painful

30.03.2025 17:10 👍 2 🔁 0 💬 0 📌 0

hahahah I *just* posted a half-baked idea that resembles this in this very thread. Should've read the full conversation

30.03.2025 17:09 👍 1 🔁 0 💬 0 📌 0

I would argue that there's no right way to do this interaction. It feels unnatural and counterintuitive. I wish I could have a "shelf" I could put dragged items on temporarily while I scroll 😆

30.03.2025 17:08 👍 2 🔁 0 💬 1 📌 0

Brilliant. Yes!

29.03.2025 04:44 👍 0 🔁 0 💬 0 📌 0

In your defense, you can't land a pilot either

29.03.2025 01:25 👍 1 🔁 0 💬 1 📌 0
A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a t-shirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"


Ah, hint from Greg Brockman himself. Seems like the "powerful decoder" here is a diffusion model.

28.03.2025 02:05 👍 1 🔁 0 💬 0 📌 0
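For readers skimming the whiteboard notes: the "one big autoregressive transformer" line is the standard joint factorization over a single interleaved token stream. This is my transcription of the textbook formulation, not an equation from the photo:

```latex
% Model all modalities jointly as one token sequence x = (x_1, ..., x_T),
% where the stream interleaves text, pixel, and sound tokens:
p(\text{text}, \text{pixels}, \text{sound})
  = \prod_{t=1}^{T} p(x_t \mid x_{<t})
```

The "Fixes" column then swaps raw pixels for compressed latent representations and delegates latents-to-pixels to a separate decoder, which is exactly the tokens -> [transformer] -> [diffusion] -> pixels diagram on the board.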

Yeah, I read the System Card. It can still be autoregressive sampling. From my observations it still makes mistakes that a diffusion model would make, like omitting details, failing to count, producing garbled text, etc.

28.03.2025 01:50 👍 0 🔁 0 💬 0 📌 0

Large multimodal models are becoming increasingly powerful, and one of the first ways we can optimize them is by simplifying their I/O and writing powerful, thick encoders/decoders.

28.03.2025 01:09 👍 0 🔁 0 💬 0 📌 0

At this point I'm convinced that 4o image generation is not purely autoregressive. My guess is 4o generates image tokens or latent representations in sequential patches which are processed by a tightly integrated diffusion model.

28.03.2025 01:06 👍 2 🔁 0 💬 3 📌 0
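To make the guess concrete: a toy NumPy sketch of that hybrid shape, where an autoregressive prior emits latent patches one at a time and a diffusion-style loop denoises noise toward pixels. Everything here is a stand-in invented for illustration (random linear maps, made-up shapes), not OpenAI's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH_DIM = 16   # latent dimension per patch (assumed)
N_PATCHES = 4    # number of patches generated in sequence (assumed)

def ar_prior_step(history: np.ndarray) -> np.ndarray:
    """Stand-in for one autoregressive step: emit the next latent
    patch conditioned on all previously generated patches."""
    ctx = history.sum(axis=0) if len(history) else np.zeros(PATCH_DIM)
    return np.tanh(ctx * 0.1 + rng.normal(size=PATCH_DIM))

def diffusion_decode(latents: np.ndarray, steps: int = 10) -> np.ndarray:
    """Stand-in for a diffusion decoder: start from pure noise and
    iteratively denoise toward a pixel-space target derived from latents."""
    target = np.repeat(latents, 4, axis=1)   # fake "upsample" to pixel space
    x = rng.normal(size=target.shape)        # noise initialization
    for t in range(steps):
        x = x + (target - x) / (steps - t)   # simple denoising schedule
    return x

# 1. Autoregressive prior: latent patches generated sequentially.
patches = []
for _ in range(N_PATCHES):
    patches.append(ar_prior_step(np.array(patches).reshape(-1, PATCH_DIM)))
latents = np.stack(patches)

# 2. Tightly coupled diffusion-style decoder refines latents into pixels.
pixels = diffusion_decode(latents)
print(latents.shape, pixels.shape)  # (4, 16) (4, 64)
```

The point of the sketch is just the division of labor: the sequential stage carries the global structure (and would explain patch-by-patch reveal behavior), while the iterative decoder is responsible for final resolution.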

*of sampling the next token.

Had to cut some characters.

26.03.2025 06:28 👍 1 🔁 0 💬 0 📌 0

And it's not structural or semantic consistency, but some information gets lost in the process. Perhaps it's safety mechanisms preventing certain behaviors like using people's likeness.

26.03.2025 06:25 👍 1 🔁 0 💬 0 📌 0

Should an omni-model that is purely autoregressive be able to pass through an image in a semi-lossless way? I understand that it depends, to some extent, on post-training and the non-stochastic nature of sampling, but I'm having trouble with consistency using 4o's image generation feature.

26.03.2025 06:25 👍 0 🔁 0 💬 2 📌 0

Could that be a plausible solution? Using GPT-4o to generate initial image representations and passing these representations to a diffusion model component that specializes in creating high-quality, high-resolution visual outputs?

26.03.2025 04:18 👍 0 🔁 0 💬 0 📌 0

From what I know so far, autoregressive models are more expensive to run than diffusion models – and slower too, since latency correlates with cost.

I'm still surprised that resolution is so good. It's almost too good. Could it be a hybrid Transformer + Diffusion approach?

26.03.2025 04:12 👍 0 🔁 0 💬 1 📌 0

I want to understand the training and inference economics of autoregressive image generation.

There are obvious latency implications, but in my opinion, at least anecdotally, it makes up for it in output quality.

26.03.2025 03:55 👍 0 🔁 0 💬 1 📌 0

Wow, this is just so much better than what's out there, especially for prompt adherence. Aesthetically, I'm seeing a bit of a bias, but it could very well be deliberate.

25.03.2025 21:52 👍 0 🔁 0 💬 0 📌 0

Goddammit 🤦🏻‍♂️ right, that's the whole point of this update

25.03.2025 21:47 👍 0 🔁 0 💬 0 📌 0

By image output I mean sampling tokens that get decoded into rasterised bitmaps. There's some vectorial quality to the generated images.

25.03.2025 21:42 👍 1 🔁 0 💬 1 📌 0

I have a feeling, completely unproven, that this is more than just image output. The infographics are so crisp, it feels like there's some sort of very powerful generative layout engine powering this. Either that or I completely had the wrong intuition about diffusion models.

25.03.2025 21:32 👍 0 🔁 0 💬 2 📌 0

lmao

22.03.2025 15:57 👍 1 🔁 0 💬 0 📌 0

They're not prompting it right, should've asked "make it unhackable"

21.03.2025 23:19 👍 1 🔁 0 💬 1 📌 0

I'm open to "I'll know it when I see it" as a design philosophy. Not looking for anything specific, I'm exploring canvas interfaces as a general direction.

tldraw.dev is great, but requires adapting to a large, existing framework. I was looking for something more low-level and simpler.

21.03.2025 18:12 👍 1 🔁 0 💬 0 📌 0

printloop.dev is a web-based creative coding environment.

The side-project itself is primarily about pursuing different ways to shorten feedback loops when writing code.

The driving hypothesis is that the cost of iteration is inversely correlated with one's creative output quantity and quality.

21.03.2025 18:12 👍 1 🔁 0 💬 1 📌 0

Nice. I've been exploring interactive programming environments. I am looking to bring spatial canvas functionality to my tool printloop.dev

I already have a minimal tldraw setup in printloop.dev/canvas but I'm looking for simpler primitives to build on top of.

18.03.2025 23:04 👍 2 🔁 0 💬 1 📌 0

The Responses API is what LLM APIs should've been from the beginning.

14.03.2025 00:02 👍 1 🔁 0 💬 0 📌 0