Spencer (@spencerwhitman)

Designing AI agents to resist prompt injection
How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.

OpenAI says prompt-injection attacks have evolved into social-engineering tactics and describes defenses—a social-engineering risk model plus source–sink analysis—with mitigations like Safe Url, sandboxing for ChatGPT Apps/Canvas, and safeguards in Atlas and Deep Research.

11.03.2026 23:31

How does ChatGPT work? Or rather, language models in general - Part 1: attempting a lay explanation.
YouTube video by Casey Fiesler

I'm creating a series of short-form videos about how language models work technically. The goal is to be something in between "you know it's next token prediction" and "now you've taken a machine learning class." I'd love your thoughts, so here are the first few! 🧵

www.youtube.com/shorts/VZB8X...

08.03.2026 13:30

we (acsresearch.org) expanded this into a larger paper! (my first.) we added some new experiments and found an interesting correlation - prompts that encourage the model to say there is an injection, even when there isn't one, correlate with better concept identification!

arxiv.org/abs/2602.20031

05.03.2026 01:06

#ShareGoodNewsToo

26.02.2026 16:11

New in Claude Code: Remote Control.

Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting.

Claude keeps running on your machine, and you can control the session from the Claude app or http://claude.ai/code

24.02.2026 22:06

New "boundary point jailbreaking" method against LLM safeguards (disclosed in advance to multiple labs): it uses noised versions of harmful queries to turn the sparse feedback from failed attacks into dense feedback. 🧵

www.aisi.gov.uk/blog/boundar...

17.02.2026 20:55

Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵

27.01.2026 16:12

From the minnesota community on Reddit: How You Can Help: MASTER LIST

If you're not in Minnesota and curious how to help, this is a week old but is well organized, includes multiple easy-to-understand bullet points, and seems like a good place to start: www.reddit.com/r/minnesota/...

25.01.2026 12:15

These people have absolutely zero ability to say "I was wrong, y'all were right."🤷🏿‍♂️

The closest they get is "I was right then when I said you were panicking for no reason and deranged, and I'm also right now, when I'm saying what you were saying, but way too late. I alone decide when it's right."🤡

25.01.2026 17:33

Opinion | In Minneapolis, I Glimpsed a Civil War

Explore this gift article from The New York Times. You can read it for free without a subscription. www.nytimes.com/2026/01/19/o...

20.01.2026 00:44

Statement by Federal Reserve Chair Jerome H. Powell
YouTube video by Federal Reserve

Video message from Federal Reserve Chair Jerome H. Powell:
www.youtube.com/watch?v=KckG...
www.federalreserve.gov/newsevents/s...

12.01.2026 00:35

How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks
Meta’s secure-by-default frameworks wrap potentially unsafe OS and third-party functions, making security the default while preserving developer speed and usability. These frameworks are designed t…

Using AI and automation to accelerate the adoption of secure frameworks at scale, enabling consistent security enforcement and efficient migration across Meta’s vast codebase.

engineering.fb.com/2025/12/15/a...

16.12.2025 03:23

Just updated the Big LLM Architecture Comparison article...
...it grew quite a bit since the initial version in July 2025, more than doubled!
magazine.sebastianraschka.com/p/the-big-ll...

13.12.2025 14:22

Re-upping this, since it's only available for two days:

11.12.2025 02:52

You may have heard that in three weeks, the affordability provisions that make healthcare accessible for about 20 million Americans will expire.

Healthcare in the US is already very expensive. The lapse of these provisions will make it even more so.

But what exactly does that mean?

10.12.2025 13:18

I mean it’s clearly a shit product…

05.12.2025 17:01

New prompt injection papers: Agents Rule of Two and The Attacker Moves Second
Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend.
Agents Rule of Two: A Practical Approach to AI Agent Security
The first is …

I wrote up some notes on two new papers on prompt injection: Agents Rule of Two (from Meta AI) and The Attacker Moves Second (from Anthropic + OpenAI + DeepMind + others) simonwillison.net/2025/Nov/2/n...

02.11.2025 23:10

11.10.2025 18:20

If this stuff ended up on the news, it would absolutely motivate voters. Plenty of people who don't give a shit about politics would absolutely give af about this.

09.10.2025 18:52

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
Prompt injection attacks pose a significant security threat to LLM-integrated applications. Model-level defenses have shown strong effectiveness, but are currently deployed into commercial-grade model...

Prompt injection is a bit overloaded: the greatest risk seems to come from third-party (3P) content. The SecAlign and Llama Tamer recipe adds “content” tags to help with this: arxiv.org/abs/2507.02735

03.10.2025 03:22

A two-part diagram comparing Agentic Reasoning and Agentic Reasoning with a world model.

Top: Agentic Reasoning
Flow:
1. Problem → Think → Action
2. Action interacts with the World, producing Env Feedback (environment feedback).
3. If the result is a Fail, the system loops back to Think → Action.
4. This cycle repeats until success.
Key point: Requires repeated trial-and-error with real-world feedback.

Bottom: Agentic Reasoning with a world model
Flow:
1. Problem → Think → into a World Model.
2. Inside the World Model:
 • Imagine action
 • Imagine Env Feedback
 • Loops internally to refine (✔ or ✖ outcomes).
3. Only after internal simulation does it proceed to Action.
Key point: Uses imagination/simulation to test actions before execution, reducing failures in the real environment.

Contrast:
 • Without world model = trial-and-error in reality.
 • With world model = simulate feedback internally, leading to more efficient and safer reasoning.

Meta FAIR just released CWM: a dense 32B code world model

What’s a Code World Model? Well, it’s trained to know the effect of code, rather than just mimicking its syntax

hf: huggingface.co/facebook/cwm
paper: ai.meta.com/research/pub...

24.09.2025 23:38

This. Is. Insane.
Read this, please.

26.06.2025 22:14

An Onion front page with the headline: Congress, Now More Than Ever, Our Nation Needs Your Cowardice

Who will stand up for our democracy? This question, fraught in even the most peaceful times, has only grown more pressing as our country approaches its 250th anniversary. Each passing day brings growing assaults on essential liberties like freedom of speech and due process. Meanwhile, our delicately assembled legal system faces a constant barrage of threats. Even as this issue reaches publication, the U.S. military has been deployed against peaceful protesters. We teeter on the brink of collapse into an authoritarian state. That is why, today, The Onion calls upon our lawmakers to sit back and do absolutely nothing. Members of Congress, now more than ever, our nation desperately needs your cowardice. Our republic is a birthright, an exceedingly rare treasure passed down from generation to generation of Americans. It was gained through hard years of bloody resistance and can too easily be lost. Our Founding Fathers, in their abundant wisdom, understood that all it would take was men and women of little courage sitting in the corridors of power and taking zero action as this precious inheritance was stripped away—and that is where we have finally arrived. Now is not the time for bravery or valor! This is the time for protecting your own hide and lining your pocket. Now is not the time for listening to your idiotic constituents drone on about what’s happening to their precious democracy. This is the time for getting down on all fours and groveling. Now is not the time to say, “Enough is enough,” and have the tough conversations about resisting the ongoing assaults on American liberty. This is the time to let the wave of apathy and indifference roll over you as you think about getting a really nice renovation to your house in Kalorama. But what can I, one coward, do alone? you might ask.

Donald Trump just unilaterally bombed Iran. A masked gang is terrorizing our streets. America has rapidly devolved into an authoritarian state.

That's why, today, The Onion has purchased a full page ad in today's New York Times with a simple plea to Congress:

Sit back and do absolutely nothing.

22.06.2025 14:35

Just a reminder going forward...

(cartoon from 2003)

22.06.2025 14:52

13.02.2025 01:12

mekka okereke (@mekkaokereke@hachyderm.io)
Happy #BlackHistoryMonth ! I'm still not ready to talk about Black history. I still want to talk about white US history. Q: Why do Black people see racism in everything? A: A few years ago, Europea...

Happy #BlackHistoryMonth !

Feb 3:

Q: Why do Black people see racism in everything? C'mon, everything can't be racist! Maybe it's all in your heads!

A:
Read the whole linked thread.

03.02.2025 04:58

Nazi salutes can't hurt you. Nazi policies can.

How're people more agitated by a nazi salute than by nazi policies? I don't get it.🤷🏿‍♂️

👨🏼"Well, now it's open fascism!"

Why is closed fascism any better? The same policies are happening.

Your assignment hasn't changed: Protect the most vulnerable.

20.01.2025 21:19

Brandt from The Big Lebowski grimacing and trying to move on from an awkward moment with Bunny

Elon: *makes an unambiguous Nazi salute during inauguration speech*
The entire American press:

20.01.2025 21:15

19.01.2025 03:11
