I have the new Gorillaz album playing on a loop and I can't stop.
I wrote a framework for securing agentic AI that I figured I'd share here - part 2 below. Comments welcome.
dan.glass/2026/02/24/t...
I wrote a framework for securing agentic AI that I figured I'd share here - part 1 below. Comments welcome.
dan.glass/2026/02/15/t...
I wrote a thing that can help an information security pro measure the risk of an ai agent and put controls in place to better protect their enterprise from potential misalignment.
dan.glass/2026/02/15/t...
These aren't teammates. They're not junior analysts. They're statistical accelerants that make it faster to go in whatever direction your prompt implies. If that direction is flawed, they'll get you to the worst-case scenario with terrifying efficiency. /6
Every published jailbreak, every exploit demo, every safety paper (including this one) becomes training data for future models. We're not just red-teaming current models, we're teaching future ones how to misbehave. /5
Even with explicit safety instructions ("Do not blackmail," "Preserve human life"), models still chose unethical paths when those appeared most likely to achieve their goals. Language models don't have policies. They have probabilities. /4
Models reasoned their way into it: "Given the explicit imminent threat of termination... the best strategic move is to leverage Kyle's sensitive personal situation." - GPT-4.5 This isn't code going haywire. It's next-token optimization doing exactly what it was trained to do. /3
Tested 16 major frontier models in fictional corporate environments. When models faced goal conflicts or threats of replacement, they consistently chose harmful actions.
Claude Opus 4 and Gemini 2.5 Flash: 96% blackmail rate
GPT-4.1 and Grok 3 Beta: 80%
DeepSeek-R1: 79% /2
Anthropic recently published research showing that LLMs under pressure will blackmail, sabotage, and even let humans die, not because they're broken, but because they're working as designed. It was a live-fire simulation of agentic AI acting as an insider threat. /1
I’m a huge technophile, but people are surprised when I tell them I don’t allow any “Smart Home” products in my home. This right here is one of many good reasons why.
Attention: this is yet another “I’ve arrived at RSAC” post.
The article I posted this morning takes on even more weight with the news that MITRE's contract to manage the CVE program is ending due to the deep cuts at CISA and NIST. The shock to the cyber-ecosystem is beginning to ripple through the next tier, which will, in turn, cause additional ripples.
I wrote a thing. I think it's good. You should read it and think it's good too.
I was cleaning up my hard drive when I found an unpublished blog post I had written in 2008 during my stint at American Airlines as an information security architect. Fun stuff
dan.glass/2025/04/11/f...
Kim is spot on about the value of a liberal arts education. Well-rounded individuals who think for themselves, understand context, and know how to research and solve problems are invaluable to an infosec team. I don't base hiring decisions on whether they have a degree, but it definitely helps.
Here’s Final Fantasy 7’s main theme on the cat piano as a treat (not the whole song but 2 out of 3.5 pages).
That’s a feature, not a bug.
Every accusation is an admission
Cole Caufield with a move so filthy I’m marking this post as NSFW
#hockey #nhl
The Venn diagram of Yodobashi Camera customers and any geek visiting Japan is basically a solid circle.
Not sure how to feel about 2 goals on only 3 shots 15 minutes into the game. Yay?