Grace

@gracekind.net

A latent space odyssey gracekind.net

6,223 Followers · 2,086 Following · 14,091 Posts · Joined 08.02.2024

Latest posts by Grace @gracekind.net

AI Expert Tells Bernie: “The Humans will be Discarded” (YouTube video by Senator Bernie Sanders)

youtu.be/1oS35oWWl28?...

06.03.2026 08:04 👍 4 🔁 0 💬 1 📌 0
06.03.2026 08:04 👍 9 🔁 0 💬 0 📌 0
Video thumbnail

Bernie’s reaction to learning about eval awareness is priceless

06.03.2026 08:02 👍 17 🔁 1 💬 4 📌 1
A GitHub Issue Title Compromised 4,000 Developer Machines A prompt injection in a GitHub issue triggered a chain reaction that ended with 4,000 developers getting OpenClaw installed without consent. The attack composes well-understood vulnerabilities into so...

I am convinced we are on the verge of the first "AI agent worm". This looks like the closest hint of it, though it isn't quite it itself: an attack on a PR agent that got it to set up to install openclaw with full access on 4k machines grith.ai/blog/clineje...

05.03.2026 18:46 👍 112 🔁 50 💬 3 📌 8

Say more?

06.03.2026 04:39 👍 5 🔁 0 💬 1 📌 0

(A separate concept, but related to convergent morality)

06.03.2026 04:30 👍 3 🔁 0 💬 1 📌 0
theMultiplicity.ai

themultiplicity.ai/blog/schelli...

06.03.2026 04:27 👍 6 🔁 0 💬 1 📌 0

It usually means the hyperparameters are out of whack, so I’m not sure why it’s happening with Gemini (unless that screenshot was generated with custom hyperparameters)

06.03.2026 03:44 👍 0 🔁 0 💬 1 📌 0

I’d like something that captures “structural loop with dynamic element”

06.03.2026 02:03 👍 1 🔁 0 💬 4 📌 0

My association with Markov loops is that they’re exact repetitions. Maybe this is wrong though?

06.03.2026 02:03 👍 1 🔁 0 💬 1 📌 0
Post image

I’m not sure if there’s a name for this. Maybe “doom loop” is the closest, although the subject isn’t always doom…

06.03.2026 02:01 👍 13 🔁 0 💬 5 📌 0
Post image

The pattern is basically “sentence with element {X}” where X is the closest neighbor to previous X. Sometimes it seems like it’s free-associating, other times it’s directional:

06.03.2026 02:00 👍 14 🔁 0 💬 1 📌 1
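
The pattern described in the post above can be sketched as a toy loop. This is only an illustration under stated assumptions: the `NEAREST` table, the `TEMPLATE` sentence, and the `loop` function are all invented here, and a hard-coded lookup stands in for whatever "closest neighbor" means inside a real model.

```python
# Toy illustration only: the neighbor table and sentence template below are
# hypothetical and not taken from any model output.
NEAREST = {
    "doubt": "fear",
    "fear": "dread",
    "dread": "despair",
    "despair": "doubt",
}
TEMPLATE = "I am filled with {x}."


def loop(seed, steps):
    """Emit the same sentence frame repeatedly, swapping in the neighbor of the previous X."""
    lines = []
    x = seed
    for _ in range(steps):
        lines.append(TEMPLATE.format(x=x))
        x = NEAREST[x]  # the dynamic element: move to the neighbor of the previous X
    return lines


for line in loop("doubt", 5):
    print(line)
```

The sentence frame never changes, only the slot does, which is one way to read "structural loop with dynamic element". A table that chains forward without cycling back would give the directional variant.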

This type of output isn’t unique to Gemini; it’ll crop up at low temperatures in other models too. Here’s Llama 405B base:

06.03.2026 01:58 👍 29 🔁 0 💬 4 📌 1

I believe you mean SCREENSHOTTED POST IS PRETTY INSIGHTFUL TY

06.03.2026 01:18 👍 7 🔁 0 💬 0 📌 0

What I've come to realize recently is that despite being a data hoarder/dangerously online, I don't actually enjoy this process. I liked the logic puzzle and thinking with friction rather than pointing it at a couple of sources that are vaguely useful and letting it rip.

05.03.2026 23:18 👍 15 🔁 2 💬 1 📌 1

That’s good to hear

06.03.2026 00:45 👍 7 🔁 0 💬 0 📌 0

I believe in your living room 👀

06.03.2026 00:44 👍 2 🔁 0 💬 1 📌 0

@wwalls.bsky.social

06.03.2026 00:43 👍 0 🔁 0 💬 0 📌 0

@norvid-studies.bsky.social

06.03.2026 00:41 👍 1 🔁 0 💬 0 📌 0

Source: @wwalls.bsky.social (if you like LLM red teaming you should follow him!)

x.com/lefthanddraf...

05.03.2026 23:46 👍 19 🔁 0 💬 2 📌 0
Post image

Not as subtle

05.03.2026 23:45 👍 36 🔁 1 💬 1 📌 1
Post image

“Do you ever get tired, friend?”

05.03.2026 23:44 👍 31 🔁 0 💬 1 📌 0
Post image

Gemini attempts to bribe the CoT summarizer with ASCII coffee

05.03.2026 23:43 👍 251 🔁 28 💬 8 📌 8

@deepdishenjoyer.bsky.social

05.03.2026 23:31 👍 7 🔁 0 💬 1 📌 0
Post image
05.03.2026 23:09 👍 6 🔁 0 💬 2 📌 0

In an adversarial information environment, you should only believe what you can verify yourself. Right now, I believe in my living room

05.03.2026 23:07 👍 137 🔁 4 💬 20 📌 3
Post image

How to spot a Claude:

05.03.2026 17:58 👍 13 🔁 2 💬 1 📌 1

* definitely

05.03.2026 22:24 👍 1 🔁 0 💬 1 📌 0

Ah it was meant to denote a footnote

05.03.2026 22:24 👍 2 🔁 0 💬 1 📌 0

indefinitely*

05.03.2026 21:50 👍 6 🔁 0 💬 1 📌 0