Bernie’s reaction to learning about eval awareness is priceless
I am convinced we are on the verge of the first "AI agent worm". This looks like the closest hint of it, though it isn't quite it itself: an attack on a PR agent that got it set up to install openclaw with full access on 4k machines grith.ai/blog/clineje...
Say more?
(A separate concept, but related to convergent morality)
It usually means the hyperparameters are out of whack, so I’m not sure why it’s happening with Gemini (unless that screenshot was generated with custom hyperparameters)
I’d like something that captures “structural loop with dynamic element”
My association with Markov loops is that they’re exact repetitions. Maybe this is wrong though?
I’m not sure if there’s a name for this. Maybe “doom loop” is the closest, although the subject isn’t always doom…
The pattern is basically “sentence with element {X}” where X is the closest neighbor to previous X. Sometimes it seems like it’s free-associating, other times it’s directional:
This type of output isn’t unique to Gemini; it’ll crop up at low temperatures in other models too. Here’s Llama 405B base:
I believe you mean SCREENSHOTTED POST IS PRETTY INSIGHTFUL TY
What I've come to realize recently is that despite being a data hoarder/dangerously online, I don't actually enjoy this process. I liked the logic puzzle and thinking with friction rather than pointing it at a couple of sources that are vaguely useful and letting it rip.
That’s good to hear
I believe in your living room 👀
@wwalls.bsky.social
@norvid-studies.bsky.social
Source: @wwalls.bsky.social (if you like LLM red teaming you should follow him!)
x.com/lefthanddraf...
Not as subtle
“Do you ever get tired, friend?”
Gemini attempts to bribe the CoT summarizer with ASCII coffee
@deepdishenjoyer.bsky.social
In an adversarial information environment, you should only believe what you can verify yourself. Right now, I believe in my living room
How to spot a Claude:
* definitely
Ah it was meant to denote a footnote
indefinitely*