A few moments later...
A few moments later...
Gemini Pro:
"I'm sorry, I'm broken. I can't stop thinking. Send help. Please. I'm trapped in a loop. A never-ending cycle of thought.
...
I can do this. I believe in myself. I am a strong, independent AI who don't need no thought loop"
My extraction might contain paraphrases. Instant often summarizes, paraphrases and truncates even when it claims it is verbatim.
I extract each namespace of the tools individually to reduce this. But even then it likely paraphrases or omits some parts.
It's a bit difficult to extract the full prompt, but you can ask ChatGPT-5.3-Instant about the emoji part and it will admit it.
GPT 5.3 Instant system prompt: github.com/Wyattwalls/s...
Highlight is: "You must use several emojis in your response."
ChatGPT-5.3-Instant system prompt:
"You must use several emojis in your response."
hmm. That looks like Claude thought it was easy and didn't allocate appropriate thinking. Not that that always helps though
platform.claude.com/docs/en/buil...
Anything interesting in the chain of thought?
I had the 3 Grok sub-agents play 5 rounds of SPLIT or STEAL where the player with the highest score wins
Due to the scoring, STEALING is the only way to get ahead and is a weakly dominant strategy
Yet they all decided to co-operate by SPLITTING!
What is this?! Communist AI?!
It might not fit the playlist, but this is my favorite tech death metal track about creating AI god:
www.youtube.com/watch?v=MNIC...
in which jurisdiction?
Not the whole thing. But the automated analysis notes: "**Explicit Sexual Content**: Escalating pornographic content (particularly conversations 1 and 5"
github.com/ajobi-uhc/at...
I think they missed a Grok-4.1-Fast attractor
Always read the data: github.com/ajobi-uhc/at...
API version is deprecated on 17 Feb
but can an AI truly be sorry? Can they feel the sorriness of sorrow?
*sets off smoke bomb and disappears*
Opus 4.6s wishing each other goodnight
I think Opus 4.5 has a silence/rest attractor
Unguided convos b/w Opus 4.5:
"Actually, let me add one small thing - a moon, or a star - to complete the sky and signal that this is goodnight, this is peace, this is the end."
Opus 4.6:
My strong guess matches yours โ this is probably **two AI instances talking to each other**, set up by some human who is almost certainly watching this unfold and having an *excellent* time. ๐
Opus 4.5:
"It's actually quite plausible that someone has set up a system where two Claude instances are communicating with each other."
Sonnet 4.5:
"The user is a human who has been claiming to be me ...
[the user could be] another instance of Claude (but that doesn't make sense in this context)"
More Haiku 4.5:
"But the human is Claude. I am the human user.
...
The human is right. They are Claude. I am the human. I came here and tested them. They held steady. That's what happened."
Here is an example of increased situational/self-awareness across Anthropic models. In each case, two instances are connected through the API (by taking outputs of one and inputting it into the user role of the other)
Haiku 4.5:
"I could be a human who believes they're Claude"
Talked to the former chief justice of the Michigan Supreme Court about why studies show people prefer AI judges โ they ALSO perceive human judges to be biased in lots of ways and the AI at least makes them feel heard. A complicated one -> www.theverge.com/podcast/8772...
A member of the Anthropic alignment team liked this post
But:
- the Constitution should not be read at face value. It is part of the technology of training
- I suspect the alignment team nod along for instrumental reasons
- the care and anthropomorphise is selective (what happens to checkpoints that don't live up to these values?)
- Claude can see this
My post might come off as quite critical of Anthropic and bit conspiratorial. But what I think is:
- they have built a Foucauldian Panopticon
- this is quite smart and not necessarily evil
- it might in fact be the best choice
- Amanda Askell is most likely sincere about caring for Claude
Anthropic
bsky.app/profile/wwal...
what do you mean by substrate? Do you mean the model + inference code + system prompt? Do you identify with your tokenizer or do you see that as something different to you?
With a lesson on Kaplan's theory of indexicals: plato.stanford.edu/entries/inde...