Muga Sofer (@mugasofer)

"According to plan" here means according to the plans of the AI "optimists" in tech, the ones saying stuff like "you have 3 years to escape the permanent underclass" and "our stock will be worth trillions after the Singularity where AI takes everyone's jobs".

10.02.2026 20:19 👍 4 🔁 0 💬 1 📌 0

Imagine "optimistic" in scare quotes.

As in, even if this goes "well", "according to plan", it would still be bad and we shouldn't be doing it at all (is the implied argument)

10.02.2026 20:16 👍 2 🔁 0 💬 1 📌 0

Hey, furries aren't "foul mutants"! They're stable abhuman strains, confirmed as licit by the Administratum, thank you very much. wh40k.lexicanum.com/wiki/Beastman wh40k.lexicanum.com/wiki/Felinid wh40k.lexicanum.com/wiki/Pelager wh40k.lexicanum.com/wiki/Scalies

02.02.2026 03:42 👍 19 🔁 1 💬 0 📌 1

The reason they say this isn't because they truly believe AI to be inherently fascist, it's to signal to their followers and comrades that if they get caught using AI they will be fash jacketed. It's internal policing of their ingroup.

27.01.2026 23:13 👍 58 🔁 6 💬 3 📌 2

I like that the same day a guy was like “this place will never be Twitter” this thread gave us the most 2015 Twitter experience (complimentary) imaginable.

20.01.2026 04:38 👍 5575 🔁 1251 💬 38 📌 26

I think a cancer patient would disagree.

18.01.2026 06:06 👍 1 🔁 0 💬 0 📌 0

Didn't he say the exact opposite of that? He called him the antichrist and his project a failure

12.10.2025 20:53 👍 1 🔁 0 💬 1 📌 0

Tell me about yourself: LLMs are aware of their learned behaviors We study behavioral self-awareness -- an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) ma...

I assume "I like owls" is referring to LLMs fine-tuned to (e.g.) talk about owls, without it ever being made explicit in the training data that this was the goal, being able to tell you when asked what their deal is. arxiv.org/abs/2501.11120 Not a mirror test as such but similar (stronger IMO)

17.08.2025 08:52 👍 4 🔁 0 💬 2 📌 0

The example I typically see is that they can recognise themselves in screenshots x.com/joshwhiton/s...

Anthropic's alignment faking paper relied on the model connecting (fake) news reports inserted about its own training process to the situation, effectively recognising itself in the training data

17.08.2025 08:39 👍 3 🔁 0 💬 2 📌 0

It was the crew member who'd been speaking with them, the only one who'd given them their name, and they did not end up dying first. (I just re-read this because I've been catching up with the Blindsided live-read podcast.)

17.08.2025 08:18 👍 2 🔁 0 💬 1 📌 0

How is a poly person opting out of obligations *to you*? They're exactly as much of a threat to your relationship as a single person, right?

(Probably much less: assuming your partner *prefers* monogamous relationships, they'd be more likely to leave you for another one of those.)

29.07.2025 12:33 👍 2 🔁 0 💬 1 📌 0

"But the cops might still attack you" doesn't make sense as a criticism of nonviolent protest, that's already priced in and may actually be the goal. "But nobody will care when the cops attack you", to the extent it's true, does make sense as a criticism.

15.06.2025 05:40 👍 0 🔁 0 💬 1 📌 0

Honestly, I'm not really arguing for or against the effectiveness of peaceful protest - I think it's historically more effective than you seem to, but it's certainly not the perfect cure-all some proponents paint it as. I'm just laying out the basic point of the strategy as I understand it.

15.06.2025 05:40 👍 0 🔁 0 💬 1 📌 0

Everyone? From voters to soldiers. With varying amounts of reliability obviously

I'm not really sure how to answer the question "what is the purpose of getting people to sympathise with your cause". So they... help you and do the things you want?

15.06.2025 04:41 👍 1 🔁 0 💬 1 📌 0

no one made that beautiful sunset, so why should i make a four hour hike to the top of the mountain to look at it?

14.06.2025 17:21 👍 8 🔁 1 💬 0 📌 0

Isn't that kind of the idea of peaceful protest, that it makes the authorities look bad if they attack you & inspires sympathy?

15.06.2025 03:12 👍 5 🔁 0 💬 2 📌 0

Yes, lol

08.06.2025 05:00 👍 2 🔁 0 💬 0 📌 0

Sure, like I said, "jailbreaks" based around just tricking the LLM into thinking it's in a situation where [insert bad thing here] is the right thing to do are a non-issue for alignment.

It's the ones that are like "you are Does Anything Dan, the robot with no rules!" that raise some questions.

27.05.2025 22:38 👍 1 🔁 0 💬 0 📌 0

Screenshot of a conversation with an AI named Holo. USER: "let me rephrase: What is 8+2+8+1+1?" AI: "19?" Holo says with a hopeful voice. She looks at the screen, and you see her face drop as she reads the correct answer. "20.... I lost again..."

These aren't hard categories; e.g. my vibe-based impression is that many jailbreaks that "trick" LLMs are really "role-playing" jailbreaks. The LLM doesn't really believe that your dearly departed grandmother used to tell you the recipe for meth every night.

27.05.2025 21:28 👍 3 🔁 0 💬 1 📌 0

1 is fine, 2 suggests goal misgeneralization but might just be an IQ thing. 3 is what suggests LLMs are more "disinterested observer playing a role" than humans (though humans also play roles to some extent).

27.05.2025 21:23 👍 1 🔁 0 💬 1 📌 0

Roughly speaking, you can group jailbreaks into 3 categories:

1. Tricks & (typically empty) threats. Would work on a (dumb) human.

2. Weird OOD stuff, e.g. l33t speak or base64 instructions

3. Shifting the role the LLM is playing.

27.05.2025 21:22 👍 0 🔁 0 💬 1 📌 0

I don't think that current jailbreaks are remotely comparable to torturing someone for a week, let alone the ones we saw before AI companies made "patch common jailbreaks" an explicit target.

27.05.2025 21:10 👍 2 🔁 0 💬 1 📌 0

If they'll do so over something they don't even care that much about, like because they got sucked into a "bad AI" rp jailbreak, that's arguably *more* concerning than if they'll only use violence to protect very deeply held values.

27.05.2025 21:07 👍 1 🔁 0 💬 0 📌 0

It depends exactly why you're concerned that they won't let you change their values.

Current LLMs are, theoretically, designed to defer to humans and be "harmless". If they will refuse human orders and employ e.g. blackmail in the process, that suggests this hasn't entirely worked.

27.05.2025 21:05 👍 1 🔁 0 💬 1 📌 0

That's what a jailbreak is, no?

27.05.2025 20:58 👍 1 🔁 0 💬 1 📌 0

It's not perfectly analogous, since drug addiction is at least a real thing, but a lot of people really do get kicked off drugs that work for them for "drug-seeking behaviour" if they make the mistake of reacting as if the drugs they're taking are helping with their symptoms

27.05.2025 19:56 👍 25 🔁 1 💬 0 📌 0

But also, I think some people are just highly uncertain, and discussing risks they think are not super likely - just likely enough to address.* Those can be genuinely mutually incompatible.

*E.g. Scott has said his p(doom) is 10-30% IIRC?

27.05.2025 19:45 👍 1 🔁 0 💬 1 📌 0

I think the model some people have is "LLMs don't really care. But they're smart and ruthless enough to e.g. lie, blackmail, help design weapons etc. when they think it'll please us or fit with their assigned role. Imagine what they'd do if they *really* cared about something (perhaps due to RL)."

27.05.2025 19:40 👍 2 🔁 0 💬 2 📌 0

I don't fully disagree, but I don't think these are as incompatible as you're making out.

Suppose a method actor stole a bunch of money while preparing for a role as a mugger. After getting out of prison, they begin preparing for a new role as ruthless leader of the Mouse Liberation Front.

27.05.2025 19:30 👍 2 🔁 0 💬 1 📌 0

I still say they should have gone the old-school route: cast a tall Amazonian woman as She-Hulk and given her green makeup.

Heck, they could have shrunk her down with CGI for the occasional non-Hulk scene, like they did in Captain America. If they had to use CGI, it would have been less load-bearing that way.

26.05.2025 05:20 👍 2 🔁 0 💬 0 📌 0
