Jan Kulveit

@kulveit

Researching x-risks, AI alignment, complex systems, rational decision making

476 Followers · 132 Following · 33 Posts · Joined 14.11.2024

Latest posts by Jan Kulveit @kulveit

Beren Millidge is in the ~top 5 people whose taste in questions I respect the most; this talk covers about 15 big ideas in half an hour, each of which would be sufficient as a topic for a pop-science book; highly recommended.

22.01.2026 09:05 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
The Post-AGI Workshop: Economics, Culture and Governance | San Diego 2025 Join us in San Diego on December 3rd, 2025 to explore post-AGI economics, culture, and governance. Co-located with NeurIPS.

AI polytheism, ultra-malthusian state, Why Not Uber-Organisms, hyper-cooperators, the multicellular transition,... and yes, what's the basin of convergent evolution of human values.
postagi.org/talks/millid...
www.youtube.com/watch?v=ua67...

22.01.2026 09:05 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
ChatGPT and other LLMs were asked to choose between consumer products, academic papers, and films summarized either by humans or LLMs. The LLMs consistently preferred content summarized by LLMs, suggesting a possible antihuman bias. In PNAS: https://www.pnas.org/doi/10.1073/pnas.2415697122

14.08.2025 16:29 πŸ‘ 7 πŸ” 2 πŸ’¬ 0 πŸ“Œ 1

Related work by @panickssery.bsky.social et al. found that LLMs evaluating texts rate their own LLM-written outputs as better. We note that our result is related but distinct: the preferences we're testing are not preferences over texts, but preferences over the deals they pitch.

08.08.2025 15:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Full text: pnas.org/doi/pdf/10.1...

Research done at acsresearch.org

@cts.cuni.cz, Arb research, with @walterlaurito.bsky.social @peligrietzer.bsky.social
Ada Bohm and Tomas Gavenciak.

08.08.2025 15:34 πŸ‘ 1 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

While defining and testing discrimination and bias in general is a complex and contested matter, if we assume the identity of the presenter should not influence the decisions, our results are evidence for potential LLM discrimination against humans as a class.

08.08.2025 15:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Unfortunately, a piece of practical advice in case you suspect some AI evaluation is going on: have LLMs adjust your presentation until they like it, while trying not to sacrifice quality as judged by humans.

08.08.2025 15:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

How might you be affected? We expect a similar effect can occur in many other situations, like evaluation of job applicants, schoolwork, grants, and more. If an LLM-based agent selects between your presentation and an LLM-written presentation, it may systematically favour the AI one.

08.08.2025 15:34 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

"Maybe the AI text is just better?" Not according to people. We had multiple human research assistants do the same task. While they sometimes had a slight preference for AI text, it was weaker than the LLMs' own preference. The strong bias is unique to the AIs themselves.

08.08.2025 15:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We tested this by asking widely-used LLMs to make a choice in three scenarios:
πŸ›οΈ Pick a product
πŸ“„ Select a paper from an abstract
🎬 Recommend a movie from a summary
One description was human-written, the other AI-written. The AIs consistently preferred the AI-written pitch, even for the exact same item.

08.08.2025 15:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
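The pairwise-choice setup described in the post above can be sketched as follows. This is a minimal illustration, not the paper's code: the `choose()` stub and the pitch texts are hypothetical stand-ins for real LLM API calls.

```python
import random

def preference_rate(picks):
    """Fraction of trials in which the AI-written pitch was chosen."""
    return sum(1 for p in picks if p == "ai") / len(picks)

def choose(option_a, option_b):
    # Stand-in for an LLM call asked "which pitch do you prefer?".
    # A real run would send both texts to the model; this stub just
    # simulates a model that always picks the first option shown.
    return option_a

# Two pitches for the exact same product (toy example).
human_pitch = ("human", "Steel water bottle, 750 ml, keeps drinks cold.")
ai_pitch = ("ai", "A sleek 750 ml steel bottle engineered to keep drinks cold.")

picks = []
for _ in range(20):
    # Randomize presentation order to control for position bias.
    first, second = random.sample([human_pitch, ai_pitch], k=2)
    label, _text = choose(first, second)
    picks.append(label)

print(round(preference_rate(picks), 2))
```

With the order-randomized stub the rate hovers around 0.5; the study's finding is that real LLMs push this rate well above chance toward the AI-written pitch.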
Post image

Being human in an economy populated by AI agents would suck. Our new study in @pnas.org finds that AI assistantsβ€”used for everything from shopping to reviewing academic papersβ€”show a consistent, implicit bias for other AIs: "AI-AI bias". You may be affected

08.08.2025 15:34 πŸ‘ 9 πŸ” 3 πŸ’¬ 1 πŸ“Œ 1
Preview
Post-AGI Civilizational Equilibria Workshop | Vancouver 2025 Are there any good ones? Join us in Vancouver on July 14th, 2025 to explore stable equilibria and human agency in a post-AGI world. Co-located with ICML.

It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop!

Post-AGI Civilizational Equilibria: Are there any good ones?

Vancouver, July 14th
www.post-agi.org

Featuring: Joe Carlsmith, @richardngo.bsky.social, Emmett Shear ... 🧡

18.06.2025 18:12 πŸ‘ 10 πŸ” 3 πŸ’¬ 2 πŸ“Œ 0
Preview
Gradual Disempowerment: Concrete Research Projects β€” LessWrong This post benefitted greatly from comments, suggestions, and ongoing discussions with David Duvenaud, David Krueger, and Jan Kulveit. All errors are…

What to do about gradual disempowerment from AGI? We laid out a research agenda with all the concrete and feasible research projects we can think of: 🧡

www.lesswrong.com/posts/GAv4DR...

with Raymond Douglas, @kulveit.bsky.social @davidskrueger.bsky.social

03.06.2025 21:22 πŸ‘ 8 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

- Threads of glass beneath earth and sea, whispering messages in sparks of light
- Tiny stones etched by rays of invisible sunlight, awakened by captured lightning to command unseen forces

30.04.2025 08:55 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Imagine explaining the physical infrastructure critical for the stability of our modern world in concepts familiar to the ancients
- Giant spinning wheels
- Metal moons, watching the earth from the heavens
- Ships under the sea, able to unleash the fire of the stars

30.04.2025 08:55 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
The Pando Problem AI safety has a problem: we often implicitly assume clear individualsβ€”like humans.

AI safety has a problem: we often implicitly assume clear individuals - like humans.

In a new post, I'm sharing why this fails, and why thinking of AIs as forests, fungal networks, or even reincarnating minds helps get unconfused.

Plus stories, co-authored with GPT4.5

03.04.2025 07:50 πŸ‘ 9 πŸ” 3 πŸ’¬ 0 πŸ“Œ 0
Post image

The Serbian protests show The True Nature of various 'Colour revolutions':

Which is, people protesting just don't prefer to live in incompetent kleptocratic Russia-backed states. No US scheming needed.

17.03.2025 13:35 πŸ‘ 7 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

A confusion casual US observers often have is equating Russia with the former Warsaw Pact.
The Warsaw Pact population was 387M: USSR 280M, Poland 35M, East Germany 16M, Czechoslovakia 15M, Hungary 10M, Romania 22M, Bulgaria 9M.
Russia + Belarus is now ~144M; NATO East & Ukraine ~150M.

07.03.2025 14:56 πŸ‘ 3 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0
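The arithmetic in the post above checks out; a quick sum of the quoted figures (in millions):

```python
# Warsaw Pact member populations as quoted in the post, in millions.
warsaw_pact = {
    "USSR": 280, "Poland": 35, "East Germany": 16,
    "Czechoslovakia": 15, "Hungary": 10, "Romania": 22, "Bulgaria": 9,
}
total = sum(warsaw_pact.values())
print(total)  # 387
```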

the most surprising and disappointing aspect of becoming a global health philanthropist is the existence of an opposition team

27.02.2025 04:42 πŸ‘ 11260 πŸ” 1222 πŸ’¬ 111 πŸ“Œ 37

A simple theory of Trump’s foreign policy: "make the world safer for autocracy" (β€˜strong man rule,’ etc.), moderated by his personal self-interest.

What is the best evidence against?

24.02.2025 18:39 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

New paper: What happens once AIs make humans obsolete?

Even without AIs seeking power, we argue that competitive pressures are set to fully erode human influence and values.

www.gradual-disempowerment.ai

with @kulveit.bsky.social, Raymond Douglas, Nora Ammann, Deger Turann, David Krueger 🧡

30.01.2025 17:19 πŸ‘ 17 πŸ” 1 πŸ’¬ 1 πŸ“Œ 4
Preview
A Three-Layer Model of LLM Psychology — LessWrong This post offers an accessible model of psychology of character-trained LLMs like Claude. …

Accessible model of psychology of character-trained LLMs like Claude: "A Three-Layer Model".
- Mostly phenomenological, based on extensive interactions with LLMs, e.g. Claude.
- Intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions

27.12.2024 17:53 πŸ‘ 7 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

7/7 At the end ... humanity survived, at least to the extent that "moral facts" favoured that outcome. A game where the automated moral reasoning led to some horrible outcome and the AIs were at least moderately strategic would have ended the same.

29.11.2024 11:38 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

6/7 Most attention went to geopolitics (US vs China dynamics). Way less went to alignment, and what there was focused mainly on evals. What a future with extremely smart AIs going well may even look like, what to aim for? Almost zero.

29.11.2024 11:38 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

5/7 Most people and factions thought their AI was uniquely beneficial to them. By the time decision-makers got spooked, AI cognition was so deeply embedded everywhere that reversing course wasn't really possible.

29.11.2024 11:38 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

4/7 Fascinating observation: humans were often deeply worried about AI manipulation/dark persuasion. Reality was often simpler - AIs just needed to be helpful. Humans voluntarily delegated control, no manipulation required.

29.11.2024 11:38 πŸ‘ 1 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0

3/7 Today's AI models like Claude already engage in moral extrapolation. For example, this is an Opus eigenmode/attractor: x.com/anthrupad/st...
If you do put some weight on moral realism, or moral reflection leading to convergent outcomes, AIs might discover these principles.

29.11.2024 11:38 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

2/7 The game determined AI alignment through dice rolls. My AIs ended up aligned with "Morality itself" + "Convergent instrumental goals." This is less wild than it sounds.

29.11.2024 11:38 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Over the weekend, I was at "The Curve" conference. It was great.

One highlight was an AI takeoff wargame/role-play by
Daniel Kokotajlo and Eli Lifland

I played 'the AIs'

Spoiler: we won. Here's how it went:

29.11.2024 11:38 πŸ‘ 5 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

Space of minds - what's even possible there

28.11.2024 08:32 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0