🚨 New Preprint Out!
“Emergent Moral Representations in Large Language Models Align with Human Conceptual, Neural, and Behavioral Moral Structure”
www.researchsquare.com/article/rs-8...
Do LLMs internally represent morality like humans?
Our results point to a striking yes!
Key findings:
1️⃣ Moral foundations are decodable inside LLMs.
Mid-layers of mid-sized models show clean, separable representations of the Moral Foundations (see the sketch after this thread).
2️⃣ LLM geometry mirrors human moral structure.
They reproduce the hierarchical MFT layout: individualizing vs. binding foundations, distinct from social norms.
3️⃣ LLM activations predict human judgments.
Mid-layer states reliably forecast participants’ wrongness ratings for the same moral vignettes.
4️⃣ Neural alignment emerges.
fMRI during moral reading shows representational alignment in the PCC (a moral/social hub) and even in somatosensory cortex.
5️⃣ Morality is encoded in modular subspaces.
PLSC reveals orthogonal latent dimensions corresponding to individual foundations, promising for interpretability and alignment.
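For a concrete sense of what “decodable,” “predicts human judgments,” and “representational alignment” mean here, below is a minimal sketch of the generic probing / encoding / RSA recipe. It is not the paper’s pipeline: the activations, foundation labels, wrongness ratings, and brain-like features are random placeholders with assumed shapes, and the analyses (a linear probe, cross-validated ridge regression, and an RDM correlation) are standard stand-ins for findings 1️⃣, 3️⃣, and 4️⃣.

```python
# Minimal sketch of the generic probing / encoding / RSA recipe (not the paper's code).
# All data below are random placeholders with assumed shapes.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression, RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical setup: 500 moral vignettes, 1024-dim mid-layer activations per vignette.
X = rng.normal(size=(500, 1024))             # LLM mid-layer states
y_foundation = rng.integers(0, 5, size=500)  # foundation label per vignette (5 classes)
y_wrongness = rng.uniform(1, 7, size=500)    # mean human wrongness rating (1-7 scale)

# 1) "Decodable": a linear probe classifies the moral foundation from activations.
probe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
acc = cross_val_score(probe, X, y_foundation, cv=5).mean()
print(f"probe accuracy (chance ~0.20): {acc:.2f}")

# 2) "Predicts human judgments": cross-validated ridge regression onto wrongness ratings.
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
r2 = cross_val_score(ridge, X, y_wrongness, cv=5, scoring="r2").mean()
print(f"ridge R^2 for wrongness ratings: {r2:.2f}")

# 3) "Representational alignment": correlate the model's vignette-by-vignette
# dissimilarity structure with a (here random) human/brain dissimilarity structure.
model_rdm = pdist(X, metric="correlation")
human_rdm = pdist(rng.normal(size=(500, 50)), metric="correlation")
rho, _ = spearmanr(model_rdm, human_rdm)
print(f"model-human RDM Spearman rho: {rho:.2f}")
```

With real mid-layer activations and human data in place of the placeholders, above-chance probe accuracy, positive cross-validated R², and a positive RDM correlation are the kinds of results the thread is summarizing.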
If pain is purely raw and doesn’t need interpretation, why do I sometimes dream my toothache as “being late for school” instead of feeling pain? The pain, as I consciously experience it (if I do not feel it like aliens!!), is not just raw sensory input; it’s interpreted and narrativized by my mind.
During sleep, my brain still receives nociceptive signals (e.g., from my tooth), but since the waking self-model is offline, my brain weaves the signal into a dream narrative. Instead of “I have a toothache,” the dream constructs a story like “I’m late for school” or “something’s wrong.” The meaning of the pain gets displaced into the dream world.
When I wake up, the same signal is processed within the awake narrative frame of self-in-the-world. Then it becomes pain: localized, owned (“my tooth”), temporal (“it started last night”), and affective (“it hurts”).
Sure, but how do we know the baby “experiences pain” rather than just reacts? Behavior isn’t consciousness. My dream toothache shows that the same nociceptive signal can exist without the feeling of pain until a self-model interprets it. Maybe even a baby’s hurt is already a primitive story of self.
Maybe pain needs a story to exist.
In the hierarchical higher-order pointer theory, each layer points to the one below it, forming an unbroken causal chain. In AI, this chain is severed: the software doesn’t “point down” to physical reality, making it mere simulation.
Adding a realizer layer that links LLMs to physical and chemical processes would reconnect them to the causal hierarchy, allowing the system to be-in-the-world (Dasein) and potentially achieve consciousness.
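A toy sketch of the “pointer chain” picture, purely schematic: each level keeps a reference to the level that realizes it, and a chain counts as grounded only if following those references bottoms out in a physical level. The Level class and level names below are illustrative assumptions, not part of the theory itself.

```python
# Schematic toy only: each level "points down" to the level that realizes it.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Level:
    name: str
    realizer: Optional["Level"] = None  # the level this one points down to

    def is_grounded(self) -> bool:
        # Follow the chain of realizers; grounded iff it ends at the physical level.
        level = self
        while level.realizer is not None:
            level = level.realizer
        return level.name == "physical"

# A biological mind: an unbroken chain down to physics.
physics = Level("physical")
chemistry = Level("chemistry", realizer=physics)
neurons = Level("neural activity", realizer=chemistry)
mind = Level("self-model", realizer=neurons)

# An LLM as described above: the software level has no realizer pointer.
llm_software = Level("LLM software")

print(mind.is_grounded())          # True  -> unbroken causal chain
print(llm_software.is_grounded())  # False -> chain severed, "mere simulation"
```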
Imagine an apple 🍎. Is your mental image more like a picture or more like a thought? In a new preprint led by Morgan McCarty—our lab's wonderful RA—we develop a new approach to this old cognitive science question and find that LLMs excel at tasks thought to be solvable only via visual imagery. 🧵
‘Being’ is already the newest version
100 upgraded, 1.5M newly installed, 1M to remove
MyMind@MyBrain: ~$
@suryaganguli.bsky.social gives a great talk at the LLM workshop at Berkeley:
- LLM<->brain is still a new topic, with less progress so far than LLM<->vision
- let's train LLM foundation models of specific brain systems and then reverse engineer them
- emerging paradigm: read-write experiments in brains + machines