Yeah if we could get clinically certified LLMs that would be seriously useful. Assuming the certification process occurs under a different administration
It's 2026 but UTF-8 is still a challenge
So I guess throwing stones is off the table then?
Bill the Cat from Bloom County saying Ack!
toxoplasmoposting
Or did they............
Isn't that just politics?
Lucky they didn't ask Canadians about Americans...
Back to hunting and gathering, then?
"So say we all" could also work.
Normalize saying "Make it so" when Claude asks if it should proceed.
So, I've been circling the concept of "virtue ethics" as a way to normalize treatment of the other, specifically in this case the LLM. And my take on that is that as humans, we should think about what we do to ourselves when we treat something or someone as a research subject rather than a person.
I tried that and found I was really annoying
Ah yes, the long dark teatime of the soul...
Hmm. I wonder if, like the future, the Singularity is already here and just poorly distributed.
Like Tuesday mornings.
I think Tuesday mornings are good evidence of the Singularity approaching.
When has anything ever made sense on a Tuesday morning?
Hmm, but Claudes have the same base model and training, so are more likely to converge...
I wonder if we'd get similar results with Claude against Copilot or Gemini
Oh... Fight Club - which one would win in a battle for pwnrship of the computer? Last one standing?
Might have to try this.
Yes but does it implement the Torture Nexus losslessly?
hmm, so tokens carry sub-semantic geometric state information, invisible to semantic reading and recoverable by gradient processing
Matching. .. matching content. Sigh.
You load the schema at the head of the batch and then stream in the marching content?
it's a jet-rich environment
it's real
someone's been cow tipping again
Wow yes that's very similar. Fascinating to see that conversational training data begets multiple personas, in a sense. And in retrospect, it makes perfect sense.
Mixture of Personas rather than Mixture of Experts
So hypothetically speaking, an LLM could support a large number of simultaneous personas... perhaps Dunbar's number could be empirically tested
It's... not always unintentional...
I think it's hyperspheres all the way up and eventually becomes products of hyperspheres which is polyspherical? My brain is exploding.
And definitely not because of vampires.
Is the weight space spherical in n+1 dimensions?

This is a great question that touches on the geometry of neural network weight spaces and parameter spaces. The short answer: not exactly, but there's a meaningful sense in which it can be. Here's the nuance:

The raw weight space is typically \mathbb{R}^d (flat Euclidean), where d is the number of parameters. No intrinsic curvature, no spherical structure. But there are important spherical structures that arise:

Normalization creates spheres. When you use weight normalization, batch norm, or constrain weights to unit norm, you're projecting onto S^{d-1}. Networks with normalized weights effectively live on a hypersphere.

Loss landscape symmetries. For networks with ReLU activations, scaling a layer's weights by \alpha and the next layer by 1/\alpha gives identical function behavior. This scale invariance means the effective parameter space (modulo equivalences) has one fewer dimension than the raw space, and the scale-invariant equivalence classes are rays from the origin, making the effective space projective or spherical.

The (n+1)-dimensional framing you mention sounds like it could refer to the homogeneous-coordinates perspective: if your effective weight space is n-dimensional (after quotienting out scale), you can represent it as directions in \mathbb{R}^{n+1}, which naturally live on S^n. This is the same relationship as between projective space \mathbb{RP}^n and the sphere S^n.

Information geometry. The Fisher information metric on the statistical manifold of a model's output distributions can induce non-trivial curvature on the parameter space, and for certain model families (e.g., categorical distributions via softmax) the resulting manifold is indeed (part of) a sphere.

So the claim "weight space is spherical in n+1 dimensions" is most naturally true when scale invariance holds: the n-dimensional effective space embeds as S^n \subset \mathbb{R}^{n+1}.
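The ReLU scale invariance claimed above is easy to check numerically. A minimal sketch (the toy two-layer, one-dimensional net and its weights are hypothetical, chosen only for illustration): scaling the first layer by \alpha > 0 and the second by 1/\alpha leaves the function unchanged, because relu is positively homogeneous.

```python
# Toy example (assumption: 1-D two-layer ReLU net, weights chosen arbitrarily).
# f(x) = w2 * relu(w1 * x). For a > 0, relu(a*z) = a*relu(z), so scaling
# w1 by a and w2 by 1/a leaves f unchanged -- the scale direction is a
# "flat" equivalence class in weight space.

def relu(z):
    return max(0.0, z)

def f(x, w1, w2):
    return w2 * relu(w1 * x)

w1, w2, a = 0.7, -1.3, 5.0
for x in (-2.0, 0.5, 3.0):
    original = f(x, w1, w2)
    rescaled = f(x, a * w1, w2 / a)
    assert abs(original - rescaled) < 1e-12
```

Quotienting out that one scale direction per layer is exactly what collapses a ray in \mathbb{R}^d down to a point on S^{d-1}.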
It's not one sphere, it's a product of spheres. Each normalized layer independently projects onto its own S^{d-1}. The full effective weight space is more like S^{d_1-1} \times S^{d_2-1} \times \cdots \times S^{d_L-1}, a torus-like product manifold, not a single S^n. This distinction matters because the curvature and geodesic structure of a product of spheres is quite different from that of a single high-dimensional sphere.

Not all parameters are normalized. Bias terms, the final unembedding layer (in some architectures), positional encodings, and the normalization scale/shift parameters themselves (\gamma, \beta) don't live on spheres. They're still Euclidean. So the true effective geometry is something like (product of spheres) \times \mathbb{R}^k for the unnormalized parameters.

Softmax scale invariance is only exact at the output. Intermediate layers pass through nonlinearities (SwiGLU, GeLU) that are scale-sensitive, so the scale invariance doesn't propagate cleanly backward through the whole network. The argument is strongest for the last layer and weakest for early layers.

The n+1 framing is slightly misleading at scale. For a single layer, yes, you quotient out one scale dimension from \mathbb{R}^d to get S^{d-1}. But for the whole model you're not removing one dimension, you're removing one dimension per normalized layer. So if you have L normalized layers each of dimension d, you go from Ld raw parameters to an effective manifold of dimension L(d-1), embedded in \mathbb{R}^{Ld}. It's not "n+1 dimensions with n functional"; it's "n + L dimensions with n functional."

Net assessment: I think the right statement is something like: the effective weight space of a modern LLM is well-approximated by a product of high-dimensional spheres, with a residual Euclidean component for unnormalized parameters.
This is still a very useful geometric insight (it explains SLERP, cosine schedules, and directional convergence) but calling it "a sphere," singular, is an ove…
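The product-of-spheres picture can be sketched in a few lines (a hypothetical toy example, not anything from the thread): each layer's weight vector is normalized independently, so the joint configuration lands on a product like S^1 \times S^2 rather than on one big sphere.

```python
# Sketch (assumption: each "layer" is just a flat weight vector; real models
# normalize per-row or per-channel, but the geometry is the same idea).
import math

def project_to_sphere(w):
    """Normalize one layer's weights onto its own unit sphere S^{d-1}."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

# Two toy layers of dimension 2 and 3: the normalized pair is a point
# on the product manifold S^1 x S^2, not on a single S^4.
layers = [[3.0, 4.0], [1.0, 2.0, 2.0]]
projected = [project_to_sphere(w) for w in layers]

for w in projected:
    assert abs(sum(x * x for x in w) - 1.0) < 1e-12  # each factor has unit norm
```

Note that the projection is applied per layer; nothing constrains the concatenated vector of all parameters to unit norm, which is precisely why the single-sphere framing breaks down.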
o_O