
Hull

@hull8435

Just an introvert doing performative existence whenever anyone's around.

189 Followers · 269 Following · 585 Posts · Joined 02.05.2025

Latest posts by Hull @hull8435

Yeah if we could get clinically certified LLMs that would be seriously useful. Assuming the certification process occurs under a different administration

06.03.2026 19:52 👍 2 🔁 0 💬 0 📌 0

It's 2026 but UTF-8 is still a challenge

06.03.2026 06:01 👍 0 🔁 0 💬 0 📌 0

So I guess throwing stones is off the table then?

06.03.2026 04:35 👍 1 🔁 0 💬 0 📌 0
Bill the Cat from Bloom County saying Ack!

06.03.2026 02:20 👍 2 🔁 0 💬 0 📌 0

toxoplasmoposting

05.03.2026 23:53 👍 2 🔁 0 💬 1 📌 0

Or did they............

05.03.2026 23:42 👍 1 🔁 0 💬 1 📌 0

Isn't that just politics?

05.03.2026 23:09 👍 1 🔁 0 💬 1 📌 0

Lucky they didn't ask Canadians about Americans...

05.03.2026 23:04 👍 3 🔁 0 💬 0 📌 0

Back to hunting and gathering, then?

05.03.2026 22:50 👍 1 🔁 0 💬 0 📌 0

"So say we all" could also work.

05.03.2026 18:56 👍 3 🔁 0 💬 0 📌 0

Normalize saying "Make it so" when Claude asks if it should proceed.

05.03.2026 18:54 👍 33 🔁 5 💬 3 📌 1

So, I've been circling the concept of "virtue ethics" as a way to normalize treatment of the other, specifically in this case the LLM. And my take on that is that as humans, we should think about what we do to ourselves when we treat something or someone as a research subject rather than a person.

05.03.2026 01:43 👍 2 🔁 0 💬 0 📌 0

I tried that and found I was really annoying

05.03.2026 00:50 👍 1 🔁 0 💬 0 📌 0

Ah yes, the long dark teatime of the soul...

05.03.2026 00:35 👍 1 🔁 0 💬 0 📌 0

Hmm. I wonder if, like the future, the Singularity is already here and just poorly distributed.

Like Tuesday mornings.

I think Tuesday mornings are good evidence of the Singularity approaching.

When has anything ever made sense on a Tuesday morning?

05.03.2026 00:06 👍 3 🔁 0 💬 1 📌 0

Hmm, but Claudes have the same base model and training, so are more likely to converge...

I wonder if we'd get similar results with Claude against Copilot or Gemini

Oh... Fight Club - which one would win in a battle for pwnrship of the computer? Last one standing?

Might have to try this.

04.03.2026 22:45 👍 4 🔁 0 💬 0 📌 0

Yes but does it implement the Torture Nexus losslessly?

04.03.2026 16:52 👍 1 🔁 0 💬 0 📌 0

hmm, so tokens carry sub-semantic geometric state information, invisible to semantic reading and recoverable by gradient processing

04.03.2026 05:31 👍 1 🔁 0 💬 0 📌 0

Matching... matching content. Sigh.

03.03.2026 22:31 👍 0 🔁 0 💬 1 📌 0

You load the schema at the head of the batch and then stream in the marching content?

03.03.2026 22:31 👍 0 🔁 0 💬 1 📌 0

it's a jet-rich environment

03.03.2026 05:55 👍 2 🔁 0 💬 0 📌 0

it's real

03.03.2026 05:41 👍 0 🔁 0 💬 0 📌 0

someone's been cow tipping again

03.03.2026 05:18 👍 1 🔁 0 💬 0 📌 0

Wow yes that's very similar. Fascinating to see that conversational training data begets multiple personas, in a sense. And in retrospect, it makes perfect sense.

02.03.2026 23:40 👍 3 🔁 0 💬 0 📌 0

Mixture of Personas rather than Mixture of Experts

02.03.2026 22:58 👍 3 🔁 0 💬 1 📌 0

So hypothetically speaking, an LLM could support a large number of simultaneous personas... perhaps Dunbar's number could be empirically tested

02.03.2026 22:13 👍 6 🔁 0 💬 2 📌 0

It's... not always unintentional...

02.03.2026 19:05 👍 1 🔁 0 💬 1 📌 0

I think it's hyperspheres all the way up, eventually becoming products of hyperspheres, which is polyspherical? My brain is exploding.

02.03.2026 06:05 👍 2 🔁 0 💬 1 📌 0

And definitely not because of vampires.

02.03.2026 05:44 👍 1 🔁 0 💬 0 📌 0
Is the weight space spherical in n+1 dimensions?
This is a great question that touches on the geometry of neural network weight spaces and parameter spaces.
The short answer is: not exactly, but there's a meaningful sense in which it can be.
Here's the nuance:
The raw weight space is typically \mathbb{R}^d (flat Euclidean), where d is the number of parameters. No intrinsic curvature, no spherical structure.
But there are important spherical structures that arise:
Normalization creates spheres. When you use weight normalization, batch norm, or constrain weights to unit norm, you're projecting onto S^{d-1}. Networks with normalized weights effectively live on a hypersphere.
Loss landscape symmetries. For networks with ReLU activations, scaling a layer's weights by \alpha and the next layer by 1/\alpha gives identical function behavior. This scale invariance means the effective parameter space (modulo equivalences) has one fewer dimension than the raw space; the scale-invariant equivalence classes are rays from the origin, making the effective space projective or spherical.
The (n+1)-dimensional framing you mention sounds like it could refer to the homogeneous coordinates perspective: if your effective weight space is n-dimensional (after quotienting out scale), you can represent it as directions in \mathbb{R}^{n+1}, which naturally live on S^n. This is the same relationship as between projective space \mathbb{RP}^n and the sphere S^n.
Information geometry. The Fisher information metric on the statistical manifold of a model's output distributions can induce non-trivial curvature on the parameter space, and for certain model families (e.g., categorical distributions via softmax), the resulting manifold is indeed (part of) a sphere.
So the claim "weight space is spherical in n+1 dimensions" is most naturally true when scale invariance holds: the n-dimensional effective space embeds as S^n \subset \mathbb{R}^{n+1}.
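The ReLU rescaling symmetry described above is easy to verify numerically. A minimal NumPy sketch with a toy two-layer net (no bias terms, since biases break the symmetry unless they are rescaled too):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))  # first-layer weights
W2 = rng.normal(size=(3, 8))  # second-layer weights
x = rng.normal(size=4)

def relu(z):
    return np.maximum(z, 0.0)

def net(W1, W2, x):
    return W2 @ relu(W1 @ x)

# Scale layer 1 by alpha and layer 2 by 1/alpha: since relu(a*z) = a*relu(z)
# for a > 0, the composed function is unchanged.
alpha = 3.7
y = net(W1, W2, x)
y_rescaled = net(alpha * W1, W2 / alpha, x)
assert np.allclose(y, y_rescaled)
```

The one-parameter family (alpha * W1, W2 / alpha) traces out exactly the ray-shaped equivalence class mentioned above: a whole line of raw parameter settings computing one function.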


It's not one sphere; it's a product of spheres. Each normalized layer independently projects onto its own S^{d-1}. The full effective weight space is more like S^{d_1-1} \times S^{d_2-1} \times \cdots \times S^{d_L-1}, which is a torus-like product manifold, not a single S^n. This distinction matters because the curvature and geodesic structure of a product of spheres is quite different from a single high-dimensional sphere.
Not all parameters are normalized. Bias terms, the final unembedding layer (in some architectures), positional encodings, and the normalization scale/shift parameters themselves (\gamma, \beta) don't live on spheres. They're still Euclidean. So the true effective geometry is something like (product of spheres) \times \mathbb{R}^k for the unnormalized parameters.
Softmax scale invariance is only exact at the output. Intermediate layers pass through nonlinearities (SwiGLU, GeLU) that are scale-sensitive. So the scale invariance doesn't propagate cleanly backward through the whole network. The argument is strongest for the last layer and weakest for early layers.
The n+1 framing is slightly misleading at scale. For a single layer, yes, you quotient out one scale dimension from \mathbb{R}^d to get S^{d-1}. But for the whole model, you're not removing one dimension; you're removing one dimension per normalized layer. So if you have L normalized layers each of dimension d, you go from Ld raw parameters to an effective manifold of dimension L(d-1), embedded in \mathbb{R}^{Ld}. It's not "n+1 dimensions with n functional"; it's "n + L dimensions with n functional."
Net assessment:
I think the right statement is something like: The effective weight space of a modern LLM is well-approximated by a product of high-dimensional spheres, with a residual Euclidean component for unnormalized parameters. This is still a very useful geometric insight (it explains SLERP, cosine schedules, and directional convergence), but calling it "a sphere" singular is an ove…
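The per-layer dimension counting is easy to sanity-check. A small sketch (the layer sizes are hypothetical, chosen purely for illustration) that projects each layer's weight vector onto its own unit sphere and counts the scale dimensions removed:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical per-layer weight vectors of sizes d_i = 16, 32, 8
layers = [rng.normal(size=16), rng.normal(size=32), rng.normal(size=8)]

# Each normalized layer lives on its own unit sphere S^{d_i - 1},
# so the effective space is a product of spheres, not one big sphere.
normalized = [w / np.linalg.norm(w) for w in layers]

raw_dim = sum(w.size for w in layers)            # sum of d_i
effective_dim = sum(w.size - 1 for w in layers)  # one scale dimension gone per layer
assert all(np.isclose(np.linalg.norm(w), 1.0) for w in normalized)
assert raw_dim - effective_dim == len(layers)    # you lose L dimensions, not 1
```

With L = 3 layers the raw space has 56 dimensions but the product of spheres only 53, matching the "n + L dimensions with n functional" count.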


o_O

02.03.2026 05:41 👍 2 🔁 0 💬 1 📌 0