you can check this yourself on gpt-oss 20b with high reasoning in the prompt vs low reasoning in the prompt and any entropy reduction method. or if you have access to an 80 gig card, ablate the 120b.
now that i have sold out and started working on these: it is because the big labs figured out local entropy reduction techniques are very effective, and they aggressively tune that knob
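for a concrete sense of what "local entropy reduction" means here: a minimal sketch of two standard knobs, temperature scaling and nucleus (top-p) truncation, applied to a next-token distribution. the specific values are illustrative, not what any lab actually ships.

```python
import numpy as np

def reduce_entropy(logits, temperature=0.7, top_p=0.9):
    """Sharpen a next-token distribution: temperature scaling then
    nucleus (top-p) truncation. Illustrative values only."""
    # temperature < 1 sharpens the softmax (reduces entropy)
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # numerical stability
    p = np.exp(z) / np.exp(z).sum()

    # nucleus truncation: keep the smallest set of tokens covering top_p mass
    order = np.argsort(p)[::-1]
    csum = np.cumsum(p[order])
    cutoff = np.searchsorted(csum, top_p) + 1
    q = np.zeros_like(p)
    q[order[:cutoff]] = p[order[:cutoff]]
    return q / q.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()
```

both steps strictly lower the entropy of the sampled distribution relative to the plain softmax, which is exactly the knob being tuned between "high" and "low" reasoning behavior.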
completely unrelatedly, i am now ~fully convinced that there isn't a single real world smooth mapping that you can't capture by diffusing the correct amount in the correct space
blog.google/technology/g...
at the same time, different channels will have different overall power spectra (that a full rank representation preserves) and so good latents must be doing some sort of spatial mixing directly, and the diffusion models must untangle that and step *down* in dimensionality while increasing dof
because any information noised in the forward process cannot be seen later, these models always encode a hierarchical series of representations. but latent space is much closer to full rank than the target data manifold (a perfect one would be exactly full rank)
there is. in the continuous limit the models learn the target score of the conditional distribution. but the forward process is a gaussian perturbation kernel, so the step between any two diffusion times is white noise, and high frequency modes must drop (exponentially) faster
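the frequency claim is easy to demonstrate numerically: natural signals have power concentrated at low frequencies (roughly a 1/f amplitude spectrum), while the gaussian perturbation is white, i.e. flat in frequency. so the per-mode SNR collapses at high frequencies first. a toy 1d sketch (the 1/f spectrum and noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# synthesize a 1d "natural" signal with a ~1/f amplitude spectrum
n = 1024
freqs = np.fft.rfftfreq(n, d=1.0)
amps = np.zeros_like(freqs)
amps[1:] = 1.0 / freqs[1:]  # power concentrated at low frequencies
phases = rng.uniform(0, 2 * np.pi, freqs.shape)
x = np.fft.irfft(amps * np.exp(1j * phases), n)

# one forward diffusion step: additive gaussian noise, flat in frequency
sigma = 0.05
noise = sigma * rng.standard_normal(n)
x_t = x + noise

# per-frequency SNR: signal spectrum falls as 1/f, noise floor is flat,
# so high-frequency modes get buried first as sigma grows
snr = np.abs(np.fft.rfft(x)) / (np.abs(np.fft.rfft(noise)) + 1e-12)
```

as sigma increases through the forward process, the frequency at which snr crosses 1 marches downward, which is the hierarchical coarse-to-fine ordering mentioned above.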
lmao that these might print
this is easily one of the top 3 worst trades in nba history. david stern is furiously fighting his way out of hell to stop this
shoutout to deepseek, showing you can just bolt on CoT with direct rl if your base model is good enough
california basically needs to remove its entire regulatory state at this point or the people are going to elect democrat hitler
hard to say with insurance, it's very regulated and im not an insurance guy. the broad issues are fraud and states making it unprofitable to service. definitely more parametric structures in the policies, but idk if those are even legal to offer to consumers (maybe bypasses california's idiot laws?)
there are already hurricane binary options, but there are some otc parametric structures (so like wind pressure, rainfall). i'm sure somebody has something similar for fire, but that is still otc. the big volume rn are temperature based contracts (LNG hedge)
unrelated: you see this paper openreview.net/pdf?id=gojL6...
if this works on fluid dynamics im gonna lose my shit
resolving to reply to more posts with lol in 2025
this person is going to the reeducation camps when i take control
i thought about doing something like this for fusion operations with triton or MLIR, but i think that's actually just a full phd topic of work because i'd need to develop some sort of proof engine for it
some lab needs to give me 50000 h200s so i can implement an implicit runge kutta token sampler that costs 3.5 million dollars per inference run and outputs "i don't feel like doing that right now" 50% of the time
if your children don't venerate Urkel thought theyre ngmi
yeah it's basically greenfield and its the sort of problem where throwing money into a furnace gets you better solutions for a while
not going to speak for him, but at least in terms of "make my llm bigger and deeper" it's unlikely that going from 600bn to 6T model size with autoregressive LLMs gets you even a 20% better model
there's a lot of room in inference compute though. imo mindless scaling isn't dead for at least 5 years
its also a pretty good book! not as a reference manual, but a good introduction
Calculus of Variations and Optimal Control Theory: A Concise Introduction by Daniel Liberzon
i will stop flaming you when you read this book
lol
genuinely shocked you didnt have it already lmao. my family loves making fun of me because i can pack all my material possessions into 4 boxes and move within 4 hours, and i still had one of their dutch ovens
o3 arc just being optimal control style value iteration over token trajectories is a really funny way to blow up the agi foom cranks though
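for anyone who hasn't seen the optimal control framing: value iteration is just the bellman optimality backup applied until a fixed point. a toy tabular version on a random MDP, purely illustrative of the framing (the sizes and dynamics here are made up, and this is not a claim about how any lab's system actually works):

```python
import numpy as np

# toy value iteration on a random 5-state, 2-action MDP
n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(1)

# P[a, s, s'] transition probabilities, R[s, a] rewards (all synthetic)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0, 1, size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    # bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * E[V(s')] ]
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

the token-trajectory version is the same structure with states as partial sequences and actions as next tokens; the contraction argument is what makes it value *iteration* rather than just search.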
the only mistake i made by being stochastic control theory brained was not leaning into it 50x harder
i know i have insane bay area brain when im looking at a 30y 800k 7% and thinking 'huh that's reasonable'
they cant really even install cuda; they just pip install torch and huggingface and pray the installation doesnt detonate
not mine im built different
anthropic does more and longer multi round in both training and rlhf