they already use very optimized human language, i'd be very surprised if they dropped human language-like structure entirely
this seems clearly not to be the kind of thing that emerges from training for reasoning in the current architecture, but rather the kind of thing that requires a different architecture
what does this mean exactly? like they'll think for many forward passes without generating tokens?
i would think 2% time horizons are much more than 3x the length of 50% time horizons? based on this image, they're roughly a factor of 4^6 ≈ 4000x away from 50%, so 5h vs ~20k hours. which sounds silly now that i write it out, though idk where the error is
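fwiw, if the curve in the image is the usual METR-style logistic fit in log2 task length, the 2%:50% horizon ratio depends only on the fitted slope (the intercept cancels). a quick sketch, where the slope values are made up for illustration, not fitted METR numbers:

```python
import math

def horizon_ratio(p_lo, p_hi, beta):
    """Ratio of the p_lo horizon to the p_hi horizon under a logistic
    success model P(success) = sigmoid(alpha - beta * log2(t)).
    The intercept alpha cancels, so only the slope beta matters."""
    logit = lambda p: math.log(p / (1 - p))
    # t_p = 2 ** ((alpha - logit(p)) / beta), so the intercept drops out:
    return 2 ** ((logit(p_hi) - logit(p_lo)) / beta)

# assumed slopes, just to show how sensitive the ratio is
for beta in (0.5, 1.0, 2.0):
    print(beta, horizon_ratio(0.02, 0.5, beta))
```

a shallow slope (beta ≈ 0.33) would give a ~4000x gap, a steep one (beta ≈ 2) gives only ~4x, so whether "3x" or "4^6" is right depends entirely on how flat the fitted curve is.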
more like 7 months to 4.5 months
if you want to do this, you need to also take the 50% time horizons in july, september, and all the other months and similarly "rescale" them with your attempted "correction". otherwise you get stupidly short doubling times that have no connection whatsoever to reality
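to illustrate with made-up numbers (not real METR data): rescaling only the latest point shrinks the implied doubling time, while rescaling every month by the same factor leaves it unchanged:

```python
import math

def doubling_time_months(h_start, h_end, months):
    """Doubling time implied by growth from h_start to h_end over `months`."""
    return months / math.log2(h_end / h_start)

# illustrative inputs: hours in july, hours now, months elapsed
h_july, h_now, elapsed = 1.8, 4.9, 5
base = doubling_time_months(h_july, h_now, elapsed)

k = 3.5  # some "correction" multiple
# "correcting" only the latest point...
skewed = doubling_time_months(h_july, k * h_now, elapsed)
# ...versus rescaling both endpoints, which cancels out
consistent = doubling_time_months(k * h_july, k * h_now, elapsed)

print(base, skewed, consistent)  # skewed is much shorter; consistent equals base
```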
in july, the METR TH SOTA was Grok 4 at 1h49min. if the math TH were 7h (I doubt it, but whatever), that gives a rough ~3.9x multiple. assuming similar doubling times, that would mean math time horizons are currently ~19h. ofc maybe the doubling times are very different (I think they are). so yeah, tricky
I would be very careful comparing METR coding time horizons to math time horizons; it doesn't make sense in this case at all. you're using it as justification to dismiss the 4.9h current SOTA and treat a larger, imagined number as more representative, which is crazy
the labs don't have enough compute to run Claude Max subscriptions for every knowledge worker. Not even close, in fact
not 50%
I think another important part is "that market participants want to participate in it"