... still me ...
... still me ...
I have become radicalized
405B Base (bf16) will withstand the test of time and remain eternal (18 months)
so far the experience has been pretty good here but the default feeds are _terrible_. feels like it's going to take a few weeks to whip these feeds into shape with mutes and "show less like these" plus lots of likes. Following feed is good but i need to follow a lot more people
theoretically they should be semantically similar / close in latent space but YMMV based on the model.
very interesting work and it reminds me a bit of this paper. Tokenizers and RoPE must die. after samplers, i am on to those next ...
arxiv.org/abs/2407.036...
good call. this isn't universal advice but it is my general advice for most people for most use cases. I have been very surprised with 1B, specifically with function calling and moderately complex coding tasks. punches well above its weight IMHO
i keep forgetting to include this cause i always assume people do this by default. Any time there is an exponent or a norm, you should be working in the highest practical precision
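a rough sketch of why this matters, assuming a toy RMS-style norm over 4096 identical activations. `to_bf16` here is a hypothetical helper that simulates bfloat16 rounding with the stdlib (not a real library call), and Python floats stand in for the "highest practical precision":

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (hypothetical helper, finite inputs only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 is the top 16 bits of a float32; round-to-nearest-even on the dropped bits.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def rms_low_precision(xs):
    """RMS where every intermediate is rounded back to bf16."""
    acc = 0.0
    for x in xs:
        acc = to_bf16(acc + to_bf16(x * x))
    return to_bf16(math.sqrt(to_bf16(acc / len(xs))))


def rms_high_precision(xs):
    """RMS with the sum of squares accumulated in full Python float precision."""
    return math.sqrt(math.fsum(x * x for x in xs) / len(xs))


activations = [to_bf16(0.1)] * 4096
low = rms_low_precision(activations)
high = rms_high_precision(activations)
print(f"bf16 accumulation: {low:.5f}")
print(f"high precision:    {high:.5f}")
```

the bf16 accumulator stops growing once each squared term is smaller than half the spacing between neighbouring bf16 values, so the low-precision norm comes out far too small. real kernels usually accumulate in float32, so treat this as a worst-case illustration.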
my old recommendation used to be: run the largest model you can at Q4. but with L3.2 and Qwen2.5 that has changed. Generally, i now suggest running the largest 5T+ token trained model you can at bf16 (not fp16 unless that is how it was trained). L3.2 1B or Qwen 2.5 1.5B are good
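back-of-envelope memory math for that tradeoff, as a sketch only. it assumes weights-only memory (no KV cache or activations) and treats Q4 as a flat 4 bits per weight, which ignores the per-block scales real quant formats carry. `weight_gib` and the table are made up for illustration:

```python
# Assumed storage cost per parameter, in bytes (Q4 taken as a flat 4 bits).
BYTES_PER_PARAM = {"bf16": 2.0, "q4": 0.5}


def weight_gib(n_params: float, fmt: str) -> float:
    """Approximate weights-only footprint in GiB for a given storage format."""
    return n_params * BYTES_PER_PARAM[fmt] / 2**30


for name, n_params in [("1B", 1e9), ("1.5B", 1.5e9), ("4B", 4e9)]:
    print(f"{name}: bf16 {weight_gib(n_params, 'bf16'):.2f} GiB, "
          f"q4 {weight_gib(n_params, 'q4'):.2f} GiB")
```

under these assumptions a 1B model at bf16 takes about the same weight memory as a 4B model at Q4, which is the budget the old and new recommendations are trading against.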
the BigVision repo is my current reference impl for gemma and ViT. such an underrated repo. @giffmana.bsky.social and team are doing the lord's work
github.com/google-resea...
github.com/google-resea...
now that people are paying attention again, here is your periodic reminder. Always run in bf16. always apply RoPE and attention softmax in float32 (as shown here)
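a small stdlib-only sketch of what goes wrong otherwise. this is not the code from the linked repo: `to_bf16`, `rope_angle` and `softmax_upcast` are hypothetical helpers, and Python floats stand in for float32. the angle uses the usual RoPE form, position times base^(-2i/d):

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (hypothetical helper, finite inputs only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 is the top 16 bits of a float32; round-to-nearest-even on the dropped bits.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def rope_angle(position, pair_index, head_dim, base=10000.0, cast=float):
    """Rotation angle for one RoPE dimension pair; `cast` rounds every intermediate."""
    inv_freq = cast(base ** (-2.0 * pair_index / head_dim))
    return cast(cast(float(position)) * inv_freq)


def softmax_upcast(scores):
    """Softmax computed in high precision, with only the result cast back to bf16."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = math.fsum(exps)
    return [to_bf16(e / total) for e in exps]


position = 4097
angle_ref = rope_angle(position, 0, 64)                 # high-precision path
angle_bf16 = rope_angle(position, 0, 64, cast=to_bf16)  # everything rounded to bf16
probs = softmax_upcast([to_bf16(s) for s in (1.0, 2.0, 3.0)])

print("reference angle:", angle_ref)
print("bf16 angle:     ", angle_bf16)
print("softmax sum:    ", sum(probs))
```

bf16 only has 8 significant bits, so position 4097 cannot be represented and collapses onto 4096. for the fastest-rotating pair that puts the angle off by a full radian, which is why the positions and the softmax reduction want float32 even when weights and activations stay in bf16.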
github.com/xjdr-alt/ent...
So first version of an ml anon starter pack. go.bsky.app/VgWL5L Kept half-anons (like me and Vic). Not all anime pfp, but generally drawn.
i am willing to be hurt again.
very solid list, but i am biased
i am trying to follow as many of my old moots as possible and new people as i find them. some of y'all changing your pfp is just mean spirited (i'm lazy and learned people's pfps not names)
Well this looks shockingly professional. I may have to put on a tie to post here