
xjdr

@xjdr

hot takes, linear Algebra, JAX apologist, Raconteur

1,290 Followers · 91 Following · 15 Posts · Joined 22.11.2024

Latest posts by xjdr @xjdr

Post image

... still me ...

26.11.2024 22:13 ๐Ÿ‘ 7 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Post image

I have become radicalized

26.11.2024 17:27 ๐Ÿ‘ 54 ๐Ÿ” 0 ๐Ÿ’ฌ 10 ๐Ÿ“Œ 0

405B Base (bf16) will withstand the test of time and remain eternal (18 months)

25.11.2024 07:14 ๐Ÿ‘ 10 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

so far the experience has been pretty good here but the default feeds are _terrible_. feels like it's going to take a few weeks to whip these feeds into shape with mutes and "show less like these" plus lots of likes. Following feed is good but i need to follow a lot more people

25.11.2024 02:56 ๐Ÿ‘ 98 ๐Ÿ” 4 ๐Ÿ’ฌ 15 ๐Ÿ“Œ 0

theoretically they should be semantically similar / close in latent space but YMMV based on the model.

25.11.2024 02:53 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
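A minimal sketch of what "semantically similar / close in latent space" means in the post above: embeddings of related text tend to have high cosine similarity. The vectors and names below are toy values for illustration only, not output from any real model.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors:
    # values near 1.0 mean "close in latent space".
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings": related concepts point in a similar direction,
# an unrelated one does not.
king    = [0.90, 0.80, 0.10]
monarch = [0.85, 0.82, 0.15]
banana  = [0.10, 0.05, 0.95]

print(cosine_similarity(king, monarch))  # high, ~0.998
print(cosine_similarity(king, banana))   # low, ~0.19
```

As the post notes, how well this property holds depends on the model that produced the embeddings.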

very interesting work and it reminds me a bit of this paper. Tokenizers and ROPE must die. after samplers, i am on to those next ...
arxiv.org/abs/2407.036...

25.11.2024 02:20 ๐Ÿ‘ 78 ๐Ÿ” 12 ๐Ÿ’ฌ 9 ๐Ÿ“Œ 0

good call. this isn't universal advice but it is my general advice for most people for most use cases. I have been very surprised with 1B, specifically with function calling and moderately complex coding tasks. punches well above its weight IMHO

24.11.2024 20:41 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

i keep forgetting to include this cause i always assume people do this by default. Any time there is an exponent or a norm, you should be working in the highest practical precision

24.11.2024 20:05 ๐Ÿ‘ 25 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
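A small sketch of why the exponent advice above matters: float16 tops out around 65504, so a softmax computed directly in half precision overflows and produces NaNs, while upcasting to float32 (plus the usual max-subtraction trick) stays stable. The specific logit values are illustrative.

```python
import numpy as np

logits = np.array([12.0, 11.0, 10.0], dtype=np.float16)

# Naive half-precision softmax: exp(12) ~ 1.6e5 overflows float16
# (max representable is ~65504), so the result contains inf/NaN.
naive = np.exp(logits) / np.exp(logits).sum()

# Do the exponent in float32, subtract the max for extra headroom,
# and only cast back down at the end.
x = logits.astype(np.float32)
x -= x.max()
stable = (np.exp(x) / np.exp(x).sum()).astype(np.float16)

print(naive)   # contains NaN
print(stable)  # a valid distribution summing to ~1
```

The same reasoning applies to norms: summing many squared values in half precision can overflow or lose precision long before the final square root.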

my old recommendation used to be run the largest model you can at Q4, but with L3.2 and Qwen2.5 that has changed. Generally, i now suggest running the largest 5T+ token trained model you can at bf16 (not fp16 unless that is how it was trained). L3.2 1B or Qwen 2.5 1.5B are good

24.11.2024 19:55 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
big_vision/big_vision/models/ppp/gemma.py at main ยท google-research/big_vision Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. - google-research/big_vision

the BigVision repo is my current reference impl for gemma and ViT. such an underrated repo @giffmana.bsky.social and team are doing the lord's work

github.com/google-resea...

github.com/google-resea...

24.11.2024 17:25 ๐Ÿ‘ 92 ๐Ÿ” 12 ๐Ÿ’ฌ 3 ๐Ÿ“Œ 0
Post image

now that people are paying attention again, here is your periodic reminder. Always run in bf16. always apply ROPE and attention softmax at float32 (as shown here)

github.com/xjdr-alt/ent...

24.11.2024 17:23 ๐Ÿ‘ 78 ๐Ÿ” 7 ๐Ÿ’ฌ 4 ๐Ÿ“Œ 2
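A toy sketch of the mixed-precision pattern the post above describes: keep tensors in half precision, but upcast to float32 for the attention scores and softmax. NumPy has no bfloat16 dtype, so float16 stands in for bf16 here; the shapes and values are illustrative, not the linked repo's actual code.

```python
import numpy as np

def attention(q, k, v):
    # Upcast to float32 for the numerically sensitive part:
    # score scaling and softmax (the same principle applies to RoPE).
    scores = (q.astype(np.float32) @ k.T.astype(np.float32)) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Cast back to half precision only after the softmax is done.
    return probs.astype(np.float16) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)).astype(np.float16)
k = rng.standard_normal((4, 8)).astype(np.float16)
v = rng.standard_normal((4, 8)).astype(np.float16)

out = attention(q, k, v)
print(out.dtype, out.shape)  # float16 (4, 8)
```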

So first version of an ml anon starter pack. go.bsky.app/VgWL5L Kept half-anons (like me and Vic). Not all anime pfp, but generally drawn.

24.11.2024 16:55 ๐Ÿ‘ 63 ๐Ÿ” 17 ๐Ÿ’ฌ 10 ๐Ÿ“Œ 5

i am willing to be hurt again.

24.11.2024 17:13 ๐Ÿ‘ 5 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

very solid list, but i am biased

24.11.2024 17:08 ๐Ÿ‘ 8 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

i'm trying to follow as many of my old moots as possible and new people as i find them. some of y'all changing your pfp is just mean spirited (im lazy and learned people's pfps not names)

24.11.2024 17:08 ๐Ÿ‘ 36 ๐Ÿ” 1 ๐Ÿ’ฌ 8 ๐Ÿ“Œ 0

Well this looks shockingly professional. I may have to put on a tie to post here

22.11.2024 18:53 ๐Ÿ‘ 31 ๐Ÿ” 0 ๐Ÿ’ฌ 6 ๐Ÿ“Œ 0