Thrilled to introduce ATLAS: the largest multilingual scaling study to date. We ran 774 experiments (10M-8B params, 400+ languages) to answer:
- Does scaling differ by language?
- Can we model the curse of multilinguality?
- Pretrain from scratch vs. finetune from a checkpoint?
- How does cross-lingual transfer score across languages?
1/🧵
28.10.2025 14:01
MatFormer introduces a nested structure into the Transformer's FFN block and jointly trains all the submodels, enabling free extraction of hundreds of accurate submodels for elastic inference.
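A minimal sketch of the nested-FFN idea: smaller submodels reuse a prefix slice of the full FFN's hidden units, so "extraction" is just slicing shared weight matrices. This is an illustrative NumPy toy (ReLU, two-layer FFN, no bias); `ffn_forward` and the sizes are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def ffn_forward(x, W1, W2, m):
    """FFN restricted to the first m hidden units (a nested submodel)."""
    h = np.maximum(x @ W1[:, :m], 0.0)  # sliced up-projection + ReLU
    return h @ W2[:m, :]                # sliced down-projection

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W1 = rng.normal(size=(d_model, d_ff))
W2 = rng.normal(size=(d_ff, d_model))
x = rng.normal(size=(4, d_model))

# Submodels of increasing capacity share the same weights;
# the largest slice (m == d_ff) recovers the full FFN.
outs = {m: ffn_forward(x, W1, W2, m) for m in (4, 8, 16, 32)}
assert np.allclose(outs[32], np.maximum(x @ W1, 0.0) @ W2)
```

Joint training would optimize the shared weights under losses from several slice sizes at once, so every prefix slice stays an accurate standalone model.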
I will be at poster #2507 w/ my co-authors in East Exhibit Hall A-C at #NeurIPS2024, chatting about MatFormer and elastic models today at 4:30pm!
Come by, or reach out if you want to chat about pretraining, scaling laws, or conditional computation!
arxiv.org/abs/2310.07707
11.12.2024 21:42
Would love to be added!
11.12.2024 17:23