Thrilled to introduce ATLAS: the largest multilingual scaling study to date. We ran 774 experiments (10M-8B params, 400+ languages) to answer:
- Does scaling differ by language?
- Can we model the curse of multilinguality?
- Pretrain from scratch vs. finetune from a checkpoint?
- How does cross-lingual transfer score across languages?
1/🧵
28.10.2025 14:01
MatFormer introduces a nested structure into the Transformer's FFN block and jointly trains all the submodels, enabling free extraction of hundreds of accurate submodels for elastic inference.
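A minimal sketch of the nested-FFN idea: smaller submodels reuse a prefix slice of the full FFN's hidden units, so "extraction" is just slicing shared weight matrices. This is an illustrative NumPy toy (ReLU, two-layer FFN, no bias); `ffn_forward` and the sizes are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def ffn_forward(x, W1, W2, m):
    """FFN restricted to the first m hidden units (a nested submodel)."""
    h = np.maximum(x @ W1[:, :m], 0.0)  # sliced up-projection + ReLU
    return h @ W2[:m, :]                # sliced down-projection

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W1 = rng.normal(size=(d_model, d_ff))
W2 = rng.normal(size=(d_ff, d_model))
x = rng.normal(size=(4, d_model))

# Submodels of increasing capacity share the same weights;
# the largest slice (m == d_ff) recovers the full FFN.
outs = {m: ffn_forward(x, W1, W2, m) for m in (4, 8, 16, 32)}
assert np.allclose(outs[32], np.maximum(x @ W1, 0.0) @ W2)
```

Joint training would optimize the shared weights under losses from several slice sizes at once, so every prefix slice stays an accurate standalone model.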
I will be at poster #2507 w/ my co-authors in East Exhibit Hall A-C at #NeurIPS2024, chatting about MatFormer and elastic models today at 4:30pm!
Come by, or reach out if you want to chat about pretraining, scaling laws, or conditional computation!
arxiv.org/abs/2310.07707
11.12.2024 21:42
Would love to be added!
11.12.2024 17:23