Muru Zhang

@muruzhang

First-year NLP PhD @ USC | Intern @ TogetherAI | Prev. UW, AWS https://nanami18.github.io/

639
Followers
64
Following
2
Posts
12.11.2024
Joined

Latest posts by Muru Zhang @muruzhang

Great to be part of this project led by the amazing @hamishivi.bsky.social. The most fun (in retrospect) thing is to observe how the results start to shift as we scale up the candidate pool, evaluation suite, and selection size :) And eventually we find a simple method does the best!

04.03.2025 21:14 👍 2 🔁 1 💬 0 📌 0
How well do data-selection methods work for instruction-tuning at scale?

Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best!

More below ⬇️ (1/8)

04.03.2025 17:10 👍 13 🔁 4 💬 1 📌 2

This is a great effort for the migration, thanks for putting it together! Can I be added to the list?

12.11.2024 22:23 👍 1 🔁 0 💬 1 📌 0