Muru Zhang

@muruzhang

First-year NLP PhD @ USC | Intern @ TogetherAI | Prev. UW, AWS https://nanami18.github.io/

639 Followers
64 Following
2 Posts
Joined 12.11.2024


Latest posts by Muru Zhang @muruzhang

Great to be part of this project led by the amazing @hamishivi.bsky.social. The most fun (in retrospect) thing is to observe how the results start to shift as we scale up the candidate pool, evaluation suite, and selection size :) And eventually we find a simple method does the best!

04.03.2025 21:14 👍 2 🔁 1 💬 0 📌 0

How well do data-selection methods work for instruction-tuning at scale?

Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best!

More below ⬇️ (1/8)

04.03.2025 17:10 👍 13 🔁 4 💬 1 📌 2

This is a great effort for the migration, thanks for putting it together! Can I be added to the list?

12.11.2024 22:23 👍 1 🔁 0 💬 1 📌 0