Joshua Ong (@jong21) — bluesky.baby

MMLU-Redux Poster at NAACL 2025

MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋

02.05.2025 13:00 👍 17 🔁 11 💬 0 📌 0

Thanks @nolovedeeplearning.bsky.social for the picture!!! 🥰

06.12.2024 21:54 👍 20 🔁 3 💬 1 📌 1

Very cool work! 👏🚀 Unfortunately, errors in the original dataset will propagate to all new languages 😕

We investigated the issue of existing errors in the original MMLU in
arxiv.org/abs/2406.04127

@aryopg.bsky.social @neuralnoise.com

06.12.2024 13:57 👍 4 🔁 2 💬 0 📌 1

For clarity -- great project, but most of the MMLU errors we found (and fixed) in our MMLU Redux paper (arxiv.org/abs/2406.04127) are also present in this dataset. We also provide a curated version of MMLU, so it's easy to fix 😊

06.12.2024 09:26 👍 15 🔁 4 💬 1 📌 0

Super Cool work from Cohere for AI! 🎉 However, this highlights a concern raised by our MMLU-Redux team (arxiv.org/abs/2406.04127): **error propagation to many languages**. Issues in MMLU (e.g., "rapid intervention to solve ebola") seem to persist in many languages. Let's solve the root cause first?

06.12.2024 09:38 👍 9 🔁 3 💬 1 📌 0

Sohee (@soheeyang.bsky.social) in the house! 🚀🚀🚀

05.12.2024 14:38 👍 9 🔁 1 💬 0 📌 0

The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.

Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B — As always, we released our data, code, recipes and more 🎁

26.11.2024 20:51 👍 151 🔁 36 💬 5 📌 12

This paper's findings about testing LLMs on NLI align with many of my personal thoughts:

1) NLI remains a difficult task for LLMs
2) Having more few-shot examples is helpful (in my view, helping LLMs better understand class boundaries)
3) Incorrect predictions are often a result of ambiguous labels

24.11.2024 16:38 👍 27 🔁 3 💬 1 📌 0

Hey John! Thanks for reaching out—I’ve sent you a DM to discuss this further!

24.11.2024 22:25 👍 0 🔁 0 💬 1 📌 0

rebuttal template

Since friends are doing NAACL / ICLR rebuttals, sharing my rebuttal template.
It works for me because it allows me to visually break down comments across reviewers into common themes, things that I can easily address vs. those that I can't, and also filter across these.

You all've got this!!!

23.11.2024 16:07 👍 16 🔁 5 💬 1 📌 0

Hii I’d love to join as well!!!🙋🏼‍♀️

24.11.2024 03:48 👍 0 🔁 0 💬 0 📌 0

Hii I’d love to join as well!!

24.11.2024 03:46 👍 1 🔁 0 💬 0 📌 0

Check out our CoMAT: Chain of Mathematically Annotated Thought, which improves mathematical reasoning by converting mathematical questions into structured symbolic representations and performing step-by-step reasoning 🎉 It works across various languages and challenging benchmarks.

arxiv.org/pdf/2410.103...

20.11.2024 15:29 👍 0 🔁 0 💬 1 📌 0

The main question about the current LLM “reasoning” research is what to do next. Most go into synthetic generation and training on it, maybe with self-refinement, in hopes the model becomes better. I think we are missing controlled task formalization, step-by-step reasoning, and strict step verification.

19.11.2024 05:34 👍 24 🔁 3 💬 5 📌 1

Thanksss!!!!!

20.11.2024 14:50 👍 1 🔁 0 💬 0 📌 0

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai

19.11.2024 16:30 👍 161 🔁 39 💬 6 📌 8

Hi I’d love to be added as well!🙋🏼‍♀️

20.11.2024 13:40 👍 0 🔁 0 💬 1 📌 0

Hey, I’m available! However, I can’t send you a dm since it’s restricted to followers. If you could send me a message instead, that’d be great!

20.11.2024 13:40 👍 0 🔁 0 💬 0 📌 0

I’ll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up! 🚀
I'd love to chat about my recent work (DeCoRe, MMLU-Redux, etc.). DM me if you’re around! 👋

DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127

18.11.2024 13:48 👍 11 🔁 7 💬 0 📌 0

dm-ed you!

20.11.2024 00:43 👍 1 🔁 0 💬 0 📌 0

Added! Thanks!!

18.11.2024 11:04 👍 0 🔁 0 💬 0 📌 0

I made a starter pack with the people doing something related to Neurosymbolic AI that I could find.

Let me know if I missed you!
go.bsky.app/RMJ8q3i

11.11.2024 15:27 👍 92 🔁 36 💬 16 📌 2

Hi I would love to be added as well!!

18.11.2024 09:33 👍 1 🔁 0 💬 1 📌 0

Hi, I would love to be added as well!

18.11.2024 09:31 👍 1 🔁 0 💬 0 📌 0

Hi, I’d love to be added as well!

18.11.2024 09:26 👍 1 🔁 0 💬 0 📌 0

Hi, I’d love to be added, thanks!!!

18.11.2024 08:50 👍 0 🔁 0 💬 0 📌 0
