Let's build a more robust foundation for LLM evaluation!
A collaboration from @hebrewuniversity.bsky.social @nlphuji.bsky.social @IBMResearch and more:
@yperlitz.bsky.social @lchoshen.bsky.social @gabistanovsky.bsky.social
Care about LLM evaluation?
We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
on different prompts, domains, tokens, models...
Join our community effort to expand it with YOUR model predictions & become a co-author!
Key findings from 🕊️ DOVE:
1. Prompt sensitivity is HUGE! Performance varies dramatically with small changes (e.g., OLMo's accuracy on HellaSwag ranges from 1% to 99% simply by changing prompt elements like phrasing, enumerators, and answer order); see the first sketch below.
2. Selecting prompt characteristics (e.g., phrasing, enumerators) based on past examples helps efficiently find optimal prompts; see the second sketch below.
3. Some instances are consistently easy or hard across ALL prompts: no matter how you prompt, models either always succeed or always fail; see the third sketch below.
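To make finding 1 concrete, here is a minimal sketch of how prompt sensitivity can be measured: enumerate variants of phrasing, enumerator style, and answer order, then score a model under each combination. The templates and helper names (`query_model` in particular) are illustrative assumptions, not DOVE's actual pipeline.

```python
# Sketch: measure how accuracy shifts across prompt variants.
# `query_model` is a hypothetical stand-in for your LLM call.
from itertools import product

PHRASINGS = [
    "Question: {q}\nChoose the best answer.\n{options}\nAnswer:",
    "{q}\nPick one of the following:\n{options}\nYour choice:",
]
ENUMERATORS = [("A.", "B.", "C.", "D."), ("1.", "2.", "3.", "4.")]
ANSWER_ORDERS = [False, True]  # True = reverse the answer order

def format_prompt(phrasing, marks, reverse, question, choices):
    opts = list(reversed(choices)) if reverse else list(choices)
    options = "\n".join(f"{m} {o}" for m, o in zip(marks, opts))
    return phrasing.format(q=question, options=options)

def accuracy_per_variant(dataset, query_model):
    """Score the model on every (phrasing, enumerator, order) combination.

    `dataset` is assumed to be an iterable of (question, choices, gold) tuples.
    """
    scores = {}
    for phrasing, marks, reverse in product(PHRASINGS, ENUMERATORS, ANSWER_ORDERS):
        correct = sum(
            query_model(format_prompt(phrasing, marks, reverse, q, choices)) == gold
            for q, choices, gold in dataset
        )
        scores[(phrasing, marks, reverse)] = correct / len(dataset)
    return scores  # max(...) - min(...) is the sensitivity gap
```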
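Finding 2 points at a simple recipe: score each variant on a small sample of past examples and reuse the winner on new data. A sketch building on `accuracy_per_variant` from the previous block:

```python
# Sketch: pick the prompt variant that scores best on a small sample of
# past examples, then apply it to new instances (names are illustrative).
def select_best_variant(past_examples, query_model):
    scores = accuracy_per_variant(past_examples, query_model)
    return max(scores, key=scores.get)  # highest sample accuracy wins

# Usage: best = select_best_variant(small_labeled_sample, query_model)
# then format all new instances with that phrasing/enumerator/order.
```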
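For finding 3, per-instance consistency falls out of the same grid of results: average each instance's correctness over all prompt variants and look at the extremes. The data layout here is an assumption:

```python
# Sketch: flag instances that stay easy or hard no matter the prompt.
# `results[variant]` is a list of booleans, one per instance (assumed layout).
def split_by_consistency(results, lo=0.05, hi=0.95):
    n_variants = len(results)
    n_items = len(next(iter(results.values())))
    rates = [
        sum(per_variant[i] for per_variant in results.values()) / n_variants
        for i in range(n_items)
    ]
    always_easy = [i for i, r in enumerate(rates) if r >= hi]
    always_hard = [i for i, r in enumerate(rates) if r <= lo]
    return always_easy, always_hard
```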
Goal: democratize LLM evaluation research and build meaningful, generalizable methods.
Talk to us about data you'd like to contribute or request evaluations you want to see added to 🕊️ DOVE!
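If you want to explore the data before contributing, here is a minimal loading sketch; the Hub repo ID and record fields are assumptions, so check the dataset card for the actual layout:

```python
# Sketch: stream a few DOVE records from the Hugging Face Hub.
# The repo ID "nlphuji/DOVE" and the record schema are assumptions.
from datasets import load_dataset

dove = load_dataset("nlphuji/DOVE", split="train", streaming=True)
for record in dove.take(3):
    print(record)  # inspect fields: model, prompt variant, output, ...
```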
AI is changing the world. Is AI regulation on the right track?
While regulators rely on benchmarking, we show why it cannot guarantee AI behavior:
arxiv.org/pdf/2501.15693
Excited about this multidisciplinary collaboration!
@gabistanovsky.bsky.social, @rkeydar.bsky.social, Gadi Perl