Werner Geyer (@wernergeyer)

Picard technology tip: Sometimes your chief engineer can build new systems that are better than your existing enterprise software.

19.10.2025 20:04 👍 112 🔁 16 💬 1 📌 0

❣️ Shout out to my amazing co-authors:
Rachel Ostrand, @wernergeyer.bsky.social, @keerthi166.bsky.social, Dennis Wei, and Justin Weisz!

If you'll be at AIES, I would love to connect and chat more about our work! 🙌

16.10.2025 11:01 👍 1 🔁 1 💬 0 📌 0

Picard management tip: Even without game-changing results, experimentation is time well spent.

26.09.2025 20:49 👍 113 🔁 22 💬 0 📌 1

6/ Try it out & explore more:
👉 GitHub: github.com/IBM/eval-ass...
👉 Demo: evalassist-evalassist.hf.space
👉 Project page: ibm.github.io/eval-assist/

25.09.2025 17:56 👍 0 🔁 0 💬 0 📌 0

5/ And we’re planning to bring several backend capabilities into the UI soon. Stay tuned 👀

25.09.2025 17:56 👍 0 🔁 0 💬 1 📌 0

eval-assist/backend/src/evalassist/judges at main · IBM/eval-assist EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refin...

4/ ⚙️ Backend updates
• Independent Judges module (no UI - see: github.com/IBM/eval-ass...)
• Unified Judge API
• Extensible: supports Unitxt, M-Prometheus & more
• Self-consistency: run judges multiple times
• In-context examples
• Multi-criteria evals w/roll-ups
• Custom prompts supported

25.09.2025 17:56 👍 0 🔁 0 💬 1 📌 0

3/ 🖥️ UI updates
• Export & import test data (CSV)
• More benchmarks: JudgeBench & BigGen, grouped by capabilities
• 50+ Unitxt (www.unitxt.ai) criteria via catalog integration
• Export/import test cases in JSON
• Model provider connections can be tested before evals

25.09.2025 17:56 👍 0 🔁 0 💬 1 📌 0

2/ 📄 Paper @acmuist.bsky.social : EvalAssist: Insights on Task-Specific Evaluations and AI-Assisted Judgment Strategy Preferences
By @dohyojin.bsky.social - presenting Wed 9:00–10:30 in the “Managing Tasks” session
👉 arxiv.org/pdf/2410.00873

25.09.2025 17:56 👍 2 🔁 0 💬 1 📌 0

EvalAssist EvalAssist simplifies LLM-as-a-Judge by supporting users in iteratively refining evaluation criteria in a web-based user experience.

1/ EvalAssist makes it easier to test, refine & share evaluation criteria for LLMs. ibm.github.io/eval-assist/
We’ve added powerful new features on both the UI and backend, plus we’ll be at UIST next week presenting our paper on task-specific evaluations & AI-assisted judgment strategies.

25.09.2025 17:56 👍 0 🔁 0 💬 1 📌 0

🚀 Excited to share some updates from EvalAssist, the open-source LLM-as-a-Judge framework we released a few months ago! 🧵

25.09.2025 17:56 👍 1 🔁 0 💬 1 📌 0

We've just extended the IUI Workshop deadline by one week to August 29.

Looking forward to your contributions!

21.08.2025 13:55 👍 2 🔁 0 💬 0 📌 0

Getting ready! Come visit us at the IBM booth @acl to learn about our latest research. We have a number of super interesting demos lined up. research.ibm.com/events/acl-2...

28.07.2025 07:44 👍 0 🔁 0 💬 0 📌 0

We’re growing and going global! 🌍

CHIWORK 2025 is shaping up to be our biggest and most diverse edition yet. Thanks to everyone who submitted, reviewed, and supported us 💙

Can’t wait to see you in Amsterdam!

🔗 chiwork.org

#CHIWORK2025 #HCI #FutureOfWork

04.04.2025 10:59 👍 5 🔁 3 💬 0 📌 1

📢 Call for Workshop & Tutorial Proposals 📢
Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026.

iui.hosting.acm.org/2026/call-fo...

#CallForProposals #IUI2026 #HCI #AI

09.06.2025 14:39 👍 2 🔁 1 💬 0 📌 1

LLM-as-a-Judge Without the Headaches: EvalAssist Brings Structure and Simplicity to the Chaos of LLM Output Review | AI Alliance Evaluating AI model outputs at scale is a major challenge for teams using LLMs, especially when assessing nuanced qualities like politeness, fairness, and tone that traditional benchmarks miss. IBM Re...

📣 Today we open-sourced EvalAssist, a web-based tool that makes it super easy to develop criteria for LLM judges. You can run it locally now and then scale up with notebooks using Unitxt. Check out the AI Alliance article to get the scoop:
thealliance.ai/blog/llm-as-...

16.06.2025 15:38 👍 5 🔁 3 💬 1 📌 1

Call for Workshop & Tutorial Proposals | IUI

📣 Call for Workshop & Tutorial Proposals 📣 #IUI2026 is looking forward to your contribution! Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026. 🚨 Proposal Deadlines: Aug 22 (Workshops) and Oct 17 (Tutorials)🚨 iui.hosting.acm.org/2026/call-fo...

05.06.2025 16:02 👍 2 🔁 2 💬 1 📌 0

Call for Workshop & Tutorial Proposals | IUI

📣 IUI 2026 Call for Workshops and Tutorials is live 📣

iui.acm.org/2026/call-fo...

Note that this year, submissions are due August 22, earlier than in previous years. Pls. spread the word! We had a fantastic workshop program in 2025 and I'm looking forward to an even better one in 2026 in Cyprus.

05.06.2025 15:09 👍 2 🔁 0 💬 0 📌 0

HAI-GEN 2025: 6th Workshop on Human-AI Co-Creation with Generative Models by Osnat Mokryn (University of Haifa, IL), Orit Shaer (Wellesley College, US), Werner Geyer (IBM Research, US), Mary Lou Maher (Computing…

We just published a summary of the 6th workshop on Human-AI Co-Creation with Generative Models at IUI 2025 in March. This year's special topic was, of course, AI agents and agency. Two of our sessions covered this topic and we had an exciting panel discussion. Check it out! medium.com/human-center...

06.05.2025 18:35 👍 1 🔁 0 💬 0 📌 0

Great work from our team @ IBM Research

29.04.2025 23:26 👍 1 🔁 0 💬 0 📌 0

Decolonial AI Alignment by Kush Varshney (IBM Research, US)

A summary of decolonial AI alignment in the Human-Centered AI publication on Medium. Thanks to @jweisz3.bsky.social for asking me to write it, and for editing the piece. medium.com/human-center...

08.04.2025 15:12 👍 5 🔁 2 💬 0 📌 0

DeepSeek-V3-0324, Gemini Canvas and GPT-4o image generation YouTube video by IBM Technology

I'm on the IBM Mixture of Experts podcast wearing a safety vest. We talk about all the new things in AI this week. I also connect to older work by IBM Fellows Irene Greif, Bob Dennard, Rolf Landauer, and Charlie Bennett and to Mauro Martino's new AI-generated film. www.youtube.com/watch?v=CgqH...

28.03.2025 13:10 👍 2 🔁 2 💬 0 📌 0

Granite Guardian tops third-party AI benchmark IBM’s collection of LLM guardrail models take six of the top 10 spots on the new GuardBench leaderboard.

Granite Guardian tops a new benchmark! research.ibm.com/blog/granite...

09.04.2025 19:51 👍 3 🔁 2 💬 0 📌 0

And the final product 😋

01.04.2025 15:40 👍 1 🔁 0 💬 0 📌 0

Asparagus time in Germany. This is an automated peeling machine. No AI 😀

31.03.2025 12:22 👍 0 🔁 0 💬 0 📌 0

All set up for demo time at IUI. We are showing a tool for GenAI-assisted hypotheses exploration. dl.acm.org/doi/10.1145/...

26.03.2025 15:39 👍 1 🔁 0 💬 0 📌 0

IBM Research on their way to IUI

22.03.2025 15:53 👍 0 🔁 0 💬 0 📌 0

What is AI vibe coding? It's all the rage but it's not for everyone - here's why Caution: Experience required. Vibe coding feels like magic, until your AI assistant starts overwriting your work.

Ah, and now there is a cool name for it :) www.zdnet.com/article/what...

Is there already a CHI paper about it? :)

18.03.2025 16:44 👍 1 🔁 0 💬 0 📌 0

We have two amazing keynotes this year at HAI-GEN 2025 to challenge our thinking on co-creative systems from an interaction perspective.

Hope to cu at IUI this year!

hai-gen.github.io/2025/program/

11.02.2025 15:38 👍 3 🔁 0 💬 0 📌 0

CALL FOR PAPERS – CSCW 2025

📣 The #CSCW2026 deadline (@acm-cscw.bsky.social) has been posted. Big change this year. There is **only one deadline** for 2026 and it is May 13, 2025. 📣

Please spread the word!
#CSCW #CHI #HCI #socialcomputing
cscw.acm.org/2025/index.p...

31.01.2025 13:51 👍 27 🔁 13 💬 2 📌 0

Looking forward to seeing your papers!!!

23.01.2025 13:19 👍 5 🔁 0 💬 0 📌 0
