Picard technology tip: Sometimes your chief engineer can build new systems that are better than your existing enterprise software.
❣️ Shout out to my amazing co-authors:
Rachel Ostrand, @wernergeyer.bsky.social, @keerthi166.bsky.social, Dennis Wei, and Justin Weisz!
If you'll be at AIES, I would love to connect and chat more about our work!
Picard management tip: Even without game-changing results, experimentation is time well spent.
6/ Try it out & explore more:
GitHub: github.com/IBM/eval-ass...
Demo: evalassist-evalassist.hf.space
Project page: ibm.github.io/eval-assist/
5/ And we're planning to bring several backend capabilities into the UI soon. Stay tuned!
4/ Backend updates
• Independent Judges module (no UI - see: github.com/IBM/eval-ass...)
• Unified Judge API
• Extensible: supports Unitxt, M-Prometheus & more
• Self-consistency: run judges multiple times
• In-context examples
• Multi-criteria evals w/ roll-ups
• Custom prompts supported
3/ 🖥️ UI updates
• Export & import test data (CSV)
• More benchmarks: JudgeBench & BigGen, grouped by capabilities
• 50+ criteria via Unitxt (www.unitxt.ai) catalog integration
• Export/import test cases in JSON
• Model provider connections can be tested before evals
2/ Paper @acmuist.bsky.social: EvalAssist: Insights on Task-Specific Evaluations and AI-Assisted Judgment Strategy Preferences
By @dohyojin.bsky.social - presenting Wed 9:00–10:30 in the “Managing Tasks” session
arxiv.org/pdf/2410.00873
1/ EvalAssist makes it easier to test, refine & share evaluation criteria for LLMs. ibm.github.io/eval-assist/
We've added powerful new features on both the UI and backend, plus we'll be at UIST next week presenting our paper on task-specific evaluations & AI-assisted judgment strategies.
Excited to share some updates from EvalAssist, the open-source LLM-as-a-Judge framework we released a few months ago! 🧵
We've just extended the IUI Workshop deadline by one week to August 29.
Looking forward to your contributions!
Getting ready! Come visit us at the IBM booth @acl to learn about our latest research. We have a number of super interesting demos lined up. research.ibm.com/events/acl-2...
We're growing and going global!
CHIWORK 2025 is shaping up to be our biggest and most diverse edition yet. Thanks to everyone who submitted, reviewed, and supported us.
Can't wait to see you in Amsterdam!
chiwork.org
#CHIWORK2025 #HCI #FutureOfWork
📢 Call for Workshop & Tutorial Proposals 📢
Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026.
iui.hosting.acm.org/2026/call-fo...
#CallForProposals #IUI2026 #HCI #AI
📣 Today we open-sourced EvalAssist, a web-based tool that makes it super easy to develop criteria for LLM judges. You can run it locally now and then scale up with notebooks using Unitxt. Check out the AI Alliance article to get the scoop:
thealliance.ai/blog/llm-as-...
📣 Call for Workshop & Tutorial Proposals 📣 #IUI2026 is looking forward to your contribution! Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026. 🚨 Proposal Deadlines: Aug 22 (Workshops) and Oct 17 (Tutorials) 🚨 iui.hosting.acm.org/2026/call-fo...
📣 IUI 2026 Call for Workshops and Tutorials is live 📣
iui.acm.org/2026/call-fo...
Note that this year, submissions will be due August 22, earlier than in previous years. Pls. spread the word! We had a fantastic workshop program in 2025 and I'm looking forward to an even better one in 2026 in Cyprus.
We just published a summary of the 6th workshop on Human-AI Co-Creation with Generative Models, held at IUI 2025 in March. This year's special topic was, of course, AI agents and agency. Two of our sessions covered this topic and we had an exciting panel discussion. Check it out! medium.com/human-center...
Great work from our team @ IBM Research
A summary of decolonial AI alignment in the Human-Centered AI publication on Medium. Thanks to @jweisz3.bsky.social for asking me to write it, and for editing the piece. medium.com/human-center...
I'm on the IBM Mixture of Experts podcast wearing a safety vest. We talk about all the new things in AI this week. I also connect to older work by IBM Fellows Irene Greif, Bob Dennard, Rolf Landauer, and Charlie Bennett and to Mauro Martino's new AI-generated film. www.youtube.com/watch?v=CgqH...
Granite Guardian tops a new benchmark! research.ibm.com/blog/granite...
And the final product
Asparagus time in Germany. This is an automated peeling machine. No AI
All set up for demo time at IUI. We are showing a tool for GenAI-assisted hypotheses exploration. dl.acm.org/doi/10.1145/...
IBM Research on their way to IUI
Ah, and now there is a cool name for it :) www.zdnet.com/article/what...
Is there already a CHI paper about it? :)
We have two amazing keynotes this year at HAI-GEN 2025 to challenge our thinking on co-creative systems from an interaction perspective.
Hope to cu at IUI this year!
hai-gen.github.io/2025/program/
📣 The #CSCW2026 deadline (@acm-cscw.bsky.social) has been posted. Big change this year. There is **only one deadline** for 2026 and it is May 13, 2025. 📣
Please spread the word!
#CSCW #CHI #HCI #socialcomputing
cscw.acm.org/2025/index.p...
Looking forward to seeing your papers!!!