Klara Janouskova (@klara-cz)

Multimodal Large Language Models as Image Classifiers

Nikita Kisel, Illia Volkov @klara-cz.bsky.social Jiri Matas

tl;dr: if you evaluate a good model (ChatGPT) on a dirty test set (ImageNet), it looks bad. Yes, the ImageNet test set is bad nowadays. + insights from labeling.
arxiv.org/abs/2603.065...

10.03.2026 10:56 👍 7 🔁 1 💬 1 📌 0

I am glad somebody has appreciated it! 🐈

I am not gonna lie, I tried to have my dog there at first, but despite over 100 of ImageNet's classes being dog breeds, they still somehow managed not to squeeze the Australian Shepherd in.

09.03.2026 21:08 👍 2 🔁 0 💬 0 📌 0

To study this, we introduce ReGT, a new multilabel reannotation of 625 ImageNet classes that corrects many of these issues. When evaluated on the cleaned labels, multimodal LLMs improve by up to +10.8% accuracy, substantially narrowing the gap with supervised vision models. 📈

09.03.2026 20:08 👍 4 🔁 3 💬 1 📌 0
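
To make the effect of multilabel reannotation concrete, here is a minimal sketch of how accuracy can change when a prediction is scored against a set of acceptable labels instead of a single original label. The function names and the toy data are illustrative assumptions, not taken from the paper or from ReGT, and the paper's exact metric may differ.

```python
from typing import Sequence, Set


def single_label_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that equal the one original label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)


def multilabel_accuracy(preds: Sequence[str], label_sets: Sequence[Set[str]]) -> float:
    """Fraction of predictions that fall inside the set of acceptable labels."""
    if len(preds) != len(label_sets):
        raise ValueError("preds and label_sets must have the same length")
    return sum(p in s for p, s in zip(preds, label_sets)) / len(preds)


# Toy data (hypothetical): one model prediction per image.
preds = ["laptop", "notebook", "tabby", "husky"]
# Original single labels, some of them debatable.
original = ["notebook", "notebook", "tiger cat", "husky"]
# Reannotated labels: every class judged valid for the image.
reannotated = [
    {"laptop", "notebook"},
    {"laptop", "notebook"},
    {"tabby", "tiger cat"},
    {"husky"},
]

acc_original = single_label_accuracy(preds, original)
acc_reannotated = multilabel_accuracy(preds, reannotated)
print(f"original labels: {acc_original:.2f}, reannotated labels: {acc_reannotated:.2f}")
```

The model's predictions are identical in both cases; only the ground truth changes.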

Work with Nikita Kisel, Illia Volkov and Jiri Matas, to be presented at #CVPR26 Findings!

09.03.2026 20:08 👍 1 🔁 0 💬 0 📌 0

🤗 Finally, we show that these models aren’t just affected by annotation quality; they can help fix it. In a controlled verification study, annotators integrated model predictions in roughly half of the difficult cases, suggesting MLLMs can be useful tools for large-scale dataset curation.

09.03.2026 20:08 👍 1 🔁 0 💬 1 📌 0

We show that small changes in evaluation protocol, like choice of distractors, output mapping, even image order, significantly impact accuracy.

⚠️ But there’s a deeper issue: the data. ImageNet contains a lot of label noise, so even a perfect eval. protocol may not give a meaningful result.

09.03.2026 20:08 👍 1 🔁 0 💬 1 📌 0
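
As a rough illustration of where protocol choices such as distractor selection and output mapping enter, here is a minimal sketch of a multiple-choice setup. Everything in it (`build_choices`, `map_output`, the class names, the matching rules) is a hypothetical assumption for illustration, not the protocol used in the paper.

```python
import random
import re
import string
from typing import Dict, List, Optional


def build_choices(true_label: str, class_names: List[str],
                  n_distractors: int, seed: int) -> Dict[str, str]:
    """Sample distractors and shuffle them with the true label into lettered options."""
    rng = random.Random(seed)  # fixed seed so the option set is reproducible
    pool = [c for c in class_names if c != true_label]
    options = rng.sample(pool, n_distractors) + [true_label]
    rng.shuffle(options)
    return dict(zip(string.ascii_uppercase, options))


def map_output(response: str, choices: Dict[str, str]) -> Optional[str]:
    """Map a free-text model response back to one option, or None if ambiguous."""
    text = response.strip()
    # 1) Exactly one option name mentioned in the response.
    mentioned = [name for name in choices.values() if name.lower() in text.lower()]
    if len(mentioned) == 1:
        return mentioned[0]
    # 2) A bare letter answer such as "B", "(B)" or "B.".
    match = re.fullmatch(r"\(?([A-Za-z])\)?[.:]?", text)
    if match and match.group(1).upper() in choices:
        return choices[match.group(1).upper()]
    return None  # unmapped responses count as errors under this sketch


classes = ["tabby", "husky", "laptop", "notebook", "teapot", "canoe"]
choices = build_choices("tabby", classes, n_distractors=3, seed=0)
print(choices)
```

Changing the seed, the number of distractors, or the matching rules in `map_output` changes which responses count as correct, and that sensitivity is the kind of thing the post is about.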

Let me introduce our new paper: Multimodal Large Language Models as Image Classifiers

❓ Multimodal LLMs are increasingly used for visual tasks, but evaluating their image classification ability has produced conflicting conclusions.

Link: arxiv.org/html/2603.06...

09.03.2026 20:08 👍 11 🔁 3 💬 2 📌 1

He totally does, he is getting more snuggly every day ☺️

01.03.2026 11:30 👍 2 🔁 0 💬 0 📌 0

Morning walks 🐾

01.03.2026 09:30 👍 13 🔁 0 💬 1 📌 0

It also really does feel like reviewer psychology since they have not explicitly pointed it out as the issue - not being able to run the experiment again with different framing but same reviewers is tough :D

25.02.2026 12:08 👍 2 🔁 0 💬 0 📌 0

When you re-read the introduction of your freshly rejected paper that was somewhat rushed before the deadline and you are like: Ok, this is why. 🥲

25.02.2026 10:19 👍 5 🔁 0 💬 1 📌 0

Team 2/2 rejected, with one suggested for the findings workshop.

I am a bit sad because I feel they were rejected for the wrong reasons + I am tired of getting BR rating with no suggestions for rebuttal, but I am much more into ECCV than CVPR this year anyway. 😁

Good luck with resubmission! 🍀

21.02.2026 12:20 👍 5 🔁 0 💬 1 📌 0

I feel like for the first time in my (short) reviewing career, I may have helped a (IMO of course) nice paper get accepted despite other reviewer(s).

21.02.2026 11:59 👍 5 🔁 0 💬 1 📌 0

1/n Attention, Please! 🚀

Our work “Revisiting Attentive Probing Through the Lens of Efficiency” has been accepted at #ICLR2026.

We introduce Efficient Probing (EP) — a lightweight, multi-query attentive probing method for frozen encoders.

Paper + code at the end 👇

20.02.2026 15:03 👍 11 🔁 4 💬 1 📌 1

I was starting to wonder what to do with my time now

05.02.2026 21:05 👍 3 🔁 0 💬 1 📌 0

Oh ok, that is a different level of wrong than I thought 🥲

05.02.2026 10:45 👍 2 🔁 0 💬 1 📌 0

I think most benchmarks are pretty noisy; it is just that for some (say ImageNet :)), enough people actually looked at the images and noticed.
To be fair, data annotation is HARD. I do agree people should at least try to do a better job and be responsive, of course :)
bsky.app/profile/klar...

05.02.2026 10:35 👍 5 🔁 0 💬 1 📌 0

What a beautiful day to be done with all deadlines! ☃️

This was my WFH lunch break today, if it is not clear why I do not live in Prague 😁

30.01.2026 15:49 👍 2 🔁 0 💬 0 📌 0

25 % left, a few more nice bedtime readings for me. :)

12.01.2026 08:20 👍 1 🔁 0 💬 0 📌 0

JAZZ HANDS!

I am currently at
R: Be very ready.
G: I am very ready. Be calm.
R: Am calm. You be calm.
G: NO YOU BE-

10.01.2026 22:25 👍 2 🔁 0 💬 1 📌 0

Having stopped about midway through Project Hail Mary and forbidding myself to resume until I finish my CVPR reviews was a pretty good motivation.

Also, if you have not read it yet, but you think you might enjoy it, go for it, you are in for a treat (and the movie is coming)! 🤓

10.01.2026 19:25 👍 6 🔁 0 💬 1 📌 0

I should have added it looks like this (a few lucky days a year 🤣), like today ☃️☺️

09.01.2026 08:40 👍 1 🔁 0 💬 0 📌 0

True, not many positions come with free canistherapy 🐶 (let’s ignore that he’s a teenager now). I hope my profile pic makes up for the regrettable omission and is self-explanatory!

08.01.2026 20:23 👍 2 🔁 0 💬 0 📌 0

Imagine this: Prague 🏰, a top CV lab, learning all the things we work on at VRG, regular cake at coffee breaks (hope you are not on a diet, but we also have a free gym on site) 🍰, excellent filter coffee ☕, and - last but not least - working with Giorgos.

There’s a postdoc opening. Don’t miss out 🙂

08.01.2026 20:15 👍 6 🔁 1 💬 2 📌 0

Recently, Illia received an award for the research he has been doing with us.

Most people would think about sth to buy for themselves. He donated it all to support his home. 🇺🇦

#DoNotForget

27.12.2025 14:28 👍 6 🔁 0 💬 0 📌 0

Before, I was getting pretty good and diverse assignments (my work is a bit "all over the place"), max 1 paper on X per conf :) But now it is 3 papers on my MSc topic X, and before that, I got my BSc stuff for WACV.
It accumulated and I had to vent a bit but it is not enough to make me grumpy (yet) 😁

18.12.2025 21:32 👍 0 🔁 0 💬 1 📌 0

I am fine reviewing a paper on it here and there, just not the whole batch like this CVPR 🥹 But I have not thought of this, might use it next time, thanks!

18.12.2025 12:17 👍 2 🔁 0 💬 1 📌 0

The curse of doing research as an undergrad: publish one paper on topic X, then spend your entire PhD reviewing papers on X. There’s a reason I changed topics. 🫠

At least one paper in my batch actually looks very interesting though :)

18.12.2025 09:15 👍 5 🔁 0 💬 1 📌 0

⏰📍: Tomorrow, 11 am, Hadfield Hall - come say hi to Nikita presenting!

23.11.2025 18:49 👍 0 🔁 0 💬 0 📌 0
