🎶 and there are no cats in America, and the streets are paved with cheese 🎵
glad to hear she takes after her great-grandmother
Our method, ✨SPARC✨, significantly boosts performance on three different multilabel recognition datasets and nine different CLIP backbones, and complements the strengths of existing white-box and training-based methods. Looking forward to presenting it at CVPR!
We also find that CLIP scores are impacted by image- and prompt-level bias. Simple standardization is surprisingly effective at removing these biases and boosting multilabel recognition performance.
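A minimal sketch of the standardization idea (not the paper's exact procedure; the score matrix here is random stand-in data): z-score the image-prompt score matrix per image, then per prompt, so neither a "loud" image nor a "loud" prompt dominates.

```python
import numpy as np

# Stand-in CLIP scores: scores[i, j] = similarity of image i to prompt j.
rng = np.random.default_rng(0)
scores = rng.normal(loc=0.25, scale=0.02, size=(4, 6))

# Remove image-level bias: standardize each row (per image).
z = (scores - scores.mean(axis=1, keepdims=True)) / scores.std(axis=1, keepdims=True)
# Remove prompt-level bias: standardize each column of the result (per prompt).
z = (z - z.mean(axis=0, keepdims=True)) / z.std(axis=0, keepdims=True)

print(z.mean(axis=0))  # per-prompt means are now ~0
```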
We find that the second-highest score provides a better signal, and in general we get our best results by adaptively fusing all of the ranks using the direction of maximum variance.
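One way to read "adaptively fusing all of the ranks using the direction of maximum variance" is a principal-component projection of the sorted score vectors; this is a hypothetical sketch on random data, with shapes and names that are my assumptions rather than the paper's code.

```python
import numpy as np

# scores[i, k]: compound-prompt scores for one class on image i (stand-in data).
rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 5))

# Sort descending so columns are the 1st-highest, 2nd-highest, ... scores.
ranked = np.sort(scores, axis=1)[:, ::-1]
second_highest = ranked[:, 1]  # the simple "second score" signal

# Fuse all ranks: project centered rank vectors onto their direction of
# maximum variance (the top right-singular vector).
centered = ranked - ranked.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
fused = centered @ vt[0]  # one fused score per image
```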
How should we use these "compound" prompts? A natural choice would be to use the highest-scoring one, as it is likely the most descriptive. However, we find that this approach leads to false positives due to the "OR-gate" nature of CLIP scores.
Our question: How can we make VLMs better at multilabel recognition, without needing training or access to VLM internals?
Idea: Make each class's prompt more descriptive by pairing with classes that tend to co-occur. E.g., instead of "cat", try "cat and dog", "cat and bed", etc.
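The pairing idea can be sketched in a few lines; the class list, co-occurrence partners, and prompt template below are illustrative assumptions, not the paper's actual setup.

```python
# Assumed co-occurrence partners per class (illustrative only).
co_occurring = {
    "cat": ["dog", "bed"],
    "dog": ["cat", "car"],
    "bed": ["cat"],
    "car": ["dog"],
}

def compound_prompts(cls):
    """Return the plain prompt plus one compound prompt per co-occurring class."""
    prompts = [f"a photo of a {cls}"]
    prompts += [f"a photo of a {cls} and a {partner}"
                for partner in co_occurring[cls]]
    return prompts

print(compound_prompts("cat"))
# ['a photo of a cat', 'a photo of a cat and a dog', 'a photo of a cat and a bed']
```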
Looking forward to presenting our paper "✨SPARC✨: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models" at #CVPR2025 this Friday!
Check out our ✨code + paper + poster✨: github.com/kjmillerCURI...
A diagram with two example images, illustrating how using the highest score can lead to false-positives, while using the second-highest score can mitigate this problem.
✨ Our paper "SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models" has been accepted to #CVPR2025! ✨
arxiv.org/pdf/2502.16911
Huge thanks to my amazing coauthors Aditya Gangrade, Samarth Mishra, Kate Saenko, and Venkatesh Saligrama.