
Kevin Miller

@kjmiller013

CS PhD student at BU

12 Followers · 46 Following · 14 Posts · Joined 28.02.2025

Latest posts by Kevin Miller @kjmiller013

🎢 and there are no cats in America, and the streets are paved with cheese 🎡

05.11.2025 11:35 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

glad to hear she takes after her great-grandmother

16.10.2025 11:27 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Our method, ✨SPARC✨, significantly boosts performance on three different multilabel recognition datasets and nine different CLIP backbones, and complements the strengths of existing white-box and training-based methods. Looking forward to presenting it at CVPR!

10.06.2025 04:05 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

We also find that CLIP scores are impacted by image- and prompt-level bias. Simple standardization is surprisingly effective at removing these biases and boosting multilabel recognition performance.

10.06.2025 04:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
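The standardization step can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's exact procedure: `standardize` is a hypothetical helper that z-scores a score matrix per image (rows) and then per prompt (columns).

```python
import numpy as np

# Hypothetical sketch of bias removal via standardization (not the authors' code).
# `scores` holds CLIP-style scores with rows = images and columns = prompts.
def standardize(scores):
    # remove image-level bias: z-score each row
    s = (scores - scores.mean(axis=1, keepdims=True)) / scores.std(axis=1, keepdims=True)
    # remove prompt-level bias: z-score each column
    s = (s - s.mean(axis=0, keepdims=True)) / s.std(axis=0, keepdims=True)
    return s

rng = np.random.default_rng(0)
S = standardize(rng.normal(size=(5, 4)))  # each column now has mean 0 and std 1
```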

We find that the second-highest score provides a better signal, and in general we get our best results by adaptively fusing all of the ranks using the direction of maximum variance.

10.06.2025 04:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
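A toy sketch of the fusion idea, under assumptions (the function name and matrix layout are invented for illustration; this is not the SPARC implementation): given per-image compound-prompt scores sorted in descending order, project onto the direction of maximum variance (the top singular vector of the centered data) to fuse all ranks.

```python
import numpy as np

# rank_scores: rows = images, column k = (k+1)-th highest compound-prompt score.
def fuse_ranks(rank_scores):
    centered = rank_scores - rank_scores.mean(axis=0)
    # top right-singular vector of the centered data = direction of maximum variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    if direction.sum() < 0:  # fix the arbitrary SVD sign so higher = more evidence
        direction = -direction
    return rank_scores @ direction

rank_scores = np.array([[0.9, 0.8, 0.3],   # high scores across ranks: class likely present
                        [0.9, 0.2, 0.1]])  # one inflated top score, low elsewhere
fused = fuse_ranks(rank_scores)  # first image fuses higher than the second
```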

How should we use these β€œcompound” prompts? A natural choice would be to use the highest-scoring one, as it is likely the most descriptive. However, we find that this approach leads to false positives due to the β€œOR-gate” nature of CLIP scores.

10.06.2025 04:00 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
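A toy numeric illustration of this "OR-gate" failure mode (all numbers invented): on an image that contains a dog but no cat, the single "cat and dog" compound can still score high, so thresholding the maximum fires a false positive, while the second-highest score stays low.

```python
# Scores for compound prompts of the target class "cat" on an image that
# actually contains only a dog (numbers are made up for illustration).
scores = {"cat and dog": 0.31, "cat and bed": 0.12, "cat and sofa": 0.10}
threshold = 0.25

ranked = sorted(scores.values(), reverse=True)
max_score, second_score = ranked[0], ranked[1]

print(max_score > threshold)     # True: the max alone predicts "cat" (false positive)
print(second_score > threshold)  # False: the second-highest score avoids it
```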

Our question: How can we make VLMs better at multilabel recognition, without the need for training or access to VLM internals?

Idea: Make each class’s prompt more descriptive by pairing it with classes that tend to co-occur. E.g., instead of “cat”, try “cat and dog”, “cat and bed”, etc.

10.06.2025 04:00 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
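The compound-prompt construction can be sketched in a few lines; `compound_prompts` is a hypothetical helper and the "a photo of a ... and a ..." template is an assumed example, not the authors' code.

```python
# Build compound prompts by pairing a target class with classes that tend to
# co-occur with it (the prompt template here is an assumption for illustration).
def compound_prompts(target, cooccurring):
    return [f"a photo of a {target} and a {c}" for c in cooccurring]

prompts = compound_prompts("cat", ["dog", "bed", "sofa"])
# e.g. "a photo of a cat and a dog", "a photo of a cat and a bed", ...
```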
Preview
GitHub - kjmillerCURIS/SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models

Looking forward to presenting our paper β€œβœ¨SPARC✨: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models” at #CVPR2025 this Friday!

Check out our ✨code + paper + poster✨: github.com/kjmillerCURI...

10.06.2025 03:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Our method, ✨SPARC✨, significantly boosts performance on three different multilabel recognition datasets and nine different CLIP backbones, and complements the strengths of existing white-box and training-based methods. We look forward to presenting our work at #CVPR2025 in June.

17.03.2025 05:16 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

We also find that CLIP scores are impacted by image- and prompt-level bias. Simple standardization is surprisingly effective at removing these biases and boosting multilabel recognition performance.

17.03.2025 05:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We find that the second-highest score provides a better signal, and in general we get our best results by adaptively fusing all of the ranks using the direction of maximum variance.

17.03.2025 05:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
A diagram with two example images, illustrating how using the highest score can lead to false positives, while using the second-highest score can mitigate this problem.

How should we use these β€œcompound” prompts? A natural choice would be to use the highest-scoring one, as it is likely the most descriptive. However, we find that this approach leads to false positives due to the β€œOR-gate” nature of CLIP scores.

17.03.2025 05:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Our question: How can we make VLMs better at multilabel recognition, without the need for training or access to VLM internals?

Idea: Make each class’s prompt more descriptive by pairing it with classes that tend to co-occur. E.g., instead of “cat”, try “cat and dog”, “cat and bed”, etc.

17.03.2025 05:10 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

✨ Our paper "SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models" has been accepted to #CVPR2025! ✨

arxiv.org/pdf/2502.16911

Huge thanks to my amazing coauthors Aditya Gangrade, Samarth Mishra, Kate Saenko, and Venkatesh Saligrama.

28.02.2025 07:11 πŸ‘ 3 πŸ” 1 πŸ’¬ 2 πŸ“Œ 0