
Gabriele Berton

@berton-gabri

Postdoc at Amazon on MLLM - ex CMU, PoliTo, IIT https://gmberton.github.io/

662
Followers
469
Following
142
Posts
20.11.2024
Joined

Latest posts by Gabriele Berton @berton-gabri


LLMs are the present of AI

Video understanding is the future

That's why we're organizing the 2nd VidLLM workshop at CVPR 2026!

We'll have paper submission and 3 challenges. More info coming soon!

21.01.2026 19:04 👍 0 🔁 0 💬 0 📌 0
CompLLM: Compression for Long Context Q&A Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. While soft context compression methods, which ma...

I agree. In general though, it just seems like the tokens used in LLMs are far too fine-grained. Baking more info into each token can be done in other ways than rendering images too. For instance, @parskatt.bsky.social pointed me to CompLLM by @berton-gabri.bsky.social arxiv.org/abs/2509.19228

22.10.2025 12:05 👍 2 🔁 1 💬 2 📌 0

This is exactly what I thought when DeepSeek-OCR came out!
These two and many other works build on the same assumptions, and I don't understand why we're still using ~one token per word in NLP.

30.10.2025 03:50 👍 3 🔁 0 💬 0 📌 0
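CompLLM's actual compressor is learned, but the payoff of coarser tokens is easy to see with a toy sketch: here mean pooling stands in for the learned module, and the 4x ratio is just an illustrative choice, not a value from the paper.

```python
import numpy as np

def compress_context(tokens: np.ndarray, ratio: int = 4) -> np.ndarray:
    """Pool every `ratio` consecutive token embeddings into one coarser
    "concept" embedding (mean pooling stands in for a learned compressor)."""
    n, d = tokens.shape
    pad = (-n) % ratio            # zero-pad so the length divides evenly
    if pad:
        tokens = np.vstack([tokens, np.zeros((pad, d))])
    return tokens.reshape(-1, ratio, d).mean(axis=1)

tokens = np.random.randn(1000, 64)          # 1000 tokens, 64-dim embeddings
compressed = compress_context(tokens, ratio=4)
print(compressed.shape)                     # (250, 64)
# Self-attention cost scales with sequence length squared:
print((1000 ** 2) / (250 ** 2))             # 16.0 -> ~16x fewer attention FLOPs
```

The point is only the arithmetic: shrinking the sequence 4x cuts the quadratic attention cost ~16x, which is why baking more information into each token is attractive for long-context Q&A.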
DataDecide: How to Predict Best Pretraining Data with Small Experiments Because large language models are expensive to pretrain on different datasets, using smaller-scale experiments to decide on data is crucial for reducing costs. Which benchmarks and methods of making d...

different datasets can end up in the same cluster). Intuitively, the first method is cheaper, while the latter is more expensive but performs better.

DataDecide: arxiv.org/abs/2504.11393
CLIMB: arxiv.org/abs/2504.13161

23.04.2025 15:34 👍 0 🔁 0 💬 0 📌 0

data quality.

The main difference is that DataDecide splits the data according to its source (training datasets are usually a collection of multiple datasets), while CLIMB creates clusters from each document's embeddings (meaning documents from ...

23.04.2025 15:34 👍 0 🔁 0 💬 1 📌 0

large LLM on many subsets would be unfeasibly expensive).

Here are some similarities and differences between these two papers:

Both papers split the whole available training data into subsets, train a small LLM on each subset, and see how it performs: its performance is used as a proxy for ...

23.04.2025 15:34 👍 0 🔁 0 💬 1 📌 0

How to select pre-training data for LLMs?

Two papers came out last week from AllenAI and Nvidia that do it in a similar way, building on the intuition that good data is good regardless of the size of the LLM.

This intuition can be used to select good data in a cheap manner (training a ...

23.04.2025 15:34 👍 0 🔁 1 💬 1 📌 0
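The two splitting strategies from this thread can be sketched in a few lines. The 2-D embeddings and hand-picked centroids below are toy stand-ins for CLIMB's learned clustering; the point is that source-based splits keep each dataset together, while embedding-based clusters can mix documents from different sources.

```python
import numpy as np
from collections import defaultdict

docs = [
    {"source": "wiki", "emb": np.array([0.9, 0.1])},
    {"source": "wiki", "emb": np.array([0.1, 0.9])},
    {"source": "code", "emb": np.array([0.85, 0.15])},
    {"source": "code", "emb": np.array([0.15, 0.85])},
]

# DataDecide-style: split by data source
by_source = defaultdict(list)
for i, d in enumerate(docs):
    by_source[d["source"]].append(i)

# CLIMB-style: cluster by embedding (nearest of two hand-picked centroids),
# so documents from different sources can land in the same cluster
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
by_cluster = defaultdict(list)
for i, d in enumerate(docs):
    c = int(np.argmin(np.linalg.norm(centroids - d["emb"], axis=1)))
    by_cluster[c].append(i)

print(dict(by_source))   # {'wiki': [0, 1], 'code': [2, 3]}
print(dict(by_cluster))  # {0: [0, 2], 1: [1, 3]}
```

A small LLM would then be trained on each subset, and its performance used as the proxy signal described above.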

To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition

Davide Sferrazza, @berton-gabri.bsky.social, @gabtriv.bsky.social, Carlo Masone

tl;dr: VPR datasets saturate; re-ranking not good; image matching -> uncertainty -> inlier counts -> confidence

arxiv.org/abs/2504.06116

09.04.2025 03:35 👍 5 🔁 2 💬 0 📌 0
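A rough sketch of the tl;dr's last step: geometric verification between the query and a retrieved candidate yields an inlier count, which can be squashed into a [0, 1] confidence. The saturation threshold `tau` is a hypothetical choice for illustration, not a value from the paper.

```python
def match_confidence(num_inliers: int, num_matches: int, tau: int = 50) -> float:
    """Toy confidence from geometric verification: saturate the inlier
    count at `tau` so the score lies in [0, 1]. `tau` is a hypothetical
    threshold; a real pipeline would calibrate it on validation data."""
    if num_matches == 0:
        return 0.0                      # no matches at all -> no confidence
    return min(num_inliers, tau) / tau

# Many inliers -> trust the retrieved place; few inliers -> flag it as uncertain
print(match_confidence(120, 300))  # 1.0
print(match_confidence(10, 300))   # 0.2
```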

When I read a paper, the only way I have to remember something about it six months from now is to use Anki

29.03.2025 16:02 👍 1 🔁 0 💬 0 📌 0

Probably nobody knows how to pronounce his name and so they avoid talking about him

26.03.2025 06:09 👍 1 🔁 0 💬 0 📌 0

And it gets better... for MCoT (Multimodal Chain-of-Thought) they should say "in recent weeks" 😂

26.03.2025 05:21 👍 3 🔁 0 💬 0 📌 0

I find it mind-blowing that LLM papers should now say "in recent months" instead of "in recent years". OpenAI o1 and DeepSeek R1 are literally a few months old.

26.03.2025 05:18 👍 7 🔁 0 💬 1 📌 0

The FastAPLoss gave us worse results than average, but again, those were preliminary results with batch size 32.

SmoothAP and Recall@k are not in PML, so we didn't even consider them (we already had over 30 losses to try). It might be helpful to add your Recall@k to PML :)

20.03.2025 13:37 👍 0 🔁 0 💬 0 📌 0

Cool stuff :)

20.03.2025 13:36 👍 2 🔁 0 💬 2 📌 0

bsky.app/profile/bert...

20.03.2025 13:24 👍 0 🔁 0 💬 1 📌 0

Yeah, intuitively it makes sense to perturb the student's images; not sure why it doesn't work in the 2021 distillation paper.
Someone should make a benchmark for distillation across tasks...

20.03.2025 13:24 👍 0 🔁 0 💬 0 📌 1

I believe Beyer et al 2021 distillation paper says the images should be the same for teacher and student

20.03.2025 03:09 👍 0 🔁 0 💬 2 📌 0
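The "same images for teacher and student" recipe boils down to computing a softened KL divergence between the two models' logits on identically augmented inputs. A minimal numpy sketch of that Hinton-style objective (the temperature T=2 is illustrative, not a value from the Beyer et al. paper):

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax, numerically stabilized."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits: np.ndarray, student_logits: np.ndarray,
                 T: float = 2.0) -> float:
    """KL(teacher || student) on softened logits, scaled by T^2.
    Crucially, both sets of logits come from the SAME (identically
    augmented) images, as Beyer et al. 2021 recommend."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T ** 2)

t = np.array([[5.0, 1.0, 0.0]])
print(distill_loss(t, t))               # 0.0 -> student matches teacher exactly
print(distill_loss(t, t[:, ::-1]) > 0)  # True -> mismatch is penalized
```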

🚀 Big news! Just got my O-1 visa, booked my flight to San Francisco, and I'm really happy to join Amazon in Palo Alto! Ready for this exciting new chapter 🚀

I'll be doing a PostDoc on Vision-Language Models!

20.03.2025 03:05 👍 15 🔁 0 💬 0 📌 0

The line is so blurry...

Are two images of the same car the same instance? (yes)

If it's the same car but re-painted?

If it's the same car but re-made?

If it's two different cars, same model with same color?

If same model, different color?

Same brand, different model?

19.03.2025 18:28 👍 0 🔁 0 💬 0 📌 0
GitHub - cvdfoundation/google-landmark: Dataset with 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.

Someone should add the GLDv2 dataset to the PML library datasets.
It should take a couple of hours to write the code (maybe 10 minutes with Cursor 😂), and you'd be a contributor to the most important metric learning library.

github.com/cvdfoundatio...

kevinmusgrave.github.io/pytorch-metr...

19.03.2025 16:30 👍 3 🔁 1 💬 1 📌 0
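For a sense of scope, a minimal sketch of what such a dataset class could look like, in plain Python. The nesting of images under the first three characters of their id follows the google-landmark repo's layout; the class name, constructor signature, and row format are made up for illustration, and a real PML-compatible dataset would also load and transform the images.

```python
from pathlib import Path

class GLDv2Index:
    """Hypothetical GLDv2-style index: turns (image_id, landmark_id) rows,
    as found in the dataset's train.csv, into (image_path, label) samples."""

    def __init__(self, root: str, rows: list):
        self.root = Path(root)
        # google-landmark nests images as <root>/a/b/c/abc123.jpg
        self.samples = [
            (self.root.joinpath(img_id[0], img_id[1], img_id[2], f"{img_id}.jpg"),
             label)
            for img_id, label in rows
        ]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, i):
        # A real dataset class would open and transform the image here
        return self.samples[i]

ds = GLDv2Index("/data/gldv2", [("abc123", 7), ("def456", 7)])
print(len(ds), ds[0][0].name, ds[0][1])  # 2 abc123.jpg 7
```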

Interesting work, happy to see people working in the field!

Also a bit disappointed not to see them compare with methods that we found to be SOTA on the task, like RoMa and SIFT+LightGlue.

19.03.2025 16:25 👍 1 🔁 0 💬 0 📌 0

I won't have time to run new experiments (starting a new job on Monday) but if anyone wants to add results with other losses or anything else I'm happy to update the paper :)

19.03.2025 16:22 👍 1 🔁 0 💬 0 📌 0

Interesting point, are you referring to e.g. the FastAPLoss?

To be fair, our preliminary results, which were used to select the shortlist of 12 losses (out of 34, i.e. all those in the pytorch-metric-learning library), were run with a batch size of 32, so there's a chance we missed out on good losses.

19.03.2025 16:21 👍 3 🔁 0 💬 2 📌 0

I think I see your point, for you image retrieval is about retrieving an image of exactly the same object (e.g. exactly that one car, not a car of the same model)?

Then isn't that instance retrieval?

But anyway, naming conventions are very blurry in our field

19.03.2025 16:16 👍 1 🔁 0 💬 1 📌 0

Also, the paper is only on arxiv, we have no plans to submit, and the code is super simple

If anyone wants to add results we're pretty flexible with it, and we can add new authors

My main goal is to have a good reference paper for anyone doing retrieval, so I'm happy to update the paper as needed

19.03.2025 16:11 👍 11 🔁 0 💬 0 📌 1

And I'd call GLD, Oxford, etc. "landmark retrieval" 😆
To be fair they're all image retrieval datasets, but GLD-Oxford and CUB-Cars are just different subcategories of it.

The nice thing about the datasets we used is that the train-test splits are well defined, whereas e.g. Oxford and Paris have no train sets.

19.03.2025 16:06 👍 2 🔁 0 💬 2 📌 0

I'll have to pay a visit 🪴

19.03.2025 02:46 👍 0 🔁 0 💬 0 📌 0

The one and only fern! Where is it?

While writing this I've realized that fern is an anagram of NeRF, definitely not a coincidence

18.03.2025 22:44 👍 2 🔁 0 💬 1 📌 0
All You Need to Know About Training Image Retrieval Models Image retrieval is the task of finding images in a database that are most similar to a given query image. The performance of an image retrieval pipeline depends on many training-time factors, includin...

Arxiv: arxiv.org/abs/2503.13045

Code: github.com/gmberton/ima...

Pytorch Metric Learning Library: kevinmusgrave.github.io/pytorch-metric

18.03.2025 22:41 👍 3 🔁 0 💬 0 📌 0
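The core of the task the paper studies — rank database images by descriptor similarity to the query — fits in a few lines. This sketch uses random vectors in place of a trained model's embeddings; L2-normalizing first makes the dot product equal to cosine similarity.

```python
import numpy as np

def retrieve(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the top-k database descriptors by cosine
    similarity to the query descriptor."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                     # cosine similarity to every item
    return np.argsort(-sims)[:k]      # highest similarity first

rng = np.random.default_rng(0)
db = rng.standard_normal((100, 128))              # 100 database descriptors
query = db[42] + 0.01 * rng.standard_normal(128)  # near-duplicate of item 42
print(retrieve(query, db, k=3)[0])                # 42
```

All the training-time factors the paper ablates (loss, sampling, backbone, etc.) only change how the descriptors are produced; this ranking step stays the same.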

All this comes from a tiny yet powerful 400-LOC codebase, thanks to the PyTorch Metric Learning library, whose developer is a co-author of this paper!

So many thanks to co-authors Kevin Musgrave and Carlo Masone!

18.03.2025 22:41 👍 3 🔁 0 💬 1 📌 0