Our team at Google DeepMind Toronto is hiring a Student Researcher for Summer 2025 to work on projects in video generative models and 3D computer vision. If you are interested, please apply at: forms.gle/Yj1jmbvjBFQC...
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?
We pondered this over the summer and developed a new model: JetFormer 🌊🤖
arxiv.org/abs/2411.19722
A thread 👇
1/
SfM failing on dynamic videos? 😠 RoMo to the rescue! 💪 Our simple method uses epipolar cues and semantic features for robustly estimating motion masks, boosting dynamic SfM performance 🚀 Plus, a new dataset of dynamic scenes with ground truth cameras! 🤯 #computervision
🧵👇