A factor of 10 billion since 2010 😮
A couple of eye-opening slides from @sloeschcke.bsky.social's presentation at today's @belongielab.org meeting (1/2)
🇳🇱 Qualcomm AI Research Internship 🇳🇱
Excited to join @qualcomm.bsky.social in Amsterdam as a research intern in the Model Efficiency group, where I'll be working on quantization and compression of machine learning models.
I'll return to Copenhagen in December to start the final year of my PhD.
🏆 Best Paper Award at the CVPR Workshop on Visual Concepts for our (@doneata.bsky.social + @delliott.bsky.social) paper on probing vision, language, and vision+language models for semantic norms!
TLDR: SSL vision models (SwinV2, DINOv2) are surprisingly similar to LLMs & VLMs even w/o language
arxiv.org/abs/2506.03994
Thanks to my co-authors David Pitt, Robert Joseph George, Jiawei Zhao, Cheng Luo, Yuandong Tian, Jean Kossaifi, @anima-anandkumar.bsky.social, and @caltech.edu for hosting me this spring!
Paper: arxiv.org/abs/2501.02379
Code: github.com/neuraloperat...
We also show strong results on other PDE benchmarks, including Darcy Flow and the Burgers equation, demonstrating TensorGRaD's broad applicability across scientific domains.
We test TensorGRaD on large-scale Navier–Stokes at 1024×1024 resolution with Reynolds number 10^5, a highly turbulent setting. With mixed precision and 75% optimizer state reduction, it matches full-precision Adam while cutting overall memory by up to 50%.
We also propose a mixed-precision training strategy with weights, activations, and gradients in half precision and optimizer states in full precision, and empirically show that storing optimizer states in half precision hurts performance.
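A rough NumPy sketch of the recipe in the post above (my own illustration, not the paper's code): the gradient arrives in float16, Adam's moment estimates are kept and updated in float32, and the weight is stored back in half precision.

```python
import numpy as np

# Sketch only: fp16 weights/gradients, fp32 optimizer states.
def adam_step_mixed(w16, g16, m32, v32, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    g = g16.astype(np.float32)            # upcast the half-precision gradient
    m32[:] = b1 * m32 + (1 - b1) * g      # first moment stays fp32 (in place)
    v32[:] = b2 * v32 + (1 - b2) * g * g  # second moment stays fp32 (in place)
    m_hat = m32 / (1 - b1 ** t)           # standard Adam bias correction
    v_hat = v32 / (1 - b2 ** t)
    w32 = w16.astype(np.float32) - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w32.astype(np.float16)         # weight stored back in half precision
```

Keeping the moments in fp32 is exactly what makes them the memory bottleneck, which is what the sparse + low-rank compression targets.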
We extend low-rank and sparse methods to tensors via a robust tensor decomposition that splits gradients into a low-rank Tucker part and an unstructured sparse tensor. Unlike matricized approaches, we prove our tensor-based method converges.
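A minimal NumPy sketch of the general idea in the post above: split a gradient tensor into a Tucker low-rank part plus a sparse residual. The low-rank step here is a plain truncated HOSVD and the top-k thresholding rule is my own illustration; ranks and function names are assumptions, not the paper's algorithm.

```python
import numpy as np

def unfold(t, mode):
    """Matricize tensor t along the given mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def tucker_lowrank(g, ranks):
    """Project g onto per-mode leading singular subspaces (truncated HOSVD)."""
    factors = [np.linalg.svd(unfold(g, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    rec = g
    for m, u in enumerate(factors):
        # Project mode m onto span(u): move the mode to the front,
        # apply u @ u.T, then move it back.
        moved = np.moveaxis(rec, m, 0)
        moved = np.tensordot(u @ u.T, moved, axes=1)
        rec = np.moveaxis(moved, 0, m)
    return rec

def lowrank_plus_sparse(g, ranks, sparse_frac=0.05):
    """Split g into a Tucker low-rank part and a top-k sparse residual."""
    low = tucker_lowrank(g, ranks)
    resid = g - low
    k = max(1, int(sparse_frac * g.size))
    thresh = np.partition(np.abs(resid).ravel(), -k)[-k]
    sparse = np.where(np.abs(resid) >= thresh, resid, 0.0)
    return low, sparse
```

Because the decomposition acts on the tensor's modes directly, the spatial/temporal axes are never flattened away, which is the point of going beyond matricized methods.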
Recent methods reduce optimizer memory for matrix weights, including low-rank and sparse methods developed for LLMs. But to use them for Neural Operators, we'd need to flatten tensors, which destroys their spatial/temporal structure and hurts performance.
These Neural Operators use tensor weights. However, optimizers like Adam store two full tensors per weight, making memory the bottleneck at scale.
TensorGRaD reduces this overhead by up to 75% (four times less), without hurting accuracy.
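To see why Adam's states, not the weights, become the bottleneck, a quick back-of-the-envelope (the layer size is made up for illustration):

```python
# Adam keeps two moment tensors per weight, so its state alone is
# twice the weight memory at full (fp32) precision.
def adam_state_bytes(num_params, bytes_per_elem=4):
    """Bytes used by Adam's two moment tensors for one weight."""
    return 2 * num_params * bytes_per_elem

# e.g. a hypothetical 64x64x64 tensor weight in a neural-operator layer:
n = 64 * 64 * 64             # 262,144 parameters (1 MiB in fp32)
full = adam_state_bytes(n)   # 2 MiB of optimizer state on top of that
reduced = full // 4          # a 75% reduction keeps only a quarter of it
```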
Scientific computing operates on multiscale, multidimensional (tensor) data. In weather forecasting, for example, inputs span space, time, and variables. Neural operators can capture these multiscale phenomena by learning an operator that maps between function spaces.
Check out our new preprint TensorGRaD.
We use a robust decomposition of the gradient tensors into low-rank + sparse parts to reduce optimizer memory for Neural Operators by up to 75%, while matching the performance of Adam, even on turbulent Navier–Stokes (Re 10^5).
Would you present your next NeurIPS paper in Europe instead of traveling to San Diego (US) if this was an option? Søren Hauberg (DTU) and I would love to hear the answer through this poll: (1/6)
Visited the beautiful UC Santa Barbara yesterday.
Thrilled to announce "Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation" is accepted as a Spotlight (5%) at #ICLR2025!
Our model MM-FSS leverages 3D, 2D, & text modalities for robust few-shot 3D segmentation, all without extra labeling cost. 🤩
arxiv.org/pdf/2410.22489
More details 👇
While Pasadena will be my home, I'll also be making trips to Austin, the Bay Area, and San Diego. If you're nearby and up for a chat, reach out and let's meet up!
View from the office building
☀️ Moved to Pasadena, California! ☀️
For the next five months, Iโll be a Visiting Student Researcher at Anima Anandkumar's group at Caltech, collaborating with her team and Jean Kossaifi from NVIDIA on Efficient Machine Learning and AI4Science.
Screenshot of the course website for "SSL4EO: Self-Supervised Learning for Earth Observation"
Recordings of the SSL4EO-2024 summer school are now released!
This blog post summarizes what has been covered:
langnico.github.io/posts/SSL4EO...
Recordings: www.youtube.com/playlist?lis...
Course website: ankitkariryaa.github.io/ssl4eo/
[1/3]
New Starter Pack: Pioneer Centre for AI researchers
Come by our poster session tomorrow!
📍 West Ballroom A-D #6104
🗓️ Thu, 12 Dec, 4:30 p.m. – 7:30 p.m. PST
@madstoftrup.bsky.social and I are presenting LoQT: Low-Rank Adapters for Quantized Pretraining: arxiv.org/abs/2405.16528
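A toy sketch of the general recipe behind low-rank adapters on quantized weights: a frozen int8 base plus trainable low-rank factors A and B. The symmetric quantizer and all names here are my own illustration, not LoQT's actual implementation; see the paper for the real method.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (illustrative)."""
    scale = np.abs(w).max() / 127.0
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)
Wq, s = quantize_int8(W)                 # frozen, memory-cheap base

r = 2                                    # rank of the trainable update
A = np.zeros((8, r), dtype=np.float32)   # zero init: effective weight starts
B = rng.standard_normal((r, 8)).astype(np.float32)  # equal to the base
W_eff = dequantize(Wq, s) + A @ B        # only A and B would get gradients
```

Training only A and B keeps the optimizer states tiny (rank-r factors instead of the full matrix) while the base weight stays in int8.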
#Neurips2024
Copenhagen University and Aarhus University meet-up in Vancouver 🇩🇰🇨🇦
#NeurIPS2024
On my way to NeurIPS in Vancouver 🇨🇦
Looking forward to reconnecting with friends and meeting new people. Let me know if you are interested in efficient training, quantization, or grabbing a coffee!
#NeurIPS2024
Check out the work our lab in Copenhagen will be presenting at #NeurIPS2024 👇
@neuripsconf.bsky.social @belongielab.org
Here's a starter pack with members of our lab who have joined Bluesky
Pre-NeurIPS Poster Session in Copenhagen.
Thanks to the Pioneer Centre for AI and @ellis.eu for sponsoring.
@neuripsconf.bsky.social
#neurips2024
Check out the ELLIS Pre-NeurIPS Fest event today in... 🇩🇰 Copenhagen!
ELLIS Unit Copenhagen is holding their event at the Pioneer Center for AI showcasing #NeurIPS posters and other Denmark-affiliated papers in #AI and #ML.
More info: bit.ly/4fRFrAh
A photo of Boulder, Colorado, shot from above the university campus and looking toward the Flatirons.
I'm recruiting 1-2 PhD students to work with me at the University of Colorado Boulder! Looking for creative students with interests in #NLP and #CulturalAnalytics.
Boulder is a lovely college town 30 minutes from Denver and 1 hour from Rocky Mountain National Park 🏔️
Apply by December 15th!
Thanks to the Pioneer Centre for AI for organizing this event as part of the ELLIS Pre-NeurIPS Fest! 🎉
#NeurIPS2024
Join us for the Pre-NeurIPS Poster Session in Copenhagen!
🗓️ When: 16:00–18:00, Nov. 22, 2024
📍 Where: Entrance Hall, Gefion, Øster Voldgade 10, 1350 København K.
Present or explore European contributions to NeurIPS 2024 and connect with colleagues.
🔗 Info & Sign-up: www.aicentre.dk/events/pre-n...
LoQT will be presented at NeurIPS 2024! 🎉
This research was funded by @DataScienceDK and @AiCentreDK, and is a collaboration between @DIKU_Institut, @ITUkbh, and @csaudk