
Nicolas Beltran-Velez

@velezbeltran

Machine Learning PhD Student @ Blei Lab & Columbia University. Working on probabilistic ML | uncertainty quantification | LLM interpretability. Excited about everything ML, AI and engineering!

1,806
Followers
1,002
Following
84
Posts
17.11.2024
Joined

Latest posts by Nicolas Beltran-Velez @velezbeltran


🎓 Hats off to the 2025 IICD graduates: Yining Ma Junze Huang Yichi Yang Ruilin Dai Boan Zhu Cameron Park @jlfan.bsky.social & Achille Nazaret!
Wishing you all the best in your next chapter; we're proud of you! 💙 #Columbia2025
@bleilab.bsky.social @khanhndinh.bsky.social @elhamazizi.bsky.social

21.05.2025 13:19 👍 5 🔁 1 💬 0 📌 0
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation Knowledge distillation (KD) is a core component in the training and deployment of modern generative models, particularly large language models (LLMs). While its empirical benefits are well documented-...

This is probably not the complete picture of KD, but I can definitely sleep better after writing down and confirming this minimal working explanation.

arXiv: arxiv.org/abs/2505.13111

(3/4)

20.05.2025 12:17 👍 4 🔁 2 💬 2 📌 0

I received a review like this five years ago. It’s probably the right time now to share it with everyone who wrote or got random discouraging reviews from ICML/ACL.

28.03.2025 19:55 👍 65 🔁 5 💬 1 📌 3

First 11 chapters of RLHF Book have v0 draft done. Should be useful now.

Next:
* Crafting more blog content into future topics,
* DPO+ chapter,
* Meeting with publishers to get wheels turning on physical copies,
* Cleaning & cohesiveness
rlhfbook.com

26.02.2025 16:35 👍 48 🔁 9 💬 0 📌 0

🔥 Benchmark Alert! MotifBench sets a new standard for evaluating protein design methods in motif scaffolding.
Why does this matter? Reproducibility & fair comparison have been lacking until now.
Paper: arxiv.org/abs/2502.12479 | Repo: github.com/blt2114/Moti...
A thread ⬇️

19.02.2025 20:49 👍 41 🔁 17 💬 1 📌 5

The HuggingFace/Nanotron team just shipped an entire pretraining textbook in interactive format. huggingface.co/spaces/nanot...

It's not just great pedagogical support; it also presents a wealth of new data and experiments in a systematic way for the first time.

19.02.2025 19:12 👍 39 🔁 9 💬 0 📌 0

I just wanted to see what it looked like 😭

19.02.2025 02:26 👍 3 🔁 0 💬 0 📌 0

Good God, please. I just want some gradients that don't vanish 😭

17.02.2025 03:01 👍 4 🔁 0 💬 1 📌 0

I was hoping that recent events would lead to a mass exodus from X. Many have left, but most of the ML and LLM people have not.

I have lost a lot of respect for the ML community.

05.02.2025 05:58 👍 72 🔁 4 💬 9 📌 2

Now that Bluesky has GIFs (it didn't work?), I can share (again) my educational notebook on discrete flow matching (by Itai Gat et al.). Also, please check out the original article and official implementation by Meta!

🐍 github.com/gle-bellier/...
🐍 github.com/facebookrese...
📄 arxiv.org/abs/2407.15595

05.02.2025 16:54 👍 15 🔁 2 💬 1 📌 0
SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which depend ...

Really excited about this! We note a connection between diffusion/flow models and neural/latent SDEs. We show how to use this for simulation-free learning of fully flexible SDEs. We refer to this as SDE Matching and show speed improvements of several orders of magnitude.

arxiv.org/abs/2502.02472

05.02.2025 14:38 👍 50 🔁 10 💬 0 📌 0
This is a scatterplot with the following key features:

Axes:
The x-axis represents "Interest in AI," with values ranging approximately from -2 to 2.
The y-axis represents "Willingness to Tolerate Closed, Autocratic Systems," also ranging from about -2 to 2.
Data Points:
Black dots dominate the plot, distributed across all four quadrants, indicating diverse positions on both variables.
A few red dots labeled "my peeps" are clustered in the bottom-right quadrant, signifying high interest in AI but low tolerance for closed, autocratic systems.
Blue Lines:
The plot includes horizontal and vertical blue lines at zero, dividing it into four quadrants for visual reference.
This visualization highlights a subset of individuals ("my peeps") who stand out from the majority based on their distinct combination of interest and values.

I have a sinking feeling that by 2029 I'm going to be faking a British accent so no one will think I was one of the *Americans* working on AI during the regime.

03.02.2025 01:24 👍 111 🔁 10 💬 11 📌 0

NGL, it's kind of surprising that more people haven't migrated here, especially given what Musk has been doing these days. I don't get it.

03.02.2025 02:58 👍 1 🔁 0 💬 0 📌 0

Since everyone wants to learn RL for language models now, post-DeepSeek, a reminder that I've been working on this book quietly in the background for months.

Policy gradient chapter is coming together. Plugging away at the book every day now.

rlhfbook.com/c/11-policy-...

01.02.2025 22:05 👍 156 🔁 20 💬 2 📌 1

Please stop anthropomorphizing language models, it makes them feel really bad

29.01.2025 23:20 👍 70 🔁 2 💬 3 📌 0
From the fednews community on Reddit

This comments section is the first time I've felt even a shred of hope in eight days.

29.01.2025 05:41 👍 20421 🔁 3851 💬 574 📌 648
Democracy 2025 | The united legal frontline in the fight for our democracy Democracy 2025 is the strategic hub to protect people and their rights should the Trump-Vance administration seek to unlawfully strip away freedoms and prosperity.

democracy2025.org/response-center keeps track of it.

27.01.2025 17:10 👍 5 🔁 1 💬 1 📌 0

Nazi salutes and speaking at neo-Nazi rallies seem bad. There's history we should learn from.

26.01.2025 00:42 👍 115 🔁 5 💬 3 📌 0

Something I really like about NLP research is that it makes everything super intuitive. This week I have been thinking about variational inference in NLP, and a lot of the things that seemed to require mathematical intuition just become trivial when thinking about language. So cool :)

25.01.2025 21:52 👍 2 🔁 0 💬 0 📌 0

Tests!! :)

25.01.2025 19:50 👍 3 🔁 0 💬 0 📌 0

But the memory needed for the value function kills the ones that don't have good GPUs 😭

25.01.2025 15:36 👍 0 🔁 0 💬 0 📌 0

New randomized, controlled trial by the World Bank of students using GPT-4 as a tutor in Nigeria. Six weeks of after-school AI tutoring = 2 years of typical learning gains, outperforming 80% of other educational interventions.

And it helped all students, especially girls who were initially behind.

15.01.2025 20:58 👍 354 🔁 88 💬 15 📌 27

I mostly use Copilot for writing code (as autocomplete), GPT-4o for boilerplate, and o1 for serious debugging or boilerplate with some complexity or a lot of requirements. I also use o1 for quick but slightly involved experiments, though not as often.

08.01.2025 19:36 👍 5 🔁 0 💬 0 📌 0

I use ChatGPT over Google for a lot of things because it is really good at fuzzy queries plus aggregating data from many sources. I feel that as long as you double-check results, it is much faster and more convenient.

07.01.2025 00:40 👍 4 🔁 0 💬 0 📌 0

Does anyone have any good resources to learn about quantization? Any essential papers to read and resources about how to use/quantize models in practice are greatly appreciated!

28.12.2024 16:51 👍 3 🔁 0 💬 0 📌 0

That seems very reasonable, no? I assumed the public set was used by most (almost all?) algorithms that have been benchmarked on the task. Isn't that the case? (legitimate question)

22.12.2024 00:41 👍 4 🔁 0 💬 0 📌 0

1-> 2 -> 3 -> 3.5 -> 4 -> 4o -> o1 -> o3

I guess we need AGI just to figure out how to name things

20.12.2024 19:17 👍 71 🔁 6 💬 7 📌 0

If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.

19.12.2024 00:55 👍 74 🔁 31 💬 2 📌 0

🧵 Excited to share #Echidna, a Bayesian framework for quantifying the impact of gene dosage on phenotypic plasticity: tinyurl.com/296kf7hf!
With @elhamazizi.bsky.social and @mingxz.bsky.social, we integrate scRNA-seq & WGS to uncover how CNAs drive tumor evolution and transcriptional variability.

18.12.2024 13:30 👍 15 🔁 6 💬 2 📌 2