
Erfan Mirzaei

@erfunmirzaei

Researcher @PontilGroup.bsky.social | Ph.D. Student @ellis.eu, @Polytechnique, and @UniGenova. Interested in (deep) learning theory and related areas. https://erfunmirzaei.github.io/

465
Followers
159
Following
19
Posts
24.11.2024
Joined

Latest posts by Erfan Mirzaei @erfunmirzaei


Almost 5 years in the making... "Hyperparameter Optimization in Machine Learning" is finally out! πŸ“˜

We designed this monograph to be self-contained, covering: Grid, Random & Quasi-random search, Bayesian & Multi-fidelity optimization, Gradient-based methods, Meta-learning.

arxiv.org/abs/2410.22854

17.12.2025 09:54 πŸ‘ 13 πŸ” 8 πŸ’¬ 0 πŸ“Œ 0
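Of the methods the monograph lists, random search is the simplest to sketch. Below is a minimal, hypothetical Python illustration (not taken from the monograph); `objective` stands in for an expensive validation-loss evaluation, and the search space bounds are made up for the example:

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Random search: sample each hyperparameter independently
    from its range and keep the configuration with the lowest
    objective value."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        val = objective(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Toy stand-in for a validation loss with a known sweet spot.
space = {"lr": (1e-4, 1e-1), "weight_decay": (0.0, 0.1)}
cfg, val = random_search(
    lambda c: (c["lr"] - 0.01) ** 2 + c["weight_decay"], space
)
```

Unlike grid search, the number of trials here is decoupled from the dimensionality of the search space, which is one reason random search is a strong baseline.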

πŸ‘‡ While we wait for the OpenReview drama to settle, here is something that actually solves problems. πŸ˜…

A definitive guide to HPO from my lab mates. Don't let your hyperparameters be a mystery (unlike your reviewers).
#MachineLearning #HPO

28.11.2025 17:35 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

If you’re curious about the intersection of statistical learning theory, sampling-based optimization, generalization in deep learning, and PAC-Bayesian analysis, check out our paper. We’d love to hear your thoughts, feedback, or questions. If you spot interesting connections to your work, let’s chat!

14.11.2025 14:11 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

😱 A second, equally striking finding: after applying a single scalar calibration factor computed from the data, the resulting upper bounds become not only tighter for true labels but also better aligned with the test-error curve.

14.11.2025 14:11 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ™€ One surprising insight: Generalization in the under-regularized low-temperature regime (Ξ² > n) is already signaled by small training errors in the over-regularized high-temperature regime.

14.11.2025 14:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Empirical results on MNIST and CIFAR-10 show:
1) Non-trivial upper bounds on test error for both true and random labels
2) Meaningful distinction between structure-rich and structure-poor datasets

The figures: binary classification with FCNNs trained via SGLD on 8k MNIST images

14.11.2025 14:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We show that it can be effectively approximated via Langevin Monte Carlo (LMC) algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD), and crucially,

πŸ“Ž Our bounds remain stable under this approximation (in both total variation and Wβ‚‚ distance).

14.11.2025 14:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Then comes our first contribution:
βœ… We derive high-probability, data-dependent bounds on the test error for hypotheses sampled from the Gibbs posterior (for the first time in the low-temperature regime Ξ² > n).
Sampling from the Gibbs posterior is, however, typically difficult.

14.11.2025 14:11 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

This leads naturally to the Gibbs posterior, which assigns higher probabilities to hypotheses with smaller training errors (exponentially decaying with loss).

14.11.2025 14:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

To probe this question, we turn to randomized predictors rather than deterministic ones.
Here, predictors are sampled from a prescribed probability distribution, allowing us to apply PAC-Bayesian theory to study their generalization properties.

14.11.2025 14:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

In the figure below, from the well-known paper by Zhang et al., the same model achieves nearly zero training error on both random and true labels. Therefore, the key to generalization must lie in the structure of the data itself.
arxiv.org/abs/1611.03530

14.11.2025 14:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime The paper provides data-dependent bounds on the test error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such a...

🧡Thermodynamics Reveals the Generalization in the Interpolation Regime

With overparameterized NNs, one can achieve almost zero training error on any data, even on random labels that yield massive test errors.
So, how can we tell when such a model truly generalizes?
arxiv.org/abs/2510.06028

14.11.2025 14:11 πŸ‘ 6 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ“’ Upcoming Talk at Our Lab

We’re excited to host Arthur Bizzi from EPFL for a research talk next week!

Title: Towards Neural Kolmogorov Equations: Parallelizable SDE Learning with Neural PDEs

πŸ—“ Date: November 19
⏰ Time: 16:00 CET
πŸ“ Galileo Sala, CHT @iitalk.bsky.social

14.11.2025 14:03 πŸ‘ 5 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0

This paves the way for more data-dependent generalization guarantees in dependent-data settings.

02.05.2025 18:35 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Technique highlights:
πŸ”Ή Uses blocking methods
πŸ”Ή Captures fast-decaying correlations
πŸ”Ή Results in tight O(1/n) bounds when decorrelation is fast

Applications:
πŸ“Š Covariance operator estimation
πŸ”„ Learning transfer operators for stochastic processes

02.05.2025 18:35 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
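The blocking idea can be sketched in a few lines: average consecutive samples of a dependent sequence so that, when correlations decay fast, the block means behave approximately like i.i.d. variables to which standard concentration tools apply. A minimal illustration (the real construction in the paper is for Hilbert space-valued variables; this scalar version is only for intuition):

```python
def block_means(sequence, block_size):
    """Blocking: average consecutive samples so that block means of
    a fast-mixing process are approximately independent. Trailing
    samples that don't fill a complete block are dropped."""
    n_blocks = len(sequence) // block_size
    return [sum(sequence[i * block_size:(i + 1) * block_size]) / block_size
            for i in range(n_blocks)]

# 10 samples grouped into blocks of 2 -> 5 block means.
means = block_means(list(range(10)), block_size=2)
```

The block size trades off effective sample size against residual dependence between blocks, which is where the mixing rate of the process enters the bound.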

Our contribution:
We propose empirical Bernstein-type concentration bounds for Hilbert space-valued random variables arising from mixing processes.
🧠 Works for both stationary and non-stationary sequences.

02.05.2025 18:35 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Challenge:
Standard i.i.d. assumptions fail in many learning tasks, especially those involving trajectory data (e.g., molecular dynamics, climate models).
πŸ‘‰ Temporal dependence and slow mixing make it hard to get sharp generalization bounds.

02.05.2025 18:35 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

🚨 Poster at #AISTATS2025 tomorrow!
πŸ“Poster Session 1 #125

We present a new empirical Bernstein inequality for Hilbert space-valued random processesβ€”relevant for dependent, even non-stationary data.

w/ Andreas Maurer, @vladimir-slk.bsky.social & M. Pontil

πŸ“„ Paper: openreview.net/forum?id=a0E...

02.05.2025 18:35 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 1

1/ πŸš€ Over the past two years, our team, CSML at IIT, has made significant strides in the data-driven modeling of dynamical systems. Curious how we use advanced operator-based techniques to tackle real-world challenges? Let’s dive in! πŸ§΅πŸ‘‡

15.01.2025 14:34 πŸ‘ 5 πŸ” 3 πŸ’¬ 1 πŸ“Œ 0

An inspiring dive into understanding dynamical processes through 'The Operator Way.' A fascinating approach made accessible for everyoneβ€”check it out! πŸ‘‡πŸ‘€

15.01.2025 10:31 πŸ‘ 4 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers in large language modeling, offering linear scaling with…

Excited to present
"Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues"
at the M3L workshop at #NeurIPS
https://buff.ly/3BlcD4y

If interested, you can attend the presentation on the 14th at 15:00, stop by the afternoon poster session, or DM me to discuss :)

10.12.2024 22:52 πŸ‘ 9 πŸ” 3 πŸ’¬ 0 πŸ“Œ 0

In his book β€œThe Nature of Statistical Learning Theory,” V. Vapnik wrote:
β€œWhen solving a given problem, try to avoid a more general problem as an intermediate step”

12.12.2024 17:19 πŸ‘ 8 πŸ” 3 πŸ’¬ 1 πŸ“Œ 0

Excited to share our lab's amazing contributions at NeurIPS this year! Check out our papers and stay inspired! πŸš€πŸ“š #NeurIPS2024

10.12.2024 06:18 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Could you add me to the list?

04.12.2024 22:29 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Hi Gaspard. I wonder what you are currently working on with regard to sequence models and world models. I have similar interests, and in our lab we have worked on the intersection of these topics (bsky.app/profile/marc...).

27.11.2024 14:43 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Hi πŸ‘‹ We're glad to be here on @bsky.app and looking forward to engaging in this community. But first, learn a little more about us...

#ELLISforEurope #AI #ML #CrossBorderCollab #PhD

21.11.2024 10:37 πŸ‘ 121 πŸ” 18 πŸ’¬ 3 πŸ“Œ 1