🚨 A Researcher's Guide to Empirical Risk Minimization
I put together a guide on regret theory for empirical risk minimization (ERM) as I understand it.
The goal was to compile results and proof techniques I've found useful in my own work. I hope others find it useful more broadly.
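For anyone new to the area, the starting point in most treatments is the basic excess-risk decomposition, stated here informally. With empirical risk \hat R_n(f) = \tfrac{1}{n}\sum_{i=1}^n \ell(f(X_i), Y_i), population risk R(f) = \mathbb{E}[\ell(f(X), Y)], \hat f = \arg\min_{f \in \mathcal F} \hat R_n(f), and f^\star = \arg\min_{f \in \mathcal F} R(f):

R(\hat f) - R(f^\star)
  = \big[R(\hat f) - \hat R_n(\hat f)\big] + \big[\hat R_n(\hat f) - \hat R_n(f^\star)\big] + \big[\hat R_n(f^\star) - R(f^\star)\big]
  \le 2 \sup_{f \in \mathcal F} \big|\hat R_n(f) - R(f)\big|,

since the middle term is \le 0 by the definition of \hat f. Controlling that supremum is where the empirical-process machinery comes in.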
26.02.2026 16:40
👍 8
🔁 3
💬 0
📌 0
Fig 4 from Zhang, Lee, Liu "Statistical Learning Theory in Lean 4: Empirical Processes from Scratch"
The dependency graph of the formalizations. The diagram shows the proof of Dudley's entropy integral and of Gaussian Lipschitz concentration, each with their preceding lemmas, feeding into a Gaussian complexity inequality and an error bound at the critical radius, which are then used to prove sharp minimax error rates for linear regression.
Never have I felt more like my job will soon be taken by AI. Statistical learning theory in Lean: concentration inequalities, Dudley's entropy integral, and local Gaussian complexity bounds.
30,000 lines of code and over 1,000 lemmas, formalizing results from Wainwright and Boucheron et al. arxiv.org/abs/2602.02285
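For context, the headline result in that dependency graph is Dudley's entropy integral bound: for a zero-mean process (X_t)_{t \in T} that is sub-Gaussian with respect to a metric d (for a Gaussian process, d(s,t) = \sqrt{\mathbb{E}(X_s - X_t)^2}),

\mathbb{E}\Big[\sup_{t \in T} X_t\Big] \;\le\; C \int_0^\infty \sqrt{\log N(T, d, \varepsilon)}\, d\varepsilon,

where N(T, d, \varepsilon) is the \varepsilon-covering number of T and C is a universal constant whose value depends on the formulation. Localizing this bound at the critical radius is the standard route to the sharp regression rates mentioned in the figure.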
03.02.2026 19:20
👍 32
🔁 5
💬 0
📌 4
By that metric, the AlphaFold paper should be about (*checks math*) 93M words long. 😉
20.01.2026 16:02
👍 3
🔁 0
💬 0
📌 0
Gemini
04.12.2025 21:54
👍 4
🔁 0
💬 0
📌 1
I'm excited to dig into this new work on numerically approximating efficient influence functions. The main idea seems to be to use a Fourier-type approximation, rather than the kernel-smoother approximation used in earlier approaches.
openreview.net/pdf/cfeab45d...
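For readers who haven't met the object being approximated: the efficient influence function (EIF) \varphi_P of a pathwise differentiable functional \psi at P satisfies

\frac{d}{d\varepsilon} \psi(P_\varepsilon)\Big|_{\varepsilon=0} = \mathbb{E}_P[\varphi_P(O)\, s(O)], \qquad \mathbb{E}_P[\varphi_P(O)] = 0,

for every smooth one-dimensional submodel \{P_\varepsilon\} through P with score s (with \varphi_P lying in the model's tangent space, which is what makes it efficient). It enters estimators through the one-step correction \hat\psi = \psi(\hat P) + \tfrac{1}{n}\sum_{i=1}^n \varphi_{\hat P}(O_i), so a good numerical approximation to \varphi_{\hat P}, here a Fourier-type one rather than a kernel smoother as I read it, is what makes the procedure automatic.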
07.11.2025 15:32
👍 6
🔁 0
💬 0
📌 0
Our method can take existing generative models and use them to produce counterfactual images, text, etc.
From a technical perspective, our approach is doubly robust and can be wrapped around state-of-the-art approaches like diffusion models, flow matching, and autoregressive language models.
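To make "wrapped around" concrete: the simplest (not yet doubly robust) version of the idea is to reweight the per-example generative loss so the factual training subset stands in for the counterfactual population. Below is a minimal sketch of that inverse-propensity-weighted denoising loss, with made-up names and a toy noise schedule; the paper's doubly robust construction adds an outcome-regression correction on top and is not what this code implements.

import torch

def ipw_denoising_loss(model, x0, propensity, t_frac, noise):
    # x0: clean examples from the factual subset, shape (batch, dim)
    # propensity: cross-fitted estimates of P(factual condition | covariates), shape (batch,)
    # t_frac: diffusion times in [0, 1], shape (batch,); noise: same shape as x0
    alpha_bar = torch.cos(t_frac * torch.pi / 2) ** 2                     # toy cosine schedule
    x_t = alpha_bar.sqrt()[:, None] * x0 + (1 - alpha_bar).sqrt()[:, None] * noise
    pred = model(x_t, t_frac)                                             # model predicts the added noise
    per_example = ((pred - noise) ** 2).mean(dim=1)                       # standard denoising MSE
    w = 1.0 / propensity.clamp(min=1e-3)                                  # inverse-propensity weights
    w = w / w.mean()                                                      # self-normalize for stability
    return (w * per_example).mean()

Swapping the denoising MSE for a flow-matching or language-model loss is what "wrapped around" means in practice.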
24.09.2025 20:42
👍 3
🔁 0
💬 0
📌 0
Title page for paper:
DoubleGen: Debiased Generative Modeling of Counterfactuals
arXiv:2509.16842 (stat)
Alex Luedtke, Kenji Fukumizu
Selected attributes that are more common in smiling (n = 78 080) than in non-smiling (n = 84 690) CelebA faces. If a model is trained only on the smiling subset, it tends to over-produce these attributes instead of showing how the full population would look if everyone smiled.
Table:
             Lipstick  Makeup  Female*  Earrings  No-beard  Blonde
Smiling        56 %     47 %    65 %     26 %      88 %      18 %
Not smiling    38 %     30 %    52 %     12 %      79 %      12 %
Overall        47 %     38 %    58 %     19 %      83 %      15 %
Counterfactual smiling celebrities generated by a traditional diffusion model trained on only smiling faces (top) and a DoubleGen diffusion model (bottom). Columns contain coupled samples, with the random seed set to the same value before generation. The stars mark the most qualitatively different pairs.
What’s visible: two horizontal rows, each showing twelve AI-generated smiling portraits.
Starred columns highlight the biggest shifts: in those pairs, DoubleGen produces faces with traits under-represented among smiling faces in the original data. Non-starred columns look nearly identical between the two rows.
New paper on generative modeling of counterfactual distributions! We give a way to answer "what if" questions with generative models.
For example: what would faces look like if they were all smiling?
arxiv.org/abs/2509.16842
24.09.2025 20:42
👍 8
🔁 1
💬 1
📌 0
Same - me since I was 4. CGM is fantastic.
09.09.2025 13:35
👍 2
🔁 0
💬 1
📌 0
Carlos Cinelli, Avi Feller, Guido Imbens, Edward Kennedy, Sara Magliacane, Jose Zubizarreta
Challenges in Statistics: A Dozen Challenges in Causality and Causal Inference
https://arxiv.org/abs/2508.17099
26.08.2025 05:56
👍 10
🔁 4
💬 0
📌 0
I want to advertise some relatively recent work which I really like, and have been fortunate to play a small role in.
The paper is titled "A New Proof of Sub-Gaussian Norm Concentration Inequality" (arxiv.org/abs/2503.14347), led by Zishun Liu and Yongxin Chen at Georgia Tech.
19.08.2025 08:28
👍 36
🔁 9
💬 1
📌 0
Neat AI product for improving technical writing.
Tried it on a 50 page draft of a causal ML paper. Of its top 10 comments, 4 concerned minor technical issues I'd missed (notation error, misapplication of definition, etc.). In my experience, vanilla chatbots wouldn't have caught these.
24.07.2025 05:48
👍 5
🔁 1
💬 0
📌 0
Starting to look like I might not be able to work at Harvard anymore due to recent funding cuts. If you know of any open statistical consulting positions that support remote work or are NYC-based, please reach out! 😅
04.06.2025 19:02
👍 152
🔁 96
💬 11
📌 7
I've advised 15 PhD students—10 were international students. All graduates continue advancing U.S. excellence in research and education. Cutting off this pipeline of talent would be shortsighted.
23.05.2025 03:36
👍 8
🔁 2
💬 0
📌 0
I'm a current Harvard graduate student and I found out today that I had my NSF GRFP terminated without notification. I was awarded this individual research fellowship before even choosing Harvard as my graduate school.
22.05.2025 21:38
👍 897
🔁 313
💬 45
📌 12
Had a great time presenting at #ACIC on doubly robust inference via calibration
Calibrating nuisance estimates in DML protects against model misspecification and slow convergence.
It takes just one line of code.
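I won't paste the package here, but the shape of the idea fits in a few lines: isotonically calibrate the cross-fitted propensity estimates against the observed treatment before they enter the AIPW score. Toy variable names, illustration only, not our released code:

import numpy as np
from sklearn.isotonic import IsotonicRegression

def calibrated_aipw(y, a, prop_hat, mu1_hat, mu0_hat):
    # Roughly the kind of one-line calibration step referred to above (illustration):
    # calibrate cross-fitted propensity scores against the observed treatment, which
    # protects the downstream DML/AIPW step against misspecified or slowly
    # converging propensity estimates.
    prop_cal = IsotonicRegression(out_of_bounds="clip").fit(prop_hat, a).predict(prop_hat)
    prop_cal = np.clip(prop_cal, 1e-3, 1 - 1e-3)
    # Standard AIPW (doubly robust) score for the average treatment effect.
    score = (mu1_hat - mu0_hat
             + a * (y - mu1_hat) / prop_cal
             - (1 - a) * (y - mu0_hat) / (1 - prop_cal))
    return score.mean(), score.std(ddof=1) / np.sqrt(len(y))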
19.05.2025 00:02
👍 19
🔁 1
💬 1
📌 2
Thanks for the pointer! We'll check it out
01.05.2025 21:37
👍 0
🔁 0
💬 0
📌 0
Our main insight is that smooth divergences - like the Sinkhorn - behave locally like an MMD, and so it suffices to compress with respect to that criterion. This insight draws from recent works studying distributional limits of Sinkhorn divergences (Goldfeld et al., Gonzalez-Sanz et al.).
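Schematically (paraphrasing rather than quoting those results, and suppressing regularity conditions): for the Sinkhorn divergence S_\varepsilon with entropic regularization \varepsilon > 0, when Q is a small perturbation of P,

S_\varepsilon(P, Q) \;\approx\; \mathrm{MMD}_{k_{P,\varepsilon}}^2(P, Q),

a squared maximum mean discrepancy whose kernel k_{P,\varepsilon} depends on the base measure and on \varepsilon. So keeping \mathrm{MMD}_{k_{P,\varepsilon}}(P, P_{\mathrm{coreset}}) small keeps the Sinkhorn error small, locally, and compression can target the MMD criterion directly.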
30.04.2025 12:59
👍 2
🔁 0
💬 0
📌 0
We build on earlier coreset selection works that compress with respect to maximum mean discrepancy (MMD), including kernel thinning (Dwivedi and @lestermackey.bsky.social) and quadrature (Hayakawa et al.).
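If "compress with respect to MMD" is unfamiliar, the simplest greedy version of the idea looks like this (closer to kernel herding than to kernel thinning, quadrature, or our method; purely illustrative, with a Gaussian kernel):

import numpy as np

def greedy_mmd_coreset(X, m, bandwidth=1.0):
    # Greedily pick m rows of X whose empirical measure stays close, in MMD,
    # to that of the full dataset (herding-style update; illustration only).
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))        # Gaussian kernel matrix
    mean_embed = K.mean(axis=1)                   # kernel mean embedding of the data
    chosen, running = [], np.zeros(n)             # running = sum of K[:, j] over chosen j
    for t in range(m):
        # Pick the point most aligned with the data embedding, penalizing
        # similarity to the points already chosen.
        scores = mean_embed - running / (t + 1)
        if chosen:
            scores[chosen] = -np.inf              # keep the coreset points distinct
        j = int(np.argmax(scores))
        chosen.append(j)
        running += K[:, j]
    return X[chosen]

# e.g. coreset = greedy_mmd_coreset(np.random.randn(2000, 10), m=50)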
30.04.2025 12:59
👍 2
🔁 0
💬 3
📌 0
We pay special attention to the Sinkhorn divergence from optimal transport. Using our method, CO2, a dataset of size n can be compressed to about size log(n) without meaningful Sinkhorn error.
30.04.2025 12:59
👍 3
🔁 0
💬 1
📌 0
The Sinkhorn reconstruction error across dimensions (left) and dataset sizes (right). In the first plot the sample size is fixed at n=25,000; in the second, the dimension is fixed at d=10. The proposed compression method, CO2, outperforms random sampling in all settings considered.
Q-Q plots of the Sinkhorn reconstruction error (left) and l1 error between the label proportions (right) of the compressed data as compared to random samples. The proposed compression method, CO2, outperforms random sampling in all settings considered.
New paper, led by my student Alex Kokot!
We study dataset compression through coreset selection - finding a small, weighted subset of observations that preserves information with respect to some divergence.
arxiv.org/abs/2504.20194
30.04.2025 12:59
👍 11
🔁 1
💬 2
📌 0
The NIH overhead cut doesn't just hurt universities.
It's deadly to the US economy.
The US is a world leader in tech due to the ecosystem that NIH and NSF propel. It drives innovation for tech transfer, creates a highly-skilled sci/tech workforce, and fosters academic/industry crossfertilization.
08.02.2025 02:03
👍 1346
🔁 512
💬 30
📌 20
Agreed. And when misspecified, the MLE is estimating a Kullback-Leibler projection of the true distribution onto the misspecified model (and is consistent for that as n → ∞).
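Concretely, with true distribution P_0 and working model \{p_\theta\}:

\theta^\star = \arg\min_\theta \mathrm{KL}(P_0 \,\|\, P_\theta)
            = \arg\min_\theta \mathbb{E}_{P_0}\big[\log p_0(X) - \log p_\theta(X)\big]
            = \arg\max_\theta \mathbb{E}_{P_0}\big[\log p_\theta(X)\big],

and the MLE maximizes the empirical counterpart \tfrac{1}{n} \sum_i \log p_\theta(X_i), so under the usual regularity conditions \hat\theta_n \to \theta^\star even when p_{\theta^\star} \neq p_0.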
24.01.2025 18:13
👍 3
🔁 0
💬 1
📌 0
Thrilled to share our new paper! We introduce a generalized autoDML framework for smooth functionals in general M-estimation problems, significantly broadening the scope of problems where automatic debiasing can be applied!
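For intuition, here is the finite-dimensional special case of what gets debiased (the paper handles much more general M-estimation problems than this sketch). Let \theta_0 = \arg\min_\theta \mathbb{E}[\ell(W; \theta)], H = \mathbb{E}[\nabla_\theta^2 \ell(W; \theta_0)], and target \psi_0 = m(\theta_0) for a smooth map m. Then

\hat\psi - \psi_0 \;\approx\; -\nabla m(\theta_0)^\top H^{-1} \, \frac{1}{n} \sum_{i=1}^n \nabla_\theta \ell(W_i; \theta_0),

so the influence function is \varphi(W) = -\nabla m(\theta_0)^\top H^{-1} \nabla_\theta \ell(W; \theta_0). The "automatic" part is estimating the Riesz representer \alpha_0 = -H^{-1} \nabla m(\theta_0) directly from its defining moment condition rather than deriving \varphi analytically for each new problem.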
22.01.2025 13:54
👍 19
🔁 7
💬 1
📌 0
Welcome, @danielawitten.bsky.social!
24.11.2024 11:04
👍 5
🔁 0
💬 0
📌 1
👋 In Tokyo this academic year, on sabbatical at the Institute of Statistical Mathematics.
In town and interested in causal ML? Would love to grab coffee and chat.
12.11.2024 10:51
👍 7
🔁 2
💬 0
📌 0
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible. As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two.
"The Elements of Differentiable Programming"
link: arxiv.org/abs/2403.14606
Basically: "autodiff - it's everywhere! what is it, and how do you use it?" seems like a good resource for anyone interested in data science, machine learning, "ai," neural nets, etc
#blueskai #stats #mlsky
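If you want the two-minute version of what autodiff actually does before opening the book, forward mode fits in a few lines of plain Python (a toy with only + and *, nothing like a real implementation):

class Dual:
    # Forward-mode autodiff: carry a value and its derivative together.
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def grad(f):
    # Differentiate a scalar Python function by seeding dx/dx = 1.
    return lambda x: f(Dual(x, 1.0)).dot

def program(x):
    y = 3 * x * x + 2 * x + 1
    if y.val > 10:          # control flow is fine: the derivative follows the branch taken
        y = y * x
    return y

print(grad(program)(2.0))   # d/dx of (3x^2 + 2x + 1) * x at x = 2, i.e. 9x^2 + 4x + 1 = 45.0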
02.04.2024 00:31
👍 32
🔁 13
💬 0
📌 0