
Ben Grimmer

@profgrimmer

Assistant Professor @JohnsHopkinsAMS; works in mathematical optimization. Mostly here to share pretty maths/3D prints, sometimes sharing my research

520
Followers
247
Following
84
Posts
20.09.2023
Joined

Latest posts by Ben Grimmer @profgrimmer

For the second morning this week, one of my phd students defended (successfully!) ๐ŸŽ‰๐ŸŽ“๐ŸŽ‰
Today, Alan Luner defended his excellent work, "On Large-Scale Optimization: Optimal Methods and Computer-Assisted Algorithm Design".

I promise this is the last such announcement for the year

11.03.2026 18:57 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

This morning my PhD student Thabo Samakhoana defended his thesis (successfully!) ๐ŸŽ‰๐ŸŽ“๐ŸŽ‰
Was a great five years working with him towards his thesis "On Optimal Smoothings and their Applications to Optimization and Deep Learning"

09.03.2026 22:20 ๐Ÿ‘ 6 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

This paper also produced some new office decorations for me. Below is the strongly convex set we designed that is provably hard for all Frank-Wolfe methods (at least for two steps).

The paper builds this "evil" shape in d dimensions, able to counteract any method for d/2 steps

27.02.2026 20:41 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Preview
Lower Bounds for Linear Minimization Oracle Methods Optimizing over Strongly Convex Sets We consider the oracle complexity of constrained convex optimization given access to a Linear Minimization Oracle (LMO) for the constraint set and a gradient oracle for the $L$-smooth, strongly convex...

For anyone interested in our lower bound result for Frank-Wolfe methods, in the Nemirovski and Yudin "zero-chain" lower-bounding style, here is a link: arxiv.org/abs/2602.22608

27.02.2026 14:22 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 1

Smoothness and strong convexity are dual to each other, but here we have smoothness of the objective and strong convexity of the constraint set. Linear minimization evaluates the support function of the set, a dual object

I don't see the connection, but there ought to be symmetry!

27.02.2026 14:22 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

In constrained optimization by linear minimization oracle methods (Frank-Wolfe), Garber and Hazan showed in 2015 that strong convexity of the set accelerates convergence to O(1/T^2) as well. Yesterday, I posted a paper giving the matching lower bounds

So why are these two settings mirrored?

27.02.2026 14:22 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
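For context, a minimal toy sketch (my own, not from the paper, and using the vanilla 2/(k+2) step size rather than the accelerated analysis above): Frank-Wolfe only touches the constraint set through its LMO, and over the Euclidean unit ball, a strongly convex set, that oracle has a closed form.

```python
import math

def lmo_ball(grad, r=1.0):
    # LMO for the Euclidean ball of radius r (a strongly convex set):
    # argmin_{||s|| <= r} <grad, s> = -r * grad / ||grad||  (assumes grad != 0)
    norm = math.sqrt(sum(g * g for g in grad))
    return [-r * g / norm for g in grad]

def frank_wolfe(grad_f, x, iters=2000, r=1.0):
    # Classic Frank-Wolfe with the standard open-loop step size 2/(k+2)
    for k in range(iters):
        s = lmo_ball(grad_f(x), r)
        gamma = 2.0 / (k + 2)
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

# Toy problem: minimize ||x - c||^2 over the unit ball, with c outside the
# ball; the minimizer is the boundary point c/||c|| = (1, 0).
c = [2.0, 0.0]
grad_f = lambda x: [2.0 * (xi - ci) for xi, ci in zip(x, c)]
x = frank_wolfe(grad_f, [0.0, 1.0])
```

Every iterate is a convex combination of ball points, so feasibility is automatic; the set only ever enters through `lmo_ball`.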

A small digression on something I find strange in accelerated convex optimization theory:

Since the 80s, in unconstrained minimization by gradient methods, smoothness has been known to allow a fast O(1/T^2) convergence rate via Nesterov's acceleration. Nemirovski and Yudin gave matching lower bounds.

27.02.2026 14:22 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
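For readers newer to this, a minimal sketch of the accelerated method in question (my own toy implementation of Nesterov's scheme with the classic t_k momentum sequence; function names are made up):

```python
import math

def nesterov(grad_f, x0, L, iters=1000):
    # Nesterov's accelerated gradient method for an L-smooth convex f.
    # Guarantee: f(x_T) - f* <= 2 L ||x0 - x*||^2 / (T+1)^2, i.e. O(1/T^2).
    x, y, t = x0[:], x0[:], 1.0
    for _ in range(iters):
        x_new = [yi - gi / L for yi, gi in zip(y, grad_f(y))]  # gradient step
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new                               # momentum weight
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# Toy ill-conditioned quadratic: f(x) = (x1^2 + 100 x2^2) / 2, minimizer 0
grad_f = lambda x: [x[0], 100.0 * x[1]]
x = nesterov(grad_f, [1.0, 1.0], L=100.0)
```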
Post image

The pattern continues. We can fractally build an N=31 self-dual pattern constructed of two N=15 patterns or 4 N=7 patterns, carefully sewn together (pun intended).

I'll stop sewing after I finish N=63 :)
Stay tuned for an upcoming paper where this has unexpected algorithmic/engineering value

12.02.2026 18:05 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

The partition {1,2}{3} above is "self-dual". To see this, note that I numbered the 1', 2', 3' nodes counterclockwise. We can use this self-dual partition of size N=3 to build a self-dual partition of N=7 recursively.

Physically, self-dual == the "dream-catcher" mirrors blue and green
3/4

12.02.2026 18:05 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

A noncrossing partition of {1, ..., N} requires that if you connect grouped numbers around a circle by string, no strings cross. Blue is {1,2}{3}.
The "Kreweras" dual adds numbers 1', ..., N' between each pair and takes the maximal noncrossing partition of them. Green is dual to blue.
2/4
2/4

12.02.2026 18:05 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
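These definitions can be checked by brute force for tiny N (my own sketch; all function names are made up). Representing i' as i + 0.5 places the primed points between the originals on the circle, and the Kreweras dual is the coarsest partition of the primed points that stays jointly noncrossing with the original blocks.

```python
from itertools import combinations

def crosses(B1, B2):
    # Two blocks cross iff their chords interleave around the circle:
    # some a < c < b < d with a, b in one block and c, d in the other
    return any(a < c < b < d for a in B1 for b in B1 for c in B2 for d in B2) \
        or any(a < c < b < d for a in B2 for b in B2 for c in B1 for d in B1)

def noncrossing(blocks):
    return not any(crosses(b1, b2) for b1, b2 in combinations(blocks, 2))

def partitions(items):
    # All set partitions of a list (fine for small N)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] | {first}] + p[i + 1:]
        yield [{first}] + p

def kreweras(blocks, n):
    # Coarsest partition of the primed points 1', ..., n' (stored as i + 0.5)
    # that is jointly noncrossing with the original blocks
    primes = [i + 0.5 for i in range(1, n + 1)]
    best = None
    for p in partitions(primes):
        if noncrossing(list(blocks) + p) and (best is None or len(p) < len(best)):
            best = p
    return best

# Blue partition {1,2}{3}; its Kreweras dual comes out as {1'}{2',3'}
dual = kreweras([{1, 2}, {3}], 3)
```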
5 circles with yarn between them representing non-crossing partitions and their duals

Lately, non-crossing partitions, which have a lovely duality structure, have shown up out of nowhere in my research. This inspired some good art and fractals :)

Wanted to share the fun here (just sharing the pretty art for now, the research story will come in due time)
1/4

12.02.2026 18:05 ๐Ÿ‘ 3 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
Optimization Letters Optimization Letters covers all aspects of optimization, including theory, algorithms, computational studies, and applications. This journal provides an ...

Happy to announce that my work "On optimal universal first-order methods for minimizing heterogeneous sums" just received the Optimization Letters Best Paper Prize.

link.springer.com/journal/1159...

This work is part of a larger trend, fighting the brittleness of classic smooth/nonsmooth models.

02.02.2026 15:42 ๐Ÿ‘ 8 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 1
Post image

Sunday morning spent setting up my office in the new @hopkinsdsai.bsky.social building. I gained a good amount more wall space, so I have the freedom to grow my collections again

11.01.2026 17:42 ๐Ÿ‘ 7 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Post image

Join us in advancing data science and AI research! The Johns Hopkins Data Science and AI Institute Postdoctoral Fellowship Program is now accepting applications for the 2026โ€“2027 academic year. Apply now! Deadline: Jan 23, 2026. Details and apply: apply.interfolio.com/179059

19.12.2025 13:29 ๐Ÿ‘ 11 ๐Ÿ” 9 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 5
Preview
An Elementary Proof of the Near Optimality of LogSumExp Smoothing We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln(\sum^d_i\exp(x_i))$ provides a classical smoothing, ...

A link for those interested in reading ๐Ÿค“
arxiv.org/abs/2512.10825
(4/4)

13.12.2025 04:35 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

The "bad" news: Despite being *nearly* optimal, we show for fixed small dimensions that strictly better smoothings exist, approximating the max function more closely and attaining our lower bound. So LogSumExp is only nearly, not exactly, minimax optimal. (3/4)

13.12.2025 04:35 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

LogSumExp is within 20% of a lower bound we derive on how good *any* similar smoothing can be. The proof just combines inequalities for smooth convex functions, no heavy machinery needed.

The good news: We aren't leaving much on the table by choosing logSumExp. (2/4)

13.12.2025 04:35 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

My student Thabo Samakhoana and I have been obsessed with smoothings lately. The softmax/logSumExp smoothing seems to be the standard everywhere in ML and optimization.
So, in what sense is this choice "optimal"?

We found some "elementary" answers, both good and bad news (1/4)

13.12.2025 04:35 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
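The basic approximation guarantee behind this thread can be verified numerically in a few lines (my own sketch, using the scaled normalization f_t(x) = (1/t) log Σ exp(t x_i), which overestimates the max by at most ln(d)/t):

```python
import math

def logsumexp(x, t=10.0):
    # LogSumExp smoothing of the coordinate-wise max:
    # f_t(x) = (1/t) log sum_i exp(t x_i), which is smooth and satisfies
    # max(x) <= f_t(x) <= max(x) + ln(d)/t
    m = max(x)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(t * (xi - m)) for xi in x)) / t

x = [0.3, -1.2, 0.9, 0.0]
t = 10.0
f = logsumexp(x, t)
assert max(x) <= f <= max(x) + math.log(len(x)) / t
```

Larger t tightens the approximation but makes the smoothing less smooth; the paper's question is how well any smoothing can trade these off.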

For those interested in reading ๐Ÿค“
arxiv.org/pdf/2511.14915

20.11.2025 03:45 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Post image

This polynomial characterization opens a lot of new directions in algorithm design. As a 3D printing enthusiast, I immediately wanted to visualize the set of optimal methods

Below is the region (living in 6 dimensions) of optimal 3-step methods that happens to sit nicely in 3D 4/

20.11.2025 03:45 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Our new work provides a complete description of all minimax-optimal methods. We give a set of polynomial equalities that every optimal method must satisfy ("H invariants") and similarly a needed set of polynomial inequalities ("H certificates")

Together these are "if and only if"!! 3/

20.11.2025 03:45 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

This is a classic type of problem; fixed points are a broad modelling tool, capturing, for example, gradient descent

In terms of algorithm design (my interest): In recent years the community pinned down an optimal method (Halpern) but showed that infinitely many others exist 2/

20.11.2025 03:45 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

A new paper out with TaeHo Yoon and Ernest Ryu:
We looked at the design of optimal fixed-point algorithms.
That is, seeking to approximately solve T(y)=y using as few evaluations of the operator T() as possible. Maximally efficient methods are "minimax optimal" 1/

20.11.2025 03:45 ๐Ÿ‘ 6 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
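As a toy illustration of the Halpern method named in this thread (my own sketch, with the classic 1/(k+2) anchoring weights; not the new methods from the paper):

```python
def halpern(T, x0, iters=500):
    # Halpern iteration: x_{k+1} = lam_k * x0 + (1 - lam_k) * T(x_k),
    # anchoring each step back toward the starting point x0
    x = x0
    for k in range(iters):
        lam = 1.0 / (k + 2)
        x = lam * x0 + (1.0 - lam) * T(x)
    return x

# Toy nonexpansive map on the real line with fixed point y* = 2
T = lambda y: 0.5 * y + 1.0
x = halpern(T, 0.0)
```

The shrinking anchor weight is what yields convergence for general nonexpansive T, where plain iteration x_{k+1} = T(x_k) can fail.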

It's all performance estimation under the hood :)
That tool does wonders for conceptual framing

19.11.2025 13:38 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Some links for those interested ๐Ÿค“
Smooth convex: arxiv.org/abs/2412.06731
Adaptive smooth convex: arxiv.org/abs/2510.21617
Nonsmooth convex: arxiv.org/abs/2511.13639

18.11.2025 14:58 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

We have done a wide range of numerics in smooth, convex settings, where our resulting subgame perfect gradient methods (SPGM) compete with state-of-the-art L-BFGS methods and beat existing adaptive gradient methods in both iteration count and real time.

I am excited about the future here :)
4/

18.11.2025 14:58 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

In a series of works with the newest showing up on arxiv TODAY, we show that this strengthened standard is surprisingly attainable!

Today we proved a method of Drori and Teboulle 2014 is a subgame perfect subgradient method and designed a new, subgame perfect proximal method 3/

18.11.2025 14:58 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Rather than asking to do the best on the worst-case problem, we should ask that, as it sees first-order information, our algorithm update to do the best against the worst problem **with those gradients**
This demands a dynamic form of optimality, called subgame perfection. 2/

18.11.2025 14:58 ๐Ÿ‘ 5 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Lately, I have been obsessed with developing theoretically based optimization algorithms that actually attain the best practical performance.
Alas, the classic model of minimax optimal methods is overly conservative; it overfits to tune its worst-case.
We found a path forward 1/

18.11.2025 14:58 ๐Ÿ‘ 15 ๐Ÿ” 4 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 0
Post image

Enjoyed being part of the Brin Mathematical Research Center's summer school on Scientific Machine Learning last week. Many very good talks and always nice to visit UMD!

14.08.2025 20:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0