I wrote something about building systems people actually want as an academic in ML. It's pretty much an open letter to 6-months-ago me.
magnusross.github.io/posts/moms-m...
❗️ We just expanded our capacity of B200 SXM6 180GB servers, now available in the DataCrunch Cloud Platform.
The best thing is…
You can deploy the Blackwell platform without approvals.
Just sign in, select the instance type, and start your deployment:
cloud.datacrunch.io?utm_source=b...
Also pretty cool to see the open-source community building on top of each other's work!
The paper also proposes Grouped-Tied Attention (GTA), which works in the opposite direction: it starts from GQA and incorporates techniques drawn from MLA.
Grouped Latent Attention (GLA) splits the latent cache into groups, so it can be sharded across devices group by group. Arithmetic intensity stays high and parallelism improves, giving higher throughput without a drop in performance (shape-level sketch below).
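Not code from the paper, just a toy sketch of why the group axis matters for sharding; all sizes are made up for illustration:

```python
import torch

# Illustrative sizes only, not the paper's configuration.
batch, seq_len = 4, 2048
num_groups, latent_dim = 2, 256   # GLA: latent cache carries a group axis
tp_ranks = 2                      # tensor-parallel GPUs, one group per rank

# Latent KV cache with a group axis: [batch, seq, group, latent_dim]
latent_cache = torch.randn(batch, seq_len, num_groups, latent_dim)

# Single-latent replication (MLA-style): every rank stores everything.
replicated = tp_ranks * latent_cache.numel() * latent_cache.element_size()

# GLA-style sharding: rank i stores only its own group's slice.
shards = latent_cache.chunk(tp_ranks, dim=2)
sharded = sum(s.numel() * s.element_size() for s in shards)

print(f"replicated: {replicated / 2**20:.1f} MiB total, "
      f"sharded: {sharded / 2**20:.1f} MiB total")
```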
Well, the paper suggests a hybrid method. What about using MLA and adding groups?
Instead, one must replicate the latent component across GPUs, which feels wasteful.
This is where MLA is somewhat awkward, and GQA scores some points back: MLA uses a single large latent head that must be replicated across all tensor-parallel GPUs, so the attention computation cannot be sharded across them the way GQA's KV heads can.
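To make the contrast concrete, here is a minimal shape-level sketch (sizes are illustrative, not DeepSeek's): GQA's KV cache has a head axis to split across tensor-parallel ranks, while MLA's latent cache does not.

```python
import torch

tp_ranks = 4
batch, seq_len, head_dim = 1, 4096, 128   # illustrative sizes

# GQA: the KV cache has a head axis, so it shards cleanly across ranks.
num_kv_heads = 8
gqa_cache = torch.randn(batch, seq_len, num_kv_heads, head_dim)
gqa_per_rank = gqa_cache.chunk(tp_ranks, dim=2)[0]   # 2 KV heads per rank

# MLA: one shared latent per token, no head axis to split,
# so every rank must hold a full copy of the latent cache.
latent_dim = 512
mla_cache = torch.randn(batch, seq_len, latent_dim)
mla_per_rank = mla_cache                             # replicated as-is

mib = lambda t: t.numel() * t.element_size() / 2**20
print(f"GQA per-rank KV cache: {mib(gqa_per_rank):.1f} MiB")
print(f"MLA per-rank KV cache: {mib(mla_per_rank):.1f} MiB on each of {tp_ranks} ranks")
```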
First of all, a confession! In the blog post 'Multi-Head Latent Attention: Benefits in Memory and Computation', we didn't tell the whole story: the benchmarking was done on a single GPU. In reality, DeepSeek V3-style models need to be parallelized across GPUs.
The paper focuses on designing more effective attention for decoding at inference time, in light of Multi-head Latent Attention (MLA) and Grouped-Query Attention (GQA).
A new paper just dropped from Tri Dao(🐐)'s lab!
arxiv.org/abs/2505.21487
Here is my hot take!
🆕 Inference APIs for FLUX.1 Kontext [max] & [pro] are now available on DataCrunch!
We are an infrastructure partner of Black Forest Labs for Kontext, a suite of generative flow matching models for text-to-image and image-to-image editing.
Learn more: datacrunch.io/managed-endp...
🚨 Summer Inference by Symposium AI is happening next Wednesday, June 4, from 16:00 to 22:00.
🇫🇮 This event will bring together 250 AI engineers, researchers, and founders under one roof in Helsinki.
🔗 You can still grab one of the last remaining seats: lu.ma/x5hhj79x
Some links that helped me understand the roofline model:
jax-ml.github.io/scaling-book...
kipp.ly/transformer-...
datacrunch.io/blog/multi-h...
The blog post explains these terms and how they relate to arithmetic intensity. Let us know if you have any questions or spot errors.
#MLSky
However, more is at play; revisiting Kipply's classic Transformer Inference Arithmetic article shows that the MLA mechanism used during inference is now compute-bound 🖥️ and not memory-bound 💾.
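For anyone who wants the roofline intuition in one line: an op is memory-bound when its arithmetic intensity (FLOPs per byte moved) falls below the hardware ridge point, peak FLOP/s divided by memory bandwidth. A toy check, with all numbers being ballpark figures rather than vendor specs:

```python
# Roofline rule of thumb: memory-bound if arithmetic intensity < ridge point.
peak_flops = 1.0e15          # ~1 PFLOP/s of low-precision compute (ballpark)
mem_bw = 3.0e12              # ~3 TB/s of HBM bandwidth (ballpark)
ridge = peak_flops / mem_bw  # ~333 FLOPs per byte

def bound(flops: float, bytes_moved: float) -> str:
    intensity = flops / bytes_moved
    kind = "memory-bound 💾" if intensity < ridge else "compute-bound 🖥️"
    return f"intensity = {intensity:.0f} FLOPs/B -> {kind}"

# Plain multi-head decode reads each KV-cache byte for only O(1) FLOPs.
print("MHA-style decode:", bound(flops=2e9, bytes_moved=1e9))

# MLA re-uses a small latent cache across many query heads,
# pushing intensity past the ridge point.
print("MLA-style decode:", bound(flops=2e12, bytes_moved=1e9))
```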
Looking at how DeepSeek's attention (MLA) projects the KV cache down to a latent, one automatically thinks it means less memory needed in HBM, preventing dreaded out-of-memory errors 👿.
Algorithm-hardware co-design was a big reason the whale 🐋 (DeepSeek) made such a splash 💦 with its V3 and R1 releases.
"Cost-aware simulation-based inference" is accepted at AISTATS 2025.
Check out our poster #205 on Sunday May 4th in Hall A-E if you are in Phuket. Finland's rising star @huangdaolang.bsky.social will be there to assist you :D
arxiv.org/abs/2410.07930
@fxbriol.bsky.social @samikaski.bsky.social
This is very true! Go and speak to people in more old-school businesses and you quickly realize that with current models you could already do so much.
I don’t mean to be a broken record but AI development could stop at the o3/Gemini 2.5 level and we would have a decade of major changes across entire professions & industries (medicine, law, education, coding…) as we figure out how to actually use it & adapt our systems.
AI disruption is baked in.
1/ If you are at ICLR / AABI / AISTATS, check out work from our lab and collaborators on *inference everywhere anytime all at once*!
Go talk to my incredible PhD students @huangdaolang.bsky.social & @chengkunli.bsky.social + amazing collaborator Severi Rissanen.
@univhelsinkics.bsky.social FCAI
I wrote something up for AI people who want to get into Bluesky and either couldn't assemble an exciting feed or gave up doomscrolling when their Following feed switched to talking politics 24/7.
1/10 🔥 New paper alert in the #AABI2025 Proceedings!
Normalizing Flow Regression (NFR) — an offline Bayesian inference method.
What if you could get a full posterior using *only* the evaluations you *already* have, maybe from optimization runs? (Toy sketch below.)
@aidanscannell.bsky.social
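My loose mental model of the idea, as a toy sketch rather than the paper's actual algorithm: treat the stored (point, log-density) pairs as a regression dataset and fit a density to them, learning the unknown log-normalizer as an extra scalar. For brevity a plain Gaussian stands in for the normalizing flow here.

```python
import torch

torch.manual_seed(0)

# Pretend these were logged earlier: points x_i and *unnormalized*
# log-posterior values y_i (the +3.0 offset plays the unknown log Z).
target = torch.distributions.MultivariateNormal(
    torch.tensor([1.0, -1.0]), 0.5 * torch.eye(2))
xs = torch.randn(200, 2) * 1.5
ys = target.log_prob(xs) + 3.0

# Surrogate density q with learnable mean/scale, plus a learned
# constant c absorbing the unknown normalizer.
mu = torch.zeros(2, requires_grad=True)
log_std = torch.zeros(2, requires_grad=True)
c = torch.zeros((), requires_grad=True)
opt = torch.optim.Adam([mu, log_std, c], lr=0.05)

for _ in range(500):
    q = torch.distributions.Normal(mu, log_std.exp())
    log_q = q.log_prob(xs).sum(-1)          # log q(x_i), diagonal Gaussian
    loss = ((log_q + c - ys) ** 2).mean()   # regress onto stored values
    opt.zero_grad()
    loss.backward()
    opt.step()

print("fitted mean:", mu.detach(), "fitted log-normalizer:", c.item())
```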
Tired of your open-source ML work not getting the academic recognition it deserves? 🤔 Submit to the first-ever CodeML workshop at #ICML2025! It focuses on new libraries, improvements to established ones, best practices, retrospectives, and more.
codeml-workshop.github.io/codeml2025/
Average cost for a student is $86,000 a year, just saying 😜
Congrats Pierre!
This is so true!
Sounds fun! I want to hear about it when you are back!