
Phillip Isola

@phillipisola

Associate Professor in EECS at MIT. Neural nets, generative models, representation learning, computer vision, robotics, cog sci, AI. https://web.mit.edu/phillipi/

5,624
Followers
91
Following
84
Posts
28.09.2024
Joined

Latest posts by Phillip Isola @phillipisola

I agree. For example, I support bans on certain kinds of military and surveillance use. I'm generally pro regulation of this tech.

06.03.2026 15:01 πŸ‘ 3 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0

Oh maybe this is better put like this:

"Ask not if AI is good or bad, ask what you can do to make it better"

06.03.2026 05:40 πŸ‘ 11 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

The AI discourse sometimes seems to center on "Is AI good or is it bad?"

I find this framing unproductive. AI is not a fixed thing.

I would prefer to ask "How might we use this technology for good, and mitigate the bad?"

What a shame if the best use we can come up with is no use at all.

06.03.2026 05:29 πŸ‘ 37 πŸ” 5 πŸ’¬ 4 πŸ“Œ 2

I agree; I don't think it's a small difference, or rather, this isn't some finicky hyperparameter. Different metrics make different claims about the kinds of structure that are converging vs. not.

I think it's actually more than topology that is converging; it also includes local-ish geometry...

25.02.2026 18:44 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

The reason was that we were seeing noisy trends with CKA. We didn't realize the extent of the bias but had lots of misgivings about it as a metric. Local similarity (mknn) showed clearer trends and made more sense to us. I think the ARH paper does a nice job of justifying this more clearly.

25.02.2026 03:39 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This isn't quite right: the PRH paper used the same measure of local similarity, mknn, as the new paper advocates (in fact we introduced it). The main paper results were entirely using mknn. In the appendix we did compare to CKA.

But I agree, lots to refine in how we do rep similarity analysis!

25.02.2026 02:11 πŸ‘ 8 πŸ” 0 πŸ’¬ 1 πŸ“Œ 1
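For readers unfamiliar with the metric discussed above: mutual k-nearest-neighbor (mknn) alignment scores two representation spaces by how much they agree on each sample's local neighborhood. Here is a minimal illustrative sketch; the function name, cosine normalization, and choice of k are my assumptions, not the papers' exact implementation:

```python
import numpy as np

def mutual_knn_alignment(feats_a, feats_b, k=5):
    # For each sample, find its k nearest neighbors (by cosine similarity)
    # in each representation space, then report the mean fraction of
    # neighbors on which the two spaces agree.
    def knn_sets(feats):
        x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sims = x @ x.T
        np.fill_diagonal(sims, -np.inf)  # a point is not its own neighbor
        idx = np.argsort(-sims, axis=1)[:, :k]
        return [set(row.tolist()) for row in idx]

    overlaps = [len(a & b) / k
                for a, b in zip(knn_sets(feats_a), knn_sets(feats_b))]
    return float(np.mean(overlaps))

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
# An orthogonal rotation preserves all pairwise similarities,
# so neighborhoods match; a random embedding agrees only by chance.
score_rotated = mutual_knn_alignment(base, base @ rotation)
score_random = mutual_knn_alignment(base, rng.normal(size=(100, 16)))
```

A rotated copy scores near 1.0, while an unrelated random embedding scores near the chance level of roughly k/(n-1), which is the kind of local-structure comparison the metric is after.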

Today we present a new framework for measuring human-like general intelligence in machines: studying how, and how well, they play and learn to play all conceivable human games compared to humans. We then propose the AI Gamestore, a way to sample from popular human games to evaluate AI models.

23.02.2026 15:49 πŸ‘ 20 πŸ” 7 πŸ’¬ 1 πŸ“Œ 0

Our grad-level "Deep Learning" course (MIT's 6.7960) is now freely available online through OpenCourseWare: ocw.mit.edu/courses/6-79...

Lecture videos, psets, and readings are all provided.

Had a lot of fun teaching this with @sarameghanbeery.bsky.social and @jeremybernste.in!

11.02.2026 17:51 πŸ‘ 121 πŸ” 38 πŸ’¬ 3 πŸ“Œ 2

That's true, I haven't seen anyone saying otherwise.

24.01.2026 23:06 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Hmm, that quote seems reasonable to me. A paper can have mistakes in some aspects and be valid in other aspects.

I'm not advocating for a particular policy or lack of penalty; I just think it would help to approach this with nuance.

24.01.2026 19:09 πŸ‘ 7 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I directionally agree. Many citations are of the form "here is a paper we didn't read, and that you, dear reader, need not read, but that is within a (large) epsilon ball of our current work." Those citations play a role in credit/novelty assignment, but can perhaps be better done by (future) LLMs.

22.01.2026 07:13 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Yeah I think that's likely and what I mean by "reality" isn't just physics but also cultural reality. And there's still a question of whether we are converging to a factual representation of human affairs or to a subjective view of the world shaped by human biases...

08.01.2026 02:57 πŸ‘ 7 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

That's a good point, but to clarify: we didn't train on the Wikipedia dataset; that's just the dataset used to test on. So the limitation that Efros is pointing out is not about training on similar data but about testing on data that "favors" finding similarity.

07.01.2026 23:10 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Super accessible write-up on what we and others have been up to on representational convergence in AI models and the platonic representation hypothesis, along with contrary views.

I'm a big fan of Quanta Magazine, so it was very cool to see them cover this!

07.01.2026 22:26 πŸ‘ 23 πŸ” 5 πŸ’¬ 0 πŸ“Œ 1

Impromptu NeurIPS meetup: "representational convergence by the beach." We will meet at ballroom 20c (near lunch) 2pm Fri and walk over to Marina. Will chat about platonic reps, fractured reps, or anything else about where all these models are heading.

Anyone is welcome to join!

04.12.2025 21:19 πŸ‘ 20 πŸ” 3 πŸ’¬ 0 πŸ“Œ 0

Right, but why stop at GPT 5.1? Arguably you need to make a serious attempt using GPT 7, or 8, or ...

I think this gets at the heart of why it's so hard to make claims about limitations of broad classes of AI. It's much harder to show that a system can't do X than it is to show that it can do X.

30.11.2025 05:42 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Writing Advice from Matt Stone & Trey Parker @ NYU | MTVU's "Stand In" YouTube video by Fabian Valdez

This reminds me of my favorite talk giving advice, which is from Matt Stone and Trey Parker: www.youtube.com/watch?v=vGUN...

29.10.2025 21:09 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Join us TODAY for the 3rd Perception Test Challenge perception-test-challenge.github.io @iccv.bsky.social

Ballroom B, Full day

Amazing lineup of speakers: Ali Farhadi, @alisongopnik.bsky.social, Philipp Krähenbühl, @phillipisola.bsky.social

19.10.2025 18:14 πŸ‘ 7 πŸ” 4 πŸ’¬ 1 πŸ“Œ 1

I agree, I just want to push back on this being pseudoscience, I feel like that's too strong a critique.

But just for the chance of a meal in Paris, happy to take that bet and probably end up wrong :)

17.10.2025 19:35 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I agree that just knowing a lot of facts is not everything. But it seems like their benchmark includes lots more than that: working memory, reasoning, perception, etc?

17.10.2025 18:40 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I get that people might disagree with the framing / marketing. But what makes you feel it is pseudoscience? I only skimmed it.

17.10.2025 17:50 πŸ‘ 0 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0
The Platonic Universe: Do Foundation Models See the Same Sky? We test the Platonic Representation Hypothesis (PRH) in astronomy by measuring representational convergence across a range of foundation models trained on different data types. Using spectroscopic and...

I agree. I think at a certain scale, modality alignment happens without additional explicit incentives; at smaller scales, explicit alignment can be necessary.

This paper shows some effect of alignment increasing with scale, for a domain closer to remote sensing: www.arxiv.org/abs/2509.19453

13.10.2025 16:59 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Right! It's a text only LLM.

13.10.2025 16:02 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

This work is with an amazing team including @sophielwang.bsky.social, @thisismyhat.bsky.social, Sharut Gupta, @shobsund.bsky.social, Chenyu Wang, and Stefanie Jegelka.

9/9

10.10.2025 22:13 πŸ‘ 17 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

More broadly, I think confusion has been created by forming hard distinctions between different modalities, especially between text and sensory data. These distinctions can obscure commonalities. We take the rhetorical stance of erasing the distinctions, and seeing where this leads.

8/9

10.10.2025 22:13 πŸ‘ 22 πŸ” 0 πŸ’¬ 2 πŸ“Œ 1
An Observation on Generalization YouTube video by Simons Institute for the Theory of Computing

This work was partially inspired by Ilya Sutskever's talk here: www.youtube.com/watch?v=AKMu...

If you concatenate datasets, the model β€œshould” figure out all the synergies and cross-modal relationships, then exploit them to make better inferences. We now have some evidence this can happen.

7/9

10.10.2025 22:13 πŸ‘ 23 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Architecture for Unpaired Multimodal Learner.

Suppose you have separate datasets X, Y, Z, without known correspondences.

We do the simplest thing: just train a model (e.g., a next-token predictor) on all elements of the concatenated dataset [X,Y,Z].

You end up with a better model of dataset X than if you had trained on X alone!

6/9

10.10.2025 22:13 πŸ‘ 23 πŸ” 1 πŸ’¬ 2 πŸ“Œ 0
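The "just train on the concatenated dataset" recipe above can be illustrated with a deliberately tiny stand-in: an add-alpha smoothed bigram model in place of a next-token predictor, and two synthetic corpora sharing underlying statistics in place of real modalities. Everything here (the Markov-chain corpora, vocabulary, and smoothing) is a toy assumption, not the paper's setup:

```python
import random
from collections import Counter
from math import log

def sample_corpus(rng, n):
    # Markov chain over 4 symbols; each state allows only 2 successors,
    # so the corpora have strong, learnable transition structure
    trans = {'a': 'ab', 'b': 'bc', 'c': 'cd', 'd': 'da'}
    state, out = 'a', []
    for _ in range(n):
        state = rng.choice(trans[state])
        out.append(state)
    return out

def bigram_nll(train, test, alpha=1.0, vocab_size=4):
    # Add-alpha smoothed bigram model: average negative log-likelihood
    # of the test sequence under transitions estimated from train
    bigrams = Counter(zip(train, train[1:]))
    unigrams = Counter(train[:-1])
    nll = 0.0
    for prev, cur in zip(test, test[1:]):
        p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)
        nll -= log(p)
    return nll / (len(test) - 1)

rng = random.Random(0)
x_train = sample_corpus(rng, 30)    # small dataset X
y_train = sample_corpus(rng, 3000)  # large dataset Y with shared structure
x_test = sample_corpus(rng, 2000)   # held-out data from X's distribution

nll_x_only = bigram_nll(x_train, x_test)
nll_joint = bigram_nll(x_train + y_train, x_test)  # train on [X, Y]
```

Because the two toy corpora share statistics, the model trained on the concatenation [X, Y] should assign held-out X data a lower average negative log-likelihood than the model trained on the small X alone, mirroring the claim that you end up with a better model of X.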
Diagram showing paired vs unpaired data

In β€œBetter Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models,” we study a question I’ve wanted to make progress on for years: can you learn useful multimodal representations from *unpaired* data?

5/9

10.10.2025 22:13 πŸ‘ 24 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

In short: you can β€œjust ask” an LLM to act (a bit) like an image model or an audio model.

This tells us that LLMs know more about the sensory world than we might suspect; you just have to find ways to elicit the knowledge.

4/9

10.10.2025 22:13 πŸ‘ 27 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Diagram showing how prompts can steer an LLM toward kernel structure that better matches that of sensory encoders.

In β€œWords That Make Language Models Perceive,” we find if you ask an LLM to β€œimagine seeing,” then how it processes text becomes more like how a vision system would represent that same scene.

If you ask it to β€œimagine hearing,” its representation becomes more like that of an auditory model.

3/9

10.10.2025 22:13 πŸ‘ 35 πŸ” 3 πŸ’¬ 1 πŸ“Œ 1