This call is still open. I am looking to recruit, as are many other faculty at Cornell. We review folders as they come in, and will send offers until all positions are filled.
Please share with your network 🙏
We have a mailing list for big announcements:
groups.google.com/g/colm-annou...
We use it very sparingly, roughly 1-2 times a year
Call for papers -- due March 31, 2026 (abstracts due March 26)
colmweb.org/cfp.html
Call for workshops -- due April 14, 2026
colmweb.org/cfw.html
Hence, this is an interesting and important benchmark. Through a simple environment, it exposes a fairly fundamental flaw in current models
This is not surprising, and aligns with other findings in the literature regarding visual reasoning and manipulation
The prompts do provide rudimentary illustrations. The stateful version allows the model to see the outcome of its own actions, technically letting it infer the physics. Generally, though, the result for LLMs out of the box is negative.
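To make the stateful setup concrete, here is a minimal sketch of the kind of observe-act loop it implies; `env`, `llm`, and the prompt format are placeholders for illustration, not the benchmark's actual interface.

```python
# Minimal sketch of a "stateful" evaluation loop: the model proposes an action,
# the environment applies it, and the resulting observation is appended to the
# context so the model can (in principle) infer the dynamics from its own actions.
# `env` and `llm` are assumed placeholders, not the benchmark's real API.
def run_stateful_episode(env, llm, max_steps=20):
    history = [f"Initial observation: {env.reset()}"]
    for _ in range(max_steps):
        prompt = "\n".join(history) + "\nNext action:"
        action = llm(prompt)                   # model picks an action as text
        obs, done = env.step(action)           # environment returns the outcome
        history.append(f"Action: {action}")
        history.append(f"Observation: {obs}")  # model sees the effect of its action
        if done:
            break
    return history
```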
Most of the experiments are not with VLMs, but with a diverse set of RL methods.
Do LLMs understand physics? They definitely generate outputs that seem to indicate so.
Submit to COLM! Deadline of March 31. This llama gets to enjoy his holidays and isn't stressed out just yet...
Zoe presented this paper at NeurIPS D+B: it's all knots (🪢🪢🪢!?), no language tokens were harmed (or reinforced) in the process
It's such a fun and creative paper, a real mind twist ;)
You really get to think carefully about visual intelligence looking at these knots 🪢
Hi all, I will be at #NeurIPS2025 to present my work on stress-testing looooooong visual reasoning with KnotGym🥨
Let's talk, whether or not your VLM can see 14 million possible futures like Doctor Strange
COLM is going to San Francisco for 2026!
🗓️Dates: October 6-9, 2026
🏨Venue: Hilton San Francisco Union Square
Website and CFPs for papers and workshops coming up soon!
This is maybe counterintuitive to the original intention of just indexing the chaos to make it accessible. I guess that ideal of search softened a long time ago
That's definitely part of it, because this digestion has a deeper history. Search engine indexing also just seems easier, so companies opt for it, even pre-AI-overview-everything
Re peer-rev --> pre-print servers: arXiv is a simple, uniform place to store papers. Indexing engines love it, so if you want something to be searchable, nothing is better. To make things worse, at times it seems like journals/proceedings almost play a game of hide-and-seek with their PDFs
Re position papers: I don't think anyone can deny how effective some of these papers became for citation counts
Is this all just a big practical joke for ChatGPT? I have been told god doesn't play dice with the world, but I guess AGI does :)
It's a Thursday though ....
All available here:
lm-class.org
ChangeLog here:
lm-class.org/CHANGELOG.md
Pushed a big update to LM-class (v2025.2) -- this second version is a much more mature resource
Many refinements of lecture slides + significant improvements to the assignments
Many thanks to @ch272h.bsky.social, Yilun Hua, and Shankar Padmanabhan for their work on the assignments
This kind of ad-hoc adaptation is hard in general for LLMs, but you can post-train for it to some degree
arxiv.org/abs/2508.06482
I suspect contemporary ASR models have the same backbone, so maybe applicable too
More broadly, there is a lot of interesting stuff to do in this space of adaptation
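For a rough sense of what "post-train for it" can look like, here is a generic LoRA fine-tuning sketch on a small causal LM; the model name, data, and hyperparameters are assumed placeholders, not the recipe from the linked paper.

```python
# Generic post-training sketch: LoRA fine-tuning of a small causal LM on
# examples demonstrating the desired adaptation behavior.
# Placeholder model/data; not the method of the linked paper.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["c_attn"],
                                         task_type="CAUSAL_LM"))

# Hypothetical adaptation examples: context demonstrating the new behavior,
# followed by the desired continuation.
examples = ["<context showing the adaptation> <desired continuation>"]
optimizer = AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss  # standard LM loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```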
I am potentially recruiting a postdoctoral fellow through this program. If interested, name me as a mentor, and ping me to let me know that you are applying! The process includes some sort of interview, so I can try to squeeze a few of these in advance (it will help a lot!)
Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca.
Deadline for full consideration is Nov 20, 2025!
academicjobsonline.org/ajo/jobs/30971
Cornell (NYC and Ithaca) is recruiting AI postdocs, apply by Nov 20, 2025! If you're interested in working with me on technical approaches to responsible AI (e.g., personalization, fairness), please email me.
academicjobsonline.org/ajo/jobs/30971
Wild
There's the legit gaming, which is just optimizing for the metrics and breaking them. Then there's the really fake stuff, like citation rings. You would think citations translate to bitcoins, given the level of creativity and effort people put into this
The top citer has >1k papers, with a PhD from 2007. That's one hell of a steady rate ¯\_(ツ)_/¯
It's pretty crazy how the entire citation game has been manipulated. It's enough to take a quick look at Semantic Scholar for Bengio, whom GScholar just credited with 1M citations. SScholar gives 0.5M, but it's not only the number, it's the top citers
Recent IVADO talk is now on YouTube:
www.youtube.com/watch?v=ozHk...
Paper here:
Pre-training Limited Memory Language Models with Internal and External Knowledge
Linxi Zhao et al.
arxiv.org/abs/2505.15962
It definitely doesn't seem to hold for the process, which lacks any similar regulation or structure. The (sci-fi-ish?) argument is that one cannot disentangle deployment/impact from development (i.e., one cannot shut it down).