Martin = one of the kindest people I know! Don't miss this opportunity to learn from one of the best in their field!
I'm hiring a new lab manager for my lab @ UCSD! For more info on the lab, check out our website: lillab.ucsd.edu
Target start date is June 1 (flexible) and application deadline is March 26. Please share with anyone you think might be a good fit!
Apply here: employment.ucsd.edu/laboratory-c...
📢 PhD position in Developmental Language Modelling
(PLZ RT)
What can human language acquisition teach us about training language models? Join us as a PhD!
mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-developmental-language @carorowland.bsky.social
@mpi-nl.bsky.social
Thanks to everyone who gave us feedback: @lampinen.bsky.social, Ellie Pavlick, @glupyan.bsky.social, @phillipisola.bsky.social, and others!
Work with Tianyang Xu, @mudtriangle.com, Karen Livescu, and Greg Shakhnarovich!
This relates more broadly to literature reconciling how meaning obtained from relational grounding in language interacts with meaning obtained from other forms of grounding (see Mollo and Millière, @raphaelmilliere.com), and it lays out a research program on the role of category coherence in learning!
11/
This suggests that representations learned from language are structured to expect incoming category information to cohere in a specific way in order to support cross-modal generalization!
10/
Results from counterfactual shuffling experiments. Models tend to generalize equally well when coherence is preserved and poorly when it is disrupted, even in the absence of all hypernyms.
If models were generalizing arbitrarily, then we shouldn't see any differences in their performance across these settings (i.e., no matter what, crow == bird). However, we find that models seem to only generalize when the training data preserves category coherence!
9/
Macro F1 scores on unseen images vs. visual coherence across the 53 hypernym categories for the Qwen3-1.7B backbone (at 100% ablation). r (Pearson's correlation) = .43, indicating a positive relationship.
By coherence we mean the visual similarity between members of the same category, which we calculate using the DINOv2 embeddings used in our VLM training. Even in the original configuration, models performed better on categories that were visually more coherent.
8/
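The coherence metric above can be sketched as mean pairwise cosine similarity among a category's image embeddings. This is a minimal illustration, not the paper's code; the function name and the use of raw pairwise similarity are my assumptions.

```python
import numpy as np

def visual_coherence(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among the image embeddings of one
    category (e.g., frozen DINOv2 features). Higher = more visually coherent.
    Hypothetical helper, not the authors' implementation."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                   # cosine-similarity matrix
    n = len(embeddings)
    off_diag = sims[~np.eye(n, dtype=bool)]    # drop self-similarity on the diagonal
    return float(off_diag.mean())
```

A tight cluster of embeddings scores near 1, while embeddings scattered in random directions score near 0, matching the intuition that shuffling labels across categories lowers coherence.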
Examples of image-leaf mappings resulting from our counterfactual shuffles, in comparison with the original configuration (top). VC indicates the visual coherence of the category under the data configuration. VC for birds in the original set: .30; for within-category shuffles: .30; for across-category shuffle: .12.
To test this, we created counterfactual data: 1) where category-label pairings were shuffled across categories (e.g., a kayak image = "crow") and 2) where they were shuffled within categories (e.g., a robin image = "crow" and a crow image = "robin"). These swaps also manipulate the categories' visual coherence.
7/
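The two shuffle conditions above can be sketched as label permutations that either stay inside a hypernym category (preserving visual coherence) or cross category boundaries (disrupting it). A minimal sketch under my own naming assumptions, not the paper's code:

```python
import random

def shuffle_labels(pairs, category_of, within=True, seed=0):
    """Counterfactually reassign leaf labels to images.
    pairs: list of (image_id, leaf_label) tuples.
    category_of: dict mapping each leaf label to its hypernym.
    within=True permutes labels only among items of the same hypernym
    (coherence preserved); within=False permutes across all items
    (coherence disrupted). Hypothetical helper names."""
    rng = random.Random(seed)
    images, labels = zip(*pairs)
    new_labels = list(labels)
    if within:
        by_cat = {}
        for i, lab in enumerate(labels):
            by_cat.setdefault(category_of[lab], []).append(i)
        for idxs in by_cat.values():
            perm = idxs[:]
            rng.shuffle(perm)
            for src, dst in zip(idxs, perm):
                new_labels[dst] = labels[src]
    else:
        rng.shuffle(new_labels)
    return list(zip(images, new_labels))
```

Under the within-category shuffle, every image keeps a label from its original hypernym (a robin image can become "crow" but not "kayak"), which is exactly the property that leaves visual coherence unchanged.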
Figure depicting two hypotheses that models might entertain: 1) arbitrary prediction of hypernyms regardless of what the input looks like during supervision; 2) sensitivity to the fact that the category (e.g., birds) is not visually coherent.
Are LMs simply executing something like "If crow THEN bird", regardless of what the image shows? E.g., if during supervision we label images of kayaks as "crow", would the model still generalize to birds, or does the model expect categories to have some level of coherence?
6/
Main results (see fig 4 in the paper). Salient result: models tend to generalize to hypernyms without any evidence encountered during training, suggesting that they show cross-modal generalization.
Having established these preconditions to our task, we then find that models are also able to generalize (non-trivially) to hypernyms without ever having βseenβ them explicitly, suggesting that LM representations support cross-modal generalization!
5/
Left: plot showing that models using the DINOv2 encoder, which has never seen text, tend to generalize similarly to those using the SigLIP encoder, which has. Right: table showing that both Qwen3 LMs demonstrate non-trivial hypernymy knowledge.
We establish that this paradigm works in the first place with a vision encoder that has never been trained on language data (i.e., DINOv2 rather than SigLIP), that the models learn the task on the lower-level categories themselves, and that the LMs indeed have taxonomic knowledge.
4/
3 papers on hypernym acquisition in models (Hearst, 1992; Geffet and Dagan, 2005) and humans (Wilson et al., 2023) - see paper for details.
Taxonomic knowledge is interesting because of a number of hypotheses about the learnability of category knowledge from linguistic cues, for both computational models and humans. Evidence of cross-modal generalization would lend strong support to these hypotheses!
3/
Figure depicting an instance of our experiments. During training, the projector is deprived of explicit supervision on high-level categories (hypernyms, e.g., animal) to varying degrees, and is trained to detect the presence (and absence) of lower-level categories (e.g., koala), keeping the image encoder and the LM backbone frozen. After training, the VLM is tested for generalization to hypernym categories, given previously unseen images.
We use a VLM-training paradigm (a frozen vision encoder without language training, mapped to a frozen LM) where we partially supervise on lower-level categories during training, and then test whether the LM recovers hypernymy knowledge from what it has seen in language data.
2/
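The frozen-encoder-plus-projector setup above can be sketched as follows. This is a minimal PyTorch sketch under my own assumptions (class and argument names are hypothetical), not the authors' implementation:

```python
import torch
import torch.nn as nn

class ProjectorVLM(nn.Module):
    """Sketch of the training paradigm: a trainable projector maps frozen
    vision features into a frozen LM's embedding space; only the projector
    receives gradients during the lower-level-category supervision."""
    def __init__(self, vision_encoder, lm, vis_dim, lm_dim):
        super().__init__()
        self.vision = vision_encoder.eval()
        self.lm = lm.eval()
        for p in self.vision.parameters():
            p.requires_grad_(False)            # freeze the vision encoder
        for p in self.lm.parameters():
            p.requires_grad_(False)            # freeze the LM backbone
        self.projector = nn.Linear(vis_dim, lm_dim)  # the only trainable part

    def forward(self, image):
        with torch.no_grad():
            feats = self.vision(image)         # frozen features (e.g., DINOv2)
        return self.projector(feats)           # projected into LM embedding space
```

The point of the design is that any hypernymy knowledge the system shows at test time must come from the LM's language pretraining, since neither the encoder nor the LM is updated.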
Title section of the paper: "Cross-Modal Taxonomic Generalization in (Vision) Language Models" by Tianyang Xu, Marcelo Sandoval-Castañeda, Karen Livescu, Greg Shakhnarovich, Kanishka Misra.
What is the interplay between representations learned from (language) surface forms alone, and those learned from more grounded evidence (e.g., vision)?
Excited to share new work understanding "cross-modal taxonomic generalization" in (V)LMs!
arxiv.org/abs/2603.07474
1/
I want to unwatch this
@tylerachang.bsky.social and I will be presenting the Goldfish as an oral at #LREC2026 in Mallorca! 🌴
Short post on what I call the "no-magic approach to understanding intelligent systems": the philosophy I think of as motivating our work on understanding intelligence without resorting to magical thinking about AI or humans!
infinitefaculty.substack.com/p/the-no-mag...
🚨New Paper!🚨 How do reasoning LLMs handle inferences that have no deterministic answer? We find that they diverge from humans in some significant ways, and fail to reflect human uncertainty… 🧵(1/10)
Check out our special theme: new missions for NLP research!
What's a paper that made you think that way?
I wrote a short article on AI Model Evaluation for the Open Encyclopedia of Cognitive Science!
Hope this is helpful for anyone who wants a super broad, beginner-friendly intro to the topic!
Thanks @mcxfrank.bsky.social and @asifamajid.bsky.social for this amazing initiative!
Congratulations Andreas!!
Some days you finish 5 meta-reviews in ~one go, and some days you take 1.5 days to complete one meta-review. Such is the AC life!
Woohoo, will be in touch soon!
Wow!! Good luck with whatever it is you do next β so excited for you!!
Watch slow horses already!!
Japonaise and Jahunger mentioned in the same thread! My fav places in Boston!
South by Semantics Workshop: "New horizons in evaluating pragmatic competence in language models", Jennifer Hu (Johns Hopkins University), March 6, 2026.
I'm looking forward to @jennhu.bsky.social's South by Semantics talk next week at UT Austin! She'll discuss "micro-pragmatics" inferences and world modeling in language models 🤖
Our department is hiring an Assistant Teaching Professor!! This is a joint-appointed position with Computational Social Sciences (css.ucsd.edu). It's 75+ degrees F and sunny today, just thought I'd mention apol-recruit.ucsd.edu/JPF04461