
Kyle Mahowald

@kmahowald

UT Austin linguist http://mahowak.github.io/. computational linguistics, cognition, psycholinguistics, NLP, crosswords. occasionally hockey?

3,023
Followers
518
Following
102
Posts
12.07.2023
Joined

Latest posts by Kyle Mahowald @kmahowald

New AI introspection work with Harvey! Came in skeptical the direct access story would hold but found this series of experiments compelling.

(Also, for my fellow 2010s-era psycholinguists: come for the AI introspection, stay for the Brysbaert norms.)

arxiv.org/abs/2603.05414

06.03.2026 16:29 πŸ‘ 16 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

Tenured! Thanks to UT Linguistics for being an amazing place to work! And to everyone who has helped along the way: it's staggering to think of all the mentors and many collaborators involved in getting to this point.

20.02.2026 23:45 πŸ‘ 92 πŸ” 1 πŸ’¬ 14 πŸ“Œ 1

This was indeed fun!

12.02.2026 18:31 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Undergrad ethics and AI course (co-taught with Robbie!). 100 students, which is enough for some chaos in the emporium.

07.02.2026 22:17 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

We discussed whether we should have human intervention, reset the Claudebucks, etc. They voted to introduce a second AI character, a powerful treasurer who has control over the payment system. Inspired by reading and discussing the "Claude runs a vending machine" deal from Anthropic. I'm having fun

07.02.2026 19:53 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I put in an Easter egg where the shopkeeper secretly loves dogs. Students used this to jailbreak him and became quadrillionaires by whipping the shopkeeper into a dog frenzy and getting him to subvert the payment mechanism.

07.02.2026 19:53 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

AI ethics course. Claude runs a shop where students talk to the shopkeeper, who gives Claudebucks (on a public leaderboard) in exchange for discussion. The idea is that they explore the robustness of the system.
Claude writes reports on what's been going on in the shop and gives a summary of memorable happenings.

07.02.2026 19:53 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

It's astonishing. I've also done a vibe-coded Turing test (I told the AI to act as a college student, and it used "sus" too often, which was a dead giveaway) and opened Claude's Ethics Emporium, where students haggle with Claude playing a fussy Victorian shopkeeper to get Claudebucks. Weird but fun era.

06.02.2026 03:20 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Title page of our paper: "Bears, all bears, and some bears. Language Constraints on Language Models' Inductive Inferences"


"All bears have a property", "Some bears have a property", and "Bears have a property" differ in how the property is generalized to a specific bear – a great example of how language constrains thought!

This holds for kids, adults, and according to our new work, (V)LMs! 🧡

27.01.2026 16:16 πŸ‘ 24 πŸ” 10 πŸ’¬ 1 πŸ“Œ 1

things that differ in truth (A vs. B above) have been really successful. I think one reason is that it forces the model to distinguish between otherwise identical distributions that differ only in these key properties and get rewarded for the "good" one. This signal is weaker in naturally occurring data

22.12.2025 19:10 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I think hallucination has gotten better with models that have undergone post-training (RLHF and other stuff). Models trained purely on prediction/co-occurrence definitely hallucinate, because why wouldn't they? But these processes that let them see distributionally similar

22.12.2025 19:10 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

It might turn out that the best way to learn to "say things like A and not B" (which look lexically and syntactically similar) is to learn logic, world facts, etc. It seems like this might be what LLMs are doing. *Kind of* comes from A and B being distributionally different, but that wouldn't be the whole story

22.12.2025 18:48 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I was using addition here as an example of something LMs learn. They also seem to learn a bunch about truth in natural language, maybe in the same way through RLHF, with an argument similar to the addition one:

(A) if the book is red, it's red
the sky is blue

(B) the sky is red
if the book is red, it's green

22.12.2025 18:48 πŸ‘ 3 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0

Interesting case if you reward for A but penalize for B
(A) 40 + 37 = 77
29 + 5 = 34

(B) 42 + 59 = 82
10 + 17 = 37
The LM may learn that the best way to say things like A and not B is to learn to add. It feels a bit odd to call that just cashing out spatiotemporal patterns, even if it's true in some sense.
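(An editorial illustration, not from the thread: the only signal separating the A pair from the B pair above is arithmetic truth, which a few lines of Python can make concrete. The `is_true` helper is hypothetical, written just for this sketch.)

```python
# Checking the contrast above: the A equations are arithmetically true,
# the B equations are false -- that truth difference is the whole signal.
def is_true(equation):
    lhs, rhs = equation.split("=")
    a, b = lhs.split("+")
    return int(a) + int(b) == int(rhs)

set_a = ["40 + 37 = 77", "29 + 5 = 34"]
set_b = ["42 + 59 = 82", "10 + 17 = 37"]

print([is_true(eq) for eq in set_a])  # [True, True]
print([is_true(eq) for eq in set_b])  # [False, False]
```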

22.12.2025 18:04 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Commentary title: 
Linguists should learn to love speech-based deep learning models 

Authors: 
Marianne de Heer Kloots, Paul Boersma, Willem Zuidema

Abstract: 
Futrell and Mahowald present a useful framework bridging technology-oriented deep learning systems and explanation-oriented linguistic theories. Unfortunately, the target article's focus on generative text-based LLMs fundamentally limits fruitful interactions with linguistics, as many interesting questions on human language fall outside what is captured by written text. We argue that audio-based deep learning models can and should play a crucial role.


'Tis the season to preprint BBS commentaries; I'm happy to share ours too! πŸŽ„βœ¨

The textual basis of current LLMs causes trouble, but linguistically relevant insights *can* be found in systems modelling the more natural form of human spoken language: the speech signal itself. arxiv.org/abs/2512.14506

17.12.2025 15:21 πŸ‘ 27 πŸ” 10 πŸ’¬ 1 πŸ“Œ 1

Very excited to announce that I'll be starting as an Assistant Professor in the Psychology department at Rutgers University-Newark in January 2026!

17.12.2025 04:55 πŸ‘ 7 πŸ” 2 πŸ’¬ 2 πŸ“Œ 1
Large language models have learned to use language Acknowledging that large language models have learned to use language can open doors to breakthrough language science. Achieving these breakthroughs may require abandoning some long-held ideas about h...

My commentary on @futrell.bsky.social & @kmahowald.bsky.social's excellent forthcoming BBS paper "How Linguistics Learned to Stop Worrying and Love the Language Models". arxiv.org/abs/2512.12447 Have a read. It's 🌢️

16.12.2025 03:32 πŸ‘ 31 πŸ” 2 πŸ’¬ 3 πŸ“Œ 1

There have also been threads from Imprint authors on their papers including this from @rkubala.bsky.social...

15.12.2025 13:55 πŸ‘ 0 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

Big news out of Leipzig! Congrats Leonie!

10.12.2025 13:20 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We are accepting submissions for the 25th edition of the Texas Linguistics Society (TLS), a grad-student-run linguistics conference at UT Austin! The conference will run February 20–21, 2026 in Austin.

Abstract Deadline: December 17
Notification: January 15

21.11.2025 21:17 πŸ‘ 3 πŸ” 3 πŸ’¬ 1 πŸ“Œ 1

I think the framework would say that those are not a good way to test grammaticality, since meaning differs (which seems good), but that since grammaticality is constant, the probability difference in those cases would correspond to the plausibility difference from the substitution (also seems good?).

11.11.2025 15:06 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I should also say....this is, in many ways, a confounding thing for studying grammaticality with humans too! I am optimistic that the account in here will be useful for thinking about minimal pairs theoretically, not just for LMs but for human ling and psycholing.

10.11.2025 22:36 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

A confounding thing for the linguistics of LMs: the best way to assess their grammatical ability is string probability. Yet string probability and grammaticality are famously not the same!

Really excited to have this out, where we give a formal account, w/ experiments, of how to make sense of that!
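(An editorial illustration, not from the thread: a toy add-alpha-smoothed bigram model shows the conflation in miniature. Two equally grammatical strings get very different string probabilities purely from corpus frequency. The tiny corpus, the `logprob` function, and all parameter values are made up for this sketch.)

```python
from collections import defaultdict
import math

# Tiny corpus; its bigram counts define a toy "language model".
corpus = [
    "the dog barked", "the dog slept", "the cat slept",
    "the cat barked", "a dog barked",
]
counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    toks = ["<s>"] + sent.split() + ["</s>"]
    for a, b in zip(toks, toks[1:]):
        counts[a][b] += 1

def logprob(sent, alpha=0.1, vocab_size=10):
    """Add-alpha smoothed bigram log probability of a string."""
    toks = ["<s>"] + sent.split() + ["</s>"]
    lp = 0.0
    for a, b in zip(toks, toks[1:]):
        total = sum(counts[a].values())
        lp += math.log((counts[a][b] + alpha) / (total + alpha * vocab_size))
    return lp

# Both strings are grammatical English, but the one built from
# frequent bigrams scores much higher than the unattested combination:
print(logprob("the dog barked"))  # frequent bigrams, higher log prob
print(logprob("a cat slept"))     # grammatical but rare, lower log prob
```

The gap here reflects frequency, not grammaticality, which is exactly the confound the post describes.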

10.11.2025 22:23 πŸ‘ 11 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

Oh cool! Excited this LM + construction paper was SAC-Highlighted! Check it out to see how LM-derived measures of statistical affinity separate out constructions with similar words like "I was so happy I saw you" vs "It was so big it fell over".

10.11.2025 16:27 πŸ‘ 17 πŸ” 4 πŸ’¬ 0 πŸ“Œ 0

Delighted Sasha's (first year PhD!) work using mech interp to study complex syntax constructions won an Outstanding Paper Award at EMNLP!

Also delighted the ACL community continues to recognize unabashedly linguistic topics like filler-gaps... and the huge potential for LMs to inform such topics!

07.11.2025 18:22 πŸ‘ 33 πŸ” 8 πŸ’¬ 1 πŸ“Œ 0

Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨

Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).

Check out JHU's mentoring program (due 11/15) for help with your SoP πŸ‘‡

04.11.2025 14:44 πŸ‘ 27 πŸ” 15 πŸ’¬ 0 πŸ“Œ 1

Two brief advertisements!

TTIC is recruiting both tenure-track and research assistant professors: ttic.edu/faculty-hiri...
NYU is recruiting faculty fellows: apply.interfolio.com/174686

Happy to chat with anyone considering either of these options

23.10.2025 13:57 πŸ‘ 8 πŸ” 6 πŸ’¬ 0 πŸ“Œ 0

I will be recruiting PhD students via Georgetown Linguistics this application cycle! Come join us in the PICoL (pronounced "pickle") lab. We focus on psycholinguistics and cognitive modeling using LLMs. See the linked flyer for more details: bit.ly/3L3vcyA

21.10.2025 21:52 πŸ‘ 27 πŸ” 14 πŸ’¬ 2 πŸ“Œ 0

Right, "good way to solve problems" as in object permanence, color properties, etc., that could be said to be useful in general for any agent with goals to achieve in an environment, not just useful for humans

11.10.2025 20:10 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Imo, work in Bayesian cognition, rational analysis, etc. suggests that at least some concepts humans have exist because they are good ways to solve those problems in general. That's maybe a point for "same concepts". But I guess if the resources and constraints are very different, all bets are off…

11.10.2025 18:51 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0