Check out our special theme: new missions for NLP research!
Title card of our paper: "Which course? Discourse! Teaching Discourse and Generation in the Era of LLMs" by Junyi Jessy Li, Yang Janet Liu, Valentina Pyatkin, and William Sheffield.
Nearly 2 years ago, @jessyjli.bsky.social, @janetlauyeung.bsky.social, @valentinapy.bsky.social, and I decided that it's time to bring discourse structure to the center of NLP teaching.
Check out @asher-zheng.bsky.social's work on quantifying strategic language in dialogue, just appeared in the Dialogue and Discourse journal.
We study non-cooperative moves that are subtle to capture and that modern AI still has trouble comprehending.
Work w/ David_Beaver
Title page of our paper: "Bears, all bears, and some bears. Language Constraints on Language Models' Inductive Inferences"
“All bears have a property”, “Some bears have a property”, “Bears have a property” are different in terms of how the property is generalized to a specific bear -- a great example of how language constrains thought!
This holds for kids, adults, and, according to our new work, (V)LMs! 🧵
🚨 Be careful with LLMs when you ask health-related questions -- even when the model relies on "evidence"! Kaijie's paper reveals a key weakness and the tricky balance between safety and faithfulness.
Accepted at EACL - excited about Morocco!
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 1: (a) Illustration of messages (left) and strings (right) in toy domain. Blue = grammatical strings. Red = ungrammatical strings. (b) Surprisal (negative log probability) assigned to toy strings by GPT-2."
New work to appear @ TACL!
Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.
Yet they often assign higher probability to ungrammatical strings than to grammatical strings.
How can both things be true? 🧵
Incredibly honored to serve as #EMNLP 2026 Program Chair along with @sunipadev.bsky.social and Hung-yi Lee, and General Chair @andre-t-martins.bsky.social. Looking forward to Budapest!!
(With thanks to Lisa Chuyuan Li who took this photo in Suzhou!)
Delighted Sasha's (first year PhD!) work using mech interp to study complex syntax constructions won an Outstanding Paper Award at EMNLP!
Also delighted the ACL community continues to recognize unabashedly linguistic topics like filler-gaps... and the huge potential for LMs to inform such topics!
Think your LLMs “understand” words like although/but/therefore? Think again!
They perform at chance when making inferences from certain discourse connectives that express concession.
Test your models and see if they just memorize or truly understand!
PLSemanticsBench - where formal meets informal!
arxiv.org/abs/2510.03415
Team: Aditya Thimmaiah, Jiyang Zhang, Jayanth Srinivasa, Milos Gligoric
So what's really happening?
LLMs aren't interpreting rules -- they're recalling patterns.
Their "understanding" is promising... but shallow.
💡 It's time to test semantics, not just syntax. 💡
To move from surface-level memorization → true symbolic reasoning.
Change the rules -- swap operators (+ with -) or replace them with novel symbols -- and accuracy collapses.
Models that were "near-perfect" drop to single digits.
🚨 Does your LLM really understand code -- or is it just really good at remembering it?
We built **PLSemanticsBench** to find out.
The results: a wild mix.
The Brilliant:
Top reasoning models can execute complex, fuzzer-generated programs -- even with 5+ levels of nested loops! 🤯
The Brittle: 🧵
Find my students and collaborators at COLM this week!
Tuesday morning: @juand-r.bsky.social and @ramyanamuduri.bsky.social 's papers (find them if you missed it!)
Wednesday pm: @manyawadhwa.bsky.social 's EvalAgent
Thursday am: @anirudhkhatry.bsky.social 's CRUST-Bench oral spotlight + poster
We're hiring faculty as well! Happy to talk about it at COLM!
Can we quantify what makes some text read like AI "slop"? We tried.
I'm at #COLM2025 from Wed with:
@siyuansong.bsky.social Tue am introspection arxiv.org/abs/2503.07513
@qyao.bsky.social Wed am controlled rearing: arxiv.org/abs/2503.20850
@sashaboguraev.bsky.social INTERPLAY ling interp: arxiv.org/abs/2505.16002
I'll talk at INTERPLAY too. Come say hi!
On my way to #COLM2025
Check out jessyli.com/colm2025
QUDsim: Discourse templates in LLM stories arxiv.org/abs/2504.09373
EvalAgent: retrieval-based eval targeting implicit criteria arxiv.org/abs/2504.15219
RoboInstruct: code generation for robotics with simulators arxiv.org/abs/2405.20179
Traveling to my first @colmweb.org
Not presenting anything but here are two posters you should visit:
1. @qyao.bsky.social on Controlled rearing for direct and indirect evidence for datives (w/ me, @weissweiler.bsky.social and @kmahowald.bsky.social), W morning
Paper: arxiv.org/abs/2503.20850
Here is a genuine one :) CosmicAI's AstroVisBench, to appear at #NeurIPS
bsky.app/profile/nsfs...
All of us (@kanishka.bsky.social @kmahowald.bsky.social and me) are looking for PhD students this cycle! If computational linguistics/NLP is your passion, join us at UT Austin!
For my areas see jessyli.com
Can AI aid scientists amidst their own workflows, when they do not know step-by-step workflows and may not know, in advance, the kinds of scientific utility a visualization would bring?
Check out @sebajoe.bsky.social's feature on ✨AstroVisBench:
📣 NEW HCTS course developed in collaboration with @tephi-tx.bsky.social: AI in Health Communication 📣
Explore responsible applications and best practices for maximizing impact and building trust with @utaustin.bsky.social experts @jessyjli.bsky.social & @mackert.bsky.social.
💻: rebrand.ly/HCTS_AI
Would be great to chat at COLM!
Long-range narrative understanding, even basic fact-checking that humans easily get near-perfect on, has barely improved in LMs over the years: novelchallenge.github.io
The top shows the title and authors of the paper: "Whither symbols in the era of advanced neural networks?" by Tom Griffiths, Brenden Lake, Tom McCoy, Ellie Pavlick, and Taylor Webb. At the bottom is text saying "Modern neural networks display capacities traditionally believed to require symbolic systems. This motivates a re-assessment of the role of symbols in cognitive theories." In the middle is a graphic illustrating this text by showing three capacities: compositionality, productivity, and inductive biases. For each one, there is an illustration of a neural network displaying it. For compositionality, the illustration is DALL-E 3 creating an image of a teddy bear skateboarding in Times Square. For productivity, the illustration is novel words produced by GPT-2: "IKEA-ness", "nonneotropical", "Brazilianisms", "quackdom", "Smurfverse". For inductive biases, the illustration is a graph showing that a meta-learned neural network can learn formal languages from a small number of examples.
🤖🧠 NEW PAPER ON COGSCI & AI 🧠🤖
Recent neural networks capture properties long thought to require symbols: compositionality, productivity, rapid learning
So what role should symbols play in theories of the mind? For our answer...read on!
Paper: arxiv.org/abs/2508.05776
1/n
Yes, at the very least this needs other data (like Echos in AI) and a quality measure (LitBench). In QUDsim, we also made sure the stories came from pre-LLM posts to exclude AI-written stories. Further, the way they measure style + semantic diversity doesn't align with how they define it (it only captures lexical diversity).
I agree this thread's headline claim seems premature. Let me add our recent ACL Findings paper, with Dexter Ju and @hagenblix.bsky.social, which found syntactic simplification in at least some LMs, in a novel domain regeneration setting: aclanthology.org/2025.finding...
Nice, reading level, syntactic complexity, and sentence structures are great angles to study this!!