(4/n)
More info here!
Read our paper: arxiv.org/abs/2509.21155
Paper site: cshaib.github.io/syntax_domai...
Thank you to all my wonderful co-authors; happy to continue chatting about any of this!
(3/n) Perhaps more strikingly, unintended syntactic-domain correlations can be exploited to bypass model refusals (e.g., OLMo-2-Instruct 7B here)
(2/n) This has important implications for model generalization and safety! We show that this occurs in instruction-tuned models, and propose an evaluation to test for this type of brittleness.
(1/n) Models learn to rely on *syntactic templates* (frequent patterns of POS tags) that co-occur with particular domains.
LLMs can inadvertently learn "If I see this syntactic pattern, it's domain X" rather than "If I see this semantic content, do task Y."
Syntax that spuriously correlates with safe domains can jailbreak LLMs, e.g., below with GPT-4o mini
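To make "syntactic template" concrete, here is a minimal sketch (not the paper's exact pipeline) that extracts frequent POS-tag n-grams with spaCy. The template length (4) and the toy corpus are illustrative assumptions.

```python
from collections import Counter

import spacy

# Minimal sketch of "syntactic templates": frequent POS-tag n-grams.
# The template length (n=4) and the toy corpus are assumptions for
# illustration, not the paper's exact setup.
nlp = spacy.load("en_core_web_sm")

def pos_templates(texts, n=4):
    """Count POS-tag n-grams across a collection of documents."""
    counts = Counter()
    for doc in nlp.pipe(texts):
        tags = [tok.pos_ for tok in doc]
        counts.update(tuple(tags[i : i + n]) for i in range(len(tags) - n + 1))
    return counts

corpus = [
    "The committee approved the proposal after a long debate.",
    "The board rejected the merger after a brief review.",
]
for template, freq in pos_templates(corpus).most_common(3):
    print(" ".join(template), freq)
```

Templates that show up disproportionately in one domain's training data are exactly the kind of surface cue a model can latch onto instead of the semantics.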
Our paper (co w/ Vinith Suriyakumar) on syntax-domain spurious correlations will appear at #NeurIPS2025 as a ✨spotlight!
+ @marzyehghassemi.bsky.social, @byron.bsky.social, Levent Sagun
(7/7) For more details, please check out our pre-print!
(6/7) LLMs are terrible at detecting their own slop: GPT-5, Deepseek-V3, and o3-mini rarely assign a label of "slop" (avg. 6% of documents), whereas humans marked 34% of texts as "slop."
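For readers curious how such an LLM-as-judge check is wired up, a minimal sketch using the OpenAI Python client is below; the prompt wording, one-word label scheme, and model name are placeholders, not the paper's protocol.

```python
from openai import OpenAI

# Minimal sketch of an LLM-as-judge slop check. The prompt wording and
# model choice are placeholders, not the paper's actual protocol.
client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def judge_slop(text: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": ("Label the following text as 'slop' or 'not slop'. "
                        "Answer with one word.\n\n" + text),
        }],
    )
    return resp.choices[0].message.content.strip().lower()

print(judge_slop("In today's fast-paced world, it is important to note that..."))
```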
(5/7) We lack good/reliable automatic text metrics for 3 of the 5 most important slop features: relevance, coherence, and tone. :-(
(4/7) Different domains have different slop signatures. In news articles, coherence, density, relevance, and tone issues predict slop. In Q&A tasks, it's factuality and structure. Context matters!
(3/7) Humans can spot "sloppy" text, but their thresholds for overall assessments differ. Still, our annotators consistently flagged the same problematic passages, suggesting we know it when we see it...
(2/7) TL;DR: Measuring the construct of slop is difficult! While somewhat subjective and domain-dependent, it boils down to three key factors: information quality, density, and stylistic choices. We introduce a taxonomy for slop.
"AI slop" seems to be everywhere, but what exactly makes text feel like "slop"?
In our new work (w/ @tuhinchakr.bsky.social, Diego Garcia-Olano, @byron.bsky.social ) we provide a systematic attempt at measuring AI "slop" in text!
arxiv.org/abs/2509.19163
🧵 (1/7)
I'm searching for some comp/ling experts to provide a precise definition of "slop" as it refers to text (see: corp.oup.com/word-of-the-...)
I put together a Google form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input! 🙏
📢 Can we trace a small distilled model back to its teacher? 🤔 New work (w/ @chantalsh.bsky.social, @silvioamir.bsky.social & @byron.bsky.social) finds some footprints left by LLMs in distillation! [1/6]
📄 Full paper: arxiv.org/abs/2502.06659
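The paper linked above describes the actual attribution method; purely as an illustration of the general idea, here is a generic stylometric baseline: score candidate teachers by how closely the n-gram distribution of their outputs matches the student's generations. The texts, the bigram choice, and the KL scoring are all assumptions for demonstration.

```python
import math
from collections import Counter

# Illustrative sketch only: rank candidate teachers by how closely their
# output n-gram distribution matches a distilled student's outputs.
# This is a generic stylometric baseline, not the method from the paper.

def ngram_dist(texts, n=2):
    """Normalized token n-gram distribution over a list of texts."""
    counts = Counter()
    for t in texts:
        toks = t.lower().split()
        counts.update(tuple(toks[i : i + n]) for i in range(len(toks) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), smoothing n-grams that are missing from q."""
    return sum(pv * math.log(pv / q.get(g, eps)) for g, pv in p.items())

student = ngram_dist(["the quick answer is that gradients flow backward"])
teachers = {
    "teacher_a": ngram_dist(["the quick answer is that gradients flow backward here"]),
    "teacher_b": ngram_dist(["completely different phrasing with other habits"]),
}
best = min(teachers, key=lambda k: kl_divergence(student, teachers[k]))
print("closest teacher:", best)
```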