Also, to your question: I have no idea how an LLM does it, given their complexity. I can only argue, based on interacting with them on niche non-linguistic tasks (specifically advanced math), that the most recent models (from 2026) do seem to be exceptionally good at mathematical reasoning.
14.03.2026 16:57
I don't think there is a separation, but math is not evidently a language the way English is a language. Otherwise linguistics and mathematical reasoning would be mutually transferable skills, which I really don't think they are (I am reading Syntactic Theory, of which Bender is a coauthor; very different)
14.03.2026 16:48
Ugh, I did say "nothing to do with language", which was very stupid of me. Thanks for spotting that. It would be nice to get through a day without saying or doing 5 idiotic things, but that is clearly too much to ask (I have severe ADHD, in my defense)
14.03.2026 16:30
E.g. I am an applied mathematician, and math is sorta a language, but I have zero expertise *in* language or linguistics.
14.03.2026 16:26
I said that this is far beyond language, not that it has nothing to do with language. Our brain actually explicitly separates linguistic reasoning from analytical and mathematical reasoning. See work by Hope Kean from Fedorenko's lab at MIT (way down on that thread): www.biorxiv.org/content/10.1...
14.03.2026 16:24
Whoops, sorry, that is Kean's other PhD paper. This is the right reference: www.biorxiv.org/content/10.1...
14.03.2026 16:19
In other words, I think you are simply out of your area of expertise, because LLMs/VLMs are not simply about language or computational linguistics. It's a bit like me deciding I can "speak the language of math and code, so I can reason about language", which I actually cannot; it's outside my expertise.
14.03.2026 16:02
I work in a niche area (the intersection of applied math, climate dynamics, and planetary atmospheres), and it is so data-poor that a simple pattern matcher should simply fail, and yet the frontier models are astonishing at back-and-forth "brainstorming". How could you evaluate these given your expertise?
14.03.2026 15:58
Consequently, calling them "synthetic text extruders", as you often do, is, I believe, not even the right framing. I am an applied mathematician and can reason with them about novel theoretical mathematical ideas that have *nothing* to do with language (only true of models since Opus 4.5 and GPT 5.x)
14.03.2026 15:56
LLMs/VLMs, however, are complex dynamical systems that emulate reasoning systems (code, math, language); they have more in common with climate AI emulators and go far beyond just language. Fields Medal winner Terence Tao has a full breakdown of using them for novel math proofs - this is far beyond language.
14.03.2026 15:53
I've read Stochastic Parrots (brilliant work, but dated for the current crop of models, which are not even just LLMs), and I am sure you are an expert in your field. I don't think you're "responding to perceived threats to your work". I also just started reading Syntactic Theory, and it is very good!
14.03.2026 15:51
Telling a graduate student to pursue an idea is *also* cognitive offloading, actually.
14.03.2026 15:33
I am an academic researcher in STEM (an applied-math PhD; I also code a lot), and I can check every single line of code or equations generated by these models. Six months ago they were simply not good enough for me. Over the past two months everything seems to have changed - it's a wild shift.
12.03.2026 23:47
Yup, Sonnet is evidently worse at arithmetic.
12.03.2026 22:12
This was actually true up until the model before last (e.g. Sonnet 4), but it is less and less true. The extended-thinking models almost never get calculations wrong, because they have access to tools: they simply write a Python script and do the arithmetic for you.
12.03.2026 21:11
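To illustrate, here is a minimal sketch of the kind of throwaway script such a model writes for itself when handed arithmetic; the specific numbers and the working precision are made up for the example.

# Hypothetical arithmetic task: instead of "guessing" digits token by
# token, a tool-using model emits and runs a small script like this.
from decimal import Decimal, getcontext

getcontext().prec = 50  # arbitrary working precision for the example

a = Decimal("3.14159265358979")
b = Decimal("2.71828182845905")

print(a * b)       # exact to the working precision
print(a ** 2 / b)  # any follow-up calculation is just more code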
You mean the decimal-place accuracy? Good question. These models are ultimately stochastic (but not, I think, parrots lol), so there will be variance across repeated experiments.
12.03.2026 21:09
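A toy illustration of where that variance comes from, assuming standard temperature sampling; the three-token vocabulary and the logits below are invented for the sketch.

# Toy next-token sampler: real models do this over ~100k candidate
# tokens per step, but the mechanism (softmax over logits, then a
# random draw) is the same.
import numpy as np

rng = np.random.default_rng()
vocab = ["7.38", "7.39", "7.40"]    # hypothetical candidate answers
logits = np.array([2.0, 1.5, 0.5])  # hypothetical model scores

def sample(temperature=1.0):
    p = np.exp(logits / temperature)
    p /= p.sum()                    # softmax -> probabilities
    return rng.choice(vocab, p=p)

# Repeated "experiments" disagree in the last decimal place.
print([sample() for _ in range(5)])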
Maybe Claude is throttling the free version...? Or it is a smaller model...?
12.03.2026 21:04
I turned off extended thinking and switched to Sonnet 4.6.
12.03.2026 20:59
Opus 4.6 trivially does this (as I expect it to). I just attached your image to it, no prompt.
12.03.2026 20:57
Thanks, was this Sonnet 4.6 extended or the base model? I'm just curious whether I can replicate this (I will send you the Claude link for the test). I haven't used the base models in ages.
12.03.2026 20:13
...that LLMs are statistically wrong about, or making up, every third word is simply not a supportable statistic for any of the modern LLMs (which are actually an elaborate code harness on top of an LLM). Everyone here seems to be attacking ChatGPT 3.5/4, which was released... 2 years ago?
12.03.2026 20:01
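For what I mean by "code harness", here is a minimal sketch of the loop; fake_llm is a canned stand-in for the underlying model, and real harnesses (sandboxes, retrieval, verifiers) are far more elaborate.

# Minimal "code harness": an outer loop that executes the model's
# tool requests and feeds the results back into the transcript.
def fake_llm(transcript: str) -> dict:
    # Stand-in for the model: first request a tool, then answer.
    if "[tool:python]" not in transcript:
        return {"tool": "python", "arg": "355 / 113"}
    return {"answer": transcript.rsplit("-> ", 1)[-1]}

def run_python(code: str) -> str:
    return str(eval(code))  # stand-in for a sandboxed interpreter

def harness(prompt: str, max_steps: int = 10) -> str:
    transcript = prompt
    for _ in range(max_steps):
        step = fake_llm(transcript)
        if "answer" in step:
            return step["answer"]         # model is done; exit the loop
        result = run_python(step["arg"])  # execute the requested tool call
        transcript += f"\n[tool:{step['tool']}] -> {result}"
    return "step budget exhausted"

print(harness("What is 355/113?"))  # -> 3.1415929203539825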
This claim was true 2 years ago. It is demonstrably not true for any of the frontier models, across a very large number of examples from multiple fields of research. I really don't understand how people can make claims so contra to reality, not to mention so *easily tested*.
12.03.2026 19:57
I have no idea who this person is and have never interacted with them and I am also blocked. Incredible.
12.03.2026 19:52
A few of my collaborators are associate editors at top journals in my field. They report a large number of slop submissions, but the number of quality submissions seems largely the same. Most slop gets removed at the editor level and never reaches reviewers. I review ~20 papers a year and see no difference in quality.
11.03.2026 01:46
It depends. Top journals in each field have greater readership and correspondingly greater scrutiny, so the odds of errors being caught are higher. As you go lower in journal impact, the probability of slop increases. For my work, I release all data and all analysis code at the time of publication.
11.03.2026 00:59
Trust is contra to the scientific method, and I don't trust *myself*, grad students, postdocs, or collaborators on their claims (I expect them NOT to trust me). This rule applies triply to AI models. You obviously think I'm a gullible idiot, and everything I say convinces you more! Which might be the case!
10.03.2026 23:01
My guess is that the people tricking themselves are soon going to be basically most of the scientific community; my wild estimate (extrapolating from personal anecdotes) is that > 50% of all scientific researchers in the US currently use them extensively. (I work at UCLA.)
10.03.2026 14:50
But useful is a personal matter, isn't it? I am a researcher in STEM (not CS or AI), and by now I don't actually know anybody who doesn't use these tools extensively for their work. I'm just reporting what people tell me, based on a big conference I was at recently.
10.03.2026 14:29
Fair point. I think it's easier to think of these models as giant repositories of easily extracted knowledge and concepts, drawn from almost every single idea we have ever written about in any form. "Intelligence" is a loaded word, but being incredibly useful is what matters in the end, I think/guess.
10.03.2026 14:22