John F Wu (@jwuphysics)

On benchmarks, yes, but I don't think it passes the vibe check. Absolutely phenomenal for 4B and easily supersedes older 70B class models.

However, on instruction following, it still struggles compared to 4o (even 4o-mini). And ofc on factual QA it also doesn't do nearly as well

07.03.2026 02:04 👍 4 🔁 0 💬 0 📌 1

I'm real bullish on AI capabilities. It's humans I'm skeptical of: I'm really worried about careless use of AI which is basically all I've seen so far

06.03.2026 01:25 👍 4 🔁 1 💬 0 📌 0

Let me guess, the same 41% who voted against Hillary in 2016 because "she's a war hawk"?

04.03.2026 21:27 👍 4 🔁 0 💬 0 📌 0

Knuth is the epitome of "idgaf what you think." Doesn't use email, but apparently uses Claude. Randomly wrote a book on the bible.

04.03.2026 13:01 👍 1 🔁 0 💬 0 📌 0

uh oh Donald Knuth is a Claude booster now

04.03.2026 12:05 👍 3 🔁 0 💬 0 📌 0

I kind of like Zitron because he had the intellectual humility to admit that he doesn't know a thing about software engineering, but he's also now staked his career on writing about how AI can't do anything (including software engineering).

28.02.2026 19:26 👍 4 🔁 0 💬 0 📌 0

Probably important to mention: you actually need to remember it yourself, too.

With LLM-assisted coding, I imagine that some people don't commit their "learnings" to memory, probably because AI took on an outsized role. This means they'll struggle to build on skills that were never internalized.

27.02.2026 03:52 👍 1 🔁 0 💬 0 📌 0

Optical Spectroscopy of Dwarf Galaxies at $z\sim 0.15$ in the COSMOS Field: Star Formation and Dust Properties We present a spectroscopic study of low-mass galaxies (LMGs;$10^8\leq\rm M_*/M_\odot\leq10^9$) at $z\sim0.15$ in COSMOS field, and compare it to a control sample of intermediate-mass galaxies (IMGs;$1...

Hey #Astro skeeters, check this out!

arxiv.org/abs/2602.22035

26.02.2026 19:00 👍 7 🔁 3 💬 1 📌 0

Why do we do astrophysics? At time of writing, large language models (LLMs) are beginning to obtain the ability to design, execute, write up, and referee scientific projects on the data-science side of astrophysics. What implic...

I work in astronomy and very little of that is verifiable/provable... so I'm a bit more sensitive to the failure modes of AI for science.

And that doesn't even get into the motivations of *why* I might (not) use AI to directly do my science, see e.g., arxiv.org/abs/2602.101...

26.02.2026 01:28 👍 0 🔁 0 💬 0 📌 0

Makes sense. Like I don't know where the ceiling on math or coding is. The hardest part is writing down a well-specified problem (and it seems this is the biggest roadblock for making AI assisted proofs/formal verification useful, much less designing adequate benchmarks).

26.02.2026 01:27 👍 0 🔁 0 💬 1 📌 0

Strongly agree, except that maybe we'll hit a slowdown on verifiable rewards (coding, math) and meander into the domain where the 9s of reliability require some other paradigm beyond RLVR / GRPO.

Anyway my point is that obviously these authors didn't feel the AGI enough to simply extrapolate...

26.02.2026 00:41 👍 1 🔁 0 💬 1 📌 0

For a while people kept saying "are you *feeling* the AGI?" which was kind of inane. But maybe the question is: do we understand the implications of the rate of change, and do we have accurate intuition for why progress might continue or stop?

26.02.2026 00:23 👍 2 🔁 0 💬 1 📌 0

ChatGPT screenshot: (Query) how many new clusters were detected by hunt and reffert (Response) In their Gaia DR3 all-sky clustering study, Emily L. Hunt and Sabine Reffert detected a substantial number of previously unreported star clusters. Specifically: They recovered 7 167 clusters overall from Gaia DR3 data. Out of those, 2 387 were candidate new objects that had no counterpart in existing literature prior to their work. Applying more stringent quality cuts yields a subset of 4 105 highly reliable clusters, of which 739 are new identifications (i.e., not previously catalogued). Thus, the number of new clusters reported by Hunt & Reffert is either 2 387 candidate new objects in the full catalogue or 739 new highly reliable clusters depending on the reliability threshold used.

Ugh, not good.

FWIW, I got a better answer... got the same answer with Gemini 3.1 Pro.

But lack of reliability is a big reason why I currently don't trust these for literature search/citations.

25.02.2026 13:35 👍 1 🔁 0 💬 1 📌 0

The only way you can judge whether a paper is worth reading, unfortunately, is by reading it!

25.02.2026 03:07 👍 2 🔁 0 💬 0 📌 0

Strong agree. In astrophysics we have a high acceptance rate in our top journals. But that doesn't necessarily mean that the work is *meaningful*, only that it is maybe correct in terms of observational or theoretical rigor.

25.02.2026 03:07 👍 2 🔁 0 💬 1 📌 0

Kind of bonkers that every frontier model knows what "red/green TDD" means given that you basically made it up?

24.02.2026 03:09 👍 0 🔁 0 💬 1 📌 0

An image labeled “SN 2025 p h t in NGC 1637, Hubble W F C 3 2024 + Webb NIRCam 2024”. The majority of the image shows a face-on spiral galaxy speckled with myriad blue and red stars. The yellowish core of the galaxy forms a fuzzy oval tilted to the upper right. About halfway from the core to the edge of the image at about 4 o’clock, a small region is outlined with a white box. A shaded, nearly transparent white triangle extends to a pullout at upper right labeled “before explosion”, with short lines forming a crosshair that points to a red star at the center. Below this are three more square images, all with crosshairs at the same location. 1) Hubble August 2024, with nothing visible in the crosshairs, 2) Webb October 2024, with a red star in the crosshairs, 3) Hubble July 2025, with a blue supernova in the crosshairs.

A star has died! For the first time, astronomers have used #NASAWebb to identify which specific star exploded as a supernova. The star—located in galaxy NGC 1637—was a red supergiant surrounded by so much dust that it was invisible to Hubble: https://news.stsci.edu/4alt51V

23.02.2026 15:04 👍 110 🔁 30 💬 1 📌 3

BGE website that says "Sign in to Your Account" Below that it says "Success! Your email is *undefined*."

Couldn't log in to my gas/electricity provider, and the email address on file didn't work.

Clicked "Forgot email?" and entered my account number; was greeted by this...

19.02.2026 02:15 👍 4 🔁 0 💬 1 📌 0

A Horse Named Never Seldom did I reach the little mountain without him, the easy crests making valleys of indifferent grasses. We think of a horse less as the history of one…

Happy Lunar New Year! 🧧

Courtesy of the Poem of the Day: A Horse Named Never

www.poetryfoundation.org/poetrymagazi...

17.02.2026 18:45 👍 1 🔁 0 💬 0 📌 0

Drippy Orange Bird shirt 2 color Drippy Bird print on black Comfort Color Tee

I imagine you might already have or know, but Wazil's work is amazing!

I have the drippy bird longsleeve wazil.bigcartel.com/product/drip...

16.02.2026 23:35 👍 4 🔁 0 💬 1 📌 0

So I finally finished this book and it's really really good! They say never meet your heroes but getting to know her through her autobiography, her scientific outlook and politics hold up extremely well!

Some highlights for me:
🔭🧪🧵

15.02.2026 17:27 👍 25 🔁 4 💬 1 📌 1

tfw you discover human intelligence is made of meat

15.02.2026 14:33 👍 1 🔁 0 💬 0 📌 0

Thinking *got* it right. Not sure what I was trying to write there.

Also TIL Claude free tier allows for extended thinking!

Nevertheless it had a very pathetic showing by default. Good to know this is what most folks are experiencing in terms of AI!

15.02.2026 04:00 👍 1 🔁 0 💬 0 📌 0

You'll need to **drive** to the car wash! Even though 300 meters is a very short distance (about a 3-4 minute walk), you need your car to actually be at the car wash in order to get it washed. If you walked there, you'd arrive without your car. So drive over, get your car washed, and then you'll drive back home in a nice clean car. The short distance means it won't use much fuel, and you'll still get the benefit of the walk if you want to stroll somewhere else later!

Thinking enabled it right -- default free does not.

15.02.2026 03:53 👍 1 🔁 0 💬 1 📌 0

Walk! At only 300 meters (about 1,000 feet or roughly 3-4 city blocks), it would take you just 3-4 minutes to walk there. Driving would actually be less efficient since you'd spend time getting in the car, starting it, driving slowly for such a short distance, parking, and then walking back home anyway after dropping off the car. Plus, your engine won't even warm up properly in that distance. The walk will take barely any time and you'll get a bit of fresh air in the process.

Oof owie my claude

15.02.2026 03:51 👍 5 🔁 0 💬 0 📌 0

I don't pay for Claude (web version), so I wonder what the free tier would say ... hang tight, I'll screen cap the first response

15.02.2026 03:48 👍 3 🔁 0 💬 2 📌 0

Staff Writer The Atlantic is hiring a Staff Writer to cover extreme weather and natural disasters. This is a position for a journalist of the highest ambition who is eager to report from the ground on the realitie...

Come work with me

atlanticmedia.wd1.myworkdayjobs.com/Careers/job/...

13.02.2026 15:14 👍 4 🔁 3 💬 1 📌 1

Spent four years in that concrete-poured structure to the right! Good times 🥰

13.02.2026 15:38 👍 1 🔁 0 💬 0 📌 0

Excellent points made here, which I mostly agree with.

Even though I don't think LLMs are just "text-interpolators" as Hogg says,* I do feel strongly that they are not scientists, and cannot be made accountable in the way a scientist must.

* This is because of RL training objectives

12.02.2026 17:38 👍 12 🔁 3 💬 0 📌 0

In the CS domains, they no longer allow position papers or review papers until they're accepted to a conference (they are refereed).

There are just too many low quality/LLM-written opinion pieces or reviews in CS written by folks who only have a shallow understanding of the field.

12.02.2026 17:33 👍 5 🔁 0 💬 1 📌 0
