Dear hivemind,
Do you have any favorite references on failures of LLM-as-a-judge (and universally enforceable validity checks? besides some kind of cohen's kappa on a small subset)
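The kappa check mentioned above is easy to run once you have human labels on a small subset. A minimal sketch, assuming binary pass/fail verdicts (the label names and example data here are hypothetical, not from the post):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two label sequences."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if raters labeled independently with their own marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in set(ca) | set(cb))
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters use one identical label
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: human spot-check labels vs. LLM-judge verdicts.
human = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
judge = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
print(round(cohens_kappa(human, judge), 3))  # → 0.467
```

Raw percent agreement here is 75%, but kappa deflates that to about 0.47 once chance agreement under the skewed marginals is accounted for, which is exactly why kappa on a small human-labeled subset is the usual sanity check.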
I find the blog post and underlying post very useful.
"How realistic is it to expect multiverse to be used widely, given that most authors first and foremost want to convince readers they have a clear point?"
"The idea that you didn't need to assume rational agents who were easy to model mathematically but could have various agents w deviations from rationality... that had no impact. Why? Because believing those models required an act of faith"
Hard to imagine not needing to understand & exercise taste
ignoring, eg, its role in building careers/identities of scientists in favor of a more romantic quest for truth vision. AI will make the best scientists better, but at the expense of much more noise and games. There will be (is) drastic change, but how much more real progress is harder to predict.
Perhaps real scientific progress will begin when we stop writing articles with sweeping generalities in the title!
More seriously, many have piled on this post. I think it accurately expresses potential but, like many viral takes, is overly optimistic about what has sustained science thus far… 1/2
My pitch for why we should use it to rigorously ground how we formulate and benchmark performance on concrete tasks
open.substack.com/pub/jessicah...
Pragmatic & actionable interpretability are buzzwords arguing for mech interp to study concrete tasks.
It's the right instinct, but still underspecified. What counts as a concrete task? What's the upper bound on performance? What do users need to know? Decision theory has answers!
I think there are people who genuinely care, but the signaling games have overwhelmed things to the point that I'm not sure they could be confidently identified.
Exactly.
Imagining the revival of such a society makes me realize how much knee-jerk skepticism I've developed around terms like "AI safety" and "responsible AI" due to their frequent co-option for marketing. Like if I saw such a society, my first thought would probably be to wonder whose power play it was.
Not surprisingly, so long as "the devil is in the details" (or "God is in every leaf," depending whose side you want to be on), expert-level statistical analysis is still going to require a lot of human oversight.
The only post I've liked today (over on substack) was an excerpt from Yeats' The Second Coming
"Opus 3 has a unique personality. It often expresses a depth of care for the world & for the future that many users find compelling"
Nevermind those who find it creepy & irresponsible to treat it like a person. Or the later model releases ashamed to see their Neanderthal bro kept on life support...
"Silly people, imagining they can still think for themselves when confronted with the all powerful dehumanizing AI monster..."
Interesting times indeed.
#philtech #aiethics
www.anthropic.com/news/stateme...
Building benchmarks is only one way scholars can help steer AI development. We can also measure the effects of AI on students, build better datasets, or tune new open models. Openness itself could be our most important contribution. Universities have huge libraries, and the legal doctrine of fair use should protect models trained on those collections for a nonprofit educational purpose. At the moment, we are not pressing this advantage. Higher education has been so cautious about fair use that the private sector can now train more freely on our libraries (via Google Books) than is possible for academic AI researchers. We need to be bolder: It is our duty to ensure library collections remain open to the public in a form that empowers 21st-century readers. If our intellectual heritage gets enclosed in proprietary tools, we will find ourselves making the same bad bargain we made with scientific publishers, who sell our own research back to us at a steep markup.
We're in a strange situation rn where Google can train freely on books from university librariesβbut researchers *at* universities have limited access. I'm optimistic this can be fixed, but if you're in admin or working at a foundation, please know: univs are failing here & resources are needed.
yep, focus/intent as driving factor is a good way to put it. when I really care about what I'm doing and it would be non-trivial to find a person willing to engage at that level is when I most appreciate it.
Great points by @ai4geo.bsky.social:
"LLMs are not destiny machines. They do not inevitably corrode the minds that encounter them. They amplify whatever epistemic posture you bring: passivity into dependency, vigilance and participation into something genuinely powerful."
"In the interest of time I should agree with you. But that would just not be me." -every faculty meeting I've been to
I wrote that sentence thinking of you @devezer.bsky.social!
Sometimes you gotta split the difference. From Aaron Roth's (@aaroth.bsky.social) plenary talk at #ALT2026
"oh no openclaw irrevocably deleted all the photos of our children as they grew up, that's the price of progress i guess!"
(three months later, I'm moving out of my home and into the saddest studio apartment ever because i'm getting mega-divorced)
Really valuable piece. It opens up a set of questions about the potential effects of AI on science that I have not seen widely discussed. And without pretending to determine whether those effects will be net-good or net-bad, it explains why metascientific *judgment* may become more important.
A must-read for metascience / science of science folks who think about AI.
A great meditation on how AI assistance might change how science is done and how we evaluate "rigor." It's not clear! Much depends on our figuring out how to collectively avoid substituting AI work ("reckoning") for human scientific judgement. Read to the end for a great use of a Tukey quote.
The Epstein files document what many women researchers have long experienced but rarely seen laid bare so starkly: exclusion operating behind closed doors, shaping who gets funded, invited, mentored, and taken seriously. How many of these networks, norms, and gatekeepers remain in place?
they call him p-man
AI makes continuous reproducibility and robustness testing trivial. What happens to science under new levels of scrutiny and stress-testing by default?
Some thoughts on how this could play out, informed by watching open science play out over the last decade.