Forest Gregg (@bunkum.us)

i also think that masked data language models learning how texts really get corrupted could be the future of entity resolution.

06.03.2026 02:15 👍 0 🔁 0 💬 0 📌 0

i'm always saying this.

06.03.2026 02:14 👍 0 🔁 0 💬 1 📌 0

active generalized category discovery with a steorts-style microclustering prior seems like the future of entity resolution.

06.03.2026 02:10 👍 0 🔁 0 💬 1 📌 0

no offense but there are too many "labor labs"

06.03.2026 01:29 👍 1 🔁 0 💬 0 📌 0

man, labor notes is sold out already.

05.03.2026 22:51 👍 4 🔁 0 💬 0 📌 0

Graphic advertising a happy hour sponsored by the Indianapolis NewsGuild and the NewsGuild at NICAR 2026. The happy hour will run from 6-8 pm March 5 at O'Reilly's Irish Pub and Restaurant

Are you a union member or want to be and you're at #NICAR26?

The Indy NewsGuild and @newsguild.org are sponsoring a happy hour tonight from 6-8pm downtown. (appetizers provided!)

05.03.2026 11:51 👍 9 🔁 4 💬 0 📌 1

Banitsa - Wikipedia

it was not as good as what i had, but it was still wonderful.

05.03.2026 01:36 👍 2 🔁 0 💬 0 📌 0

i had banitsa for the first time a week ago, and it was amazing. tonight i'm making it at home, and i had not previously appreciated the amount of cheese and butter.

04.03.2026 21:34 👍 2 🔁 0 💬 1 📌 0

Assorted stickers and documents on a desk. One group of stickers says "libfec" in green font. The other group says "Datasette" in purple. One document stack has the title "NICAR26 libfec manual". The document says "libfec cheatsheet".

I will be at #NICAR26 this week teaching classes!

For campaign finance folks: come to my Thursday 11:30am class on libfec, a new fast CLI tool for working with FEC data!

Also get limited-edition libfec stickers + a campfin "zine"!

(Bonus: find @simonwillison.net and I for Datasette stickers)

04.03.2026 01:07 👍 6 🔁 2 💬 0 📌 0

i do not doubt that huge parts of the quantitative social sciences are potentially automatable, partially because they already seemed as if they were.

03.03.2026 23:38 👍 21 🔁 1 💬 2 📌 0

really appreciate what @theorangeone.net has enabled!

03.03.2026 02:18 👍 2 🔁 0 💬 0 📌 0

GitHub - fgregg/django-tasks-local-db: A database + threadpool backend for Django 6's built-in tasks framework.

and if you want a touch more safety:

03.03.2026 02:16 👍 1 🔁 0 💬 1 📌 0

GitHub - lincolnloop/django-tasks-local: A thread pool backend for Django 6's built-in tasks framework.

this is pretty dang close. github.com/lincolnloop/...

03.03.2026 02:09 👍 1 🔁 0 💬 1 📌 0

go look at the moon

02.03.2026 23:29 👍 11 🔁 1 💬 0 📌 1

GitHub - lincolnloop/django-tasks-local: A thread pool backend for Django 6's built-in tasks framework.

oh shit! (complimentary)

02.03.2026 23:27 👍 1 🔁 0 💬 0 📌 0

someone who has ever used version control should invent a .po file format

02.03.2026 20:39 👍 2 🔁 0 💬 0 📌 0

that is a word.

02.03.2026 16:04 👍 0 🔁 0 💬 0 📌 0

within about 10 iterations it was doing over 99% accuracy. first few iterations missed important corner cases.

02.03.2026 13:38 👍 1 🔁 0 💬 0 📌 1

this exchange got me to set up a loop for a situation where i have ground truth labels for thousands of very ambiguous short text identifiers of labor unions. i had claude draft a prompt for a subagent to find the real union in a local db, return its results, score its accuracy, and refine the prompt.

02.03.2026 13:38 👍 1 🔁 0 💬 1 📌 0
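a minimal sketch of what a loop like that could look like. this is not the actual setup: `run_subagent`, `refine_prompt`, `LOCAL_DB`, and the labeled examples are all hypothetical stand-ins, and the toy subagent only "knows" an identifier once it has been written into the prompt.

```python
# Sketch of a score-and-refine prompt loop. Everything here is a
# hypothetical stand-in: no real model, subagent, or database is called.

# Toy "local db" mapping ambiguous identifiers to canonical union names.
LOCAL_DB = {
    "seiu 73": "SEIU Local 73",
    "ibew134": "IBEW Local 134",
    "uaw 551": "UAW Local 551",
}

# Ground truth labels: (ambiguous identifier, real union).
LABELED = list(LOCAL_DB.items())


def run_subagent(prompt, text):
    """Toy subagent: resolves an identifier only if the prompt mentions it."""
    return LOCAL_DB.get(text) if text in prompt else None


def refine_prompt(prompt, misses):
    """Toy refiner: appends every missed identifier to the prompt."""
    return prompt + " " + "; ".join(text for text, _, _ in misses)


def refine_loop(prompt, labeled, target=0.99, max_iters=10):
    """Run, score against ground truth, refine; stop at target or max_iters."""
    history = []
    for _ in range(max_iters):
        results = [(text, truth, run_subagent(prompt, text))
                   for text, truth in labeled]
        score = sum(pred == truth for _, truth, pred in results) / len(results)
        history.append(score)
        if score >= target:
            break
        misses = [r for r in results if r[2] != r[1]]
        prompt = refine_prompt(prompt, misses)
    return prompt, history


final_prompt, history = refine_loop("match identifiers:", LABELED)
print(history)
```

in a real version the refiner would be a model reading the misses, not string concatenation, and the accuracy history is the thing to watch across iterations.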

neither one of those is my position.

01.03.2026 18:15 👍 1 🔁 0 💬 1 📌 0

a skill that expertly executes the dominant, expert-level search strategies for legal research is a useful skill! but it's not the same thing as covering the space of legal research.

01.03.2026 17:19 👍 0 🔁 0 💬 2 📌 0

RL would seem to have the same problem: it could learn from your feedback on one legal research question, but it's unlikely to cover the space unless the examples you give it cover the space.

01.03.2026 17:16 👍 0 🔁 0 💬 1 📌 0

a skills document that captures all that variation would be very very long because it would have captured all the different cases, but it could be general to problems in the domain.

01.03.2026 17:16 👍 1 🔁 0 💬 1 📌 0

let's posit that when claude does those procedures it's near expert level at executing them.

but there are legal research questions where those procedures will not work well, and bruenig will go to some other strategy or invent a new one.

01.03.2026 17:16 👍 2 🔁 0 💬 1 📌 0

the thing is that experts don't have just one procedure. let's posit that the skill bruenig wrote does capture the dominant procedure and maybe even a few fallback procedures he uses when the first doesn't work.

01.03.2026 17:16 👍 2 🔁 0 💬 1 📌 0

RL should soften but not eliminate the domain-specific generalization problem.

01.03.2026 16:36 👍 1 🔁 0 💬 1 📌 0

most abstractly, these procedures encoded in skills can introduce their own generalization errors *within the target domain*

01.03.2026 15:51 👍 1 🔁 0 💬 1 📌 0

the instruction set of these skills seems like the same kind of problem (indeed i have experienced just this generalization problem for other skill-like things i have built), and trusting that the problem can be effective within its appropriate domain is exactly the concern.

01.03.2026 15:47 👍 4 🔁 0 💬 1 📌 0

in my experience, it's just very easy to write tools that solve the problems you know the answer to but that do not generalize to the ones you don't. a simple example: a regex that solves all the cases you know about is almost certainly going to fail on the ones you don't.

01.03.2026 15:47 👍 3 🔁 0 💬 1 📌 0
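a small sketch of the regex point. the identifiers below are made up for illustration, not from any real dataset, and the pattern is just one that happens to fit the "known" list.

```python
import re

# Hypothetical examples, invented for illustration only.
KNOWN = ["SEIU Local 73", "IBEW Local 134", "UAW Local 551"]
UNSEEN = ["Teamsters Local Union No. 705", "AFSCME Council 31", "seiu local 1"]

# A pattern tuned until it matched every case the author knew about.
PATTERN = re.compile(r"^[A-Z]+ Local \d+$")


def coverage(cases):
    """Fraction of cases the pattern matches."""
    return sum(bool(PATTERN.match(case)) for case in cases) / len(cases)


known_coverage = coverage(KNOWN)
unseen_coverage = coverage(UNSEEN)
print(known_coverage, unseen_coverage)
```

the pattern is perfect on the cases it was fit to and matches none of the unseen ones, which differ in casing, wording, and structure.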

the thing i worry about in building these kinds of tools is that they are "overtrained" on the things that the builders know a lot about. like bruenig knows some parts of labor law a lot better than others and he adjusted the skill until it worked well on the parts he knows about.

01.03.2026 14:35 👍 7 🔁 0 💬 1 📌 0
