I'm sadly not at #ACL2025, but the work on tokenization seems to continue to explode. Here are the tokenization-related papers I could find, in no particular order. Let me know if I missed any.
LiteLLM does an impressive job tracking per-token prices for a wide variety of LLMs, but their documentation is a bit thin on how to use that info. Here's a short example of how I use a CustomLogger class to track costs across multiple LLM calls.
www.crosstab.io/articles/lit...
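For reference, the pattern looks roughly like this. It's a minimal sketch: `CustomLogger` below is a stand-in for `litellm.integrations.custom_logger.CustomLogger` so the snippet runs without litellm installed, and I'm assuming the success callback receives the computed cost in `kwargs["response_cost"]`, which is how the real library reports it.

```python
# Minimal sketch of cost tracking with a LiteLLM-style custom logger.
# CustomLogger here is a stand-in for litellm's base class; the real
# library calls log_success_event() after each completion and passes
# the computed cost in kwargs["response_cost"].

class CustomLogger:  # stand-in for litellm.integrations.custom_logger.CustomLogger
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        pass

class CostTracker(CustomLogger):
    """Accumulate spend and call counts across multiple LLM calls."""

    def __init__(self):
        self.total_cost = 0.0
        self.calls = 0

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # response_cost can be None for models LiteLLM has no price for.
        self.total_cost += kwargs.get("response_cost") or 0.0
        self.calls += 1

tracker = CostTracker()
# With the real library you'd register it once:
#   litellm.callbacks = [tracker]
# and then every litellm.completion() call feeds it automatically.

# Simulate two completed calls to show the accounting.
tracker.log_success_event({"response_cost": 0.0021}, None, 0, 1)
tracker.log_success_event({"response_cost": 0.0034}, None, 1, 2)
print(f"{tracker.calls} calls, ${tracker.total_cost:.4f} total")
```

The nice part of this approach is that the accounting lives in one place no matter how many models or call sites you have.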
Excellent, definitely a π. The argument for social science measurement is even stronger, in light of Farrell et al.'s March paper that we should view large models "as a new kind of cultural and social technology, allowing humans to take advantage of information other humans have accumulated."
Anyone got a good alternative to Pocket as a read later / stash a copy of an article tool?
I promise I'll shut up about AI soon, but since so many asked, I wrote down my agentic flow and also why I'm all of a sudden writing Go. lucumr.pocoo.org/2025/6/12/ag...
Claude 3.7 Sonnet followed my text-to-SQL instructions flawlessly, but Claude Sonnet 4 just can't seem to get it right.
www.crosstab.io/articles/cla...
The flip side of this is that if you *are* behind it's never been easier to jump in and start swimming.
Unfortunately, the housing theory of everything is correct, and you can't unsee it once you see it:
worksinprogress.co/issue/the-ho...
It's like it assumes it's running in fully autonomous mode in my IDE
Is it just me or does Claude 4 Sonnet seem super overeager with code in the chat UI?
I just want to know how some API's output is structured and Claude is giving me hundreds of lines of fuzzy deduplication, error trapping, the whole works.
Randomly came across this reddit post about a new document processing leaderboard.
So far, structured data extraction from documents is the killer app for VLMs but public benchmarks and leaderboards have been non-existent. Excited to see that changing.
www.reddit.com/r/MachineLea...
DSPy has a lot going for it but obfuscating how prompts are constructed creates problems. Beware the footguns!
www.crosstab.io/articles/dsp...
The now-viral, incorrect meme that LLMs are "just next-token predictors" is causing so much confusion.
Plus prompt caching to avoid re-sending the whole table schema to the LLM on every call, and sqlglot to validate the LLM's output.
I extended @ramikrispin.bsky.social's excellent work to use Claude Sonnet 3.7 to translate natural language data queries into runnable SQL.
Along the way, I showed that Claude can do this even with English questions against a non-English dataset.
www.crosstab.io/articles/llm...
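The validation step is worth sketching, since it's cheap insurance against the LLM hallucinating syntax or table names. This sketch swaps in the stdlib's sqlite3 (via `EXPLAIN`, which parses and plans a query without running it) in place of the sqlglot check mentioned above, so it's self-contained; the `sales` table is a hypothetical stand-in for your real schema.

```python
# Validate LLM-generated SQL before running it, using sqlite3's EXPLAIN.
# EXPLAIN compiles the query against the schema without executing it, so
# it catches both syntax errors and references to nonexistent tables or
# columns. The `sales` table here is a hypothetical example schema.

import sqlite3

def is_valid_sql(query: str) -> bool:
    """Return True if `query` parses and resolves against the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
    try:
        conn.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(is_valid_sql("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(is_valid_sql("SELEC * FORM sales"))  # typo'd keywords fail validation
```

A parser-based check like sqlglot's has the advantage of working for dialects sqlite doesn't speak (e.g. BigQuery or Snowflake SQL); the sqlite approach adds schema resolution for free when your target happens to be sqlite-compatible.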
Yes! I'm looking forward to the Tal & Claude Sonnet 3.7 renaming of ice cream shops tour. I mean, "Big Spoon", really? What a waste!
I think we should all chime in and vote on the names of these joints. They're like little bites of ice cream for the mind that we all get to enjoy from afar.
My favorite so far: John's Water Ice
Runner up: Owowcow
The "Paper Skygest" is a total validation of the bluesky thesis. Anyone can build a useful, tunable feed. It's a bit sparse right now but it'll be amazing once it takes off fully.
What exactly passes for a foundation model these days?
The brilliant Cosma Shalizi writing about LLMs is always worth reading:
www.programmablemutter.com/p/on-feral-l...
if you're a PhD student or postdoc working at the interface of personality psychology and CS/ML (construed broadly on both sides), and are interested in doing a full-time, remote, 3 - 6 month internship/residency at MidJourney, please DM me some kind of resume or CV-like thing
Highly recommended
A video of Pre-Training GPT-4.5 by OpenAI (46 minutes)
www.youtube.com/watch?v=6nJZ...
It turns out to be hard to evaluate natural language with natural language. What should we take away from the conundrum of LLM evaluation? www.argmin.net/p/evaluation...
The idea that poetics is more central to language than semantics or syntax jumped out to me.
Maybe we need to build a taste-based vocabulary for LLM benchmarks. We have all sorts of terms to describe how art, music, food, etc. make us *feel*, but with LLMs we're stuck with "vibes".
I've been happy with Neon so far.
I don't get it, why does Meta prohibit people in the EU from using Llama 4 models?
www.llama.com/llama4/use-p...
Kicking the blog back into gear...
www.crosstab.io/articles/202...
Mirror, mirror, on the wall...
www.totsantcugat.cat/actualitat/s...
Meta just dropped Llama 4 on a weekend! Two new open-weight models (Scout and Maverick) and a preview of a model called Behemoth. Scout has a 10-million-token context window.
Best information right now appears to be this blog post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/