Tokenizers govern the allocation of computation. It's a waste to spend a whole token of compute predicting the "way" in "By the way". SuperBPE redirects that compute to predict more difficult tokens, leading to wins on downstream tasks!
21.03.2025 18:31
๐ 4
๐ 1
๐ฌ 0
๐ 0