Spent 1h back-and-forth with ChatGPT trying to pinpoint a configuration issue. It guessed all manner of reasonable causes, none of which was the right one.
Spent 5 min doing a reverse image search of the error. Someone on the Web had the same issue. Instant fix.
04.03.2026 17:37
👍 1
🔁 0
💬 0
📌 0
Friendly laser fire
@funranium.bsky.social is this a 3 digit count or 4 digit count of swear words situation?
27.02.2026 07:01
👍 1
🔁 1
💬 1
📌 0
Which website do you use to generate this video?
11.02.2026 15:32
👍 0
🔁 0
💬 1
📌 0
@duckduckgo.com Is there a way to add 1password to your browser? I don't see extensions.
10.02.2026 11:44
👍 0
🔁 0
💬 0
📌 0
That strongly implies that Mistral's next step is TTS. In fact, other tokens corroborate it: while [AUDIO] likely indicates that speech tokens follow, [REF] might indicate a reference voice pattern to copy, and [OUTPUT_AUDIO] might start converting text to audio.
06.02.2026 14:14
👍 0
🔁 0
💬 0
📌 0
It also outputs a [word] token, which fits the [STREAMING_WORD] token found in Voxtral 2.
Why have that?
For text-to-speech: there, when the model knows it has finished outputting the audio for a word, it generates the [word] token, so that we can feed it the next word to say.
06.02.2026 14:14
👍 0
🔁 0
💬 1
📌 0
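That feeding loop can be sketched with a toy stub (the StubModel class and its audio-token strings are invented for illustration; only the [STREAMING_WORD] token name comes from the thread):

```python
class StubModel:
    """Toy stand-in: pretends each word takes two 80 ms audio tokens to say."""
    def __init__(self):
        self.queue = []
    def feed(self, word):
        # After a word is fed, the model will emit its audio tokens,
        # then signal completion with [STREAMING_WORD].
        self.queue = [f"<audio:{word}:0>", f"<audio:{word}:1>", "[STREAMING_WORD]"]
    def step(self):
        return self.queue.pop(0)

def speak(words, model):
    audio = []
    for word in words:                # feed one word at a time
        model.feed(word)
        # Collect audio tokens until the model says the word is finished.
        while (tok := model.step()) != "[STREAMING_WORD]":
            audio.append(tok)
    return audio

print(speak(["hi", "there"], StubModel()))  # two toy audio tokens per word
```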
But there are a lot more tokens in there that are unexplained!
To learn more, we can look at what inspired Mistral: Kyutai's Delayed Streams Modeling, arxiv.org/abs/2509.08753
It has the same delay design with the [pad] tokens.
06.02.2026 14:14
👍 1
🔁 0
💬 1
📌 0
Of course, the output does not contain exactly one word per text token, since the audio file does not contain exactly one word per 80 ms.
The trick? Look at those new tokens: when the model needs to wait before outputting a word, it outputs a [STREAMING_PAD] token.
06.02.2026 14:14
👍 0
🔁 0
💬 1
📌 0
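The pad trick amounts to a trivial filter on the token stream (the [STREAMING_PAD] name comes from the thread; the sample token sequence below is invented):

```python
# One token per 80 ms frame: pads mean "word not finished yet, keep listening".
script = ["[STREAMING_PAD]", "hello", "[STREAMING_PAD]", "[STREAMING_PAD]", "world"]

def decode(tokens):
    """Keep only finished words; drop the pad tokens the model emits while waiting."""
    return [t for t in tokens if t != "[STREAMING_PAD]"]

print(decode(script))  # ['hello', 'world']
```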
4. The audio token embedding history + delay tokens go through a Transformer to output a speech token. This is why the delay is variable: it can be any multiple of 80 ms.
5. The history of speech token embeddings goes through a Transformer to output a text token embedding → text token probs → text.
06.02.2026 14:14
👍 1
🔁 0
💬 1
📌 0
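A toy, shape-level sketch of steps 4 and 5, with random projection matrices standing in for the two Transformers (all dimensions here are invented; only the data flow is illustrated):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 32, 100
audio_to_speech = rng.normal(size=(dim, dim))    # stand-in for the step-4 Transformer
speech_to_text = rng.normal(size=(dim, vocab))   # stand-in for the step-5 Transformer

audio_history = rng.normal(size=(10, dim))       # ten 80 ms audio token embeddings
speech_tokens = audio_history @ audio_to_speech  # step 4: one speech token per frame
text_logits = speech_tokens @ speech_to_text     # step 5: text token probabilities
text_ids = text_logits.argmax(axis=-1)           # greedy pick -> text tokens
print(speech_tokens.shape, text_ids.shape)  # (10, 32) (10,)
```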
Look at its architecture:
1. The audio is cut into 80 ms chunks, sampled at 16 kHz (16000 × 0.08 = 1280 floats).
2. Each chunk is converted to a spectrogram.
3. A Whisper-style encoder converts it to an audio token embedding through a convnet.
06.02.2026 14:14
👍 1
🔁 0
💬 1
📌 0
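The arithmetic in step 1 can be checked with a minimal framing sketch (pure NumPy; the chunking layout is an assumption, not Voxtral's actual code):

```python
import numpy as np

SAMPLE_RATE = 16_000                          # 16 kHz, as in step 1
FRAME_MS = 80                                 # each streaming step covers 80 ms
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 1280 samples per frame

def frames(waveform):
    """Yield successive full 80 ms chunks (1280 floats each) from a mono waveform."""
    for start in range(0, len(waveform) - FRAME_LEN + 1, FRAME_LEN):
        yield waveform[start:start + FRAME_LEN]

# One second of audio -> 12 full frames (1000 / 80 = 12.5, trailing partial dropped)
audio = np.zeros(SAMPLE_RATE, dtype=np.float32)
chunks = list(frames(audio))
print(len(chunks), chunks[0].shape)  # 12 (1280,)
```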
Shoutout to Voxtral 2, which really feels unparalleled in quality.
The interesting bit is its ability to do realtime transcription.
How does it do that, with a variable delay?
06.02.2026 14:14
👍 1
🔁 1
💬 1
📌 0
On Math, honestly, it is impressive how close it is to GPT-OSS 20B and Gemini 3 Flash, even when it does not beat them.
All in all, one of the best local models out there. Architecturally, one of the most innovative.
23.01.2026 14:59
👍 1
🔁 0
💬 1
📌 0
Reasoning is another big purpose. Using MLA, it may be quite good at in-context reasoning on a large corpus, even locally.
But it won't be far above leading local models like Ministral 3. Meanwhile, API models like Gemini 3 and DeepSeek will surpass it at the same price.
23.01.2026 14:59
👍 1
🔁 0
💬 1
📌 0
Where GLM-4.7 Flash shines is when you feed it enormous inputs.
That is typical of agentic coding tools. It’s on the Pareto frontier there.
Better than GPT-OSS 20B, cheaper and faster than Devstral Small 2.
23.01.2026 14:59
👍 10
🔁 1
💬 1
📌 0
What happened to Z.ai servers in December, for them to suddenly have a spiky boost in token throughput?!
17.01.2026 23:47
👍 1
🔁 0
💬 0
📌 0
One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.
We found that if you simply delete them after pretraining and recalibrate for <1% of the original budget, you unlock massive context windows. Smarter, not harder.
12.01.2026 04:12
👍 220
🔁 32
💬 8
📌 1
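A minimal sketch of why deleting them helps with context length, under stated assumptions (learned absolute positional embeddings; a NumPy lookup stands in for the model — the recalibration step is elided):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, max_len = 100, 32, 64
tok_emb = rng.normal(size=(vocab, dim))
pos_emb = rng.normal(size=(max_len, dim))   # learned absolute positions: max_len caps context

def embed(ids, use_pos=True):
    x = tok_emb[ids]
    if use_pos:
        x = x + pos_emb[: len(ids)]         # fails past max_len -> hard context cap
    return x

# After "deleting" the positions (then briefly recalibrating, not shown),
# the embedding no longer caps sequence length:
long_ids = rng.integers(0, vocab, size=500)
x = embed(long_ids, use_pos=False)          # works at 500 tokens despite max_len=64
print(x.shape)  # (500, 32)
```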
Phenomenal work.
I wonder about DroPE scaling laws: can it be executed after 4B pretraining tokens regardless of model size (and then the rest of pretraining does NoPE)? Or does it have to be done at the end of pretraining?
12.01.2026 14:39
👍 0
🔁 0
💬 0
📌 0
LLM Benchmark Aggregator & Estimator
As always, find these comparisons at metabench.organisons.com
and the announcement at www.minimax.io/news/minimax...
23.12.2025 22:04
👍 0
🔁 0
💬 0
📌 0
Other metrics don't improve as much… but the M2 baseline was already quite good.
Keep in mind that this model is much faster than the others around it, clocking in at 100 tokens/sec compared to similar models doing 30 tokens/sec.
23.12.2025 22:04
👍 0
🔁 0
💬 1
📌 0
M2.1 from @MiniMax__AI has a welcome jump in agentic coding! It matches @Zai_org’s GLM-4.7 released yesterday, but at a lower cost.
23.12.2025 22:04
👍 2
🔁 1
💬 1
📌 0
LLM Benchmark Aggregator & Estimator
As always, the full leaderboard is here: metabench.organisons.com
And the announcement: z.ai/blog/glm-4.7
23.12.2025 13:32
👍 0
🔁 0
💬 0
📌 0
Other metrics are good, but the improvement is more marginal, such as in raw agentic use (typical of customer service):
23.12.2025 13:32
👍 0
🔁 0
💬 1
📌 0
As often, code training improves math as well, where we see a very positive jump!
23.12.2025 13:32
👍 0
🔁 0
💬 1
📌 0
Impressive jump on agentic coding according to its benchmarks! Now on par with Claude Opus 4.1 (from 5 months ago!), K2 Thinking, and GPT-5.2 Codex, at a lower cost.
A bit overshadowed by DeepSeek, whose DSA mechanisms achieve great cost cuts.
23.12.2025 13:32
👍 0
🔁 0
💬 1
📌 0
Looking at raw data: OpenAI claims a score of 44% on Terminal-Bench 2.0 for GPT-5.2 Codex.
Mistral gives GPT-5.1 Codex, the predecessor, a score of 52.8%, and Tbench gives it 57.8%.
Google gives GPT-5.1 (non-Codex) 47.6%, and matches it in Gemini 3 Flash.
19.12.2025 15:19
👍 0
🔁 0
💬 0
📌 0
There are few benchmarks yet for @OpenAI’s fresh GPT-5.2 Codex model.
Initial benchmarks from the announcement imply a drop below Gemini 3 Flash in agentic coding. In fact, the performance seems close to DeepSeek V3.2 at a 50x price jump.
19.12.2025 15:19
👍 1
🔁 0
💬 1
📌 0
LLM Benchmark Aggregator & Estimator
As usual, you can find the leaderboard here: metabench.organisons.com
and the model card: storage.googleapis.com/deepmind-med...
18.12.2025 13:48
👍 0
🔁 0
💬 0
📌 0
Raw agentic behaviour, typically used for customer support, is where it is the least competitive.
• On the high end, Claude Sonnet 4.5 edges it out.
• On the low end, Ministral 3 14B is cheaper for similar results.
Yet even there, it sits on the Pareto frontier.
18.12.2025 13:48
👍 0
🔁 0
💬 1
📌 0
In agentic coding, it does appear on the Pareto frontier: cheaper than and on par with Claude Sonnet 4.5, though below GPT-5.1 Codex.
On low thinking, DeepSeek V3.2 beats it for the same price (and MiniMax M1, costlier but higher throughput). The cheaper Devstral 2 matches it.
18.12.2025 13:48
👍 0
🔁 0
💬 1
📌 0