Alexander Doria

@dorialexander

LLM for the commons.

7,605
Followers
694
Following
1,954
Posts
02.09.2023
Joined
Latest posts by Alexander Doria @dorialexander

yes yes i know, it's just for synth pipelines i could use X thousand different bayesian estimates advancing in parallel.

07.03.2026 22:05 👍 1 🔁 0 💬 0 📌 0

hmm. even with like launching many processes in parallel? could definitely have use cases for this.

07.03.2026 21:58 👍 0 🔁 0 💬 1 📌 0

Actually very much looking to revisit BRMS & co. It was dope (and should compile much better on modern gpus)

07.03.2026 21:48 👍 1 🔁 0 💬 1 📌 0

Forget about this, Bayesian is hot again.

07.03.2026 21:47 👍 1 🔁 0 💬 1 📌 0

In any case we're clearly heading toward training without text but still with structured data. Right now I'm building synthetic environments exclusively from Wikidata.

07.03.2026 20:46 👍 0 🔁 0 💬 0 📌 0

There may be a follow-up coming soon.

07.03.2026 16:39 👍 1 🔁 0 💬 0 📌 0

No worries. The audience is mostly international now…

07.03.2026 16:33 👍 0 🔁 0 💬 1 📌 0

(I admit I'm hesitant to raise the subject with rights holders; for now we are still being left entirely in peace)

07.03.2026 16:31 👍 1 🔁 0 💬 1 📌 0

On the use of synthetic data for training? Well, quite a few things are cited in my post (including virtually every model report that is reasonably recent and at least minimally open about its data). vintagedata.org/blog/posts/s...

07.03.2026 16:30 👍 0 🔁 0 💬 2 📌 0
Synthetic Pretraining | Vintage Data Old data, new models

More of a theory sketch than a paper but it seems to hold vintagedata.org/blog/posts/s...

(I might have something more soon)

07.03.2026 15:04 👍 8 🔁 0 💬 2 📌 0

additional nail in the coffin of model collapse: better results so far on a model retrained on its own synthetic traces.

07.03.2026 13:17 👍 61 🔁 6 💬 2 📌 0

Yeah, well, I launch things and check whether it works and the data makes sense. We've clearly crossed a threshold these past few months…

07.03.2026 11:06 👍 0 🔁 0 💬 0 📌 0

(Claude Code now works very well for generating the inference script. I've almost stopped coding by hand this month)

07.03.2026 10:57 👍 2 🔁 0 💬 1 📌 0

Two OCR models on HuggingFace :) Open/open source, so they run locally, although in practice it may be simpler to run them on Colab.

07.03.2026 10:55 👍 2 🔁 0 💬 1 📌 0

Maybe overkill, but dots ocr (I'm currently processing all of HAL with it) or Lighton-OCR. Very reliable, and it also handles the whole layout side.

07.03.2026 10:43 👍 1 🔁 0 💬 1 📌 0

Never managed to read it either. And same feeling: not really any life in there.

05.03.2026 21:17 👍 0 🔁 0 💬 1 📌 0
oh yes, obviously, i can make this now

05.03.2026 00:05 👍 18 🔁 1 💬 1 📌 0

who's talking about cleanly?

04.03.2026 22:10 👍 4 🔁 0 💬 1 📌 0

Well 10 years of teaching it… Likely last time.

04.03.2026 22:09 👍 6 🔁 0 💬 1 📌 0

I guess Donald Knuth must have thought of that :)

04.03.2026 21:02 👍 0 🔁 0 💬 2 📌 0

just realized that jupyter is probably dead as a concept. it's all md+scripts now.

04.03.2026 20:35 👍 82 🔁 8 💬 9 📌 7

more seriously: i still think "computation" is also happening internally (just in a smooth/transient way, not that dissimilar to actual math search prior to formal verification)

04.03.2026 00:39 👍 8 🔁 0 💬 1 📌 0

I'm afraid this is anthropomorphizing. The proof was there all along in future training data.

03.03.2026 23:47 👍 50 🔁 3 💬 2 📌 1

Nothing to see, just very powerful pattern matching. www-cs-faculty.stanford.edu/~knuth/paper...

03.03.2026 23:36 👍 216 🔁 44 💬 11 📌 20

actually, yes.

03.03.2026 07:25 👍 1 🔁 0 💬 0 📌 0

Not sure for the US, but in Europe started very early on (even q3 2023) with their positioning on safety/alignment and avoiding the mess openai got into at the same time (GDPR blocks, etc.)

03.03.2026 00:29 👍 1 🔁 0 💬 0 📌 0

(Our next release will actually be personas)

02.03.2026 22:59 👍 6 🔁 1 💬 1 📌 0

Would also open up the much more interesting question of how to design and tune personas. I'm currently switching to agentic model training and simulated personas are everywhere, one of the absolute core original seeds.

02.03.2026 22:58 👍 18 🔁 1 💬 1 📌 0

Oh been part-time there for a while now. Always good to have a platform plan b.

02.03.2026 20:20 👍 3 🔁 0 💬 1 📌 0

Models should design, models should populate, models should compile.

02.03.2026 17:46 👍 14 🔁 1 💬 0 📌 0