yes yes i know, it's just for synth pipelines i could use X thousand different bayesian estimates advancing in parallel.
hmm. even with like launching many processes in parallel? could definitely have use cases for this.
Actually very much looking to revisit BRMS & co. It was dope (and should compile much better on modern gpus)
Forget about this, Bayesian is hot again.
In any case we're clearly heading toward training without text but still with structured data. Right now I'm building synthetic environments exclusively from Wikidata.
There may be a follow-up soon.
No worries. The audience is mostly international now…
(I admit I'm hesitant to raise the topic with rights holders - we're still being left in perfect peace for now)
On the use of synthetic data for training? Well, quite a lot is cited in my post (including virtually every model report that is reasonably recent and at least somewhat open about its data). vintagedata.org/blog/posts/s...
More of a theory sketch than a paper but it seems to hold vintagedata.org/blog/posts/s...
(I might have something more soon)
additional nail in the coffin of model collapse: better results so far on a model retrained on its own synthetic traces.
Well yes, I launch things and check whether it works and whether the data makes sense. We've clearly crossed a threshold these past few months…
(Claude Code now works very well for generating the inference script. I've almost stopped coding by hand this month)
Two OCR models on HuggingFace :) Open/open source so they run locally, though in practice it may be simpler to run them on Colab.
Maybe overkill, but dots ocr (I'm currently processing all of HAL with it) or Lighton-OCR. Very reliable, and it also handles the whole layout part.
Never managed to read it either. And same feeling: not really any life in there.
oh yes, obviously, i can make this now
who's talking about cleanly?
Well 10 years of teaching it… Likely last time.
I guess Donald Knuth must have thought of that :)
just realized that jupyter is probably dead as a concept. it's all md+scripts now.
more seriously: i still think "computation" is also happening internally (just in a smooth/transient way, not that dissimilar to actual math search prior to formal verification)
I'm afraid this is anthropomorphizing. The proof was there all along in future training data.
Nothing to see, just very powerful pattern matching. www-cs-faculty.stanford.edu/~knuth/paper...
actually, yes.
Not sure about the US, but in Europe it started very early on (as early as Q3 2023) with their positioning on safety/alignment and avoiding the mess OpenAI got into at the same time (GDPR blocks, etc.)
(Our next release will actually be personas)
Would also open up the much more interesting question of how to design and tune personas. I'm currently switching to agentic model training and simulated personas are everywhere, one of the absolute core original seeds.
Oh been part-time there for a while now. Always good to have a platform plan b.
Models should design, models should populate, models should compile.