Back at it—system gave us 500 gems… and 10× more junk 😂. Quick tweaks and we’re nearly done with stage one: mining pretrain data from rare, cross-domain PDFs.
#AIpretrain #SpanAware #TokenizerFree #PDFMining #XSpanformer #DataCuration #OpenScience
#artificialintelligence
🧠 X-Spanformer ditched "improver"—now guided by 5-judge consensus 🗳️ to approve text for ox-bar span compilation. Cleaner segments. Swarm decides.
#ai #artificialintelligence #transformers #lstm #computerscience #XSpanformer #TokenizerFree #SpanAware #SemanticEmbeddings #OxBarTheory #TauSystem 🍄
🚧 Building out the pretrain pipeline for X-Spanformer: github.com/p3nGu1nZz/x-... /// PDF segmentation + judge/improver enrichment for Tau2.0 tokenizer. Zero tokens. All spans. #AI #TokenizerFree #TauSystems #NLP #TransformerArchitecture #OpenSource #FungalLogic #SpanAware #XBarTheory
🧠 Back from break + back on code. Diving into X-Spanformer, a tokenizer-free, span-aware encoder built with X-bar theory magic.
🔗 github.com/p3nGu1nZz/x-...
#AI #software #BiomimeticComputing #TokenizerFree #StructuredLearning #NeuromorphicDesign #XBarTheory #OpenSource #SemanticEmbedding
Up next on stage: Dr. @edoardo-ponti.bsky.social (@edinburgh-uni.bsky.social / NVIDIA)
🎤 “Adaptive Units of Computation: Towards Sublinear-Memory and Tokenizer-Free Foundation Models”
Fascinating glimpse into the next gen of foundation models.
#FoundationModels #NLP #TokenizerFree #ADSAI2025