
Georg Heiler

@geoheil.com

building socio-technical complex systems with data | geoheil.com

Joined 09.11.2024

Latest posts by Georg Heiler @geoheil.com

Metaxy + Dagster-Slurm for Efficient Multimodal Pipelines | Georg Heiler How to combine Metaxy, Dagster-Slurm, Docling, and Ray to run incremental multimodal pipelines on sovereign AI infrastructure.

See in action how EU sovereign HPC AI infrastructure and Ray make it easy to scale document processing with Docling: georgheiler.com/2026/02/22/m...

04.03.2026 15:31

Multimodal data handling is different, especially with regard to complexity and cost. Daniel Gafni and I built Metaxy (docs.metaxy.io) to simplify efficient multimodal pipelines.

04.03.2026 15:31
Magenta Telekom Case Study | Dagster Learn how Magenta Telekom replaced fragmented, manual data workflows with a modular, Dagster-powered platform that reduced onboarding from 3 months to 1 day and laid the foundation for AI-driven decis...

Read how Magenta (👋 @geoheil.com and @milicevica23.bsky.social ) uses Dagster+ to feel a bit of that joy: dagster.io/customers/ho...

25.02.2026 16:22
Introduction - Metaxy A high level introduction to Metaxy.

And docs.metaxy.io/main/, which also integrates with e.g. Lance for yet another kind of versioning.

25.02.2026 06:48
Branching and Shallow Cloning in Lance: Towards a "Git for AI Data" A deep dive into how table formats handle version management for ML/AI experimentation, and how Lance unifies branching, tagging, and shallow clone on top of …

Don’t forget Lance (lancedb.com/blog/branchi...), a format fit for multimodal data.

25.02.2026 06:47
The stacking workflow Stacked PRs. Stacked diffs. Stacked changes. A better workflow to manage pull requests.

Great implementation of the www.stacking.dev workflow.

20.02.2026 20:25
GitHub - cesarferreira/stax: The fastest stacked-branch workflow for Git. Interactive TUI, smart PRs, safe undo. Written in Rust.

github.com/cesarferreir... #rust #stacking is awesome

20.02.2026 20:24
Hannes Werthner "The Role of Computer Science in the Age of AI (or Digital Humanism?)" YouTube video by Digital Humanism

www.youtube.com/watch?v=DGA0...

18.02.2026 08:54

A European sovereign GPU cloud does not come out of nowhere. Maybe this project can help make HPC systems more accessible. The recently started projects will take a long time to complete; I hope github.com/ascii-supply... will help.

07.01.2026 08:19
Modern Architecture 101 for New Engineers & Forgetful Experts - Jerry Nixon - NDC Copenhagen 2025 YouTube video by NDC Conferences

#great talk www.youtube.com/watch?v=WRg1... on #architecture for #engineers

22.12.2025 21:32
Scaling data pipelines @ Magenta Telekom - OSA Con 2025 Presented by Georg Heiler at OSA Con 2025. Magenta Telekom ingests many terabytes of new data every day, and every downstream consumer wants it immediately. The real bottleneck turned out not to be…

The OSA Con recordings are now available: www.youtube.com/watch?v=31LH...

10.12.2025 15:05
PyTogether - Real-Time Collaborative Python IDE Collaborative Python IDE for students, educators, and teams.

#python #together pytogether.org nice

04.12.2025 21:29
Reconstructing History with XTDB (Jeremy Taylor + James Henderson) YouTube video by CMU Database Group

Interesting new #timeseries #database: #xtdb (xtdb.com). See the great #cmu video for details: www.youtube.com/watch?v=zzqD...

26.11.2025 11:07
GitHub - mxschmitt/action-tmate: Debug your GitHub Actions via SSH by using tmate to get access to the runner system itself.

github.com/mxschmitt/ac... #tmux #action-tmate - really neat #debugging

11.11.2025 14:53
DEF CON 33 - Exploiting Shadow Data from AI Models and Embeddings - Patrick Walsh This talk explores the hidden risks in apps leveraging modern AI systems, especially those using large language models (LLMs) and retrieval-augmented generation (RAG) workflows. We demonstrate how…

A great video about LLMs and the data they can expose to the world, even though perhaps they should not: www.youtube.com/watch?v=O7BI... (DEF CON 33 - Exploiting Shadow Data from AI Models and Embeddings - Patrick Walsh)

06.11.2025 15:05
Introducing Apache Fory™ Rust: A Versatile Serialization Framework for the Modern Age | Apache Fory™ TL;DR: Apache Fory Rust is a blazingly-fast, cross-language serialization framework that delivers ultra-fast serialization performance while automatically handling circular references, trait objects, ...

#rust #fory #serialization fory.apache.org/blog/fory_ru...

04.11.2025 12:46

The real AI win isn't superhuman agents, it's scaled mediocrity.
Doing less with less at massive scale unlocks tasks that were once uneconomical.
The magic is in aggregate value, not perfect outputs. Empower teams with practical AI tools.
🔗 https://dlthub.com/blog/the-real-ai-win-scaled-mediocrity

17.10.2025 12:39
GitHub - l-mds/dsc-dach-tutorial-dagster: Introduction to using and scaling dagster.

#dsc-dach #data It was a pleasure to share an introductory workshop about Spark and data pipelines. Thank you, Aleks, for the great collaboration!

Find the workshop files here if you want to follow along: github.com/l-mds/dsc-da...

14.10.2025 14:00
DuckLake: Learning from Cloud Data Warehouses to Build a Robust “Lakehouse” (Jordan Tigani) YouTube video by CMU Database Group

#duckdb #ducklake #cmu www.youtube.com/watch?v=z2Gh...

07.10.2025 19:14
feat: build SLURM integration for dagster by HPicatto · Pull Request #19 · ascii-supply-networks/dagster-slurm Type of Change feat: New feature fix: Bug fix docs: Documentation style: Code style refactor: Code refactor perf: Performance improvement test: Tests chore: Maintenance Description adds ...

Something super (computing) is in the making. Anyone daring out there who wants to explore it? Or folks who want to exchange ideas about SLURM, heterogeneous (HET) jobs, and advanced resource management? github.com/ascii-supply...

07.10.2025 14:05

Good point. I think I only have under an hour, so BI/visualization will have to wait a bit. Otherwise, it would be a great addition.

07.10.2025 08:51

#duckdb #dagster #ray #ducklake

07.10.2025 07:02
Simple Sovereign Scalable Data Stack | Georg Heiler Tired of cloud lock-in and surprise bills? This talk shows how to build a fast, portable analytics stack around DuckDB and Dagster. Along the way of our journey to sovereignty and scale we touch on…

Simple Sovereign Scalable Data Stack: georgheiler.com/event/tdwi-2... Precursors: pypi.org/project/dags... and github.com/dagster-io/c... If you want to see this in action, join in Nürnberg or Vienna for some sovereign, scalable data talks in the coming weeks.

07.10.2025 07:02
When the duck quacks: Multimodal querying with FlockMTL YouTube video by DuckDB

#duckdb #multimodal #rag www.youtube.com/watch?v=2qSZ... blobs.duckdb.org/events/duckd...

30.09.2025 15:09
Katharine Jarmul - Anonymization: Why is it so hard? (PyData Prague #27) YouTube video by PyData

#compliance #anonymization #python www.youtube.com/watch?v=EqQd...

25.09.2025 08:59
Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen - Apache Sedona Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark, Apache Flink, and Snowflake, with a set of...

#gis #medium-data #sedona #rust #datafusion sedona.apache.org/latest/blog/...

24.09.2025 18:45

📈 DuckDB 1.4.0 is out! This is our first LTS release which comes with *one year of community support*. It also supports database encryption, the MERGE SQL statement and Iceberg writes.

For more details, read the announcement blog post at
duckdb.org/2025/09/16/a...

16.09.2025 11:55
Home | Data inconsistencies Data inconsistencies, architecture and real world stories

A living Elo leaderboard for analytics/OLAP engines: each public benchmark result (TPC-DS/H, SSB, vendor & community posts) becomes a “match.” Upsets and context matter. Browse the board and poke holes: rebrand.ly/ey6y7hf

02.09.2025 07:02
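The Elo idea behind a leaderboard like this can be sketched in a few lines. This is a minimal illustration of the standard Elo update, assuming each benchmark result has already been reduced to a pairwise win, loss, or draw between two engines; the engine names, the starting rating of 1500, and the K-factor of 32 are hypothetical choices, not the leaderboard's actual parameters or code.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings, a: str, b: str, score_a: float, k: float = 32.0) -> float:
    """Apply one 'match' (e.g. one benchmark result) to the ratings.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a draw.
    Returns the rating change applied to A.
    """
    ea = expected_score(ratings[a], ratings[b])
    delta = k * (score_a - ea)
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: B loses exactly what A gains
    return delta


# Hypothetical engines and results, for illustration only.
ratings = defaultdict(lambda: 1500.0)
matches = [
    ("engine_a", "engine_b", 1.0),
    ("engine_b", "engine_c", 1.0),
    ("engine_c", "engine_a", 1.0),  # an upset: the lower-rated engine wins
]
for a, b, s in matches:
    update(ratings, a, b, s)

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```

Because each update is scaled by how surprising the result was, an upset shifts ratings more than an expected win does, which is one concrete sense in which "upsets matter."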
Multi-Agentic system Threat Modeling Guide v1.0 This guide builds on the OWASP Agentic AI – Threats and Mitigations publication, our master agentic threat taxonomy, by applying its threat taxonomy to real-world multi-agent systems (MAS). These…

#owasp is now gearing up for #llm and #genai: Multi-Agentic system Threat Modeling Guide v1.0 genai.owasp.org/resource/mul...

22.08.2025 14:05

So are we going back to faster-than-S3 alternatives?

20.08.2025 11:09