tldr: yes
Amongst all the excitement around Rox, one thing stood out to us.
Their lakehouse was a big part of their secret sauce.
docs.rox.com/development/...
Lakehouses will become part of mainstream AI and agents. It's no longer just a 'big data' phenomenon.
things are just starting to get fun :)
We'd actually take it a step further: data engineering && science will converge.
That's our job.
great read! And we certainly agree with Chris.
The lake (+ ELT) is the center of a lot of this. It's why we're so excited about bringing it to every app && dev.
(11/11)
We will continue to make pg_mooncake production-ready. Features like compaction, secondary indexes and constraints, and partitioned tables are on our roadmap.
Mooncake 🥮 will build the lakehouse loved by developers. pg_mooncake is a start in this direction.
(10/11)
Transactional updates/deletes:
An implicit row_id column is added during TableScan.
The delete & update operators track the marked row_ids, read rows from the affected Parquet files, & write new Parquet files.
All the table metadata is stored inside a Postgres table.
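A minimal sketch of that copy-on-write mechanism in Python, with in-memory tuples standing in for immutable Parquet files and a set standing in for the Postgres metadata table. All class and method names here are ours, purely illustrative:

```python
# Copy-on-write deletes: each "Parquet file" is an immutable tuple of
# rows; the catalog (the Postgres metadata table in the real system)
# tracks which files are live. An UPDATE would be the same rewrite,
# plus inserting the new row versions.

class ColumnstoreTable:
    def __init__(self):
        self.files = {}        # file_id -> immutable tuple of rows
        self.catalog = set()   # live file_ids (metadata in Postgres)
        self.next_file_id = 0

    def _new_file(self, rows):
        fid = self.next_file_id
        self.next_file_id += 1
        self.files[fid] = tuple(rows)
        self.catalog.add(fid)

    def insert(self, rows):
        # INSERT/COPY: just write a new file
        self._new_file(rows)

    def scan(self):
        # TableScan attaches an implicit row_id = (file_id, offset)
        for fid in sorted(self.catalog):
            for off, row in enumerate(self.files[fid]):
                yield (fid, off), row

    def delete(self, predicate):
        # 1. collect the row_ids of matching rows
        marked = {rid for rid, row in self.scan() if predicate(row)}
        # 2. rewrite each affected file: keep survivors, drop the rest
        for fid in {f for f, _ in marked}:
            survivors = [row for off, row in enumerate(self.files[fid])
                         if (fid, off) not in marked]
            self.catalog.remove(fid)       # drop old file from metadata
            if survivors:
                self._new_file(survivors)  # write the replacement file

t = ColumnstoreTable()
t.insert([1, 2, 3])
t.insert([4, 5])
t.delete(lambda x: x % 2 == 0)
print(sorted(row for _, row in t.scan()))  # -> [1, 3, 5]
```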
(9/11)
Now on to the query lifetime.
Postgres parses & plans, detects columnstore tables, routes to DuckDB, then streams results back via pg_duckdb.
We also added a DuckDB storage extension to interact with our storage format.
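A toy model of that routing step: Postgres plans the query, and if any relation it touches is a columnstore table, execution is handed to DuckDB; otherwise Postgres runs it natively. The catalog sets and word-scan "parser" below are stand-ins, not pg_duckdb's real API:

```python
# Assumed catalog lookup: which tables are columnstore vs. row-store.
COLUMNSTORE = {"events", "metrics"}
ROWSTORE = {"users", "orders"}

def referenced_tables(sql: str) -> set:
    # Real Postgres walks the parse tree; a word scan is enough for a toy.
    words = set(sql.lower().replace(",", " ").split())
    return words & (COLUMNSTORE | ROWSTORE)

def route(sql: str) -> str:
    if referenced_tables(sql) & COLUMNSTORE:
        return "duckdb"    # vectorized execution, results streamed back
    return "postgres"      # regular row-store path

print(route("SELECT count(*) FROM events"))       # -> duckdb
print(route("SELECT * FROM users WHERE id = 1"))  # -> postgres
```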
(8/11)
Table metadata is stored in Postgres for transactional consistency.
Both DuckDB and Postgres are highly extensible. Even their native tables are implemented using the same mechanisms, enabling third-party tables to achieve the same functionality with minimal overhead.
(7/11)
INSERT and COPY create new Parquet files; UPDATE and DELETE remove existing Parquet files + create new ones.
pg_mooncake is a table access method (TAM) for the columnstore table interface within Postgres + a storage extension for Parquet files within DuckDB.
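In toy form, the dual interface looks like one shared set of files behind two facades: writes arrive through the Postgres-side TAM, scans run through the DuckDB-side storage extension. All names here are ours, not real APIs:

```python
# One shared list stands in for Parquet files on object storage.
shared_files = []

class PostgresTAM:
    """Write path: DML arrives through Postgres."""
    def insert(self, rows):
        shared_files.append(list(rows))   # INSERT/COPY -> new file

class DuckDBStorageExtension:
    """Read path: scans execute inside DuckDB."""
    def scan(self):
        for f in shared_files:
            yield from f

PostgresTAM().insert([10, 20])
PostgresTAM().insert([30])
print(list(DuckDBStorageExtension().scan()))  # -> [10, 20, 30]
```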
(6/11)
Now the fun stuff: storage format.
While proprietary formats are better for perf, an open format allows sharing across engines.
We write data as Parquet & table metadata in the open formats Iceberg and Delta Lake.
A mooncake table is directly readable outside of Postgres.
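To see why an open metadata format buys that external readability, here's a grossly simplified Delta-Lake-style commit log in pure Python: each commit lists "add"/"remove" actions for Parquet files, so any engine that understands the log can find the table's live files without asking Postgres. Field names loosely follow the Delta protocol; everything else is illustrative:

```python
import json

# One entry per commit; on disk these would be _delta_log/NNNN.json
# files of newline-delimited JSON actions.
log = []

def commit(adds, removes=()):
    actions = [{"add": {"path": p, "dataChange": True}} for p in adds]
    actions += [{"remove": {"path": p, "dataChange": True}} for p in removes]
    log.append([json.dumps(a) for a in actions])

def live_files():
    # Any external reader replays the log to reconstruct the file set.
    files = set()
    for entry in log:
        for line in entry:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            if "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)

commit(adds=["part-000.parquet"])               # INSERT
commit(adds=["part-001.parquet"])               # INSERT
commit(adds=["part-002.parquet"],               # UPDATE rewrites an
       removes=["part-000.parquet"])            # affected file
print(live_files())  # -> ['part-001.parquet', 'part-002.parquet']
```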
(5/11)
pg_mooncake brings a columnstore table with both storage and compute that can leverage the format. It supports transactional inserts, updates, and deletes, just like regular Postgres tables.
(4/11)
Recently, we've seen a trend to embed DuckDB inside Postgres.
Extensions like pg_duckdb and pg_analytics are designed for querying external columnar storage in Postgres, but they can't write out data in those formats via transactional inserts/updates/deletes.
(3/11)
Enter DuckDB 🦆
For the unfamiliar, think of DuckDB as SQLite for analytics: it can be embedded into other processes; in our case, Postgres.
It's fast, can scale beyond a single node, and its syntax follows Postgres closely.
(2/11)
There has been little success with 'analytics in pg'.
Analytics require columnar storage, vectorized execution, late materialization: things pg isn't designed for.
Projects like Citus & Timescale added columnar storage, but there was no good vectorized execution engine.
🧵 (1/11): This is our first post on BlueSky. And no better way to start than a technical deep dive on how we shipped pg_mooncake in 60 days:
pg_mooncake brings columnstore tables + DuckDB execution in Postgres for fast analytics.
This is going to be a long one 🍿🥮