Mooncake Labs's Avatar

Mooncake Labs

@mooncakelabs

the simple lakehouse.

19
Followers
28
Following
16
Posts
04.11.2024
Joined
Posts Following

Latest posts by Mooncake Labs @mooncakelabs

tldr: yes

22.01.2025 05:18 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
GitHub - Mooncake-Labs/pg_mooncake: Iceberg/Delta Columnstore Table in Postgres Iceberg/Delta Columnstore Table in Postgres. Contribute to Mooncake-Labs/pg_mooncake development by creating an account on GitHub.

github.com/Mooncake-Lab... :)

12.12.2024 19:07 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Preview
Rox Platform Architecture | ROX Docs

amongst all the excitement around rox one thing stood out to us.

Their lakehouse was a big part of their secret sauce.
docs.rox.com/development/...

Lakehouses will become part of mainstream AI and agents. It's no longer just a 'big data' phenomenon.

things are just starting to get fun :)

20.11.2024 21:10 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

We'd actually want to take a step further. Data engineering && science will converge.

That's our job.

19.11.2024 01:18 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

great read! And we certainly agree with Chris.

The lake (+ ELT) is the center of a lot of this. It's why we're so excited about bringing it to every app && dev.

19.11.2024 01:15 ๐Ÿ‘ 2 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

(11/11)
We will continue to make pg_mooncake production-ready. Features like compaction, secondary index and constraints, partitioned tables are on our roadmap.
Mooncake ๐Ÿฅฎ will build the lakehouse loved by developers. pg_mooncake is a start in this direction.

07.11.2024 19:09 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(10/11)
Transactional updates/deletes:
An implicit row_id column is added during TableScan.
The delete & update operator tracks all the row_ids marked, reads rows from affected Parquet files & write new Parquet files.
All the table metadata are stored inside a Postgres table

07.11.2024 19:09 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(9/11)
Now on Query lifetime.
Postgres parses & plans, detects columnstore tables, routes to DuckDB, then streams results back via pg_duckdb.
We also added a DuckDB storage extension to interact with our storage format.

07.11.2024 19:09 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(8/11)
Table metadata is stored in Postgres for transactional consistency.
Both DuckDB and Postgres are highly extensible. Even their native tables are implemented using the same mechanisms, enabling third-party tables to achieve the same functionality with minimal overhead.

07.11.2024 19:09 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(7/11)
INSERT and COPY will create new Parquet files, UPDATE and DELETE will remove existing Parquet files + create new ones
pg_mooncake is a TAM for columnstore table interface within postgres + a storage extension for Parquet files within DuckDB.

07.11.2024 19:09 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(6/11)
Now the fun stuff., storage format.
While proprietary formats are better for perf, an open format allows sharing across engines.
We write Parquet & table metadata in open formats Iceberg and Delta Lake.
A mooncake table is directly readable outside of Postgres.

07.11.2024 19:08 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(5/11)
pg_mooncake brings a columnstore table with both storage and compute that can leverage the format. It supports transactional inserts, updates, and deletes, just like regular Postgres tables.

07.11.2024 19:07 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(4/11)
Recently, weโ€™ve seen a trend to embed DuckDB inside Postgres.
Extensions like pg_duckdb and pg_analytics are designed for querying external columnar storage in Postgres, but they canโ€™t write out data in those formats via transactional insert/update/deletes.

07.11.2024 19:07 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(3/11)
Enter DuckDB ๐Ÿฆ†
For the unfamiliar, think of DuckDB as SQLite for analytics. It can be embedded into processes, in our case, Postgres.
Itโ€™s fast, can scale beyond a single node, and its syntax follows Postgres closely.

07.11.2024 19:07 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

(2/11):
There has been little success of 'analytics in pg'.
Analytics require columnar storage, vectorized execution, late materialization โ€“โ€“ things pg isnโ€™t designed for.
Projects like Citus & Timescale added columnar storage, but there was no good vectorized execution engine.

07.11.2024 19:07 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

๐Ÿงต (1/11): This is our first post on BlueSky. And no better way than a technical deepdive on how we shipped pg_mooncake in 60 days:

pg_mooncake brings columnstore tables + DuckDB execution in Postgres for fast analytics.
This is going to be a long one ๐Ÿฟ๐Ÿฅฎ

07.11.2024 19:06 ๐Ÿ‘ 6 ๐Ÿ” 0 ๐Ÿ’ฌ 10 ๐Ÿ“Œ 0