tldr: yes
Amongst all the excitement around Rox, one thing stood out to us.
Their lakehouse was a big part of their secret sauce.
docs.rox.com/development/...
Lakehouses will become part of mainstream AI and agents. It's no longer just a 'big data' phenomenon.
things are just starting to get fun :)
We'd actually take it a step further: data engineering && science will converge.
That's our job.
great read! And we certainly agree with Chris.
The lake (+ ELT) is the center of a lot of this. It's why we're so excited about bringing it to every app && dev.
(11/11)
We will continue to make pg_mooncake production-ready. Features like compaction, secondary indexes and constraints, and partitioned tables are on our roadmap.
Mooncake 🥮 will build the lakehouse loved by developers. pg_mooncake is a start in this direction.
(10/11)
Transactional updates/deletes:
An implicit row_id column is added during TableScan.
The delete & update operators track the marked row_ids, read rows from the affected Parquet files, & write new Parquet files.
All the table metadata is stored inside a Postgres table.
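A minimal sketch of that copy-on-write mechanism in Python, with in-memory tuples standing in for immutable Parquet files and a set standing in for the Postgres metadata table. All class and method names here are ours, purely illustrative:

```python
# Copy-on-write deletes: each "Parquet file" is an immutable tuple of
# rows; the catalog (the Postgres metadata table in the real system)
# tracks which files are live. An UPDATE would be the same rewrite,
# plus inserting the new row versions.

class ColumnstoreTable:
    def __init__(self):
        self.files = {}        # file_id -> immutable tuple of rows
        self.catalog = set()   # live file_ids (metadata in Postgres)
        self.next_file_id = 0

    def _new_file(self, rows):
        fid = self.next_file_id
        self.next_file_id += 1
        self.files[fid] = tuple(rows)
        self.catalog.add(fid)

    def insert(self, rows):
        # INSERT/COPY: just write a new file
        self._new_file(rows)

    def scan(self):
        # TableScan attaches an implicit row_id = (file_id, offset)
        for fid in sorted(self.catalog):
            for off, row in enumerate(self.files[fid]):
                yield (fid, off), row

    def delete(self, predicate):
        # 1. collect the row_ids of matching rows
        marked = {rid for rid, row in self.scan() if predicate(row)}
        # 2. rewrite each affected file: keep survivors, drop the rest
        for fid in {f for f, _ in marked}:
            survivors = [row for off, row in enumerate(self.files[fid])
                         if (fid, off) not in marked]
            self.catalog.remove(fid)       # drop old file from metadata
            if survivors:
                self._new_file(survivors)  # write the replacement file

t = ColumnstoreTable()
t.insert([1, 2, 3])
t.insert([4, 5])
t.delete(lambda x: x % 2 == 0)
print(sorted(row for _, row in t.scan()))  # -> [1, 3, 5]
```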
(9/11)
Now on to the query lifetime.
Postgres parses & plans, detects columnstore tables, routes to DuckDB, then streams results back via pg_duckdb.
We also added a DuckDB storage extension to interact with our storage format.
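A toy model of that routing step: Postgres plans the query, and if any relation it touches is a columnstore table, execution is handed to DuckDB; otherwise Postgres runs it natively. The catalog sets and word-scan "parser" below are stand-ins, not pg_duckdb's real API:

```python
# Assumed catalog lookup: which tables are columnstore vs. row-store.
COLUMNSTORE = {"events", "metrics"}
ROWSTORE = {"users", "orders"}

def referenced_tables(sql: str) -> set:
    # Real Postgres walks the parse tree; a word scan is enough for a toy.
    words = set(sql.lower().replace(",", " ").split())
    return words & (COLUMNSTORE | ROWSTORE)

def route(sql: str) -> str:
    if referenced_tables(sql) & COLUMNSTORE:
        return "duckdb"    # vectorized execution, results streamed back
    return "postgres"      # regular row-store path

print(route("SELECT count(*) FROM events"))       # -> duckdb
print(route("SELECT * FROM users WHERE id = 1"))  # -> postgres
```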
(8/11)
Table metadata is stored in Postgres for transactional consistency.
Both DuckDB and Postgres are highly extensible. Even their native tables are implemented using the same mechanisms, enabling third-party tables to achieve the same functionality with minimal overhead.
(7/11)
INSERT and COPY create new Parquet files; UPDATE and DELETE remove existing Parquet files + create new ones.
pg_mooncake is a table access method (TAM) for the columnstore table interface within Postgres + a storage extension for Parquet files within DuckDB.
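In toy form, the dual interface looks like one shared set of files behind two facades: writes arrive through the Postgres-side TAM, scans run through the DuckDB-side storage extension. All names here are ours, not real APIs:

```python
# One shared list stands in for Parquet files on object storage.
shared_files = []

class PostgresTAM:
    """Write path: DML arrives through Postgres."""
    def insert(self, rows):
        shared_files.append(list(rows))   # INSERT/COPY -> new file

class DuckDBStorageExtension:
    """Read path: scans execute inside DuckDB."""
    def scan(self):
        for f in shared_files:
            yield from f

PostgresTAM().insert([10, 20])
PostgresTAM().insert([30])
print(list(DuckDBStorageExtension().scan()))  # -> [10, 20, 30]
```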
(6/11)
Now the fun stuff: storage format.
While proprietary formats are better for perf, an open format allows sharing across engines.
We write data as Parquet & table metadata in the open formats Iceberg and Delta Lake.
A mooncake table is directly readable outside of Postgres.
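To see why an open metadata format buys that external readability, here's a grossly simplified Delta-Lake-style commit log in pure Python: each commit lists "add"/"remove" actions for Parquet files, so any engine that understands the log can find the table's live files without asking Postgres. Field names loosely follow the Delta protocol; everything else is illustrative:

```python
import json

# One entry per commit; on disk these would be _delta_log/NNNN.json
# files of newline-delimited JSON actions.
log = []

def commit(adds, removes=()):
    actions = [{"add": {"path": p, "dataChange": True}} for p in adds]
    actions += [{"remove": {"path": p, "dataChange": True}} for p in removes]
    log.append([json.dumps(a) for a in actions])

def live_files():
    # Any external reader replays the log to reconstruct the file set.
    files = set()
    for entry in log:
        for line in entry:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            if "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)

commit(adds=["part-000.parquet"])               # INSERT
commit(adds=["part-001.parquet"])               # INSERT
commit(adds=["part-002.parquet"],               # UPDATE rewrites an
       removes=["part-000.parquet"])            # affected file
print(live_files())  # -> ['part-001.parquet', 'part-002.parquet']
```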
(5/11)
pg_mooncake brings a columnstore table with both storage and compute that can leverage the format. It supports transactional inserts, updates, and deletes, just like regular Postgres tables.
(4/11)
Recently, we've seen a trend to embed DuckDB inside Postgres.
Extensions like pg_duckdb and pg_analytics are designed for querying external columnar storage in Postgres, but they can't write out data in those formats via transactional inserts/updates/deletes.
(3/11)
Enter DuckDB 🦆
For the unfamiliar, think of DuckDB as SQLite for analytics: it can be embedded into other processes; in our case, Postgres.
It's fast, can scale beyond a single node, and its syntax follows Postgres closely.
(2/11)
There has been little success with 'analytics in pg'.
Analytics require columnar storage, vectorized execution, late materialization: things pg isn't designed for.
Projects like Citus & Timescale added columnar storage, but there was no good vectorized execution engine.
🧵 (1/11): This is our first post on BlueSky. And no better way to start than a technical deep dive on how we shipped pg_mooncake in 60 days:
pg_mooncake brings columnstore tables + DuckDB execution in Postgres for fast analytics.
This is going to be a long one 🍿🥮