Micah Wylde (@micahw.com)

S2 is incredibly cool, and now you can run it yourself!

21.01.2026 19:19 👍 10 🔁 2 💬 0 📌 0

I built a tool called LinkedOut to solve the "links in the air" problem during my talks. It’s a serverless data lake built on the @cloudflare.social Data Platform.

Ingest: Pipelines
Store: R2 + Apache Iceberg
Security: Access

Real-time analytics with zero egress fees. Full build video below.

21.01.2026 00:46 👍 2 🔁 1 💬 2 📌 0

It’s so dumb and also so good, just a big dude doing good in the world

13.10.2025 05:24 👍 5 🔁 0 💬 0 📌 0

That’s correct, the r2 sql project predated the arroyo acquisition, but we’ll be converging over time. Also honored to be mentioned in the same post as the possible dbt acquisition! Ours was…a bit smaller.

03.10.2025 05:08 👍 2 🔁 0 💬 0 📌 0

Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare The Cloudflare Data Platform, launching today, is a fully-managed suite of products for ingesting, transforming, storing, and querying analytical data, built on Apache Iceberg and R2 storage.

Another brand-new feature is the R2 Data Catalog: blog.cloudflare.com/cloudflare-d...

Build something with Pipelines and R2 SQL. I suggest receiving OpenTelemetry data and then surfacing that in a web app (logs should be fairly straightforward), but there are tons of uses for this.

27.09.2025 02:13 👍 2 🔁 2 💬 1 📌 0

Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare The Cloudflare Data Platform, launching today, is a fully-managed suite of products for ingesting, transforming, storing, and querying analytical data, built on Apache Iceberg and R2 storage.

It's early, but I'm excited about the direction that the Cloudflare Data Platform is taking. Setting up similar pipelines on other clouds would typically be $$$ and take tons of expertise: managing Kafka and multiple services for ingestion, compaction, etc. blog.cloudflare.com/cloudflare-d...

25.09.2025 15:40 👍 8 🔁 1 💬 3 📌 0

The news is finally out! Cloudflare has a Data Platform! We're starting with serverless streaming pipelines (powered by arroyo), a managed Iceberg Catalog, and a new distributed SQL engine built on top of DataFusion

25.09.2025 16:44 👍 12 🔁 0 💬 0 📌 0

Sequin Postgres change data capture (CDC) to Kafka, SQS, webhooks, and more. Build real-time data replication, event workflows, and audit logging. Fast and easy setup.

Sequin (sequinstream.com) is doing this, but focused on Postgres

02.08.2025 20:26 👍 5 🔁 0 💬 1 📌 0

Oh wow, you’re totally right. Saw the repo but didn’t actually look inside.

25.06.2025 17:13 👍 0 🔁 0 💬 1 📌 0

GitHub - firebolt-db/firebolt-core: Firebolt Core is a free, self-hosted edition of Firebolt's distributed query engine (https://www.firebolt.io/); it provides high-performance data warehousing capabi... Firebolt Core is a free, self-hosted edition of Firebolt's distributed query engine (https://www.firebolt.io/); it provides high-performance data warehousing capabilities that can be deployed a...

Firebolt at least is source-available: github.com/firebolt-db/...

25.06.2025 15:17 👍 0 🔁 0 💬 1 📌 0

SF Apache DataFusion Meetup · Luma Join us for an evening of learning, networking, and diving into Apache DataFusion, the blazing-fast query execution framework for Rust-based data…

Reminder: San Francisco @ApacheDataFusio meetup tomorrow: lu.ma/uuxd443e

09.06.2025 03:03 👍 3 🔁 1 💬 0 📌 0

Cloudflare is at Snowflake Summit in San Francisco this week!

Swing by our booth 2605 to chat about the new Cloudflare R2 Data Catalog and how it can make your data management and analytics easier!

04.06.2025 20:46 👍 8 🔁 2 💬 0 📌 0

I want that! We have completely separate code for object store and local filesystem, even though the latter is really only used for testing and dev.

01.06.2025 03:43 👍 1 🔁 0 💬 0 📌 0

Absolutely! Parquet and iceberg support are coming, and we’ll consider Ducklake support if it starts getting traction.

30.05.2025 21:37 👍 4 🔁 0 💬 1 📌 0

Modern Data w/ Cloudflare + Friends · Luma Come talk about modern data formats, streaming ingestion, query engines and how you feel about Iceberg at Cloudflare's HQ. We'll be running a series of…

Next Monday after the Snowflake Summit keynote! Hang out on our beautiful roof with other cool data folks, and hear some great speakers from LanceDB, @mooncakelabs.bsky.social, Eventual, Marimo, Bobsled, and @cloudflare-dev.bsky.social!

lu.ma/dbq1hfij

27.05.2025 20:21 👍 1 🔁 0 💬 0 📌 0

A message about Earthly In the next three months, we will be phasing out our Earthly Satellite commercial services, including the Earthly Cloud Satellites, Self-Hosted Sat...

Better CI is a tough business… Earthly couldn’t make it work despite building a great product earthly.dev/blog/shuttin...

03.05.2025 00:50 👍 4 🔁 0 💬 1 📌 0

Everything You Need to Know About Incremental View Maintenance An overview of incremental view maintenance, why it’s useful, and how you can implement it.

Ok, y'all. This took me several weeks and a ton of help from @frankmcsherry.bsky.social and @lalithsuresh.bsky.social. I dug into timely dataflow, differential dataflow, and DBSP to get you up to speed on IVM engines and materialized views. Enjoy!

18.04.2025 18:30 👍 76 🔁 18 💬 4 📌 3
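
For readers new to the idea in the post above, here is a minimal sketch of incremental view maintenance for a single `COUNT(*) ... GROUP BY key` view. It is not taken from the linked article and is far simpler than timely dataflow, differential dataflow, or DBSP; the class name `CountByKeyView` and the `(key, weight)` delta format are assumptions made for illustration. The point is only that the view is updated from changes (+1 for an insert, -1 for a delete) instead of being recomputed from the full input.

```python
from collections import Counter


class CountByKeyView:
    """Hypothetical materialized view: SELECT key, COUNT(*) GROUP BY key."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, delta):
        # Each delta entry is (key, weight): +1 for an insert, -1 for a delete.
        for key, weight in delta:
            self.counts[key] += weight
            if self.counts[key] == 0:
                # Drop empty groups so the view matches a fresh recomputation.
                del self.counts[key]

    def snapshot(self):
        return dict(self.counts)


view = CountByKeyView()
view.apply([("checkout", +1), ("login", +1), ("login", +1)])
view.apply([("login", -1), ("search", +1)])
print(view.snapshot())
```

Each call to `apply` costs time proportional to the size of the change, not the size of the underlying data, which is the property that makes this approach attractive for materialized views.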

I’m only a week into life at @cloudflare-dev.bsky.social but already amazed by how much of Cloudflare is built _on_ Cloudflare. I’d never have guessed you could get so far with just workers + durable objects!

16.04.2025 15:02 👍 1 🔁 0 💬 0 📌 0

Arroyo is joining Cloudflare Arroyo has been acquired by Cloudflare to bring serverless SQL stream processing to the Cloudflare Developer Platform, integrated with Queues, Workers, and R2. The Arroyo Engine will remain open-sour...

Arroyo is joining @cloudflare.social! We're bringing Arroyo to the Developer Platform as a serverless stream processing system, and will also remain open-source and self-hostable. www.arroyo.dev/blog/arroyo-...

10.04.2025 15:05 👍 18 🔁 4 💬 2 📌 0

Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines We’ve just shipped our new streaming ingestion service, Pipelines — and we’ve acquired Arroyo, enabling us to bring new SQL-based, stateful transformations to Pipelines and R2.

Couple of big announcements from @cloudflare.social today for folk in #dataBS:

* Acquisition of Arroyo, launch of Pipelines for streaming ingestion: blog.cloudflare.com/cloudflare-a...
* Launch of R2 Data Catalog—a managed Apache Iceberg catalog for R2 blog.cloudflare.com/r2-data-cata...

10.04.2025 14:50 👍 9 🔁 3 💬 0 📌 0

Announcing Arroyo 0.14.0 Arroyo 0.14 is now available! This release introduces support for lookup joins, more powerful updating SQL, new syntax, structs in DDL, and more!

Arroyo 0.14.0 is now available, including new lookup joins, support for nested updating aggregates, struct types, new syntax, and a bunch of improvements and fixes: www.arroyo.dev/blog/arroyo-...

26.03.2025 16:59 👍 1 🔁 0 💬 0 📌 0

The Trump Administration Accidentally Texted Me Its War Plans U.S. national-security leaders included me in a group chat about upcoming military strikes in Yemen. I didn’t think it could be real. Then the bombs started falling.

I know by month 2 we're all inured to this stuff, but this is a beyond crazy mix of incompetence and illegality www.theatlantic.com/politics/arc...

24.03.2025 17:48 👍 0 🔁 0 💬 0 📌 0

SCO didn’t really turn evil, they were bought by Caldera which rebranded to SCO

19.03.2025 18:21 👍 0 🔁 0 💬 0 📌 0

With checkpoints slatedb is basically a streaming state backend in a box. Wish this had already existed when we started arroyo!

17.03.2025 20:37 👍 6 🔁 1 💬 0 📌 0

Amazing!

01.03.2025 00:08 👍 3 🔁 0 💬 0 📌 0

Arroyo is sitting at 3,999 stars... who's going to put us over the top? github.com/ArroyoSystem...

01.03.2025 00:05 👍 3 🔁 0 💬 1 📌 0

I'll use a Python Jupyter notebook with DuckDB. You can convert results to a pandas dataframe then plot with matplotlib. ChatGPT is very good at writing the gluey Python bits.

27.02.2025 17:45 👍 2 🔁 0 💬 0 📌 0

Fast columnar JSON decoding with arrow-rs JSON is the most common serialization format used in streaming pipelines, so it pays to be able to deserialize it fast. This post covers in detail how the arrow-json library works to perform very effi...

You'd think that the key to being a fast streaming engine is like clever join algorithms, but it's mostly just being really good at JSON. Arroyo uses Arrow and the arrow-rs JSON decoder along with some streaming extensions. I think it's pretty cool, so I wrote up a long explanation of how it works

25.02.2025 17:41 👍 14 🔁 1 💬 0 📌 0
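
To make "columnar JSON decoding" in the post above concrete, here is a rough sketch of the target shape: newline-delimited JSON rows go in, one array per field comes out. This is only an illustration using Python's standard `json` module; it does not reproduce how arrow-rs or Arroyo decode JSON, and the function name `decode_columnar` and the sample records are hypothetical.

```python
import json


def decode_columnar(lines, fields):
    """Decode JSON rows into one list per requested field (hypothetical helper)."""
    columns = {name: [] for name in fields}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between records
        row = json.loads(line)
        for name in fields:
            # Missing fields become None so every column keeps the same length.
            columns[name].append(row.get(name))
    return columns


records = [
    '{"user": "a", "latency_ms": 12}',
    '{"user": "b", "latency_ms": 7, "region": "us"}',
    '{"user": "c"}',
]
cols = decode_columnar(records, ["user", "latency_ms"])
print(cols)
```

A real columnar decoder avoids building a per-row dictionary at all and writes values directly into typed arrays; this sketch only shows the row-to-column reshaping that such decoders perform.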

It combines a bunch of great services and tools to provide sub-minute-latency querying at a very low cost, including

* Redpanda serverless (Log storage)
* S3 (Object storage)
* Arroyo
* DuckDB

It went so well it felt worth documenting the process for other folks

23.01.2025 17:46 👍 1 🔁 0 💬 0 📌 0

Building a near-real-time data lake with the LOAD stack The LOAD stack (log storage/object storage/Arroyo/DuckDB) makes it easy to build an affordable real-time data lake with minimal operational overhead. This tutorial will guide you through the process o...

Our team at Arroyo recently needed to rebuild our (very ad-hoc) analytics infra to account for our growth. We spent some time working out the best way to set up a near-real-time data lake today, and ended up with a pretty sweet approach we're calling the LOAD stack: www.arroyo.dev/blog/buildin...

23.01.2025 17:43 👍 7 🔁 2 💬 1 📌 0
