Abhay Bothra (@swe.dev)

King to c7?

20.12.2024 16:22 👍 2 🔁 0 💬 0 📌 0

Caveat: Some of these could be unique to Fennel’s architecture because of our reliance on Kafka for exactly-once semantics and recovery.

06.12.2024 05:43 👍 0 🔁 0 💬 0 📌 0

Why use large batches at all? To amortize the cost of Kafka transactions, which we rely on for exactly-once semantics.
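
A rough way to see the amortization argument: if each transaction carries a fixed overhead, the per-record cost falls as the batch grows. The sketch below is illustrative only. The function name and both cost figures are hypothetical, not measurements from Fennel or Kafka.

```python
def per_record_cost_ms(batch_size: int, txn_overhead_ms: float, per_record_ms: float) -> float:
    """Average cost per record when one transaction wraps the whole batch.

    The fixed transaction overhead is paid once per batch, so it is spread
    across batch_size records; the per-record work is paid every time.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return txn_overhead_ms / batch_size + per_record_ms


# Hypothetical numbers, chosen only to show the shape of the curve.
TXN_OVERHEAD_MS = 20.0   # assumed fixed cost of one transaction commit
PER_RECORD_MS = 0.01     # assumed cost of processing one record

for size in (100, 1_000, 10_000):
    cost = per_record_cost_ms(size, TXN_OVERHEAD_MS, PER_RECORD_MS)
    print(f"batch={size:>6}  per-record cost = {cost:.4f} ms")
```

Under these assumed figures the transaction overhead dominates at small batch sizes and becomes negligible at large ones, which is the trade-off described above.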

06.12.2024 05:43 👍 0 🔁 0 💬 1 📌 0

The latter also keeps memory utilization proportional to mini-batch size.

06.12.2024 05:43 👍 0 🔁 0 💬 1 📌 0

We got around that by internally sharding each batch of records and processing the sub-shards in parallel.
We also break our batches down into mini-batches so the output of the chain can be streamed to Kafka without waiting for the full batch to finish executing.
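
A minimal sketch of the two ideas, not Fennel's actual implementation (which is in Rust): the batch is cut into mini-batches, each mini-batch is sharded and run through the operator chain in parallel, and results are yielded as soon as each mini-batch finishes. All names here (`shard`, `mini_batches`, `run_chain`, `process_batch`) are hypothetical, records are plain integers, and the order of sharding versus mini-batching is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

Operator = Callable[[List[int]], List[int]]


def shard(records: Sequence[int], num_shards: int) -> List[List[int]]:
    """Assign each record to a sub-shard by hash."""
    shards: List[List[int]] = [[] for _ in range(num_shards)]
    for record in records:
        shards[hash(record) % num_shards].append(record)
    return shards


def mini_batches(batch: Sequence[int], size: int) -> Iterator[List[int]]:
    """Cut a batch into consecutive mini-batches of at most `size` records."""
    for start in range(0, len(batch), size):
        yield list(batch[start:start + size])


def run_chain(records: List[int], operators: Sequence[Operator]) -> List[int]:
    """Apply the operator chain to one sub-shard, in order."""
    for op in operators:
        records = op(records)
    return records


def process_batch(
    batch: Sequence[int],
    operators: Sequence[Operator],
    num_shards: int = 4,
    mini_batch_size: int = 3,
) -> Iterator[List[int]]:
    """Yield chain output one mini-batch at a time instead of once per batch.

    Only one mini-batch is in flight at a time, so intermediate state is
    bounded by the mini-batch size rather than the full batch size.
    """
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        for mb in mini_batches(batch, mini_batch_size):
            sub_shards = [s for s in shard(mb, num_shards) if s]
            results = pool.map(lambda s: run_chain(s, operators), sub_shards)
            # A real pipeline would write this chunk to Kafka here.
            yield [record for part in results for record in part]


double: Operator = lambda xs: [x * 2 for x in xs]
keep_multiples_of_4: Operator = lambda xs: [x for x in xs if x % 4 == 0]

chunks = list(process_batch(list(range(10)), [double, keep_multiples_of_4]))
flattened = sorted(record for chunk in chunks for record in chunk)
```

Sharding reorders records within a mini-batch, so this sketch makes no ordering guarantee; a consumer that needs order would have to shard by key and preserve per-key order.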

06.12.2024 05:43 👍 0 🔁 0 💬 1 📌 0

Cons: This architecture prevents concurrent, fully async operation of all operators, because each batch has to be processed in full by the operator chain before the next batch can start. That, in turn, kept us from running at full throttle even when CPU capacity was available.

06.12.2024 05:43 👍 0 🔁 0 💬 1 📌 0

Great thread from @micahw.com. Adding some of our own learnings from building this in Fennel.

An additional advantage for us was that it allowed us to keep data in columnar format for longer instead of converting back-and-forth between operators for serialization.

06.12.2024 05:43 👍 3 🔁 0 💬 1 📌 0

In hindsight, what would the right API for this look like?

27.11.2024 20:29 👍 1 🔁 0 💬 1 📌 0

Yes, I think they do this so that the ‘a’ region doesn’t become a hotspot. Was definitely surprising when I found out, but ultimately made sense.

27.11.2024 19:44 👍 3 🔁 0 💬 0 📌 0

Control Planes and the Death of the Cluster

The recent (few years) rediscovery of control plane architecture, largely due to the success of Kubernetes, is changing the way people think about distributed systems. Not many years ago, everything "...

Clusters are getting squeezed from above by smarter control planes, and from below by cheap and consistent object storage.

www.linkedin.com/pulse/contro...

24.11.2024 20:04 👍 21 🔁 5 💬 2 📌 0

It occupies a very interesting point in the design space of caches, but the fact that you can’t immediately read your own writes is a problem you still need to design for. I wonder if that is its undoing.
@jonhoo.eu might have more thoughts on this.

20.11.2024 16:45 👍 1 🔁 0 💬 1 📌 0

That was their implementation of Noria?

20.11.2024 08:10 👍 1 🔁 0 💬 1 📌 0

We’ve built an IVM engine at Fennel that allows Python UDFs by leveraging a fleet of Python workers for execution while keeping the other operators in Rust. Hope to write a lot more about the technical details soon. One problem that we’ve had to solve is providing IVM with time travel.

20.11.2024 07:59 👍 3 🔁 0 💬 0 📌 0

TIL AWS un-launched S3 Select[1] as of July 25, 2024, presumably in favor of S3 Object Lambda[2]. RIP PushdownDB (arxiv.org/abs/2002.0...).

[1]: aws.amazon.com/blogs...
[2]: aws.amazon.com/s3/fe...

18.11.2024 04:15 👍 10 🔁 2 💬 3 📌 0
