I’m keenly awaiting those blog posts, particularly around the architecture for achieving distribution and what it looks like for network communication and what gets stored and where, how much inter-AZ traffic etc.
Thanks for this, Rob, much appreciated.
1. restate.dev as an alternative to traditional stream processing frameworks for implementing distributed business logic owned by multiple teams.
2. Use of Rust to write very fast multi-threaded processing logic and using Python as the glue for wiring it up.
One use case I’ve seen is to have an LLM provide suggested improvements to input text and to score the text against several metrics, then to provide an overall sum of the individual scores. “Give me a weighted vector representation of text for later post processing” could be useful when docs >> 1.
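A rough sketch of the scoring side of that, assuming the LLM has already returned one numeric score per metric. The metric names, the `overall_score` helper and the default weight of 1.0 are all made up for illustration, not anything a specific tool provides.

```python
def overall_score(scores, weights=None):
    """Combine per-metric scores into one number.

    scores:  dict of metric name -> numeric score (hypothetical metrics).
    weights: optional dict of metric name -> weight; any metric without
             a weight counts with weight 1.0, so omitting weights gives
             the plain sum of the individual scores.
    """
    weights = weights or {}
    return sum(value * weights.get(name, 1.0) for name, value in scores.items())


# Example: three illustrative metrics scored by the model.
scores = {"clarity": 4.0, "tone": 3.0, "brevity": 5.0}
plain_total = overall_score(scores)
weighted_total = overall_score(scores, {"clarity": 2.0})
```

Keeping the per-metric scores around as a vector, and only collapsing them to a sum at the end, is what makes later post-processing across many docs possible.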
The current discourse on disaggregated compute and storage in data infra says local disk is bad. I think this lacks nuance. Local persistent disk (e.g. EBS) is bad. Ephemeral local disk should be embraced as a block cache for reads and staging of writes to durable storage. The cost/op is a win.
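To make that concrete, here is a minimal sketch of the pattern, assuming a plain dict stands in for the durable object store and a scratch directory stands in for the ephemeral disk. The `EphemeralDiskCache` class and its method names are hypothetical, and keys are assumed to be safe to use as file names.

```python
import os
import tempfile


class EphemeralDiskCache:
    """Read-through block cache plus write staging on local scratch disk."""

    def __init__(self, durable_store, cache_dir):
        self.durable = durable_store  # stand-in for durable object storage
        self.cache_dir = cache_dir    # ephemeral disk; contents may vanish
        self.staged = set()           # keys written locally, not yet durable
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.cache_dir, key)

    def read(self, key):
        path = self._path(key)
        if os.path.exists(path):      # cache hit: served from local disk
            with open(path, "rb") as f:
                return f.read()
        data = self.durable[key]      # cache miss: fetch from durable store
        with open(path, "wb") as f:   # populate the cache for next time
            f.write(data)
        return data

    def write(self, key, data):
        # Stage locally only; the write is NOT durable until flush().
        with open(self._path(key), "wb") as f:
            f.write(data)
        self.staged.add(key)

    def flush(self):
        # Upload staged blocks to durable storage, then clear the staging set.
        for key in sorted(self.staged):
            with open(self._path(key), "rb") as f:
                self.durable[key] = f.read()
        self.staged.clear()


store = {"block-0": b"hello"}
cache = EphemeralDiskCache(store, tempfile.mkdtemp())
first = cache.read("block-0")    # miss, then cached on local disk
cache.write("block-1", b"world")  # staged only
cache.flush()                     # now present in the durable store
```

The key design point is that losing the local disk only costs you cache warmth and any unflushed staged writes, so a write should not be acknowledged as durable before the flush.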
I made an infra engineer starter pack. Folks posting about databases, stream processing, durable execution, orchestrators, service meshes, and more.
go.bsky.app/SCZe42X
Agreed, it’s very painful. It feels a bit like the data model is broken: perhaps AWS is missing some sort of account-level default settings/permissions boundary that would auto-attach to newly created repos, so that pushing an image would auto-create the repo if it is absent.
By the way, what’s your writing app of choice, please?
I need a widget to cross-post easily, have given you a lengthy response on Twitter. TLDR: Stripe created Markdoc to offer this sort of capability. markdoc.dev
I’m interested in the choice of 2 relational DBs too. I’m guessing they wanted SQL queries against partitioned data for very large datasets (time series), which DuckDB doesn’t do, apparently: db-engines.com/en/system/Co... Do you know how Cockroach performs in this use case?