@itsjoenaso

1,503
Followers
151
Following
21
Posts
28.10.2024
Joined

Latest posts by @itsjoenaso

If anyone out there self-hosts @dagster.io on GCP, I would love to connect. Looking for advice/suggestions on avoiding any pitfalls!

03.03.2025 21:28 👍 0 🔁 0 💬 0 📌 0

Whoa this is a really interesting acquisition. This feels like old school vs new school

17.12.2024 16:01 👍 0 🔁 0 💬 1 📌 0

I've had fairly limited responses on other platforms - hoping this crew has some thoughts!

Anyone have experience dynamically building dbt dependencies within the graph? i.e. run the same pipeline with multiple params, but some runs require bespoke models that need to be swapped in for the default ones

17.12.2024 14:48 👍 0 🔁 0 💬 0 📌 0

For some reason this feels more difficult to reason about than LookML ownership/implementation but I'm not sure why...

22.11.2024 18:04 👍 0 🔁 0 💬 0 📌 0

For those of you using @cubedev.bsky.social (or any semantic layer), who "owns" the models? Are you finding that data engineers are doing that, the consumers (i.e. frontend devs), or someone in the middle?

Should these models be treated as an extension of the transformation layer?

22.11.2024 18:03 👍 1 🔁 0 💬 1 📌 0

Is anyone here a part of the Data Engineer Things slack? Does it still exist?

21.11.2024 15:55 👍 2 🔁 0 💬 0 📌 0

Airflow optimizations irk me.

The fact that inline imports are an anti-pattern in all other Python tooling but are an officially suggested method to improve your Airflow runtime is bonkers.

21.11.2024 15:53 👍 2 🔁 0 💬 0 📌 0
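
For anyone who hasn't run into the pattern from the post above, here's a rough sketch of what an inline import looks like. It's plain Python with no Airflow involved; `summarize` and the choice of `statistics` as the "heavy" module are just stand-ins for illustration. The idea, as I understand it, is that DAG files get re-parsed often, so anything imported at the top of the file is paid for on every parse, while an import inside the function only runs when the task itself runs.

```python
def summarize(values):
    # Inline (deferred) import: the module is loaded the first time this
    # function is called, not when the file that defines it is parsed.
    # `statistics` stands in for a genuinely heavy dependency here.
    import statistics

    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Defining `summarize` above did not trigger the import; calling it does.
result = summarize([1, 2, 3, 4])
print(result)
```

In most codebases a linter would ask you to move that import to the top of the file, which is exactly the tension the post is complaining about.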

Automate The Boring Stuff. That book changed the trajectory of my career, no question

19.11.2024 19:43 👍 1 🔁 0 💬 1 📌 0

I'd like to think the most effective people in each of those roles have been doing both for some time - the distinction existed more in marketing material, job postings and LinkedIn than in the real world for most.

Though I saw a job posting for "data analytics engineer" the other day 🤦

18.11.2024 21:45 👍 2 🔁 0 💬 1 📌 0

My nespresso had the exact opposite effect on me haha

15.11.2024 19:55 👍 2 🔁 0 💬 1 📌 0

This sounds cool. Are they all single servings per day?

15.11.2024 19:00 👍 0 🔁 0 💬 1 📌 0

Oh yeah I meant the general hate for it out in the wild, not from you specifically!

11.11.2024 16:16 👍 1 🔁 0 💬 0 📌 0

I’ve never understood the hate for the medallion architecture. It’s just a different set of terms for some generalized patterns

11.11.2024 14:58 👍 1 🔁 0 💬 1 📌 0
The yaml document from hell
As a data format, yaml is extremely complicated and it has many footguns. In this post I explain some of those pitfalls by means of an example, and I suggest a few simpler and safer yaml alternatives.

TIL about the Norway Problem

ruudvanasseldonk.com/2023/01/11/t...

05.11.2024 15:38 👍 0 🔁 0 💬 0 📌 0
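
A quick sketch of the Norway Problem from the post above, for anyone who hasn't seen it. This does not use a real YAML parser: `resolve_plain_scalar` is a made-up helper that only imitates how YAML 1.1 treats certain unquoted words as booleans, using the spellings listed in the YAML 1.1 boolean type. Real parsers differ in which of these spellings they actually honor, so treat it as an illustration rather than a description of any specific library.

```python
# Boolean spellings from the YAML 1.1 bool type. Individual parsers may
# implement only a subset of these.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_plain_scalar(text):
    """Hypothetical helper: mimic YAML 1.1 boolean resolution for an
    unquoted scalar. Everything else is returned as a plain string."""
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    return text


# A list of country codes written without quotes, as in a YAML file.
countries = ["se", "dk", "no"]
resolved = [resolve_plain_scalar(code) for code in countries]
print(resolved)  # ['se', 'dk', False]
```

Quoting the value in the YAML source ("no") keeps it a string, which is the usual workaround.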

Yes and no. I think Mongo has historically sold the idea of providing fast time to value, which the business loves. S3 as a data store is positioned as an ease of use thing but seldom consumed directly by end users.

“lake house” sounds cooler anyway ha

05.11.2024 14:15 👍 2 🔁 0 💬 1 📌 0

Seems too good to be true but I think we're all just jaded from the Twitter bots

05.11.2024 14:09 👍 0 🔁 0 💬 1 📌 0

I am just now discovering the convenience of VSCode linked to GitHub Codespaces and I'm slightly embarrassed that I didn't do this sooner

04.11.2024 20:10 👍 1 🔁 0 💬 0 📌 0

This whole feature set kind of terrifies me but it is pretty cool

04.11.2024 16:45 👍 2 🔁 0 💬 1 📌 0

Not there yet but that will be the eventual output.

This actually helped point me in the right direction. Took some digging but it requires a combination of fields (timestamp + log sequence number)

30.10.2024 18:12 👍 0 🔁 0 💬 1 📌 0
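
A rough sketch of the "combination of fields" idea from the post above: ordering change events by a (timestamp, log sequence number) pair and keeping only the newest one per row. The field names `id`, `source_ts`, and `lsn` are hypothetical and are not the actual Datastream metadata column names; the sample events are made up too.

```python
def latest_per_key(events):
    """Keep the most recent change event per primary key, ordering by
    (source_ts, lsn). The LSN breaks ties when two changes share a timestamp."""
    latest = {}
    for event in events:
        key = event["id"]
        order = (event["source_ts"], event["lsn"])
        current = latest.get(key)
        if current is None or order > (current["source_ts"], current["lsn"]):
            latest[key] = event
    return latest


# Hypothetical change events: two updates to row 1 share the same timestamp.
events = [
    {"id": 1, "source_ts": 1730300000, "lsn": 101, "status": "pending"},
    {"id": 1, "source_ts": 1730300000, "lsn": 105, "status": "shipped"},
    {"id": 2, "source_ts": 1730299990, "lsn": 99, "status": "pending"},
]

current_state = latest_per_key(events)
print(current_state[1]["status"])  # shipped
```

Timestamp alone would leave the two updates to row 1 in an ambiguous order, which is why the sequence number is part of the sort key.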

Anyone well versed in Google DataStream CDC?

I'm looking to stream Postgres changes to BigQuery; is there a way to get the timestamp at which BQ becomes aware of a row when writing changes to BQ directly?

The alternative is to write to GCS then ingest to BQ, but wondering if this is possible with direct to BQ

30.10.2024 15:05 👍 1 🔁 0 💬 2 📌 0

JSON support on Redshift is not nearly as friendly.

You can't as easily size up your compute on an ad-hoc basis, so the complexity of the workload matters.

Are you managing the cluster at all? There are semi-easy ways to tune performance, but it is also not fun if you don't know what you're doing.

29.10.2024 19:09 👍 5 🔁 0 💬 1 📌 0