How big should your data team be?
Data teams are often oversized. A company of 200 people rarely needs 15+ data staff; usually 5% of org size is enough.
dataactionmentor.com/knowledge-ba...
Try to find a non-traditional role that is better suited to a future where engineering is very cheap. If you have an idea, try to build it yourself. The experience of trying to found something is more valuable than employee experience now, and will be even more so in the coming years.
The amount you love someone is proportional to how often you Ghiblify their pictures.
This week I look at agents.
I think this is a new way to build where we don't intentionally build code-based software.
open.substack.com/pub/davidsj/...
BERT and ERNIE!
tracking.tldrnewsletter.com/CL0/https:%2...
I don't usually share photos of my family on social media for good reason, but I'm happy to share these ones!
This post encapsulates how I feel about the current state of LLMs and doomers etc. Really great read:
fly.io/blog/youre-a...
So when I've attended Snowflake Summit before, I've usually written a blog post talking about the new features released, etc. Is someone going to do that this year, given I didn't go?
#datasky #databs
It is possible to build machine learning systems which punch up instead of punching down.
Got a cool story about something in the data engineering space? You should 💯 submit it as a talk to Current 2025 in New Orleans
Do it! Now! CfP is open until 15th June.
sessionize.com/current-2025...
(Pro-tip: you only need an abstract at this point; writing the talk can come later)
#dataBS
This is genuinely one thing you can rely on AI for.
It was actually very impressive. Lots of stuff I want to try.
At the London Data Practitioners Meetup with @pedramnavid.com @jayatillake.bsky.social @rittmananalytics.bsky.social and the London Dagster community
I also think people don't use the tags as we have found each other. I almost exclusively use the popular with friends feed.
It's not, but you don't have to keep declaring CTEs. May be able to have partial queries too.
They're still here, just quieter than at the start. More of them, though.
Doctor's orders 🫡
I still think this is the biggest prize in AI. If Siri could actually do most things you do on a phone manually...
9to5mac.com/2025/04/22/s...
Haha yes but he fits the bill.
@petefein.bsky.social
I wonder what the limit difference between CSV and Parquet would be under real conditions, where most queries only need a tiny subset of large datasets. You could probably handle >petabyte datasets on that EC2 machine with good partitioning of Parquet or using Iceberg.
Well, if it works, the real engineers can tidy it up or more likely do nothing and talk about code standards.
Has anyone tried Llama 4 Maverick yet? How big a machine does it need to run locally?
@simonwillison.net
Looks like Nintendo became the best at console FPS.
Oh no! I've been enjoying Bluesky for the data stuff but can imagine that it's swung very radically left on other topics.
@windsurfai.bsky.social
I've seen many blog posts and social posts by these supposed true artisans saying that they tried this method, and the output was subpar.
Well, maybe it would have taken just as long if you had just written the code, but for the rest of us, we now have an option to build without you.
Once again, we've devised a derogatory name for something many of us are doing: "Vibe coding".
Just like "Citizen Data Scientist", "Excel Data Analyst", and many other terms made to belittle by the supposed true artisans that came before.
open.substack.com/pub/davidsj/...
yeah but was there coffee down there, and if so was it any good?