Oh, that's mean!
@arikf.net
Data Scientist @ Atlassian. Spent some time looking into software teams' effectiveness, and how software teams can use data to reflect on and improve how they work. These days I'm trying to learn more about Statistical Process Control.
I was tagged the other day by someone kindly sharing my Python Rgonomics post and realized that my thesis of "tooling keeps improving" held up too well and some of the recos have changed
Hence, here's the 2025 update: www.emilyriederer.com/post/py-rgo-...
Now feat. uv (vs pyenv, pdm) and Positron
Not a silly idea. Reminds me of this: arxiv.org/abs/1102.4374
Interviewer: can you explain this gap in your resume?
Data scientist: it's a confidence interval.
KNN + topic detection getting a big glow-up www.anthropic.com/research/clio
Converting the SQL code to its Python equivalent would have been just another route towards spending hours on this.
But yes, if it were only a single cell or a small amount of SQL code, I would have just done it all in Python and been done with it.
I tried passing it through the spark context with spark.conf.set() but the SQL cell didn't pick it up. Maybe I used the wrong commands?
Spent hours yesterday getting a SQL cell in a Databricks job to use a variable that was set in a prior Python cell. Amazing how something that seems so trivial can be so convoluted. Is that what "business impact" looks like?
Good. Now do the rest of the internet.
I heard that in at least some of the cases it was due to commercial and film rights reasons www.npr.org/2023/06/07/1...
Dark Matter is based on a book by Blake Crouch.
How did I just find this?
wiki.postgresql.org/wiki/Don%27t...
#databs
Great stuff, and an excuse to recommend the book Storytelling with Data by Cole Nussbaumer Knaflic for anyone who wants to learn more on this.
One of my favourite pieces on the role of the data analyst comes from @rdpeng.org , who in turn quotes Tukey's "The Future of Data Analysis" from 1962 (!)
simplystatistics.org/posts/2019-0...
#databs
Definitely a different setup from anything I've experienced. I think that team composition and targeted scope would still play a bigger role. Examples I've seen (www.synq.io/blog/data-te... or towardsdatascience.com/data-to-engi...) indicate a wide spread, so the context likely matters more.
Absolutely. Maybe we just have different notions of what data storytelling is? I see data storytelling as the competency of communicating effectively with data. That is different to using narratives as a substitute for data sense making.
Yes, I agree on that point, storytelling can be abused. But I don't think it means it can't be useful or applicable within a process control worldview.
Btw, from what I've seen in Donald Wheeler's books so far, I got the impression he'd classify top-down goal setting and metric tracking (which he calls "voice of the customer") as belonging to the Fantasy genre.
Storytelling is a communication competency. From my exposure to process control so far, its application can help you verify your stories are of the non-fiction variety.
I usually call it "data"
Adversarial and malicious are common terms in computer security.
one of the many things to like about atproto is that everything is authenticated and public
you don't have to trust bluesky, you can read from the relay yourself
it's not an API... more like a spinal tap into the central feed
and this is an amazing example of what you can do with that power
Another critical factor is their remit. Focusing attention on a few areas that can be served well is more sustainable than spreading a few data scientists thin to cover as much ground as possible. My experience was that being embedded in a team gives more leverage than providing a service.
Reference numbers can always provide some extra context, but I've seen that other factors can play a much more critical role. For example, how much support does DS get from other functions (data platform eng, data engineers, ...)? This can have a huge impact on how data scientists spend their time.
What do you mean by data team? Data scientists? Data platform engineers? Data engineers? Machine learning engineers? What's the org context you're referring to? Startup? Corporate? I have some considerations in mind from a data scientist perspective, but I'm an IC, so take this with a grain of salt.
Should we pay attention to how AI is changing our work? Sure! But staying on top of changes and evolutions in the field is part of the job, and AI is only one of many things driving this constant shift.
bsky.app/profile/apre...
The image shows a graph plotting "Quality of question" vs "Strength of evidence." It features three paths: an ideal but unrealistic path starting from high quality question, a path showing statistical/ML progression from a poor question to a poor question with a precise answer, and the "job of the data scientist" path that progresses along both question quality and strength of evidence.
As data scientists, our greatest impact isn't in delivering more precise answers - it's in improving the quality of the questions being asked. That's why I'm not worried about AI "taking our jobs," though it can definitely help us do them better.