Arik Friedman (@arikf.net)

Oh, that's mean! 🤔

14.10.2025 12:03 👍 1 🔁 0 💬 0 📌 0

Python Rgonomics - 2025 Update | Emily Riederer Switching languages is about switching mindsets - not just syntax. New developments in python data science toolings, like polars and seaborn’s object interface, can capture the ‘feel’ that converts fr...

I was tagged the other day by someone kindly sharing my Python Rgonomics post and realized that my thesis of "tooling keeps improving" held up too well and some of the recos have changed

Hence, here's the 2025 update: www.emilyriederer.com/post/py-rgo-...

Now feat. uv (vs. pyenv, pdm) and Positron

27.01.2025 03:12 👍 83 🔁 17 💬 5 📌 0

Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge This paper describes the winning entry to the IJCNN 2011 Social Network Challenge run by Kaggle.com. The goal of the contest was to promote research on real-world link prediction, and the dataset was ...

Not a silly idea. Reminds me of this: arxiv.org/abs/1102.4374

28.12.2024 22:35 👍 1 🔁 0 💬 1 📌 0

Interviewer: can you explain this gap in your resume?

Data scientist: it's a confidence interval.

21.12.2024 01:08 👍 7 🔁 0 💬 0 📌 1

Clio: Privacy-preserving insights into real-world AI use A blog post describing Anthropic’s new system, Clio, for analyzing how people use AI while maintaining their privacy

KNN + topic detection getting a big glow-up www.anthropic.com/research/clio

13.12.2024 12:06 👍 51 🔁 9 💬 3 📌 1

Converting the SQL code to its Python equivalent would have been just another route towards spending hours on this 🫠
But yes, if it had been only a single cell or a small amount of SQL code, I would have just done it all in Python and been done with it.

30.11.2024 10:52 👍 0 🔁 0 💬 0 📌 0

I tried passing it through the Spark session config with spark.conf.set(), but the SQL cell didn't pick it up. Maybe I used the wrong commands?

30.11.2024 07:05 👍 0 🔁 0 💬 0 📌 0
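
For the SQL-cell problem described in these posts, one pattern that sometimes helps is to keep the SQL text unchanged and bind the Python value as a named parameter, rather than routing it through session config. The sketch below uses Python's built-in sqlite3 as a stand-in so it runs anywhere; the `orders` table and the `min_amount` variable are made up for illustration. PySpark 3.4+ accepts the same `:name` markers via `spark.sql(query, args={...})`, but whether that fits a Databricks job built around native SQL cells is an assumption, not something the posts confirm.

```python
import sqlite3

# Hypothetical value computed in an earlier Python step.
min_amount = 100

# In-memory database standing in for the real warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50), (2, 150), (3, 300)],
)

# The SQL stays plain text; the Python value is bound by name at run time.
query = "SELECT id FROM orders WHERE amount >= :min_amount ORDER BY id"
rows = conn.execute(query, {"min_amount": min_amount}).fetchall()
print(rows)

# Assumed Spark 3.4+ analogue (not executed here):
# spark.sql(query, args={"min_amount": min_amount})
```

Binding the value instead of formatting it into the string also avoids quoting bugs when the variable is a string or a date.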

Spent hours yesterday getting a SQL cell in a Databricks job to use a variable that was set in a prior Python cell. Amazing how something that seems so trivial can be so convoluted. Is that what "business impact" looks like?

30.11.2024 01:35 👍 1 🔁 0 💬 0 📌 0

Good. Now do the rest of the internet.

29.11.2024 19:26 👍 1 🔁 0 💬 0 📌 0

Sony and Marvel and the Amazing Spider-Man Films Rights Saga : Planet Money (Note: This episode originally ran back in 2022.) This past weekend, Spider-Man: Across the Spider-Verse had the second largest domestic opening of 2023, netting (or should we say webbing?) over $120 million in its opening weekend in the U.S. and Canada. But the story leading up to this latest Spider-Man movie has been its own epic saga. When Marvel licensed the Spider-Man film rights to Sony Pictures in the 1990s, the deal made sense — Marvel didn't make movies yet, and their business was mainly about making comic books and toys. Years later, though, the deal would come back to haunt Marvel, and it would start a long tug of war between Sony and Marvel over who should have creative cinematic control of Marvel's most popular superhero. Today, we break down all of the off-screen drama that has become just as entertaining as the movies themselves. This episode was originally produced by Nick Fountain with help from Taylor Washington and Dave Blanchard. It was engineered by Isaac Rodrigues. It was edited by Jess Jiang. The update was produced by Emma Peaslee, with engineering by Maggie Luthar. It was edited by Keith Romer. Help support Planet Money and get bonus episodes by subscribing to Planet Money+ in Apple Podcasts or at plus.npr.org/planetmoney.

I heard that in at least some of the cases it was due to commercial and film rights reasons www.npr.org/2023/06/07/1...

24.11.2024 20:30 👍 2 🔁 0 💬 0 📌 0

Dark Matter is based on a book by Blake Crouch.

24.11.2024 20:26 👍 1 🔁 0 💬 1 📌 0

23.11.2024 13:07 👍 22 🔁 4 💬 0 📌 0

Don't Do This - PostgreSQL wiki

How did I just find this?

wiki.postgresql.org/wiki/Don%27t...

#databs

22.11.2024 14:43 👍 32 🔁 6 💬 1 📌 2

Great stuff, and an excuse to recommend the book Storytelling with Data by Cole Nussbaumer Knaflic for anyone who wants to learn more on this.

21.11.2024 02:41 👍 0 🔁 0 💬 0 📌 0

One of my favourite pieces on the role of the data analyst comes from @rdpeng.org, who in turn quotes Tukey's "The Future of Data Analysis" from 1962 (!)
simplystatistics.org/posts/2019-0...
#databs

16.11.2024 22:01 👍 4 🔁 2 💬 2 📌 0

Data team as % of workforce: A deep dive into 100 tech scaleups What’s the right ratio of data roles in scaleups and why it should probably be higher than you think.

Definitely a different setup from anything I've experienced. I think that the aspects of team composition and targeted scope would still play a bigger role. Examples I've seen (www.synq.io/blog/data-te... or towardsdatascience.com/data-to-engi...) indicate a wide spread, so the context likely matters more.

18.11.2024 10:51 👍 0 🔁 0 💬 0 📌 0

Absolutely. Maybe we just have different notions of what data storytelling is? I see data storytelling as the competency of communicating effectively with data. That is different to using narratives as a substitute for data sense making.

18.11.2024 10:40 👍 1 🔁 0 💬 0 📌 0

Yes, I agree on that point, storytelling can be abused. But I don't think it means it can't be useful or applicable within a process control worldview.

18.11.2024 03:35 👍 1 🔁 0 💬 1 📌 0

Btw, from what I've seen in Donald Wheeler's books so far, I got the impression he'd classify top-down goal setting and metric tracking (which he calls "voice of the customer") as belonging to the Fantasy genre.

18.11.2024 03:23 👍 1 🔁 0 💬 1 📌 0

Storytelling is a communication competency. From my exposure to process control so far, its application can help you verify your stories are of the non-fiction variety.

18.11.2024 03:23 👍 2 🔁 0 💬 2 📌 0

I usually call it "data"

18.11.2024 03:10 👍 1 🔁 0 💬 0 📌 0

Adversarial and malicious are common terms in computer security.

17.11.2024 23:25 👍 1 🔁 0 💬 0 📌 0

Creating a LLM-as-a-Judge That Drives Business Results – A step-by-step guide with my learnings from 30+ AI implementations.

Have you seen hamel.dev/blog/posts/l... ?

17.11.2024 23:20 👍 13 🔁 0 💬 1 📌 0

Sky Zoo Stats on Bluesky, At Protocol, ...

one of the many things to like about atproto is that everything is authenticated and public

you don't have to trust bluesky, you can read from the relay yourself

it's not an API... more like a spinal tap into the central feed

and this is an amazing example of what you can do with that power

16.11.2024 21:16 👍 30 🔁 7 💬 0 📌 0

Another critical factor is their remit. Focusing attention on a few areas that can be served well is more sustainable than spreading a few data scientists thin to cover as much ground as possible. My experience was that being embedded in a team gives more leverage than providing a service.

17.11.2024 10:39 👍 0 🔁 0 💬 1 📌 0

Reference numbers can always provide some extra context, but I've seen that other factors can play a much more critical role. For example, how much support does DS get from other functions (data platform eng, data engineers, ...)? This can have a huge impact on how data scientists spend their time.

17.11.2024 10:39 👍 0 🔁 0 💬 1 📌 0

What do you mean by data team? Data scientists? Data platform engineers? Data engineers? Machine learning engineers? What's the org context you're referring to? Startup? Corporate? I have some considerations in mind from a data scientist perspective, but I'm an IC, so take this with a grain of salt.

17.11.2024 10:39 👍 0 🔁 0 💬 1 📌 0

Should we pay attention to how AI is changing our work? Sure! But staying on top of changes and evolutions in the field is part of the job, and AI is only one of many things driving this constant shift.
bsky.app/profile/apre...

16.11.2024 22:01 👍 1 🔁 0 💬 0 📌 0

The image shows a graph plotting "Quality of question" vs "Strength of evidence." It features three paths: an ideal but unrealistic path starting from a high-quality question, a path showing statistical/ML progression from a poor question to a poor question with a precise answer, and the "job of the data scientist" path that progresses along both question quality and strength of evidence.

As data scientists, our greatest impact isn't in delivering more precise answers - it's in improving the quality of the questions being asked. That's why I'm not worried about AI "taking our jobs," though it can definitely help us do them better.

16.11.2024 22:01 👍 2 🔁 0 💬 1 📌 0

Latest posts by Arik Friedman @arikf.net