Eric Leung (@erictleung)

I remember having an interview years ago about running ML models in production and getting rejected because I didn't have these skills. I was surprised to hear that this kind of work existed, but I think I'm slowly understanding now, especially with LLMs basically being that

10.03.2026 20:50 👍 0 🔁 0 💬 0 📌 0

the last API framework I played around with was Flask, which has similar syntax to FastAPI, but the nice features like automatic documentation and more have been a welcome improvement

10.03.2026 20:42 👍 0 🔁 0 💬 1 📌 0

FastAPI FastAPI framework, high performance, easy to learn, fast to code, ready for production

learning some API development with FastAPI has been pretty slick fastapi.tiangolo.com

10.03.2026 20:42 👍 0 🔁 0 💬 1 📌 0

been useful for just trying things out without having to boot up Python locally, or for experimenting on the go on mobile

05.03.2026 22:04 👍 0 🔁 0 💬 0 📌 0

the official Python website with an interactive Python shell built into the front page, showing that libraries like numpy and scikit-learn are installed

need a quick Python REPL, but you're away from your main computer? but you also need libraries like numpy and scikit-learn? the python.org homepage has got you covered

05.03.2026 22:04 👍 5 🔁 2 💬 1 📌 0

the autocomplete for everything is so amazing

05.03.2026 18:27 👍 0 🔁 0 💬 0 📌 0

GitHub - prompt-toolkit/ptpython: A better Python REPL A better Python REPL. Contribute to prompt-toolkit/ptpython development by creating an account on GitHub.

i may be behind the times regarding Python REPLs, but just found ptpython. it works super well! i'm impressed. the last innovation in Python REPLs i saw was years ago with ipython

github.com/prompt-toolk...

05.03.2026 18:26 👍 0 🔁 0 💬 1 📌 0

Taking a screenshot of a web page

need to reproducibly take a screenshot of a webpage using R? you can use the chromote package to do this, so cool

rstudio.github.io/chromote/art...

04.03.2026 00:41 👍 0 🔁 0 💬 0 📌 0

How Do You Validate a Marketing Mix Model (MMM)? The Complete Guide Validating an MMM is not about chasing perfect stats. It is about making sure your model is accurate, stable, causal, and believable.

how to validate a marketing mix model

www.stellaheystella.com/blog/how-do-...

02.03.2026 08:10 👍 0 🔁 0 💬 0 📌 0

A Simplified Approach to Propensity Modeling Build a Propensity Model Without the ML Overkill. This post breaks down a lightweight, production-ready approach to propensity modeling that any analytics engineer can run with Python, SQL, and a good...

another way to do propensity matching without regression and only using SQL is to do some bucketing

www.datacult.com/post/propens...
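
a rough sketch of the bucketing idea in Python rather than SQL. this is my own reading of "bucketing", not necessarily what the linked post does: group users into coarse covariate buckets, compare treated vs. control outcomes inside each bucket, then average the differences weighted by bucket size. the function name, bucket labels, and toy data are all made up for illustration.

```python
from collections import defaultdict


def bucketed_effect(rows):
    """Estimate a treatment effect by comparing groups within buckets.

    rows: iterable of (bucket, treated, outcome) tuples.
    Returns the bucket-size-weighted mean of (treated mean - control mean),
    or None if no bucket contains both groups.
    """
    # bucket -> {True: [outcome_sum, count], False: [outcome_sum, count]}
    stats = defaultdict(lambda: {True: [0.0, 0], False: [0.0, 0]})
    for bucket, treated, outcome in rows:
        cell = stats[bucket][bool(treated)]
        cell[0] += outcome
        cell[1] += 1

    weighted_diff = 0.0
    total_weight = 0
    for groups in stats.values():
        (t_sum, t_n), (c_sum, c_n) = groups[True], groups[False]
        if t_n == 0 or c_n == 0:
            continue  # skip buckets without both treated and control rows
        weight = t_n + c_n
        weighted_diff += (t_sum / t_n - c_sum / c_n) * weight
        total_weight += weight
    return weighted_diff / total_weight if total_weight else None


# Hypothetical toy data: (spend bucket, got the campaign?, converted?)
rows = [
    ("low", True, 1), ("low", True, 0), ("low", False, 0), ("low", False, 0),
    ("high", True, 1), ("high", True, 1), ("high", False, 1), ("high", False, 0),
]
effect = bucketed_effect(rows)
print(effect)
```

in SQL the same thing would roughly be a GROUP BY on the bucket and the treatment flag, followed by a weighted average over buckets.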

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

psmpy Propensity score matching for python and graphical plots

Python package to do propensity matching

pypi.org/project/psmpy/

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

psmpy: Propensity Score Matching in Python! | Towards Data Science Performing propensity score matching in a python environment using a newly available library: psmpy (graphical plotting features...

preparing for questions on propensity matching, especially in the marketing world. fit a regression that predicts the intervention from the covariates you want to match cases and controls on, then use the predicted scores to match them

towardsdatascience.com/psmpy-propen...
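
a minimal sketch of just the matching step, assuming the propensity scores were already estimated (for example by a logistic regression predicting the intervention). this is plain Python and not the psmpy API; the function name, the caliper value, and the ids and scores are hypothetical.

```python
def match_on_propensity(treated, controls, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping unit id -> propensity score.
    caliper: maximum allowed score difference for a valid match.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # copy so the caller's dict is untouched
    pairs = []
    # Process treated units in ascending score order for determinism.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical scores
treated = {"t1": 0.80, "t2": 0.30}
controls = {"c1": 0.28, "c2": 0.75, "c3": 0.10}
pairs = match_on_propensity(treated, controls)
print(pairs)
```

treated units with no control inside the caliper are simply left unmatched, which is one common way to keep poor matches out of the comparison.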

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Outlier - Wikipedia

trying to prepare questions about how to detect outliers and what you might do with them

en.wikipedia.org/wiki/Outlier
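
one common detection rule, sketched with the standard library: flag anything more than 1.5 × IQR outside the quartiles (Tukey's fences). the 1.5 multiplier is the usual convention, and the data below is made up.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(data)
print(outliers)
```

what to do with a flagged point (keep, cap, transform, drop, or investigate) depends on whether it is a data error or a real but rare observation.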

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Wikipedia page and similar external page on model validation, especially for regressions like looking at the residuals vs fitted values plot

en.wikipedia.org/wiki/Statist...

library.virginia.edu/data/article...
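
a small sketch of where the residuals vs fitted values come from, using a hand-rolled simple linear regression on made-up data. no plotting here; the printed pairs are what you would put on the y and x axes of that diagnostic plot.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.2f}  residual={r:+.2f}")
```

in a healthy fit the residuals scatter around zero with no visible trend or funnel shape as the fitted values grow.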

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Weibull distribution - Wikipedia

because i was looking at some marketing roles and thinking about survival analysis, i thought about the Weibull distribution, which can model a failure (hazard) rate that increases, decreases, or stays constant over time

en.wikipedia.org/wiki/Weibull...
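
a quick sketch of the Weibull hazard function, h(t) = (k / λ) · (t / λ)^(k − 1), with shape k and scale λ. a shape below 1 gives a falling failure rate, exactly 1 a constant rate (the exponential case), and above 1 a rising rate. the parameter values are arbitrary.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


for shape in (0.5, 1.0, 2.0):
    early = weibull_hazard(1.0, shape, scale=1.0)
    late = weibull_hazard(4.0, shape, scale=1.0)
    print(f"shape={shape}: hazard at t=1 is {early:.2f}, at t=4 is {late:.2f}")
```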

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Multicollinearity in Regression Analysis: Problems, Detection, and Solutions Multicollinearity is when independent variables in a regression model are correlated. I explore its problems, testing your model for it, and solutions.

more emphasis on multicollinearity: detecting it with Variance Inflation Factors (VIF) and dealing with it by dropping correlated variables or centering the variables

statisticsbyjim.com/regression/m...
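
a sketch of VIF for the simplest case of two predictors. in general VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors; with only two predictors that R² is just their squared correlation. the data is made up, and the "above 5 or 10" cutoffs people quote are rules of thumb.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    return 1.0 / (1.0 - r_squared(x1, x2))


# Hypothetical predictors: x2 nearly duplicates x1, x3 is unrelated to x1
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
x3 = [1, -1, 0, -1, 1]

print(vif_two_predictors(x1, x2))  # large: strong collinearity
print(vif_two_predictors(x1, x3))  # about 1: no collinearity
```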

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

False discovery rate - Wikipedia

false discovery rate is the expected proportion of type I errors (false positives) among all rejected null hypotheses when conducting multiple comparisons; for a given set of results the proportion is FDR = FP / (FP + TP)

en.wikipedia.org/wiki/False_d...
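
a sketch of the formula plus the Benjamini-Hochberg procedure, which is the standard way to control FDR (it is not mentioned in the post above, so treat it as an added illustration). the p-values and alpha are made up.

```python
def false_discovery_proportion(fp, tp):
    """Share of discoveries that are false: FP / (FP + TP)."""
    return fp / (fp + tp)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected.

    Sort p-values ascending, find the largest rank r such that
    p_(r) <= (r / m) * alpha, and reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = set(order[:cutoff_rank])
    return [i in rejected for i in range(m)]


p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(p_values, alpha=0.05)
print(decisions)
print(false_discovery_proportion(fp=5, tp=45))
```

note that five of these p-values are below 0.05 on their own, but only the two smallest survive the correction.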

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Model selection - Wikipedia

more info on selecting a model from regression or machine learning

en.wikipedia.org/wiki/Model_s...

02.03.2026 08:10 👍 1 🔁 0 💬 1 📌 0

Chapter 7 Exploring Data | Data Science at the Command Line, 2e After all that hard work (unless you already had clean data lying around), it’s time for some fun. Now that you have obtained and scrubbed your data, you can continue with the third step of the...

i considered looking at command-line ways of exploring data. not sure if I'll use them much, but this is a good reference book for doing data science at the command line

jeroenjanssens.com/dsatcl/chapt...

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

8 Simple Techniques to Prevent Overfitting | Towards Data Science Overfitting occurs when the model performs well on training data but generalizes poorly to unseen data. Overfitting is a very common problem in Machine Learning and there has been an extensive range o...

techniques to prevent overfitting

- Hold-out
- Cross-validation
- Data augmentation (transform data to make more)
- Feature selection
- L1 / L2 regularization
- Remove layers/number of units per layer (in neural network)
- Dropout (of connections in NN)
- Early stopping

towardsdatascience.com/8-simple-tec...
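
to make the cross-validation item from the list concrete, here is a sketch of how k-fold splits the row indices. it is pure Python with no shuffling, and the function name is made up; in practice you would shuffle first or use a library splitter.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Folds are contiguous; the first n_samples % k folds get one extra row.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

every row lands in exactly one validation fold, so each observation is used for evaluation once and for training k − 1 times.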

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Testing the assumptions of linear regression

regression diagnostics and assumptions, like linearity and additivity of the independent variables, statistical independence of errors, homoscedasticity of errors, and normality of errors

people.duke.edu/~rnau/testin...
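
one of these checks sketched in code: the Durbin-Watson statistic for independence of errors, DW = Σ(e_t − e_{t−1})² / Σ e_t². values near 2 suggest no first-order autocorrelation, values near 0 positive autocorrelation, and values near 4 negative autocorrelation. the residuals below are made up, and picking this particular statistic is my own choice of illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of time-ordered residuals."""
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([1, 1, 1, 1]))    # errors move together
print(durbin_watson([1, -1, 1, -1]))  # errors alternate in sign
```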

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

List of statistics articles - Wikipedia

a general reference: Wikipedia's list of all the statistics articles on there

en.wikipedia.org/wiki/List_of...

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

What Is a Statistical Model and How to Build One - Do My Stats Discover how a statistical model helps analyze data and unlock insights, but the key steps to building one are essential to master.

just in case i needed to talk about statistical model building. it was good to catch up on terms like multicollinearity, overfitting, model evaluation metrics (like BIC and AUC), feature selection and feature engineering

domystats.com/advanced-met...

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Elements and Principles of Data Analysis The data revolution has led to an increased interest in the practice of data analysis. As a result, there has been a proliferation of "data science" training programs. Because data science has been pr...

a more conceptual and philosophical read about the elements of data analysis and data science

arxiv.org/abs/1903.076...

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Exploratory Data Analysis (EDA) Techniques with Pandas Explore how to perform effective Exploratory Data Analysis (EDA) using Pandas, a powerful Python library. Learn data loading, cleaning, visualization, and advanced EDA techniques.

for those technical live-coding interviews, a quick overview of what pandas can do (i haven't used this in a bit, so this was a good review for me), like

df.dtypes
df.describe()
df['categorical_column'].value_counts()
df.dropna()
df.drop_duplicates()

diogoribeiro7.github.io/data%20scien...
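
the same calls run end to end on a tiny made-up DataFrame, assuming pandas is installed. the column names and values are hypothetical.

```python
import pandas as pd

# Toy frame with one missing category and repeated rows
df = pd.DataFrame({
    "categorical_column": ["a", "b", "a", None, "a"],
    "value": [1.0, 2.0, 1.0, 4.0, 1.0],
})

print(df.dtypes)      # column types
print(df.describe())  # summary statistics for numeric columns

counts = df["categorical_column"].value_counts()  # missing values are excluded by default
no_missing = df.dropna()           # rows with any missing value removed
deduped = df.drop_duplicates()     # exact duplicate rows removed

print(counts)
print(len(df), len(no_missing), len(deduped))
```

dropna() and drop_duplicates() return new DataFrames here, so df itself is unchanged.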

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Absolutely the simplest introduction to Bayesian statistics – Statistical Biophysics Blog

a nice read about the basics of Bayesian modeling, including the importance of p(model | data) as a focus, a shift in mindset to worry about the distribution of models themselves, how prior distributions matter more with less data, and quantifying uncertainty

statisticalbiophysicsblog.org?p=233
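
a small sketch of the "priors matter more with less data" point, using the conjugate Beta-Binomial model as my own example (not taken from the linked post). with a Beta(a, b) prior and s successes in n trials, the posterior is Beta(a + s, b + n − s), so the posterior mean is (a + s) / (a + b + n). the priors and counts are made up.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


flat_prior = (1, 1)    # uniform prior, mean 0.5
strong_prior = (8, 2)  # optimistic prior, mean 0.8

# Same observed rate (30%) at two sample sizes
for successes, trials in [(3, 10), (300, 1000)]:
    flat = posterior_mean(*flat_prior, successes, trials)
    strong = posterior_mean(*strong_prior, successes, trials)
    print(f"n={trials}: flat prior -> {flat:.3f}, strong prior -> {strong:.3f}")

small_gap = abs(posterior_mean(8, 2, 3, 10) - posterior_mean(1, 1, 3, 10))
large_gap = abs(posterior_mean(8, 2, 300, 1000) - posterior_mean(1, 1, 300, 1000))
```

with 10 trials the two priors disagree noticeably about the rate; with 1000 trials the data dominates and they nearly coincide.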

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Data Science Spotlight: Cracking the SQL Interview at Instacart (LLM Edition) By: Anahita Tafvizi, Michael Curran, Monta Shen

a linked article on the SQL interview at Instacart that was interesting to read, especially incorporating the fact that on the job, you'll likely have access to some LLM to help you out anyways, and focusing more on "prompt engineering" to get to the core business

tech.instacart.com/data-science...

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

Advice for Data Scientists/Statisticians interested in working at Instacart Author: Tim Hesterberg Feel free to share this externally. Short link: https://bit.ly/3Kp3VBT Much of this is relevan...

"Advice for Data Scientists/Statisticians interested in working at Instacart"

some good specific but general questions and tips to consider when interviewing for data roles

docs.google.com/document/d/1...

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

piqued my interest to open up a resource on how to use AI in data analysis and found this short course

gabors-data-analysis.com/ai-course/

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0

it lists the following as risk factors:

boundary erosion (abstractions become fluid),
entanglement (of features),
hidden feedback loops,
undeclared consumers,
data dependencies,
configuration issues,
changes in the external world, and
a variety of system-level anti-patterns

02.03.2026 08:10 👍 0 🔁 0 💬 1 📌 0
