Soledad Galli, PhD (@solegalli)

AI companies will fail. We can salvage something from the wreckage | Cory Doctorow AI is asbestos in the walls of our tech society, stuffed there by monopolists run amok. A serious fight against it must strike at its roots

This is what I think of the AI bubble:

20.01.2026 02:40 👍 1 🔁 0 💬 0 📌 0

Is Boruta dead? - Train in Data's Blog The most exhaustive discussion on boruta in machine learning. Learn what it is, advantages and limitations, and its Python implementation.

A few years back BORUTA was all over the web and data science competition forums.

Since then... silence... is it really dead?

I did some research, and this is what I found out:
www.blog.trainindata.com/is-boruta-de...

12.01.2026 12:30 👍 1 🔁 0 💬 0 📌 0

New payment method rolled out for all our courses!

You can now pay in your own currency* and avoid hidden bank or country-specific fees.

We look forward to seeing you on our courses.

*At the moment, only 20 currencies are supported.

champ.ly/6WkK6AA3

05.01.2026 17:45 👍 0 🔁 0 💬 0 📌 0

Should You Use Imbalanced-Learn in 2025? - Train in Data's Blog I discuss the latest evidence on the use of undersampling and SMOTE for imbalanced data and whether the Python library is still useful.

Should you use imbalanced-learn in 2025?

SMOTE, oversampling and undersampling have been proposed as the workhorses for tackling imbalanced data.

But do they really work?

We talk about that in this article.
www.blog.trainindata.com/should-you-u...

03.12.2025 12:30 👍 0 🔁 0 💬 0 📌 0

Moving Average Forecasting: What You Need to Know - Train in Data's Blog Learn moving average forecasting with clear examples, practical applications, and accuracy tips for better time series predictions.

Moving averages have long been used as a benchmark forecasting model.

Did you know that you can also use moving averages as input features?

If not, check out this blog to find out more, together with Python implementations:

www.blog.trainindata.com/master-movin...
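
A minimal sketch of both uses, in plain Python and on made-up numbers: the same trailing window mean can act as a benchmark forecast for the next step and as an input feature built from past values only. The function name `trailing_mean`, the window size and the `sales` series are assumptions for illustration and are not taken from the blog post.

```python
from statistics import mean

def trailing_mean(series, window):
    """Mean of the `window` values before each position (None until enough history)."""
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)  # not enough past values yet
        else:
            out.append(mean(series[i - window:i]))  # only values before position i
    return out

sales = [10, 12, 14, 16, 18, 20]  # hypothetical series

# Use 1: benchmark forecast for the next, unseen step.
forecast_next = mean(sales[-3:])

# Use 2: input feature aligned with each row, excluding the current value
# so the feature does not leak the quantity being predicted.
ma_feature = trailing_mean(sales, window=3)
```

Excluding the current value from the window is the design choice that matters here: a window that includes it would hand the model the target.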

03.11.2025 12:30 👍 0 🔁 0 💬 0 📌 0

Discover the latest thoughts on working with imbalanced data with our free booklet.

We discuss 3 recent articles that have changed the conversation on resampling and SMOTE👇

www.trainindata.com/p/7-takes-on...

27.10.2025 12:30 👍 0 🔁 0 💬 0 📌 0

All our courses come with a 30-day money-back guarantee...

If you are unhappy for whatever reason, we give you the money back.

That's how confident we are that you'll ❤️ our courses.

#trainindata

24.10.2025 23:28 👍 0 🔁 0 💬 0 📌 0

Next Monday on Data Bites: Six Cloud Platforms to Run Jupyter Notebooks for Free 🚀

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/bltkmoeitj

#machinelearning #datascience #jupyter #mlmodels #ML #mltools #notebooks #cloudplatforms

29.08.2025 10:02 👍 1 🔁 0 💬 0 📌 0

ADASYN: Adaptive Synthetic Sampling for Imbalanced Datasets - Train in Data's Blog ADASYN can be used to handle data imbalance by creating synthetic samples of the minority class and improve model performance. Really?

Imbalanced datasets can mess with your ML models. 😬
ADASYN (Adaptive Synthetic Sampling) to the rescue! 🚀

Learn how it works + when to use it in our latest blog 👇
https://f.mtr.cool/rqstrumpnx

#MachineLearning #DataScience #ImbalancedData #ADASYN

28.08.2025 16:02 👍 2 🔁 0 💬 0 📌 0

👉 MICE is a powerful method for datasets with missing data across multiple variables.

Let this slide guide you through how it works.

#machinelearning #MICE #mlmodels #datascience #dataengineering #imputation #featureengineering

27.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

How to construct ensembles from a thousand models?

In this article, Caruana, a prominent figure in machine learning and ensemble methods, explains how to build ensembles from libraries of thousands of machine learning models.
📄 https://f.mtr.cool/fpaqqnqxms

26.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

Clustering & Dimensionality Reduction: your toolkit for finding patterns, simplifying data, and solving real-world problems.

🔍 You’ll:
✅ Group data (K-means, DBSCAN & more)
✅ Reduce complexity (PCA, UMAP)
✅ Work on real cases like RNA profiling

📍 https://f.mtr.cool/hdjiwbbsbl

25.08.2025 16:02 👍 1 🔁 0 💬 0 📌 0

Next Monday on Data Bites: Working with imbalanced data? Follow these 3 steps.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/svpfklfpda

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume

22.08.2025 10:02 👍 0 🔁 0 💬 0 📌 0

Confusion Matrix, Precision, and Recall - Train in Data's Blog Find out what the confusion matrix is and how it relates to other classification metrics like precision, recall and f1-score.

Model performance matters! 🎯

In this article, we break down essential evaluation metrics for classification models, starting with the Confusion Matrix. Perfect for anyone looking to build reliable #machinelearning systems!

Have a good read👇
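
A minimal sketch of how precision, recall and the F1 score fall out of the four confusion-matrix counts. The function names and the toy labels are assumptions for illustration; they are not from the linked article.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Derive the three metrics from the counts, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```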

21.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

GitHub - eli5-org/eli5: A library for debugging/inspecting machine learning classifiers and explaining their predictions A library for debugging/inspecting machine learning classifiers and explaining their predictions - eli5-org/eli5

ELI5 now supports scikit-learn 1.6.0! 🎉 It wasn’t working with the latest version of scikit-learn, but that’s a thing of the past.

As of now, ELI5 has released a new version with full support for scikit-learn >1.6.0 and Python >3.10.

Check it out 👇

20.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

Can we use statistical tests to select features? 🤔

Turns out, we can! 🎉

In the slides below, we’ll explore the most commonly used statistical tests for feature selection, along with their advantages and limitations. 👇

#machinelearning #datascience #featureselection
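
As one illustration, here is a minimal sketch of the chi-square statistic between a categorical feature and a categorical target, written in plain Python. Treating chi-square as one of the tests covered in the slides is an assumption, and the function name and toy data are hypothetical. A larger statistic suggests a stronger association with the target; turning it into a p-value additionally needs the chi-square distribution, which is left out here.

```python
from collections import Counter

def chi_square_statistic(feature, target):
    """Pearson chi-square statistic from the feature/target contingency table."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # cell counts
    feature_totals = Counter(feature)         # row totals
    target_totals = Counter(target)           # column totals
    stat = 0.0
    for f, f_count in feature_totals.items():
        for t, t_count in target_totals.items():
            expected = f_count * t_count / n  # count expected under independence
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat

# Hypothetical data: one feature that tracks the target, one that does not.
target = [1, 1, 1, 1, 0, 0, 0, 0]
informative = ["a", "a", "a", "a", "b", "b", "b", "b"]
noise = ["a", "b", "a", "b", "a", "b", "a", "b"]

score_informative = chi_square_statistic(informative, target)
score_noise = chi_square_statistic(noise, target)
```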

19.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

🚨 It’s here! Our new course on Clustering & Dimensionality Reduction just dropped 🎉

Learn how to group data (K-Means, DBSCAN, Louvain) + simplify it with PCA & UMAP, no prior experience needed!

Hands-on & practical 👇
👉 https://f.mtr.cool/zshxexbrds

#MachineLearning #DataScience

18.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

Next Monday on Data Bites: How to Write a Winning Data Science CV

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/nozrfuruar

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume

15.08.2025 10:02 👍 0 🔁 0 💬 0 📌 0

Deep learning has transformed our daily lives, but designing neural networks remains a challenge.

Automated hyperparameter optimization (HPO) streamlines the process. This paper reviews key techniques & tools for improving model accuracy & efficiency.
📃 https://f.mtr.cool/wowjcrmwjg

14.08.2025 16:02 👍 1 🔁 0 💬 0 📌 0

In case you were wondering 👇

#machinelearning #ai #datascience #dataengineering #mlmodels

13.08.2025 16:02 👍 0 🔁 1 💬 0 📌 0

🚨 SMOTE has long been hailed as the go-to solution for imbalanced datasets, but it only works in specific scenarios.

In this article, we explore when SMOTE is truly effective & why it’s remained popular.

Check it out!
https://f.mtr.cool/medbbpfril
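
For readers unfamiliar with the mechanism, here is a minimal sketch of the interpolation step at the core of SMOTE: each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_like`, its parameters and the toy points are assumptions for illustration; this is not the imbalanced-learn implementation, and it says nothing about when the technique actually helps.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (squared Euclidean distance), excluding x
        others = [m for m in minority if m is not x]
        neighbours = sorted(
            others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m))
        )[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (3.0, 3.0)]
new_points = smote_like(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, the method never leaves the region the minority class already occupies.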

12.08.2025 16:01 👍 0 🔁 0 💬 0 📌 0

🚨 Just launched: our new course on Clustering & Dimensionality Reduction is live at Train in Data!

Learn to group data, reduce complexity with PCA & UMAP, and tackle real-world projects (no experience needed!)

🎓 Join us: https://f.mtr.cool/wlhxbboqkl

11.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

Next Monday on Data Bites: Everybody says “SMOTE does not work”.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/pinchbaedf

#machinelearning #datascience #smote #mlmodels #ML

08.08.2025 10:01 👍 0 🔁 0 💬 0 📌 0

In this video, I review hyperparameter optimization techniques like Grid Search, Random Search, & Bayesian methods.

Learn their pros, cons, and best applications for both low and high-dimensional spaces!

What techniques do you use?
📽️
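
A minimal sketch contrasting grid search and random search on a made-up objective, in plain Python. The `objective` function, the hyperparameter names `lr` and `depth`, and their ranges are assumptions for illustration; in practice the score would come from cross-validating a model. Bayesian methods are not shown.

```python
import itertools
import random

def objective(lr, depth):
    """Hypothetical validation score that peaks at lr=0.1, depth=6."""
    return -((lr - 0.1) ** 2) * 100 - ((depth - 6) ** 2) * 0.01

def grid_search(lrs, depths):
    """Evaluate every combination on a fixed grid and keep the best one."""
    return max(itertools.product(lrs, depths), key=lambda p: objective(*p))

def random_search(n_trials, seed=0):
    """Sample each hyperparameter independently from its range and keep the best trial."""
    rng = random.Random(seed)
    trials = [
        (rng.uniform(0.001, 0.3), rng.randint(2, 10)) for _ in range(n_trials)
    ]
    return max(trials, key=lambda p: objective(*p))

best_grid = grid_search([0.01, 0.1, 0.3], [2, 6, 10])  # 9 evaluations
best_random = random_search(n_trials=9)                # same budget
```

With the same budget of nine evaluations, the grid tries only three distinct values per hyperparameter, while random search tries up to nine, which is one reason it tends to scale better as the number of hyperparameters grows.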

07.08.2025 16:02 👍 1 🔁 0 💬 0 📌 0

🐍 Python libraries that implement model-agnostic global explainability methods 👇

#python #machinelearning #MLModel #datascience #dataengineering

06.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

Most commonly used encoding techniques ⬇️

1. OneHotEncoder
2. OrdinalEncoder
3. TargetEncoder

When one-hot encoding produces too many columns and ordinal encoding imposes an order that does not exist, target encoding often becomes the best choice. Learn more at the link below.

#targetencoder #ML
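
A minimal sketch of the idea behind target encoding, in plain Python: each category is replaced by the mean of the target observed for that category in the training data. The function names and the toy data are assumptions for illustration. Library implementations typically add smoothing and cross-fitting to limit target leakage, which this sketch omits.

```python
from collections import defaultdict

def fit_target_means(categories, targets):
    """Learn the mean target value per category from training data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}

def transform(categories, mapping, default):
    """Replace each category by its learned mean; unseen categories get `default`."""
    return [mapping.get(category, default) for category in categories]

# Hypothetical training data: city of the customer and whether they bought.
city = ["rome", "rome", "paris", "paris", "paris", "oslo"]
bought = [1, 0, 1, 1, 0, 0]

mapping = fit_target_means(city, bought)
global_mean = sum(bought) / len(bought)

# "lima" was never seen during fitting, so it falls back to the global mean.
encoded = transform(["rome", "paris", "lima"], mapping, default=global_mean)
```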

05.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

🚨 New Course - Clustering & Dimensionality Reduction at Train in Data

Learn to apply unsupervised ML in practice 👇
✅ K-Means, DBSCAN, HDBSCAN, Graph-based
✅ PCA & UMAP
✅ Real-world projects incl. RNA case study

Find out more: https://f.mtr.cool/cojxgkyhgq

04.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0

Next Monday on Data Bites: Probe Feature Selection

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/xefqrzzgeh

#machinelearning #datascience #imbalanceddata #undersampling #mlmodels #ML

01.08.2025 10:02 👍 1 🔁 0 💬 0 📌 0

The most crucial component of any machine learning project is data!

▶️ 90% of the time is spent on data preprocessing
▶️ 10% of the time is spent on model building, tuning and evaluation.

#machinelearning #ML #MLmodels #preprocessing #modelbuilding #datascience

31.07.2025 16:02 👍 1 🔁 0 💬 0 📌 0

Discover the truth behind SMOTE for imbalanced data and explore better alternatives.

Learn more about metrics, threshold optimization, and classifier calibration in this video.

If you find it useful, don’t hesitate to share with your peers! 🙏
https://www.youtube.com/watch?v=blcOOheXNoQ

#ml

30.07.2025 16:02 👍 0 🔁 0 💬 0 📌 0
