Soledad Galli, PhD

@solegalli

Data scientist, best-selling instructor, book author, Python 🐍 open-source developer (check out Feature-engine). Find out more at Train in Data: https://www.trainindata.com/

80
Followers
15
Following
182
Posts
24.11.2024
Joined

Latest posts by Soledad Galli, PhD @solegalli

Preview
AI companies will fail. We can salvage something from the wreckage | Cory Doctorow
AI is asbestos in the walls of our tech society, stuffed there by monopolists run amok. A serious fight against it must strike at its roots

This is what I think of the AI bubble:

20.01.2026 02:40 👍 1 🔁 0 💬 0 📌 0
Preview
Is Boruta dead? - Train in Data's Blog
The most exhaustive discussion on Boruta in machine learning. Learn what it is, advantages and limitations, and its Python implementation.

A few years back BORUTA was all over the web and data science competition forums.

Since then... silence... is it really dead?

I did some research, and this is what I found out:
www.blog.trainindata.com/is-boruta-de...

12.01.2026 12:30 👍 1 🔁 0 💬 0 📌 0
Post image

New payment method rolled out for all our courses!

You can now pay in your own currency* and avoid hidden bank or country-specific fees.

We look forward to seeing you on our courses.

*At the moment, only 20 currencies are supported.

champ.ly/6WkK6AA3

05.01.2026 17:45 👍 0 🔁 0 💬 0 📌 0
Preview
Should You Use Imbalanced-Learn in 2025? - Train in Data's Blog
I discuss the latest evidence on the use of undersampling and SMOTE for imbalanced data and whether the Python library is still useful.

Should you use imbalanced-learn in 2025?

SMOTE, oversampling and undersampling have been proposed as the workhorses for tackling imbalanced data.

But do they really work?

We talk about that in this article.
www.blog.trainindata.com/should-you-u...

03.12.2025 12:30 👍 0 🔁 0 💬 0 📌 0
Preview
Moving Average Forecasting: What You Need to Know - Train in Data's Blog
Learn moving average forecasting with clear examples, practical applications, and accuracy tips for better time series predictions.

Moving averages have long been used as a benchmark forecasting model.

Did you know that you can also use moving averages as input features?

If not, check out this blog to find out more, together with Python implementations:

www.blog.trainindata.com/master-movin...

03.11.2025 12:30 👍 0 🔁 0 💬 0 📌 0
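To make the two uses of moving averages mentioned in the post above concrete, here is a minimal sketch in plain Python. It is not taken from the linked blog; the function names `lagged_moving_average` and `moving_average_forecast`, the window size, and the toy `sales` series are all assumptions chosen for illustration.

```python
def lagged_moving_average(series, window=3):
    """Moving average used as an input feature.

    For each time step t, average the `window` values strictly before t.
    The value at t itself is excluded, so the feature does not leak the
    target when the model predicts series[t]. Steps without enough
    history get None.
    """
    features = []
    for t in range(len(series)):
        if t < window:
            features.append(None)
        else:
            features.append(sum(series[t - window:t]) / window)
    return features


def moving_average_forecast(series, window=3):
    """Moving average used as a benchmark: predict the next value as the
    mean of the last `window` observations."""
    return sum(series[-window:]) / window


# Hypothetical toy series, for illustration only.
sales = [10, 12, 14, 16, 18, 20]
ma_feature = lagged_moving_average(sales, window=3)
next_step = moving_average_forecast(sales, window=3)
print(ma_feature)
print(next_step)
```

In practice the same feature is usually built with a dataframe library's rolling-window utilities; the loop above only spells out the arithmetic.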
Post image

Discover the latest thoughts on working with imbalanced data with our free booklet.

We discuss 3 recent articles that have changed the conversation on resampling and SMOTE 👇

www.trainindata.com/p/7-takes-on...

27.10.2025 12:30 👍 0 🔁 0 💬 0 📌 0
Post image

All our courses come with a 30-Day money back guarantee...

If you are unhappy for whatever reason, we give you the money back.

That's how confident we are that you'll ❤️ our courses.

#trainindata

24.10.2025 23:28 👍 0 🔁 0 💬 0 📌 0
Post image

Next Monday on Data Bites: Six Cloud Platforms to Run Jupyter Notebooks for Free 🚀

Want to know more?

Click the link below to subscribe and stay tuned! 👇
https://f.mtr.cool/bltkmoeitj

#machinelearning #datascience #jupyter #mlmodels #ML #mltools #notebooks #cloudplatforms

29.08.2025 10:02 👍 1 🔁 0 💬 0 📌 0
Preview
ADASYN: Adaptive Synthetic Sampling for Imbalanced Datasets - Train in Data's Blog
ADASYN can be used to handle data imbalance by creating synthetic samples of the minority class, with the aim of improving model performance. Really?

Imbalanced datasets can mess with your ML models. 😬
ADASYN (Adaptive Synthetic Sampling) to the rescue! 🚀

Learn how it works + when to use it in our latest blog 👇
https://f.mtr.cool/rqstrumpnx

#MachineLearning #DataScience #ImbalancedData #ADASYN

28.08.2025 16:02 👍 2 🔁 0 💬 0 📌 0
Video thumbnail

👉 MICE is a powerful method for datasets with missing data across multiple variables.

Let this slide guide you through how it works.

#machinelearning #MICE #mlmodels #datascience #dataengineering #imputation #featureengineering

27.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Post image

How to construct ensembles from a thousand models?

In this article, Caruana, a prominent figure in machine learning and ensemble methods, explains how to build ensembles from libraries of thousands of machine learning models.
📄 https://f.mtr.cool/fpaqqnqxms

26.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Post image

Clustering & Dimensionality Reduction: your toolkit for finding patterns, simplifying data, and solving real-world problems.

🔍 You’ll:
✅ Group data (K-means, DBSCAN & more)
✅ Reduce complexity (PCA, UMAP)
✅ Work on real cases like RNA profiling

https://f.mtr.cool/hdjiwbbsbl

25.08.2025 16:02 👍 1 🔁 0 💬 0 📌 0
Post image

Next Monday on Data Bites: Working with imbalanced data? Follow these 3 steps.

Want to know more?

Click the link below to subscribe and stay tuned! 👇
https://f.mtr.cool/svpfklfpda

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume

22.08.2025 10:02 👍 0 🔁 0 💬 0 📌 0
Preview
Confusion Matrix, Precision, and Recall - Train in Data's Blog
Find out what the confusion matrix is and how it relates to other classification metrics like precision, recall and F1-score.

Model performance matters! 🎯

In this article, we break down essential evaluation metrics for classification models, starting with the Confusion Matrix. Perfect for anyone looking to build reliable #machinelearning systems!

Have a good read 👇

21.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Preview
GitHub - eli5-org/eli5: A library for debugging/inspecting machine learning classifiers and explaining their predictions

ELI5 now supports scikit-learn 1.6.0! 🎉 It wasn’t working with the latest version of scikit-learn, but that’s a thing of the past.

As of now, ELI5 has released a new version with full support for scikit-learn >1.6.0 and Python >3.10.

Check it out 👇

20.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Video thumbnail

Can we use statistical tests to select features? 🤔

Turns out, we can! 🎉

In the slides below, we’ll explore the most commonly used statistical tests for feature selection, along with their advantages and limitations. 👇

#machinelearning #datascience #featureselection

19.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Post image

🚨 It’s here! Our new course on Clustering & Dimensionality Reduction just dropped 🎉

Learn how to group data (K-Means, DBSCAN, Louvain) + simplify it with PCA & UMAP, no prior experience needed!

Hands-on & practical 👇
👉 https://f.mtr.cool/zshxexbrds

#MachineLearning #DataScience

18.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Post image

Next Monday on Data Bites: How to Write a Winning Data Science CV

Want to know more?

Click the link below to subscribe and stay tuned! 👇
https://f.mtr.cool/nozrfuruar

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume

15.08.2025 10:02 👍 0 🔁 0 💬 0 📌 0
Post image

Deep learning has transformed our daily lives, but designing neural networks remains a challenge.

Automated hyperparameter optimization (HPO) streamlines the process. This paper reviews key techniques & tools for improving model accuracy & efficiency.
📃 https://f.mtr.cool/wowjcrmwjg

14.08.2025 16:02 👍 1 🔁 0 💬 0 📌 0
Post image

In case you were wondering 👇

#machinelearning #ai #datascience #dataengineering #mlmodels

13.08.2025 16:02 👍 0 🔁 1 💬 0 📌 0
Post image

🚨 SMOTE has long been hailed as the go-to solution for imbalanced datasets, but it only works in specific scenarios.

In this article, we explore when SMOTE is truly effective & why it’s remained popular.

Check it out!
https://f.mtr.cool/medbbpfril

12.08.2025 16:01 👍 0 🔁 0 💬 0 📌 0
Post image

🚨 Just launched: our new course on Clustering & Dimensionality Reduction is live at Train in Data!

Learn to group data, reduce complexity with PCA & UMAP, and tackle real-world projects (no experience needed!)

🎓 Join us: https://f.mtr.cool/wlhxbboqkl

11.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Post image

Next Monday on Data Bites: Everybody says “SMOTE does not work”.

Want to know more?

Click the link below to subscribe and stay tuned! 👇
https://f.mtr.cool/pinchbaedf

#machinelearning #datascience #smote #mlmodels #ML

08.08.2025 10:01 👍 0 🔁 0 💬 0 📌 0

In this video, I review hyperparameter optimization techniques like Grid Search, Random Search, & Bayesian methods.

Learn their pros, cons, and best applications for both low and high-dimensional spaces!

What techniques do you use?
📽️

07.08.2025 16:02 👍 1 🔁 0 💬 0 📌 0
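As a rough illustration of two of the techniques named in the post above, the sketch below contrasts grid search and random search on a toy problem. It is not taken from the video. The `objective` function is a hypothetical stand-in for a cross-validated score, and the parameter names, ranges and number of trials are assumptions; Bayesian methods are left out because they need a surrogate model.

```python
import itertools
import random


def objective(learning_rate, depth):
    """Hypothetical validation score (higher is better).

    Stands in for an expensive train-and-validate step; it peaks at
    learning_rate=0.1 and depth=4.
    """
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 4) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best_score, best_params = None, None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


def random_search(n_trials, seed=0):
    """Sample a fixed number of configurations from distributions."""
    rng = random.Random(seed)
    best_score, best_params = None, None
    for _ in range(n_trials):
        params = {
            # Log-uniform sample between 0.001 and 1.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Integer sample, both ends inclusive.
            "depth": rng.randint(2, 8),
        }
        score = objective(**params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
grid_best = grid_search(grid)
random_best = random_search(n_trials=9)
print(grid_best)
print(random_best)
```

The cost of the grid grows multiplicatively with each added hyperparameter, while the random search budget is set directly by `n_trials`, which is one reason random search is often preferred in higher-dimensional spaces.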
Post image

🐍 Python libraries that implement model-agnostic global explainability methods 👇

#python #machinelearning #MLModel #datascience #dataengineering

06.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Post image

Most commonly used encoding techniques ⬇️

1. OneHotEncoder
2. OrdinalEncoder
3. TargetEncoder

When one-hot encoding gets too complex and ordinal encoding leads to inaccuracies, TargetEncoding often becomes the best choice. Learn more at the link below.

#targetencoder #ML

05.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
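The idea behind target encoding, mentioned in the post above, can be sketched in a few lines of plain Python: each category is replaced by the mean of the target observed for that category. This is a simplified illustration and not the implementation of any library's `TargetEncoder`; the helper names, the toy data and the choice of the global mean as fallback for unseen categories are assumptions. Library implementations typically add smoothing and other safeguards against overfitting.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets):
    """Learn the mean target value for each category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def apply_target_encoding(categories, mapping, default):
    """Replace each category with its learned mean.

    Categories not seen during fitting fall back to `default`.
    """
    return [mapping.get(category, default) for category in categories]


# Hypothetical toy data: colour of an item and whether it was bought.
colors = ["red", "blue", "red", "green", "blue", "red"]
bought = [1, 0, 1, 0, 1, 0]

mapping = fit_target_encoding(colors, bought)
global_mean = sum(bought) / len(bought)
encoded = apply_target_encoding(["red", "blue", "purple"], mapping, global_mean)
print(mapping)
print(encoded)
```

Unlike one-hot encoding, this produces a single numeric column regardless of how many categories there are, and unlike ordinal encoding the numbers carry information about the target rather than an arbitrary order.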
Post image

🚨 New Course - Clustering & Dimensionality Reduction at Train in Data

Learn to apply unsupervised ML in practice 👇
✅ K-Means, DBSCAN, HDBSCAN, Graph-based
✅ PCA & UMAP
✅ Real-world projects incl. RNA case study

Find out more: https://f.mtr.cool/cojxgkyhgq

04.08.2025 16:02 👍 0 🔁 0 💬 0 📌 0
Post image

Next Monday on Data Bites: Probe Feature Selection

Want to know more?

Click the link below to subscribe and stay tuned! 👇
https://f.mtr.cool/xefqrzzgeh

#machinelearning #datascience #imbalanceddata #undersampling #mlmodels #ML

01.08.2025 10:02 👍 1 🔁 0 💬 0 📌 0

The most crucial component of any machine learning project is data!

▶️ 90% of the time is spent on data preprocessing
▶️ 10% of the time is spent on model building, tuning and evaluation.

#machinelearning #ML #MLmodels #preprocessing #modelbuilding #datascience

31.07.2025 16:02 👍 1 🔁 0 💬 0 📌 0
Discover the truth behind SMOTE for imbalanced data and explore better alternatives.

Learn more about metrics, threshold optimization, and classifier calibration in this video.

If you find it useful, don’t hesitate to share with your peers! 🙏
https://www.youtube.com/watch?v=blcOOheXNoQ

#ml

30.07.2025 16:02 👍 0 🔁 0 💬 0 📌 0