Imbalanced classification: pitfalls and solutions — Probabilistic calibration of cost-sensitive learning
Today at #EuroScipy2025, @glemaitre58.bsky.social and I presented a tutorial on pitfalls of machine learning for imbalanced classification problems.
We discussed what (not) to do when a fitted classifier yields degenerate precision or recall values.
probabl-ai.github.io/calibration-...
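A minimal sketch of the kind of degenerate result mentioned above, assuming a synthetic dataset and a plain logistic regression (neither is taken from the tutorial itself). The class ratio, the two thresholds, and the helper name `precision_recall_at` are illustrative assumptions. It compares precision and recall at the default 0.5 cut-off with a lower cut-off applied to the predicted probabilities:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: roughly 2% positives (assumed ratio).
X, y = make_classification(
    n_samples=20_000,
    n_features=10,
    weights=[0.98, 0.02],
    flip_y=0.01,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
# Estimated probability of the positive (minority) class.
proba = model.predict_proba(X_test)[:, 1]


def precision_recall_at(threshold):
    """Hypothetical helper: hard predictions from a probability cut-off."""
    y_pred = (proba >= threshold).astype(int)
    return (
        precision_score(y_test, y_pred, zero_division=0),
        recall_score(y_test, y_pred),
    )


for threshold in (0.5, 0.1):
    precision, recall = precision_recall_at(threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Lowering the cut-off can only add positive predictions, so recall cannot decrease; which cut-off is appropriate depends on the relative costs of false positives and false negatives.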
19.08.2025 11:58
A small update on the retrospective and future priorities of the open source team at @probabl.bsky.social for the next 6 months or so.
06.12.2024 17:14
Sometimes you think you are right by doing everything "by the book." But sometimes the book is just a tiny part of the full story. Digging further and writing a new chapter with more insights is actually fun...
05.12.2024 10:15
Imbalanced-learn: regrets and onwards - with Guillaume Lemaitre, core-maintainer
YouTube video by probabl
New podcast episode! This one is about imbalanced-learn and how the maintainer looks back with some lessons learned.
If you are dealing with imbalanced classification use cases, like fraud detection, you'll want to listen in on this one!
youtu.be/npSkuNcm-Og
05.12.2024 09:58
OK, that is interesting feedback. We could support older versions. So far, we have not seen any code that we are eager to drop quickly. I understand the concern about runtime dependencies; on our side, the idea is to depend only on scikit-learn. But agreed, it is one more dependency.
29.11.2024 09:16
GitHub - glemaitre/sklearn-compat
We are working on a small package to make developers' lives easier: github.com/glemaitre/sk.... The idea is that recurring compatibility work could be centralized in a single package. Once we have a minimal version, we will do a first release supporting scikit-learn 1.2 to 1.6.
28.11.2024 11:17
A high-level summary diagram taken from the slides linked below. It shows the interplay of two main components: a probabilistic model and a decision maker or planner.
Probabilistic predictions of an underfitting polynomial classifier on a noisy XOR task and the corresponding under-confident calibration curve.
Probabilistic predictions of an overfitting polynomial classifier and the resulting overconfident calibration curve on the same noisy XOR problem.
Simulation study showing that hyperparameter tuning is less stable with hard metrics such as accuracy, or soft yet non-probabilistic metrics such as ROC AUC, than with a strictly proper scoring rule such as the log-loss.
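The figures described above can be approximated with a rough sketch, which is not the code behind the slides. It assumes that "polynomial classifier" means logistic regression on polynomial features; the noise level, sample size, degrees, and regularization strength are all illustrative choices. It reports accuracy, ROC AUC, and log-loss for several degrees on a noisy XOR task and computes the points of each calibration curve:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Noisy XOR: the label is the sign of x0 * x1, with 20% of labels flipped
# (the noise level is an assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2_000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
flip = rng.random(len(y)) < 0.2
y = np.where(flip, 1 - y, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for degree in (1, 2, 10):  # low, matching, and high capacity (assumed)
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        # Weak regularization so that high degrees are free to overfit.
        LogisticRegression(C=1e4, max_iter=5_000),
    )
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    # Points of the calibration curve: observed positive rate per bin
    # versus mean predicted probability per bin.
    prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=10)
    results[degree] = {
        "accuracy": accuracy_score(y_test, proba >= 0.5),
        "roc_auc": roc_auc_score(y_test, proba),
        "log_loss": log_loss(y_test, proba),
        "calibration": (prob_true, prob_pred),
    }

for degree, r in results.items():
    print(
        f"degree={degree}: accuracy={r['accuracy']:.3f}, "
        f"roc_auc={r['roc_auc']:.3f}, log_loss={r['log_loss']:.3f}"
    )
```

Accuracy only looks at thresholded predictions and ROC AUC only at their ranking, so neither penalizes over- or under-confident probabilities; the log-loss does, which is why it is the natural choice when the probabilities feed a downstream decision.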
I recently shared some of my reflections on how to use probabilistic classifiers for optimal decision-making under uncertainty at @pydataparis.bsky.social 2024.
Here is the recording of the presentation:
www.youtube.com/watch?v=-gYn...
27.11.2024 14:17
Version 1.6
Please help us test the first release candidate for scikit-learn 1.6: pip install scikit-learn==1.6.0rc1
Changelog: scikit-learn.org/1.6/whats_ne...
In particular, if you maintain a project with a dependency on scikit-learn, please let us know about any regression.
22.11.2024 14:49
With Artefact, we are delighted to invite data leaders to an exclusive Paris masterclass: ✨Aligning Probabilistic Classification with Business Decisions using @scikit-learn.bsky.social ✨ 🚨Limited seats available! Secure your spot now 👉🏻 lu.ma/fopoglzo #MachineLearning #Advanced #AI #Masterclass
22.11.2024 06:54