A common trend across recent research in using reinforcement learning to train reasoning models is that the clipping operation within a trust region (core to PPO, adopted by GRPO) is squashing rare tokens that are key to clever behaviors like verification or backtracking.
17.06.2025 02:38
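One way to read the clipping claim, as a minimal sketch: under PPO's clipped surrogate, a token with positive advantage stops receiving gradient once its probability ratio exceeds 1 + eps. A token that started at probability 0.01 can therefore only climb to about 0.012 before it is clipped, while a token at 0.5 can climb to 0.6. The function names and eps = 0.2 are illustrative assumptions, not taken from any specific paper.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO per-token objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def surrogate_grad_wrt_ratio(ratio, advantage, eps=0.2):
    """Derivative of the objective with respect to the ratio (0 once clipped)."""
    if advantage > 0 and ratio > 1.0 + eps:
        return 0.0  # upside is capped: no more push to raise this token
    if advantage < 0 and ratio < 1.0 - eps:
        return 0.0  # downside is capped: no more push to lower this token
    return advantage


def max_prob_before_clip(old_prob, eps=0.2):
    """Largest new probability that still receives gradient when A > 0."""
    return min(1.0, old_prob * (1.0 + eps))


if __name__ == "__main__":
    # Illustrative old-policy probabilities: a rare token vs. common ones.
    for old_prob in (0.01, 0.1, 0.5):
        cap = max_prob_before_clip(old_prob)
        print(f"old={old_prob:.2f}  cap={cap:.3f}  headroom={cap - old_prob:.3f}")
```

The absolute headroom scales with the old probability, which is one plausible reason rare but useful tokens get little room to grow.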
Tried creating an AI chatbot on Instagram
01.02.2025 19:52
A high-level summary diagram taken from the slides linked below. It shows the interplay of two main components: a probabilistic model and decision maker or planner.
Probabilistic predictions of an underfitting polynomial classifier on a noisy XOR task and the corresponding under-confident calibration curve.
Probabilistic predictions of an overfitting polynomial classifier and the resulting overconfident calibration curve on the same noisy XOR problem.
Simulation study to show the relative lack of stability of hyperparameter tuning when using hard metrics such as Accuracy or soft yet not probabilistic metrics such as ROC AUC compared to a strictly proper scoring rule such as the log-loss.
I recently shared some of my reflections on how to use probabilistic classifiers for optimal decision-making under uncertainty at @pydataparis.bsky.social 2024.
Here is the recording of the presentation:
www.youtube.com/watch?v=-gYn...
27.11.2024 14:17
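A small sketch of the point made in the figure descriptions above, on synthetic data invented for illustration: two classifiers make the same hard decisions and rank samples identically, so accuracy and ROC AUC cannot tell them apart, yet the log-loss (a strictly proper scoring rule) penalizes the overconfident one. The helper functions are written by hand here to stay dependency-free; they are not the scikit-learn implementations.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, probs))
    return hits / len(y_true)


def roc_auc(y_true, probs):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs):
    """Mean negative log-likelihood of the true labels."""
    total = sum(math.log(p) if y == 1 else math.log(1.0 - p)
                for y, p in zip(y_true, probs))
    return -total / len(y_true)


# Synthetic labels: 10 samples scored "high" (8 truly positive),
# then 10 samples scored "low" (8 truly negative).
y_true = [1] * 8 + [0] * 2 + [0] * 8 + [1] * 2
p_calibrated = [0.8] * 10 + [0.2] * 10       # confidence matches the 80% hit rate
p_overconfident = [0.99] * 10 + [0.01] * 10  # same decisions, inflated confidence

if __name__ == "__main__":
    for name, probs in [("calibrated", p_calibrated),
                        ("overconfident", p_overconfident)]:
        print(name, accuracy(y_true, probs), roc_auc(y_true, probs),
              round(log_loss(y_true, probs), 4))
```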
GitHub - probabl-ai/skore: Your scikit-learn Modeling Companion
Bringing structure and recommended practices to Machine Learning projects can be challenging. Even experienced data scientists struggle with it.
That's why we built skore, your companion when modeling with scikit-learn. Check it out and let us know what you think!
github.com/probabl-ai/s...
13.12.2024 09:30
Which setup would you choose for running large language models (LLMs) locally?
Option 1:
• Apple M4 Max
• 14-core CPU, 32-core GPU
• 36 GB unified memory
• 1 TB SSD
Option 2:
• Apple M4 Pro
• 14-core CPU, 20-core GPU
• 48 GB unified memory
• 1 TB SSD
25.11.2024 18:31
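A rough back-of-the-envelope sketch that may help with the memory side of this question: the weights of a model take roughly (number of parameters) × (bits per parameter) / 8 bytes. The `headroom_gb` value is an arbitrary assumption standing in for the OS, the KV cache and other apps, and the model sizes are illustrative. This ignores GPU speed and memory bandwidth entirely.

```python
def weights_gb(params_billion, bits_per_param):
    """Approximate weight memory in GB (10^9 bytes) for a given quantization."""
    return params_billion * bits_per_param / 8


def fits(params_billion, bits_per_param, memory_gb, headroom_gb=8):
    """True if the weights plus an assumed headroom fit in unified memory."""
    return weights_gb(params_billion, bits_per_param) + headroom_gb <= memory_gb


if __name__ == "__main__":
    # Illustrative model sizes (billions of parameters) at 4-bit quantization.
    for size in (8, 32, 70):
        print(f"{size}B @ 4-bit: {weights_gb(size, 4):.1f} GB weights, "
              f"fits 36 GB: {fits(size, 4, 36)}, fits 48 GB: {fits(size, 4, 48)}")
```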
@bsky.app The translate option opens a new tab to do the conversion. Is there an update that translates to English in place?
23.11.2024 09:39
Love the Starter Pack by @bsky.app .. brilliant idea! Quickly finds all your X & Threads follows in one go!
23.11.2024 09:34
We're always updating the pydata & scipy project starter pack:
go.bsky.app/6HkrMcp
Hello @scikit-learn.bsky.social , @networkx.bsky.social , @scipyconf.bsky.social
22.11.2024 17:46
One of my fav projects: LeanRL, a simple RL library that provides recipes for fast RL training using torch.compile and cudagraphs.
Using these, we got >6x speed-ups compared to the original CleanRL implementations.
github.com/pytorch-labs...
22.11.2024 06:38
A statistical approach to model evaluations
A research paper from Anthropic on how to apply statistics to improve language model evaluations
www.anthropic.com/research/sta...
This is an excellent attempt (blog & paper) at bringing more statistical rigor to evaluation of ML models (this is specifically focused on LLM evals).
I feel like we need to have similar clear standards for many types of predictive models in biology. 1/
22.11.2024 08:29
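As a minimal sketch of the kind of rigor being asked for: report an evaluation score together with its standard error and a normal-approximation 95% confidence interval instead of a bare mean. The scores below are made up (70 correct out of 100 questions), and this is a generic textbook calculation, not necessarily the exact procedure in the linked paper.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, standard error, (low, high)) using a normal approximation."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # sample std dev / sqrt(n)
    return mean, sem, (mean - z * sem, mean + z * sem)


# Hypothetical per-question results: 1 = correct, 0 = incorrect.
scores = [1] * 70 + [0] * 30

if __name__ == "__main__":
    mean, sem, (low, high) = mean_with_ci(scores)
    print(f"accuracy = {mean:.3f} +/- {sem:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With only 100 questions the interval is roughly nine points wide on each side, so small differences between two models on such an eval may not be meaningful.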
๐
22.11.2024 04:35
The Llama 3.2 1B and 3B models are my favorite LLMs -- small but very capable.
If you want to understand what the architectures look like under the hood, I implemented them from scratch (one of the best ways to learn): github.com/rasbt/LLMs-f...
20.11.2024 08:33
How do I get myself added here?
22.11.2024 04:23