Releasing SmolVLM, a small 2-billion-parameter Vision+Language Model (VLM) built for on-device/in-browser inference with images and videos.
Outperforms all models at similar GPU RAM usage and token throughput.
Blog post: huggingface.co/blog/smolvlm
26.11.2024 16:58
👍 231
🔁 31
💬 4
📌 1
We've also observed that the impact of FP8 varies depending on model size and training scenarios (e.g., continual pre-training, from-scratch training, SFT). A comprehensive evaluation requires significant computational resources—this is not a trivial issue. (2/n)
25.11.2024 00:44
👍 0
🔁 0
💬 1
📌 0
📢 New findings on FP8 training for Continual Pre-Training! 🚀
Our experiments on Llama-3-70B show that FP8 significantly boosts training throughput (415 → 570 TFLOP/s) but induces loss spikes, leading to downstream performance drops. FP8 isn't always the best choice—it depends! (1/n)
25.11.2024 00:43
👍 5
🔁 0
💬 1
📌 0
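The loss spikes reported in the thread above are consistent with FP8's narrow numeric range: the E4M3 format commonly used for FP8 training keeps only 4 exponent and 3 mantissa bits, so values saturate near 448 and small gradients underflow to zero. A minimal pure-Python sketch of E4M3 rounding (illustrative only, not the authors' training code; the function name and saturation behavior are my assumptions):

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 value
    (1 sign bit, 4 exponent bits, 3 mantissa bits)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    max_normal = 448.0          # largest finite E4M3 magnitude
    if mag > max_normal:
        return sign * max_normal  # saturate on overflow (illustrative choice)
    e = math.floor(math.log2(mag))
    e = max(e, -6)              # exponent floors at 2**-6 (subnormal range)
    step = 2.0 ** (e - 3)       # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

# A gradient of 1e-5 is below the smallest E4M3 subnormal (2**-9),
# so it underflows to exactly 0.0: tiny updates silently vanish.
print(quantize_e4m3(1e-5))
# A moderate activation like 0.3 survives, but with ~4% rounding error.
print(quantize_e4m3(0.3))
```

With only 8 mantissa steps per power of two, relative rounding error is a few percent per value; accumulated over a 70B-parameter model, that precision loss is one plausible mechanism behind the instability the thread describes.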