Releasing SmolVLM, a small 2-billion-parameter Vision+Language Model (VLM) built for on-device/in-browser inference with images and videos.
Outperforms all models at similar GPU RAM usage and token throughput.
Blog post: huggingface.co/blog/smolvlm
26.11.2024 16:58
👍 231
🔁 31
💬 4
📌 1
We've also observed that the impact of FP8 varies depending on model size and training scenarios (e.g., continual pre-training, from-scratch training, SFT). A comprehensive evaluation requires significant computational resources—this is not a trivial issue. (2/n)
25.11.2024 00:44
👍 0
🔁 0
💬 1
📌 0
📢 New findings on FP8 training for Continual Pre-Training! 🚀
Our experiments on Llama-3-70B show that FP8 significantly boosts training throughput (415 → 570 TFLOP/s) but induces loss spikes, leading to downstream performance drops. FP8 isn't always the best choice—it depends! (1/n)
25.11.2024 00:43
👍 5
🔁 0
💬 1
📌 0
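The loss spikes reported in the thread above are consistent with FP8's narrow numeric range: the E4M3 format commonly used for FP8 training keeps only 4 exponent and 3 mantissa bits, so values saturate near 448 and small gradients underflow to zero. A minimal pure-Python sketch of E4M3 rounding (illustrative only, not the authors' training code; the function name and saturation behavior are my assumptions):

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 value
    (1 sign bit, 4 exponent bits, 3 mantissa bits)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    max_normal = 448.0          # largest finite E4M3 magnitude
    if mag > max_normal:
        return sign * max_normal  # saturate on overflow (illustrative choice)
    e = math.floor(math.log2(mag))
    e = max(e, -6)              # exponent floors at 2**-6 (subnormal range)
    step = 2.0 ** (e - 3)       # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

# A gradient of 1e-5 is below the smallest E4M3 subnormal (2**-9),
# so it underflows to exactly 0.0: tiny updates silently vanish.
print(quantize_e4m3(1e-5))
# A moderate activation like 0.3 survives, but with ~4% rounding error.
print(quantize_e4m3(0.3))
```

With only 8 mantissa steps per power of two, relative rounding error is a few percent per value; accumulated over a 70B-parameter model, that precision loss is one plausible mechanism behind the instability the thread describes.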