Caleb Fahlgren's Avatar

Caleb Fahlgren

@calebfahlgren.hf.co

SWE @hf.co

859
Followers
142
Following
28
Posts
20.11.2024
Joined
Posts Following

Latest posts by Caleb Fahlgren @calebfahlgren.hf.co

Video thumbnail

You can just ask things πŸ—£οΈ

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Argilla Llama3.1 405B synthetic dataset πŸ”₯

04.12.2024 08:54 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

Most liked and most downloaded open-source AI models from 2022 to 2024

Interactive viz: aiworld.eu/embed/model/...
Discussion: huggingface.co/spaces/huggi...

04.12.2024 08:37 πŸ‘ 86 πŸ” 20 πŸ’¬ 2 πŸ“Œ 4
Video thumbnail

It doesn't get easier than this. Why are you writing SQL by yourself when it's almost 2025

02.12.2024 12:48 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

The amazing, new Qwen2.5-Coder 32B model can now write SQL for any @hf.co dataset ✨

02.12.2024 12:48 πŸ‘ 18 πŸ” 3 πŸ’¬ 1 πŸ“Œ 0
Preview
Github Issue Generator - a Hugging Face Space by reach-vb Discover amazing ML apps made by the community

Here's the space by @reach-vb.hf.co

huggingface.co/spaces/reach...

29.11.2024 11:18 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

This is insane! Structured generation in the browser with the new @hf.co SmolLM2-1.7B model

β€’ Tiny 1.7B LLM running at 88 tokens / second ⚑
β€’ Powered by MLC/WebLLM on WebGPU πŸ”₯
β€’ JSON Structured Generation entirely in the browser 🀏

29.11.2024 11:18 πŸ‘ 11 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image

Releasing SmolVLM, a small 2 billion parameters Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.

Outperforms all models at similar GPU RAM usage and tokens throughputs

Blog post: huggingface.co/blog/smolvlm

26.11.2024 16:58 πŸ‘ 231 πŸ” 31 πŸ’¬ 4 πŸ“Œ 1

I did it via

Settings > Account > Handle > I have my own domain

and it should show there!

26.11.2024 16:22 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Video thumbnail

You can literally do the histogram in one line in less than 10 seconds πŸ’¨

> from histogram(train, "Average ⬆️")

26.11.2024 12:55 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Here's what the model licenses look like:

Lots of great open licenses in there too! πŸ’ͺ

26.11.2024 12:55 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

The OpenLLM Leaderboard just passed 2k evals πŸ₯³

Here's a look at the distribution of average scores for all those models!

Great work by the @huggingface.bsky.social team to do these evals!

26.11.2024 12:55 πŸ‘ 15 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

Let us know what you think or what you want to see :)

cc: @davidberenstein.bsky.social

25.11.2024 19:54 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Let’s go!

22.11.2024 20:03 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

** log and get out of the way **

21.11.2024 20:36 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

using supabase theme, @tylerhillery.com would approve

21.11.2024 20:26 πŸ‘ 6 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Preview
Create beautiful images of your code Turn your code into beautiful images. Choose from a range of syntax colors, hide or show the background, and toggle between a dark and light window.

ray.so it's great with lots of themes!

21.11.2024 20:26 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image Post image

Automatically tracking all Ollama requests to a dataset with the new observers python library!

With just a few lines of code all your requests can be sent to @huggingface.bsky.social datasets for annotating, analysis and observability πŸ”­

21.11.2024 20:12 πŸ‘ 6 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Preview
GitHub - cfahlgren1/observers: Track OpenAI compatible requests to a dataset Track OpenAI compatible requests to a dataset. Contribute to cfahlgren1/observers development by creating an account on GitHub.

Here's the library! Was fun collaborating with
@davidberenstein.bsky.social bringing the datasets and argilla all together!
github.com/cfahlgren1/o...

21.11.2024 20:06 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image

The main three stores are:
β€’ DuckDB (local, SQL over traces)
β€’ Hugging Face Datasets (dataset viewer, sql console)
β€’ Argilla - annotation and filtering UI

21.11.2024 20:06 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

observers πŸ”­ - automatically log all OpenAI compatible requests to a dataset πŸ’½

β€’ supports any OpenAI compatible endpoint πŸ’ͺ
β€’ supports @duckdb.org, @huggingface.bsky.social datasets and Argilla as stores

> pip install observers

21.11.2024 20:06 πŸ‘ 13 πŸ” 5 πŸ’¬ 4 πŸ“Œ 0
Preview
OpenCo7/UpVoteWeb Β· Datasets at Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

That’s okay, there are lots of incomplete and even snapshots. The UpVoteWeb reddit dataset is one that comes to mind.

Any data that is more accessible is a win :). My hub stats dataset is just a cron script as well haha

huggingface.co/datasets/Ope...

21.11.2024 17:03 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

+1 and let me know if you need any help with it @tobilg.com would be nice to have the dataset viewer for it!

21.11.2024 15:34 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Preview
GitHub - cfahlgren1/observers: Track OpenAI compatible requests to a dataset Track OpenAI compatible requests to a dataset. Contribute to cfahlgren1/observers development by creating an account on GitHub.

We just released a library that makes it pretty seamless to send traces and LLM requests to datasets

github.com/cfahlgren1/o...

Would love to hear what you think is missing for prompts?

21.11.2024 15:31 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Preview
HuggingFaceTB/smoltalk Β· Datasets at Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

SmolTalk is out πŸ—£οΈ

Over 1M high quality instructions used for training SmolLM2, one of the best small language models in the industry.

huggingface.co/datasets/Hug...

21.11.2024 14:56 πŸ‘ 10 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
lightweight SDK for AI observability

lightweight SDK for AI observability

Observers: A Lightweight SDK for AI Observability

TLDR;
- Track and record interactions with AI models
- Store observations in multiple backends @huggingface.bsky.social, @duckdb.org or Argilla
- Query and analyse your AI interactions with ease

GitHub:
github.com/cfahlgren1/o...

21.11.2024 10:29 πŸ‘ 42 πŸ” 7 πŸ’¬ 4 πŸ“Œ 0
Preview
Foursquare Open Source Places: A new foundational dataset for the geospatial community I did not expect this! > [...] we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places ("FSQ OS Places"). This base layer …

Foursquare just open sourced their 100 million place point of interest dataset! Some notes on poking around with it using DuckDB (it's Parquet files on S3) simonwillison.net/2024/Nov/20/...

20.11.2024 06:08 πŸ‘ 459 πŸ” 113 πŸ’¬ 23 πŸ“Œ 16

Range requests + Parquet is what makes the Hugging Face SQL Console possible to query datasets entirely in the browser

21.11.2024 06:59 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

duckdb-gsheets v0.0.3 is out, courtesy of @a13x.bsky.social

the power is terrifying! duckdb-gsheets.com

21.11.2024 03:51 πŸ‘ 68 πŸ” 8 πŸ’¬ 2 πŸ“Œ 4

Amazing! I wish it worked in wasm

21.11.2024 04:02 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
From Files to Chunks: Improving HF Storage Efficiency We’re on a journey to advance and democratize artificial intelligence through open source and open science.

When XetHub joined Hugging Face, we brainstormed how to share our tech with the community.

The magic? Versioning chunks, not files, giving rise to:

🧠 Smarter storage
⏩ Faster uploads
πŸš€ Efficient downloads

Curious? Read the blog and let us know how it could help your workflows!

20.11.2024 18:51 πŸ‘ 33 πŸ” 14 πŸ’¬ 1 πŸ“Œ 2