You can just ask things π£οΈ
"show me messages in the coding category that are in the top 10% of reward model scores"
Download really high quality instructions from the Argilla Llama3.1 405B synthetic dataset π₯
You can just ask things π£οΈ
"show me messages in the coding category that are in the top 10% of reward model scores"
Download really high quality instructions from the Argilla Llama3.1 405B synthetic dataset π₯
Most liked and most downloaded open-source AI models from 2022 to 2024
Interactive viz: aiworld.eu/embed/model/...
Discussion: huggingface.co/spaces/huggi...
It doesn't get easier than this. Why are you writing SQL by yourself when it's almost 2025
The amazing, new Qwen2.5-Coder 32B model can now write SQL for any @hf.co dataset β¨
Here's the space by @reach-vb.hf.co
huggingface.co/spaces/reach...
This is insane! Structured generation in the browser with the new @hf.co SmolLM2-1.7B model
β’ Tiny 1.7B LLM running at 88 tokens / second β‘
β’ Powered by MLC/WebLLM on WebGPU π₯
β’ JSON Structured Generation entirely in the browser π€
Releasing SmolVLM, a small 2 billion parameters Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.
Outperforms all models at similar GPU RAM usage and tokens throughputs
Blog post: huggingface.co/blog/smolvlm
I did it via
Settings > Account > Handle > I have my own domain
and it should show there!
You can literally do the histogram in one line in less than 10 seconds π¨
> from histogram(train, "Average β¬οΈ")
Here's what the model licenses look like:
Lots of great open licenses in there too! πͺ
The OpenLLM Leaderboard just passed 2k evals π₯³
Here's a look at the distribution of average scores for all those models!
Great work by the @huggingface.bsky.social team to do these evals!
Let us know what you think or what you want to see :)
cc: @davidberenstein.bsky.social
Letβs go!
** log and get out of the way **
using supabase theme, @tylerhillery.com would approve
Automatically tracking all Ollama requests to a dataset with the new observers python library!
With just a few lines of code all your requests can be sent to @huggingface.bsky.social datasets for annotating, analysis and observability π
Here's the library! Was fun collaborating with
@davidberenstein.bsky.social bringing the datasets and argilla all together!
github.com/cfahlgren1/o...
The main three stores are:
β’ DuckDB (local, SQL over traces)
β’ Hugging Face Datasets (dataset viewer, sql console)
β’ Argilla - annotation and filtering UI
observers π - automatically log all OpenAI compatible requests to a dataset π½
β’ supports any OpenAI compatible endpoint πͺ
β’ supports @duckdb.org, @huggingface.bsky.social datasets and Argilla as stores
> pip install observers
Thatβs okay, there are lots of incomplete and even snapshots. The UpVoteWeb reddit dataset is one that comes to mind.
Any data that is more accessible is a win :). My hub stats dataset is just a cron script as well haha
huggingface.co/datasets/Ope...
+1 and let me know if you need any help with it @tobilg.com would be nice to have the dataset viewer for it!
We just released a library that makes it pretty seamless to send traces and LLM requests to datasets
github.com/cfahlgren1/o...
Would love to hear what you think is missing for prompts?
SmolTalk is out π£οΈ
Over 1M high quality instructions used for training SmolLM2, one of the best small language models in the industry.
huggingface.co/datasets/Hug...
lightweight SDK for AI observability
Observers: A Lightweight SDK for AI Observability
TLDR;
- Track and record interactions with AI models
- Store observations in multiple backends @huggingface.bsky.social, @duckdb.org or Argilla
- Query and analyse your AI interactions with ease
GitHub:
github.com/cfahlgren1/o...
Foursquare just open sourced their 100 million place point of interest dataset! Some notes on poking around with it using DuckDB (it's Parquet files on S3) simonwillison.net/2024/Nov/20/...
Range requests + Parquet is what makes the Hugging Face SQL Console possible to query datasets entirely in the browser
duckdb-gsheets v0.0.3 is out, courtesy of @a13x.bsky.social
the power is terrifying! duckdb-gsheets.com
Amazing! I wish it worked in wasm
When XetHub joined Hugging Face, we brainstormed how to share our tech with the community.
The magic? Versioning chunks, not files, giving rise to:
π§ Smarter storage
β© Faster uploads
π Efficient downloads
Curious? Read the blog and let us know how it could help your workflows!