Sara Han

@sdiazlor.hf.co

ML & Dev Rel @Hugging Face | 🐕 🎮 💻 💪

786 Followers · 494 Following · 31 Posts · Joined 16.11.2024

Latest posts by Sara Han @sdiazlor.hf.co

Synthetic Data Generator - a Hugging Face Space by argilla
Build datasets using natural language

Start synthesizing 🚀: huggingface.co/spaces/argil...
✍ Blog post: huggingface.co/blog/sdiazlo...

20.01.2025 16:42

💫 Generate RAG data with the Synthetic Data Generator to improve your RAG system!

1️⃣ Generate from your documents, dataset, or dataset description.
2️⃣ Configure it.
3️⃣ Generate the synthetic dataset.
4️⃣ Fine-tune the retrieval and reranking models.
5️⃣ Build a RAG pipeline.
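Steps 4 and 5 can be illustrated with a toy retrieve-then-rerank pipeline. This is a minimal sketch, not the Synthetic Data Generator's output or any Hugging Face API: `retrieval_score` and `rerank_score` are hypothetical stand-ins (simple word overlap) for the retrieval and reranking models you would fine-tune on the generated data, and the documents and question are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieval_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a fine-tuned retrieval (embedding) model:
    the fraction of unique query tokens that also appear in the document."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def rerank_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a fine-tuned reranker: counts every
    occurrence of a query token in the document."""
    q = set(tokenize(query))
    counts = Counter(tokenize(doc))
    return float(sum(counts[t] for t in q))


def retrieve_then_rerank(query: str, docs: list[str], k_retrieve: int = 3, k_final: int = 1) -> list[str]:
    # Stage 1: score the whole corpus cheaply and keep the top k_retrieve candidates.
    candidates = sorted(docs, key=lambda d: retrieval_score(query, d), reverse=True)[:k_retrieve]
    # Stage 2: rescore only those candidates with the (normally more expensive) reranker.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:k_final]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Put the selected passages and the question into one prompt for the generator."""
    context = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using the context below.\n{context}\nQuestion: {query}"


docs = [
    "Argilla is a collaboration tool to build high-quality datasets.",
    "AutoTrain trains models without code.",
    "llama.cpp runs models efficiently on CPU.",
]
question = "Which tool runs models on CPU?"
top = retrieve_then_rerank(question, docs)
print(build_prompt(question, top))
```

The two-stage split mirrors the usual design choice: a cheap scorer narrows the corpus, and the costlier reranker only sees a handful of candidates.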

20.01.2025 16:42

🚀 Argilla v2.6.0 is here! 🎉

Let me show you how EASY it is to export your annotated datasets from Argilla to the Hugging Face Hub. 🤩

Take a look at this quick demo 👇

💁‍♂️ More info about the release at github.com/argilla-io/a...

#AI #MachineLearning #OpenSource #DataScience #HuggingFace #Argilla

19.12.2024 12:39

🙅‍♀️ No-code end-to-end example to train your model

1️⃣ Use the Synthetic Data Generator to create your custom dataset

2️⃣ Use AutoTrain to train your model on the generated dataset

Check it here: huggingface.co/blog/synthet...

18.12.2024 11:28

- No code required: everything can be handled through the interface.
- 100% free to use.
- Designed to create text classification and chat datasets.
- Review in Argilla and push to the Hub.

16.12.2024 16:14
Synthetic Data Generator - Build Datasets Using Natural Language
YouTube video by Argilla

Where do I get quality data from? We often need to fine-tune models for very specific scenarios. And that's where the Synthetic Data Generator comes in!

Want to see how it works? Watch this quick video (www.youtube.com/watch?v=nXjV...) and get started here: t.co/hJ1b2TsMq0

16.12.2024 16:14
glg - galego - Galician
Join and contribute to the dataset glg - galego - Galician

Little by little, we're making progress! 🚀 I encourage you all to contribute: you just need to open the link, read the instructions, and start annotating ✍

data-is-better-together-fineweb-c.hf.space/share-your-p...

12.12.2024 15:30
Discussion Forum - a Hugging Face Space by HuggingFaceFW
Discover amazing ML apps made by the community

It only takes 2 steps:
- Coordinate with your Language Lead: huggingface.co/spaces/Huggi.... Or become one if your language doesn't have one yet: huggingface.co/spaces/natal...
- Read the guidelines and start annotating according to the educational value: huggingface.co/spaces/data-...

10.12.2024 12:35

Spanish, Filipino, Amharic, French, German, Basque, Catalan, Galician, Guarani, Telugu, Italian, Pashto, Romanian, Tamil, Urdu, Danish... and many more! All included in the FineWeb2 Community Annotation Sprint! 🔥

💫 Join to build an impactful dataset for your language!

10.12.2024 12:35
data-is-better-together/open-image-preferences-v1-binarized · Datasets at Hugging Face
We're on a journey to advance and democratize artificial intelligence through open source and open science.


Binarized dataset: huggingface.co/datasets/dat...
Blog post: huggingface.co/blog/image-p...

09.12.2024 16:26

Open Image Preferences released! 🚀

- Open-source dataset for text2image
- 10K samples manually evaluated by the HF community.
- Binarized format for SFT, DPO, or ORPO.

It comes with a nice blog post explaining the steps to pre-process and generate the data, along with the results.
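To illustrate what a "binarized" preference format means: each pairwise comparison is reduced to a single prompt/chosen/rejected triple, the shape commonly consumed by preference-tuning methods such as DPO or ORPO (for SFT, one would typically keep only the chosen side). This is a minimal sketch; the input fields `prompt`, `image_a`, `image_b`, and `votes` are a hypothetical schema, not necessarily the one used in the released dataset, and the records are made up.

```python
from collections import Counter
from typing import Optional


def binarize(record: dict) -> Optional[dict]:
    """Turn one preference record (hypothetical schema) into a chosen/rejected pair.

    Returns None on a tie, since there is no clear preference to learn from.
    """
    tally = Counter(record["votes"])  # each vote is "a" or "b"
    if tally["a"] == tally["b"]:
        return None
    winner, loser = ("image_a", "image_b") if tally["a"] > tally["b"] else ("image_b", "image_a")
    return {"prompt": record["prompt"], "chosen": record[winner], "rejected": record[loser]}


records = [
    {"prompt": "a red fox in the snow", "image_a": "fox_1.png", "image_b": "fox_2.png", "votes": ["a", "a", "b"]},
    {"prompt": "a city at night", "image_a": "city_1.png", "image_b": "city_2.png", "votes": ["a", "b"]},
]
# Drop ties; keep only records with a clear majority preference.
pairs = [pair for pair in map(binarize, records) if pair is not None]
print(pairs)
```

Dropping ties is one possible design choice; another would be to keep them for a separate evaluation split.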

09.12.2024 16:26

I'd say more small models, plus a focus on agents and on-device deployment.

07.12.2024 10:25
Open Source Ai Year In Review 2024 - a Hugging Face Space by huggingface
What happened in open-source AI this year, and what's next?

huggingface.co/spaces/huggi...

05.12.2024 10:11

This is crazy! Were you right in your predictions?

05.12.2024 10:11
Language Lead sign-up
At Hugging Face 🤗, we're launching a big community initiative to improve LLM training for many languages. We're looking for Language Leads to help us cultivate specific languages during this initiativ...

Language is power! A multilingual annotation sprint for hundreds of languages is starting soon! Step up as a Language Lead and help drive this effort for your language.

If there's already a Language Lead, stay tuned! Is this the start of a nice community?

docs.google.com/forms/d/e/1F...

03.12.2024 11:57

Lovely!

02.12.2024 10:52

To end the week on a high note, my furry friend ⭐

01.12.2024 12:16
Argilla | ZenML - Bridging the gap between ML & Ops
Annotating data using Argilla.

Docs: docs.zenml.io/stack-compon...

29.11.2024 11:47

Want to improve your model quality? Add a data annotation stage to your MLOps pipeline effortlessly, thanks to the enhanced integration of Argilla with ZenML.

✨ Use the latest Argilla features
✨ Improve human-in-the-loop workflows
✨ Manage datasets, track progress, and coordinate your annotation team

29.11.2024 11:47
Qwen/QwQ-32B-Preview · Hugging Face

Model: huggingface.co/Qwen/QwQ-32B...
Demo: huggingface.co/spaces/Qwen/...

28.11.2024 10:42

🚀 QwQ-32B-Preview is available on the Hub!

> The results are very promising, beating o1-mini.
> However, the model also has several limitations you might notice even in the demo (I got endless reasoning when asking it for the number of 'r's in 🍓). So, let's see how the team deals with them.
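For comparison, the question the model kept reasoning about has a one-line deterministic answer. This tiny sketch assumes 🍓 stands for the English word "strawberry":

```python
# Deterministic ground truth for the letter-counting prompt (assumes 🍓 = "strawberry").
word = "strawberry"
r_count = word.count("r")
print(f"{word!r} contains {r_count} occurrences of 'r'")
```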

28.11.2024 10:42

I do. Big AI companies stealing our data have put us on guard, but good intentions also exist. So, let's learn together from this and find ways to continue building with consent and transparency for everyone, not just those in power.

28.11.2024 09:15

It's pretty sad to see the negative sentiment towards Hugging Face on this platform because of a dataset published by one of its employees. I want to write a short piece. 🧵

Hugging Face empowers everyone to use AI to create value and is against the monopolization of AI. Above all, it's a hosting platform.

27.11.2024 15:23

Hugging Face inference endpoints now support CPU deployment for llama.cpp 🚀 🚀

Why is this a huge deal? Llama.cpp is well known for running very well on CPU. If you're running small models like Llama 1B or embedding models, this will definitely save tons of money 💰 💰

27.11.2024 11:01
Let's make a generation of amazing image generation models
A Blog post by ben burtenshaw on Hugging Face

Read the blog post: huggingface.co/blog/burtens...

26.11.2024 16:20
Image Preferences - Argilla annotation space - a Hugging Face Space by data-is-better-together
A community project to create an image preferences dataset.

Steps:
1️⃣ Log in to the Argilla Space with your HF account: huggingface.co/spaces/data-...
2️⃣ Check the guidelines.
3️⃣ Time to start annotating!

Can you climb to the top of the leaderboard? huggingface.co/spaces/data-...

26.11.2024 16:20

🎨 Help build an image preference dataset!

> Goal: Release an open-source image dataset, enabling the entire community to benefit from it.

> Requirements: All you need is a Hugging Face account and a willingness to contribute.

More in 🧵

26.11.2024 16:20

Let's make AI more inclusive.

At @huggingface.bsky.social we'll launch a huge community sprint soon to build high-quality training datasets for many languages.

We're looking for Language Leads to help with outreach.

Find your language and nominate yourself:
forms.gle/iAJVauUQ3FN8...

26.11.2024 06:29
Labelers training AI say they're overworked, underpaid and exploited by big American tech companies
Digital workers in Kenya had to sift through horrific online content to train AI, but say they were underpaid, overworked, and got inadequate mental health support. So they're fighting back.

"Naftali was assigned to train AI to recognize and weed out pornography, hate speech and excessive violence, which meant sifting through the worst of the worst content online for hours on end."
So much of AI is based on exploiting workers in precarious conditions 😔
www.cbsnews.com/news/labeler...

25.11.2024 15:16
GitHub - argilla-io/argilla: Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

Reach out to us at:
- GitHub: github.com/argilla-io/a...
- Discord (#argilla-distilabel-general or #argilla-distilabel-help): hf.co/join/discord

25.11.2024 09:51