Dan Saattrup Smart

@saattrupdan.com

Researcher and consultant in low-resource NLP, with a focus on evaluation. saattrupdan.com

289 followers · 929 following · 34 posts · Joined 16.11.2024

Latest posts by Dan Saattrup Smart @saattrupdan.com

#dktech

14.05.2025 19:51 👍 4 🔁 0 💬 0 📌 0

NoDaLiDa 2027 will be held at the Centre for Language Technology at the University of Copenhagen!!

#nodalida #nlp

04.03.2025 15:23 👍 13 🔁 3 💬 0 📌 1

Wanna keep up with our @milanlp.bsky.social lab? Here is a starter pack of current and former members:
bsky.app/starter-pack...

05.03.2025 10:47 👍 13 🔁 7 💬 0 📌 0

NoDaLiDa x Baltic-HLT 2025 is a wrap!

Thank you all for joining us for a fruitful conference! Safe travels home, and see you in Copenhagen or Vilnius in 2027!!

#nlp #nodalida #baltichlt

05.03.2025 15:11 👍 5 🔁 2 💬 0 📌 0

Amazing, well done! Have you conducted any experiments with finetuning LLMs on the data?

06.03.2025 13:44 👍 0 🔁 0 💬 0 📌 0
PaDaS-Lab/webfaq · Datasets at Hugging Face. We're on a journey to advance and democratize artificial intelligence through open source and open science.

WebFAQ: Massive Multilingual Q&A Dataset

- 96M QA pairs extracted from schema.org/FAQPage annotations
- 75 languages with standardized structured markup
- Leverages existing web publisher content intent
- No synthetic data generation needed

huggingface.co/datasets/PaD...

06.03.2025 09:18 👍 3 🔁 1 💬 1 📌 0
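The WebFAQ bullets describe harvesting Q&A pairs from pages that already carry schema.org `FAQPage` markup. Below is a minimal sketch of that idea using only the Python standard library, assuming the markup is embedded as JSON-LD (`<script type="application/ld+json">`) with `mainEntity` → `Question.name` / `acceptedAnswer.text`. Microdata, RDFa and `@graph` wrappers are not handled, and `JsonLdCollector` / `extract_faq_pairs` are hypothetical names, not part of the actual WebFAQ pipeline.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script text may arrive in several chunks


def extract_faq_pairs(html):
    """Return (question, answer) pairs found in schema.org FAQPage JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    pairs = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict) or item.get("@type") != "FAQPage":
                continue
            entities = item.get("mainEntity", [])
            if isinstance(entities, dict):
                entities = [entities]  # a single Question need not be wrapped in a list
            for question in entities:
                if not isinstance(question, dict):
                    continue
                name = question.get("name")
                answer = question.get("acceptedAnswer")
                text = answer.get("text") if isinstance(answer, dict) else None
                if isinstance(name, str) and isinstance(text, str) and name.strip() and text.strip():
                    pairs.append((name.strip(), text.strip()))
    return pairs


# Toy page for illustration only.
page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type": "Question", "name": "How do I reset my password?",
   "acceptedAnswer": {"@type": "Answer", "text": "Use the link on the login page."}}]}
</script></head></html>"""
print(extract_faq_pairs(page))  # [('How do I reset my password?', 'Use the link on the login page.')]
```

Because the Q&A text comes straight from the publisher's own markup, no generation step is involved; the remaining work at web scale is mostly deduplication, language identification and quality filtering.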
NoDaLiDa/Baltic-HLT 2025 - Program All times are local (GMT+2/UTC+2). See detailed program below.

🚀 Thank you all for waiting! The full program of NoDaLiDa x Baltic-HLT is online:

www.nodalida-bhlt2025.eu/program

#nodalida #baltichlt #nlp #nlproc

18.02.2025 15:26 👍 2 🔁 2 💬 0 📌 0
Screenshot of 'SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models.'
SHADES is in multiple grey colors (shades).

⚫⚪ It's coming...SHADES. ⚪⚫
The first ever resource of multilingual, multicultural, and multigeographical stereotypes, built to support nuanced LLM evaluation and bias mitigation. We have been working on this around the world for almost **4 years** and I am thrilled to share it with you all soon.

10.02.2025 08:28 👍 128 🔁 23 💬 6 📌 3
🇬🇧 English - ScandEval

See the full English leaderboard here: scandeval.com/leaderboards...

You can make your own radial plots, like the one above, using this tool: scandeval.com/extras/radia...

(4/4)

10.02.2025 16:33 👍 0 🔁 0 💬 0 📌 0

If we dig into more granular evaluations, we see that the main differences between the two models are that o3-mini achieves higher text classification performance, whereas gpt-4o performs better at common-sense reasoning.

(3/4)

10.02.2025 16:33 👍 0 🔁 0 💬 1 📌 0

Overall, the gpt-4o model achieves a slightly better rank score of 1.46, compared to o3-mini's 1.51. Here lower is better, with 1 being the best score possible (indicating that the model beats all other models at all tasks).

We use the default 'medium' reasoning effort of o3-mini here.

(2/4)

10.02.2025 16:33 👍 1 🔁 0 💬 1 📌 0
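The thread only describes the rank score loosely: lower is better, and 1 means a model beats every other model on every task. A minimal sketch of one simple score with exactly those properties is to rank the models on each task and average the ranks. ScandEval's actual rank score may be computed differently (for instance in how it treats near-ties or score uncertainty), so treat `mean_rank` as a hypothetical, simplified stand-in; the task names, model names and scores are made up.

```python
from statistics import mean


def mean_rank(scores):
    """Average per-task rank for each model; 1 means best on every task.

    `scores` maps task name -> {model name: score}, where higher is better.
    On each task a model's rank is 1 + the number of models that strictly
    beat it, so tied models share a rank.
    """
    ranks = {}
    for task_scores in scores.values():
        for model, score in task_scores.items():
            rank = 1 + sum(other > score for other in task_scores.values())
            ranks.setdefault(model, []).append(rank)
    return {model: mean(model_ranks) for model, model_ranks in ranks.items()}


# Made-up scores for illustration only; these are not ScandEval results.
example = {
    "text-classification": {"model-a": 0.71, "model-b": 0.65, "model-c": 0.60},
    "common-sense-reasoning": {"model-a": 0.80, "model-b": 0.84, "model-c": 0.52},
    "summarisation": {"model-a": 0.66, "model-b": 0.66, "model-c": 0.61},
}
print(mean_rank(example))
```

Here model-a and model-b each win one task and tie on the third, so both end at about 1.33, while model-c finishes last everywhere and scores 3. This mirrors the pattern in the thread: two models that trade wins on different task types end up with close overall scores.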

Some new evaluation results from the European evaluation benchmark ScandEval! This time for OpenAI's new o3-mini model: how well does it compare to the existing gpt-4o model on English tasks?

(1/4)

#nlp #evaluation #reasoning #llm #o3

10.02.2025 16:33 👍 1 🔁 0 💬 1 📌 0
ScandEval

Check out the full leaderboards on scandeval.com, which also include results for Llama-3.3-70B, Qwen2.5-72B, QwQ-32B-preview, Gemma-27B and Nemotron-4-340B.

20.01.2025 14:01 👍 0 🔁 0 💬 0 📌 0

On average, the 405B Llama-3.1 model achieves a solid second place with a ScandEval rank of 1.53, while GPT-4-turbo leads with a ScandEval rank of 1.39 🎉

20.01.2025 14:01 👍 0 🔁 0 💬 1 📌 0

However, for Icelandic, Faroese and Norwegian, it's not quite there yet.

20.01.2025 14:01 👍 0 🔁 0 💬 1 📌 0

For Danish, Swedish, Dutch, German and English, it turns out that it is roughly on par with GPT-4-turbo!

20.01.2025 14:01 👍 0 🔁 0 💬 1 📌 0

Recently, we got a lot of new ScandEval evaluations of large LLMs, including the 405B Llama-3.1 model. So how well does it perform?

A 🧵 (1/n)

#llm #evaluation

20.01.2025 14:01 👍 4 🔁 0 💬 1 📌 0
The image shows an illustration titled "Hygge Web Data" featuring three cartoon animals - a fox, an owl, and what appears to be a bear or similar animal - sitting at a table or surface reviewing various documents and papers. The style is cute and whimsical, with the animals drawn in a simple, friendly manner. Each animal is looking at different papers with sketched symbols, text, and designs on them. The illustration has a gentle, cozy feel to it, fitting with the "hygge" (Danish concept of coziness and comfort) mentioned in the title.

Introducing Scandi-fine-web-cleaner, a decoder model trained to remove low-quality web content from FineWeb 2 for Danish and Swedish

- Uses FineWeb-c community annotations
- 90%+ precision + minimal compute required
- Enables efficient filtering of 43M+ documents

huggingface.co/davanstrien/...

13.01.2025 15:48 👍 17 🔁 4 💬 1 📌 1
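High precision matters for a filter like this because every false positive is a good document thrown away. A minimal sketch of how precision on the flagged class could be measured against human annotations, and how a classifier would then be applied as a filter. The label names, the toy labels and the `is_low_quality` callable are hypothetical stand-ins, not the model's actual API or FineWeb-C's label set.

```python
def precision(predicted, gold, positive="low_quality"):
    """Precision for one label: TP / (TP + FP).

    Of all documents the classifier flags with `positive`, what share
    did human annotators also give that label?
    """
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == positive and g == positive)
    fp = sum(1 for p, g in pairs if p == positive and g != positive)
    return tp / (tp + fp) if tp + fp else 0.0


def keep_high_quality(docs, is_low_quality):
    """Drop every document the (hypothetical) classifier flags as low quality."""
    return [doc for doc in docs if not is_low_quality(doc)]


# Toy labels, made up for illustration.
predicted = ["low_quality", "low_quality", "ok", "low_quality", "ok"]
gold = ["low_quality", "ok", "ok", "low_quality", "low_quality"]
print(precision(predicted, gold))  # 2 of the 3 flagged documents are truly low quality
```

Precision says nothing about the low-quality documents the filter misses (the last toy document above); that is recall, which a cleaning filter can trade away more cheaply than precision.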
Facebook U-turn: Expect more wild posts, and expect to get dumber, warns expert. Read more here.

User-driven fact-checking may mean that minorities' interests get overlooked, warns ITU associate professor @lrossi.bsky.social.

Claims about, for example, Greenlandic affairs risk escaping fact-checking simply because there are few Greenlandic users compared with other groups.
www.berlingske.dk/kultur/faceb...

09.01.2025 13:12 👍 3 🔁 1 💬 0 📌 0

#dkai

28.12.2024 13:14 👍 4 🔁 0 💬 0 📌 0
A minimalist illustration showing a packaged charger box labeled "one Union one Charger." The box features an image of a blue charger with the European Union flag symbol and a USB-C cable. The scene is set within a holiday theme, with decorative Christmas trees, ornaments, and gift boxes surrounding the charger box. In the top right corner, there is a small EU flag symbol.

It's time for THE charger.

Today, USB-C officially becomes the common standard for charging new mobile electronic devices in the EU.

It means better charging technology, reduced e-waste, and less fuss finding the chargers you need!

#DigitalEU

28.12.2024 07:09 👍 7840 🔁 1663 💬 217 📌 371
OpenAI o3 (high compute tuned): 1 task = 684 kg CO₂e emissions = 5 full tanks of gas

"Each task consumed approximately 1,785 kWh of energy - about the same amount of electricity an average U.S. household uses in two months"

This is one per-task estimate from Salesforce's head of sustainability -->>

www.linkedin.com/posts/bgamaz...

28.12.2024 08:44 👍 397 🔁 135 💬 22 📌 30
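A quick consistency check on the quoted figures, using plain division: what grid carbon intensity, per-tank emissions and monthly household electricity use do the numbers imply? This only tests whether the figures hang together; it says nothing about the assumptions behind the original estimate.

```python
# Per-task figures quoted in the post for OpenAI o3 (high compute tuned).
EMISSIONS_KG_CO2E = 684
ENERGY_KWH = 1785
FULL_TANKS_OF_GAS = 5
HOUSEHOLD_MONTHS = 2

# Derived quantities: simple ratios of the quoted numbers.
implied_intensity = EMISSIONS_KG_CO2E / ENERGY_KWH        # kg CO2e per kWh
per_tank_kg = EMISSIONS_KG_CO2E / FULL_TANKS_OF_GAS       # kg CO2e per tank
household_kwh_per_month = ENERGY_KWH / HOUSEHOLD_MONTHS   # kWh per month

print(f"Implied grid intensity: {implied_intensity:.3f} kg CO2e/kWh")   # 0.383
print(f"Implied emissions per tank: {per_tank_kg:.1f} kg CO2e")         # 136.8
print(f"Implied household use: {household_kwh_per_month:.1f} kWh/month")  # 892.5
```

So the estimate implicitly assumes roughly 0.38 kg CO₂e per kWh and about 137 kg CO₂e per tank of gas; whether those are reasonable depends on the data centre's power mix and the tank size, neither of which the post states.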
A markdown preview within Neovim, showing syntax-highlighted code blocks, including gutter icons for each filetype, and custom rendering of headers, with unique colors for each level and a replacement of the hash syntax (###) with custom icons.

I'm so impressed with the markview #Neovim plugin. Look at the preview you get out of the box:

github.com/OXY2DEV/mark...

18.12.2024 22:49 👍 4 🔁 1 💬 0 📌 0

TII UAE's Falcon 3

1B, 3B, 7B, 10B (Base + Instruct) & 7B Mamba, trained on 14 trillion tokens!

- 1B-Base surpasses SmolLM2-1.7B and matches gemma-2-2b
- 3B-Base outperforms larger models like Llama-3.1-8B and Minitron-4B-Base
- 7B-Base is on par with Qwen2.5-7B in the under-9B category

17.12.2024 15:07 👍 14 🔁 3 💬 2 📌 1

40.7% with help from 15 annotators! 🇩🇰😎🔥

We've come a long way but haven't quite reached the goal yet :) There really aren't many annotations left at this point.

I'm kind of dreaming that we can get a little final sprint in during the week! Help out here: data-is-better-together-fineweb-c.hf.space/dataset/5a58...

16.12.2024 08:43 👍 6 🔁 1 💬 0 📌 2

Loving this Neovim plugin ❄️

Source: github.com/marcussimons...

13.12.2024 17:32 👍 8 🔁 1 💬 1 📌 0

Danish has gone from 0.1% -> 12.3% today! That corresponds to 123 texts annotated by 3 people.

Every annotation helps us toward the first goal of 1000 texts :)

Help annotate the dataset here: data-is-better-together-fineweb-c.hf.space/dataset/5a58... #dkai

12.12.2024 11:10 👍 7 🔁 2 💬 1 📌 0
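The progress percentages in the Danish annotation posts are shares of the first goal of 1000 texts, so 12.3% corresponds to 123 texts. A minimal sketch of that bookkeeping, assuming both the 12.3% and 40.7% figures refer to the same 1000-text goal. The `progress` function and its `required_per_text` rule (how many annotations a text needs before it counts as done) are hypothetical; the posts do not say how the Hugging Face space computes its percentage.

```python
from collections import Counter

GOAL_TEXTS = 1000  # first milestone named in the post


def progress(annotations, required_per_text=1, goal=GOAL_TEXTS):
    """Fraction of `goal` texts that count as done.

    `annotations` is an iterable of (text_id, annotator) pairs; repeated
    pairs are counted once. A text counts as done once it has at least
    `required_per_text` annotations (an assumed rule for illustration).
    """
    counts = Counter(text_id for text_id, _annotator in set(annotations))
    done = sum(1 for n in counts.values() if n >= required_per_text)
    return done / goal


# The percentages in the posts translate directly into texts completed.
for pct in (0.1, 12.3, 40.7):
    print(f"{pct}% of {GOAL_TEXTS} texts = {round(pct / 100 * GOAL_TEXTS)} texts")
```

Raising `required_per_text` makes the same set of annotations show less progress, which is why a few very active annotators can move the percentage more slowly than their raw annotation count suggests.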

Do you want to help improve the quality of Danish language models?

Join the annotation sprint! No experience required: just open the link and start annotating :)

huggingface.co/spaces/data-... #dkai #dktech

Longer post on LinkedIn: www.linkedin.com/posts/rasgaa...

10.12.2024 12:11 👍 10 🔁 3 💬 1 📌 0

Danmark Starter Pack for you in Malmö and the Öresund region, or if you're just interested in Denmark and Danes.

News, newspapers, media, politics, organisations...

#danmark #danskar #köpenhamn #öresund #malmö #skåne #nyheter #tidningar #media #politik #starterpack

go.bsky.app/U2VkkfU

03.12.2024 07:11 👍 2 🔁 2 💬 0 📌 0

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.

08.12.2024 09:19 👍 76 🔁 19 💬 1 📌 0