Stefan Baack (@sbaack.com)

Wikipedia volunteers spent years cataloging AI tells. Now there's a plugin to avoid them. The web's best guide to spotting AI writing has become a manual for hiding it.

Some #generativeAI developers love to destroy the foundations of the tech they build. #Wikipedia is one of the most valuable sources of genAI training data. Undermining it is not just attacking a great common resource. It's also completely self-destructive. arstechnica.com/ai/2026/01/n...

22.01.2026 16:53 👍 0 🔁 0 💬 0 📌 0

The Nonprofit Doing the AI Industry’s Dirty Work The web archive Common Crawl has been quietly funneling paywalled articles to AI companies—and lying to publishers about it.

A little-known nonprofit has been lying to news publishers while funneling millions of paywalled articles to tech companies for AI training. Read my investigation in The Atlantic. www.theatlantic.com/technology/2...

04.11.2025 15:58 👍 20 🔁 11 💬 1 📌 5

Check in if you're interested in my thoughts about what open source AI should aspire to be in relation to proprietary AI

02.10.2025 11:03 👍 3 🔁 2 💬 0 📌 0

"The update is yet another signal that payment processors...are currently the ultimate arbiter of what kind of content can be made easily available online, or not."

16.07.2025 20:08 👍 1 🔁 0 💬 0 📌 0

The key questions we always should ask when people talk about AI: What is being automated and why? @alexhanna.bsky.social @weizenbauminstitut.bsky.social

30.06.2025 16:47 👍 15 🔁 5 💬 0 📌 0

"AI is a labor disciplining device" @alexhanna.bsky.social

30.06.2025 16:25 👍 0 🔁 0 💬 0 📌 0

“The reporter is a man of critical value. No amount of money or effort spent in fitting the right men for this work could possibly be wasted, for the health of society depends upon the quality of the information it receives.” — Walter Lippmann [a century later, I’d swap “man” for “person” though]

11.05.2025 14:38 👍 3 🔁 1 💬 0 📌 0

(S+) Deepfake porn: The perfidious business of faked sex videos Thousands of women are becoming victims of faked porn that shows their faces. Those affected include underage girls, celebrities, and politicians. Unscrupulous businesspeople are behind it. The ...

New Release! Most AI deepfakes aren't political. 90% of deepfakes are non-consensual intimate imagery. 99% of victims are women. Max Hoppenstedt, @rechercheur.bsky.social, @romanhoefner.bsky.social, and I uncover a deepfake community and the business behind undress apps www.spiegel.de/netzwelt/web...

09.12.2024 13:56 👍 32 🔁 22 💬 2 📌 1

"brainstorming and iteration is...a crucial everyday part of game development...and is not a problem to be solved...I have had many discussions with other game developers who interact with AI engineers and savants who believe our industry pipelines need 'fixing' by them and them alone"

08.04.2025 15:28 👍 0 🔁 2 💬 0 📌 1

Union wants to abolish the Freedom of Information Act: A frontal attack on transparency and democracy - FragDenStaat The portal for freedom of information for citizens, initiatives, and associations. Submit an IFG request for the government documents that matter to you and your work! Find out about In...

The Union wants to abolish the Freedom of Information Act.
@arnesemsrott.bsky.social: "Public oversight and transparency are apparently a thorn in the Union's side. It wants to govern unhindered. The public's rights apparently get in the way."
Press release: fragdenstaat.de/newsletter/a...

26.03.2025 17:49 👍 380 🔁 149 💬 10 📌 8

«By moving fast and breaking things, DOGE forces a collapse of the system where unanswered questions are met with technological solutions. Shifting the conversation to the technical is a way of locking policymakers and the public out of decisions and shifting that power to the code they write.»

09.02.2025 07:08 👍 38 🔁 10 💬 0 📌 2

You Can’t Post Your Way Out of Fascism Authoritarians and tech CEOs now share the same goal: to keep us locked in an eternal doomscroll instead of organizing against them, Janus Rose writes.

🔗 www.404media.co/you-cant-pos...

05.02.2025 17:03 👍 6205 🔁 2643 💬 117 📌 396

A bird's-eye view of the former Auschwitz II-Birkenau camp showing a wide dirt pathway flanked by parallel rows of barbed-wire fences. Groups of visitors walk along the path, surrounded by the remnants of brick structures and barracks, now reduced to foundations. Green grass contrasts with the somber history of the site, as the path leads toward a guard tower in the distance.

Auschwitz was at the end of a long process. It did not start from gas chambers.

This hatred was gradually developed by humans. From ideas, words, stereotypes & prejudice through legal exclusion, dehumanization & escalating violence... to systematic and industrial murder.

Auschwitz took time.

27.01.2025 10:00 👍 53125 🔁 22567 💬 1059 📌 1729

“AI is fake and sucks” vs “AI is real and dangerous” is a Twitter argument. In reality I think the debate also has a lot of “AI is real but not for how you’re using it,” to “AI is fake and that is dangerous,” to “things are happening to real people because of AI hype and that should stop.”

06.12.2024 07:29 👍 205 🔁 33 💬 3 📌 2

My reading for this week, delivered to me by the great @aschrock.bsky.social themself! Thank you, looking forward to reading :-)

03.12.2024 15:53 👍 4 🔁 1 💬 1 📌 0

Labelers training AI say they're overworked, underpaid and exploited by big American tech companies Digital workers in Kenya had to sift through horrific online content to train AI, but say they were underpaid, overworked, and got inadequate mental health support. So they're fighting back.

03.12.2024 10:50 👍 12 🔁 5 💬 1 📌 1

This report gives hope!

More and more new, ambitious media outlets have been founded in Germany and Europe. Outlets whose goal is to provide the public with high-quality information.

For the "Journalism Value Report," @netzwerkrecherche.org surveyed 174 media outlets in 31 countries and can show:

03.12.2024 11:12 👍 38 🔁 17 💬 1 📌 1

How ChatGPT (Mis)represents Publisher Content ChatGPT search — which is positioned as a competitor to search engines like Google and Bing — launched with a press release from OpenAI touting claims that the company had “collaborated extensively wi...

I have a new piece out with @aisvarya17.bsky.social in @columjournreview.bsky.social in which we test how OpenAI's new search feature surfaces and attributes news content. Our findings were not promising for news publishers (1/9) www.cjr.org/tow_center/h...

27.11.2024 19:31 👍 175 🔁 85 💬 8 📌 24

“Without facts, you can’t have truth, and without truth, you can’t have trust.” - Maria Ressa, 2021 Nobel Peace Prize laureate

20.11.2024 11:43 👍 2 🔁 2 💬 0 📌 0

The Onion should buy Elsevier next

14.11.2024 20:28 👍 5376 🔁 1583 💬 56 📌 82

It ended well though. He got the job, and still has it. We met recently 😅

21.02.2024 21:48 👍 1 🔁 0 💬 0 📌 0

I still remember when a friend asked for advice about getting a job I intended to apply for

21.02.2024 09:07 👍 2 🔁 0 💬 1 📌 0

Long term, there should be less reliance on sources like Common Crawl and a bigger emphasis on training generative AI on datasets created and curated by people in equitable and transparent ways (10/10)

06.02.2024 16:03 👍 2 🔁 0 💬 0 📌 0

A key issue is that filtered Common Crawl versions are not updated after their original publication to take feedback and criticism into account. Therefore, we need dedicated intermediaries tasked with filtering Common Crawl in transparent and accountable ways that are continuously updated (9/10)

06.02.2024 16:03 👍 1 🔁 0 💬 1 📌 0

AI builders should put more effort into filtering Common Crawl and establish industry standards and best practices for end-user products to reduce potential harms when using Common Crawl or similar sources for training data (8/10)

06.02.2024 16:03 👍 2 🔁 0 💬 1 📌 0

Both Common Crawl and AI builders can help make generative AI less harmful. Common Crawl should highlight the limitations and biases of its data, be more transparent and inclusive about its governance, and enforce more transparency by requiring AI builders to attribute their use of Common Crawl (7/10)

06.02.2024 16:03 👍 2 🔁 0 💬 1 📌 0

Due to Common Crawl’s deliberate lack of curation, AI builders need to filter it with care, but such care is often lacking. Popular filtered versions like C4 are especially problematic as the filtering techniques used to create them are simplistic and leave lots of harmful content untouched (6/10)

06.02.2024 16:02 👍 2 🔁 0 💬 1 📌 0

Most Top News Sites Block AI Bots. Right-Wing Media Welcomes Them Nearly 90 percent of top news outlets like 'The New York Times' now block AI data collection bots from OpenAI and others. Leading right-wing outlets like NewsMax and Breitbart mostly permit them.

In addition, relevant domains like Facebook and the New York Times block Common Crawl from crawling most (or all) of their pages. These blocks are increasing, creating new biases in the crawled data www.wired.com/story/most-n... (5/10)

06.02.2024 16:02 👍 2 🔁 0 💬 1 📌 0

Common Crawl's archive is massive, but far from being a “copy of the internet.” Its crawls are automated to prioritize pages on domains that are frequently linked to, making digitally marginalized communities less likely to be included. Moreover, most captured content is English (4/10)

06.02.2024 16:02 👍 2 🔁 0 💬 1 📌 0

Using Common Crawl's data does not easily align with trustworthy and responsible AI development because Common Crawl deliberately does not curate its data. It doesn't remove hate speech, for example, because it wants its data to be useful for researchers studying hate speech (3/10)

06.02.2024 16:02 👍 4 🔁 0 💬 1 📌 0
