"As AI overviews become the default gateway to information, we risk creating a generation of users who consume knowledge without question, publishers who cannot sustain quality journalism, and a public sphere increasingly shaped by the statistical patterns embedded in large language models."
Reposting for the #ic2s2 crowd!
Just gave a talk about Dredge Words—queries for which unreliable domains rank highly #icwsm2025
ojs.aaai.org/index.php/IC...
Excited to have two workshop papers and one main conference paper that I've been involved in being presented at @icwsm.bsky.social! Thanks @kingcatherine.bsky.social and @evanup.bsky.social for letting me tag along. Details below.
Another chapter in AI biting the hand that feeds it: Wikipedia’s bandwidth surged 50% since January thanks to AI crawlers.
Unlike search engines, they send no traffic back so no new users, no new donors. Just rising costs and a shrinking audience.
A raw deal for a cornerstone of the free web.
My article with @evanup.bsky.social, "Misinformation Resilient Search Rankings with Webgraph-Based Interventions," was recently featured in a special issue on responsible recommender systems in TIST.
Sharing it here instead of on X for... reasons.
dl.acm.org/doi/full/10....
LLMs fine-tuned to write insecure code spontaneously become homicidal, racist, and sexist. Cool paper. #MLSky
martins1612.github.io/emergent_mis...
We used domain-level reliability scores from Lin et al.’s paper below (excluding social media sites). We averaged these scores over SERPs. It’s imperfect and coarse-grained, but gives a general idea of SERP reliability:
academic.oup.com/pnasnexus/ar...
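For anyone wanting to try the averaging step, here's a minimal sketch of how it could look. The score values, the exclusion list, and the function names are hypothetical placeholders, not the actual data or code from the paper, and skipping unrated domains is an assumption:

```python
from statistics import mean
from urllib.parse import urlparse

# Hypothetical exclusion list; the real set of social media sites may differ.
SOCIAL_MEDIA = {"facebook.com", "twitter.com", "youtube.com"}


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def serp_reliability(urls, scores, excluded=SOCIAL_MEDIA):
    """Average domain-level reliability over one SERP.

    Excluded and unrated domains are skipped (an assumption here).
    Returns None when no result on the page has a usable score.
    """
    rated = [
        scores[d]
        for d in map(domain_of, urls)
        if d not in excluded and d in scores
    ]
    return mean(rated) if rated else None


# Made-up scores, for illustration only.
example_scores = {"example-news.com": 0.75, "example-blog.net": 0.25}
example_serp = [
    "https://www.example-news.com/story",
    "https://example-blog.net/post",
    "https://youtube.com/watch",   # excluded as social media
    "https://unrated.org/page",    # no score available, skipped
]
print(serp_reliability(example_serp, example_scores))
```

Because the score is per domain, every page from the same site gets the same value, which is where the coarseness comes from.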
An article covering our most recent paper on Google’s explicit content moderation :)
Now live: “How alt-tech users evaluate search engines: Cause-advancing audits” by Evan M. Williams and Kathleen M. Carley. @evanup.bsky.social misinforeview.hks.harvard.edu/article/how-...
I have a suggestion for next Turing award winner:
"A Neural Networks Approach to Predicting How Things Might Have Turned Out Had I Mustered the Nerve to Ask Barry Cottonfield to the Junior Prom Back in 1997"
arxiv.org/pdf/1703.104...
“Along with the subheads, we’ll add a table of contents to the top of articles so each section becomes a link that can drive traffic on its own.” A blog post on a fake lizard site is chopped into subheads with keyword-crammed writing that mirrors People Also Ask sections in Search.
“We’re also adding significantly more professional background details. We’re going to round up how long we’ve been writing about reptiles — it’s mostly true. Can Google even tell?” An author bio section shows a longer blurb about experience, targeting Google’s EEAT guidelines.
“Our lizard blog doesn’t look much like what we started out with, but it probably resembles hundreds of sites you’ve seen before.” Shows the contrast between the website before SEO took over and after.
New from me: I wrote about how search algorithms have created a web full of content and words for Google, not humans. We made a fake lizard website to show you what has happened over 25 yrs.
The visuals are beautiful. I’m so proud to work with such talented people! www.theverge.com/c/23998379/g...
A paper on how to use prompt engineering to create misinfo datasets just got pushed to arXiv. Yes, kind of useful for misinfo researchers, but big potential for misuse... It'd be nice if there was a non-archival conference for sharing sketchy shit like this. #LLMs #ML arxiv.org/abs/2401.04481