WikiResearch

@wikiresearch

Mostly a placeholder until we can bring our feed of volunteer-curated Wikipedia/Wikidata/Wikimedia research news to this platform too. For full coverage subscribe to our newsletter: https://meta.wikimedia.org/wiki/Research:Newsletter

453 Followers · 41 Following · 14 Posts · Joined 13.05.2024

Latest posts by WikiResearch @wikiresearch

Our new paper is out today in @pnasnexus.org with colleagues at Yale (@matthewshu.com, Danny Karell, @keitarookura.bsky.social)

We wanted to understand how using AI-generated summaries to learn about history influenced attitudes compared to existing resources like Wikipedia. 1/4

03.03.2026 16:55 👍 16 🔁 7 💬 1 📌 1

Finally blogged about my paper (led by @zarine.net) that seeks to explain why Croatian Wikipedia spent a decade captured by a cabal of political extremists and became a site for Holocaust revisionism, while other similar Wikipedia languages seemed to have fared much better. mako.cc/copyrighteou...

22.02.2026 21:45 👍 11 🔁 4 💬 0 📌 0
Interoperability as Equity: Collaborative Cultural Heritage Knowledge Graphs as a Tool to Shape Inclusive Ontologies | Journal of Open Humanities Data

Very happy to share that our new paper, "Interoperability as Equity: Collaborative Cultural Heritage Knowledge Graphs as a Tool to Shape Inclusive Ontologies" is out! We are discussing linked open data, ontologies, Wikidata and interoperability with CIDOC CRM

doi.org/10.5334/johd...

02.02.2026 13:29 👍 1 🔁 4 💬 0 📌 0
A paper screenshot:
Refractive datasets as a sensemaking methodology in closed data ecosystems

Anna Beers, Viviane Ito, Agustin Orozco, Patrick Gildersleve, Pablo Aragón, and Francesca Tripodi

Abstract
As digital platforms restrict their APIs, researchers face diminishing options for studying social phenomena in digital environments. During what has been called the post-API era, researchers have found themselves looking for reliable data sources in an unreliable and frequently changing platform data ecosystem. In this context, we propose analyzing refractive datasets as a methodology for researchers to understand the dynamics of closed data platforms. Refractive datasets come from platforms with relatively more open data policies, and their analysis sheds light on platforms with more restrictive data policies. Like a prism, refractive datasets reflect but also transform data-based phenomena unfolding on closed platforms. Using refractive datasets from Wikipedia and Google Trends, we present three studies to demonstrate our methodology. We first show how refractive data from Wikipedia's multiple language editions can be used to understand a fractured global platform ecosystem in a case study of hydroxychloroquine, a purported COVID-19 medicine. Second, we use Google Trends to show how similar refractive analyses can be used to understand information lost to platform deletion, in a profile of an online panic over the drug brand Galaxy Gas. Finally, we show how Wikipedia data can be used as a grounding point for a refractive analysis of how new generative algorithms reproduce and distort data across the social web. We discuss how refractive datasets can be a way for researchers to “sensemake” in increasingly opaque big data environments, enabling interpretivist analyses which aim to generate new hypotheses rather than verify existing claims.

Happy 25th birthday to Wikipedia! 🥳

A fitting moment to share
1. Their great site to mark the occasion: wikipedia25.org
2. A paper in Big Data & Society, published over the winter break, where we develop Wikipedia as a “Refractive Dataset”, led by @beeeeeers.bsky.social: doi.org/10.1177/2053...

15.01.2026 20:53 👍 6 🔁 2 💬 0 📌 0

ICYMI: Finally blogged about an old paper led by @kayleachampion.bsky.social that developed a new method (forensic qualitative analysis) to understand the nature and value of @torproject.org users' contributions to @wikipedia.org. mako.cc/copyrighteou...

01.02.2026 12:34 👍 1 🔁 2 💬 0 📌 0

"The Block Log: 20 Years of Content Moderation on Wikipedia" rhododendrites.com/pdfs/The%20B...

07.01.2026 12:29 👍 4 🔁 1 💬 0 📌 0
Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface these for f…

ICYMI: Finally blogged about an "old" paper led by @groceryheist.cc that uses data from a @wikipedia.org system to show how the introduction of a biased AI flagging system can still lead to more fairness because the humans without the system are even more biased. mako.cc/copyrighteou...

03.01.2026 12:54 👍 12 🔁 3 💬 0 📌 1
Wiki Loves iNaturalist: How Wikimedians Integrate iNaturalist Content on Wikipedia, Wikidata, and Wikimedia Commons With over 50 million observations per year, iNaturalist is one of the world's most successful citizen science projects, uniting millions of people worldwide in observing, sharing, and identifying natu...

Congratulations to @tiagolubiana.bsky.social for shepherding our paper to publication & to all my wonderful co-authors. "Wiki Loves iNaturalist: How Wikimedians Integrate iNaturalist Content on Wikipedia, Wikidata, and Wikimedia Commons" can be read here doi.org/10.3897/biss...

05.12.2025 18:47 👍 11 🔁 5 💬 0 📌 0

I’m chuffed to share that I’ve been awarded this grant with @ftripodi.bsky.social and Brett Zehner 🥳

We’ll be studying how AI systems may reproduce or reinforce biases in Wikipedia, whether by extracting knowledge from the platform or by contributing content back to it. Excited to get started!

09.12.2025 15:39 👍 10 🔁 2 💬 2 📌 0

A few updates on our Grokipedia analysis: we expanded our sample to 20,000 most edited articles on Wikipedia. Linguistic & stylistic differences are the same as reported before (Generally, Grokipedia articles are longer, more difficult to read, and less referenced.)
@wikiresearch.bsky.social

08.12.2025 17:03 👍 2 🔁 3 💬 1 📌 0
abstract of the paper "What did Elon change? A comprehensive analysis of Grokipedia"

Elon Musk released Grokipedia on 27 October 2025 to provide an alternative to Wikipedia, the crowdsourced online encyclopedia. In this paper, we provide the first comprehensive analysis of Grokipedia and compare it to a dump of Wikipedia, with a focus on article similarity and citation practices. Although Grokipedia articles are much longer than their corresponding English Wikipedia articles, we find that much of Grokipedia's content (including both articles with and without Creative Commons licenses) is highly derivative of Wikipedia. Nevertheless, citation practices between the sites differ greatly, with Grokipedia citing many more sources deemed "generally unreliable" or "blacklisted" by the English Wikipedia community and low quality by external scholars, including dozens of citations to sites like Stormfront and Infowars. We then analyze article subsets: one about elected officials, one about controversial topics, and one random subset for which we derive article quality and topic. We find that the elected official and controversial article subsets showed less similarity between their Wikipedia version and Grokipedia version than other pages. The random subset illustrates that Grokipedia focused rewriting the highest quality articles on Wikipedia, with a bias towards biographies, politics, society, and history. Finally, we publicly release our nearly-full scrape of Grokipedia, as well as embeddings of the entire Grokipedia corpus.

back again to share a new preprint from me and @mantzarlis.com! “What did Elon Change? A comprehensive analysis of Grokipedia” arxiv.org/abs/2511.09685

I had seen many spot analyses of individual grokipedia pages, but I was curious: how was grokipedia made? what did Elon change from wikipedia?

17.11.2025 16:10 👍 12 🔁 9 💬 1 📌 2
Grokipedia cites a Nazi forum and fringe conspiracy websites A site-wide comparison with Wikipedia sheds light on what Elon Musk is trying to do

Key points in new Cornell Tech research:

56% of Grokipedia entries carry the Wikipedia CC license, suggesting wholesale ingestion

Grokipedia’s top 100 sources include fewer news outlets and more UGC (e.g. LinkedIn scraping)

Grokipedia has fewer citations overall, making it harder to check sources

13.11.2025 14:17 👍 14 🔁 8 💬 0 📌 0
Wikidata Map in 2025 Another year, another map, and another Birthday for Wikidata. Last generated in 2024 by @tarrow and @outdooracorn, this year I have put the work in just ahead of the 13th Wikidata birthday to have a look at what's changed in terms of items with coordinates this past year on Wikidata. And here it is! But really you need to look at the diff between previous years to see what has changed!

28.10.2025 23:14 👍 1 🔁 2 💬 0 📌 0

#Grokipedia set out to “fix” #Wikipedia.
Turns out it mostly rewrites it, longer, slicker, less sourced.
Fluent, but fragile. @wikiresearch.bsky.social

31.10.2025 21:26 👍 6 🔁 2 💬 2 📌 0
Investigating extreme cases in Wikipedia talk pages: Some insights on user behaviours Investigating extreme cases in Wikipedia talk pages: Some insights on user behaviours was published in Exploring digitally-mediated communication with corpora on page 453.

Alternative link: www.degruyterbrill.com/document/doi...

15.10.2025 00:57 👍 1 🔁 0 💬 0 📌 0

"Investigating extreme cases in Wikipedia talk pages: Some insights on user behaviours"
uplopen.com/chapters/e…
e.g. "the most prolific users, the longest threads (in terms of total duration, number of posts or number of distinct users involved) and the longest monologues"

15.10.2025 00:31 👍 3 🔁 0 💬 1 📌 1
Using a Wikipedia edit-a-thon as a cross-curricular STEM representation assignment - Discover Education Background Wikipedia is a highly used, free, online encyclopedia with known gender disparities across its biography content. Editing Wikipedia has entered STEM classrooms as a writing-focused and sometimes equity-focused assignment. This paper presents a Wikipedia edit-a-thon event at the Wentworth Institute of Technology in Boston, Massachusetts focused on improving articles about women in STEM. This edit-a-thon promoted cross-disciplinary collaboration and community building with faculty and undergraduate students across eleven courses and disparate disciplines and offices at the university. Results Edit-a-thon attendees edited pages on women in STEM and listened to five-minute lightning talks by women in the university community: students, former faculty, and administrators. The impacts of the event include the addition of more than 15,000 words and 100 references to more than 100 articles on Wikipedia. The event supported a variety of student learning outcomes in participating courses across disciplines in the sciences and humanities. Conclusions A Wikipedia edit-a-thon supported student learning across multiple subjects while contributing to underdeveloped biography articles about women in STEM and helping students find a voice in the Wiki space. The edit-a-thon has potential as a cross-curricular touchpoint and to support equity and representation work.

Seredinski, A., Litchock-Morellato, F., Lange, A. et al. Using a Wikipedia edit-a-thon as a cross-curricular STEM representation assignment. Discov Educ 4, 368 (2025). doi.org/10.1007/s442... #OpenAccess

30.09.2025 08:57 👍 1 🔁 1 💬 0 📌 0

"Demographic disparity in Wikipedia coverage: a global perspective" (top 12 languages) epjdatascience.springeropen.com/articles/1…
- Women slightly overrepresented (not underrepresented) among living article subjects since ~2015, but still have shorter articles
- Developing countries overrepresented

11.10.2025 05:29 👍 4 🔁 1 💬 0 📌 1

"Investigating How LLMs Impact Participation in [Wikipedia]" (interviewing 16 editors) https://arxiv.org/abs/2509.07819v1

ChatGPT etc "enhance contribution quality for experienced editors" & "lower entry barriers for newcomers", but newbies struggle to align LLM outputs w Wikipedia policies

04.10.2025 01:14 👍 3 🔁 0 💬 0 📌 0
The Graphic User Interface of WikiTextGraph

New paper alert: WikiTextGraph – an open-source Python package for extracting the text and building multilingual Wikipedia link networks.

With: @gustavoschwartz.bsky.social, Juan Luis Suárez

Paper: openresearchsoftware.metajnl.com/articles/10....

@wikiresearch.bsky.social #wikipedia #software

17.09.2025 13:32 👍 1 🔁 2 💬 0 📌 0
Critical Wikimedia Research Bibliography - Meta-Wiki

With the school year approaching, a number of scholars and myself have assembled together a Critical Wikimedia Research Bibliography. If you are teaching a course or doing research, we think you might find some good resources here. meta.wikimedia.org/wiki/Critica...

27.08.2025 21:25 👍 3 🔁 2 💬 0 📌 0
A manifesto for Wikimedia research: Critically studying Wikimedia as infrastructure

I am pleased to announce the launch of the Manifesto for Wikimedia Research manifesto.wiki. As my co-authored Big Data & Society commentary explains, the manifesto is dedicated to a humanist and critical tradition of taking Wikipedia's importance seriously. journals.sagepub.com/doi/10.1177/...

08.07.2025 13:17 👍 10 🔁 5 💬 1 📌 0
Presenter (Patrick Gildersleve) in front of a screen summarising the WikiReddit Dataset project. The slide describes it as "Every Wikipedia mention and link on Reddit, 2020-2023", includes some example usage, describes the scale of the dataset, and offers suggested use cases.

Had a great time meeting everyone and seeing all the interesting work @icwsm.bsky.social. I presented our study on the Wikireddit dataset - exploring Wikipedia’s role in fact-checking, discussion, and cross-platform attention on the web. Thank you to the organisers!

📄: ojs.aaai.org/index.php/IC...

26.06.2025 10:08 👍 8 🔁 3 💬 0 📌 0
The Challenge of Peer-Produced Websites | UW College of Arts & Sciences Communication professor Benjamin Mako Hill studies why successful peer-produced websites (like Wikipedia) eventually struggle to maintain their openness to new contributors.

UW published this really nice article about my work on governance challenges and lifecycles faced by peer-produced online communities, the work supported by my NSF CAREER grant. Check it out if you want to know what I've been thinking about and working on!

15.06.2025 15:50 👍 28 🔁 8 💬 3 📌 0

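"Desambiguación en Wikipedia: exploración de los mecanismos de control de autoridades en la enciclopedia colaborativa" (Disambiguation in Wikipedia: exploring authority control mechanisms in the collaborative encyclopedia) by @florenciac.bsky.social and @tsaorin.bsky.social in #revistainfonomy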
doi.org/10.3145/info...

#Controldeautoridades #Vocabularioscontrolados #Wikipedia

19.05.2025 10:15 👍 3 🔁 2 💬 0 📌 0

Been a hectic semester for me but made it through 😊 a few updates

Had a blast as a GSI for @dbamman.bsky.social NLP class. Was a wonderful experience 💃

Won the Wikipedia Foundation Research of The Year Award for our CHI paper (doi.org/10.1145/3613...) with @schasins.bsky.social and John Canny

27.05.2025 19:22 👍 6 🔁 4 💬 3 📌 1

findings: (1) Wikipedia is most frequently cited by news and science websites for informational purposes, while commercial websites reference it less often. (2) The majority of Wikipedia links appear within the main content rather than in boilerplate [3/5 of https://arxiv.org/abs/2505.15837v1]

23.05.2025 06:00 👍 1 🔁 1 💬 1 📌 0
WikiWorkshop 2025 Recap - Rhododendrites I like the internet

Whipped up a #WikiWorkshop 2025 recap blog post here: rhododendrites.com/posts/WikiWo... @wikiresearch.bsky.social Some really interesting tools, methods, and studies over the last couple days!

23.05.2025 17:34 👍 2 🔁 1 💬 0 📌 0
Templates and sovereignty: Wikipedia’s policy development and the reflection of community consensus - Steve Jankowski, Claudio Celis Bueno, Ouejdane Sabbah, Jakko Kemper, 2025 This article examines how Wikipedians embed their sovereign authority within the development of the site’s multilingual policy environment. By drawing on the co...

Well this is good timing. @wikiworkshop.bsky.social starts today and my paper that I presented in previous years has just been published this morning. doi.org/10.1177/1461.... We describe how hatnotes on policy pages are incredibly important techniques for ascribing different forms of authority.

21.05.2025 08:14 👍 3 🔁 2 💬 0 📌 0
Is Wikipedia a cesspool of antisemitism? Don't trust the ADL's answer. The ADL would have us believe Wikipedia is riddled by antisemitism. The reality is more complicated, writes a scholar whom the ADL has cited.

A recent ADL report claimed to find broad, systemic evidence of antisemitism on Wikipedia, prompting two dozen members of Congress to call into question the site's approach to moderating content related to Jews.

Some researchers cited by the ADL say their findings have been misconstrued.

16.05.2025 15:22 👍 7 🔁 3 💬 1 📌 1