Pranav Goel (@pranavgoel)

Beyond APIs: Collecting Web Data for Research using the National Internet Observatory

Going to NetSci in Boston in June? Interested in access to data from the National Internet Observatory, including our RFPs on AI chatbots, browsing behavior, search, and more to come over the next few months? Sign up for our workshop at NetSci:
national-internet-observatory.github.io/beyondapi_ne...

16.02.2026 19:08 👍 5 🔁 3 💬 0 📌 0

We will also have interactive activities and hands-on sessions with real network datasets that demonstrate NIO's capabilities for enabling novel cross-disciplinary and cross-platform research across web and social network environments!

11.02.2026 21:08 👍 0 🔁 0 💬 0 📌 0

We will discuss NIO's informed data donation process, participant demographics and behavioral traces, secure computing infrastructure and pathways for data access, and examples of analyses and innovative research with this new source of data for the network science community.

11.02.2026 21:08 👍 0 🔁 0 💬 1 📌 0

This is a vital source of data, and a crucial methodology for collecting data for academic research in the post-API age!

Track existing datasets that are open for research proposals at nationalinternetobservatory.org/researchers....

11.02.2026 21:08 👍 0 🔁 0 💬 1 📌 0

an alternative data-collection framework and infrastructure to help researchers study online behavior, with a particular focus on content viewing — the predominant (and a very understudied) form of online activity.

11.02.2026 21:08 👍 0 🔁 0 💬 1 📌 0

The National Internet Observatory The National Internet Observatory aims to help researchers understand how people behave online and how platforms structure what people see. This will be accomplished through creating a large panel of ...

This satellite will introduce the interdisciplinary conference participants to the National Internet Observatory (NIO) (nationalinternetobservatory.org),

11.02.2026 21:08 👍 1 🔁 0 💬 1 📌 0

An image summarizing the satellite's information. At the top is a logo for the National Internet Observatory. Below it is the title of the satellite: "Beyond APIs: Collecting Online Activity Data for Research using the National Internet Observatory." This is followed by a small subheading: "A Satellite at NetSci 2026". Below this are details in text: the location, satellite time, and conference dates. This screenshot is taken directly from the website of the satellite that is linked in the post.

Very excited to announce a new satellite coming to NetSci 2026 @netsciconf.bsky.social, co-organized with Scott Cambo and @davidlazer.bsky.social!

More details in this thread and the website (national-internet-observatory.github.io/beyondapi_ne...)

Sign up here: forms.gle/sgjVPMSNWYeY...

11.02.2026 21:08 👍 7 🔁 2 💬 1 📌 0

Gmail might be harvesting your emails to train AI—here's how to opt out

This is pretty bad.

Opt out now and opt out thoroughly: www.howtogeek.com/gmail-might-...

22.11.2025 17:21 👍 18 🔁 9 💬 0 📌 0

From Mexico to Ireland, Fury Mounts Over a Global A.I. Frenzy

AI data centers are straining already fragile power and water infrastructures in communities around the world, leading to blackouts and water shortages. “Data centers are where environmental and social issues meet,” says Rosi Leonard, an environmentalist with @foeireland.bsky.social.

04.11.2025 18:40 👍 11 🔁 4 💬 0 📌 1

A diagram illustrating pointwise scoring with a large language model (LLM). At the top is a text box containing instructions: 'You will see the text of a political advertisement about a candidate. Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view of the candidate and 9 indicates a negative view of the candidate.' Below this is a green text box containing an example ad text: 'Joe Biden is going to eat your grandchildren for dinner.' An arrow points down from this text to an illustration of a computer with 'LLM' displayed on its monitor. Finally, an arrow points from the computer down to the number '9' in large teal text, representing the LLM's scoring output. This diagram demonstrates how an LLM directly assigns a numerical score to text based on given criteria

LLMs are often used for text annotation, especially in social science. In some cases, this involves placing text items on a scale: e.g., 1 for liberal and 9 for conservative.

There are a few ways to accomplish this task. Which work best? Our new EMNLP paper has some answers🧵
arxiv.org/pdf/2507.00828

27.10.2025 14:59 👍 27 🔁 8 💬 1 📌 0

DomainDemo: a dataset of domain-sharing activities among different demographic groups on Twitter - Scientific Data

ICYMI, our DomainDemo dataset, which describes how different demographic groups share domains on Twitter, is now available to download!

📄 Data descriptor: doi.org/10.1038/s415...
📈 Interactive app to explore the data: domaindemo.info
💽 Dataset: doi.org/10.5281/zeno...

21.07.2025 13:58 👍 10 🔁 6 💬 0 📌 0

For more, check out the paper!
nature.com/articles/s41562-025-02223-4
arxiv.org/abs/2308.06459

11.06.2025 15:39 👍 0 🔁 0 💬 0 📌 0

For journalists and especially headline writers: even if a discrete piece of information is true, you've got to think carefully about whether the way you're presenting it is useful for promoting narratives that aren't.

11.06.2025 15:39 👍 0 🔁 0 💬 1 📌 0

Big picture: misleading claims are both *more prevalent* and *harder to moderate* than implied in current misinformation research. It's not as simple as fact-checking false claims or downranking/blocking unreliable domains. The extent to which information (mis)informs depends on how it is used!

11.06.2025 15:39 👍 0 🔁 0 💬 1 📌 0

If you want to advance misleading narratives — such as COVID-19 vaccine skepticism — supporting information from reliable sources is more useful than similar information from unreliable sources, if you have it.

11.06.2025 15:39 👍 1 🔁 1 💬 1 📌 0

This calls for a reconsideration of what misinformation is, how widespread it is, and the extent to which it can be moderated. Our core claim is that users are *using* information to promote their identities and advance their interests, not merely consuming information for its truth value.

11.06.2025 15:39 👍 1 🔁 0 💬 1 📌 0

We find that mainstream stories with high scores on this measure are significantly more likely to contain narratives present in misinformation content. This suggests that reliable information — which has a much wider audience — can be repurposed by users promoting potentially misleading narratives.

11.06.2025 15:39 👍 0 🔁 0 💬 1 📌 0

We do this by looking at co-sharing behavior on Twitter/X. We first identify users who frequently share information from unreliable sources, and then examine the information from reliable sources that those same users also share at disproportionate rates.

11.06.2025 15:39 👍 0 🔁 0 💬 1 📌 0
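
The two-step co-sharing idea in the post above can be illustrated with a short sketch. This is not the paper's actual measure: the share-record format `(user, domain, article)`, the `min_fraction` threshold, the smoothing constant, and both function names are hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict


def flag_users(shares, unreliable_domains, min_fraction=0.5):
    """Step 1: users whose share of posts from unreliable domains is >= min_fraction.

    `min_fraction` is an assumed cutoff, not a value from the paper.
    """
    total = Counter()
    unreliable = Counter()
    for user, domain, _article in shares:
        total[user] += 1
        if domain in unreliable_domains:
            unreliable[user] += 1
    return {u for u in total if unreliable[u] / total[u] >= min_fraction}


def cosharing_scores(shares, unreliable_domains, flagged, smoothing=1.0):
    """Step 2: for each reliable-source article, the ratio of its (smoothed)
    share rate among flagged users to its share rate among everyone else.

    Scores above 1 mean flagged users share the article disproportionately.
    """
    sharers = defaultdict(set)
    all_users = set()
    for user, domain, article in shares:
        all_users.add(user)
        if domain not in unreliable_domains:
            sharers[article].add(user)

    n_flagged = len(flagged)
    n_other = len(all_users - flagged)
    scores = {}
    for article, users in sharers.items():
        flagged_rate = (len(users & flagged) + smoothing) / (n_flagged + 2 * smoothing)
        other_rate = (len(users - flagged) + smoothing) / (n_other + 2 * smoothing)
        scores[article] = flagged_rate / other_rate
    return scores


# Toy data with made-up users, domains, and article ids.
shares = [
    ("u1", "junk.example", "j1"),
    ("u1", "junk.example", "j2"),
    ("u1", "news.example", "a"),
    ("u2", "junk.example", "j1"),
    ("u2", "news.example", "a"),
    ("u3", "news.example", "b"),
    ("u4", "news.example", "b"),
    ("u4", "news.example", "a"),
]
unreliable = {"junk.example"}
flagged = flag_users(shares, unreliable)
scores = cosharing_scores(shares, unreliable, flagged)
print(sorted(flagged))
print(scores)
```

In the toy data, article "a" is shared by both flagged users and scores above 1, while article "b" is shared only by unflagged users and scores below 1.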

Our paper uses this dynamic — users strategically repurposing true information from reliable sources to advance misleading narratives — to move beyond conceptualizing misinformation as source reliability and measuring it by just counting sharing of / exposure to unreliable sources.

11.06.2025 15:39 👍 1 🔁 0 💬 1 📌 0

Washington Post article: screenshot of the headline "Vaccinated people now make up a majority of covid deaths"

Take, for example, this headline from the Washington Post. The source is reliable and the information is, strictly speaking, true. But the people most excited to share this story wanted to advance a misleading claim: that the COVID-19 vaccine was ineffective at best.

11.06.2025 15:39 👍 2 🔁 0 💬 1 📌 0

But users who want to advance misleading claims likely *prefer* to use reliable sources when they can. They know others see reliable sources as more credible!

11.06.2025 15:39 👍 1 🔁 0 💬 1 📌 0

When thinking about online misinformation, we'd really like to identify/measure misleading claims; unreliable sources are only a convenient proxy.

11.06.2025 15:39 👍 0 🔁 0 💬 1 📌 0

Using co-sharing to identify use of mainstream news for promoting potentially misleading narratives - Nature Human Behaviour Goel et al. examine why some factually correct news articles are often shared by users who also shared fake news articles on social media.

In our new paper (w/ @jongreen.bsky.social, @davidlazer.bsky.social, & Philip Resnik), now up in Nature Human Behaviour (nature.com/articles/s41562-025-02223-4), we argue that this tension really speaks to a broader misconceptualization of what misinformation is and how it works.

11.06.2025 15:39 👍 11 🔁 6 💬 1 📌 1

There's a lot of concern out there about online misinformation, but when we try to measure it by identifying sharing of/traffic to unreliable sources, it looks like a tiny share of users' information diets. What gives?

11.06.2025 15:39 👍 4 🔁 1 💬 1 📌 0

Goel et al. examine why some factually correct news articles are often shared alongside fake news claims on social media. @pranavgoel.bsky.social @jongreen.bsky.social @davidlazer.bsky.social
www.nature.com/articles/s41...

10.06.2025 16:12 👍 10 🔁 5 💬 0 📌 0

If you are at #WebSci2025, join our tutorial "Beyond APIs: Collecting Web Data for Research using the National Internet Observatory", which addresses the critical challenges of web data collection in the post-API era.

national-internet-observatory.github.io/beyondapi_websci25/

15.05.2025 21:12 👍 22 🔁 10 💬 1 📌 0

If you study networks, or have been stuck listening to people who study networks for long enough (sorry to my loved ones), you may have heard that open triads – V shapes – in social networks tend to turn into closed triangles. But why does this happen? In part, because people repost each other.

01.04.2025 20:00 👍 48 🔁 18 💬 3 📌 3
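
To make the open-triad versus closed-triangle distinction in the post above concrete, here is a minimal sketch that counts both shapes in a small undirected network. The function name `count_triads` and the toy edge lists are hypothetical, and the sketch only counts shapes; it does not model reposting or any mechanism behind closure.

```python
from itertools import combinations


def count_triads(edges):
    """Return (open_triads, closed_triangles) for an undirected graph.

    An open triad is a V shape: a center node linked to two nodes that are
    not linked to each other. A closed triangle has all three links present.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    open_triads = 0
    closed_corners = 0  # each triangle is seen once from each of its 3 corners
    for nbrs in neighbors.values():
        for x, y in combinations(nbrs, 2):
            if y in neighbors[x]:
                closed_corners += 1
            else:
                open_triads += 1
    return open_triads, closed_corners // 3


# A V shape: b is linked to a and c, but a and c are not linked.
print(count_triads([("a", "b"), ("b", "c")]))              # (1, 0)
# Adding the a-c link closes the triad into a triangle.
print(count_triads([("a", "b"), ("b", "c"), ("a", "c")]))  # (0, 1)
```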

Congratulations!

18.02.2025 15:46 👍 1 🔁 0 💬 1 📌 0
