We ran a controlled study of 125 verified humans vs 5 AI agents. Can agents reliably be detected?
Here's what we found:
www.prolific.com/resources/au...
Beta is currently available in Qualtrics. We're actively scoping integrations with additional platforms, and the tech is generalisable. If helpful, I'd be happy to connect you with someone from Prolific to discuss your feedback.
Frontiers episode 1:
Jerome Wynne from @Prolific in conversation with Crystal Qian, from Google DeepMind, talking about Deliberate Lab: a platform for running online research experiments on human + LLM group dynamics.
www.youtube.com/watch?v=5vyi...
AI agents are becoming a serious threat to research data quality.
Today we're rolling out Bot authenticity checks on @joinprolific.bsky.social, detecting agentic AI with 100% accuracy in testing.
Comes with a native Qualtrics integration! More info:
www.prolific.com/resources/in...
Fresh HUMAINE results are here.
Gemini 3 is still first, but Mistral Large 3 and DeepSeek v3.2 are making things interesting.
Opus 4.5 didn't dominate, but Anthropic is likely prioritizing complex reasoning/coding over the conversational fluency that this benchmark favors.
prolific.com/humaine
Lots of chatter about this paper currently. It's a stark warning, but at present I see it as a warning of what might come, not what is happening now. As a research community we need to see it as a call to arms to develop new strategies, NOT a call to abandon online sampling. Reasoning below
All fair. Expected to see more diversity in modalities also. Qual studies (which can now be done at scale) are likely to be more robust than survey only.
Studies aren't distributed on a first-come first-served basis, but it's a useful theory. I will share with the team.
Right. It is generalisable (JS plugin), though we don't have a native integration with otree yet.
Will reach out to the authors to see if we can understand more details & see if we can add Authenticity Check as a mitigation option.
45% of participants copying OR pasting ~= 45% LLM use.
Only a single-digit percentage of responses seem to fail their honeypot and other mitigations, which is closer to our internal prevalence measures.
There are many reasons to copy/paste while still being a conscientious human.
"Even to an untrained eye, some of these responses were obviously generated by LLMs", but the % doesn't seem reported?
If I'm reading the paper correctly, their prevalence detection was "we only tracked copying and pasting on a page containing an open-ended question". This is a fairly crude measure of LLM detection, and an upper bound rather than an accurate prevalence estimate.
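To make the critique concrete, here is a minimal sketch of what copy/paste-based detection amounts to. This is not the paper's or Prolific's actual implementation; the event names and the `pasteRate` helper are illustrative. Counting anyone who pasted at all as an "LLM user" is exactly why the figure is an upper bound.

```javascript
// Illustrative sketch only: track copy/paste events on a page with an
// open-ended question, then compute the fraction of respondents who
// pasted at least once. A paste is NOT proof of LLM use.
function recordEvent(events, type) {
  events.push({ type, at: Date.now() });
}

// In a browser you would wire these up to the answer textarea, e.g.:
// textarea.addEventListener('paste', () => recordEvent(events, 'paste'));
// textarea.addEventListener('copy',  () => recordEvent(events, 'copy'));

// Crude "prevalence" estimate: share of respondents with any paste event.
// This is the upper bound the thread criticises, not a true LLM-use rate.
function pasteRate(respondents) {
  const pasted = respondents.filter(r =>
    r.events.some(e => e.type === 'paste')
  ).length;
  return pasted / respondents.length;
}
```

A conscientious human who pastes a quote, a URL, or their own draft from a notes app is indistinguishable from an LLM user under this measure, which is the nuance the thread argues for.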
LLM use by real humans is a slightly different threat to the scaled agent threat discussed in the paper though, and I think requires a bit more nuance in its response.
I hadn't, thanks for sharing. Agree with many of the mitigation strategies, though given the data was collected on Prolific we would have recommended our built-in tool.
researcher-help.prolific.com/en/articles/...
prolific.com/resources/pr...
If you want to work on these problems, or collaborate on research in this area, get in touch. Much more to come in this space!
Without minimising the seriousness of the threat raised in this paper, I'm more optimistic. This is just the latest challenge to the integrity of online research.
We've been proactively adding to our suite of authenticity tools - more every week - including many of Sean's recommendations:
We also do spot checks to protect against account reselling: participant-help.prolific.com/en/articles/...
Not sure I agree: these are tractable challenges and we are working on them.
bsky.app/profile/phe-...
Tara, let me know if I can share any more info that would put your mind at ease. We have implemented numerous mitigations to ensure the collection of authentic data (and continue to assess and invest further).
www.prolific.com/resources/in...
There's a lot of work to do here, though, and it's likely to continue being a bit of an arms race. Our bet is on many layers of assessment here, but we are keen to collaborate with researchers & the broader community to address this (very real) threat.
Agreed that this is the harder challenge, though we do have tooling for this also (in addition to throttling). Works with Qualtrics.
& we're continuing to invest here. I'd be happy to work with you or Sean on replication if of interest.
Some notes on our continued identity verification and agent detection here:
participant-help.prolific.com/en/articles/...
www.prolific.com/resources/in...
I disagree. AFAIK, this was not run on Prolific, and we already have the majority of Sean's recommendations in place.
Ongoing Panelist Validation ✅
Throttling Mechanisms ✅
Panelist Professionalism ✅
Panelist Quality Checks ✅
Location Checks ✅
Identity Validation ✅
Secure Software ⚠️ (partial)
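The thread names throttling as one mitigation without detail. As a hypothetical sketch (not Prolific's actual design), a token-bucket limiter is one common way to cap how fast a single account can start studies; `capacity` and `refillPerSec` here are invented parameters for illustration.

```javascript
// Hypothetical throttling sketch (token bucket), NOT Prolific's design:
// each account can start at most `capacity` studies in a burst, with
// allowance refilling at `refillPerSec` tokens per second.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;      // max burst size
    this.tokens = capacity;        // start full
    this.refillPerSec = refillPerSec;
    this.last = Date.now();
  }

  // Returns true if the action is allowed, consuming one token.
  tryTake(now = Date.now()) {
    const elapsed = (now - this.last) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSec
    );
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The design intuition: a human participant's study rate is bursty but bounded, while a scaled agent fleet wants sustained high throughput, so even a simple per-account rate cap raises the cost of the attack.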
This is a super important paper, but honestly, I think the opposite. The integrity and authenticity of online panels have been in question since before this risk from agents. IMO this forces all providers to up their verification and the transparency of their participants. E.g.
www.prolific.com/resources/pr...