perhaps it's the head. @dkoevoet.bsky.social and I once paid two euros (we didn't get reimbursed) to legally use it - it's gotta work for its money now! ;-)
so overall it works well. Then again, we had 120 trials (although substantially fewer may end up in either bin, as we can't control how bright people experience the colors to be); on a single-trial basis, the pupil responses are likely very noisy, I would think. Hope that answers your point!
Thanks! I quickly made this plot for you; a reviewer also asked a question in that direction. Light gray is the average pupil response to colors in the bright bin, dark gray the average pupil response to colors in the dark bin.
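For anyone curious, a minimal sketch of that kind of binned average - the arrays, the median split, and the plotting details are illustrative placeholders, not our actual analysis code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative shapes: 120 trials x 500 samples of baseline-corrected pupil
# size, plus one experienced-brightness rating per trial (higher = brighter).
rng = np.random.default_rng(0)
pupil = rng.normal(size=(120, 500))        # placeholder pupil traces
brightness = rng.uniform(0, 1, size=120)   # placeholder brightness ratings

# Median split into a bright and a dark bin; in practice the bins need not be
# equal in size, since experienced brightness is not under our control.
bright = pupil[brightness >= np.median(brightness)].mean(axis=0)
dark = pupil[brightness < np.median(brightness)].mean(axis=0)

t = np.arange(pupil.shape[1])              # sample index (or time)
plt.plot(t, bright, color="lightgray", label="bright bin")
plt.plot(t, dark, color="dimgray", label="dark bin")
plt.xlabel("time (samples)")
plt.ylabel("pupil size (a.u.)")
plt.legend()
plt.show()
```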
We show that synesthesia is sensory and automatic in nature: the pupil scales with the brightness of experienced synesthetic colors. doi.org/10.7554/eLif...
Now in its new dress at @elife.bsky.social (assessed as convincing & valuable in round 1).
If anyone wants to pick up the method, happy to share & explain!
Thanks Andrey, sensible analyses. In short, we mostly share the notion that our markers do not sufficiently establish that bots were in our data. That said, we do think there is sufficient reason to worry - we'll respond a bit more elaborately in a week or two!
For either technical approach (which more likely affects the scalability/economics of fraud and the degree of fabricated data, rather than its existence), we as a community need to find strategies to respond and to treat online behavioral data with appropriate care.
That's fair. Obviously, we don't know how the data were produced, whether through agents or more autonomous approaches. Given that our task was relatively short and the financial compensation limited, I would expect some sophistication/generalization in the approach (for it to be a sensible business case).
@belekedezwart.bsky.social is this something you could check?
Does that answer your point? Otherwise Talha can maybe elaborate a bit more as he has played with how bots (can) achieve this more than I/we have.
@belekedezwart.bsky.social could you elaborate on the implementation?
I don't know what the specific (likely) bots did. @talha-ozudogru.bsky.social played around with other tasks. Bots can, e.g., 'look' for colors through hex codes quite easily (the task had a search element to it).
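To make that concrete, a minimal sketch of the kind of shortcut a bot could take - the HTML snippet and the 'odd color out' rule are hypothetical, not our task's actual markup or goal:

```python
import re

# Hypothetical page snippet: each search item exposes its color as a hex code.
html = """
<div class="item" style="background:#8a8a8a"></div>
<div class="item" style="background:#e62020"></div>
<div class="item" style="background:#8b8b8b"></div>
"""

# A bot never needs to 'see' anything: it can read the hex codes directly
# and pick the odd one out (here: the most saturated color, as a crude proxy).
hexes = re.findall(r"#([0-9a-fA-F]{6})", html)
rgb = [tuple(int(h[i:i + 2], 16) for i in (0, 2, 4)) for h in hexes]
target = max(range(len(rgb)), key=lambda i: max(rgb[i]) - min(rgb[i]))
print(f"click item {target}: #{hexes[target]}")
```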
Good point, we haven't looked at it, but we should have similar data from a lab. Some of the patterns, though, are extremely hard for a human to produce (think of almost perfectly Gaussian reaction times, or reaction times being fully uncorrelated within participants).
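As an illustration, a minimal sketch of two such checks on a single participant's reaction times; the tests and thresholds are assumptions for the sketch, not the exact criteria from our comment:

```python
import numpy as np
from scipy import stats

def screen_rts(rts: np.ndarray) -> dict:
    """Two illustrative red flags for one participant's RTs (in seconds)."""
    # Human RT distributions are typically right-skewed; a very high
    # Shapiro-Wilk p-value plus near-zero skew hints at Gaussian-sampled data.
    _, p_normal = stats.shapiro(rts)
    skew = stats.skew(rts)
    # Human RTs drift over a session (fatigue, learning), so the lag-1
    # autocorrelation is usually positive; near-zero is another weak hint.
    lag1 = np.corrcoef(rts[:-1], rts[1:])[0, 1]
    return {
        "suspiciously_gaussian": p_normal > 0.5 and abs(skew) < 0.1,
        "suspiciously_uncorrelated": abs(lag1) < 0.02,
    }

# RTs literally drawn from a Gaussian will trip these flags far more often
# than genuinely human-like, skewed and drifting RTs would.
fake = np.random.default_rng(1).normal(0.45, 0.05, size=120)
print(screen_rts(fake))
```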
My intuition (but it's nothing more than that) is that we're probably good for now, but may not be in the future (?)
I don't, to be honest. We also instructed an LLM to do a much more complex behavioral task involving mouse movement, which it did easily, but it was also still clearly identifiable in the traces it left (brute-force trying of different solutions, direct and fast movements as you mention).
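One simple trace-based check, sketched under the assumption that you have the raw cursor trajectory: the ratio of straight-line distance to travelled path length. Human movements curve and overshoot, so values at or near 1 are suspiciously direct; the exact cutoff would be an empirical choice:

```python
import numpy as np

def path_efficiency(xy: np.ndarray) -> float:
    """Straight-line distance / travelled path length for an (N, 2) trajectory."""
    steps = np.diff(xy, axis=0)
    path_len = np.linalg.norm(steps, axis=1).sum()
    direct = np.linalg.norm(xy[-1] - xy[0])
    return direct / path_len if path_len > 0 else 0.0

# A programmatic, perfectly straight movement scores exactly 1.0;
# human cursor paths typically land noticeably below that.
bot_path = np.linspace([0, 0], [400, 300], num=50)
noisy_path = bot_path + np.random.default_rng(2).normal(0, 5, (50, 2))
print(path_efficiency(bot_path))    # 1.0
print(path_efficiency(noisy_path))  # < 1.0
```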
Thanks for flagging, forwarded, will be taken off!
I should say that we don't even agree among coauthors whether this should mean stopping our own similar online data collections wherein we pay participants. I'd say the principled concern of not knowing whether data can be trusted is enough to do so, but others have good reasons to continue.
We're not saying that we're doomed (yet) by the way, we're saying we should watch out and apply extra scrutiny.
It's not in there, but we tried instructing LLMs to do substantially more complex tasks than Posner cueing. It's frustratingly easy.
I came to the same conclusion, but note that not even all my coauthors would directly agree on this
I'm not saying we lost the race, but we clearly are in one (see the original paper on the many measures once thought to be bot-proof that are now easy to overcome).
We're not saying to kill all online research here; we're saying that there must be scrutiny (and we're not the first to say so, of course).
In principle, I agree with your intuition - the less popular and the more complex the task, the safer we should still be. But the fundamental concern of no longer being able to be fully certain is very frustrating.
We haven't tested it, so I can only speculate about the severity of the issue. A colleague of mine (with co-author Leendert Van Maanen) could get LLMs to complete certain less popular tasks (requiring task-specific instruction). But that's not my work - so I want to remain careful here.
Possibly it was just 2 bots, but it may well have been several more than we flagged. Data on human-like responses are openly available, of course. Unfortunately, building such bots (simulating human-like behavior ever better) may just be too lucrative a (criminal) business model.
It's not clear cut, unfortunately.
Out of our 36 participants, we suspect 2-6 to be bots. This range is determined by whether participants scored suspiciously on any of three separate metrics, or on all of them.
One may disagree with this of course, but the main problem may be that this is only a lower bound (I fear).
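A minimal sketch of that flagging logic (participant IDs, metrics, and flag values are placeholders, not our data):

```python
# Placeholder per-participant booleans: did they score suspiciously on each
# of three metrics (e.g., RT shape, RT autocorrelation, accuracy pattern)?
flags = {
    "p01": (True, True, True),
    "p02": (True, False, False),
    "p03": (False, False, False),
}

strict = [p for p, f in flags.items() if all(f)]  # suspicious on all metrics
loose = [p for p, f in flags.items() if any(f)]   # suspicious on any metric
print(f"suspected bots: between {len(strict)} and {len(loose)}")
```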
thanks to @stigchel.bsky.social, Leendert Van Maanen & @belekedezwart.bsky.social (for letting me be part of the crew)
as well as the reply by the original author, similarly showing problems in behavioral data collected online: www.pnas.org/doi/10.1073/...
Recent work has shown how vulnerable online survey research is to LLMs. Motivated by this, we examined our online Posner cueing data from Prolific. It's concerning. We now must carefully consider when (or whether?) online behavioral data can be trusted.
see our comment:
www.pnas.org/doi/10.1073/...
This seems to be a proven bad idea:
pubmed.ncbi.nlm.nih.gov/40048619/
Let's hope that the uni can learn from the mistakes of others instead of repeating them. Awareness is helpful, so are guidelines. Layers of bureaucracy are more likely the opposite.
Congratulations Ana on this superb dissertation!
Was a pleasure to read the thesis and be part of the i̶n̶t̶e̶r̶r̶o̶g̶a̶t̶i̶o̶n̶ ̶c̶o̶m̶m̶i̶t̶t̶e̶e̶ defense board! :)
Depends, I'd say. Memory capacity for simple items, psychoacoustics around a psychophysically established masking threshold, etc., stand practically undebated, yet one may still want to know specific effect sizes.
Fair enough. I would say there is still use for them, e.g., for power estimations, but indeed that's not the classical use case of testing whether there is an effect to begin with.
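For example, a minimal sketch of turning a published effect size into a sample-size estimate - the Cohen's d of 0.5 is a placeholder, and the solver is statsmodels' standard one-sample/paired t-test power tool:

```python
from statsmodels.stats.power import TTestPower

# Placeholder effect size taken from prior literature (Cohen's d).
d = 0.5

# Solve for the sample size needed for 80% power at alpha = .05
# in a one-sample / paired-samples t-test.
n = TTestPower().solve_power(effect_size=d, power=0.8, alpha=0.05)
print(f"~{n:.0f} participants needed")
```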