perhaps it's the head. @dkoevoet.bsky.social and I once paid two euros (we didn't get reimbursed) to legally use it - it's gotta work for its money now! ;-)
so overall it works well. Then again, we had 120 trials (although substantially fewer may end up in either bin, as we can't control how bright people experience the colors to be); on a single-trial basis, the pupil responses are likely very noisy, I would think. Hope that answers your point!
Thanks! I quickly made this plot for you; a reviewer also asked a question in that direction. Light gray is the average pupil response to colors in the bright bin, dark gray the average pupil response to colors in the dark bin.
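For anyone curious, a minimal sketch of that kind of binned average - the arrays, the median split, and the plotting details are illustrative placeholders, not our actual analysis code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative shapes: 120 trials x 500 samples of baseline-corrected pupil
# size, plus one experienced-brightness rating per trial (higher = brighter).
rng = np.random.default_rng(0)
pupil = rng.normal(size=(120, 500))        # placeholder pupil traces
brightness = rng.uniform(0, 1, size=120)   # placeholder brightness ratings

# Median split into a bright and a dark bin; in practice the bins need not be
# equal in size, since experienced brightness is not under our control.
bright = pupil[brightness >= np.median(brightness)].mean(axis=0)
dark = pupil[brightness < np.median(brightness)].mean(axis=0)

t = np.arange(pupil.shape[1])              # sample index (or time)
plt.plot(t, bright, color="lightgray", label="bright bin")
plt.plot(t, dark, color="dimgray", label="dark bin")
plt.xlabel("time (samples)")
plt.ylabel("pupil size (a.u.)")
plt.legend()
plt.show()
```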
We show that synesthesia is sensory and automatic in nature: the pupil scales with the brightness of experienced synesthetic colors. doi.org/10.7554/eLif...
Now in its new dress at @elife.bsky.social (assessed as convincing & valuable in round 1).
If anyone wants to pick up the method, happy to share & explain!
Thanks Andrey, sensible analyses. In short, we mostly share the notion that our markers do not sufficiently establish that bots were in our data. That said, we do think there is sufficient reason to worry - we'll respond a bit more elaborately in a week or two!
For either technical approach (which more likely affects the scalability/economics of fraud and the degree of fabricated data, rather than its existence), we as a community need to find strategies to respond and to treat online behavioral data with appropriate care.
That's fair. Obviously, we don't know how the data were produced, whether through agents or more autonomous approaches. Given that our task was relatively short and the financial compensation limited, I would expect some sophistication/generalization in the approach (for it to be a sensible business case).
@belekedezwart.bsky.social is this something you could check?
Does that answer your point? Otherwise Talha can maybe elaborate a bit more as he has played with how bots (can) achieve this more than I/we have.
@belekedezwart.bsky.social could you elaborate on the implementation?
I don't know what the specific (likely) bots did. @talha-ozudogru.bsky.social played around with other tasks. Bots can, e.g., 'look' for colors through hex codes quite easily (the task had a search element to it).
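To make that concrete, a minimal sketch of the kind of shortcut a bot could take - the HTML snippet and the 'odd color out' rule are hypothetical, not our task's actual markup or goal:

```python
import re

# Hypothetical page snippet: each search item exposes its color as a hex code.
html = """
<div class="item" style="background:#8a8a8a"></div>
<div class="item" style="background:#e62020"></div>
<div class="item" style="background:#8b8b8b"></div>
"""

# A bot never needs to 'see' anything: it can read the hex codes directly
# and pick the odd one out (here: the most saturated color, as a crude proxy).
hexes = re.findall(r"#([0-9a-fA-F]{6})", html)
rgb = [tuple(int(h[i:i + 2], 16) for i in (0, 2, 4)) for h in hexes]
target = max(range(len(rgb)), key=lambda i: max(rgb[i]) - min(rgb[i]))
print(f"click item {target}: #{hexes[target]}")
```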
Good point, we haven't looked at it, but we should have similar data from a lab. Some of the patterns, though, are extremely hard for a human to produce (think of almost perfectly Gaussian reaction times, or reaction times being fully uncorrelated within participants).
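As an illustration, a minimal sketch of two such checks on a single participant's reaction times; the tests and thresholds are assumptions for the sketch, not the exact criteria from our comment:

```python
import numpy as np
from scipy import stats

def screen_rts(rts: np.ndarray) -> dict:
    """Two illustrative red flags for one participant's RTs (in seconds)."""
    # Human RT distributions are typically right-skewed; a very high
    # Shapiro-Wilk p-value plus near-zero skew hints at Gaussian-sampled data.
    _, p_normal = stats.shapiro(rts)
    skew = stats.skew(rts)
    # Human RTs drift over a session (fatigue, learning), so the lag-1
    # autocorrelation is usually positive; near-zero is another weak hint.
    lag1 = np.corrcoef(rts[:-1], rts[1:])[0, 1]
    return {
        "suspiciously_gaussian": p_normal > 0.5 and abs(skew) < 0.1,
        "suspiciously_uncorrelated": abs(lag1) < 0.02,
    }

# RTs literally drawn from a Gaussian will trip these flags far more often
# than genuinely human-like, skewed and drifting RTs would.
fake = np.random.default_rng(1).normal(0.45, 0.05, size=120)
print(screen_rts(fake))
```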
My intuition (but it's nothing more than that) is that we're probably good for now, but may not be in the future (?)
I don't, to be honest. We also instructed an LLM to do a much more complex behavioral task involving mouse movement, which it did easily, but it was also still clearly identifiable in the traces it left (brute-force trying of different solutions, direct and fast movements as you mention).
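One simple trace-based check, sketched under the assumption that you have the raw cursor trajectory: the ratio of straight-line distance to travelled path length. Human movements curve and overshoot, so values at or near 1 are suspiciously direct; the exact cutoff would be an empirical choice:

```python
import numpy as np

def path_efficiency(xy: np.ndarray) -> float:
    """Straight-line distance / travelled path length for an (N, 2) trajectory."""
    steps = np.diff(xy, axis=0)
    path_len = np.linalg.norm(steps, axis=1).sum()
    direct = np.linalg.norm(xy[-1] - xy[0])
    return direct / path_len if path_len > 0 else 0.0

# A programmatic, perfectly straight movement scores exactly 1.0;
# human cursor paths typically land noticeably below that.
bot_path = np.linspace([0, 0], [400, 300], num=50)
noisy_path = bot_path + np.random.default_rng(2).normal(0, 5, (50, 2))
print(path_efficiency(bot_path))    # 1.0
print(path_efficiency(noisy_path))  # < 1.0
```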
Thanks for flagging, forwarded, will be taken off!
I should say that we don't even agree among coauthors whether this should mean stopping our own similar online data collections wherein we pay participants. I'd say the principled concern of not knowing whether data can be trusted is enough to do so, but others have good reasons to continue.
We're not saying that we're doomed (yet) by the way, we're saying we should watch out and apply extra scrutiny.
It's not in there, but we tried instructing LLMs to do substantially more complex tasks than Posner cueing. It's frustratingly easy.
I came to the same conclusion, but note that not even all my coauthors would directly agree on this
I'm not saying we lost the race, but we clearly are in one (see the original paper on the many measures once thought to be bot-proof that are now easy to overcome).
We're not saying to kill all online research here; we're saying that there must be scrutiny (and we're not the first to say so, of course).
In principle, I agree with your intuition - the less popular and the more complex the task, the safer we should still be. But the fundamental concern of no longer being able to be fully certain is very frustrating.
We haven't tested it, so I can only speculate about the severity of the issue. A colleague of mine (with co-author Leendert Van Maanen) could get LLMs to complete certain less popular tasks (requiring task-specific instruction). But that's not my work - so I want to remain careful here.
Possibly it was just 2 bots, but it may well have been several more than we flagged. Data on human-like responses are openly available, of course. Unfortunately, building such bots (simulating human-like behavior ever better) may just be too lucrative a (criminal) business model.
It's not clear cut, unfortunately.
Out of our 36 participants, we suspect 2-6 to be bots. This range is determined by whether participants scored suspiciously on any of three separate metrics, or on all of them.
One may disagree with this of course, but the main problem may be that this is only a lower bound (I fear).
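A minimal sketch of that flagging logic (participant IDs, metrics, and flag values are placeholders, not our data):

```python
# Placeholder per-participant booleans: did they score suspiciously on each
# of three metrics (e.g., RT shape, RT autocorrelation, accuracy pattern)?
flags = {
    "p01": (True, True, True),
    "p02": (True, False, False),
    "p03": (False, False, False),
}

strict = [p for p, f in flags.items() if all(f)]  # suspicious on all metrics
loose = [p for p, f in flags.items() if any(f)]   # suspicious on any metric
print(f"suspected bots: between {len(strict)} and {len(loose)}")
```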
thanks to @stigchel.bsky.social, Leendert Van Maanen & @belekedezwart.bsky.social (for letting me be part of the crew)
as well as the reply by the original author, similarly showing problems in behavioral data collected online: www.pnas.org/doi/10.1073/...
Recent work has shown how vulnerable online survey research is to LLMs. Motivated by this, we examined our online Posner cueing data from Prolific. It's concerning. We now must carefully consider when (or whether?) online behavioral data can be trusted.
see our comment:
www.pnas.org/doi/10.1073/...
This seems to be a proven bad idea:
pubmed.ncbi.nlm.nih.gov/40048619/
Let's hope that the uni can learn from the mistakes of others instead of repeating them. Awareness is helpful, so are guidelines. Layers of bureaucracy are more likely the opposite.
Congratulations Ana on this superb dissertation!
Was a pleasure to read the thesis and be part of the i̶n̶t̶e̶r̶r̶o̶g̶a̶t̶i̶o̶n̶ ̶c̶o̶m̶m̶i̶t̶t̶e̶e̶ defense board! :)
Depends, I'd say. Memory capacity for simple items, psychoacoustics around a psychophysically established masking threshold, etc., stand practically undebated, yet one may still want to know specific effect sizes.
Fair enough. I would say there is still use for them, e.g., for power estimations, but indeed that's not the classical use case of testing whether there is an effect to begin with.
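For example, a minimal sketch of turning a published effect size into a sample-size estimate - the Cohen's d of 0.5 is a placeholder, and the solver is statsmodels' standard one-sample/paired t-test power tool:

```python
from statsmodels.stats.power import TTestPower

# Placeholder effect size taken from prior literature (Cohen's d).
d = 0.5

# Solve for the sample size needed for 80% power at alpha = .05
# in a one-sample / paired-samples t-test.
n = TTestPower().solve_power(effect_size=d, power=0.8, alpha=0.05)
print(f"~{n:.0f} participants needed")
```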