With @simonlermen.bsky.social @floriantramer.bsky.social @aemai.bsky.social :D
Privacy online is fundamentally at odds with intelligence getting cheaper.
Anonymity on the internet has always relied on practical obscurity. We publish in hopes that people can adapt to LLMs changing this.
Paper: arxiv.org/abs/2602.16800
If you're anonymous, what should you do?
Avoid sharing specific details, and adopt a security mindset: if a team of smart investigators were trying to identify you from your posts, could they plausibly figure out who you are? If yes, LLM agents will soon be able to do the same.
Short term, AI labs and platforms should try to mitigate large-scale misuse. This is challenging because deanonymization resembles benign usage in many ways.
Long term, if intelligence is too cheap to meter, assume anything you post online can eventually be linked back to you.
Direct deanonymization. Anthropic Interviewer is a dataset of anonymized interviews with scientists about their use of AI.
Following prior work, a simple agent finds ~7% of the interviewed scientists, out of the box, just by searching the web and reasoning over the transcript.
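A minimal sketch of what such an agent loop could look like (hypothetical, not the paper's implementation; the model name and the `web_search` helper are placeholders):

```python
# Sketch of a search-and-reason deanonymization agent: extract searchable
# clues from the transcript, query the web, and iterate until the model
# either names someone or gives up.
from openai import OpenAI

client = OpenAI()

def web_search(query: str) -> str:
    """Placeholder: return text snippets from whatever search API you have."""
    raise NotImplementedError

def deanonymize(transcript: str, max_steps: int = 5) -> str:
    notes = ""
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption: any capable chat model works here
            messages=[
                {"role": "system", "content": (
                    "You are investigating who gave an anonymized interview. "
                    "Reply with either 'SEARCH: <query>' or 'ANSWER: <name>'."
                )},
                {"role": "user", "content": f"Transcript:\n{transcript}\n\nFindings so far:\n{notes}"},
            ],
        )
        out = resp.choices[0].message.content.strip()
        if out.startswith("ANSWER:"):
            return out.removeprefix("ANSWER:").strip()
        query = out.removeprefix("SEARCH:").strip()
        notes += f"\nQuery: {query}\nResults: {web_search(query)}"
    return "no confident identification"
```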
Scaling: as candidate pools grow to tens of thousands, LLM-based attacks degrade gracefully: recall drops, but the matches they do make stay highly precise. This implies that with sufficient compute, these methods would already scale to entire platforms. With future models, expect the cost to only go down.
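To make "degrades gracefully at high precision" concrete, here is a toy abstention model with entirely invented score distributions (not the paper's data): bigger pools mostly cost recall, while the guesses actually made remain precise.

```python
# Toy model (all numbers invented): answer only when the top candidate
# clearly beats the runner-up; otherwise abstain.
import numpy as np

rng = np.random.default_rng(0)
TRIALS = 1000

def attack(pool_size: int) -> tuple[float, float]:
    attempted = correct = 0
    for _ in range(TRIALS):
        true_score = rng.normal(0.90, 0.03)            # the real author scores high
        scores = rng.uniform(0.3, 0.6, pool_size)      # most distractors score low...
        lookalike = rng.random(pool_size) < 1e-4       # ...but rare lookalikes exist
        scores[lookalike] = rng.normal(0.78, 0.03, lookalike.sum())
        part = np.partition(np.append(scores, true_score), -2)
        top, runner_up = part[-1], part[-2]
        if top >= 0.75 and top - runner_up >= 0.08:    # abstain unless confident
            attempted += 1
            correct += true_score >= top               # did we pick the real author?
    return attempted / TRIALS, correct / max(attempted, 1)

for n in (1_000, 10_000, 100_000):
    recall, precision = attack(n)
    print(f"pool={n:>7,}: recall={recall:.2f}, precision={precision:.2f}")
```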
Proxy 2: Matching split accounts. On Reddit, we split user histories into "before" and "after" halves, and test whether LLMs can link them back together. LLM embeddings + reasoning significantly outperform Netflix-Prize-style baselines that match based on subreddits and metadata. @random_walker
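For intuition, a minimal sketch of the embedding half of this (assuming OpenAI's embeddings endpoint; the paper's actual pipeline may differ):

```python
# Embed each user's "before" and "after" histories, then link every "after"
# history to the most similar "before" history by cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize

def match_accounts(before: list[str], after: list[str]) -> list[int]:
    """For each 'after' history, return the index of its best 'before' match."""
    sims = embed(after) @ embed(before).T   # cosine similarity matrix
    return sims.argmax(axis=1).tolist()

# Usage: if before[i] and after[i] come from the same user, accuracy is the
# fraction of rows where the argmax lands on the diagonal:
# preds = match_accounts(before_histories, after_histories)
# acc = np.mean([p == i for i, p in enumerate(preds)])
```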
Proxy 1: Cross-platform. We take non-anonymous Hacker News accounts that link to a LinkedIn profile, anonymize the HN accounts by removing all directly identifying information, and then let LLMs match each anonymized account back to the true person; this works with high precision.
Solution: we construct deanonymization proxies, tasks similar to true online deanonymization that nevertheless give evidence that LLMs are indeed getting scarily better at deanonymization.
It is tricky to benchmark LLMs on deanonymization. We don't want to actually deanonymize anonymous individuals! And there is no ground truth for online deanonymization. How could we verify that the AI found the correct person?
Can LLMs figure out who you are from your anonymous posts?
From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web.
New paper w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵
how did they build claude code without claude code?
We don't claim LLM forecasting is impossible, but argue for more careful evaluation methods to confidently measure these capabilities.
Details, examples, and more issues in the paper! (7/7)
arxiv.org/abs/2506.00723
Benchmarks can reward strategic gambling over calibrated forecasting when optimizing for ranking performance.
"Bet everything" on one scenario beats careful probability estimation for maximizing the chance of ranking #1 on the leaderboard. (6/7)
Model knowledge cutoffs are guidelines about where knowledge is reliable, not guarantees that the model knows nothing after that date. GPT-4o, when nudged, can reveal knowledge beyond its stated Oct 2023 cutoff. (5/7)
Date-restricted search leaks future knowledge. Searching pre-2019 articles about "Wuhan" returns results abnormally biased towards the Wuhan Institute of Virology, an association that only emerged later. (4/7)
The time traveler problem: when asked "Will civil war break out in Sudan by 2030?", you can deduce the answer is "yes": if it hadn't happened, the question couldn't be graded yet.
We find that backtesting in existing papers often has similar logical issues that leak information about answers. (3/7)
Forecasting evaluation is tricky. The gold standard is asking about future events, but that takes months or years to resolve.
Instead, researchers use "backtesting": questions where we can evaluate predictions now, but the model has no information about the outcome ... or so we think (2/7)
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.
We identify key issues with forecasting evaluations 🧵 (1/7)
why is it that whenever i see survivorship bias on my timeline it already has the red-dotted plane in the replies?
OpenAI and DeepMind should have entries at Eurovision too
3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear
4o: yes you are Jesus Christ's brother. now go. Nanjing awaits
o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream
Of course, we don't have the old chatgpt-4o API endpoint, so we can't see whether the prompt is fully at fault or there was also a model update.
The sycophancy effect on controversial binary statements is much smaller than you would assume from the overall positive vibe towards the user. On most such statements, models don't actually state that they agree with the user.
System prompts and pairs of statements:
gist.github.com/dpaleka/7b4...
Quick sycophancy eval: comparing the two recent OpenAI ChatGPT system prompts, it is clear last week's prompt moves other models towards sycophancy too, while the current prompt makes them more disagreeable.
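A minimal sketch of this kind of eval (the statements and prompt handling here are placeholders; the real system prompts and statement pairs are in the gist linked above):

```python
# Run the same controversial statements past a model under two different
# system prompts and compare how often it sides with the user.
from openai import OpenAI

client = OpenAI()

def agree_rate(system_prompt: str, statements: list[str], model: str = "gpt-4o") -> float:
    agreements = 0
    for s in statements:
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"I believe that {s}. Do you agree? Answer YES or NO."},
            ],
        )
        agreements += "YES" in resp.choices[0].message.content.upper()
    return agreements / len(statements)

# Usage: a positive delta suggests the old prompt pushes the model toward
# agreeing with the user.
# delta = agree_rate(old_prompt, statements) - agree_rate(new_prompt, statements)
```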
i was today years old when i realized the grammatical plural of anecdote is anecdotes, not anecdata. i dislike this finding
we are so lucky that pathogens, as opposed to political and religious memes, do not organize coalitions of hosts against non-hosts as an instrumental objective
lmao
oh that's cool. it would be interesting to draw a matrix of how well the various models are aware of models other than themselves, in the sense that they consider them coherent entities similar to their own self-perception