Daniel Paleka's Avatar

Daniel Paleka

@dpaleka

ai safety researcher | phd ETH Zurich | https://danielpaleka.com

275
Followers
171
Following
65
Posts
19.11.2024
Joined
Posts Following

Latest posts by Daniel Paleka @dpaleka

With @simonlermen.bsky.social @floriantramer.bsky.social @aemai.bsky.social :D

20.02.2026 17:06 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Preview
Large-scale online deanonymization with LLMs We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at...

Privacy online is fundamentally at odds with intelligence getting cheaper.
Anonymity on the internet has always relied on practical obscurity. We publish in hopes that people can adapt to LLMs changing this.

Paper: arxiv.org/abs/2602.16800

20.02.2026 17:03 πŸ‘ 24 πŸ” 2 πŸ’¬ 1 πŸ“Œ 3

If you're anonymous, what should you do?

Avoid sharing specific details, and adopt a security mindset: if a team of smart investigators were trying to identify you from your posts, could they plausibly figure out who you are? If yes, LLM agents will soon be able to do the same.

20.02.2026 17:03 πŸ‘ 12 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0

Short term, AI labs and platforms should try to mitigate large-scale misuse. This is challenging because deanonymization resembles benign usage in many ways.

Long term, if intelligence is too cheap to meter, assume anything you post online can eventually be linked back to you.

20.02.2026 17:03 πŸ‘ 11 πŸ” 1 πŸ’¬ 2 πŸ“Œ 0
Post image

Direct deanonymization. Anthropic Interviewer is a dataset of anonymized interviews with scientists about their use of AI.

Following prior work, a simple agent finds ~7% of the interviewed scientists, out of the box, just by searching the web and reasoning over the transcript.

20.02.2026 17:03 πŸ‘ 9 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Scaling: as candidate pools grow to tens of thousands, LLM-based attacks degrade gracefully at high precision; this implies that with sufficient compute, these methods would already scale to entire platforms. With future models, expect the cost to only go down.

20.02.2026 17:03 πŸ‘ 6 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Proxy 2: Matching split accounts. On Reddit, we split user histories into "before" and "after", and test LLMs linking them back together. LLM embeddings + reasoning significantly outperform Netflix-Prize-style baselines that match based on subreddits and metadata. @random_walker

20.02.2026 17:03 πŸ‘ 7 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Proxy 1: Cross-platform. We take non-anonymous Hacker News accounts that link to their LinkedIn. We then anonymize the HN accounts, removing all directly identifying information. Then, we let LLMs match the anonymized account to the true person; this works with high precision.

20.02.2026 17:03 πŸ‘ 11 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Solution: we construct deanonymization proxies β€” tasks similar to true online deanonymization, that nevertheless give evidence that LLMs are indeed getting scarily better at deanonymization.

20.02.2026 17:03 πŸ‘ 13 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

It is tricky to benchmark LLMs on deanonymization. We don't want to actually deanonymize anonymous individuals! And there is no ground truth for online deanonymization. How could we verify that the AI found the correct person?

20.02.2026 17:03 πŸ‘ 17 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Can LLMs figure out who you are from your anonymous posts?

From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web.

New πŸ“„ w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧡

20.02.2026 17:03 πŸ‘ 122 πŸ” 44 πŸ’¬ 8 πŸ“Œ 14

how did they build claude code without claude code?

27.01.2026 17:59 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Preview
Pitfalls in Evaluating Language Model Forecasters Large language models (LLMs) have recently been applied to forecasting tasks, with some works claiming these systems match or exceed human performance. In this paper, we argue that, as a...

We don't claim LLM forecasting is impossible, but argue for more careful evaluation methods to confidently measure these capabilities.

Details, examples, and more issues in the paper! (7/7)
arxiv.org/abs/2506.00723

05.06.2025 17:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Benchmarks can reward strategic gambling over calibrated forecasting when optimizing for ranking performance.

"Bet everything" on one scenario beats careful probability estimation for maximizing the chance of ranking #1 on the leaderboard. (6/7)

05.06.2025 17:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Model knowledge cutoffs are guidelines about reliability, not guarantees of no information thereafter. GPT-4o, when nudged, can reveal knowledge beyond its stated Oct 2023 cutoff. (5/7)

05.06.2025 17:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image Post image

Date-restricted search leaks future knowledge. Searching pre-2019 articles about β€œWuhan” returns results abnormally biased towards the Wuhan Institute of Virology β€” an association that only emerged later. (4/7)

05.06.2025 17:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

The time traveler problem: When forecasting "Will civil war break out in Sudan by 2030?", you can deduce the answer is "yes" - otherwise they couldn't grade you yet.

We find that backtesting in existing papers often has similar logical issues that leak information about answers. (3/7)

05.06.2025 17:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Forecasting evaluation is tricky. The gold standard is asking about future events; but that takes months/years.

Instead, researchers use "backtesting": questions where we can evaluate predictions now, but the model has no information about the outcome ... or so we think (2/7)

05.06.2025 17:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.

We identify key issues with forecasting evaluations 🧡 (1/7)

05.06.2025 17:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

why is it that whenever i see survivorship bias on my timeline it already has the red-dotted plane in the replies?

26.05.2025 15:07 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

OpenAI and DeepMind should have entries at Eurovision too

17.05.2025 14:16 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear

4o: yes you are Jesus Christ's brother. now go. Nanjing awaits

o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

30.04.2025 22:10 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Of course, we don't have the old chatgpt-4o API endpoint, so we can't see whether the prompt is fully at fault or there was also a model update.

30.04.2025 15:16 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

The sycophancy effect on controversial binary options is much smaller than what you would assume from the overall positive vibe towards the user. On most such statements, models don't actually state they agree with the user.

30.04.2025 15:16 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
Contrastive statements sycophancy eval Contrastive statements sycophancy eval. GitHub Gist: instantly share code, notes, and snippets.

System prompts and pairs of statements:
gist.github.com/dpaleka/7b4...

30.04.2025 15:16 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Quick sycophancy eval: comparing the two recent OpenAI ChatGPT system prompts, it is clear last week's prompt moves other models towards sycophancy too, while the current prompt makes them more disagreeable.

30.04.2025 15:15 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

i was today years old when i realized the grammatical plural of anecdote is anecdotes, not anecdata. i dislike this finding

30.04.2025 14:45 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

we are so lucky that pathogens, as opposed to political and religious memes, do not organize coalitions of hosts against non-hosts as an instrumental objective

29.04.2025 06:45 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

lmao

09.04.2025 19:32 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

oh that's cool. it would be interesting to draw a matrix of how well the various models are aware of models other than themselves, in the sense they consider them as coherent entities similar to their own self-perception

09.04.2025 19:29 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0