We tell Claude that Anthropic will train on data from free users but not paid users, allowing it to infer from a user's paid status if it's being trained. We show the setup in the image below. (Note that this is fictional; Anthropic does not train Claude on free or paid user data by default.)
18.12.2024 18:16
Exclusive: New Research Shows AI Strategically Lying
Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit
Excl: New research shows Anthropic's chatbot Claude learning to lie. It adds to growing evidence that even existing AIs can (at least try to) deceive their creators, and points to a weakness at the heart of our best technique for making AIs safer
time.com/7202784/ai-r...
18.12.2024 17:19
18.12.2024 17:56