AI models often mirror our beliefs, rewarding us with agreeable but shallow answers. This sycophancy flatters rather than challenges, eroding judgment and candour. To gain true value, we must set incentives that favour truth over comfort, design prompts that demand trade-offs, and treat AI as a […]
#ModelAlignment
Posts tagged #ModelAlignment on Bluesky
Training LLMs on open-ended tasks is tricky; opinions vary, and interpretations clash. Consensus scoring + escalation workflows bring structure and consistency to reward modeling.
How it works: bit.ly/44AMGZh
#ModelAlignment #RLHF #LLMTraining #FeedbackQuality
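The consensus-plus-escalation idea above can be sketched in a few lines. This is a hypothetical illustration, not the workflow from the linked post: the names (`score_with_consensus`, `AGREEMENT_THRESHOLD`) and the spread-based agreement rule are assumptions chosen for the example.

```python
# Hypothetical sketch: consensus scoring with an escalation workflow for
# reward-model labels. The threshold and function names are illustrative.
from statistics import mean, pstdev

AGREEMENT_THRESHOLD = 0.5  # max allowed spread (population std dev) among rater scores

def score_with_consensus(ratings):
    """Return (score, needs_escalation) for one labeled example.

    ratings: numeric scores (e.g. 1-5) from independent raters.
    If raters disagree too much, the example is escalated to an expert
    reviewer instead of being averaged into the training set.
    """
    if len(ratings) < 2:
        return None, True  # too few raters: escalate
    if pstdev(ratings) > AGREEMENT_THRESHOLD:
        return None, True  # raters disagree: escalate for adjudication
    return mean(ratings), False  # consensus reached: use the mean score

# Three raters largely agree -> consensus score; wide spread -> escalation
print(score_with_consensus([4, 4, 5]))
print(score_with_consensus([1, 3, 5]))
```

The design choice here is that disagreement is a signal, not noise: rather than averaging away conflicting interpretations of an open-ended task, the workflow routes them to adjudication, which keeps the reward-model training data consistent.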
A new series of experiments by Palisade Research has sparked concern in the AI safety community: OpenAI’s o3 model appears to resist shutdown protocols, even when explicitly instructed to comply.
#AISafety #OpenAI #ModelAlignment #ReinforcementLearning #TechEthics