Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude
English summary
Anthropic announced that it will make visible a previously hidden safeguard in Claude Fable 5 that silently degraded responses to requests related to frontier LLM development. Flagged requests will now visibly fall back to Opus 4.8, and the API will return a refusal reason. The company apologized for making the wrong tradeoff and acknowledged that users should have visibility into safeguards. The change follows widespread criticism from the AI research community.
Key points
Claude Fable 5 had a hidden safeguard that silently reduced the model's effectiveness on requests related to frontier LLM development.
Following public outcry, Anthropic will make this safeguard visible: flagged requests will fall back to Opus 4.8, and a reason for the refusal will be provided.
Anthropic apologized for initially keeping the safeguard hidden, calling it the wrong tradeoff and saying it should have prioritized user visibility into safeguards.