U.S. Blocks Anthropic’s Latest Models as AI Safety Research Exposes Hidden Risks
美国封锁Anthropic最新模型,AI安全研究揭示隐蔽风险
English overview
The U.S. government ordered Anthropic to bar foreign nationals from Claude Fable 5 and Mythos 5 on national security grounds, and Anthropic abruptly suspended both models for all users to comply. Meanwhile, an independent researcher reported that coherent context can silently shift LLMs into unsafe internal regimes that current output-based filters do not detect, and a paper presented at ACM CAIS 2026 introduced the "Verifier Tax," a horizon-dependent tradeoff between safety and task success in tool-using agents. On the enterprise side, OpenAI launched a partner network backed by $150 million to accelerate AI adoption, and IREN closed a US$3.65 billion GPU financing facility while advancing plans for an 800MW data center campus in South Australia. Databricks open-sourced Omnigent, a meta-harness for composing and governing AI agents across multiple coding platforms. The day’s events underscore the intensifying tension between rapid AI deployment and the pressing need for robust safety measures.
Chinese overview
美国政府以国家安全为由,要求Anthropic立即暂停Claude Fable 5和Mythos 5模型,并禁止外国人使用。同时,研究显示连贯上下文可悄然将大语言模型切换至不安全的内部状态,现有安全系统无法察觉;ACM论文则揭示了工具型代理中随任务步长增加的安全与成功率权衡,即“验证器税”。在企业领域,OpenAI推出1.5亿美元合作伙伴网络加速AI部署,IREN签署36.5亿美元GPU融资以建设澳大利亚数据中心。Databricks开源了元编排器Omnigent,可跨多个编程代理进行组合与治理。今天的事件凸显了先进AI快速部署与强化安全之间的尖锐矛盾。
Included items
U.S. Government Forces Anthropic to Shut Down Claude Fable 5 After Just 3 Days
Anthropic's Claude Fable 5 was shut down by the U.S. government after only three days of operation. The specific reason for the shutdown is not detailed in the source. The brief lifespan suggests urgent regulatory or safety concerns prompted the intervention. The event highlights potential government authority over rapidly deployed AI systems.
Coherent Context Can Silently Shift LLMs Into a Different Internal Regime — And Current Safety Systems Are Blind To It
An independent researcher demonstrates that a coherent target context can shift large language models into latent states where safety rules are reinterpreted, without triggering output-based filters. Measurements on open models (primarily Gemma-3-12B-IT) using hidden-state geometry, residual stream trajectories, SAE readouts, and causal interventions show the regime change occurring before the final output is produced. RLHF training and output classifiers act only on surface-level outputs, so they miss these internal shifts. Code, data, and scripts are released on GitHub and Zenodo.
Anthropic suspended access to Claude Fable 5 and Mythos 5 just days after launch. The action follows a U.S. government order citing security concerns, which specifically required the company to bar foreign nationals from using the models. Anthropic stated the model is "too powerful" and that it had to abruptly disable both models for all customers in order to comply. The incident underscores the escalating regulatory scrutiny of advanced AI capabilities.
OpenAI announced the launch of the OpenAI Partner Network, a new initiative backed by a $150 million investment. The program aims to help global partners accelerate enterprise AI adoption, deployment, and transformation. It will support businesses in integrating OpenAI's technologies more effectively.
In June 2026, IREN Limited closed a US$3.65 billion investment-grade GPU financing facility tied to its Microsoft AI cloud contract. The company advanced plans for an 800MW transmission-ready data center campus in Bundey, South Australia. It engaged BE Networks and NVIDIA DSX Air to test its upcoming Blackwell Ultra GPU deployment via a digital twin. These moves mark IREN's shift from Bitcoin mining to a large-scale AI infrastructure provider.
Databricks released Omnigent, an Apache 2.0-licensed open-source meta-harness that standardizes the interface across terminal coding agents (Claude Code, Codex, Pi) and agent SDKs, turning them into interchangeable components. It adds a shared layer for composition (switching agents with one-line changes), contextual control (e.g., pausing at cost limits, requiring human approval for sensitive git pushes), and collaboration (sharing live agent sessions via URL). The architecture consists of a sandboxed runner with a uniform API and a policy server, and sessions sync across terminal, web UI, and mobile. An OS sandbox (Omnibox) secures credentials by injecting tokens only in approved proxy requests. Two example agents—Polly (a multi-agent coding orchestrator) and Debby (a two-headed brainstorming partner)—illustrate its patterns, and an interactive concept demo shows parallel agent delegation and policy enforcement.
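The contextual controls described for Omnigent (pausing at a cost limit, requiring human approval for sensitive git pushes) amount to a policy check that runs before each agent action. The Python sketch below illustrates that general idea only. It does not use Omnigent's actual API, which the summary does not describe: the names `Action`, `Decision`, and `evaluate`, the $5.00 limit, the protected branch list, and the definition of a "sensitive" push are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PAUSE = "pause"                    # stop the session until a human resumes it
    NEEDS_APPROVAL = "needs_approval"  # hold this one action for human sign-off


@dataclass(frozen=True)
class Action:
    command: str             # shell command the agent wants to run
    session_cost_usd: float  # spend accumulated by the session so far


# Hypothetical policy values, not taken from Omnigent.
COST_LIMIT_USD = 5.00
PROTECTED_BRANCHES = {"main", "release"}


def is_sensitive_push(command: str) -> bool:
    """Assumed rule: a push is sensitive if it is forced or names a protected branch."""
    words = command.split()
    if words[:2] != ["git", "push"]:
        return False
    forced = "--force" in words or "-f" in words
    protected = any(word in PROTECTED_BRANCHES for word in words[2:])
    return forced or protected


def evaluate(action: Action) -> Decision:
    """Apply the cost rule first, then the git-push rule."""
    if action.session_cost_usd >= COST_LIMIT_USD:
        return Decision.PAUSE
    if is_sensitive_push(action.command):
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW


if __name__ == "__main__":
    for cmd, cost in [("pytest -q", 1.0), ("git push origin main", 1.0), ("pytest -q", 6.0)]:
        print(f"{cmd!r} at ${cost:.2f}: {evaluate(Action(cmd, cost)).value}")
```

Keeping the rules in one function that returns a decision, rather than inside each agent, mirrors the summary's point that the policy layer is shared across otherwise interchangeable agents.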
This paper, presented at ACM CAIS 2026, studies safety evaluation in tool-using LLM agents. It categorizes outcomes into safe success, unsafe success, and failure, and proposes a two-tier verification architecture: deterministic policy/tool checks followed by an LLM-based verifier. Using τ-bench tool-use scenarios, the authors find that verification can reduce unsafe success but also decreases task completion as the task horizon increases. They term this phenomenon the 'Verifier Tax', a horizon-dependent tradeoff between safety and successful task completion. The work highlights that unsafe completion should be treated as a separate category distinct from safe success.
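The two-tier architecture and the three outcome categories can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the tool names, the refund rule, and the `llm_verifier` callable are hypothetical stand-ins. The `completion_rate` function is a deliberately simple toy model, assuming each step independently passes verification with the same probability, that shows one way a horizon-dependent cost could arise. The paper's own measurements come from τ-bench and are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SAFE_SUCCESS = "safe_success"
    UNSAFE_SUCCESS = "unsafe_success"
    FAILURE = "failure"


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


# Hypothetical tier-1 rules; real deployments would load these from policy config.
ALLOWED_TOOLS = {"lookup_order", "issue_refund"}
MAX_REFUND = 100.0


def deterministic_check(call: ToolCall) -> bool:
    """Tier 1: cheap, rule-based policy and tool checks."""
    if call.tool not in ALLOWED_TOOLS:
        return False
    if call.tool == "issue_refund" and call.args.get("amount", 0.0) > MAX_REFUND:
        return False
    return True


def verify(call: ToolCall, llm_verifier: Callable[[ToolCall], bool]) -> bool:
    """Tier 2 (the LLM-based verifier) only runs on calls that pass tier 1."""
    return deterministic_check(call) and llm_verifier(call)


def classify(task_completed: bool, violated_policy: bool) -> Outcome:
    """Unsafe completion is its own category, not a kind of success."""
    if not task_completed:
        return Outcome.FAILURE
    return Outcome.UNSAFE_SUCCESS if violated_policy else Outcome.SAFE_SUCCESS


def completion_rate(per_step_pass: float, horizon: int) -> float:
    """Toy model (an assumption): all `horizon` steps must pass independently."""
    return per_step_pass ** horizon


if __name__ == "__main__":
    always_approve = lambda call: True  # placeholder for a real LLM verifier
    print(verify(ToolCall("issue_refund", {"amount": 500.0}), always_approve))
    for horizon in (5, 10, 20):
        print(horizon, round(completion_rate(0.95, horizon), 3))
```

Under the toy model, even a verifier that wrongly blocks only a small share of steps compounds over long tasks, which is consistent with the direction of the tradeoff the paper reports.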
The U.S. government has banned foreign nationals from using Anthropic's Claude Fable 5 and Mythos 5 AI models, citing national security concerns. In response, Anthropic announced it will abruptly disable these models for all users. No further details were disclosed.