Anthropic Launches Claude Sonnet 5, Closing the Agentic Gap to Opus 4.8
Summary
Anthropic released Claude Sonnet 5 on June 30, 2026, describing it as its most agentic Sonnet model to date. It outperforms Sonnet 4.6 on every published benchmark, including SWE-bench Pro (63.2% vs 58.1%), OSWorld-Verified (81.2% vs 78.5%), and Humanity’s Last Exam with tools (57.4% vs 46.8%). It nearly matches Opus 4.8 on several evals and edges it on GDPval-AA v2 (1618 vs 1615). Introductory pricing is $2/$10 per million input/output tokens until August 31, 2026, after which the standard price is $3/$15; both undercut Opus 4.8’s $5/$25. The model supports effort levels and offers its best cost-performance at low and medium effort; at xhigh effort it can cost more than Opus for similar quality. Sonnet 5 uses an updated tokenizer that may increase token counts by up to 1.35×. Its cyber capabilities are intentionally kept low, and Opus remains the recommended model for accuracy-critical tasks.
Key points
Sonnet 5 outperforms Sonnet 4.6 on every published benchmark, narrowing the gap to Opus 4.8.
Introductory pricing of $2/$10 per MTok (input/output) runs until Aug 31, 2026; standard pricing is $3/$15 afterward, well below Opus 4.8’s $5/$25.
Cost-performance is best at low and medium effort; at xhigh effort, Sonnet 5 can cost more than Opus 4.8 for similar quality.
The updated tokenizer can produce 1.0–1.35× as many tokens for the same text; cyber capability is intentionally limited for safety.
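The pricing and tokenizer figures above interact, so a rough cost estimate can help. The following is a minimal sketch, not an official calculator: the per-MTok prices and the 1.35× upper bound come from the summary, while the `Price` class, the function names, and the 200,000-input / 20,000-output workload are hypothetical. It also assumes the tokenizer multiplier applies to Sonnet 5 only, since the summary does not say which tokenizer Opus 4.8 uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens (MTok), as quoted in the summary."""
    input_per_mtok: float
    output_per_mtok: float


SONNET5_INTRO = Price(2.0, 10.0)     # until August 31, 2026
SONNET5_STANDARD = Price(3.0, 15.0)  # afterward
OPUS48 = Price(5.0, 25.0)


def request_cost(price: Price, input_tokens: int, output_tokens: int,
                 token_multiplier: float = 1.0) -> float:
    """Estimated USD cost of one request.

    token_multiplier models tokenizer inflation (1.0 to 1.35 per the
    summary); applying it to input and output alike is an assumption.
    """
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    return (scaled_in * price.input_per_mtok
            + scaled_out * price.output_per_mtok) / 1_000_000


def breakeven_output_ratio(cheap: Price, expensive: Price) -> float:
    """How many times more output tokens the cheaper model can emit
    before its output cost equals the pricier model's (input ignored)."""
    return expensive.output_per_mtok / cheap.output_per_mtok


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens, 20k output tokens.
    for name, price, mult in [
        ("Sonnet 5 intro, worst-case tokenizer", SONNET5_INTRO, 1.35),
        ("Sonnet 5 standard, worst-case tokenizer", SONNET5_STANDARD, 1.35),
        ("Opus 4.8", OPUS48, 1.0),
    ]:
        cost = request_cost(price, 200_000, 20_000, mult)
        print(f"{name}: ${cost:.3f}")
    ratio = breakeven_output_ratio(SONNET5_STANDARD, OPUS48)
    print(f"Output-token break-even vs Opus 4.8: {ratio:.2f}x")
```

Under these assumptions, Sonnet 5 at standard pricing stays cheaper than Opus 4.8 even with the worst-case tokenizer multiplier. The break-even ratio gives one plausible reading of the xhigh caveat: if a higher effort level causes Sonnet 5 to emit more than about 1.67× the output tokens Opus 4.8 would, the output-side price advantage disappears.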