Source: LEIPHONE | July 2, 2026 | Importance: 5/5

Claude Sonnet 5 flooded with negative reviews a day after launch: beaten by Qwen and MiniMax, value for money collapses across the board

English summary

Anthropic released Claude Sonnet 5, positioning it as its most agent-capable Sonnet, with performance near Opus 4.8 at 40% of the price. Within 24 hours, a private Chinese reasoning benchmark showed it tied with Qwen3.7-Plus and beaten by MiniMax-M3, while costing more than 6x as much to run (71.96 yuan vs. about 11 yuan). Agentic coding benchmarks remain strong (SWE-bench Pro 63.2%, CursorBench 57%), but its Max reasoning mode and a tokenizer change cause up to 35% token inflation, which makes real-world API costs balloon; one user reported a 145x cost difference compared with DeepSeek for similar workflows. Enterprise adoption is strained: Uber burned through its annual AI budget in four months, and Microsoft is dropping Claude Code in favor of Copilot. Safety over-alignment also drew criticism for making the model overly cautious and unusable for legitimate security research.

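Chinese summary (translated)

Anthropic has released Claude Sonnet 5, positioned as the most agent-capable Sonnet to date and claimed to approach Opus 4.8 on agentic tasks at 40% of the price. Less than a day after launch, a private Chinese hard-reasoning question bank showed its peak score surpassed by MiniMax-M3 and level with Qwen3.7-Plus, while the test cost was more than 6x that of the Chinese models (71.96 yuan vs. about 11 yuan). Agentic coding benchmarks remain strong (SWE-bench Pro 63.2%, CursorBench 57%), but the Max reasoning mode and a tokenizer change inflate token counts by up to 35%, so real-world API costs surge; one user reported a 145x difference against a DeepSeek bill for the same workload. On the enterprise side, Uber burned through its full-year budget in four months and Microsoft plans to stop using Claude Code. Excessive safety alignment makes the model overly conservative and unusable for legitimate security research, which has also drawn criticism from developers.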

Key points

On a private Chinese hard-reasoning benchmark, Claude Sonnet 5 tied with Qwen3.7-Plus and was outscored by MiniMax-M3, while costing more than 6x as much to run (71.96 yuan vs. about 11 yuan).
Agentic coding benchmarks are strong (SWE-bench Pro 63.2%, CursorBench 57%), but the Max reasoning mode and tokenizer changes cause up to 35% token inflation, sharply raising real-world bills.
Enterprise backlash: Uber exhausted its annual AI budget in four months, and Microsoft is moving from Claude Code back to Copilot to control costs.
Safety over-alignment drew criticism from developers for making Sonnet 5 overly cautious, to the point of refusing legitimate security research tasks such as exploit development.
Promotional pricing of $2 per million input tokens runs until Aug 31, 2026, but standard pricing plus token inflation makes Sonnet 5 significantly more expensive than Chinese alternatives.
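
The cost figures in the key points can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names and the 10-million-token workload are hypothetical, and it assumes the reported "up to 35%" inflation applies uniformly to input tokens billed at the promotional $2 per million rate, which the article does not confirm.

```python
def cost_ratio(cost_a: float, cost_b: float) -> float:
    """How many times more expensive cost_a is than cost_b."""
    return cost_a / cost_b


def input_cost_usd(base_tokens: int, price_per_million: float, inflation: float) -> float:
    """Input cost after token counts grow by the given inflation fraction."""
    billed_tokens = base_tokens * (1 + inflation)
    return billed_tokens / 1_000_000 * price_per_million


# Benchmark run cost reported in the article: 71.96 yuan vs. about 11 yuan.
ratio = cost_ratio(71.96, 11.0)

# Hypothetical workload: 10 million input tokens at the $2/M promotional price.
baseline = input_cost_usd(10_000_000, 2.0, 0.0)
worst_case = input_cost_usd(10_000_000, 2.0, 0.35)

print(f"benchmark cost ratio: {ratio:.2f}x")
print(f"baseline input cost: ${baseline:.2f}")
print(f"with 35% token inflation: ${worst_case:.2f}")
```

The ratio of the two reported test costs comes out at roughly 6.5, in line with the "over 6x" claim. Under the stated assumptions, token inflation alone raises the bill by the same 35% as the token count, so the much larger gaps users reported (such as the 145x figure) would have to come from other factors like per-token price differences and reasoning-mode output volume.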
