Claude Sonnet 5 Flooded with Negative Reviews One Day After Launch: Can't Beat Qwen or MiniMax, and Value for Money Falls Apart
Summary
Anthropic released Claude Sonnet 5, positioning it as the most agent-capable Sonnet to date and claiming performance close to Opus 4.8 on agentic tasks at 40% of the price. Within 24 hours, a private Chinese-language hard-reasoning benchmark showed its peak score tied with Qwen3.7-Plus and below MiniMax-M3, while the test run cost more than 6x as much as the Chinese models (71.96 yuan vs. about 11 yuan). Agentic coding benchmarks remain strong (SWE-bench Pro 63.2%, CursorBench 57%), but the Max reasoning mode and a tokenizer change inflate token counts by up to 35%, driving up real-world API costs; one user reported a 145x difference in the bill compared with DeepSeek for the same workload. Enterprise adoption is strained: Uber burned through its annual AI budget in four months, and Microsoft plans to drop Claude Code in favor of Copilot. Safety over-alignment also drew criticism from developers for making the model too cautious to use for legitimate security research.
Key takeaways
On a private Chinese-language hard-reasoning benchmark, Claude Sonnet 5 tied with Qwen3.7-Plus and scored below MiniMax-M3, while costing more than 6x as much to run (71.96 yuan vs. about 11 yuan).
Agentic coding benchmarks are strong (SWE-bench Pro 63.2%, CursorBench 57%), but the Max reasoning mode and a tokenizer change inflate token counts by up to 35%, sharply raising real-world bills.
Enterprise backlash: Uber exhausted its annual AI budget in four months, and Microsoft plans to move from Claude Code back to Copilot to control costs.
Safety over-alignment drew criticism from developers for making Sonnet 5 overly cautious: it refuses legitimate security research tasks such as exploit development.
Promotional pricing of $2 per million input tokens runs until Aug 31, 2026, but standard pricing combined with token inflation makes Sonnet 5 significantly more expensive than Chinese alternatives.
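The cost figures above can be checked with simple arithmetic. The sketch below is illustrative only: the 71.96 and 11 yuan totals, the $2 per million input tokens price, and the 35% ceiling come from the text, while the 10-million-token workload and both function names are hypothetical. It also assumes, as a simplification, that the inflation applies uniformly to billed input tokens.

```python
def cost_ratio(cost_a: float, cost_b: float) -> float:
    """Return how many times more expensive run A was than run B."""
    return cost_a / cost_b


def inflated_input_cost(tokens_before: int,
                        price_per_million: float,
                        inflation: float) -> float:
    """Input cost when the same workload is billed as `inflation` more tokens.

    `inflation` is a fraction, e.g. 0.35 for the "up to 35%" figure.
    """
    tokens_after = tokens_before * (1 + inflation)
    return tokens_after / 1_000_000 * price_per_million


# Benchmark run totals reported in the text (yuan).
ratio = cost_ratio(71.96, 11.0)
print(f"Benchmark cost ratio: {ratio:.2f}x")

# Hypothetical workload: 10 million input tokens at the promotional price.
baseline = inflated_input_cost(10_000_000, 2.0, 0.0)
worst_case = inflated_input_cost(10_000_000, 2.0, 0.35)
print(f"Baseline: ${baseline:.2f}, with 35% inflation: ${worst_case:.2f}")
```

At a fixed per-token price, a 35% rise in token count is a 35% rise in the bill, so a lower list price can be partly offset by the token count alone.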