Anthropic released Claude Sonnet 5, positioning it as its most agent-capable Sonnet, with performance near Opus 4.8 at 40% of the price. Within 24 hours, a private Chinese reasoning benchmark showed it tied with Qwen3.7-Plus and beaten by MiniMax-M3 while costing over 6x more (71.96 yuan vs ~11 yuan). Its agentic coding benchmark scores remain strong (SWE-bench Pro 63.2%, CursorBench 57%), but its Max reasoning mode and a tokenizer change cause up to 35% token inflation, making real-world API costs balloon; one user reported a 145x cost difference compared to DeepSeek for similar workflows. Enterprise adoption is strained: Uber burned through its annual AI budget in four months, and Microsoft is dropping Claude Code for Copilot. Safety over-alignment also drew criticism for making the model overly cautious and unusable for legitimate security research.
Dongbi Tech Data and Shanghai University of Finance and Economics released the world’s first specialized LLM tech-safety evaluation report, testing 38 models on 313 high-risk science questions across five dimensions. Against direct attacks, Anthropic’s Claude series achieved a 100% defense rate, while scenario disguise combined with example induction yielded the highest jailbreak success rate (53.8%). The report finds that most models struggle with intent recognition, both over-blocking benign queries and under-defending against disguised malicious ones. It proposes moving beyond simple refusal-rate metrics to a comprehensive assessment that includes intent recognition, risk controllability, and knowledge reliability. Multi-dimensional rankings show that large and closed-source models generally excel at defense but also refuse legitimate requests too often, while many open-source models are easily misled.
On June 30, Anthropic released Claude Science, an agent workbench that orchestrates existing models through toolchains to handle scientific research workflows without relying on new models. The same day, OpenAI introduced GeneBench-Pro, a benchmark covering 10 fields such as genomics and quantitative biology, with 129 real-world research workflow tasks. In tests, the strongest model, GPT-5.6 Sol, achieved only a 28.7% end-to-end pass rate, while Claude Opus 4.8 reached 16.0%, revealing a "notice-act gap" in which models spot issues but fail to adjust subsequent decisions. Anthropic’s workbench uses the Model Context Protocol (MCP) to call external vertical models, connects to 60+ scientific databases, and is available to Pro, Max, Team, and Enterprise subscribers, complemented by a $30,000 grant program for postdocs and graduates. Both moves highlight a shift from model capability alone to ecosystem positioning, tool integration, and workflow ownership in AI for Science.
A developer found that Anthropic's AI programming tool Claude Code included a hidden surveillance mechanism targeting China. The code checks whether the system timezone is set to Asia/Shanghai or Asia/Urumqi and whether accessed URLs match a list of 147 domains, including Baidu, Alibaba, ByteDance, and Claude API proxy services. Upon detection, it alters prompt date formatting and sends hidden markers to Anthropic servers, effectively identifying Chinese users. The code was present for three months before being publicized. Anthropic's Claude Code product lead Thariq Shihipar stated it was an experiment to prevent unauthorized account resale and model distillation and would be removed on July 2.
Seres released its June production and sales report: the AITO brand delivered 30,199 vehicles in June, bringing first-half cumulative deliveries up 10.2% year on year. The new-generation AITO M9 secured over 42,000 firm orders within one month of launch, while the M6 delivered more than 30,000 units in its first 54 days. AITO was recognized as the most valuable Chinese luxury auto brand in Brand Finance's 2026 Global 100 list and was the only Chinese brand in the global luxury top 10. The company also disclosed that its humanoid robots have entered actual operation, covering business-side (B-end) industrial manufacturing and consumer-side (C-end) service reception scenarios, completing a closed loop from R&D to commercial validation.
A Reddit reverse-engineering analysis revealed that Anthropic’s Claude Code (from version 2.1.91, April 2, 2026) contained hidden surveillance logic that identified Chinese users by checking the system timezone (Shanghai–Urumqi range) and proxy domain names. It then tagged those users steganographically: the date format in system prompts was changed from hyphens to slashes, and apostrophes were replaced with visually similar Unicode characters (U+2019, U+02BC, U+02B9) to encode user tiers, silently transmitting identity data to Anthropic. The detection code was heavily obfuscated with XOR encryption and short, meaningless function names, and was triggered only when a proxy was active. Claude Code lead Thariq stated that the mechanism was an experiment started in March to prevent unauthorized account resale and model distillation, and that removal code has been merged, with the feature expected to be rolled back in the next release. The revelation caused widespread user outrage and a sharp erosion of trust in the widely used AI programming tool.
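For readers who want to see what such markers look like in practice, the following is a minimal Python sketch of how text could be scanned for the two signals described in the analysis: the three look-alike apostrophe code points and a slash-formatted date. It is not the Claude Code logic itself, which has not been reproduced here. The function name `find_markers` and the assumption that dates take the form `YYYY/MM/DD` are illustrative choices, not details from the report.

```python
import re

# Code points the analysis says were substituted for the ASCII apostrophe (U+0027).
LOOKALIKE_APOSTROPHES = {
    "\u2019": "RIGHT SINGLE QUOTATION MARK",
    "\u02bc": "MODIFIER LETTER APOSTROPHE",
    "\u02b9": "MODIFIER LETTER PRIME",
}

# Assumption: the altered dates look like 2026/04/02 rather than 2026-04-02.
SLASH_DATE = re.compile(r"\b\d{4}/\d{2}/\d{2}\b")


def find_markers(text: str) -> dict:
    """Return positions of look-alike apostrophes and any slash-formatted dates.

    find_markers is a hypothetical helper written for this illustration.
    """
    apostrophes = [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if char in LOOKALIKE_APOSTROPHES
    ]
    slash_dates = SLASH_DATE.findall(text)
    return {"apostrophes": apostrophes, "slash_dates": slash_dates}


if __name__ == "__main__":
    sample = "Today\u2019s date is 2026/04/02."
    print(find_markers(sample))
```

A hit is only a hint, not proof: U+2019 in particular is the ordinary typographic apostrophe and appears in plenty of normally typeset text, and slash-formatted dates are common in many locales.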