Anthropic restored access to Fable 5 with new cybersecurity guardrails, raised API rate limits, and expanded Claude Code artifacts to Pro/Max plans; Fable is expected to return to subscriptions when capacity permits. The open model GLM-5.2 reportedly reaches ~80% of the software-engineering capability of Anthropic's Sonnet 5 at ~20% of the cost and is now usable within Claude Code via Hugging Face Inference Providers. In a landmark systems result, Elliot Arledge used Fable 5 to generate a single-launch megakernel for a Kimi-Linear decode workload, achieving an 18.7x speedup over the reference implementation and beating prior multi-kernel entries. The SWE-rebench leaderboard was updated with Claude Opus 4.8 xhigh at a 56.5% solve rate, GLM-5.2 at 51.1%, and smaller open models such as Qwen3.6-27B at 36.5%. Coding-agent infrastructure is filling out with full-stack evals (Code Arena Fullstack) and agent-native parsing patterns; coordination, memory, and observability are now the bottlenecks.
Anthropic re-enabled Claude Fable 5 after US export restrictions were lifted. Updated cybersecurity safeguards now route flagged requests to Opus 4.8, while the biology/chemistry classifiers remain broad; plan-based access lasts until July 7 and then shifts to usage credits, raising cost concerns. Claude Sonnet 5 launched as a more agentic model, but silently revised benchmark charts, along with user reports of high latency and poor cost-efficiency at high effort levels, damaged trust. In open models, Z.ai released the ZCode IDE for GLM-5.2, and GLM-5.2 became the first open model to top a SWE-bench category (55.3% Pass@1 on Integration); NVIDIA shipped a mixed-precision NVFP4-quantized Qwen3.6-27B, and Huawei open-sourced the OpenPangu-2.0-Flash MoE. Agent infrastructure advanced with LangChain OpenWiki for wiki-structured memory, Cognition's Devin Security Swarm using Agentic MapReduce for vulnerability triage, and SkillComposer for joint skill selection. Inference improvements included NVIDIA TwoTower's 2.42× faster generation with 98.7% quality retention and native speculative decoding in vLLM for DeepSeek models.
Anthropic released Claude Sonnet 5 as its new default mid-tier model, offering a 1M-token context window, improved coding benchmarks (57% on CursorBench vs 49% for Sonnet 4.6), and promotional pricing of $2 per million input tokens and $10 per million output tokens through August. Meanwhile, ASIC startup Etched disclosed that it has $800M in funding and $1B+ in customer contracts and is shipping its first inference racks this summer. In open-source news, Meituan published an open-weights 1.6T-parameter MoE model trained on domestic AI accelerators, and Huawei open-sourced OpenPangu-2.0-Flash (92B total parameters, 6B active).
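To put the promotional Sonnet 5 pricing in concrete terms, here is a minimal Python sketch that estimates the cost of a single request at the quoted rates. The function name and the example token counts are hypothetical, and the sketch assumes a flat per-token rate with no other pricing rules beyond the two figures stated above.

```python
# Promotional Sonnet 5 prices quoted above, in USD per million tokens.
INPUT_USD_PER_MILLION = 2.00
OUTPUT_USD_PER_MILLION = 10.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the promotional rates.

    Hypothetical helper: assumes flat per-token pricing and nothing else.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 200k-token prompt with a 4k-token reply.
    print(f"${request_cost_usd(200_000, 4_000):.2f}")
    # Filling the full 1M-token context window once, with no output.
    print(f"${request_cost_usd(1_000_000, 0):.2f}")
```

Under these assumptions, input tokens dominate the bill for long-context requests, while output tokens cost five times as much each.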
Meta released Brain2Qwerty v2, a non-invasive brain-to-text decoder achieving up to 78% word accuracy on its best participant and 61% on average across nine volunteers, with the training code and the v1 dataset made public. Cursor launched an iOS app featuring always-on cloud agents, remote control of desktop agents, and in-app PR diff review. Open-weight model access was productized: Cline introduced a $9.99/month pass bundling GLM-5.2, DeepSeek, Kimi, MiniMax, and Qwen, while Cognition's Devin Fusion harness claimed a 35% cost reduction for coding via hybrid-model orchestration. DeepSeek V4 support and DFlash speculative decoding were merged into llama.cpp, with community demonstrations of GLM-5.2 753B running at 16 tok/s across two Macs using llama.cpp RPC. Arena exceeded $100M ARR and 700M conversations and is pivoting toward post-deployment agent evaluation, and a Claude Code safety incident prompted calls for sandboxing AI agents.