In June 2026, the Carnegie Endowment for International Peace released a report by former U.S. Commerce advisor Alasdair Phillips-Robins calling for a U.S.-led "Compute Coalition" of allies, including the EU, the UK, Japan, South Korea, Canada, Australia, and India. The report finds that the U.S. hosts 73% of the world's advanced AI computing clusters, but that domestic grid constraints and political resistance limit further expansion, while China is rapidly closing the gap with massive state investment and 2.5 times the electricity generation of the U.S. It argues that no country can dominate on its own; a coalition would pool allies' electricity, semiconductor equipment, talent, and capital to outpace China. Key recommendations include prioritizing faster permitting and grid upgrades over subsidies, establishing fast-track investment channels among allies, and aligning AI transparency and testing standards. The proposal explicitly presents the coalition as a loose, interests-based network intended to build a physical compute barrier around China.
Google DeepMind published a paper titled "The Topological Trouble With Transformers," arguing that the Transformer architecture has a structural flaw in state tracking: as sequences grow, internal state updates are pushed into deeper network layers and become inaccessible to later processing steps. The paper illustrates the flaw with failures in a number-guessing game and in the "bank" ambiguity test, where models give contradictory answers despite having disambiguated the word earlier. Chain-of-thought prompting mitigates the problem by externalizing hidden state as visible text, but it is computationally expensive and does not fix the underlying architectural limitation. The authors advocate shifting focus toward recurrent architectures that explicitly pass state along the sequence dimension, such as Mamba, RWKV-7, and DeltaNet, and suggest future directions such as coarser-grained recurrence and staged training that moves from feedforward pretraining to recurrent fine-tuning.
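To make "passing state along the sequence dimension" concrete, here is a minimal toy sketch in Python. It is not taken from the paper and does not model Mamba, RWKV-7, DeltaNet, or a Transformer; the shell-game task and the name `track_ball` are hypothetical, chosen only to show one explicit state being updated step by step and handed to the next step.

```python
from typing import Iterable, List, Tuple

Swap = Tuple[int, int]  # a pair of cup positions that are exchanged


def track_ball(swaps: Iterable[Swap], ball: int = 0) -> List[int]:
    """Toy recurrent state tracker (hypothetical example, not the paper's method).

    A single explicit state, the ball's current cup, is carried from one
    step to the next, so every later step can read the latest state directly.
    """
    state = ball
    history: List[int] = []
    for a, b in swaps:
        # Recurrent update: the new state depends only on the previous
        # state and the current input (the swap).
        if state == a:
            state = b
        elif state == b:
            state = a
        history.append(state)
    return history


if __name__ == "__main__":
    # Ball starts under cup 0; three swaps follow.
    print(track_ball([(0, 1), (1, 2), (0, 2)]))  # [1, 2, 0]
```

Each step here does a constant amount of work no matter how long the sequence is, which is the property the recurrent designs named above are meant to provide.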
At the 2026 Zhiyuan Conference, 11 leading researchers from institutions such as BAAI, Skywork, and Tencent debated what constitutes a true world model, highlighting that current approaches (video generation, 3D reconstruction, and VLM-based methods) fail to capture physical causality or to incorporate multi-sensor input. They argued that existing benchmarks overemphasize visual fidelity while neglecting physical correctness and interactive prediction. Key bottlenecks include the lack of precise physical annotations, the absence of standardized evaluation for physical understanding, and an inability to model precise real-world interactions. Panelists advocated moving from next-token to next-state prediction, jointly training state and action models, and verifying results in closed-loop robotic experiments.