Source: HUXIU · July 2, 2026 · Importance: 5/5

OpenAI develops new inference optimization that cuts inference costs by more than half, borrowing from DeepSeek's approach

Summary

OpenAI has reportedly developed a new system-level optimization that cuts model inference costs by more than half, so that workloads that once required tens of thousands of GPUs can run on a few hundred. The optimization is believed to center on KV cache efficiency, an approach similar to the one DeepSeek took earlier with Multi-head Latent Attention (MLA) and caching discounts. The software push is complemented by hardware efforts: the Jalapeño inference chip co-designed with Broadcom, and a $10 billion-plus deal with Cerebras for wafer-scale inference compute. OpenAI's 2025 revenue reached $13.07 billion, but its operating loss hit $20.9 billion and its Microsoft cloud bill alone exceeded $17.2 billion, which makes deep cost cuts essential to its IPO, now postponed to 2027. API gross margin improved to 39% in Q1 2026, with a target of 52% by year-end.
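
To make the KV cache point concrete, here is a minimal sketch of why compressing the cache matters for serving cost. It compares the per-token cache footprint of standard multi-head attention with an MLA-style cache that stores one compressed latent plus a small positional key per layer. The model dimensions (`n_layers`, `n_heads`, `head_dim`, `latent_dim`, `rope_dim`) are hypothetical values chosen for illustration; nothing here describes OpenAI's unreported optimization or any production configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Illustrative transformer dimensions (assumed, not any vendor's real config)."""
    n_layers: int
    n_heads: int
    head_dim: int
    bytes_per_value: int = 2  # fp16 / bf16 storage


def mha_kv_bytes_per_token(shape: AttentionShape) -> int:
    """Standard multi-head attention caches one key and one value vector
    per head, per layer, for every token in the context."""
    return 2 * shape.n_layers * shape.n_heads * shape.head_dim * shape.bytes_per_value


def mla_kv_bytes_per_token(shape: AttentionShape, latent_dim: int, rope_dim: int) -> int:
    """MLA-style caching stores a single compressed latent vector plus a small
    decoupled positional key per layer, shared across all heads."""
    return shape.n_layers * (latent_dim + rope_dim) * shape.bytes_per_value


# Hypothetical dimensions, picked only to show the order of magnitude.
shape = AttentionShape(n_layers=60, n_heads=128, head_dim=128)
mha = mha_kv_bytes_per_token(shape)
mla = mla_kv_bytes_per_token(shape, latent_dim=512, rope_dim=64)

print(f"MHA cache:       {mha / 1024:.0f} KiB per token")
print(f"MLA-style cache: {mla / 1024:.1f} KiB per token")
print(f"Reduction:       {mha / mla:.1f}x")
```

Under these assumed dimensions the cache shrinks from 3840 KiB to 67.5 KiB per token, roughly a 57x reduction. A smaller cache lets each GPU hold more concurrent requests and longer contexts, which is the general mechanism by which KV cache efficiency can lower the number of GPUs a given load requires.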

Key points

OpenAI's new system optimization reportedly reduces inference GPU needs from tens of thousands to hundreds, cutting costs by more than 50%.
The optimization likely centers on KV cache efficiency, paralleling DeepSeek's MLA technology and hinting at future caching discounts.
Hardware initiatives include the Jalapeño inference chip co-designed with Broadcom and a $10B+ Cerebras deal for wafer-scale chips.
OpenAI faces severe financial pressure: 2025 revenue of $13.07B against a $20.9B operating loss, with cloud costs exceeding $17.2B, so cost reduction is critical.
API gross margin improved to 39% in Q1 2026, with a target of 52% by year-end; inference cost control is pivotal for the postponed IPO.
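
The margin figures above imply a specific cost target, which a short calculation makes explicit. This sketch assumes gross margin is defined as (revenue minus cost of revenue) divided by revenue, and that API prices stay constant, so the whole improvement must come from lower cost per dollar of revenue. Both assumptions are simplifications; the function name is illustrative.

```python
def cost_per_revenue_dollar(gross_margin: float) -> float:
    """Gross margin = (revenue - cost of revenue) / revenue,
    so the cost share of each revenue dollar is 1 - margin."""
    return 1.0 - gross_margin


current = cost_per_revenue_dollar(0.39)  # Q1 2026 API gross margin from the summary
target = cost_per_revenue_dollar(0.52)   # year-end target from the summary

# Fractional cost reduction needed if prices do not change (assumption).
required_cut = 1.0 - target / current

print(f"Cost per revenue dollar now:       ${current:.2f}")
print(f"Cost per revenue dollar at target: ${target:.2f}")
print(f"Required cost reduction:           {required_cut:.1%}")
```

On these assumptions, cost per revenue dollar has to fall from $0.61 to $0.48, a reduction of about 21.3%. If prices drop, for example through caching discounts, the required cost reduction would be larger.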
