Anthropic and OpenAI Simultaneously Enter AI for Science Arena: Claude Science Workbench and GeneBench-Pro Benchmark Released
English summary
On June 30, Anthropic released Claude Science, an agent workbench that handles scientific research workflows by orchestrating existing models through toolchains rather than relying on a new model. The same day, OpenAI introduced GeneBench-Pro, a benchmark of 129 real-world research workflow tasks spanning 10 fields, including genomics and quantitative biology. In tests, the strongest model, GPT-5.6 Sol, achieved an end-to-end pass rate of only 28.7%, while Claude Opus 4.8 reached 16.0%, revealing a "notice-act gap": models spot issues but fail to adjust their subsequent decisions. Anthropic’s workbench uses the Model Context Protocol (MCP) to call external vertical models, connects to 60+ scientific databases, and is available to Pro, Max, Team, and Enterprise subscribers; it is complemented by a $30,000 grant program aimed at postdocs and graduate students. Both moves highlight a shift in AI for Science from competing on model capability alone to ecosystem positioning, tool integration, and workflow ownership.
Chinese summary
6月30日,Anthropic发布Claude Science科研智能体工作台,通过工具链整合现有模型处理科研全流程,不依赖新模型;同日OpenAI推出GeneBench-Pro评测基准,覆盖基因组学等10个领域共129道真实科研工作流题目,最强模型GPT-5.6 Sol端到端通过率仅28.7%,Claude Opus 4.8为16.0%,揭示模型注意问题却无法修正行动的“notice-act gap”。Anthropic工作台通过MCP协议调用外部垂直模型,连接60余个科学数据库,向Pro、Max、Team、Enterprise订阅用户开放,并推出3万美元资助计划以锁定博士后和研究生等青年科研用户。两大巨头发力,标志AI4S赛道从模型能力比拼转向工作流整合与生态卡位的混战。
Key points
Anthropic launched Claude Science, an agent workbench that coordinates existing models via toolchains for scientific research, without relying on new model releases.
Anthropic发布Claude Science工作台,通过工具链协调现有模型完成科研任务,不依赖新模型。
OpenAI released GeneBench-Pro, a benchmark with 129 tasks across 10 life science fields, on which the top model achieved only a 28.7% end-to-end pass rate, exposing a critical "notice-act gap".
OpenAI发布GeneBench-Pro评测基准,129道题目覆盖10个生命科学领域,最强模型端到端通过率仅28.7%,暴露“注意到-行动”的差距。
Claude Science uses the Model Context Protocol (MCP) to connect to more than 60 scientific databases and external vertical models, and is offered through subscription plans alongside a $30,000 grant program for academic users.
Claude Science采用MCP协议连接60余个科学数据库和外部垂直模型,提供订阅服务及3万美元学术资助计划,目标锁定青年科研群体。
Both developments signal a strategic shift in AI for Science (AI4S) from pure model capability to ecosystem integration, toolchain completeness, and workflow ownership.
两者共同标志着AI4S赛道从单纯比拼模型能力转向生态整合、工具链完整性与工作流掌控权的卡位战。