Together AI's DeepSeek V4 Pro ranks first on Artificial Analysis for both output speed and latency
Summary
Together AI has optimized its serving of DeepSeek V4 Pro and now ranks #1 on the Artificial Analysis benchmark for both output speed (tokens per second) and latency. The inference work covered KV cache efficiency, prefix reuse, custom kernel implementations, and endpoint profiling. By those measurements, Together AI's endpoint is currently the fastest DeepSeek V4 Pro API available to developers. The company published a detailed breakdown of the systems work in a linked blog post.
Key takeaways
Ranks first on the Artificial Analysis leaderboard for both output speed and latency.
Inference optimizations cover KV cache efficiency, prefix reuse, custom kernels, and endpoint profiling.
By these measurements, Together AI currently offers the fastest DeepSeek V4 Pro API endpoint.
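For readers unfamiliar with the two metrics named here, the following is a minimal sketch of how output speed and latency could be computed from the arrival times of streamed tokens. It assumes that latency means time to first token and that output speed is counted only after the first token arrives. The `stream_metrics` function, the `StreamMetrics` type, and the sample timestamps are hypothetical illustrations, not code or data from Artificial Analysis or Together AI.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    latency_s: float            # seconds from request sent to first token received
    output_tokens_per_s: float  # tokens per second after the first token


def stream_metrics(request_time: float, token_times: Sequence[float]) -> StreamMetrics:
    """Compute latency and output speed from token arrival timestamps (seconds).

    Assumption: latency is time to first token, and output speed excludes
    the wait for the first token. Real benchmarks may define these differently.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    first, last = token_times[0], token_times[-1]
    if first < request_time or last <= first:
        raise ValueError("timestamps must be increasing and start after the request")
    latency = first - request_time
    # Tokens generated after the first one, divided by the time they took.
    speed = (len(token_times) - 1) / (last - first)
    return StreamMetrics(latency_s=latency, output_tokens_per_s=speed)


# Hypothetical timestamps: request at t=0.0, five tokens arriving 0.25 s apart.
metrics = stream_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(metrics.latency_s, metrics.output_tokens_per_s)
```

Separating the two numbers matters because an endpoint can start responding quickly yet stream slowly, or the reverse, which is why the leaderboard reports them as distinct rankings.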