llama.cpp Release b9864: SSE Pings Prevent Connection Drops During Slow Prefill
English summary
llama.cpp release b9864 addresses a server-side issue where healthy client connections could be dropped during slow prompt prefill. The server and WebUI now ping silent SSE streams every 1 second and only kick a client after 3 seconds of inactivity, ensuring long-running prefill phases do not trigger timeouts. The SSE ping interval is exposed as a per-request field (`sse_ping_interval`) in the WebUI request body (set to 1 second), while the global default remains 30 seconds for API clients, preserving backward compatibility. The server implementation moves the parameter into the request schema with proper type and range validation. Pre-built binaries are provided for macOS, Linux, Windows, and Android across multiple backends.
Chinese summary
llama.cpp b9864 版本修复了一个服务端问题:慢速提示预填充期间正常的客户端连接可能被断开。服务器和 WebUI 现在每 1 秒对静默 SSE 流发送心跳 ping,仅在 3 秒无响应后才断开连接,确保长预填充阶段不会触发超时。SSE ping 间隔通过请求体字段(`sse_ping_interval`)暴露,WebUI 请求中设为 1 秒,而 API 客户端的全局默认值保持 30 秒以维持兼容性。服务端实现将该参数移入请求模式并进行类型和范围校验。此版本提供了 macOS、Linux、Windows 和 Android 的预编译二进制文件,涵盖多种后端。
Key points
Server and UI now ping silent SSE streams every 1s and disconnect only after 3s, preventing healthy connections from being dropped during slow prompt prefill.
服务器和 UI 现每 1 秒对静默 SSE 流发送心跳,仅 3 秒无响应后断开,避免慢速提示预填充时正常连接中断。
The `sse_ping_interval` is now a per-request body field; the WebUI sends `sse_ping_interval: 1`, while the global default of 30s remains for other API clients.
`sse_ping_interval` 现为请求体字段;WebUI 发送值为 1,而其他 API 客户端沿用全局默认的 30 秒。
The parameter is refactored into the request schema with type and range validation, replacing raw JSON parsing.
该参数已重构进请求模式,具备类型和范围验证,取代了原先的原始 JSON 解析。
Pre-built binaries support macOS, Linux (CPU, Vulkan, ROCm, OpenVINO, SYCL), Windows (CPU, CUDA, Vulkan, OpenVINO, SYCL, HIP), and Android.
预编译二进制文件支持 macOS、Linux(CPU、Vulkan、ROCm、OpenVINO、SYCL)、Windows(CPU、CUDA、Vulkan、OpenVINO、SYCL、HIP)和 Android。