llama.cpp b9864: 服务器SSE心跳逻辑修复,避免慢速预填充时WebUI连接断开
英文摘要
The b9864 release of llama.cpp patches the server’s SSE streaming to ping silent connections every second and only disconnect after 3 seconds of inactivity, so healthy WebUI connections are not dropped during long prefill operations. A new per-request field `sse_ping_interval` is introduced into the request schema; its global default remains 30 seconds to keep API clients unchanged, while the built-in WebUI sends a value of 1 to implement its own 3-second visibility contract. The field is now a typed parameter with hard limits, seeded from the CLI default, and benefits from automatic schema type and range validation. Prebuilt binaries for macOS, Linux, Windows, Android, and iOS are included.
中文摘要
llama.cpp 的 b9864 版本修补了服务器 SSE 流处理逻辑:现在每隔 1 秒发送一次心跳,仅在 3 秒无活动后才断开连接,从而防止在长预填充期间丢弃健康的 WebUI 连接。请求模式新增了按请求可设的 `sse_ping_interval` 字段;全局默认值保持 30 秒,确保 API 客户端行为不变,而内置 WebUI 发送值 1 以实现其自身的 3 秒可见性约定。该字段现为带硬限制的类型化参数,从 CLI 默认值继承,并享有模式自动进行的类型与范围校验。发布包含面向 macOS、Linux、Windows、Android 与 iOS 的预编译二进制文件。
关键要点
SSE streams now ping every 1 second and disconnect only after 3 seconds of silence, preventing healthy connections from being dropped during slow prefill.
SSE 流现在每隔 1 秒发送心跳,仅在 3 秒静默后断开,防止在慢速预填充过程中丢弃健康的连接。
New per-request field `sse_ping_interval` defaults to 30 seconds for API compatibility; the WebUI sends 1 to enforce its own 3-second visibility contract.
新增的按请求字段 `sse_ping_interval` 默认 30 秒以保持 API 兼容性;WebUI 发送 1 以实现其 3 秒可见性约定。
The field is now a typed parameter with hard limits and benefits from automatic schema validation for type and range.
该字段现为带硬限制的类型化参数,并享受自动模式校验提供的类型与范围检查。