SGLang Adds LingBot Realtime Prompt, KV Window, and Lazy VAE Controls for Diffusion Models
Summary
A pull request to the SGLang project introduces three optional additions for LingBot diffusion models: composite realtime input handling and two environment-gated controls that are disabled by default. Realtime support now accepts a prompt and a camera action together in a single event and resets the cross-attention cache only when the prompt changes. The interactive KV window control is enabled through the SGLANG_LINGBOT_ENABLE_INTERACTIVE_KV_WINDOW environment variable; it dynamically adjusts sampling windows for static versus moving camera-control chunks to improve motion continuity. The lazy VAE encode control is configured through SGLANG_LINGBOT_LAZY_VAE_ENCODE_BLACK_FRAMES; it encodes only the initial image plus a configurable number of black padding frames, then extends the latent condition so that long padding tails are not encoded redundantly. The pull request also includes accuracy tests for static-hover and forward-moving dragon scenarios.
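The realtime behavior described in the summary, where one event may carry both a prompt and a camera action and the cross-attention cache is reset only on a prompt change, can be illustrated with a minimal sketch. The class names, field names, and cache representation below are hypothetical and are not taken from the SGLang code base; the sketch only shows the reset condition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RealtimeEvent:
    """Hypothetical event shape: either field may be omitted."""
    prompt: Optional[str] = None
    camera_action: Optional[str] = None


@dataclass
class RealtimeSession:
    """Toy session state; not SGLang's actual class."""
    prompt: str = ""
    cross_attn_cache: dict = field(default_factory=dict)
    cache_resets: int = 0
    actions: list = field(default_factory=list)

    def handle(self, event: RealtimeEvent) -> None:
        # Reset the cross-attention cache only when the prompt text changes.
        if event.prompt is not None and event.prompt != self.prompt:
            self.prompt = event.prompt
            self.cross_attn_cache.clear()
            self.cache_resets += 1
        # A camera action alone never invalidates the prompt-conditioned cache.
        if event.camera_action is not None:
            self.actions.append(event.camera_action)


session = RealtimeSession()
session.handle(RealtimeEvent(prompt="a dragon hovering", camera_action="static"))
session.handle(RealtimeEvent(prompt="a dragon hovering", camera_action="forward"))
session.handle(RealtimeEvent(camera_action="forward"))
session.handle(RealtimeEvent(prompt="a dragon flying forward", camera_action="forward"))
print(session.cache_resets)  # prints 2: only two events changed the prompt
```

Comparing the incoming prompt with the stored one keeps repeated or action-only events cheap, since the prompt-conditioned cache stays valid across them.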
Key points
Realtime prompt handling accepts composite prompt and camera-action inputs in a single event and resets the cross-attention cache only when the prompt changes.
The optional interactive KV window control (env: SGLANG_LINGBOT_ENABLE_INTERACTIVE_KV_WINDOW) dynamically adjusts sampling windows for static versus moving camera-control chunks to improve motion continuity.
Lazy VAE encode (env: SGLANG_LINGBOT_LAZY_VAE_ENCODE_BLACK_FRAMES) encodes only the initial image plus a configurable number of black padding frames, then extends the latent condition to avoid redundant encoding of long padding tails.
Both environment-gated controls (the interactive KV window and lazy VAE encode) are disabled by default to preserve existing behavior, and the pull request includes accuracy tests for static-hover and forward-moving scenarios.
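The two environment-gated controls in the key points can be sketched as follows. Only the two environment variable names come from the pull request summary. Everything else is an assumption made for illustration: the accepted flag values, the window sizes, the reading of the lazy VAE variable as a frame count, the one-latent-per-frame simplification, and extending the condition by repeating the last encoded latent.

```python
import os

KV_FLAG = "SGLANG_LINGBOT_ENABLE_INTERACTIVE_KV_WINDOW"
LAZY_FLAG = "SGLANG_LINGBOT_LAZY_VAE_ENCODE_BLACK_FRAMES"

# Hypothetical window sizes; the real values are not given in the summary.
DEFAULT_WINDOW = 8
STATIC_WINDOW = 8
MOVING_WINDOW = 4


def kv_window(chunk_is_static: bool, env=os.environ) -> int:
    """Pick a sampling window; keep the default when the flag is unset."""
    # Assumption: "1" or "true" enables the control.
    if env.get(KV_FLAG, "0").lower() not in ("1", "true"):
        return DEFAULT_WINDOW
    return STATIC_WINDOW if chunk_is_static else MOVING_WINDOW


def frames_to_encode(total_frames: int, env=os.environ) -> int:
    """Frames passed to the VAE encoder; unset flag means encode everything."""
    raw = env.get(LAZY_FLAG, "")
    if not raw:
        return total_frames
    black_frames = max(0, int(raw))
    # Initial image plus the configured number of black padding frames.
    return min(total_frames, 1 + black_frames)


def extend_latents(encoded: list, target_len: int) -> list:
    """Pad the latent condition by repeating the last encoded latent."""
    if not encoded or len(encoded) >= target_len:
        return encoded[:target_len]
    return encoded + [encoded[-1]] * (target_len - len(encoded))


env = {KV_FLAG: "1", LAZY_FLAG: "4"}
print(kv_window(True, env), kv_window(False, env))  # prints: 8 4

n = frames_to_encode(81, env)
latents = extend_latents([f"z{i}" for i in range(n)], 81)
print(n, len(latents))  # prints: 5 81
```

With neither variable set, both helpers fall back to the pre-existing path (default window, full encode), which mirrors the stated goal of preserving existing behavior by default.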