
KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks | thinkgap

Social | Source: TELEGRAM HUGGINGFACEPAPERS | June 4, 2026 | Importance: 4/5

English summary

KVarN is a calibration-free KV-cache quantizer that mitigates error accumulation during autoregressive decoding in large language models. It applies a Hadamard rotation and dual-scaling variance normalization to the K and V matrices to correct token-scale errors, which significantly reduces error accumulation compared to existing methods. Evaluated on Qwen2.5-Coder-32B-Instruct at 2-bit precision, KVarN achieves improved results on generative benchmarks including MATH500, AIME24, and HumanEval. The vLLM implementation is open-sourced on GitHub.

Key points

KVarN is a calibration-free KV-cache quantizer that addresses error accumulation during autoregressive decoding.
It applies a Hadamard rotation and dual-scaling variance normalization to the K and V matrices to correct token-scale errors.
KVarN significantly reduces error accumulation compared to existing baselines, a result presented as a new standard for KV-cache quantization.
At 2-bit precision on Qwen2.5-Coder-32B-Instruct, it shows improved results across MATH500, AIME24, and HumanEval.
An open-source implementation for vLLM is available on GitHub.
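
The pipeline named in the key points (rotate, normalize, quantize to 2 bits) can be illustrated with a small dependency-free sketch. This is not the KVarN implementation or its vLLM code. The summary does not define "dual-scaling", so the sketch assumes one possible reading: a per-token scale followed by a per-channel scale, each computed as a root-mean-square value standing in for the standard deviation. The function names (hadamard_rotate, quantize_2bit, encode, decode) and the min/max uniform quantizer are hypothetical choices made for illustration only.

```python
import math
import random


def hadamard_rotate(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).

    The normalized transform is its own inverse, so the same function
    is used to undo the rotation.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def rms(values):
    """Root-mean-square scale; falls back to 1.0 for an all-zero input."""
    return math.sqrt(sum(v * v for v in values) / len(values)) or 1.0


def quantize_2bit(vec):
    """Uniform 2-bit quantization (4 levels) over the vector's min/max range."""
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / 3 or 1.0
    codes = [min(3, max(0, round((x - lo) / step))) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]


def encode(matrix):
    """Rotate each token row, apply the two assumed scalings, quantize per row."""
    rotated = [hadamard_rotate(row) for row in matrix]
    row_scale = [rms(row) for row in rotated]            # assumed per-token scale
    scaled = [[x / s for x in row] for row, s in zip(rotated, row_scale)]
    col_scale = [rms(col) for col in zip(*scaled)]       # assumed per-channel scale
    scaled = [[x / c for x, c in zip(row, col_scale)] for row in scaled]
    packed = [quantize_2bit(row) for row in scaled]
    return packed, row_scale, col_scale


def decode(packed, row_scale, col_scale):
    """Invert encode: dequantize, undo both scalings, undo the rotation."""
    out = []
    for (codes, lo, step), r in zip(packed, row_scale):
        row = dequantize(codes, lo, step)
        row = [x * c * r for x, c in zip(row, col_scale)]
        out.append(hadamard_rotate(row))
    return out


# Toy usage: 4 tokens with head dimension 8, drawn from a seeded generator.
rng = random.Random(0)
keys = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
packed, row_scale, col_scale = encode(keys)
restored = decode(packed, row_scale, col_scale)
```

The per-channel scale here is computed over all tokens at once, which a streaming decoder would not have available. How KVarN handles that, and how its scales are actually defined, should be taken from the paper and the open-sourced code rather than from this sketch.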
