Spectral Labs releases calibration-aware Q4_K_M quant of Qwen3.5 0.8B, recovering 96.5% of the gap between pure llama.cpp Q4_K_M and BF16
English summary
Spectral Labs introduced SpectralQuant, a calibration-aware quantization method that identifies behaviorally sensitive weight directions and shapes quantization error to protect the most important weights. They released a Qwen3.5 0.8B Q4_K_M GGUF at exactly 4.52 BPW (415.7 MiB), the same footprint as pure llama.cpp Q4_K_M, with no modules kept in floating point and no dynamic formats. On the heldout120 evaluation set, SpectralQuant reached a prompt loss of 2.9961, versus 3.4135 for standard pure llama.cpp Q4_K_M and 2.9809 for the BF16 reference, recovering 96.5% of the gap between pure Q4_K_M and BF16. It also outperformed Unsloth's Q4_K_S, Q4_K_M, IQ4_NL and IQ4_XS quants on heldout120 while using fewer bytes (those Unsloth quants range from 5.11 to 5.52 BPW). On C4 validation, Unsloth's Q4_K_M was slightly better but used about 92 MB more. The model is a standard GGUF compatible with llama.cpp's llama-cli and llama-server.
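The summary does not spell out how SpectralQuant finds or protects sensitive directions. As a generic illustration of what calibration-aware error shaping can look like, here is a minimal sketch of a swapped-in, much simpler technique: importance-weighted scale search for a single symmetric 4-bit block, similar in spirit to how llama.cpp can weight quantization error with an importance matrix. The function names and the per-weight importance values are hypothetical. This is neither SpectralQuant nor the real Q4_K_M layout, which uses super-blocks with quantized sub-block scales and minimums.

```python
"""Minimal sketch of importance-weighted 4-bit block quantization.

Assumption: a generic, simplified illustration (symmetric int4, one scale
per block, brute-force scale search). It is NOT SpectralQuant's method and
NOT the actual Q4_K_M layout.
"""
from typing import List, Tuple


def quantize_block(w: List[float], scale: float) -> List[int]:
    """Round each weight to the nearest int4 level in [-8, 7] for a given scale."""
    return [max(-8, min(7, round(x / scale))) for x in w]


def weighted_error(w: List[float], q: List[int], scale: float, imp: List[float]) -> float:
    """Sum of importance-weighted squared reconstruction errors."""
    return sum(a * (x - scale * qi) ** 2 for x, qi, a in zip(w, q, imp))


def search_scale(w: List[float], imp: List[float], steps: int = 64) -> Tuple[float, List[int]]:
    """Try candidate scales around max|w|/7 and keep the lowest weighted error.

    imp[i] is a hypothetical per-weight importance, for example the mean
    squared activation feeding weight i, measured on calibration data.
    """
    amax = max(abs(x) for x in w)
    if amax == 0.0:
        # All-zero block: any scale works; return a neutral one.
        return 1.0, [0] * len(w)
    best_scale, best_q, best_err = 0.0, [], float("inf")
    for k in range(steps):
        # Sweep scales from 0.5x to about 1.5x of the naive max-abs scale.
        scale = (amax / 7.0) * (0.5 + k / steps)
        q = quantize_block(w, scale)
        err = weighted_error(w, q, scale, imp)
        if err < best_err:
            best_scale, best_q, best_err = scale, q, err
    return best_scale, best_q
```

Calling `search_scale` once with uniform importance and once with a large importance on one weight shows the idea: both calls sweep the same candidate scales, so the importance-aware choice can never have a higher importance-weighted error than the uniform one, and it may trade some accuracy on low-importance weights for less error where it matters.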
Key points
SpectralQuant identifies and protects behaviorally sensitive weight directions, recovering 96.5% of the BF16 gap for Qwen3.5 0.8B at the same 4.52 BPW footprint as pure llama.cpp Q4_K_M.
On heldout120, SpectralQuant Q4_K_M reached a prompt loss of 2.9961 vs 3.4135 for pure llama.cpp Q4_K_M (BF16 reference: 2.9809).
SpectralQuant Q4_K_M beat Unsloth's higher-bit quants (Q4_K_S, Q4_K_M, IQ4_NL, IQ4_XS) on heldout120 while using fewer bytes.
The release is a standard GGUF with no mixed precision or dynamic sidecars, directly compatible with llama.cpp.
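The "96.5% of the BF16 gap" figure can be checked from the three heldout120 losses reported in this release. Assuming the usual definition (the share of the loss gap between the baseline quant and BF16 that the candidate closes), a minimal check looks like this; `gap_recovered` is a hypothetical helper name.

```python
def gap_recovered(bf16: float, baseline: float, candidate: float) -> float:
    """Fraction of the baseline-to-BF16 loss gap closed by the candidate quant.

    Lower loss is better: 1.0 means the candidate matches BF16,
    0.0 means it is no better than the baseline.
    """
    gap = baseline - bf16
    if gap <= 0:
        raise ValueError("baseline must have higher loss than the BF16 reference")
    return (baseline - candidate) / gap


# heldout120 prompt losses reported in the release
BF16 = 2.9809
PURE_Q4_K_M = 3.4135
SPECTRAL_Q4_K_M = 2.9961

frac = gap_recovered(BF16, PURE_Q4_K_M, SPECTRAL_Q4_K_M)
print(f"{frac:.1%}")  # prints 96.5%
```

Put differently, SpectralQuant's loss sits only about 0.015 above BF16, while pure Q4_K_M sits about 0.433 above it.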