Social | Source: r/LocalLLaMA | June 27, 2026 | Importance: 4/5

Spectral Labs releases calibration-aware Q4_K_M quant of Qwen3.5 0.8B, recovers 96.5% of BF16 gap vs pure llama.cpp Q4_K_M

English summary

Spectral Labs introduced SpectralQuant, a calibration-aware quantization method that identifies behaviorally sensitive weight directions and shapes the quantization error to protect the weights that matter most. The release is a Qwen3.5 0.8B Q4_K_M GGUF at exactly 4.52 bits per weight (BPW), 415.7 MiB, with no modules kept in floating point and no dynamic formats. On the heldout120 evaluation set, SpectralQuant reached a prompt loss of 2.9961 against 3.4135 for standard llama.cpp pure Q4_K_M; measured against the BF16 reference of 2.9809, that closes 96.5% of the gap between pure Q4_K_M and BF16. It also beat Unsloth's Q4_K_S, Q4_K_M, IQ4_NL, and IQ4_XS quants on heldout120 while using fewer bytes (those Unsloth quants range from 5.11 to 5.52 BPW). On C4 validation, Unsloth's Q4_K_M was slightly better but about 92 MB larger. The file is a standard GGUF that works with llama.cpp's llama-cli and llama-server.
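The 96.5% figure can be reproduced from the reported losses. The sketch below assumes "gap recovered" means the share of the loss difference between pure Q4_K_M and BF16 that the new quant eliminates; the post does not spell out the definition, but this one matches the stated number. The BF16 reference loss (2.9809) comes from the key points, and the function name is hypothetical.

```python
def gap_recovered(baseline_loss, candidate_loss, reference_loss):
    """Fraction of the baseline-to-reference loss gap closed by the candidate.

    Assumed definition: (baseline - candidate) / (baseline - reference).
    """
    gap = baseline_loss - reference_loss
    if gap <= 0:
        raise ValueError("baseline loss must be higher than the reference loss")
    return (baseline_loss - candidate_loss) / gap


# heldout120 prompt losses as reported in the post
bf16_loss = 2.9809          # BF16 reference
pure_q4km_loss = 3.4135     # standard llama.cpp pure Q4_K_M
spectral_loss = 2.9961      # SpectralQuant Q4_K_M

recovered = gap_recovered(pure_q4km_loss, spectral_loss, bf16_loss)
print(f"gap recovered: {recovered:.1%}")  # prints "gap recovered: 96.5%"
```

In absolute terms, SpectralQuant sits 0.0152 above the BF16 loss, while pure Q4_K_M sits 0.4326 above it.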

Key points

SpectralQuant identifies and protects behaviorally sensitive weight directions, recovering 96.5% of the BF16 gap for Qwen3.5 0.8B at the same 4.52 BPW Q4_K_M footprint.
On heldout120, SpectralQuant Q4_K_M achieved a prompt loss of 2.9961, against 3.4135 for pure llama.cpp Q4_K_M (BF16 reference: 2.9809).
SpectralQuant Q4_K_M beat Unsloth's higher-bitrate quants (Q4_K_S, Q4_K_M, IQ4_NL, IQ4_XS, at 5.11 to 5.52 BPW) on heldout120 while using fewer bytes.
The release is a standard GGUF with no mixed precision or dynamic sidecars, directly compatible with llama.cpp.
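The footprint figures can be sanity-checked with simple arithmetic. This is a rough sketch under two assumptions that the post does not state: that BPW is total file bits divided by total weight count (ignoring GGUF metadata overhead), and that the Unsloth quants cover the same number of weights. The helper names are hypothetical.

```python
MIB = 1024 ** 2  # bytes per MiB


def implied_weight_count(size_mib, bpw):
    """Weight count implied by a file size and a bits-per-weight figure.

    Assumes the whole file is weight data (metadata overhead ignored).
    """
    return size_mib * MIB * 8 / bpw


def size_mib_at(n_weights, bpw):
    """File size in MiB for a given weight count and bits-per-weight."""
    return n_weights * bpw / 8 / MIB


# Reported SpectralQuant release: 415.7 MiB at 4.52 BPW
n_weights = implied_weight_count(415.7, 4.52)

# Assumption: the Unsloth quants (5.11 to 5.52 BPW) hold the same weights
low_mib = size_mib_at(n_weights, 5.11)
high_mib = size_mib_at(n_weights, 5.52)

print(f"implied weights: {n_weights / 1e9:.2f}B")
print(f"estimated size at 5.11 to 5.52 BPW: {low_mib:.0f} to {high_mib:.0f} MiB")
```

Under these assumptions the release implies roughly 0.77 billion weights, in line with the 0.8B model name, and the higher-bitrate quants would land at roughly 470 to 508 MiB. These are estimates from the BPW range, not file sizes reported in the post.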
