Clark Labs Releases Ternary Sana 1.6B Text-to-Image Transformer, 8.6× Smaller with Near-FP16 Quality
English summary
Clark Labs has quantized the Sana 1.6B text-to-image transformer to ternary weights (~1.85 bits per weight), cutting its size 8.6×, from 3.21 GB (FP16) to 374 MB, while retaining near-FP16 image quality. The model uses group-wise scales and keeps a small high-precision tail (~5% of parameters, covering the conditioning and projection layers) to preserve important details. The packed ternary weights ship alongside an unpacked bf16 version that works as a drop-in replacement in diffusers. Released under the Apache-2.0 license, the compressed model is aimed at efficient local deployment of Sana 1.6B on resource-constrained hardware.
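To make "ternary weights with group-wise scales" concrete, here is a minimal Python sketch. It is not Clark Labs' method: the announcement does not say how the scales are chosen, so the absmean rule, the group size, and the function names below are all assumptions made for illustration.

```python
from typing import List, Tuple

Group = Tuple[List[int], float]  # (ternary codes, shared scale)


def quantize_group(weights: List[float]) -> Group:
    """Map one group of weights to codes in {-1, 0, +1} plus one scale."""
    # Assumed scale rule: mean absolute value of the group (absmean).
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round w / scale to the nearest integer, then clamp to the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def quantize(weights: List[float], group_size: int) -> List[Group]:
    """Split a flat weight list into groups and quantize each one separately."""
    return [
        quantize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups: List[Group]) -> List[float]:
    """Rebuild approximate weights: every code is multiplied by its group scale."""
    return [code * scale for codes, scale in groups for code in codes]


if __name__ == "__main__":
    w = [0.9, -0.1, 0.05, -1.1, 0.02, 0.4, -0.5, 0.01]
    for codes, scale in quantize(w, group_size=4):
        print(codes, scale)
```

Each group stores only its ternary codes and one scale, which is why the per-weight cost lands a little above the log2(3) ≈ 1.58 bits of a pure ternary code.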
Key points
Sana 1.6B text-to-image transformer quantized to ternary weights at ~1.85 bits per weight.
Packed model is 8.6× smaller: 374 MB vs. 3.21 GB in FP16, with near-FP16 quality.
Uses group-wise scales and a ~5% high-precision tail covering the conditioning and projection layers.
Unpacked bf16 version provided as a drop-in replacement in diffusers; Apache-2.0 license.
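The headline figures can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and 2 bytes per FP16 parameter; the function names are illustrative, and the announcement does not state how the high-precision tail is counted in its ~1.85 bits figure.

```python
FP16_BYTES = 3.21e9    # reported FP16 size, assuming 1 GB = 1e9 bytes
PACKED_BYTES = 374e6   # reported packed size, assuming 1 MB = 1e6 bytes


def size_ratio(original_bytes: float, packed_bytes: float) -> float:
    """How many times smaller the packed file is than the original."""
    return original_bytes / packed_bytes


def average_bits_per_weight(original_bytes: float, packed_bytes: float) -> float:
    """Average bits stored per parameter in the packed file."""
    params = original_bytes / 2  # FP16 stores each parameter in 2 bytes
    return packed_bytes * 8 / params


if __name__ == "__main__":
    print(f"size ratio: {size_ratio(FP16_BYTES, PACKED_BYTES):.2f}x")
    print(f"avg bits/weight: {average_bits_per_weight(FP16_BYTES, PACKED_BYTES):.2f}")
```

Under these assumptions the ratio rounds to the reported 8.6×, and the file-wide average comes out near 1.86 bits per parameter, in line with the quoted ~1.85.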