OrbitQuant: Data-Agnostic Quantization for Image and Video Diffusion Transformers
English summary
OrbitQuant is a data-agnostic post-training quantization (PTQ) method for image and video diffusion transformers (DiTs). It quantizes weights and activations in a normalized, rotated basis: a randomized permuted block-Hadamard (RPBH) rotation concentrates the coordinate distributions, so a single Lloyd-Max codebook covers all timesteps, prompts, and layers. The rotation is absorbed into the weights offline, leaving only a forward activation rotation at runtime, and no per-modality tuning is required. Across FLUX.1, Z-Image-Turbo, Wan 2.1, and CogVideoX, OrbitQuant achieves state-of-the-art PTQ results at low-bit settings, including usable generation quality at W2A4 (2-bit weights, 4-bit activations) for image diffusion transformers.
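The offline absorption of the rotation can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the helper names `hadamard` and `rpbh_matrix`, the use of random sign flips, the block size, and the toy layer shapes are all assumptions made for illustration, since this summary does not give the exact RPBH construction. It shows that folding an orthogonal rotation `R` into the weight leaves the layer output unchanged, and that the rotation spreads an outlier channel across many coordinates.

```python
import numpy as np

def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def rpbh_matrix(dim, block, rng):
    """Hypothetical RPBH rotation: signed random permutation, then block-diagonal Hadamard."""
    assert dim % block == 0
    perm = rng.permutation(dim)
    signs = rng.choice([-1.0, 1.0], size=dim)   # assumption: random sign flips
    P = np.eye(dim)[perm] * signs               # signed permutation matrix
    B = np.kron(np.eye(dim // block), hadamard(block))  # block-diagonal Hadamard
    return B @ P                                # product of orthogonal matrices

rng = np.random.default_rng(0)
d_in, d_out, block = 64, 32, 16
R = rpbh_matrix(d_in, block, rng)

W = rng.standard_normal((d_out, d_in))   # stand-in for a linear layer's weight
x = rng.standard_normal(d_in)            # stand-in for one activation vector
x[3] = 50.0                              # inject an outlier channel

W_rot = W @ R.T      # offline: rotation folded into the weight
x_rot = R @ x        # runtime: the only extra operation

y_ref = W @ x
y_rot = W_rot @ x_rot   # W @ R.T @ R @ x, which equals W @ x because R is orthogonal

# Largest coordinate relative to the vector norm, before and after rotation.
peak_before = np.abs(x).max() / np.linalg.norm(x)
peak_after = np.abs(x_rot).max() / np.linalg.norm(x_rot)
print(np.allclose(y_ref, y_rot), peak_before, peak_after)
```

The dense matrix is used here only for clarity; a practical kernel would more likely apply the block-Hadamard step with a fast transform.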
Key points
OrbitQuant eliminates data dependency by quantizing in a normalized basis rotated with a randomized permuted block-Hadamard (RPBH) transform, so one codebook works across all timesteps and prompts.
It enables unified weight-activation quantization for both image and video DiTs without per-modality calibration.
It sets a new state of the art for PTQ on FLUX.1, Z-Image-Turbo, Wan 2.1, and CogVideoX at low-bit settings, and is the first to achieve usable W2A4 quality for image DiTs.
The rotation is absorbed into weights offline, so runtime cost is just a single forward activation rotation per layer.
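The single-codebook idea in the key points can be illustrated with a small data-free experiment. This is a sketch under stated assumptions rather than the paper's procedure: it assumes that normalizing a vector by its norm and rotating it yields coordinates close to a standard Gaussian (after scaling by the square root of the dimension), so the Lloyd-Max codebook is fit once to synthetic Gaussian samples and then reused. The helpers `rotation`, `lloyd_max`, `quantize`, and `roundtrip`, as well as the dimensions and the 4-bit setting, are hypothetical choices for illustration.

```python
import numpy as np

def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def rotation(dim, block, rng):
    """Hypothetical rotation: signed random permutation, then block-diagonal Hadamard."""
    P = np.eye(dim)[rng.permutation(dim)] * rng.choice([-1.0, 1.0], size=dim)
    return np.kron(np.eye(dim // block), hadamard(block)) @ P

def lloyd_max(samples, bits, iters=50):
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid update."""
    k = 2 ** bits
    levels = np.quantile(samples, (np.arange(k) + 0.5) / k)  # quantile initialization
    for _ in range(iters):
        edges = (levels[:-1] + levels[1:]) / 2               # decision thresholds
        idx = np.searchsorted(edges, samples)                # cell index per sample
        sums = np.bincount(idx, weights=samples, minlength=k)
        counts = np.bincount(idx, minlength=k)
        levels = np.where(counts > 0, sums / np.maximum(counts, 1), levels)
    return levels

def quantize(u, levels):
    """Map each coordinate to its nearest codebook level."""
    edges = (levels[:-1] + levels[1:]) / 2
    return levels[np.searchsorted(edges, u)]

def roundtrip(v, R, levels):
    """Normalize, rotate, quantize with the shared codebook, then undo all three steps."""
    scale = np.linalg.norm(v) / np.sqrt(v.size)
    u = (R @ v) / scale            # roughly unit-variance coordinates
    return R.T @ quantize(u, levels) * scale

rng = np.random.default_rng(0)
d, block = 256, 64
R = rotation(d, block, rng)
codebook = lloyd_max(rng.standard_normal(100_000), bits=4)  # fit once, no model data

inputs = {
    "small_gaussian": 0.01 * rng.standard_normal(d),
    "large_heavy_tailed": 100.0 * rng.laplace(size=d),
}
errors = {}
for name, v in inputs.items():
    v_hat = roundtrip(v, R, codebook)
    errors[name] = np.linalg.norm(v - v_hat) / np.linalg.norm(v)
    print(name, errors[name])
```

Both inputs differ in scale by several orders of magnitude and in tail shape, yet they pass through the same 16-level codebook; only the per-vector norm is kept alongside the quantized coordinates.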