Source: Hugging Face | Date: July 2, 2026 | Importance: 4/5

Representation Distribution Matching for One-Step Visual Generation

Abstract

The paper formalizes Representation Distribution Matching (RDM) for one-step image generation and analyzes two design axes: the distribution comparison method and the representation space. The authors find that classical Maximum Mean Discrepancy (MMD) becomes a strong, scalable objective when estimated with large batches (>2048), and that any single representation can be gamed, which motivates matching against a battery of encoders and evaluating with the SW_r14 metric. Their improved RDM (iRDM) sets a new one-step state of the art on ImageNet (SW_r14 1.30) and is preferred by PickScore over the prior best one-step generator on 71.2% of samples. The same recipe post-trains the four-step FLUX.2 into a one-step generator that surpasses its four-step version on GenEval (0.826 vs 0.794) and PickScore (22.76 vs 22.58) in 90 H200 GPU-hours.
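
As a rough illustration of the MMD objective described above, here is a minimal NumPy sketch of the unbiased squared-MMD estimator with a Gaussian (RBF) kernel. The kernel choice, the `bandwidth` value, and the helper names `rbf_kernel`, `mmd2_unbiased`, and `null_spread` are assumptions made for illustration; this summary does not specify the paper's kernel, encoders, or training loop. In an RDM-style setup, `x` and `y` would presumably be encoder features of generated and real images.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    # Exclude diagonal (self-similarity) terms from the within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * kxy.mean())


def null_spread(batch_size, trials=20, dim=8, seed=0):
    """Std of the estimate when both samples come from the same Gaussian.

    The true MMD is zero here, so this measures estimator noise only.
    """
    rng = np.random.default_rng(seed)
    estimates = [
        mmd2_unbiased(
            rng.normal(size=(batch_size, dim)),
            rng.normal(size=(batch_size, dim)),
            bandwidth=3.0,
        )
        for _ in range(trials)
    ]
    return float(np.std(estimates))
```

Comparing `null_spread(64)` with `null_spread(1024)` shows the estimator noise shrinking as the batch grows. This is consistent with the large-batch finding reported above, though it is a toy check on synthetic Gaussians and not a reproduction of the paper's experiments.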

Key takeaways

Classical Maximum Mean Discrepancy (MMD) becomes a strong and scalable one-step training objective when estimated with large batches (optimum above 2048).
Any single representation can be gamed, so the method matches against a balanced battery of encoders and evaluates with SW_r14, a Sliced-Wasserstein distance over 14 encoders that resists gaming.
iRDM sets a new one-step state of the art on ImageNet (SW_r14 1.30) and is preferred by the human-preference proxy PickScore over the prior best one-step generator on 71.2% of samples.
Post-training the four-step FLUX.2 with the same recipe yields a one-step generator that surpasses the four-step version on GenEval (0.826 vs 0.794) and PickScore (22.76 vs 22.58) in only 90 H200 GPU-hours.
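
To make the idea of a Sliced-Wasserstein score over several encoders more concrete, the sketch below computes a Monte Carlo sliced 2-Wasserstein distance per encoder and averages the results. It is only an assumption-laden illustration: the exact definition of SW_r14 (normalization, aggregation, choice of the 14 encoders) is not given in this summary, and the names `sliced_wasserstein` and `battery_score`, the plain mean over encoders, and the equal-sample-size restriction are all hypothetical choices.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between x and y, both (n, d)."""
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized samples")
    rng = np.random.default_rng(seed)
    # Random unit directions, one per column.
    dirs = rng.normal(size=(x.shape[1], n_projections))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    # In 1-D, optimal transport pairs sorted samples with sorted samples.
    px = np.sort(x @ dirs, axis=0)
    py = np.sort(y @ dirs, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))


def battery_score(real_feats, fake_feats, **kwargs):
    """Average the per-encoder distance over a battery of encoders.

    Both arguments map a (hypothetical) encoder name to an (n, d) feature array.
    Returns the mean score and the per-encoder breakdown.
    """
    scores = {
        name: sliced_wasserstein(real_feats[name], fake_feats[name], **kwargs)
        for name in real_feats
    }
    return float(np.mean(list(scores.values()))), scores
```

A generator that matches the real distribution in one encoder's feature space but not in another gets a low score on the first and a high score on the second, so the averaged score still exposes the mismatch. That is the intuition behind evaluating against a battery instead of a single representation.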