Multi-Block Diffusion Language Models


Summary

This paper proposes Multi-Block Diffusion Language Models (MBD-LMs), which extend block diffusion LMs to decode multiple consecutive blocks in parallel, adding inter-block parallelism. To align training with multi-block inference, the authors introduce Multi-block Teacher Forcing (MultiTF), which trains on bounded noise groups conditioned on clean prefixes, using randomized noise schedulers. A Block Buffer decoding algorithm preserves KV-cache reuse and static input shapes, so the added parallelism translates into wall-clock speedup. On MBD-LLaDA2-Mini, average tokens per forward pass (TPF) rise from 3.47 to 6.19 while accuracy rises from 79.95% to 81.03%. Combined with DMax, the model reaches 9.34 TPF with only a 1.02% accuracy drop on math and code benchmarks.
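To put the TPF figures in perspective, the short calculation below converts them into forward-pass counts. It is a back-of-the-envelope sketch, not a result from the paper: the response length of 1024 tokens and the helper names `forward_passes` and `pass_reduction` are assumptions made for illustration, and fewer forward passes is only a proxy for wall-clock speedup, which the summary does not quantify.

```python
# Figures quoted in the summary; everything else here is illustrative.
BASELINE_TPF = 3.47   # average tokens per forward pass before multi-block decoding
MBD_TPF = 6.19        # average tokens per forward pass reported for MBD-LLaDA2-Mini
BASELINE_ACC = 79.95  # accuracy in percent
MBD_ACC = 81.03       # accuracy in percent


def forward_passes(num_tokens, tpf):
    """Average number of forward passes needed to emit num_tokens at a given TPF."""
    return num_tokens / tpf


def pass_reduction(tpf_before, tpf_after):
    """Factor by which the forward-pass count shrinks when TPF increases."""
    return tpf_after / tpf_before


if __name__ == "__main__":
    n = 1024  # hypothetical response length, not taken from the paper
    print(f"passes at baseline TPF: {forward_passes(n, BASELINE_TPF):.1f}")
    print(f"passes at MBD TPF:      {forward_passes(n, MBD_TPF):.1f}")
    print(f"reduction factor:       {pass_reduction(BASELINE_TPF, MBD_TPF):.2f}")
    print(f"accuracy change:        {MBD_ACC - BASELINE_ACC:+.2f} points")
```

With the quoted figures, the forward-pass count drops by a factor of about 1.78 while accuracy moves up by 1.08 percentage points.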

Key points

Extends block diffusion LMs to multi-block decoding: consecutive blocks are decoded in parallel, which raises throughput.

Introduces Multi-block Teacher Forcing (MultiTF), so that the states seen in training match the heterogeneous noise patterns of multi-block inference.
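The idea of a bounded noise group behind a clean prefix can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's training code: it assumes mask-based corruption, a hypothetical `MASK` id, a group of at most `max_group_blocks` consecutive blocks, and one independently drawn mask ratio per block, sorted so that earlier blocks are cleaner. The function name `sample_multitf_example` and all of its parameters are invented for this example.

```python
import random

MASK = -1  # hypothetical mask token id (assumption, not from the paper)


def sample_multitf_example(tokens, block_size, max_group_blocks, rng):
    """Build one training example: clean prefix + a bounded group of noisy blocks.

    Returns (prefix, noisy, targets, ratios). targets[i] holds the original
    token where noisy[i] is masked and None elsewhere.
    """
    num_blocks = len(tokens) // block_size
    if num_blocks == 0:
        raise ValueError("need at least one full block")

    # Bounded noise group: between 1 and max_group_blocks consecutive blocks.
    group_len = rng.randint(1, min(max_group_blocks, num_blocks))
    start = rng.randint(0, num_blocks - group_len)

    # Randomized noise levels, one per block. Sorting them ascending is an
    # assumption that mimics earlier blocks being further along in decoding.
    ratios = sorted(rng.uniform(0.0, 1.0) for _ in range(group_len))

    # Everything before the group is kept clean and acts as conditioning.
    prefix = list(tokens[: start * block_size])

    noisy, targets = [], []
    for offset, ratio in enumerate(ratios):
        lo = (start + offset) * block_size
        for tok in tokens[lo : lo + block_size]:
            if rng.random() < ratio:
                noisy.append(MASK)   # corrupted position, contributes to the loss
                targets.append(tok)
            else:
                noisy.append(tok)    # left clean inside the noisy block
                targets.append(None)
    return prefix, noisy, targets, ratios


if __name__ == "__main__":
    rng = random.Random(0)
    prefix, noisy, targets, ratios = sample_multitf_example(
        list(range(100, 132)), block_size=4, max_group_blocks=3, rng=rng
    )
    print(len(prefix), noisy, [round(r, 2) for r in ratios])
```

Because each block in the group gets its own mask ratio, a single example already contains the mix of nearly clean and heavily masked blocks that a multi-block decoder would see at inference time.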

Designs a Block Buffer decoding algorithm that keeps KV-cache reuse and static input shapes, so the extra parallelism yields wall-clock acceleration.
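One way to picture a fixed-size buffer of in-flight blocks is the toy loop below. It is a sketch under heavy assumptions rather than the paper's algorithm: `toy_forward` stands in for the model and simply copies tokens from a known target, the leading slot is assumed to reveal two tokens per pass and the others one, and the `committed` list stands in for the prefix whose KV entries would be cached. All names are hypothetical.

```python
MASK = None  # placeholder for a position that has not been decoded yet
PAD = -1     # filler for slots scheduled past the end of the sequence


def toy_forward(buffer, target, block_size):
    """Stand-in for one model call: slot 0 reveals up to 2 tokens, other slots 1."""
    for slot, (block_idx, block) in enumerate(buffer):
        budget = 2 if slot == 0 else 1
        for pos in range(block_size):
            if budget == 0:
                break
            if block[pos] is MASK:
                block[pos] = target[block_idx * block_size + pos]
                budget -= 1


def block_buffer_decode(target, block_size, num_slots):
    """Decode target with a buffer that always holds num_slots blocks."""
    if len(target) % block_size != 0:
        raise ValueError("toy example expects a whole number of blocks")
    total_blocks = len(target) // block_size
    next_block = 0

    def fresh_block():
        nonlocal next_block
        idx = next_block
        next_block += 1
        fill = MASK if idx < total_blocks else PAD
        return (idx, [fill] * block_size)

    buffer = [fresh_block() for _ in range(num_slots)]
    committed = []   # finished prefix; its KV entries would be reused, not recomputed
    shapes = set()   # input shapes seen across passes, to check they stay static
    passes = 0
    while len(committed) < total_blocks * block_size:
        shapes.add((len(buffer), block_size))
        toy_forward(buffer, target, block_size)
        passes += 1
        # Commit finished leading blocks, then refill so the buffer size never changes.
        while buffer[0][0] < total_blocks and MASK not in buffer[0][1]:
            committed.extend(buffer.pop(0)[1])
            buffer.append(fresh_block())
    return committed, passes, shapes


if __name__ == "__main__":
    target = list(range(100, 116))
    for slots in (1, 2):
        out, passes, shapes = block_buffer_decode(target, 4, slots)
        print(slots, out == target, passes, shapes)
```

In this toy setting the buffer shape never changes between passes, which is the property that makes static-shape execution possible, and the two-slot buffer finishes in fewer passes than the one-slot buffer. The pass counts are properties of the toy reveal rule only and say nothing about the paper's measured TPF.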

MBD-LLaDA2-Mini raises average tokens per forward pass (TPF) from 3.47 to 6.19 and accuracy from 79.95% to 81.03%; combined with DMax, TPF reaches 9.34 with only a 1.02% accuracy drop on math and code tasks.