Discrete Diffusion Language Models for Interactive Radiology Report Drafting
Abstract
The paper adapts DiffusionGemma-26B, a mixture-of-experts (MoE) discrete diffusion language model, to the medical domain and benchmarks it against the autoregressive (AR) Gemma-4-26B on medical visual question answering (VQA). Under the same LoRA fine-tuning recipe, and as scored by a verbosity-robust LLM judge, the diffusion model matches or exceeds AR performance while decoding 3.5–4.4× faster. The fine-tuned model, with 3.8B active parameters, is competitive with frontier vision-language models (VLMs). Crucially, the diffusion paradigm enables any-order infill: a radiologist can correct fragments of a report and the model generates the text between them, a capability inherent to diffusion that autoregressive models cannot easily replicate. This suits real-world radiology reports, whose style and completeness vary across clinicians and institutions.
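The any-order infill described above can be illustrated with a toy masked-diffusion decoder. This is a minimal sketch under stated assumptions, not the paper's sampler: `toy_denoiser` is a hypothetical random stand-in for DiffusionGemma-26B, and the confidence-ordered schedule (commit the most confident predictions at each step) is one common choice for masked diffusion models, assumed here for illustration. The radiologist's prefix and suffix tokens stay fixed; only the masked gap between them is filled.

```python
import random

MASK = "[MASK]"


def toy_denoiser(tokens, rng):
    """Hypothetical stand-in for the diffusion model.

    Returns {position: (token, confidence)} for every masked position. A real
    model would condition on the image and on all unmasked tokens on both
    sides; this toy just samples from a tiny vocabulary.
    """
    vocab = ["no", "acute", "cardiopulmonary", "abnormality", "is", "seen"]
    return {i: (rng.choice(vocab), rng.random())
            for i, tok in enumerate(tokens) if tok == MASK}


def infill(prefix, suffix, n_gap, steps=3, seed=0):
    """Fill n_gap masked tokens between a fixed prefix and suffix.

    Clinician-supplied tokens are never overwritten. Each step commits the
    most confident proposals wherever they sit in the gap, so the gap is not
    resolved in a fixed left-to-right order.
    """
    rng = random.Random(seed)
    tokens = list(prefix) + [MASK] * n_gap + list(suffix)
    per_step = -(-n_gap // steps)  # ceiling division: positions committed per step
    while MASK in tokens:
        proposals = toy_denoiser(tokens, rng)
        # Rank masked positions by confidence, highest first.
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:per_step]:
            tokens[pos] = tok
    return tokens


if __name__ == "__main__":
    prefix = ["Impression", ":"]                    # hypothetical text the radiologist kept
    suffix = [".", "Recommend", "follow-up", "."]   # hypothetical text the radiologist corrected
    print(" ".join(infill(prefix, suffix, n_gap=5)))
```

Committing several positions per forward pass is one plausible source of diffusion decoding speedups over token-by-token AR generation; the step count used here is arbitrary.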
Key Points
DiffusionGemma-26B, a discrete diffusion MoE language model, is fine-tuned with LoRA for medical VQA.
The diffusion model matches or surpasses autoregressive Gemma-4-26B performance while decoding 3.5–4.4× faster, and the fine-tuned model, with only 3.8B active parameters, is competitive with frontier vision-language models (VLMs).
Diffusion enables any-order infill: radiologists can correct fragments of a report and have the model complete the gaps between them, a capability autoregressive models do not natively provide.
This infill capability suits real-world radiology reports, whose style and completeness vary across clinicians and institutions.
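For readers unfamiliar with LoRA, the sketch below shows the standard low-rank adapter on a single linear layer: the pretrained weight W is frozen and only two small matrices A and B are trained, so the effective weight is W + (alpha / r) · B A. The class name `LoRALinear`, the rank `r=8`, `alpha=16`, and the layer shapes are hypothetical; the summary does not state which layers the paper adapts or which hyperparameters it uses.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen weight W and a trainable low-rank update.

    Effective weight: W + (alpha / r) * B @ A. Hypothetical illustration;
    rank, alpha, and shapes are not the paper's settings.
    """

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                            # frozen pretrained weight, (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection, small random init
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        # Only A and B would receive gradients during fine-tuning.
        return self.A.size + self.B.size


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(64, 128))
    layer = LoRALinear(W, r=8, alpha=16.0)
    x = rng.normal(size=(4, 128))
    print(np.allclose(layer(x), x @ W.T))  # B is zero, so output matches the frozen layer
    print(layer.trainable_params(), W.size)
```

With B initialized to zero, the adapted layer reproduces the frozen model exactly at the start of fine-tuning; for this 64×128 layer at rank 8 it trains 1,536 parameters instead of 8,192.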