Qwen-Image-2.0-RL Technical Report

Abstract

Researchers present Qwen-Image-2.0-RL, a method that enhances image generation and editing diffusion models via reinforcement learning. It combines RLHF and on-policy distillation, fine-tunes vision-language models, and constructs task-specific reward models within a scalable RL training framework. A hybrid classifier-free guidance strategy and per-category reward weight calibration further boost performance. The approach achieves notable gains in visual quality, instruction following, and editing accuracy across multiple evaluation metrics.

Key Points

Uses reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to refine a diffusion model's image generation and editing capabilities.
Fine-tunes vision-language models to build task-specific reward models within a scalable RL training framework.
Incorporates a hybrid classifier-free guidance strategy and per-category reward weight calibration for balanced optimization.
Reports notable gains in visual quality, instruction following, and editing accuracy across multiple evaluation metrics.
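
The key points name two mechanisms without spelling them out: classifier-free guidance and per-category reward weighting. As a rough illustration only, the sketch below shows the standard classifier-free guidance combination and a weighted sum of reward scores keyed by prompt category. The function names, category labels, reward names, and weight values are all hypothetical; this summary does not describe the report's actual hybrid guidance strategy or its calibration procedure, so neither is reproduced here.

```python
from typing import Dict, List


def cfg_combine(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical per-category weights over reward models (illustrative values only,
# not taken from the report).
REWARD_WEIGHTS: Dict[str, Dict[str, float]] = {
    "text_rendering": {"visual_quality": 0.25, "instruction_following": 0.75},
    "portrait": {"visual_quality": 0.75, "instruction_following": 0.25},
}


def combined_reward(category: str, scores: Dict[str, float]) -> float:
    """Weighted sum of reward-model scores using the weights for this category."""
    weights = REWARD_WEIGHTS[category]
    return sum(weights[name] * scores[name] for name in weights)
```

With `scale = 1.0` the guidance combination reduces to the conditional prediction, and larger values push the output further along the conditional direction. Keeping the reward weights in a per-category table is one simple way to let different prompt types emphasize different reward models.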
