Source: Hugging Face | Date: July 1, 2026 | Importance: 4/5

Multimodal Continuous Reasoning via Asymmetric Mutual Variational Learning

Chinese title: 通过非对称互变分学习的多模态连续推理

Abstract

This paper proposes Asymmetric Mutual Variational Learning (AMVL), a framework that addresses the train-inference mismatch in continuous latent reasoning for multimodal large language models (MLLMs). The mismatch arises because standard variational training forces the inference-time prior to mimic a posterior conditioned on ground-truth answers, causing answer leakage. AMVL uses a forward KL divergence to align the prior with the posterior and a novel reverse KL divergence to regularize the posterior, preventing collapse into inference-incompatible regions. The method is instantiated in a latent-integrated MLLM and evaluated on the BLINK benchmark, where it improves the average score by +10.83 and achieves gains of up to +32.00 on individual reasoning tasks, with analyses showing improved latent-space stability.

Key points

Addresses the train-inference mismatch in continuous latent reasoning for MLLMs, where standard variational training causes answer leakage.
Introduces Asymmetric Mutual Variational Learning (AMVL) with a dual-KL objective: forward KL for prior alignment and reverse KL for posterior regularization.
Theoretical analysis formalizes answer leakage as prior contamination and proves that the dual-KL objective reduces it.
The AMVL-integrated MLLM achieves a +10.83 average score gain on the complex BLINK benchmark and up to +32.00 on individual tasks, with improved latent-space stability.
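
The dual-KL idea in the key points can be made concrete with a small numerical sketch. The code below is not the paper's implementation. It assumes diagonal-Gaussian latents, uses "forward" to mean KL(posterior || prior) and "reverse" to mean KL(prior || posterior), and introduces a hypothetical weight `beta`; the summary does not specify the paper's direction convention, weighting, or stop-gradient scheme. The function names `kl_diag_gaussians` and `dual_kl` are illustrative.

```python
import math


def kl_diag_gaussians(mu_a, var_a, mu_b, var_b):
    """Closed-form KL(N(mu_a, diag(var_a)) || N(mu_b, diag(var_b)))."""
    total = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        total += 0.5 * (math.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0)
    return total


def dual_kl(posterior, prior, beta=1.0):
    """Hypothetical dual-KL term (assumed form, not taken from the paper).

    posterior, prior: (mean_list, variance_list) pairs.
    Returns (forward, reverse, forward + beta * reverse).
    """
    (mu_q, var_q), (mu_p, var_p) = posterior, prior
    forward = kl_diag_gaussians(mu_q, var_q, mu_p, var_p)  # KL(posterior || prior)
    reverse = kl_diag_gaussians(mu_p, var_p, mu_q, var_q)  # KL(prior || posterior)
    return forward, reverse, forward + beta * reverse


if __name__ == "__main__":
    # A narrow, shifted posterior (it has seen the answer) vs. a broad prior.
    posterior = ([1.0, 0.0], [0.1, 0.1])
    prior = ([0.0, 0.0], [1.0, 1.0])
    forward, reverse, total = dual_kl(posterior, prior, beta=0.5)
    print(f"forward={forward:.4f} reverse={reverse:.4f} total={total:.4f}")
```

In this toy setting the two directions give different values for the same pair of distributions, and the reverse term grows sharply when the posterior concentrates on a region the prior barely covers. That asymmetry is the intuition behind using the second term to keep the posterior within reach of the inference-time prior.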
