Multimodal Continuous Reasoning via Asymmetric Mutual Variational Learning
Abstract
This paper proposes Asymmetric Mutual Variational Learning (AMVL), a framework that addresses the train-inference mismatch in continuous latent reasoning for multimodal large language models. The mismatch arises because standard variational training forces the inference-time prior to mimic a posterior conditioned on ground-truth answers, causing answer leakage. AMVL uses a forward KL divergence to align the prior with the posterior and a novel reverse KL divergence to regularize the posterior, preventing collapse into inference-incompatible regions. The method is instantiated in a latent-integrated MLLM and evaluated on the BLINK benchmark, where it improves the average score by +10.83 and achieves gains of up to +32.00 on individual reasoning tasks, with analyses showing improved latent-space stability.
Key Points
Addresses train-inference mismatch in continuous latent reasoning for MLLMs, where standard variational training causes answer leakage.
Introduces Asymmetric Mutual Variational Learning (AMVL) with a dual-KL objective: forward KL for prior alignment and reverse KL for posterior regularization.
Theoretical analysis formalizes answer leakage as prior contamination and proves the dual-KL objective reduces it.
AMVL-integrated MLLM achieves +10.83 average score gain on the complex BLINK benchmark and up to +32.00 on individual tasks, with improved latent-space stability.
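The dual-KL idea in the summary above can be illustrated with a minimal sketch. The summary does not give the exact form of the objective, so everything here is an assumption made for illustration: the prior and posterior are modeled as diagonal Gaussians, "forward KL" is taken to mean KL(posterior || prior), "reverse KL" is taken to mean KL(prior || posterior), and the names `gaussian_kl`, `dual_kl_loss`, `alpha`, and `beta` are hypothetical. In an autodiff framework, each term would presumably stop gradients through one of the two distributions so that it updates only the prior or only the posterior; that detail is omitted here.

```python
import math


def gaussian_kl(mu_a, logvar_a, mu_b, logvar_b):
    """Closed-form KL(N_a || N_b) for diagonal Gaussians.

    Each argument is a list of per-dimension means or log-variances.
    """
    total = 0.0
    for ma, la, mb, lb in zip(mu_a, logvar_a, mu_b, logvar_b):
        total += 0.5 * (lb - la + (math.exp(la) + (ma - mb) ** 2) / math.exp(lb) - 1.0)
    return total


def dual_kl_loss(post_mu, post_logvar, prior_mu, prior_logvar, alpha=1.0, beta=1.0):
    """Hypothetical dual-KL regularizer (an assumption, not the paper's exact loss).

    forward: KL(posterior || prior), assumed to pull the prior toward the posterior.
    reverse: KL(prior || posterior), assumed to keep the posterior near the prior.
    alpha and beta are illustrative weights.
    """
    forward = gaussian_kl(post_mu, post_logvar, prior_mu, prior_logvar)
    reverse = gaussian_kl(prior_mu, prior_logvar, post_mu, post_logvar)
    return alpha * forward + beta * reverse, forward, reverse


# Example: a narrow posterior (variance 1) and a wider, shifted prior (variance 4).
total, forward, reverse = dual_kl_loss([0.0], [0.0], [1.0], [math.log(4.0)])
print(f"forward={forward:.4f} reverse={reverse:.4f} total={total:.4f}")
```

The two terms differ in value for the same pair of distributions, which is why the direction of each KL matters: KL is not symmetric, so penalizing both directions constrains the prior and the posterior in different ways.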