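Tencent Rhino Bird Elite Talent Program Releases Three Papers Accepted at ICML 2026: Efficient Distillation, Long-Context Reasoning, and Sparse-View Video Generation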
Abstract
The Tencent Rhino Bird Elite Talent Program has released three papers accepted at ICML 2026. The first, "Hybrid Policy Distillation for LLMs", proposes Hybrid Policy Distillation (HPD), which combines forward and reverse KL divergence with on-policy and off-policy data. It consistently improves the optimization stability, computational efficiency, and final performance of LLM distillation on math reasoning, dialogue, and code generation. The second, "Many-Shot CoT-ICL", studies in-context learning with many chain-of-thought examples for reasoning tasks. It finds that similarity-based retrieval fails in this setting and proposes CDS, a method that orders examples by conceptual progression and improves math and narrative reasoning by 3.81% on average. The third, "CamGeo", distills 3D geometry priors from a video-to-3D model into a diffusion backbone through keyframe trajectory distillation and cross-frame consistency distillation, trained with a three-stage coarse-to-fine curriculum. It reports stable performance gains in sparse camera-conditioned image-to-video generation.
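To make the forward/reverse KL distinction behind HPD concrete, here is a minimal per-token sketch. It is not the paper's objective: the convex mix, the weight `alpha`, and the function names are assumptions introduced only for illustration, and the on-policy versus off-policy choice of which sequences get scored is left out.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q)
    )


def hybrid_kl(teacher_logits, student_logits, alpha=0.5):
    """Hypothetical mix of forward and reverse KL at one token position.

    alpha = 1.0 gives pure forward KL(teacher || student);
    alpha = 0.0 gives pure reverse KL(student || teacher).
    The mixing rule is an assumption, not taken from the HPD paper.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    forward = kl(p, q)
    reverse = kl(q, p)
    return alpha * forward + (1.0 - alpha) * reverse


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [0.1, 0.2, 0.3]
    print(hybrid_kl(teacher, student, alpha=0.5))
```

Forward KL penalizes the student for missing probability mass the teacher assigns (mass-covering), while reverse KL penalizes the student for placing mass where the teacher does not (mode-seeking). Mixing the two is one simple way to trade off these behaviors.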
Key Points
Hybrid Policy Distillation (HPD) combines forward and reverse KL divergence with on-policy and off-policy data to unify existing LLM knowledge distillation methods, improving stability and final performance on math reasoning, dialogue, and code.
Many-Shot CoT-ICL shows that standard ICL scaling laws do not hold for reasoning tasks; the proposed CDS method orders chain-of-thought examples by conceptual progression, yielding a 3.81% average improvement.
CamGeo distills 3D geometry priors from a video-to-3D model into a diffusion backbone using trajectory consistency distillation, cross-frame consistency distillation, and a coarse-to-fine curriculum, achieving stable sparse-view video generation.
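The ordering idea in the Many-Shot CoT-ICL key point can be sketched as follows. The summary does not say how CDS determines conceptual progression, so the `level` field, the `Demo` type, and both functions are hypothetical stand-ins that only show what "ordering demonstrations from basic to advanced" looks like when a prompt is assembled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    question: str
    rationale: str  # chain-of-thought text for this demonstration
    answer: str
    level: int      # hypothetical concept-depth label; lower means more basic


def order_by_progression(demos):
    """Sort demonstrations from basic to advanced.

    sorted() is stable, so demos with the same level keep their input order.
    """
    return sorted(demos, key=lambda d: d.level)


def build_prompt(demos, query):
    """Assemble a many-shot chain-of-thought prompt ending with the query."""
    parts = [
        f"Q: {d.question}\nReasoning: {d.rationale}\nA: {d.answer}"
        for d in order_by_progression(demos)
    ]
    parts.append(f"Q: {query}\nReasoning:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demos = [
        Demo("Solve 2x + 3 = 11.", "Subtract 3, then divide by 2.", "x = 4", 2),
        Demo("What is 7 + 5?", "Add the two numbers.", "12", 1),
    ]
    print(build_prompt(demos, "Solve 3x - 4 = 11."))
```

A similarity-based retriever would instead rank demonstrations by closeness to the query, which the paper reports as ineffective for many-shot reasoning.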