Tencent Rhino Bird Elite Program Presents Three ICML 2026 Papers on Efficient Distillation, Long-Context Reasoning, and Sparse-View Video Generation
English summary
The Tencent Rhino Bird Elite Talent Program released three papers accepted at ICML 2026. The first, Hybrid Policy Distillation (HPD), unifies forward and reverse KL divergence with on- and off-policy data, improving the stability, efficiency, and final performance of LLM distillation across math reasoning, dialogue, and code generation. The second, Many-Shot CoT-ICL, studies in-context learning with many chain-of-thought demonstrations on reasoning tasks. It finds that similarity-based example retrieval is ineffective in this setting and proposes CDS, which orders demonstrations by conceptual progression and boosts reasoning accuracy by 3.81% on average. The third, CamGeo, distills 3D geometry priors from a video-to-3D model into a diffusion backbone through trajectory and cross-frame consistency distillation and a coarse-to-fine curriculum, achieving stable performance gains in image-to-video generation with sparse camera conditioning.
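The HPD description mixes forward and reverse KL divergence with on- and off-policy data. The following is a minimal sketch of what such a hybrid objective could look like, assuming a simple convex combination: `alpha` weights forward KL(teacher || student) against reverse KL(student || teacher), and `beta` weights off-policy token positions (from fixed or teacher-generated text) against on-policy positions sampled from the student. The names `kl`, `hybrid_kl`, `hybrid_policy_loss`, `alpha`, and `beta` are hypothetical; HPD's actual objective and weighting scheme are not given in this summary.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) when q_i == 0.
    return sum(pi * (math.log(pi) - math.log(qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def hybrid_kl(teacher, student, alpha=0.5):
    """Convex mix of forward KL(teacher || student) and reverse KL(student || teacher).

    Illustrative assumption: alpha=1.0 gives pure forward KL (mode-covering),
    alpha=0.0 gives pure reverse KL (mode-seeking).
    """
    return alpha * kl(teacher, student) + (1.0 - alpha) * kl(student, teacher)


def hybrid_policy_loss(off_policy, on_policy, alpha=0.5, beta=0.5):
    """Weight the hybrid KL over off-policy vs. on-policy token positions.

    off_policy / on_policy: lists of (teacher_dist, student_dist) pairs, one per
    token position (hypothetical setup; real distributions come from model forward passes).
    """
    def mean_loss(pairs):
        if not pairs:
            return 0.0
        return sum(hybrid_kl(t, s, alpha) for t, s in pairs) / len(pairs)

    return beta * mean_loss(off_policy) + (1.0 - beta) * mean_loss(on_policy)


# Toy next-token distributions over a 3-token vocabulary.
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.4, 0.2]
print(round(kl(teacher, student), 4))  # 0.1838 (forward KL)
print(round(kl(student, teacher), 4))  # 0.192 (reverse KL)
print(round(hybrid_policy_loss([(teacher, student)], [(teacher, student)]), 4))  # 0.1879
```

The two KL directions differ on the same pair of distributions: forward KL penalizes the student for missing probability mass the teacher assigns, while reverse KL penalizes the student for placing mass where the teacher assigns little. A weighted mix lets training trade these behaviors off.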
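Chinese summary (translated)
The Tencent Rhino Bird Elite Talent Program released three papers accepted at ICML 2026. The first, "Hybrid Policy Distillation for LLMs," proposes Hybrid Policy Distillation (HPD), which combines forward and reverse KL divergence with on- and off-policy data and consistently improves the optimization stability, computational efficiency, and final performance of LLM distillation on tasks such as math reasoning, dialogue, and code generation. The second, "Many-Shot CoT-ICL," studies in-context learning with large numbers of chain-of-thought demonstrations on reasoning tasks, finds that similarity-based retrieval breaks down, and proposes CDS, which orders demonstrations by conceptual progression and improves math and narrative reasoning by 3.81% on average. The third, "CamGeo," injects 3D geometry priors into a diffusion backbone through keyframe trajectory distillation and cross-frame consistency distillation and trains with a three-stage curriculum, achieving stable performance gains in image-to-video generation under sparse camera constraints.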
Key points
Hybrid Policy Distillation (HPD) combines forward and reverse KL with on/off-policy data to unify existing LLM knowledge distillation methods, improving stability and final performance on math reasoning, dialogue, and code.
Many-Shot CoT-ICL shows that standard ICL scaling laws break down for reasoning tasks; the proposed CDS method orders chain-of-thought examples by conceptual progression, yielding an average improvement of 3.81%.
CamGeo distills 3D geometry priors from a video-to-3D model into a diffusion backbone using trajectory consistency, cross-frame consistency, and a coarse-to-fine curriculum, achieving stable sparse-view video generation.
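The key points describe CDS only as ordering chain-of-thought demonstrations by conceptual progression, without saying how that progression is measured. As a hedged illustration of the general idea, the sketch below assumes each demonstration is tagged with a single concept and that a prerequisite graph over concepts is available. It then uses a topological sort (Python's standard `graphlib`) so that foundational concepts come before the concepts that build on them. This is a stand-in, not the paper's CDS procedure; `order_by_concept_progression`, the concept tags, and the prerequisite graph are all hypothetical.

```python
from graphlib import TopologicalSorter


def order_by_concept_progression(demos, prereqs):
    """Order (demo_text, concept) pairs so that prerequisite concepts come first.

    demos: list of (demo_text, concept) pairs; tagging each demo with one concept
    is an illustrative assumption.
    prereqs: dict mapping a concept to the set of concepts it builds on.
    """
    # static_order() yields every concept only after all of its prerequisites.
    concept_order = list(TopologicalSorter(prereqs).static_order())
    rank = {concept: i for i, concept in enumerate(concept_order)}
    # Concepts missing from the graph go last; sorted() is stable, so demos that
    # share a concept keep their original relative order.
    return sorted(demos, key=lambda demo: rank.get(demo[1], len(rank)))


# Hypothetical prerequisite chain for arithmetic word problems.
prereqs = {
    "multiplication": {"addition"},
    "division": {"multiplication"},
    "fractions": {"division"},
}
demos = [
    ("Q: What is 3/4 of 8? <chain-of-thought>", "fractions"),
    ("Q: What is 2 + 3? <chain-of-thought>", "addition"),
    ("Q: What is 12 / 4? <chain-of-thought>", "division"),
    ("Q: What is 3 * 4? <chain-of-thought>", "multiplication"),
]
ordered = order_by_concept_progression(demos, prereqs)
print([concept for _, concept in ordered])
# ['addition', 'multiplication', 'division', 'fractions']
```

This only reorders a given pool of demonstrations; it does not choose which demonstrations go into the prompt, which is the step that similarity-based retrieval normally handles.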