Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
Abstract
The paper proposes VERITAS, a generator-verifier framework for generalist robot policies. It pairs a pre-trained robot policy (the generator) with a gradient-free visual verifier that evaluates candidate actions at inference time, which lets the policy be steered without additional training. Verified rollouts then serve as supervision for offline fine-tuning and yield consistent performance gains. Fine-tuning on these verified rollouts produces gains comparable to those from expert demonstrations while requiring no human intervention, which points to inference-time verification as a scalable mechanism for self-improvement in real-world deployment.
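The generator-verifier pairing described above can be illustrated with a minimal sketch. It assumes the steering step is a best-of-N selection: the policy proposes several candidate actions and the verifier's score picks one. The summary does not confirm that selection rule, and the names `steer`, `toy_policy`, and `toy_verifier` are hypothetical stand-ins, not part of VERITAS.

```python
import random
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]
Action = Tuple[float, ...]


def steer(
    policy_sample: Callable[[Observation], Action],
    verifier_score: Callable[[Observation, Action], float],
    observation: Observation,
    num_candidates: int = 8,
) -> Tuple[Action, float]:
    """Sample candidates from the policy and return the highest-scoring one.

    Assumption: best-of-N selection. No gradients flow through the verifier;
    it is only called as a black-box scoring function.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    # Draw all candidates first, then score them.
    candidates = [policy_sample(observation) for _ in range(num_candidates)]
    scores = [verifier_score(observation, action) for action in candidates]
    best = max(range(num_candidates), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_policy(observation: Observation) -> Action:
    # Hypothetical stochastic policy: a noisy 1-D action around zero.
    return (random.gauss(0.0, 1.0),)


def toy_verifier(observation: Observation, action: Action) -> float:
    # Hypothetical verifier: higher score when the action is near the target
    # stored in the observation. A real verifier would judge images instead.
    return -abs(action[0] - observation[0])


if __name__ == "__main__":
    random.seed(0)
    action, score = steer(toy_policy, toy_verifier, (1.0,), num_candidates=16)
    print("selected action:", action, "verifier score:", score)
```

Because the verifier is only queried for scores, the policy weights stay untouched, which matches the summary's claim that steering needs no additional training.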
Key Points
VERITAS is a generator-verifier framework that combines a pre-trained robot policy with a gradient-free visual verifier for inference-time steering.
Inference-time verification consistently outperforms vanilla generalist policies without requiring additional training or demonstration data.
Fine-tuning policies on verified self-generated trajectories yields performance gains comparable to those from expert demonstrations, with no human intervention.
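The self-improvement step in the last key point can be sketched as a data-filtering pass: keep only rollouts the verifier accepted, then flatten them into observation-action pairs for offline fine-tuning. The acceptance rule used here (a score threshold) is an assumption, as are the `Rollout` type and the function names; the summary does not say how VERITAS decides which rollouts count as verified.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rollout:
    """Hypothetical container for one self-generated trajectory."""
    observations: List[Sequence[float]]
    actions: List[Sequence[float]]
    verifier_score: float  # assumed: one scalar score per rollout


def select_verified(rollouts: List[Rollout], threshold: float) -> List[Rollout]:
    # Assumed acceptance rule: keep rollouts scoring at or above the threshold.
    return [r for r in rollouts if r.verifier_score >= threshold]


def to_training_pairs(
    rollouts: List[Rollout],
) -> List[Tuple[Sequence[float], Sequence[float]]]:
    # Flatten accepted rollouts into (observation, action) supervision pairs,
    # the same shape of data that expert demonstrations would provide.
    pairs = []
    for rollout in rollouts:
        pairs.extend(zip(rollout.observations, rollout.actions))
    return pairs


if __name__ == "__main__":
    rollouts = [
        Rollout([(0.0,), (0.5,)], [(0.1,), (0.2,)], verifier_score=0.9),
        Rollout([(0.0,)], [(0.7,)], verifier_score=0.2),
        Rollout([(1.0,)], [(0.3,)], verifier_score=0.8),
    ]
    dataset = to_training_pairs(select_verified(rollouts, threshold=0.5))
    print(len(dataset), "training pairs from verified rollouts")
```

The resulting pairs could feed any standard imitation-learning loss. No human labels enter the pipeline, since the verifier score is the only filter.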