OmniDirector: A Universal Multi-Shot Camera Cloning Framework Without Cross-Paired Data
Abstract
OmniDirector is a unified framework for camera motion cloning in video generation. It encodes camera parameters visually as grid motion videos, which lets it support diverse trajectories across multi-shot scenes. Because it is trained on a large dataset of camera grid-video pairs, it does not need cross-paired data. The framework integrates characters, actions, and cameras through multimodal diffusion transformers to provide director-level control, and a hierarchical prompt expansion agent harmonizes the different control signals to enrich the descriptions of both camera motion and visual content. Extensive experiments show better performance and controllability than existing methods.
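The summary does not spell out how a grid motion video is built. One plausible reading is to place a fixed lattice of 3D points in front of the camera and project it through each frame's camera pose, so that the on-screen motion of the grid carries the trajectory. The sketch below follows that reading under stated assumptions: the pinhole model, the yaw-only rotation, the grid layout, and the names `CameraPose`, `make_grid`, `project`, and `render_grid_video` are all hypothetical choices for illustration, not OmniDirector's actual method.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


@dataclass
class CameraPose:
    # Hypothetical per-frame camera parameters; rotation is yaw-only for brevity.
    # Axes follow the OpenCV convention: x right, y down, z forward.
    yaw: float = 0.0      # rotation about the world y-axis, in radians
    tx: float = 0.0       # camera centre in world coordinates
    ty: float = 0.0
    tz: float = 0.0
    focal: float = 256.0  # focal length in pixels
    cx: float = 128.0     # principal point in pixels
    cy: float = 128.0


def make_grid(n: int = 5, spacing: float = 1.0, depth: float = 5.0) -> List[Point3]:
    # A fixed n x n lattice of world points on the plane z = depth.
    half = (n - 1) / 2
    return [((i - half) * spacing, (j - half) * spacing, depth)
            for j in range(n) for i in range(n)]


def project(p: Point3, cam: CameraPose) -> Optional[Pixel]:
    # World -> camera: subtract the camera centre, then rotate by -yaw about y.
    x, y, z = p[0] - cam.tx, p[1] - cam.ty, p[2] - cam.tz
    c, s = math.cos(cam.yaw), math.sin(cam.yaw)
    xc = c * x - s * z
    zc = s * x + c * z
    if zc <= 1e-6:
        return None  # the point is behind the camera
    # Pinhole projection to pixel coordinates.
    return (cam.focal * xc / zc + cam.cx, cam.focal * y / zc + cam.cy)


def render_grid_video(trajectory: List[CameraPose], grid: List[Point3],
                      width: int = 256, height: int = 256) -> List[List[List[int]]]:
    # One binary frame per camera pose; a pixel is 1 where a grid point lands.
    frames = []
    for cam in trajectory:
        frame = [[0] * width for _ in range(height)]
        for p in grid:
            uv = project(p, cam)
            if uv is None:
                continue
            u, v = round(uv[0]), round(uv[1])
            if 0 <= u < width and 0 <= v < height:
                frame[v][u] = 1
        frames.append(frame)
    return frames


if __name__ == "__main__":
    # A camera trucking right: the grid drifts left across successive frames.
    truck = [CameraPose(tx=0.1 * k) for k in range(8)]
    video = render_grid_video(truck, make_grid())
    print(len(video), sum(map(sum, video[0])))  # → 8 25
```

Because the grid video is rendered from camera parameters alone, any trajectory, including one estimated from a reference clip, can be turned into a conditioning signal of this kind. A real system would more likely rasterize grid lines than isolated points.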
Key Points
Uses grid motion videos to visually encode camera parameters, enabling arbitrary trajectories without cross-paired data.
Trains on a large-scale camera grid-video dataset, avoiding the need for traditional cross-paired video supervision.
Integrates multimodal diffusion transformers for joint, director-level control over characters, actions, and cameras.
Employs a hierarchical prompt expansion agent to fuse different control signals, improving camera motion and visual descriptions.
Demonstrates better controllability and performance than existing methods in extensive video generation experiments.
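The summary describes the hierarchical prompt expansion agent only at a high level and does not say how it is implemented. As a rough illustration of what "hierarchical" could mean, the sketch below composes a shared scene-and-character context with per-shot action and camera descriptions, expanding terse camera tags into fuller sentences. A fixed lookup table stands in for the agent, and `Shot`, `Storyboard`, `CAMERA_DETAIL`, and `expand` are hypothetical names, not part of OmniDirector.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shot:
    # Hypothetical per-shot control signals.
    action: str   # what happens in this shot
    camera: str   # terse camera-motion tag, e.g. "dolly in"


@dataclass
class Storyboard:
    # Hypothetical global context shared by every shot.
    scene: str
    characters: List[str]
    shots: List[Shot] = field(default_factory=list)


# Hypothetical lookup table standing in for the agent's camera-motion rewriting.
CAMERA_DETAIL: Dict[str, str] = {
    "dolly in": "the camera moves steadily forward toward the subject",
    "pan left": "the camera rotates smoothly to the left",
    "static": "the camera holds a fixed position",
}


def expand(board: Storyboard) -> List[str]:
    # Level 1: one global context reused in every prompt keeps characters consistent.
    context = f"{board.scene}. Characters: {', '.join(board.characters)}."
    prompts = []
    for i, shot in enumerate(board.shots, start=1):
        # Level 2: the shot's action; level 3: its camera tag, expanded when known.
        camera = CAMERA_DETAIL.get(shot.camera, shot.camera)
        prompts.append(f"Shot {i}: {context} {shot.action}; {camera}.")
    return prompts


if __name__ == "__main__":
    board = Storyboard(
        "A rainy street at night",
        ["a detective"],
        [Shot("The detective lights a cigarette", "dolly in"),
         Shot("The detective walks away", "static")],
    )
    for prompt in expand(board):
        print(prompt)
```

Keeping the shared context at the top level is one simple way to hold characters consistent across shots while each shot carries its own camera instruction; unknown camera tags pass through unchanged rather than being guessed.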