WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory
Abstract
WorldDirector is a controllable video world model framework, presented as the first to explicitly decouple semantic motion orchestration from visual generation. A large language model coordinates 3D object trajectories and camera movements, and the orchestrated trajectories then serve as control signals for a video generator. This design ensures strict physical consistency, stable appearance, and persistent memory of dynamic objects: an object keeps its exact visual identity even when it re-enters the scene after a long occlusion. The framework supports unrestricted viewpoint exploration and can synthesize complex, extended events with high controllability.
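The two-stage split described in the abstract can be illustrated with a minimal sketch. It is not WorldDirector's implementation: `Keyframe`, `orchestrate`, `interpolate`, and `control_signals` are hypothetical names, the plan is hard-coded where a real system would query a language model, and linear interpolation is an assumed stand-in for whatever trajectory representation the framework uses. The point is only that motion is planned in 3D world space first, and the generator receives the result as per-frame conditioning.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Keyframe:
    t: float         # time in seconds
    position: Vec3   # world-space position


def interpolate(keyframes: list[Keyframe], t: float) -> Vec3:
    """Linearly interpolate a position at time t, clamped at both ends.

    Assumes keyframes are sorted with strictly increasing times.
    """
    if t <= keyframes[0].t:
        return keyframes[0].position
    if t >= keyframes[-1].t:
        return keyframes[-1].position
    for a, b in zip(keyframes, keyframes[1:]):
        if a.t <= t <= b.t:
            w = (t - a.t) / (b.t - a.t)
            return tuple(pa + w * (pb - pa) for pa, pb in zip(a.position, b.position))
    raise ValueError("keyframes must be sorted by time")


def orchestrate(prompt: str) -> dict[str, list[Keyframe]]:
    """Stage 1 stand-in (hypothetical): a real system would ask an LLM for this plan."""
    return {
        "camera": [Keyframe(0.0, (0.0, 1.5, -5.0)), Keyframe(4.0, (2.0, 1.5, -5.0))],
        "car": [Keyframe(0.0, (-4.0, 0.0, 0.0)), Keyframe(4.0, (4.0, 0.0, 0.0))],
    }


def control_signals(plan: dict[str, list[Keyframe]], fps: int, duration: float) -> list[dict[str, Vec3]]:
    """Stage 2 input: one dict of world-space positions per frame."""
    n_frames = int(duration * fps) + 1
    return [
        {name: interpolate(kfs, i / fps) for name, kfs in plan.items()}
        for i in range(n_frames)
    ]


plan = orchestrate("a car drives past while the camera pans right")
frames = control_signals(plan, fps=2, duration=4.0)
```

Because the camera path and the object path live in the same plan, the video generator never has to infer motion on its own; it only has to render what the plan specifies for each frame.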
Key points
Decouples semantic motion orchestration (via LLM-driven 3D trajectories) from visual generation, preserving physical logic and appearance stability.
Achieves persistent dynamic object memory, maintaining exact visual identities after objects leave and re-enter the scene.
Enables unrestricted viewpoint exploration through coordinated camera movement control.
Demonstrates synthesis of complex, extended events with high controllability, overcoming limitations of previous world models.
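One way to picture the persistent-memory point is the sketch below, written under strong simplifying assumptions. `TrackedObject`, `WorldMemory`, `is_visible`, and `visible_conditions` are hypothetical names, the camera is fixed to look along +z with no rotation, and an opaque string stands in for whatever appearance representation the framework stores. The idea shown is that an object's identity lives in world state rather than in the rendered frames, so leaving the field of view does not erase it.

```python
import math
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class TrackedObject:
    identity: str    # stable key into the appearance memory
    position: Vec3   # world-space position, updated even while off-screen


@dataclass
class WorldMemory:
    # identity -> appearance reference (an opaque placeholder in this sketch)
    appearance: dict[str, str] = field(default_factory=dict)

    def register(self, identity: str, appearance_ref: str) -> None:
        # The first registration wins, so a later call cannot overwrite an identity.
        self.appearance.setdefault(identity, appearance_ref)


def is_visible(camera_pos: Vec3, point: Vec3, fov_deg: float = 90.0) -> bool:
    """Horizontal field-of-view test for a camera looking along +z (assumption)."""
    dx = point[0] - camera_pos[0]
    dz = point[2] - camera_pos[2]
    if dz <= 0:
        return False  # behind the camera
    return abs(math.atan2(dx, dz)) <= math.radians(fov_deg) / 2


def visible_conditions(memory: WorldMemory, objects: list[TrackedObject], camera_pos: Vec3) -> dict[str, str]:
    """Appearance references for the objects currently in view."""
    return {
        o.identity: memory.appearance[o.identity]
        for o in objects
        if is_visible(camera_pos, o.position)
    }


memory = WorldMemory()
memory.register("red_car", "appearance_ref_001")
camera = (0.0, 0.0, -5.0)

# The car drives out of view to the right, then comes back.
xs = [0.0, 4.0, 8.0, 12.0, 8.0, 4.0, 0.0]
timeline = [
    visible_conditions(memory, [TrackedObject("red_car", (x, 0.0, 0.0))], camera)
    for x in xs
]
```

In this toy timeline the car is absent from the conditioning for three frames, yet it returns with the same appearance reference it had before it left, because the lookup goes through `memory` and not through previously rendered frames.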