WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

Abstract

WorldDirector is a controllable video world model framework that explicitly decouples semantic motion orchestration from visual generation. A large language model (LLM) coordinates 3D object trajectories and camera movements, and these trajectories then serve as control signals for a video generator. This design ensures strict physical consistency and stable appearance, and it gives dynamic objects persistent memory: an object keeps its exact visual identity even when it re-enters the scene after a long occlusion. The framework supports unrestricted viewpoint exploration and can synthesize complex, long-duration events with high controllability.
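The abstract describes a hand-off from planning to rendering: the LLM produces 3D object trajectories and camera motion, and the video generator consumes them as control signals. The summary does not say how those signals are encoded, so the following Python is only a minimal sketch under our own assumptions: each trajectory is a list of per-frame 3D points, each frame has a pinhole camera posed with a look-at target, and the "control signal" is every object's projected pixel position (or `None` when the object is behind the camera). `Camera`, `orbit_path`, and `control_signals` are hypothetical names, not part of WorldDirector.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


@dataclass
class Camera:
    """Hypothetical pinhole camera posed with a look-at target (world is y-up)."""
    eye: Vec3             # camera center in world coordinates
    target: Vec3          # world point the camera looks at
    focal: float = 500.0  # focal length in pixels (assumed)
    cx: float = 320.0     # principal point: center of an assumed 640x480 image
    cy: float = 240.0

    def project(self, point: Vec3) -> Optional[Pixel]:
        """Project a world point to pixel (u, v); None if it is behind the camera."""
        forward = _normalize(_sub(self.target, self.eye))
        # Degenerate if the camera looks straight up or down (forward parallel to y).
        right = _normalize(_cross(forward, (0.0, 1.0, 0.0)))
        down = _cross(forward, right)  # image v axis grows downward
        d = _sub(point, self.eye)
        x, y, z = _dot(d, right), _dot(d, down), _dot(d, forward)
        if z <= 1e-6:
            return None
        return (self.focal * x / z + self.cx, self.focal * y / z + self.cy)


def orbit_path(target: Vec3, radius: float, height: float, n_frames: int) -> List[Camera]:
    """One camera per frame, circling `target` at a fixed radius and height."""
    cams = []
    for i in range(n_frames):
        theta = 2.0 * math.pi * i / n_frames
        eye = (target[0] + radius * math.sin(theta),
               target[1] + height,
               target[2] + radius * math.cos(theta))
        cams.append(Camera(eye=eye, target=target))
    return cams


def control_signals(trajectories: Dict[str, List[Vec3]],
                    cameras: List[Camera]) -> List[Dict[str, Optional[Pixel]]]:
    """Per frame, map each object ID to its projected pixel position (or None)."""
    return [{oid: cam.project(traj[t]) for oid, traj in trajectories.items()}
            for t, cam in enumerate(cameras)]


if __name__ == "__main__":
    cams = orbit_path(target=(0.0, 0.0, 0.0), radius=5.0, height=1.0, n_frames=4)
    car = [(0.5 * t, 0.0, 0.0) for t in range(4)]  # object moving along +x
    for t, frame in enumerate(control_signals({"car": car}, cams)):
        print(t, frame)
```

Because the object motion and the camera path both live in 3D, changing only the camera (here, an orbit) re-projects the same motion from a new viewpoint; this is one way to read why separating motion planning from pixel generation helps keep motion consistent across viewpoints. Occlusion and image-bounds checks are deliberately omitted.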

Key Points

Decouples semantic motion orchestration (via LLM-driven 3D trajectories) from visual generation, preserving physically consistent motion and stable appearance.

Provides persistent memory for dynamic objects: each object keeps its exact visual identity after leaving and re-entering the scene.

Enables unrestricted viewpoint exploration through coordinated camera movement control.

Demonstrates synthesis of complex, extended events with high controllability, overcoming limitations of previous world models.
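The persistent-memory key point says an object keeps its exact visual identity after it leaves and re-enters the scene. The summary does not describe the mechanism, so the Python below is a hypothetical sketch, not WorldDirector's method. It assumes each planned trajectory carries a stable object ID, stores an appearance code for that ID the first time the object is visible, and returns the same code whenever the object is visible again, however long the gap. `DynamicObjectMemory`, `recall`, and `condition_codes` are illustrative names, and the random Gaussian vector stands in for whatever appearance representation a real generator would condition on.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class MemoryEntry:
    appearance: List[float]  # hypothetical appearance code, fixed at first sighting
    first_seen: int          # frame index of first appearance
    last_seen: int           # frame index of most recent appearance


class DynamicObjectMemory:
    """Hypothetical identity memory keyed by the object IDs in the motion plan."""

    def __init__(self, dim: int = 8, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.entries: Dict[str, MemoryEntry] = {}

    def recall(self, object_id: str, frame: int) -> List[float]:
        """Return the appearance code to condition on for `object_id` at `frame`."""
        entry = self.entries.get(object_id)
        if entry is None:
            # First sighting: sample a code once and store it.
            code = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
            entry = MemoryEntry(appearance=code, first_seen=frame, last_seen=frame)
            self.entries[object_id] = entry
        else:
            # Re-entry after any gap: reuse the stored code unchanged.
            entry.last_seen = frame
        return list(entry.appearance)  # copy so callers cannot mutate the memory


def condition_codes(visible: List[Iterable[str]],
                    memory: DynamicObjectMemory) -> List[Dict[str, List[float]]]:
    """For each frame, fetch codes only for the objects visible in that frame."""
    return [{oid: memory.recall(oid, t) for oid in objs}
            for t, objs in enumerate(visible)]


if __name__ == "__main__":
    # A dog is visible in frames 0-2, out of view for 97 frames, then returns at frame 100.
    visible = [{"dog"}] * 3 + [set()] * 97 + [{"dog", "cat"}]
    codes = condition_codes(visible, DynamicObjectMemory())
    print(codes[100]["dog"] == codes[0]["dog"])  # True: identity preserved
```

The design choice being illustrated is keying memory on plan-level IDs rather than re-identifying objects from pixels: in this sketch, identity is carried by the symbolic plan, so nothing about it changes while the object is out of view.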