SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE
English summary
The paper introduces SpheRoPE, a zero-shot, training-free, and optimization-free framework for generating 360° panoramic images and videos by injecting spherical priors directly into pre-trained diffusion transformers. Spherical RoPE replaces the standard rotary position embeddings (RoPE): low-frequency channels are re-parameterized as 3D Cartesian coordinates to natively encode the spherical manifold, while high-frequency channels are harmonically quantized to enforce exact periodicity. A complementary Semantic Distortion classifier-free guidance (CFG) term explicitly steers the geometry. The method is demonstrated on text-to-panorama generation with Flux.1, Flux.2, and LTX-Video backbones, achieving competitive results without any fine-tuning or inference-time optimization.
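To make the low-frequency re-parameterization described above more concrete, the following is a minimal sketch of how a token position on an equirectangular grid could be mapped to 3D Cartesian coordinates on the unit sphere. The function name `equirect_to_xyz`, the grid convention (columns span longitude, rows span latitude, rows sampled at cell centers), and the axis orientation are all assumptions made for illustration; the summary does not specify the paper's exact parameterization.

```python
import math


def equirect_to_xyz(col, row, width, height):
    """Map a grid position (col, row) on a width x height equirectangular
    panorama to a point (x, y, z) on the unit sphere.

    Hypothetical convention: columns cover longitude [0, 2*pi) and rows
    cover latitude (-pi/2, pi/2), sampled at row centers.
    """
    lon = 2.0 * math.pi * (col / width)
    lat = math.pi * ((row + 0.5) / height) - math.pi / 2.0
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)


if __name__ == "__main__":
    width, height = 8, 4
    left = equirect_to_xyz(0, 1, width, height)
    right = equirect_to_xyz(width - 1, 1, width, height)
    neighbor = equirect_to_xyz(1, 1, width, height)
    # On the sphere, the last column is as close to column 0 as column 1 is.
    print(math.dist(left, right), math.dist(left, neighbor))
```

Under this convention the left and right image borders are adjacent on the sphere, so the seam of the flat grid disappears from the positional signal. That is the property a spherical encoding is meant to provide, although the actual channel layout inside SpheRoPE may differ.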
Chinese summary
该论文提出 SpheRoPE,一种零样本、无需训练和优化的框架,通过将球面先验直接注入预训练扩散变换器来生成360°全景图像和视频。Spherical RoPE 替代了标准旋转位置嵌入:低频通道被重新参数化为三维笛卡尔坐标以原生编码球面流形,高频通道则通过谐波量化强制严格周期性。附加的语义畸变无分类器引导(CFG)显式引导几何结构。该方法在 Flux.1、Flux.2 和 LTX-Video 等骨干网络上实现文本到全景生成,在不进行任何微调或推理时优化的情况下取得竞争性能。
Key points
Proposes a zero-shot, training-free, and optimization-free framework for 360° panorama generation using pre-trained diffusion transformers.
提出一种零样本、无需训练和优化的框架,利用预训练扩散变换器进行360°全景生成。
Introduces Spherical RoPE, which replaces standard rotary position embeddings: low-frequency channels become 3D Cartesian coordinates for spherical manifold encoding, and high-frequency channels are harmonically quantized for exact periodicity.
提出 Spherical RoPE 替代标准旋转位置嵌入:低频通道转化为三维笛卡尔坐标以编码球面流形,高频通道经谐波量化实现严格周期性。
Adds Semantic Distortion classifier-free guidance to explicitly steer generation toward geometric consistency.
增加语义畸变无分类器引导以显式操控几何一致性。
Demonstrates generality across Flux.1, Flux.2, and LTX-Video backbones for text-to-panorama generation, achieving competitive performance without any retraining.
在 Flux.1、Flux.2 和 LTX-Video 骨干上展示文本到全景的通用性,无需重新训练即达到竞争性能。
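The harmonic quantization mentioned in the key points can be illustrated with a small sketch. It assumes the common RoPE frequency schedule `base ** (-2i / dim)` and snaps each frequency to the nearest integer multiple of `2*pi / width`, so that every rotation angle repeats exactly after `width` positions. The function names, the schedule, and the clamping of very low frequencies to the fundamental are illustrative assumptions, not details taken from the paper, which handles low-frequency channels through the Cartesian re-parameterization instead.

```python
import math


def rope_frequencies(dim, base=10000.0):
    """Common RoPE schedule (assumed here): one frequency per channel pair."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def quantize_to_harmonics(freqs, width):
    """Snap each frequency to the nearest integer multiple of 2*pi / width.

    After snapping, freq * width is an integer multiple of 2*pi, so the
    rotation applied at position `width` equals the one at position 0.
    Frequencies that would round to zero are clamped to the fundamental;
    this is a simplification for the sketch only.
    """
    fundamental = 2.0 * math.pi / width
    return [max(1, round(f / fundamental)) * fundamental for f in freqs]


if __name__ == "__main__":
    width = 64
    original = rope_frequencies(8)
    snapped = quantize_to_harmonics(original, width)
    for f, q in zip(original, snapped):
        # Angle after one full trip around the panorama, before and after.
        print(f"{f:.4f} -> {q:.4f}, cos(q * width) = {math.cos(q * width):.6f}")
```

With the original frequencies, the angle after `width` positions generally does not land on a multiple of 2π, so the embedding at the right border disagrees with the one at the left border. After snapping, the two agree up to floating-point error, which is what "exact periodicity" requires along the longitude axis.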