Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

1 item

MEDIUM | LARGE LANGUAGE MODELS | Jun 11, 2026

GELATO: The Frozen Towers Approach to Multimodal Embeddings

GELATO investigates extending a strong pre-trained text embedding model to multimodal data instead of training a new model from scratch. The text encoder (the 'text tower') stays frozen, while separate modality-specific encoders are trained to map images, audio, or other modalities into the same embedding space. This 'frozen towers' strategy reuses the model's existing text understanding and avoids retraining the core model. The blog post outlines the method and its motivation: efficient multimodal representation learning.
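
The frozen-towers idea can be illustrated with a toy sketch. This is not GELATO's implementation: the linear "towers", the random paired data, the mean-squared-error alignment loss, and every name below (`text_tower`, `image_tower`, `alignment_loss`, `train`) are assumptions made for illustration only. The summary does not say which loss or architecture the post uses. The sketch only shows the core mechanic: the text tower's weights are never updated, and only the new modality's encoder is trained to land in the text embedding space.

```python
import random

random.seed(0)

# Hypothetical toy sizes; not taken from the GELATO post.
TEXT_DIM, IMAGE_DIM, N_PAIRS = 4, 8, 5


def rand_vector(n):
    return [random.uniform(-1, 1) for _ in range(n)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Frozen "text tower": a fixed linear map standing in for a pre-trained
# text embedding model. Its weights are never touched during training.
FROZEN_TEXT_W = [rand_vector(TEXT_DIM) for _ in range(TEXT_DIM)]
frozen_snapshot = [row[:] for row in FROZEN_TEXT_W]


def text_tower(text_features):
    return matvec(FROZEN_TEXT_W, text_features)


# Trainable "image tower": a linear map from image features into the
# text embedding space. This is the only set of weights that is updated.
image_w = [[0.0] * IMAGE_DIM for _ in range(TEXT_DIM)]


def image_tower(image_features):
    return matvec(image_w, image_features)


# Toy paired data: (image features, text features) for the same item.
pairs = [(rand_vector(IMAGE_DIM), rand_vector(TEXT_DIM)) for _ in range(N_PAIRS)]


def alignment_loss():
    """Mean squared distance between image and frozen text embeddings."""
    total = 0.0
    for image_features, text_features in pairs:
        target = text_tower(text_features)
        predicted = image_tower(image_features)
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
    return total / N_PAIRS


def train(steps=200, lr=0.05):
    """Plain SGD on the image tower only; the text tower stays frozen."""
    for _ in range(steps):
        for image_features, text_features in pairs:
            target = text_tower(text_features)
            predicted = image_tower(image_features)
            for i in range(TEXT_DIM):
                error = predicted[i] - target[i]
                for j in range(IMAGE_DIM):
                    # Gradient of the squared error w.r.t. image_w[i][j].
                    image_w[i][j] -= lr * 2 * error * image_features[j]


loss_before = alignment_loss()
train()
loss_after = alignment_loss()
print(f"alignment loss before: {loss_before:.4f}, after: {loss_after:.4f}")
print("text tower unchanged:", FROZEN_TEXT_W == frozen_snapshot)
```

In a real system the towers would be neural encoders and the objective would more likely be contrastive than a plain squared error, but the division of labor is the same: gradients flow only into the modality-specific encoder.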