Open & Repurposable: Foundation Models for the Automotive Industry
Real-world perception for autonomous driving (AD) requires models that can learn from massive, uncurated, multi-sensor datasets with minimal reliance on costly labeling. Self-supervised learning (SSL) is key to training foundation models in this regime. This talk moves beyond canonical SSL to present open-source foundation models from our team. We will highlight key examples, including Franca, our fully open-source foundation vision encoder (data, models, and code), and VaViM/VaVAM, our open generative video and video-action models. We will focus on how these models can be built on top of one another, creating a composable stack for perception, and show how this enables advanced applications such as cross-sensor distillation, auto-labeling, and architecture repurposing, ultimately saving time and money while improving performance.
This work exemplifies why open models are critical for developing the next generation of annotation-efficient and reliable autonomous systems.
