Open & Repurposable: Foundation Models for the Automotive Industry
Real-world perception for autonomous driving (AD) requires models that can learn from massive, uncurated, multi-sensor datasets with minimal reliance on costly labeling. Self-supervised learning (SSL) is key to training foundation models in this regime. This talk moves beyond canonical SSL to present open-source foundation models from our team. We will highlight key examples, including Franca, our fully open-source foundation vision encoder (data, models, and code), and VaViM/VaVAM, our open generative video and video-action models. We will focus on how these models can be built on top of one another, creating a composable stack for perception, and show how this enables advanced applications such as cross-sensor distillation, auto-labeling, and architecture repurposing, ultimately saving time and money while improving performance.
This work exemplifies why open models are critical for developing the next generation of annotation-efficient and reliable autonomous systems.
