Cracking product taxonomy at scale: A multimodal, zero-shot approach

Central room
Faster

Modern retail platforms rely on complex product taxonomies to power sourcing, merchandising, logistics, pricing, and search. But Veepee’s taxonomy is hard to automate fully: it has roughly 1,400 nodes, evolves constantly, is non-MECE (its categories are not mutually exclusive and collectively exhaustive), and is highly imbalanced.

In this talk, Anis Gandoura (Principal AI, Veepee) presents how his team moved from a zero-shot CLIP baseline to a production-ready multimodal classifier that reaches or exceeds human-level accuracy.

He will walk through:

- how zero-shot CLIP performs on real, high-noise retail data,

- how contextualizing category embeddings boosts performance,

- and how a multimodal LLM combined with a “label book” reaches 94% Top-1 accuracy without heavy training.
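To make the first two points concrete, here is a minimal sketch of zero-shot classification by embedding similarity, with category labels expanded using their taxonomy path. It is an illustration under stated assumptions, not the method presented in the talk: the prompt template, the function names (`contextual_prompt`, `zero_shot_top1`), and the toy vectors are all hypothetical. In a real system the embeddings would come from CLIP's image and text encoders.

```python
import math


def contextual_prompt(path):
    """Turn a taxonomy path such as ["Home", "Kitchen", "Pans"] into a text prompt.

    Assumption: context is added by appending the parent path to the leaf label.
    The talk does not specify the template actually used.
    """
    leaf, parents = path[-1], path[:-1]
    if not parents:
        return f"a photo of {leaf}"
    return f"a photo of {leaf}, in the category {' > '.join(parents)}"


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_top1(item_embedding, category_embeddings):
    """Return the (category, score) pair whose embedding is closest to the item."""
    scored = ((name, cosine(item_embedding, emb))
              for name, emb in category_embeddings.items())
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins for encoder outputs (hypothetical values, not real CLIP embeddings).
item = [0.9, 0.1, 0.0]
categories = {"Pans": [1.0, 0.0, 0.0], "Sneakers": [0.0, 1.0, 0.0]}

print(contextual_prompt(["Home", "Kitchen", "Pans"]))
print(zero_shot_top1(item, categories))
```

With these toy vectors the item is assigned to "Pans". No training is involved: adding or renaming a category only requires embedding one new prompt, which is why this kind of approach can follow a taxonomy that keeps changing.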

The result: a flexible, low-maintenance system that scales with taxonomy evolution, cuts manual corrections, reduces time-to-list, and improves downstream data quality.