Inference Everywhere: Optimizing Performance
Creativity Room
Steeve Morin of ZML will walk through how to build inference systems that run efficiently across a variety of hardware — from edge devices to servers — without sacrificing latency, throughput, or reliability. He will dive into architectural decisions, hardware constraints, trade-offs between generality and specialization, and strategies for making inference flexible, portable, and performant. Expect practical insights from his work at ZML and lessons learned from putting inference into production at scale.
