Real-time AI: building 100x faster inference
Inference speed is the bottleneck holding back the next generation of AI applications. Agentic workflows, real-time creation, and test-time compute scaling all demand much faster token generation at runtime.
This talk presents Kog's path to 100x faster AI inference through systematic GPU optimization and novel Transformer architecture techniques that work around hardware constraints. When AI operates at 10,000 or 100,000 tokens per second, computers stop being static rule-execution engines and become smart, real-time adaptive platforms. Programming in natural language becomes practical. Complex games and applications generate themselves dynamically. Agents and Deep Research become instant.
Gaël, founder and CEO of Kog, will share the research insights and GPU engineering breakthroughs used to build the future of real-time AI computing.