IPAB Workshop - 15/1/26

Title: What’s Holding Video Understanding Back? A Tale of Two Bottlenecks

Abstract: Video understanding technology has made huge progress over the last few years. It has gone from struggling to recognise actions from a predefined set to generating captions for a video with a reasonable degree of accuracy. However, zero-shot tasks remain very challenging: even the most sophisticated versions of Gemini or ChatGPT struggle to count steps, understand hand gestures or recognise subtle emotions, and they struggle far more with video than with other modalities such as images or text. In this talk I will explore the core issues of video understanding technology and propose a few solutions and directions to overcome them.