3D-Aware Representation Learning

Evaluating Monocular Depth Perception in Large Vision Models

The world in which we interact, navigate, and conduct visual inference is fundamentally three-dimensional (3D). The objects and structures in it also have intrinsic 3D properties: they have distinct shapes, sizes, and material compositions, and may be rigid or deformable. Despite this clear structure, the vast majority of current large-scale image understanding approaches ignore explicit shape information. The goal of this project is to develop new approaches for visual reasoning that are grounded in learned, general-purpose, 3D-aware representations. If successful, the project will yield a new understanding of how 3D information is encoded in current self-supervised methods and will devise new approaches for injecting 3D priors into visual representation learning algorithms.
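A common recipe for measuring how much 3D information a pretrained model encodes is to freeze its backbone and train a small probe on top of its features for a depth-related task. The sketch below illustrates this general idea, not the exact protocol of the paper: it assumes a frozen DINOv2 ViT-S/14 backbone loaded via torch.hub, a linear probe over concatenated features of an image pair, and random placeholder images and "which is closer" labels standing in for a real annotated benchmark such as DepthCues.

```python
# Minimal linear-probe sketch (illustrative; inputs and labels are placeholders).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen self-supervised backbone: we only read out its features.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device)
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

feat_dim = 384  # CLS embedding size for ViT-S/14

# Linear probe over a pair of images: predicts which one is closer to the camera.
probe = nn.Linear(2 * feat_dim, 2).to(device)
opt = torch.optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch: in practice these would be crops around two annotated
# points, with a ground-truth relative-depth label.
imgs_a = torch.randn(8, 3, 224, 224, device=device)
imgs_b = torch.randn(8, 3, 224, 224, device=device)
labels = torch.randint(0, 2, (8,), device=device)

for step in range(10):
    with torch.no_grad():
        fa = backbone(imgs_a)  # (B, feat_dim) frozen CLS features
        fb = backbone(imgs_b)
    logits = probe(torch.cat([fa, fb], dim=1))
    loss = loss_fn(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Probe accuracy on held-out pairs then serves as a proxy for how much relative-depth information is linearly decodable from the frozen representation; because the backbone is never fine-tuned, differences across models reflect their pretrained features rather than task-specific learning.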

D. Danier, M. Aygün, C. Li, H. Bilen, and O. Mac Aodha. DepthCues: Evaluating Monocular Depth Perception in Large Vision Models. arXiv preprint arXiv:2411.17385, 2024.

"DepthCues: Evaluating Monocular Depth Perception in Large Vision Models" by Duolikun Danier at the VisualAI group in Oxford, Dec 2024