Evaluating Monocular Depth Perception in Large Vision Models

Team Members

Duolikun Danier, Changjian Li, Hakan Bilen, and Oisin Mac Aodha

Project Summary

The world in which we interact, navigate, and conduct visual inference is fundamentally three-dimensional (3D). Furthermore, the objects and structures present in the world have their own intrinsic 3D properties: they have distinct shapes, sizes, and material properties, and may be rigid or deformable. Despite this clear structure, the vast majority of current large-scale image understanding approaches ignore explicit shape information. The goal of this project is to develop new approaches for visual reasoning that are grounded in learned, general-purpose, 3D-aware representations. If successful, the project will yield a new understanding of how 3D information is encoded in current self-supervised methods and will produce new approaches for injecting 3D priors into visual representation learning algorithms.

Publications

Danier, Aygün, Li, Bilen, and Mac Aodha. DepthCues: Evaluating Monocular Depth Perception in Large Vision Models. arXiv preprint arXiv:2411.17385, 2024.

Talks

"DepthCues: Evaluating Monocular Depth Perception in Large Vision Models" by Duolikun Danier at the VisualAI group, Oxford, Dec 2024.

This article was published on 2025-01-08.