Detail of the student project

Topic: Using single view depth and normal prediction for fast camera pose estimation
Department: Visual Recognition Group
Supervisor: Torsten Sattler
Announce as: Master's thesis, Semester project
Description: Visual localization is the problem of estimating the position and
orientation from which an image was taken in a scene. Solving the
localization problem is essential for a wide range of applications,
including self-driving cars and other robots as well as Augmented /
Mixed / Virtual Reality systems.
A standard approach to the visual localization problem is to
represent the scene through a 3D model. Given a test image, 2D-3D
correspondences are then established between feature positions in the
image and 3D points in the model. These 2D-3D matches can then be used
for camera pose estimation, e.g., inside a RANSAC loop. In very large or
complex scenes, the majority of 2D-3D matches are incorrect,
complicating the pose estimation process. While outlier filtering
techniques exist, they are either complicated to implement [Zeisl et
al., 2015] or have quadratic computational complexity [Camposeco et al.,
2017], making them unsuitable for practical applications.
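To illustrate why a high fraction of incorrect matches complicates
RANSAC-based pose estimation, the following minimal sketch evaluates the
standard RANSAC iteration-count formula. The function name, its parameters,
and the sample size of 3 used in the usage note are illustrative
assumptions, not part of the project description.

```python
import math


def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Number of RANSAC iterations needed to draw at least one all-inlier
    sample with the given confidence (standard formula, illustrative only)."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Probability that one random minimal sample contains only inliers.
    p_good = inlier_ratio ** sample_size
    if p_good >= 1.0:
        return 1
    if p_good <= 0.0:
        raise ValueError("inlier_ratio is too small to evaluate numerically")
    # log1p(-p) computes log(1 - p) accurately for small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_good))
```

For example, comparing `ransac_iterations(0.5, 3)` with
`ransac_iterations(0.05, 3)` shows how quickly the required number of
iterations grows as the inlier ratio drops, assuming a minimal solver that
samples three matches.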
Given the distance between the camera and the 3D point in a 2D-3D match
and an estimate of the point’s surface normal in the local
coordinate system of the test image, one can compute the full camera
pose from a single 2D-3D match. Traditionally, such information has not
been available. Yet, recent advances in deep neural networks
allow the prediction of depth [Ranftl et al., 2019] and surface normals
[Zhang et al., 2019] from a single image. However, such predictions are
typically noisy. Thus, the resulting camera pose estimate will not be
very accurate. The goal of this thesis is to explore using the poses
computed from these predictions as rough estimates of the true camera
pose that drive an outlier filter for detecting incorrect matches. The remaining matches can
then be used by classical pose estimation approaches. The focus of the
thesis is on developing and implementing an efficient outlier filter.
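One possible form of such a filter is sketched below, under the assumption
that a rough pose (rotation R, translation t) and pinhole intrinsics are
already available: matches are kept only if the 3D point reprojects close
to its 2D feature. The function names, the match layout, and the pixel
threshold are hypothetical choices for illustration; the thesis may well
arrive at a different design.

```python
import math


def project(R, t, K, X):
    """Project world point X into the image with pose (R, t) and pinhole
    intrinsics K = (fx, fy, cx, cy). Returns None if X is behind the camera."""
    # Transform into the camera frame: Xc = R * X + t.
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        return None
    fx, fy, cx, cy = K
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def filter_matches(matches, R, t, K, max_error_px):
    """Keep 2D-3D matches whose reprojection error under the rough pose is
    at most max_error_px. Each match is ((u, v), (X, Y, Z))."""
    kept = []
    for (u, v), X in matches:
        p = project(R, t, K, X)
        if p is None:
            continue  # point behind the camera: treat as an outlier
        if math.hypot(p[0] - u, p[1] - v) <= max_error_px:
            kept.append(((u, v), X))
    return kept
```

Because the rough pose is expected to be noisy, the threshold would likely
need to be much looser than the one used in the final pose estimation; the
surviving matches can then be passed to a classical RANSAC-based solver.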
Bibliography: [Zeisl et al., Camera Pose Voting for Large-Scale Image-Based
Localization, ICCV 2015]
[Camposeco et al., Toroidal Constraints for Two Point Localization Under
High Outlier Ratios, CVPR 2017]
[Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing
Datasets for Zero-shot Cross-dataset Transfer, arXiv:1907.01341]
[Zhang et al., Pattern-Affinitive Propagation across Depth, Surface
Normal and Semantic Segmentation, CVPR 2019]
Responsible person: Petr Pošík