
Topic: | Using single view depth and normal prediction for fast camera pose estimation |
---|---|

Department: | Department of Cybernetics |

Supervisor: | Torsten Sattler |

Announce as: | Master's thesis, Semester project |

Description: | Visual localization is the problem of estimating the position and orientation from which an image was taken in a scene. Solving the localization problem is essential for a wide range of applications, including self-driving cars and other robots as well as Augmented / Mixed / Virtual Reality systems. A standard approach to the visual localization problem is to represent the scene through a 3D model. Given a test image, 2D-3D correspondences are then established between feature positions in the image and 3D points in the model. These 2D-3D matches can then be used for camera pose estimation, e.g., inside a RANSAC loop. In very large or complex scenes, most of the 2D-3D matches are incorrect, which complicates the pose estimation process. While outlier filtering techniques exist, they are either complicated to implement [Zeisl et al., 2015] or have quadratic computational complexity [Camposeco et al., 2017], making them unsuitable for practical applications. Given the distance between the camera and the 3D point in a 2D-3D match and an estimate of the point's surface normal in the local coordinate system of the test image, one can compute the full camera pose from a single 2D-3D match. Traditionally, such information has not been available. Yet, recent advances in deep neural networks allow the prediction of depth [Ranftl et al., 2019] and surface normals [Zhang et al., 2019] from a single image. However, such predictions are typically noisy, so the resulting camera pose estimate will not be very accurate. The goal of this thesis is therefore to explore using the predicted poses as rough estimates of the true camera pose in the form of an outlier filter that detects incorrect matches. The remaining matches can then be used by classical pose estimation approaches. The focus of the thesis is on developing and implementing an efficient outlier filter. |

Bibliography: | [Zeisl et al., Camera Pose Voting for Large-Scale Image-Based Localization, ICCV 2015] [Camposeco et al., Toroidal Constraints for Two Point Localization Under High Outlier Ratios, CVPR 2017] [Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, arXiv:1907.01341] [Zhang et al., Pattern-Affinitive Propagation across Depth, Surface Normal and Semantic Segmentation, CVPR 2019] |
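
To illustrate the outlier-filtering idea from the description, the following is a minimal sketch of how a rough camera pose could be used to discard inconsistent 2D-3D matches. It is not the method to be developed in the thesis. It assumes a pinhole camera with a known intrinsic matrix `K`, a rough world-to-camera pose `(R, t)` obtained by some other means, and a pixel threshold `max_error_px`; the function name and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def filter_matches_by_rough_pose(K, R, t, points2d, points3d, max_error_px):
    """Return a boolean mask marking matches consistent with a rough pose.

    Assumptions (not taken from the topic description):
      - pinhole camera with 3x3 intrinsic matrix K,
      - (R, t) maps model coordinates to camera coordinates: X_cam = R @ X + t,
      - points2d is (N, 2) in pixels, points3d is (N, 3) in model coordinates.
    """
    points2d = np.asarray(points2d, dtype=float)
    points3d = np.asarray(points3d, dtype=float)

    # Transform the 3D points into the camera frame.
    cam = points3d @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)

    # Points behind (or on) the camera plane cannot be valid matches.
    in_front = cam[:, 2] > 1e-9
    idx = np.flatnonzero(in_front)

    # Project the remaining points and measure the reprojection error.
    proj = cam[idx] @ np.asarray(K, dtype=float).T
    pixels = proj[:, :2] / proj[:, 2:3]
    errors = np.linalg.norm(pixels - points2d[idx], axis=1)

    # Keep a match only if it reprojects close to its observed 2D position.
    mask = np.zeros(len(points3d), dtype=bool)
    mask[idx] = errors <= max_error_px
    return mask
```

Because the pose predicted from noisy depth and normals is only approximate, the threshold in such a filter would have to be generous. The matches that survive would then be passed to a classical estimator, e.g., inside a RANSAC loop.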