Martin Cífka presents RoMa: Robust Dense Feature Matching

On 2024-06-25 11:00:00 at
Feature matching is an important computer vision task that involves estimating
correspondences between two images of a 3D scene, and dense methods estimate all
such correspondences. The aim is to learn a robust model, i.e., a model able to
match under challenging real-world changes. In this work, we propose such a
model, leveraging frozen pretrained features from the foundation model DINOv2.
Although these features are significantly more robust than local features
trained from scratch, they are inherently coarse. We therefore combine them with
specialized ConvNet fine features, creating a precisely localizable feature
pyramid. To further improve robustness, we propose a tailored transformer match
decoder that predicts anchor probabilities, which enables it to express
multimodality. Finally, we propose an improved loss formulation through
regression-by-classification with subsequent robust regression. We conduct a
comprehensive set of experiments that show that our method, RoMa, achieves
significant gains, setting a new state of the art. In particular, we achieve a
36% improvement on the extremely challenging WxBS benchmark.
Responsible person: Petr Pošík
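
To illustrate why predicting anchor probabilities lets a decoder express multimodality, here is a minimal sketch. It is not the authors' implementation. It assumes a simplified 1D setting with a handful of evenly spaced anchors, and the names `softmax`, `expected_value`, `decode_match`, and `radius` are hypothetical. With two plausible matches, averaging over all anchors lands between them, while decoding around the most probable anchor stays on one of them.

```python
import math


def softmax(logits):
    """Turn per-anchor scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(logits, anchors):
    """Plain regression-style decoding: probability-weighted mean of all anchors."""
    probs = softmax(logits)
    return sum(p * a for p, a in zip(probs, anchors))


def decode_match(logits, anchors, radius=1):
    """Mode-based decoding: pick the most probable anchor, then refine with a
    probability-weighted mean over its neighbours within `radius` anchors."""
    probs = softmax(logits)
    mode = max(range(len(probs)), key=probs.__getitem__)
    lo = max(0, mode - radius)
    hi = min(len(probs), mode + radius + 1)
    mass = sum(probs[lo:hi])
    return sum(p * a for p, a in zip(probs[lo:hi], anchors[lo:hi])) / mass


# Assumed toy setup: 8 anchors evenly spaced on [0, 1] and a bimodal score
# vector, i.e. two candidate matches at opposite ends of the image row.
anchors = [i / 7 for i in range(8)]
logits = [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.8]

mean_estimate = expected_value(logits, anchors)  # falls between the two modes
mode_estimate = decode_match(logits, anchors)    # stays near the stronger mode
print(mean_estimate, mode_estimate)
```

In this toy case the mean-based estimate points at a location that neither candidate supports, which is the failure mode a classification over anchors avoids.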