Detail of the student project

Topic: Object detection for day-night visual localization
Department: Department of Cybernetics (Katedra kybernetiky)
Supervisor: Assia Benbihi
Offered as: Master's thesis, Bachelor's thesis, Semester project
Description: Establishing pixel-level correspondences between images is a fundamental computer vision task shared by 3D reconstruction, scene recognition, and visual localization. These tasks underpin applications such as Google Photo Location and Google Live View, as well as most autonomous systems, such as self-driving cars and autonomous robots.

The standard pipeline first detects keypoints in the images, computes a descriptor for each keypoint, and then matches keypoints across images by descriptor distance. The main challenge is to design features that are invariant to appearance changes such as illumination or viewpoint. Although there has been impressive work on such invariant features, extreme variations such as day-night or seasonal changes remain a challenge. Rather than investigating new features, we developed the first method that guides feature matching with visual cues that are more robust to appearance changes, such as objects. The method has been tested on urban scenes with windows as guiding objects and performed extremely well. The next goal is to detect other objects the method can rely on, such as statues, doors, and poles.
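The matching step of the standard pipeline can be sketched as follows. This is a minimal illustration of generic descriptor matching, not the object-guided method described above: it assumes descriptors are already extracted as NumPy arrays, and the nearest-neighbour ratio test (threshold 0.8) and the mutual-nearest-neighbour check are common filtering choices picked here for illustration.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match descriptors by L2 distance with a ratio test and a mutual-NN check.

    desc_a: (N, D) array, desc_b: (M, D) array with M >= 2.
    Returns a list of (i, j) pairs: desc_a[i] matches desc_b[j].
    """
    # Pairwise Euclidean distances, shape (N, M); fine for small sets only
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    # Indices of the two nearest neighbours in desc_b for each row of desc_a
    nn_idx = np.argsort(dists, axis=1)[:, :2]
    rows = np.arange(len(desc_a))
    best = dists[rows, nn_idx[:, 0]]
    second = dists[rows, nn_idx[:, 1]]
    # Nearest neighbour in desc_a for each descriptor in desc_b
    back = np.argmin(dists, axis=0)
    matches = []
    for i in range(len(desc_a)):
        j = int(nn_idx[i, 0])
        # Keep the match only if it is distinctive and mutual
        if best[i] < ratio * second[i] and back[j] == i:
            matches.append((i, j))
    return matches
```

Under extreme day-night changes, many true correspondences fail exactly this kind of distance-based filter, which is the motivation for guiding the matching with more stable cues such as objects.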

There are currently three contributions to be made (one per student):
1. Develop a 'lazy' annotation tool.
2. Minimize the annotation burden.
3. Train a detection network on an unbalanced dataset.
All topics, if done well, will result in valuable open-source code/publications.

1. Develop a 'lazy' annotation tool: starting from existing work (e.g. labelme), the first goal is to get the tool running on specific datasets and labels (e.g. windows and statues in a Structure-from-Motion (SfM) dataset). The second goal is to develop a label propagation technique that exploits the 3D geometry of the scene: using 3D models of city scenes, annotations made in one image are propagated to other images, which reduces the annotation load.
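One possible form of geometric label propagation is to project the 3D points of an annotated object into another registered view and take their 2D bounding box. The sketch below assumes a pinhole camera model x ~ K(RX + t) with known intrinsics K and pose (R, t), as provided by an SfM reconstruction; `propagate_box` is a hypothetical helper, and it ignores occlusion, which a real tool would have to handle (e.g. with a depth test against the 3D model).

```python
import numpy as np

def project_points(points_w, K, R, t):
    """Project (N, 3) world points with a pinhole camera x ~ K (R X + t)."""
    points_c = points_w @ R.T + t          # world -> camera coordinates
    in_front = points_c[:, 2] > 0          # drop points behind the camera
    uvw = points_c[in_front] @ K.T         # camera -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]        # perspective division

def propagate_box(points_w, K, R, t, width, height):
    """Hypothetical helper: turn an object's 3D points into a 2D box in another view.

    Returns (x_min, y_min, x_max, y_max) clipped to the image, or None when
    the object is behind the camera, outside the image, or degenerate.
    Occlusion is NOT handled.
    """
    uv = project_points(points_w, K, R, t)
    if len(uv) == 0:
        return None
    x_min, y_min = uv.min(axis=0)
    x_max, y_max = uv.max(axis=0)
    # Clip to the image bounds
    x_min, x_max = max(x_min, 0.0), min(x_max, width - 1.0)
    y_min, y_max = max(y_min, 0.0), min(y_max, height - 1.0)
    if x_min >= x_max or y_min >= y_max:
        return None
    return (float(x_min), float(y_min), float(x_max), float(y_max))
```

With K = [[100, 0, 50], [0, 100, 50], [0, 0, 1]], an identity pose, and object points (-1, -1, 10) and (1, 1, 10), the propagated box in a 100 x 100 image is (40, 40, 60, 60).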

2. Minimize the annotation burden: the goal is to define an efficient annotation protocol. We want to use the 3D information of a scene to annotate only a minimal set of images and propagate the annotations to the whole scene. This amounts to finding the smallest set of images that covers the scene, under the constraint that these images are easy for a human operator to annotate.
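The covering problem above resembles weighted set cover, which is NP-hard in general. A common baseline is the greedy heuristic sketched here, which is not guaranteed to return the minimal set. Assumptions: "covering the scene" is modelled as seeing every object instance (or 3D point) at least once, visibility comes from the SfM model, and `cost` is a hypothetical per-image proxy for how hard the image is to annotate (lower is easier, must be positive).

```python
def greedy_annotation_set(visibility, cost=None):
    """Greedy weighted set cover over images.

    visibility: dict image_id -> set of object ids visible in that image.
    cost: optional dict image_id -> positive annotation cost (default 1.0).
    Returns the selected image ids in selection order.
    """
    cost = cost or {}
    # Everything that must be covered: the union of all visible objects
    uncovered = set().union(*visibility.values())
    selected = []
    while uncovered:
        # Pick the image that covers the most uncovered objects per unit cost
        best = max(
            (img for img in visibility if img not in selected),
            key=lambda img: len(visibility[img] & uncovered) / cost.get(img, 1.0),
        )
        selected.append(best)
        uncovered -= visibility[best]
    return selected
```

For example, with visibility {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1, 2, 3, 4, 5}}, uniform costs select ["d"], while giving "d" a cost of 3 (say, a cluttered night image) selects ["a", "c"] instead.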

3. Train a detection network on an unbalanced dataset: using the annotations already gathered, the goal is to train a detection network. The main challenge comes from class imbalance in the dataset, which is still an open problem for neural network training. For example, there are many more windows than doors. Possible solutions to explore include loss weighting, prioritized data sampling, hierarchical detection, and network branching.
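Two of the listed remedies, loss weighting and prioritized sampling, can be sketched with inverse class frequencies. This is an illustrative baseline, not a recommended setting: the exponent `power` (0.5 gives the often-used square-root smoothing) and the rule of weighting an image by its rarest class are assumptions made here.

```python
from collections import Counter

def inverse_frequency_weights(labels, power=1.0):
    """Per-class loss weights proportional to (1 / count) ** power, normalised to mean 1."""
    counts = Counter(labels)
    raw = {c: (1.0 / n) ** power for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

def image_sampling_weights(image_labels, class_weights, empty_weight=0.0):
    """Prioritised sampling: weight each image by its highest-weighted (rarest) class.

    Images without annotations get empty_weight (never sampled if 0.0).
    """
    return [
        max((class_weights[c] for c in labels), default=empty_weight)
        for labels in image_labels
    ]
```

With 8 window and 2 door instances, the class weights are 0.4 for windows and 1.6 for doors, so an image containing a door is sampled four times as often as one with windows only. In PyTorch, such weights could be passed to the `weight` argument of `torch.nn.CrossEntropyLoss` and to `torch.utils.data.WeightedRandomSampler`, respectively; whether this helps on the gathered annotations is an open question for the project.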
Responsible person: Petr Pošík