Konstantin Khokhlov presents "Is Pseudo-Lidar needed for Monocular 3D Object detection?"
On 2022-02-17 11:00:00 at https://feectu.zoom.us/j/98555944426
Is Pseudo-Lidar needed for Monocular 3D Object detection?
Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, Adrien Gaidon
Recent progress in 3D object detection from single images leverages monocular
depth estimation as a way to produce 3D pointclouds, turning cameras into
pseudo-lidar sensors. These two-stage detectors improve with the accuracy of
the intermediate depth estimation network, which can itself be improved without
manual labels via large-scale self-supervised learning. However, they tend to
suffer from overfitting more than end-to-end methods, are more complex, and the
gap with similar lidar-based detectors remains significant. In this work, we
propose an end-to-end, single stage, monocular 3D object detector, DD3D, that
can benefit from depth pre-training like pseudo-lidar methods, but without
their limitations. Our architecture is designed for effective information transfer
between depth estimation and 3D detection, allowing us to scale with the amount
of unlabeled pre-training data. Our method achieves state-of-the-art results on
two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians
(respectively) on the KITTI-3D benchmark, and 41.5% mAP on NuScenes.
Published at ICCV 2021
Url: https://arxiv.org/abs/2108.06417
See the reading group page:
http://cmp.felk.cvut.cz/~toliageo/rg/index.html