Filipe Gama presents "Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods"
On 2025-09-25 at 11:00, G205, Karlovo náměstí 13, Praha 2
Presentation of: Gama, F.; Misar, M.; Navara, L.; Popescu, S. T. & Hoffmann, M.
(2025), 'Automatic infant 2D pose estimation from videos: comparing seven deep
neural network methods', Behavior Research Methods 57(280).
Abstract:
Automatic markerless estimation of infant posture and motion from ordinary
videos carries great potential for movement studies "in the wild", facilitating
understanding of motor development and massively increasing the chances of early
diagnosis of disorders. Human pose estimation methods in computer vision are
developing rapidly thanks to advances in deep learning and machine learning.
However, these methods are trained on datasets featuring adults in different
contexts. This work tests and compares seven popular methods
(AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose,
OpenPose, and ViTPose) on videos of infants in supine position and in more
complex settings. Surprisingly, all methods except DeepLabCut and MediaPipe have
competitive performance without additional finetuning, with ViTPose performing
best. In addition to standard performance metrics (average precision and
recall), we introduce errors expressed as a fraction of the neck-to-mid-hip
distance (torso length), and we further study missing and redundant detections
and the reliability of each method's internal confidence ratings, which are
relevant for
downstream tasks. Among the networks with competitive performance, only
AlphaPose could run close to real time (27 fps) on our machine. We provide
documented Docker containers or instructions for all the methods we used, our
analysis scripts, and the processed data at
https://hub.docker.com/u/humanoidsctu and https://osf.io/x465b/.
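
For readers curious about the torso-ratio metric mentioned in the abstract, the
following Python sketch illustrates the general idea: keypoint errors are
divided by the ground-truth neck-to-mid-hip distance, making them independent
of image resolution and infant size. This is a minimal illustration under
assumed conventions (pixel coordinates, placeholder NECK/MID_HIP keypoint
indices), not the paper's implementation; the actual analysis scripts are
available at https://osf.io/x465b/.

    import numpy as np

    # Hypothetical keypoint indices; the actual skeleton convention
    # depends on the pose-estimation method being evaluated.
    NECK, MID_HIP = 0, 1

    def torso_normalized_errors(pred, gt):
        """Per-keypoint localization error in torso-length units.

        pred, gt: (K, 2) arrays of predicted and ground-truth keypoint
        pixel coordinates. Normalizing by the ground-truth
        neck-to-mid-hip distance makes errors comparable across image
        resolutions and infant sizes.
        """
        pred, gt = np.asarray(pred, float), np.asarray(gt, float)
        torso = np.linalg.norm(gt[NECK] - gt[MID_HIP])
        if torso == 0:
            raise ValueError("degenerate torso: neck and mid-hip coincide")
        return np.linalg.norm(pred - gt, axis=1) / torso

    # Example: two keypoints, each off by 10 px, with a 100 px torso
    gt = [[50, 40], [50, 140]]        # neck, mid-hip
    pred = [[60, 40], [50, 130]]
    print(torso_normalized_errors(pred, gt))  # -> [0.1 0.1]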
External www: https://rdcu.be/eFwac