Zdeněk Straka presents PreCNet: Next-frame video prediction based on predictive coding
On 2023-03-21 11:00:00 at G205, Karlovo náměstí 13, Praha 2
In the seminar, this article will be presented: Straka, Z.; Svoboda, T. &
Hoffmann, M. (2023), 'PreCNet: Next-frame video prediction based on predictive
coding', IEEE Transactions on Neural Networks and Learning Systems.
https://doi.org/10.1109/TNNLS.2023.3240857
Predictive coding, currently a highly influential theory in neuroscience, has
not been widely adopted in machine learning yet. In this work, we transform the
seminal model of Rao and Ballard (1999) into a modern deep learning framework
while remaining maximally faithful to the original schema. The resulting
network we propose (PreCNet) is tested on a widely used next-frame video
prediction benchmark, which consists of images from an urban environment
recorded from a car-mounted camera, and achieves state-of-the-art
performance. Performance on all measures (MSE, PSNR, and SSIM) was further
improved when a larger training set (2M images from BDD100k) was used,
pointing to the limitations of the KITTI training set. This work
demonstrates that an architecture carefully based on a neuroscience model,
without being explicitly tailored to the task at hand, can exhibit
exceptional performance.
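The core idea of Rao and Ballard's predictive coding can be illustrated with a minimal sketch: a latent representation is iteratively adjusted to reduce the error between its generated prediction and the observed input. This is only an illustrative toy (PreCNet itself is a deep hierarchy of convolutional LSTM layers; see the paper for the actual architecture):

```python
# Toy Rao-Ballard-style predictive coding step (illustrative sketch only;
# not the PreCNet architecture). A linear generative model W predicts the
# input from a latent code r; r is updated to minimize prediction error.
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=16)             # observed input (e.g. flattened patch)
W = rng.normal(size=(16, 4)) * 0.1  # generative weights: prediction = W @ r
r = np.zeros(4)                     # latent representation, starts at zero

lr = 0.05
for _ in range(200):
    e = x - W @ r                   # bottom-up prediction error
    r += lr * (W.T @ e)             # gradient step on 0.5 * ||e||^2 w.r.t. r

# Residual error after inference is smaller than with the initial r = 0.
print(np.linalg.norm(x - W @ r))
```

In the full model, such error-minimizing updates run at every level of a hierarchy, with each layer predicting the activity of the layer below and only the residual errors propagating upward.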
External www: https://ieeexplore.ieee.org/document/10040532