Louis Montaut presents "Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks"

On 2021-07-29 11:00:00 at https://feectu.zoom.us/j/92527897197
Reading group on the work "Making Sense of Vision and Touch: Learning
Multimodal Representations for Contact-Rich Tasks", by Michelle A. Lee,
Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese,
Li Fei-Fei, Animesh Garg, and Jeannette Bohg, IEEE Transactions on Robotics,
vol. 36, no. 3, June 2020.

Video conference link: https://feectu.zoom.us/j/92527897197
Instructions: http://cmp.felk.cvut.cz/~toliageo/rg/instructions.html

Paper abstract: Contact-rich manipulation tasks in unstructured environments
often require both haptic and visual feedback. It is nontrivial to manually
design a robot controller that combines these modalities, which have very
different characteristics. While deep reinforcement learning has shown success
in learning control policies for high-dimensional inputs, these algorithms are
generally intractable to train directly on real robots due to sample
complexity.
In this article, we use self-supervision to learn a compact and multimodal
representation of our sensory inputs, which can then be used to improve the
sample efficiency of our policy learning. Evaluating our method on a peg
insertion task, we show that it generalizes over varying geometries,
configurations, and clearances, while being robust to external perturbations.
We also systematically study different self-supervised learning objectives and
representation learning architectures. Results are presented in simulation and
on a physical robot.
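
For attendees who want a concrete picture before the meeting, below is a
minimal PyTorch sketch of the fusion idea described in the abstract: vision
and haptics are encoded separately, fused into a compact latent, and trained
with a self-supervised head. All module names, input dimensions, and the
particular objective shown (contact prediction) are illustrative assumptions,
not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class MultimodalEncoder(nn.Module):
        """Hypothetical fusion sketch; dimensions are illustrative."""
        def __init__(self, latent_dim=128):
            super().__init__()
            # Vision branch: small CNN over 128x128 RGB frames (assumed size).
            self.vision = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 64),
            )
            # Haptic branch: MLP over a window of 32 six-axis F/T readings.
            self.haptic = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 6, 64), nn.ReLU(),
            )
            # Fuse both modalities into one compact latent vector.
            self.fuse = nn.Linear(64 + 64, latent_dim)
            # One possible self-supervised head: will contact occur next step?
            self.contact_head = nn.Linear(latent_dim, 1)

        def forward(self, image, force):
            z = self.fuse(torch.cat([self.vision(image),
                                     self.haptic(force)], dim=-1))
            return z, self.contact_head(z)

    # Toy usage: a batch of 4 frames and force windows with a dummy target.
    enc = MultimodalEncoder()
    z, contact_logit = enc(torch.randn(4, 3, 128, 128), torch.randn(4, 32, 6))
    loss = nn.functional.binary_cross_entropy_with_logits(
        contact_logit, torch.ones(4, 1))
    loss.backward()

The point of the compact latent z is that the downstream policy sees a
low-dimensional state instead of raw pixels and force traces, which is what
improves the sample efficiency of policy learning.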

Note: background on VAEs is helpful for following the work, but will not be
covered extensively. Material from an earlier reading group can be used to
refresh it (a minimal VAE sketch also follows the links below):
(1) https://cmp.felk.cvut.cz/~toliageo/rg/papers/slides/vae_patel.pdf
(2) https://drive.google.com/file/d/11TQvz__s-APL-_o3Qpsv0WHxugq5-tlO/view?usp=sharing
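
As a quick refresher, here is a minimal VAE sketch in PyTorch following the
standard formulation (encoder outputs mean and log-variance, the
reparameterization trick keeps sampling differentiable, and the loss is
reconstruction plus KL divergence). It is a generic textbook example, not
code from the paper, and all dimensions are arbitrary.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        def __init__(self, x_dim=784, h_dim=256, z_dim=16):
            super().__init__()
            self.enc = nn.Linear(x_dim, h_dim)
            self.mu = nn.Linear(h_dim, z_dim)      # posterior mean
            self.logvar = nn.Linear(h_dim, z_dim)  # posterior log-variance
            self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                     nn.Linear(h_dim, x_dim))

        def forward(self, x):
            h = F.relu(self.enc(x))
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization: z = mu + sigma * eps, eps ~ N(0, I).
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return self.dec(z), mu, logvar

    def vae_loss(x, x_hat, mu, logvar):
        # Negative ELBO: reconstruction term + KL(q(z|x) || N(0, I)).
        rec = F.binary_cross_entropy_with_logits(x_hat, x, reduction='sum')
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kl

    # Toy usage on random "images" with values in [0, 1].
    model = VAE()
    x = torch.rand(8, 784)
    x_hat, mu, logvar = model(x)
    vae_loss(x, x_hat, mu, logvar).backward()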

Paper URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9043710

Instructions for participants: The reading group studies the literature in the
field of pattern recognition and computer vision. At each meeting one or more
papers are prepared for presentation by a single person, the presenter. The
meetings are open to anyone, regardless of their background. It is assumed that
everyone attending the reading group has, at least briefly, read the paper,
even if not necessarily understanding everything. Attendees should preferably
send questions about unclear parts to the speaker at least one day in advance.
During the presentation we aim for a fruitful discussion, a critical analysis
of the paper, and brainstorming of creative extensions.

See the reading group page:
http://cmp.felk.cvut.cz/~toliageo/rg/index.html
Responsible for content: Petr Pošík