Tomas Soucek presents Support-set bottlenecks for video-text representation learning

On 2021-01-12 at 11:00:00
Reading group on the work "Support-set bottlenecks for video-text
representation learning" by Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian
Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi (ICLR 2021)
presented by Tomas Soucek.

Video conference link:

Paper abstract: The dominant paradigm for learning video-text representations
-- noise contrastive learning -- increases the similarity of the
representations of pairs of samples that are known to be related, such as text
and video from the same sample, and pushes away the representations of all other
pairs. We posit that this last behaviour is too strict, enforcing dissimilar
representations even for samples that are semantically-related -- for example,
visually similar videos or ones that share the same depicted action. In this
paper, we propose a novel method that alleviates this by leveraging a generative
model to naturally push these related samples together: each sample's caption
must be reconstructed as a weighted combination of other support samples' visual
representations. This simple idea ensures that representations are not
overly-specialized to individual samples, are reusable across the dataset, and
results in representations that explicitly encode semantics shared between
samples, unlike noise contrastive learning. Our proposed method outperforms
others by a large margin on MSR-VTT, VATEX and ActivityNet, for video-to-text
and text-to-video retrieval.

Paper link:

Instructions for participants: The reading group studies the literature in the
field of pattern recognition and computer vision. At each meeting one or more
papers are prepared for presentation by a single person, the presenter. The
meetings are open to anyone, regardless of background. It is assumed that
everyone attending the reading group has read the paper at least briefly,
without necessarily understanding everything. Attendees should preferably send
questions about the unclear parts to the speaker at least one day in advance.
During the presentation we aim to have a fruitful discussion, a critical
analysis of the paper, as well as brainstorming for creative extensions.

See the page of reading groups
Responsible person: Petr Pošík