Yash Patel presents Variational Autoencoders with application to unsupervised representation learning

On 2020-05-14 11:00:00
Video conference link for the online reading group:
Instructions: http://cmp.felk.cvut.cz/~toliageo/rg/index2.html

Variational Autoencoders (VAEs) have emerged as one of the most
popular approaches to unsupervised learning of complicated distributions.
VAEs are appealing because they are built on top of standard function
approximators (neural networks), and can be trained with stochastic gradient
descent. We will cover some necessary background on VAEs and then discuss
a recent variant that aims to learn Transformation Equivariant Representations
(TER), called Autoencoding Variational Transformations (AVT). Formally,
given transformed images, the AVT seeks to train the networks by maximizing
the mutual information between the transformations and representations.
This ensures the resultant TERs of individual images contain the intrinsic
information about their visual structures that would equivary under various
transformations in a generalized nonlinear case. Technically, the authors show
that the resultant optimization problem can be efficiently solved by maximizing
a variational lower-bound of the mutual information.
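As background for the discussion, the variational lower bound (ELBO) that underlies VAE training can be sketched as follows. This is a minimal NumPy illustration assuming a diagonal-Gaussian posterior q(z|x), a standard-normal prior, and a unit-variance Gaussian decoder (constants dropped); the function names are illustrative and not taken from the papers:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    # summed over the latent dimensions.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def elbo(x, x_recon, mu, logvar):
    # ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)).
    # Reconstruction term: unit-variance Gaussian likelihood,
    # up to an additive constant, i.e. -0.5 * ||x - x_recon||^2.
    recon = -0.5 * np.sum((x - x_recon) ** 2)
    return recon - gaussian_kl(mu, logvar)
```

For example, a perfect reconstruction with a posterior equal to the prior (mu = 0, logvar = 0) gives an ELBO of exactly zero under these conventions; training a VAE amounts to maximizing this quantity (equivalently, minimizing its negative) by stochastic gradient descent through the reparameterization trick.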

Material for the reading group:

Main paper:
[1] Qi et al, ICCV 2019, AVT: Unsupervised Learning of Transformation
Equivariant Representations by Autoencoding Variational Transformations

Tutorial for VAE:
[2] C. Doersch, 2016, "Tutorial on Variational Autoencoders"

Extended version of AVT:
[3] Qi et al. arxiv 2019, Learning Generalized Transformation Equivariant
Representations via Autoencoding Transformations (extended version)

Additional material for VAE:
[4] Kingma and Welling, Technical Report, 2019, "An introduction to Variational
Autoencoders" https://arxiv.org/pdf/1906.02691.pdf

The main paper we are studying is [1]. For background on VAEs, we will mostly
rely on [2], which is aimed at an audience that might not have a background in
variational Bayesian methods and is geared towards computer vision readers.

Instructions for participants: The reading group studies the literature in the
field of pattern recognition and computer vision. At each meeting one or more
papers are prepared for presentation by a single person, the presenter. The
meetings are open to anyone, regardless of their background. It is assumed that
everyone attending the reading group has, at least briefly, read the paper –
not necessarily understanding everything. Attendants should preferably send
questions about the unclear parts to the speaker at least one day in advance.
During the presentation we aim to have a fruitful discussion, a critical
analysis of the paper, as well as brainstorming for creative extensions.

See the page of reading groups
Responsible person: Petr Pošík