Yannis Kalantidis presents “Improving Self-supervised Learning and Measuring Concept Generalization”
On 2021-09-14 11:00:00 at G205, Karlovo náměstí 13, Praha 2
Hybrid attendance:
- physical attendance at G205 is possible if wearing a mask
- online attendance via Zoom: https://feectu.zoom.us/j/98921642351 (do not share the link on social media)
Contrastive self-supervised learning is a highly effective way of learning representations that are useful for, i.e. generalise to, a wide range of downstream vision tasks and datasets. In the first part of the talk, I will present MoCHi, our recently published contrastive self-supervised learning approach (NeurIPS 2020) that learns transferable representations faster by synthesising hard negatives. Training with MoCHi yields models that are a great starting point for downstream tasks like object detection and segmentation and datasets like PASCAL VOC or MS-COCO. But how “far” are these datasets, and how many of the downstream concepts were actually encountered during training? In the second part of the talk, I will present ImageNet-CoG (ICCV 2021), a novel benchmark that studies concept generalization, i.e. the extent to which models trained on a set of (seen) visual concepts can be used to recognize a new set of (unseen) concepts, in a principled way. We argue that semantic relationships between seen and unseen concepts affect generalization performance, and propose a benchmark on the extended ImageNet-21K dataset that can evaluate models trained on the ubiquitous ImageNet-1K dataset out of the box. In our extensive study, we benchmark over thirty publicly available models (spanning different architectures, modes of supervision and regularization) under the prism of concept generalization, and show how our benchmark uncovers a number of interesting insights.
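For readers unfamiliar with hard negative synthesis, the sketch below illustrates the core idea behind approaches such as MoCHi: synthetic hard negatives are created in feature space as convex combinations of the negatives most similar to the query, then re-normalised. This is a minimal illustrative sketch only; the function and parameter names (mix_hard_negatives, n_hard, n_synth) are made up for this announcement and do not reflect the authors' implementation.

    import torch
    import torch.nn.functional as F

    def mix_hard_negatives(q, queue, n_hard=64, n_synth=16):
        # q:     (D,)   L2-normalised query embedding
        # queue: (K, D) L2-normalised negative embeddings from the memory queue
        sims = queue @ q                                   # similarity of each negative to the query
        hard = queue[sims.topk(n_hard).indices]            # the n_hard negatives closest to the query
        i = torch.randint(0, n_hard, (n_synth,))           # random pairs of hard negatives
        j = torch.randint(0, n_hard, (n_synth,))
        alpha = torch.rand(n_synth, 1)                     # random mixing coefficients in [0, 1)
        synth = alpha * hard[i] + (1 - alpha) * hard[j]    # convex combination of each pair
        return F.normalize(synth, dim=1)                   # re-project onto the unit sphere

In such a setup, the synthesised features would simply be appended to the pool of negatives used by the contrastive (e.g. InfoNCE) loss, making the task harder without any extra forward passes through the encoder.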
Bio:
Yannis Kalantidis is a Senior Research Scientist at NAVER LABS Europe. He received his PhD on large-scale visual similarity search and clustering from the National Technical University of Athens in 2014. He was a postdoc and research scientist at Yahoo Research in San Francisco from 2015 until 2017, where he led the visual similarity search project at Flickr and participated in the creation of the Visual Genome dataset. He then joined Facebook AI in Menlo Park in 2017 as a research scientist on the video understanding team, where his research interests expanded to video understanding and deep learning architecture design. He joined NAVER LABS Europe in March 2020. His research interests revolve around visual representation learning, and more specifically self-supervised learning, continual and streaming learning, multi-modal learning, video understanding and vision & language. He also leads Computer Vision for Global Challenges (cv4gc.org), an initiative that brings the computer vision community closer to socially impactful tasks, datasets and applications worldwide; CV4GC has organized workshops at top venues such as CVPR and ICLR.