Yannis Kalantidis presents Improving generalization for classification and retrieval tasks

On 2022-10-18 11:00:00 at G205, Karlovo náměstí 13, Praha 2
In this talk I will present recent works that generally aim at improving the
generalization of visual representations on both classification and retrieval
tasks. I will start from our most recent work on supervised pre-training with
generalization in mind (arXiv 2022). I will present an approach that aims to
improve the transferability of encoders learned in a supervised manner, while
retaining their state-of-the-art performance on the supervised training task,
and introduce two models: t-ReX, which achieves a new state of the art for
transfer learning and outperforms top methods such as DINO and PAWS on IN1K,
and t-ReX*, which matches the highly optimized RSB-A1 model on IN1K while
performing better on transfer tasks. I will then present TLDR (TMLR 2022), a
dimensionality reduction method for generic input spaces that ports the recent
self-supervised learning framework of Barlow Twins to learning linear encoders
that outperform methods like PCA for classification and retrieval. Finally, I
will briefly present two other recent works on retrieval: FIRe (ICLR 2022), a
way of learning mid-level “super-features” that help landmark retrieval and
visual localization, and Grappa (ECCV 2022), a method to efficiently adapt a
large pre-trained model to perform better on multiple retrieval tasks jointly,
using only unlabelled data and with only a small decrease in the zero-shot
performance outside those tasks.


Bio:

Yannis Kalantidis is a senior research scientist at NAVER LABS Europe. He
received his PhD in Computer Science from the National Technical University of
Athens in 2014 and was a research scientist at Yahoo Research San Francisco and
Facebook AI in Menlo Park before joining NAVER LABS Europe in 2020. His research
revolves around visual representation and multi-modal learning, self-supervised
learning, as well as adaptive systems. He is also passionate about bringing the
computer vision community closer to socially impactful tasks, datasets and
applications for worldwide impact, and has co-organized workshops such as
“Computer Vision for Global Challenges” (CV4GC @ CVPR 2019), “Computer Vision
for Agriculture” (CV4A @ ICLR 2020) and “Wikipedia and Multi-Modal &
Multi-Lingual Research” (Wiki-M3L @ ICLR 2022) at top-tier AI venues.
Responsible for content: Petr Pošík