Nicu Sebe presents Cross-modal generation and understanding of multimodal content

On 2025-12-04 at 13:30 in room G205, Karlovo náměstí 13, Praha 2

ABSTRACT
The first part of the presentation covers our work on video generation without
annotations or prior object-specific information. Trained on videos of similar
objects (e.g., faces or bodies), our method generalizes across the category.
Building on this, we introduce a Learnable Game Engine (LGE), trained from
monocular annotated videos, which maintains scene and object states and renders
environments from controllable viewpoints. Like a game engine, it simulates
physics and logic, allowing users to control gameplay directly or to use a
director mode that guides agents via high-level language and goals, enabled by
learned game AI.
The second part investigates the safety and fairness of current generative
models. While most existing research focuses on detecting closed sets of biases
defined a priori, we tackle the challenge of open-set bias detection in
text-to-image generative models. To this end, we propose OpenBias, a new
pipeline that agnostically identifies and quantifies the severity of biases
without access to any precompiled bias set. We study the behavior of Stable
Diffusion 1.5, 2, and XL, highlighting new biases never investigated before.
Through quantitative experiments, we demonstrate that OpenBias agrees with
current closed-set bias detection methods and with human judgement.

Responsible for content: Petr Pošík