Student project details

Topic: Early vs. Late Fusion in Multimedia Data: Revisiting the Question
Department: Department of Cybernetics
Supervisor: Dr. Ing. Jan Zahálka
Announce as: Master's thesis, Bachelor's thesis, Semester project
Description: When performing machine learning on multimodal data (i.e., data with multiple information channels; images, for example, carry not only visual information but also metadata, possibly annotations, social media comments, etc.), the common industry standard is late fusion: machine learning is performed on each modality separately, and the results (rankings) obtained for each modality are then fused (see, e.g., [1]). This contrasts with early fusion, which combines (selected) features from the individual modalities into a single dataset and then performs machine learning on that dataset.

Does this still hold for modern data features? The seminal work that established late fusion as the standard (e.g., [1]) is 15 years old. Since modern features can often be interpreted as semantic labels of the data, this standard may need revisiting, and that is the topic of this project/thesis.
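The two strategies described above can be contrasted in a minimal Python sketch. It assumes NumPy, uses closed-form ridge regression as a stand-in learner, and fuses the per-modality rankings by averaging ranks (a Borda-count-style rule). These choices, the helper names (`standardize`, `fit_ridge`, `early_fusion`, `late_fusion`) and the synthetic "visual"/"text" modalities are illustrative assumptions, not the method of [1] or the one the project prescribes.

```python
import numpy as np


def standardize(train, test):
    # Z-score each feature column using statistics from the training split only
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant features
    return (train - mu) / sd, (test - mu) / sd


def fit_ridge(X, y, lam=1.0):
    # Closed-form ridge regression with an appended bias column (stand-in learner)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(w, X):
    # Linear relevance score for each item
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def early_fusion(train_mods, y_train, test_mods, lam=1.0):
    # Early fusion: concatenate the (standardized) features of all modalities
    # into one dataset and train a single model on it
    pairs = [standardize(tr, te) for tr, te in zip(train_mods, test_mods)]
    X_train = np.hstack([p[0] for p in pairs])
    X_test = np.hstack([p[1] for p in pairs])
    return predict(fit_ridge(X_train, y_train, lam), X_test)


def late_fusion(train_mods, y_train, test_mods, lam=1.0):
    # Late fusion: train one model per modality, then fuse the per-modality rankings
    ranks = []
    for tr, te in zip(train_mods, test_mods):
        tr_s, te_s = standardize(tr, te)
        scores = predict(fit_ridge(tr_s, y_train, lam), te_s)
        # Rank position of each item (0 = lowest score, n-1 = highest)
        ranks.append(np.argsort(np.argsort(scores)))
    # Average rank across modalities (a Borda count up to scaling)
    return np.mean(ranks, axis=0)


# Toy usage on synthetic data; both modalities are hypothetical
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n).astype(float)         # binary relevance labels
visual = rng.normal(size=(n, 5)) + 1.0 * y[:, None]  # strongly informative modality
text = rng.normal(size=(n, 3)) + 0.5 * y[:, None]    # weakly informative modality
train_idx, test_idx = slice(0, 150), slice(150, n)
y_test = y[test_idx]

train_mods = [visual[train_idx], text[train_idx]]
test_mods = [visual[test_idx], text[test_idx]]
early = early_fusion(train_mods, y[train_idx], test_mods)
late = late_fusion(train_mods, y[train_idx], test_mods)

# Mean fused score of relevant vs. non-relevant test items
for name, s in [("early", early), ("late", late)]:
    print(name, s[y_test == 1].mean(), s[y_test == 0].mean())
```

Standardizing each modality before concatenation keeps one modality's feature scale from dominating the early-fusion model, while converting scores to ranks gives late fusion a common scale across modalities whose raw scores are not directly comparable.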
Bibliography: [1] C. G. M. Snoek, M. Worring, A. W. M. Smeulders: Early versus late fusion in semantic video analysis. Proc. ACM MM, pages 399-402, 2005.
Responsible person: Petr Pošík