Student project details

Topic: Early vs. late fusion in multimedia data: revisiting the question
Department: Department of Cybernetics
Supervisor: Dr. Ing. Jan Zahálka
Offered as: Master's thesis, Bachelor's thesis, Semester project
Description: When performing machine learning on multimodal data (i.e., data with multiple information channels; images, for example, carry not only visual information but also metadata, possibly annotations, social media comments, etc.), the common industry practice is late fusion: machine learning is performed on each modality separately, and the results (rankings) obtained for each modality are then fused (see, e.g., [1]). This contrasts with early fusion, which combines (selected) features from the individual modalities into a single dataset and then performs machine learning on that dataset.
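The difference between the two pipelines can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of [1] or of this project: the toy two-modality data are invented, the per-modality learner is a hypothetical nearest-centroid scorer standing in for any real classifier, and late fusion combines the per-modality rankings by summing rank positions (a Borda-count style rule), which is only one of several possible fusion rules.

```python
from itertools import chain


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    # Component-wise mean; assumes at least one vector per class.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def centroid_scores(train_X, train_y, test_X):
    """Hypothetical stand-in learner: a higher score means the item is closer
    to the positive-class centroid than to the negative-class centroid."""
    pos = centroid([x for x, y in zip(train_X, train_y) if y == 1])
    neg = centroid([x for x, y in zip(train_X, train_y) if y == 0])
    return [squared_distance(x, neg) - squared_distance(x, pos) for x in test_X]


def ranking(scores):
    """Item indices from most to least relevant (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def early_fusion(train_mods, train_y, test_mods):
    """Concatenate each item's features from all modalities, then learn once."""
    train_X = [list(chain.from_iterable(parts)) for parts in zip(*train_mods)]
    test_X = [list(chain.from_iterable(parts)) for parts in zip(*test_mods)]
    return ranking(centroid_scores(train_X, train_y, test_X))


def late_fusion(train_mods, train_y, test_mods):
    """Learn on each modality separately, then fuse the per-modality rankings
    by summing rank positions (Borda-count style; lower sum = more relevant)."""
    n = len(test_mods[0])
    rank_sum = [0] * n
    for train_X, test_X in zip(train_mods, test_mods):
        order = ranking(centroid_scores(train_X, train_y, test_X))
        for position, item in enumerate(order):
            rank_sum[item] += position
    return sorted(range(n), key=lambda i: rank_sum[i])


# Invented toy data: a 2-D "visual" feature and a 1-D "text/metadata" feature.
train_visual = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_text = [[1.0], [0.7], [0.0], [0.2]]
train_y = [1, 1, 0, 0]
test_visual = [[0.85, 0.15], [0.15, 0.85], [0.5, 0.5]]
test_text = [[0.9], [0.1], [0.6]]

early = early_fusion([train_visual, train_text], train_y, [test_visual, test_text])
late = late_fusion([train_visual, train_text], train_y, [test_visual, test_text])
print("early fusion ranking:", early)
print("late fusion ranking:", late)
```

On this toy data both pipelines rank the test items as [0, 2, 1]. The sketch only shows where the modalities are combined (before learning vs. after per-modality ranking); it says nothing about which approach performs better on real features, which is exactly the open question of the project.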

Is this still true for modern data features? The groundbreaking work that established late fusion as the standard (e.g., [1]) was published in 2005. Since modern features can often be interpreted as semantic labels of the data, this standard may need revisiting, and that is the topic of the project/thesis.
References: [1] C. G. M. Snoek, M. Worring, A. W. M. Smeulders. Early versus late fusion in semantic video analysis. In Proc. ACM Multimedia (MM), pp. 399-402, 2005. Link: https://dl.acm.org/doi/10.1145/1101149.1101236
Responsible for content: Petr Pošík