Topic: | Interactive classifiers & multimedia data |
---|---|
Department: | Department of Cybernetics |
Supervisor: | Dr. Ing. Jan Zahálka |
Offered as: | Master's thesis, Bachelor's thesis, Semester project |
Description: | Interactive multimodal learning (IML) has recently gained traction in machine learning research on multimedia data (images, video, text, etc.). IML is closely related to relevance feedback and active learning, and it works in rounds. The system presents multimedia items (mainly images) to the user, and the user provides feedback by marking items as relevant or not relevant. The classifier then retrains on this feedback and produces a new list of suggested relevant items, which starts the next interaction round. In the interactive setting, each full round (including training and fetching new relevant items) should complete in about 1 second, which is challenging on large datasets.
IML-based approaches were very popular in the 2000s, but interest in them faded in the 2010s. Deep nets became the de facto standard, and IML approaches struggled with the explosion in dataset scale: in the 2000s, a large dataset had about 10K items, while today large datasets have 10-100M items. Recent work [1][2] has made IML interactive again, even on datasets of 100M items. IML does not rely on costly precomputation, it can work with dynamic datasets, and it still performs well on static ones. This makes it an interesting alternative approach to large-scale multimedia analytics and an exciting research avenue. The recent research has focused on efficient data compression and handling. The interactive classifier that actually provides the intelligence is still the classic linear SVM (an algorithm from the 1990s). Is there a better alternative among existing algorithms? Or could we develop a new interactive multimodal classifier ourselves? |
Bibliography: | [1] Zahálka et al.: Blackthorn: Large-Scale Interactive Multimodal Learning. IEEE Transactions on Multimedia, 20 (3), pages 687-698, 2018.
[2] Khan et al.: Interactive Learning for Multimedia at Large. Proc. ECIR, pages 495-510, 2020. |
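To make the interaction loop in the description concrete, here is a minimal sketch of IML rounds with a linear SVM as the classifier. It assumes that items are already represented as numeric feature vectors and that feedback is stored as labels +1 (relevant) and -1 (not relevant). The SVM is trained with a Pegasos-style stochastic subgradient method on the hinge loss. That solver choice is an assumption for illustration. So are the function names (`train_linear_svm`, `interaction_round`, `simulate`), the synthetic 2-D data, and the simulated user. None of this is the implementation from [1] or [2].

```python
import random
import time


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style SGD on the L2-regularized hinge loss.

    X: list of feature vectors, y: labels in {-1, +1}.
    The bias is modelled by appending a constant 1.0 feature, so it is
    regularized too (a simplification made for this sketch).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink step coming from the L2 regularizer
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step, only for margin violations
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def score(w, x):
    """Decision value of the linear model; higher means more likely relevant."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


def interaction_round(collection, feedback, k):
    """One IML round: retrain on all feedback so far, suggest top-k unjudged items.

    feedback maps item index -> +1 (relevant) or -1 (not relevant).
    """
    judged = list(feedback)
    w = train_linear_svm([collection[i] for i in judged], [feedback[i] for i in judged])
    # Score every item the user has not judged yet and rank by decision value
    unseen = [i for i in range(len(collection)) if i not in feedback]
    unseen.sort(key=lambda i: score(w, collection[i]), reverse=True)
    return unseen[:k]


def simulate(n_items=2000, rounds=5, k=10, seed=1):
    """Run several rounds against a hypothetical simulated user."""
    rng = random.Random(seed)
    collection = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_items)]

    def truth(v):
        # Hypothetical hidden relevance that the simulated user judges by
        return 1 if v[0] + v[1] > 1.0 else -1

    # Seed the session with one relevant and one non-relevant example
    feedback = {
        next(i for i, v in enumerate(collection) if truth(v) == 1): 1,
        next(i for i, v in enumerate(collection) if truth(v) == -1): -1,
    }
    precisions, slowest = [], 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        suggestions = interaction_round(collection, feedback, k)
        slowest = max(slowest, time.perf_counter() - start)
        # The simulated user marks each suggestion relevant or not relevant
        for i in suggestions:
            feedback[i] = truth(collection[i])
        precisions.append(sum(feedback[i] == 1 for i in suggestions) / k)
    return precisions, slowest, len(feedback)


if __name__ == "__main__":
    precisions, slowest, n_judged = simulate()
    print("precision per round:", precisions)
    print(f"slowest round: {slowest:.3f} s, items judged: {n_judged}")
```

Each round retrains from scratch and then scores every unjudged item. On a toy collection this fits easily in the ~1 second budget. On collections of 10-100M items, however, that full scoring pass would dominate the round time. Efficient data compression and handling, the focus of [1][2], targets exactly that kind of cost. This sketch does not attempt any of it.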