SEMANTIC ASSOCIATION NETWORK FOR VIDEO CORPUS MOMENT RETRIEVAL
Dahyun Kim, Sunjae Yoon, Ji Woo Hong, Chang D Yoo
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 00:08:40
This paper considers Semantic Association Network (SAN) for Video Corpus Moment Retrieval (VCMR) which localizes temporal moment that best corresponds to the given text query in a corpus of videos. Collaborations among common semantics from multi-modal inputs are essential for effectively understanding video together with subtitle and text query. For this collaboration, SAN associates common semantics within the same modality (by Intra Semantic Association) and across different modalities (by Inter Semantic Association) with dedicated module referred to as Modality Semantic Association (MSA). SAN surpasses existing state-of-the-art performance on the TVR and DiDeMo benchmark datasets. Extensive ablation studies and qualitative analyses show the effectiveness of the proposed model.