A FEATURE PAIR FUSION AND HIERARCHICAL LEARNING FRAMEWORK FOR VIDEO RE-LOCALIZATION
Ruolin Wang, Yuan Zhou
SPS
Video re-localization is an emerging research topic, but existing methods still have notable deficiencies, chiefly interference from irrelevant information in the input reference video and neglect of the correlation between query and reference video features. We therefore present a novel framework, the Semantic Relevance Learning Network, to address these shortcomings. First, we extract effective proposals from the reference video and use them as new inputs, reducing interference from irrelevant video frames. Second, two key components of the proposed model, the Attention-based Fusion Tensor and the Semantic Relevance Measurement, jointly explore the intrinsic correlation between video feature pairs and output a relevance score as the measurement. To better evaluate the proposed model, we reorganize Thumos14 into a new dataset for the video re-localization task. On both ActivityNet and Thumos14, our model achieves the best performance reported so far.
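The abstract does not specify the internals of the fusion and measurement components, but the general idea of cross-attending a query clip's frame features to a reference proposal's frame features and scoring their semantic relevance can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation; the function name `attention_fusion_score` and the choice of dot-product attention with a mean cosine-similarity score are assumptions for exposition only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion_score(query_feats, proposal_feats):
    """Hypothetical sketch of attention-based feature-pair fusion.

    query_feats:    (Tq, d) per-frame features of the query video
    proposal_feats: (Tr, d) per-frame features of a reference proposal
    Returns a scalar relevance score in [-1, 1].
    """
    # Pairwise affinities between every query/proposal frame pair
    affinity = query_feats @ proposal_feats.T            # (Tq, Tr)
    # Attend over proposal frames for each query frame
    attn = softmax(affinity, axis=1)                     # (Tq, Tr)
    # Fuse: proposal features re-expressed on the query's timeline
    fused = attn @ proposal_feats                        # (Tq, d)
    # Score: mean cosine similarity between query and fused features
    num = (query_feats * fused).sum(axis=1)
    den = (np.linalg.norm(query_feats, axis=1)
           * np.linalg.norm(fused, axis=1) + 1e-8)
    return float((num / den).mean())
```

Scoring each extracted proposal this way and ranking by the resulting relevance would realize, in simplified form, the two-stage scheme the abstract describes: proposal extraction first, then joint correlation measurement between feature pairs.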