Multi-Focus Guided Semantic Aggregation for Video Object Detection

Haihui Ye, Guangge Wang, Yang Lu, Yan Yan, Hanzi Wang

Length: 00:07:19

09 May 2022

In video object detection, it is useful to aggregate semantic information from supporting frames. However, existing methods, which we call Single-Focus methods, attend only to the current frame during semantic aggregation. They neglect the semantic information shared among supporting frames, which degrades overall performance. In this work, we propose Multi-Focus guided Semantic Aggregation (MFSA) for video object detection. We introduce a novel Relation Propagation Module (RPM) to capture and propagate proposal-to-proposal semantic dependencies. We also propose a simple yet effective Multi-Focus strategy that uses the captured dependencies to guide feature enhancement at the batch level. With this strategy, our method greatly improves the aggregation efficiency of Single-Focus methods and significantly enhances the accuracy of a per-frame detector with negligible computational overhead. Extensive experiments on the ImageNet VID dataset show that MFSA achieves excellent performance and a superior speed-accuracy tradeoff among competing methods.
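The difference between enhancing only the current frame and enhancing every frame in a batch can be illustrated with a small sketch. This is not the authors' RPM or MFSA implementation: it assumes plain scaled dot-product similarity between proposal features, and the names `aggregate`, `single_focus`, and `multi_focus`, along with the tensor shapes, are hypothetical choices made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate(queries, keys):
    # Assumed aggregation rule (not taken from the paper): each query proposal
    # is enhanced by a similarity-weighted sum of the key proposals.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)  # (Q, K)
    return queries + weights @ keys

def single_focus(feats, cur):
    # feats: (T frames, N proposals, D channels).
    # Only the proposals of frame `cur` act as queries and get enhanced.
    T, N, D = feats.shape
    return aggregate(feats[cur], feats.reshape(T * N, D))

def multi_focus(feats):
    # Every proposal in the batch acts as a query, so all T frames are
    # enhanced in a single pass over the same proposal-to-proposal weights.
    T, N, D = feats.shape
    flat = feats.reshape(T * N, D)
    return aggregate(flat, flat).reshape(T, N, D)

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 8))  # toy batch: 3 frames, 4 proposals, 8 channels
enhanced_all = multi_focus(feats)
enhanced_cur = single_focus(feats, cur=1)
```

In this toy setting the batch-level pass yields, for frame 1, the same features as the single-focus pass on frame 1, while also producing enhanced features for the other frames from the same similarity computation.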

Tags: semantic aggregation, video object detection, relation propagation

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
