Multi-Focus Guided Semantic Aggregation for Video Object Detection
Haihui Ye, Guangge Wang, Yang Lu, Yan Yan, Hanzi Wang
For the task of video object detection, it is useful to aggregate semantic information from supporting frames. However, existing methods focus only on the current frame during semantic aggregation; we call these Single-Focus methods. They neglect semantic information among supporting frames, which degrades overall performance. In this work, we propose a method called Multi-Focus guided Semantic Aggregation (MFSA) for video object detection. We introduce a novel Relation Propagation Module (RPM) to capture and propagate proposal-to-proposal semantic dependencies. Moreover, we propose a simple yet effective Multi-Focus strategy that leverages the captured dependencies to guide feature enhancement at the batch level. Aided by this strategy, our method can greatly improve the aggregation efficiency of Single-Focus methods and significantly enhance the accuracy of a per-frame detector with negligible computational overhead. We perform extensive experiments on the ImageNet VID dataset. The results show that MFSA achieves excellent performance and a superior speed-accuracy tradeoff compared with competing methods.
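To illustrate the distinction the abstract draws between Single-Focus and Multi-Focus aggregation, the following is a minimal sketch, not the authors' implementation: it assumes a generic similarity-weighted aggregation over proposal features (a common form of proposal-to-proposal relation modeling) and hypothetical names such as RelationAggregation, single_focus, and multi_focus. MFSA's actual RPM and Multi-Focus strategy may differ in detail.

```python
# Illustrative sketch only (assumed design, not MFSA's released code):
# proposal features from several frames are pooled into a shared support set;
# a Single-Focus pass enhances only the current frame, while a Multi-Focus
# pass enhances every frame in the batch from the same support pool.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationAggregation(nn.Module):
    """Similarity-weighted aggregation of proposal features (hypothetical)."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, target: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
        # target:  (N_t, dim) proposals to be enhanced
        # support: (N_s, dim) proposals providing semantic context
        attn = self.query(target) @ self.key(support).t()        # (N_t, N_s)
        attn = F.softmax(attn / target.size(-1) ** 0.5, dim=-1)  # scaled similarity weights
        return target + attn @ self.value(support)               # residual enhancement


def single_focus(frames, cur, agg):
    """Enhance only the current frame's proposals (per-frame focus)."""
    support = torch.cat(frames, dim=0)
    return agg(frames[cur], support)


def multi_focus(frames, agg):
    """Enhance every frame in the batch with the shared support pool,
    so one pass yields enhanced features for all frames at once."""
    support = torch.cat(frames, dim=0)
    return [agg(f, support) for f in frames]


if __name__ == "__main__":
    agg = RelationAggregation(dim=1024)
    batch = [torch.randn(300, 1024) for _ in range(3)]  # 3 frames, 300 proposals each
    enhanced = multi_focus(batch, agg)
    print([e.shape for e in enhanced])
```

Under these assumptions, the efficiency gain of the batch-level (Multi-Focus) variant comes from reusing the same support pool to enhance all frames in one pass, rather than rebuilding it once per current frame.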