Multi-Focus Guided Semantic Aggregation for Video Object Detection

Haihui Ye, Guangge Wang, Yang Lu, Yan Yan, Hanzi Wang

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:07:19
09 May 2022

In video object detection, aggregating semantic information from supporting frames is useful. However, existing methods, which we call Single-Focus methods, attend only to the current frame during semantic aggregation. They neglect the semantic information shared among supporting frames, which degrades overall performance. In this work, we propose Multi-Focus guided Semantic Aggregation (MFSA) for video object detection. We introduce a novel Relation Propagation Module (RPM) that captures and propagates proposal-to-proposal semantic dependencies. We also propose a simple yet effective Multi-Focus strategy that uses the captured dependencies to guide feature enhancement at the batch level. With this strategy, our method greatly improves the aggregation efficiency of Single-Focus methods and significantly improves the accuracy of a per-frame detector with negligible computational overhead. Extensive experiments on the ImageNet VID dataset show that MFSA achieves excellent performance and a superior speed-accuracy tradeoff among competing methods.
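The contrast between Single-Focus and batch-level Multi-Focus aggregation can be illustrated with a small sketch. It is not the authors' RPM or MFSA implementation: it assumes plain scaled dot-product attention over proposal features, and the names `aggregate`, `feats`, and `all_props` are hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate(queries, supports):
    # Assumption: generic dot-product attention stands in for the
    # proposal-to-proposal dependencies described in the abstract.
    d = queries.shape[-1]
    weights = softmax(queries @ supports.T / np.sqrt(d), axis=-1)
    # Each query proposal is enhanced by a weighted sum of support proposals.
    return queries + weights @ supports

rng = np.random.default_rng(0)
num_frames, num_proposals, dim = 3, 4, 8
# Hypothetical proposal features: (frames, proposals per frame, channels).
feats = rng.standard_normal((num_frames, num_proposals, dim))
all_props = feats.reshape(-1, dim)

# Single-Focus: only the current frame's proposals act as queries.
current = 1
single_focus = aggregate(feats[current], all_props)

# Multi-Focus (illustrative): every frame in the batch is enhanced in one pass.
multi_focus = aggregate(all_props, all_props).reshape(num_frames, num_proposals, dim)
```

In this simplified setting the Multi-Focus pass reproduces the Single-Focus result for the current frame while also enhancing the other frames in the same batch, which is the efficiency intuition the abstract points to.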
