  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:12:22
06 Oct 2022

Video object detection is challenging because object appearance often deteriorates in video frames. Recently, the task has been dominated by feature aggregation methods, which improve performance by aggregating context information from object proposals in different frames. However, because frames and proposals are usually selected at random, feature aggregation may introduce a large amount of invalid information. In this paper, we propose a guided sampling based feature aggregation network (GSFA) to perform more effective feature aggregation. Specifically, we introduce a frame-level sampling module and a proposal-level sampling module that adaptively sample informative frames and proposals from a video sequence. As a result, the proposed GSFA can effectively aggregate context information from semantically rich frames and proposals to boost performance. Experimental results on the ImageNet VID dataset show that the proposed GSFA achieves state-of-the-art performance of 84.8% mAP with ResNet-101 and 85.8% mAP with ResNeXt-101.
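
The two-level sampling idea can be illustrated with a minimal sketch. This is not the GSFA implementation: the abstract does not say how frames and proposals are scored or how features are combined, so the sketch assumes that per-frame and per-proposal informativeness scores are already given, picks the top-k at each level, and aggregates with a softmax over cosine similarity. All names (`top_k_indices`, `aggregate`, `guided_aggregate`) and the parameters `k_frames` and `k_props` are hypothetical.

```python
import math


def top_k_indices(scores, k):
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def aggregate(target, supports):
    """Add a similarity-weighted average of support features to the target.

    Assumption: weights are a softmax over cosine similarity. The abstract
    does not specify the aggregation operator.
    """
    if not supports:
        return list(target)
    sims = [cosine(target, s) for s in supports]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [
        sum(w * s[d] for w, s in zip(weights, supports))
        for d in range(len(target))
    ]
    return [t + c for t, c in zip(target, context)]


def guided_aggregate(target, frames, frame_scores, k_frames, k_props):
    """Sample frames, then proposals within them, then aggregate.

    frames: one list per frame of (proposal_score, feature) pairs.
    frame_scores: one assumed informativeness score per frame.
    """
    picked = []
    for f in top_k_indices(frame_scores, k_frames):      # frame-level sampling
        props = frames[f]
        proposal_scores = [score for score, _ in props]
        for p in top_k_indices(proposal_scores, k_props):  # proposal-level sampling
            picked.append(props[p][1])
    return aggregate(target, picked)
```

Compared with random selection, low-scoring frames and proposals never reach the aggregation step in this sketch, which is the intuition behind sampling "informative" frames and proposals; in a learned model the scores would come from trained modules rather than being supplied by hand.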
