Spatial-Temporal Feature Aggregation Network For Video Object Detection

Zhu Chen, Weihai Li, Chi Fei, NengHai Yu, Bin Liu

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 12:38

04 May 2020

Video object detection is a challenging problem in computer vision. In this paper, we propose a novel spatial-temporal feature aggregation network to address it. Specifically, we present an instance-level feature aggregation module that complements traditional pixel-level feature aggregation; within it, a new movement estimation module learns instance movements across frames. Graph Convolutional Networks (GCNs) are then applied to capture temporal relations among instances across frames and perform the instance-level feature aggregation. Finally, we combine pixel-level and instance-level features with learnable soft weights to exploit their complementary information. Our framework is simple to implement, supports end-to-end training, and achieves state-of-the-art performance on the ImageNet VID dataset in extensive experiments.
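The abstract does not specify how the learnable soft weights are parameterized. As a rough illustration only, the sketch below assumes two scalar logits normalized by a softmax and applied to two equal-length feature vectors; the names `softmax`, `fuse_features`, and `logits` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(pixel_feat, instance_feat, logits):
    """Blend two equal-length feature vectors with softmax-normalized weights.

    `logits` stands in for two learnable scalars that training would update
    by backpropagation. This parameterization is an assumption made for
    illustration, not the paper's stated formulation.
    """
    if len(pixel_feat) != len(instance_feat):
        raise ValueError("feature vectors must have the same length")
    w_pixel, w_instance = softmax(logits)
    return [w_pixel * p + w_instance * q
            for p, q in zip(pixel_feat, instance_feat)]


# Equal logits give equal weights, so the result is the element-wise mean.
fused = fuse_features([1.0, 2.0], [3.0, 4.0], logits=[0.0, 0.0])
print(fused)  # [2.0, 3.0]
```

Normalizing the weights so they sum to one keeps the fused feature on the same scale as its inputs, which is one common reason to prefer soft weights over unconstrained ones.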

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

Value-Added Bundle(s) Including this Product

ICASSP 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 14-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle
