FIANET: VIDEO OBJECT DETECTION VIA JOINT FEATURE-LEVEL AND INSTANCE-LEVEL AGGREGATION

Zhengshuai Wang, Yali Li, Shengjin Wang

Length: 14:25

07 Jul 2020

Video object detection is challenging because of the rigid and non-rigid appearance deformations that occur in videos. Most competitive methods enhance per-frame features by aggregating many previous and future frames. However, feature-level aggregation is not robust to rigid deformations such as occlusion and rare postures. In this paper, we propose FIANet, an online video object detection network that joins feature-level aggregation with instance-level aggregation. Besides feature-level aggregation, we design a spatial-temporal instance calibration (STIC) module that aggregates each instance as a whole, which reduces the interference of locally distorted and missing pixels. The two levels of aggregation work collaboratively to overcome different deformations. Using only a small number of previous frames, our method achieves 81.6% mAP at relatively high speed on ImageNet VID, which is state-of-the-art among both causal and non-causal methods.
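To make the idea of online (causal) feature-level aggregation concrete, here is a minimal sketch. It is a generic illustration and not the FIANet implementation: the abstract does not specify the weighting scheme, so the similarity-weighted softmax blend, the function names `cosine` and `aggregate_causal`, and the use of plain feature vectors instead of feature maps are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aggregate_causal(current, previous):
    """Blend the current frame's feature vector with those of previous frames.

    Hypothetical helper: weights are a softmax over cosine similarity to the
    current frame, so frames that resemble the current one contribute more.
    Only past frames are used, which keeps the procedure online (causal).
    """
    frames = [current] + list(previous)
    scores = [cosine(current, f) for f in frames]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * f[i] for w, f in zip(weights, frames))
        for i in range(len(current))
    ]


# Example: one dissimilar previous frame pulls the result only slightly.
blended = aggregate_causal([1.0, 0.0], [[0.0, 1.0]])
```

In this sketch a previous frame that is distorted (for example by occlusion) receives a lower weight only if its whole feature vector differs from the current one; treating each detected instance as a unit, as the STIC module is described as doing, is a separate step that this example does not attempt to model.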

Tags: icme 2020, sps conference