Image Data Augmentation With Unpaired Image-To-Image Camera Model Translation

Chi Fa Foo, Stefan Winkler

Length: 00:12:22

06 Oct 2022

Video object detection is challenging because object appearance often deteriorates in video frames. Recently, feature-aggregation-based methods, which aggregate context information from object proposals in different frames to improve performance, have dominated the task. However, much invalid information may be introduced during feature aggregation, since frames and proposals are usually selected at random. In this paper, we propose a guided-sampling-based feature aggregation network (GSFA) to perform more effective feature aggregation. Specifically, we introduce a frame-level sampling module and a proposal-level sampling module that adaptively sample informative frames and proposals from a video sequence. As a result, the proposed GSFA can effectively aggregate context information from semantically rich frames and proposals to boost performance. Experimental results on the ImageNet VID dataset show that the proposed GSFA achieves state-of-the-art performance of 84.8% mAP with ResNet-101 and 85.8% mAP with ResNeXt-101.
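
The two-stage sampling idea in the abstract can be illustrated with a small sketch. This is not the GSFA implementation: the abstract does not say how frames or proposals are scored or how features are combined, so the sketch assumes that each frame and proposal already carries a scalar score, that sampling is a plain top-k selection, and that aggregation is a softmax-weighted sum over dot-product similarities added back to the target feature. The names `top_k_indices`, `guided_aggregate`, `num_frames`, and `num_proposals` are hypothetical.

```python
import math


def top_k_indices(scores, k):
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def softmax(values):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def guided_aggregate(target, frames, num_frames, num_proposals):
    """Enhance `target` with context from sampled frames and proposals.

    frames: list of {"score": float, "proposals": [(score, feature), ...]}
    Assumes at least one proposal survives sampling.
    """
    # Frame-level sampling (assumed: keep the top-scoring frames).
    frame_ids = top_k_indices([f["score"] for f in frames], num_frames)

    # Proposal-level sampling (assumed: keep the top-scoring proposals
    # inside each selected frame).
    support = []
    for i in frame_ids:
        proposals = frames[i]["proposals"]
        keep = top_k_indices([s for s, _ in proposals], num_proposals)
        support.extend(proposals[j][1] for j in keep)

    # Similarity-weighted aggregation (assumed: dot-product attention).
    weights = softmax([dot(target, feat) for feat in support])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, support))
        for d in range(len(target))
    ]
    # Residual connection: add the aggregated context to the target feature.
    return [t + c for t, c in zip(target, context)]


frames = [
    {"score": 0.9, "proposals": [(0.8, [1.0, 0.0]), (0.2, [0.0, 5.0])]},
    {"score": 0.1, "proposals": [(0.9, [9.0, 9.0])]},
]
enhanced = guided_aggregate([1.0, 1.0], frames, num_frames=1, num_proposals=1)
```

With `num_frames=1` and `num_proposals=1`, only the highest-scoring proposal of the highest-scoring frame contributes, so the low-scoring frame cannot add noise to the target feature. That is the contrast with random selection which the abstract draws.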

Tags:

International Conference on Image Processing

IEEE ICIP 2022
