Image Data Augmentation With Unpaired Image-To-Image Camera Model Translation
Chi Fa Foo, Stefan Winkler
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:12:22
Video object detection is a challenging task because object appearance often deteriorates in video frames. Recently, feature-aggregation-based methods, which aggregate context information from object proposals in different frames to improve performance, have dominated the task. However, because frames and proposals are usually selected at random, feature aggregation may introduce much invalid information. In this paper, we propose a guided-sampling-based feature aggregation network (GSFA) to perform more effective feature aggregation. Specifically, we introduce a frame-level sampling module and a proposal-level sampling module to adaptively sample informative frames and proposals from a video sequence. As a result, the proposed GSFA can effectively aggregate context information from semantically rich frames and proposals to boost performance. Experimental results on the ImageNet VID dataset show that the proposed GSFA achieves state-of-the-art performance of 84.8% mAP with ResNet-101 and 85.8% mAP with ResNeXt-101.
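The two-stage sampling idea described above can be illustrated with a toy sketch. This is not the GSFA implementation: the abstract does not specify how frames or proposals are scored, so the cosine-similarity scores, the softmax weighting, the residual sum, and all names here (`guided_aggregate`, `k_frames`, `k_props`) are assumptions chosen only to show the general shape of "select informative frames, then informative proposals, then aggregate".

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def top_k_indices(scores, k):
    """Indices of the k highest scores (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def guided_aggregate(target, support_frames, k_frames=2, k_props=2):
    """Enhance one target proposal feature with sampled context.

    target: feature vector of a proposal in the current frame.
    support_frames: list of frames, each a non-empty list of proposal
        feature vectors (hypothetical input layout).
    """
    # Frame-level sampling (assumed score): similarity between the target
    # and each frame's mean proposal feature.
    frame_scores = [cosine(target, mean_vector(f)) for f in support_frames]
    frames = [support_frames[i] for i in top_k_indices(frame_scores, k_frames)]

    # Proposal-level sampling (assumed score): similarity between the
    # target and each proposal in the selected frames.
    proposals = [p for f in frames for p in f]
    prop_scores = [cosine(target, p) for p in proposals]
    keep = top_k_indices(prop_scores, k_props)

    # Aggregate the kept proposals with softmax weights over their scores.
    peak = max(prop_scores[i] for i in keep)
    exps = [math.exp(prop_scores[i] - peak) for i in keep]
    total = sum(exps)
    context = [
        sum((e / total) * proposals[i][d] for e, i in zip(exps, keep))
        for d in range(len(target))
    ]

    # Residual combination of the target feature and aggregated context.
    return [t + c for t, c in zip(target, context)]
```

For example, calling `guided_aggregate([1.0, 0.0], frames, k_frames=1, k_props=1)` keeps only the single most similar proposal from the single most similar frame and adds it to the target feature. A learned scoring module would replace the fixed cosine scores in a real system.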