YOLO-MAXVOD FOR REAL-TIME VIDEO OBJECT DETECTION
Pradeep Moturi, Mukund Khanna, Kunal Singh
Video Object Detection (VOD) is one of the fundamental problems in video understanding, with applications ranging from surveillance to autonomous driving. However, many such real-world applications are unable to leverage existing VOD models owing to their high computational or integration complexity, which reduces inference speed. Single-stage still-image object detection models are therefore applied naively, without any use of video information. In this paper, we present YOLO-MaxVOD, a YOLOX-based VOD model that provides a better trade-off between accuracy and inference time than current real-time VOD solutions. Specifically, we propose a temporal fusion module that integrates within the YOLOX architecture in order to take advantage of the high speed that the YOLOX model offers. In our experiments on the ImageNet VID dataset, YOLO-MaxVOD achieves a 4.4-5.6% AP50 improvement over the baseline YOLOX across different model versions, with only a 1-2 ms increase in latency on an NVIDIA 1080Ti GPU.