DETECTION TRANSFORMER WITH DIVERSIFIED OBJECT QUERIES
Tharsan Senthivel, Ngoc-Son Vu, Boris Borzic
This paper addresses the issue of redundancy among object queries in fully end-to-end transformer-based detectors (DETR). We show that this redundancy stems from the positional dependence of the object queries, the repeated interactions between object queries and feature maps through self- and cross-attention, and the instability of the Hungarian matching algorithm. To preserve the expressiveness of the object queries, we propose a novel loss that reduces the pairwise correlation of the learned object query features across decoder layers. This plug-and-play approach, called DOQ-DETR, can be applied to various DETR variants. Our experiments on the large-scale COCO 2017 dataset demonstrate that the proposed training scheme improves several state-of-the-art DETR-like models, including Deformable-DETR, Conditional-DETR, and DAB-DETR.
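To illustrate the idea, the sketch below computes a decorrelation penalty over the object query embeddings of a single decoder layer by penalizing the off-diagonal entries of their pairwise correlation matrix. This is a minimal sketch of one plausible formulation, not the paper's exact loss: the function name query_decorrelation_loss, the tensor shapes, and the aggregation over decoder layers are all assumptions, and DOQ-DETR's actual formulation and weighting may differ.

    # Minimal sketch (assumed formulation, not the paper's exact loss):
    # penalize pairwise correlation between object query features.
    import torch

    def query_decorrelation_loss(queries: torch.Tensor) -> torch.Tensor:
        """queries: (num_queries, dim) object query features from one decoder layer."""
        # Standardize each query vector over the feature dimension so that
        # q @ q.T / dim approximates the pairwise correlation between queries.
        q = queries - queries.mean(dim=1, keepdim=True)
        q = q / (q.std(dim=1, keepdim=True) + 1e-6)
        corr = q @ q.t() / q.shape[1]              # (num_queries, num_queries)
        off_diag = corr - torch.diag(torch.diag(corr))
        return off_diag.pow(2).mean()              # penalize off-diagonal correlation

    # Hypothetical usage: sum the penalty over all decoder layers and add it,
    # with a weighting coefficient, to the standard DETR matching loss.
    layers = [torch.randn(300, 256) for _ in range(6)]  # e.g., 6 decoder layers
    aux_loss = sum(query_decorrelation_loss(q) for q in layers)

Because the penalty is computed independently per decoder layer and requires only the query embeddings, such a term can be attached to any DETR variant without changing its architecture, which is consistent with the plug-and-play claim above.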