Group-wise Co-salient Object Detection with Siamese Transformers via Brownian Distance Covariance Matching
Yang Wu (NUIST); Hao Zhang (NUIST); Lingyan Liang (Inspur); Yaqian Zhao (Inspur); Kaihua Zhang (Inspur, NUIST)
SPS
Co-salient object detection (CoSOD) aims to discover and segment the foreground targets shared by a group of images of the same semantic category. Existing mainstream approaches often employ convolutional neural networks (CNNs) as feature extractors to learn semantic-invariant features from a group of images. Despite their demonstrated success, two limitations remain: 1) CNNs have limited receptive fields and cannot capture long-range dependencies, which restricts their feature representation capability. 2) These models lack the discriminability to differentiate semantics across groups, since only a single group of images sharing one semantic category is considered during training. To address these issues, this paper presents a Siamese Transformer architecture for CoSOD that fully mines group-wise semantic contrast information for more discriminative feature learning. Specifically, the designed Siamese Transformer takes two groups of images as input for contrastive feature learning. Each group is processed by a Transformer branch with shared weights to capture long-range interaction information. Furthermore, to model the complex non-linear interactions between these two branches, we design a Brownian distance covariance (BDC) module that uses the joint distribution to measure inter- and intra-group semantic similarity. The BDC can be computed efficiently in closed form and fully characterizes statistical independence, enabling effective contrastive feature learning. Extensive evaluations on the three largest and most challenging benchmark datasets (CoSal2015, CoCA, and CoSOD3k) demonstrate the superiority of our method over state-of-the-art methods.
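To make the closed-form computation mentioned above concrete, the following is a minimal NumPy sketch of the classical sample Brownian distance covariance: each group's features yield a double-centered Euclidean distance matrix, and the similarity between two groups is the normalized inner product of those matrices. The function names are hypothetical and this is not the paper's actual BDC module, which operates on Transformer feature maps.

```python
import numpy as np

def bdc_matrix(X):
    # X: (n, d) array of n feature vectors from one image group.
    # Pairwise Euclidean distance matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    A = np.sqrt(np.maximum(d2, 0.0))
    # Double centering: subtract row and column means, add back the grand mean,
    # so every row and column of the result sums to (numerically) zero.
    return (A
            - A.mean(axis=0, keepdims=True)
            - A.mean(axis=1, keepdims=True)
            + A.mean())

def bdc_similarity(X, Y):
    # Sample distance covariance between two groups of n samples each:
    # the normalized Frobenius inner product of their centered distance matrices.
    # It is zero (in the population limit) iff the two variables are independent.
    n = X.shape[0]
    return float(np.sum(bdc_matrix(X) * bdc_matrix(Y))) / (n * n)
```

In the paper's setting, the same idea would be applied to features from the two Siamese branches, so that high BDC similarity signals shared (co-salient) semantics across the groups.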