SQA: STRONG GUIDANCE QUERY WITH SELF-SELECTED ATTENTION FOR HUMAN-OBJECT INTERACTION DETECTION
Feng Zhang (Zhejiang University of Technology); Sheng Liu (Zhejiang University of Technology); Bingnan Guo (Zhejiang University of Technology); Ruixiang Chen (Zhejiang University of Technology); Junhao Chen (Zhejiang University of Technology)
SPS
The attention mechanism in Transformer-based human-object interaction (HOI) models plays an important role in understanding interactions between humans and objects. However, most previous Transformer-based models provide no guidance for the query or the attention, which leads to a poor understanding of interaction behaviour. In this paper, we propose SQA, a strong guidance query model with self-selected attention. The model includes two novel modules: query feature extraction (QFE) and attention mask construction (AMC). QFE builds a strong guidance query by concatenating guidance features, which effectively improves the model's ability to capture relationships between humans and objects. Meanwhile, AMC establishes a distinctive attention mask for each query, so that each query attends only to its own self-selected attention regions. This directs each query toward more accurate information during cross-attention, even in the rare-sample case. We evaluate SQA on the mainstream HICO-DET and V-COCO datasets, where it achieves state-of-the-art results. The code is available at https://github.com/nmbzdwss/SQA.
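The two ideas in the abstract, a query built by concatenating guidance features and a separate attention mask per query, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names `build_query` and `masked_cross_attention`, the toy feature vectors, and the boolean mask layout are all assumptions made for illustration, and the sketch uses plain scaled dot-product attention with a single head.

```python
import math


def build_query(base, guidance):
    """Hypothetical stand-in for QFE: concatenate a base query with guidance features."""
    return list(base) + list(guidance)


def masked_cross_attention(queries, keys, values, masks):
    """Single-head scaled dot-product cross-attention with one mask per query.

    masks[i][j] is True when query i is allowed to attend to memory position j
    (an assumed layout standing in for the masks that AMC would produce).
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for query, mask in zip(queries, masks):
        # Disallowed positions get a score of -inf, so their softmax weight is 0.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale if allowed else -math.inf
            for key, allowed in zip(keys, mask)
        ]
        top = max(scores)
        if top == -math.inf:
            raise ValueError("each query must be allowed at least one position")
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


# Toy memory with three positions; all numbers are made up for illustration.
keys = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

queries = [
    build_query([1, 0], [0, 0]),
    build_query([0, 1], [0, 1]),
]
# Query 0 may only look at position 0; query 1 may look at positions 1 and 2.
masks = [
    [True, False, False],
    [False, True, True],
]

outputs = masked_cross_attention(queries, keys, values, masks)
```

Because the first query is restricted to a single position, its output equals that position's value vector, while the second query mixes only the two positions its mask allows. How SQA actually selects the regions and which guidance features it concatenates is described in the paper and the linked repository, not in this sketch.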