VLKP: VIDEO INSTANCE SEGMENTATION WITH VISUAL-LINGUISTIC KNOWLEDGE
Ruixiang Chen (Zhejiang University of Technology); Sheng Liu (Zhejiang University of Technology); Junhao Chen (Zhejiang University of Technology); Bingnan Guo (Zhejiang University of Technology); Feng Zhang (Zhejiang University of Technology)
Most video instance segmentation (VIS) models focus only on visual knowledge and ignore intrinsic linguistic knowledge. Based on the observation that incorporating linguistic knowledge can significantly improve a model's contextual understanding of video, in this paper we present a Video Instance Segmentation approach with Visual-Linguistic Knowledge Prompts (VLKP), a novel paradigm for offline video instance segmentation. Specifically, we propose a visual-linguistic knowledge prompt training strategy that fuses linguistic features with visual features to obtain visual-linguistic features, which the model processes in place of traditional visual features. In addition, we design a new temporal shift encoder that conveys information between frames and enhances the temporal sensitivity of the model. On two widely adopted VIS benchmarks, i.e., YouTube-VIS-2019 and YouTube-VIS-2021, VLKP with ResNet-50 obtains state-of-the-art results, e.g., 47.7 AP on YouTube-VIS-2019 and 42.0 AP on YouTube-VIS-2021. Code is available at https://github.com/ruixiangC/VLKP.
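The abstract describes the temporal shift encoder only at a high level. One plausible reading, borrowed from the well-known temporal shift module (TSM) idea rather than from this paper's actual implementation, is that a fraction of feature channels is shifted between adjacent frames so that each frame's features mix with those of its neighbours at zero extra parameter cost. A minimal NumPy sketch under that assumption (the function name and the `shift_div` parameter are hypothetical, not from the paper):

```python
import numpy as np

def temporal_shift(x: np.ndarray, shift_div: int = 8) -> np.ndarray:
    """TSM-style temporal shift over per-frame feature maps.

    x: array of shape (T, C, H, W) holding features of T frames.
    One chunk of C // shift_div channels is shifted backward in time,
    another chunk forward; the remaining channels are left in place.
    """
    t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # shift chunk 1 backward
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # shift chunk 2 forward
    out[:, 2 * fold:] = x[:, 2 * fold:]              # keep the rest unchanged
    return out
```

Because the shift is a pure re-indexing, such an encoder exchanges information between neighbouring frames without any temporal convolution; boundary frames receive zeros for the shifted-in channels.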