LDCFORMER: INCORPORATING LEARNABLE DESCRIPTIVE CONVOLUTION TO VISION TRANSFORMER FOR FACE ANTI-SPOOFING
Pei-Kai Huang, Cheng-Hsuan Chiang, Jun-Xiong Chong, Tzu-Hsien Chen, Hui-Yu Ni, Chiou-Ting Hsu
Face anti-spoofing (FAS) aims to counter facial presentation attacks and relies heavily on identifying live/spoof discriminative features. While vision transformers (ViT) have shown promising potential in recent FAS methods, there remains a lack of studies examining the value of incorporating local descriptive feature learning into ViT. In this paper, we propose a novel LDCformer that incorporates Learnable Descriptive Convolution (LDC) into ViT, aiming to learn discriminative characteristics for FAS by modeling long-range dependencies of locally descriptive features. In addition, we propose to extend LDC to a Decoupled Learnable Descriptive Convolution (Decoupled-LDC) to improve optimization efficiency. With the new Decoupled-LDC, we further develop an extended model, LDCformer$^D$, for FAS. Extensive experiments on FAS benchmarks show that LDCformer$^D$ outperforms previous methods on most protocols in both intra-domain and cross-domain testing.
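To make the idea concrete, below is a minimal PyTorch sketch of what a learnable descriptive convolution could look like. It is modeled on central-difference-style convolution, with the difference operator made learnable; the class name LearnableDescriptiveConv, the mixing factor theta, and the per-tap descriptive weights are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableDescriptiveConv(nn.Module):
    """Hypothetical sketch of a learnable descriptive convolution (LDC).

    Assumption: as in central-difference convolution (CDC), the vanilla
    convolution output is corrected by a "descriptive" term built from
    local differences, but here each kernel tap's contribution to that
    term is re-weighted by a learnable parameter.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        # Learnable per-tap descriptive weights (hypothetical design):
        # they modulate the shared kernel before the difference term.
        self.desc = nn.Parameter(torch.ones(out_ch, in_ch,
                                            kernel_size, kernel_size))
        # Mixing factor between the vanilla and descriptive branches.
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # Descriptive branch: modulate the kernel, then apply the
        # CDC-style correction, i.e. a 1x1 convolution whose weight is
        # the spatial sum of the modulated kernel (center-pixel term).
        desc_kernel = self.conv.weight * self.desc
        diff = F.conv2d(x, desc_kernel.sum(dim=(2, 3), keepdim=True))
        return out - self.theta * diff

# Usage example on a face image batch:
# ldc = LearnableDescriptiveConv(3, 64)
# feats = ldc(torch.randn(1, 3, 224, 224))  # -> (1, 64, 224, 224)
```

Under this reading, a module like this could replace the vanilla convolutions in a ViT tokenization stem, so that self-attention then models long-range dependencies over locally descriptive features rather than raw patch embeddings.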