FSSPOTTER: SPOTTING FACE-SWAPPED VIDEO BY SPATIAL AND TEMPORAL CLUES
Peng Chen, Jin Liu, Tao Liang, Guangzhi Zhou, Hongchao Gao, Jiao Dai, Jizhong Han
Recent advances in face generation and manipulation have enabled the creation of sophisticated face-swapped videos, also known as DeepFakes, which pose serious potential threats to society. Hence, it is crucial to develop effective approaches to detect them. Currently, face-swapped videos produced by existing methods tend to exhibit subtle spatial and temporal manipulation traces, which can serve as distinctive clues for face-swapped video detection. In this paper, we propose a unified framework, named FSSpotter, that exploits rich spatial and temporal information in the video simultaneously. It consists of a Spatial Feature Extractor (SFE), which aims to discover spatial evidence within a single frame, and a Temporal Feature Aggregator (TFA), which is responsible for capturing temporal inconsistencies between frames. Moreover, a novel data processing strategy is adopted to highlight the inconsistencies between the forged face and its surrounding regions. Evaluations on the Deepfakes subset of FaceForensics++ and on the DeepfakeTIMIT, UADFV, and Celeb-DF datasets demonstrate that the proposed approach achieves better or comparable AUC scores.
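For readers unfamiliar with the AUC metric used in the evaluation, the following is a minimal sketch of how it can be computed from per-video detector scores. It is not the authors' evaluation code: the function name `auc_score`, the label convention (1 for face-swapped, 0 for real), and the example scores are all assumptions made for illustration. The sketch uses the pairwise-ranking definition of AUC, namely the fraction of (fake, real) pairs in which the fake video receives the higher score, with ties counted as half.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for face-swapped (positive), 0 for real (negative).
            This convention is an assumption for the example.
    scores: higher means the detector rates the video as more likely fake.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fake video correctly ranked above real one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six videos (three fake, three real).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc_score(labels, scores))
```

An AUC of 1.0 means every fake video is scored above every real one, while 0.5 corresponds to chance-level ranking. The quadratic pairwise loop is chosen here for clarity; a sort-based computation would scale better on large test sets.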