09 May 2022

The spread of fake face videos raises severe social concerns, which has spurred the development of detection methods for these videos. Existing patch-based methods focus on local regions to find common forgery clues while ignoring the important role of global information. In this paper, a novel spatiotemporal network is proposed that better exploits the implicit complementary advantages of global and local information. Specifically, the spatial module consists of a global information stream and a local information stream extracted from patches selected by attention layers. The fused features of these two streams are then fed into the temporal module to further capture temporal clues. In addition, a regularization loss is designed to guide the selection of local information and the extraction of fused temporal information with reference to the global information. Extensive experiments on different datasets demonstrate the superiority of our framework.
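
To make the described architecture concrete, below is a minimal, hypothetical PyTorch sketch of a two-stream spatial module (global stream plus attention-selected local patches), a temporal module over the fused per-frame features, and a regularization term that keeps the fused temporal features close to the global reference. The backbone, feature dimensions, the use of a GRU for the temporal module, top-k cell selection as a stand-in for patch cropping, and the MSE form of the regularization loss are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the described spatiotemporal detector (PyTorch).
# All module choices and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialModule(nn.Module):
    """Global stream plus a local stream built from attention-selected patches."""

    def __init__(self, feat_dim: int = 256, num_patches: int = 4):
        super().__init__()
        # Shared CNN trunk producing a spatial feature map (assumed backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Attention layer that scores spatial locations for patch selection.
        self.attn = nn.Conv2d(feat_dim, 1, kernel_size=1)
        self.num_patches = num_patches
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, frame: torch.Tensor):
        fmap = self.backbone(frame)                      # (B, C, H, W)
        b, c, h, w = fmap.shape

        # Global information stream: pooled whole-frame features.
        g = self.global_pool(fmap).flatten(1)            # (B, C)

        # Local information stream: average features of the top-k attended
        # cells (a stand-in for cropping patches around attention peaks).
        scores = self.attn(fmap).flatten(1)              # (B, H*W)
        topk = scores.topk(self.num_patches, dim=1).indices
        flat = fmap.flatten(2)                           # (B, C, H*W)
        idx = topk.unsqueeze(1).expand(-1, c, -1)        # (B, C, k)
        local = flat.gather(2, idx).mean(dim=2)          # (B, C)

        fused = self.fuse(torch.cat([g, local], dim=1))  # (B, C)
        return fused, g


class SpatioTemporalDetector(nn.Module):
    """Fused per-frame features pass through a temporal module, then a classifier."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.spatial = SpatialModule(feat_dim)
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, 2)         # real vs. fake

    def forward(self, clip: torch.Tensor):
        # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        fused, g = self.spatial(clip.flatten(0, 1))
        fused = fused.view(b, t, -1)
        g = g.view(b, t, -1)
        temporal_out, _ = self.temporal(fused)
        logits = self.classifier(temporal_out[:, -1])
        # Regularization term keeping fused temporal features close to the
        # global reference (one plausible reading of the described loss).
        reg_loss = F.mse_loss(temporal_out, g.detach())
        return logits, reg_loss
```

As a usage note, the total training objective under these assumptions would combine a standard cross-entropy loss on `logits` with a weighted `reg_loss`; the actual loss weighting and backbone used by the authors are not specified in this abstract.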