OPTIMIZING TRANSFORMER FOR LARGE-HOLE IMAGE INPAINTING
Zixuan Li, Yuan-Gen Wang
In recent years, leveraging Convolutional Neural Networks (CNNs) to optimize Transformers (so-called hybrid models) has achieved great progress in image inpainting. However, the slow growth of the effective receptive field of the CNN when processing large-hole regions significantly limits overall performance. To alleviate this problem, this paper proposes a new Transformer-CNN hybrid framework (termed PUT+) that introduces the fast Fourier convolution (FFC) into the CNN-based refinement network. The framework builds on an improved Patch-based Vector Quantized Variational Auto-Encoder (P-VQVAE+): the encoder converts the masked image into non-overlapping, patch-based, unquantized feature vectors that serve as the input to the Un-Quantized Transformer (UQ-Transformer), and the decoder restores the masked region from the quantized features predicted by the UQ-Transformer while keeping the unmasked region unchanged. Extensive experimental results show that the proposed method outperforms the state of the art by a large margin, especially for image inpainting with large masked areas.
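To make the role of the FFC concrete, the sketch below shows a minimal fast Fourier convolution style block in PyTorch: a local 3x3 convolutional branch is combined with a global branch that applies a 1x1 convolution in the Fourier domain, so every output location attends to the whole feature map in a single layer. This is only an illustrative simplification under assumed shapes and module names (SpectralTransform, FFCBlock), not the authors' implementation of the refinement network.

```python
import torch
import torch.nn as nn


class SpectralTransform(nn.Module):
    """Global branch: convolve the real/imaginary parts of the spectrum,
    giving an image-wide receptive field in one step."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels * 2, channels * 2, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels * 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # (b, c, h, w//2 + 1), complex
        spec = torch.cat([spec.real, spec.imag], dim=1)  # stack real/imag as channels
        spec = self.conv(spec)
        real, imag = spec.chunk(2, dim=1)
        spec = torch.complex(real, imag)
        return torch.fft.irfft2(spec, s=(h, w), norm="ortho")  # back to spatial domain


class FFCBlock(nn.Module):
    """Minimal FFC-style block: local 3x3 branch + global spectral branch."""

    def __init__(self, channels):
        super().__init__()
        self.local_branch = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.global_branch = SpectralTransform(channels)

    def forward(self, x):
        return torch.relu(self.local_branch(x) + self.global_branch(x))


if __name__ == "__main__":
    feat = torch.randn(1, 64, 256, 256)   # hypothetical feature map in the refinement network
    print(FFCBlock(64)(feat).shape)       # torch.Size([1, 64, 256, 256])
```

Because the spectral branch mixes information across all spatial positions at once, stacking such blocks avoids the slow receptive-field growth of plain convolutions that the abstract identifies as the bottleneck for large-hole inpainting.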