CTTSR: A Hybrid CNN-Transformer Network for Scene Text Image Super-Resolution
Kaiwei Dai (Central South University); Nan Kang (Central South University); Li Kuang (Central South University)
The accuracy of scene text recognition has improved significantly with the development of deep learning. However, blurred and low-resolution text images still lead to unsatisfactory recognition results. Several researchers have designed super-resolution models based on convolutional neural networks (CNNs) to reduce image blurring, but these models are limited by the receptive field of the convolution kernel and fail to adequately capture the long-range semantic relations in text images. In this paper, we propose a CNN-Transformer Text Super-Resolution Network (CTTSR) that captures the semantic features of text images through the multi-head attention mechanism of the Transformer. Furthermore, we propose a text position loss to optimize the network and make the text regions of images easier to detect. Experimental results demonstrate that our model improves image quality and outperforms existing methods on text recognition tasks.
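The abstract does not include implementation details, so the following is only a minimal PyTorch sketch of a hybrid CNN-Transformer super-resolution block of the kind described: a CNN stem extracts local features, a Transformer encoder applies multi-head self-attention across all spatial positions to model long-range relations, and a sub-pixel layer upsamples. The module name, channel sizes, layer counts, and the `PixelShuffle` upsampling strategy are all assumptions for illustration, not the authors' actual CTTSR architecture or text position loss.

```python
# Illustrative sketch only (not the authors' code): CNN for local features,
# Transformer encoder for long-range relations, PixelShuffle for upsampling.
import torch
import torch.nn as nn

class HybridCNNTransformerSR(nn.Module):
    def __init__(self, channels=64, num_heads=4, num_layers=2, scale=2):
        super().__init__()
        # CNN stem: local feature extraction (limited receptive field)
        self.stem = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer encoder: multi-head self-attention over all spatial
        # positions captures long-distance semantic relations
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=channels * 4, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Sub-pixel upsampling to the super-resolved output
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        feats = self.stem(x)                       # (B, C, H, W)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per position
        tokens = self.encoder(tokens)              # global attention across positions
        feats = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.upsample(feats)                # (B, 3, H*scale, W*scale)

if __name__ == "__main__":
    lr = torch.randn(1, 3, 16, 64)   # a low-resolution text image crop
    sr = HybridCNNTransformerSR()(lr)
    print(sr.shape)                   # torch.Size([1, 3, 32, 128])
```

Treating each spatial position as a token lets the attention layers relate characters anywhere in the image, which is the capability the abstract argues pure-CNN super-resolution models lack.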