DTTR: DETECTING TEXT WITH TRANSFORMERS
Jing Yang (Hunan University); Zhiqiang You (Hunan University); Zhiwei Zhong (Hunan University); Peng Liu (Guangdong University of Technology); Langqi Mei (NPIC); Shenguang Huang (Ningbo Port Information Communication Co., Ltd.)
Recently, transformer-based approaches have achieved considerable success on vision tasks, in some cases surpassing convolutional neural networks (CNNs). In this paper, we present a novel transformer-based model for scene text detection, named detecting text with transformers (DTTR). In DTTR, a CNN backbone extracts local connectivity features and a transformer decoder effectively captures global context information from a scene text image. In addition, we propose a dynamic scale fusion (DSF) module that fuses multi-scale feature maps dynamically, significantly improving scale robustness and providing powerful representations for subsequent decoding. Experimental results show that DTTR achieves a 0.5% H-mean improvement and 20.0% faster inference than the SOTA model with a ResNet-50 backbone on MMOCR. Code will be released at: https://github.com/ahsdx/DTTR.
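The abstract does not spell out how the DSF module fuses scales, so the following is only a minimal PyTorch sketch of one plausible reading: each scale is projected by a 1x1 convolution, resized to a common resolution, and combined with input-conditioned softmax weights ("dynamic" fusion). The class name `DynamicScaleFusion` and every layer choice here are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicScaleFusion(nn.Module):
    """Hypothetical sketch of a dynamic scale fusion (DSF) module:
    fuses multi-scale CNN feature maps with input-dependent weights."""

    def __init__(self, in_channels: int, num_scales: int):
        super().__init__()
        # 1x1 convs project every scale to a common channel width (assumption).
        self.proj = nn.ModuleList(
            [nn.Conv2d(in_channels, in_channels, kernel_size=1) for _ in range(num_scales)]
        )
        # A small head predicts one fusion weight per scale from pooled features.
        self.weight_head = nn.Linear(in_channels * num_scales, num_scales)

    def forward(self, feats):
        # feats: list of (B, C, H_i, W_i) maps from the CNN backbone.
        target = feats[0].shape[-2:]
        # Resize every scale to the finest resolution, then project.
        maps = [
            p(F.interpolate(f, size=target, mode="bilinear", align_corners=False))
            for p, f in zip(self.proj, feats)
        ]
        # Predict dynamic (input-conditioned) per-scale weights.
        pooled = torch.cat([m.mean(dim=(2, 3)) for m in maps], dim=1)  # (B, C*S)
        w = torch.softmax(self.weight_head(pooled), dim=1)             # (B, S)
        # Weighted sum across scales; output feeds the transformer decoder.
        fused = sum(w[:, i, None, None, None] * m for i, m in enumerate(maps))
        return fused  # (B, C, H, W)


if __name__ == "__main__":
    # Toy multi-scale pyramid, e.g. strides 8/16/32 from a ResNet-50 backbone.
    feats = [torch.randn(2, 256, 64, 64), torch.randn(2, 256, 32, 32), torch.randn(2, 256, 16, 16)]
    fused = DynamicScaleFusion(in_channels=256, num_scales=3)(feats)
    print(fused.shape)  # torch.Size([2, 256, 64, 64])
```

Under this reading, the fusion weights depend on the input image rather than being fixed parameters, which is one straightforward way to obtain the scale robustness the abstract claims; the released code at the URL above would be the authoritative reference.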