DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

Jingyu Lin (Xiamen University); Yan Yan (Xiamen University); Hanzi Wang (Xiamen University)

07 Jun 2023

The success of deep learning has driven rapid progress in scene text detection. Among existing methods, segmentation-based methods have drawn extensive attention because they excel at detecting text instances of arbitrary shapes and extreme aspect ratios. However, these bottom-up methods are limited by the performance of their segmentation models. In this paper, we propose DPTNet (Dual-Path Transformer Network), a simple yet effective architecture that models both global and local information for the scene text detection task. We further propose a parallel design that integrates a convolutional network with a self-attention mechanism so that the two paths provide complementary clues. In addition, a bi-directional interaction module across the two paths is developed to provide complementary clues in the channel and spatial dimensions. DPTNet achieves state-of-the-art results on several standard benchmarks in terms of both detection accuracy and speed.
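The abstract does not give implementation details, so the following is only a toy sketch of the general idea of a dual-path block, not the authors' architecture. It assumes a flattened sequence of feature tokens, uses neighbour averaging as a stand-in for the convolutional (local) path, and uses single-head self-attention with identity projections as the global path. The direction of the two interactions (a channel gate computed from the global path and applied to the local path, a spatial gate computed from the local path and applied to the global path) and every function name here are hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def local_path(tokens):
    """Stand-in for a convolution: average each token with its immediate neighbours."""
    n = len(tokens)
    out = []
    for i in range(n):
        window = tokens[max(0, i - 1):min(n, i + 2)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def global_path(tokens):
    """Single-head self-attention with identity projections (Q = K = V = tokens)."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) / total
                    for c in range(len(q))])
    return out


def dual_path_block(tokens):
    """Run both paths in parallel, let each gate the other, then sum them."""
    local = local_path(tokens)
    glob = global_path(tokens)
    n, c = len(tokens), len(tokens[0])

    # Assumed channel interaction: one gate per channel, pooled over all tokens
    # of the global path, applied to the local path.
    channel_gate = [sigmoid(sum(row[j] for row in glob) / n) for j in range(c)]
    # Assumed spatial interaction: one gate per token, pooled over the channels
    # of the local path, applied to the global path.
    spatial_gate = [sigmoid(sum(row) / c) for row in local]

    local_mod = [[x * g for x, g in zip(row, channel_gate)] for row in local]
    glob_mod = [[x * spatial_gate[i] for x in row] for i, row in enumerate(glob)]
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(local_mod, glob_mod)]


# Four tokens with two channels each; the output keeps the same shape.
features = [[0.1, 0.4], [0.3, -0.2], [0.9, 0.5], [-0.6, 0.0]]
fused = dual_path_block(features)
```

A real detector would replace the averaging with learned convolutions, add learned query, key and value projections, and operate on 2D feature maps; the sketch only shows how two parallel paths can modulate each other along the channel and spatial axes before being merged.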

Tags:

Machine learning for image processing

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
