DIFFICULTY-AWARE DATA AUGMENTOR FOR SCENE TEXT RECOGNITION

Guanghao Meng (Tsinghua University); Tao Dai (Shenzhen University); Bin Chen (Harbin Institute of Technology, Shenzhen); Naiqi Li (Tsinghua-Berkeley Shenzhen Institute); Yong Jiang (Tsinghua University); Shu-Tao Xia (Tsinghua University)

08 Jun 2023

Deep neural network (DNN) based scene text recognition (STR) methods usually require a large amount of annotated training data, which is time-consuming and costly to collect in practice. To address this issue, many data augmentation methods have been developed to train recognizers by improving the diversity of training samples. However, most existing methods neglect the difficulty inherent in samples and easily suffer from the problem of over-diversity, i.e., the distribution of the augmented data deviates significantly from that of the clean data. In this paper, we propose a novel difficulty-aware data augmentation framework for scene text recognition, which jointly considers the difficulty of samples and the strength of augmentations. Specifically, our framework first predicts the sample difficulty and then applies an adaptive data augmentation strategy. Furthermore, we build a more diverse set of augmentation methods for STR and integrate it into our augmentation framework. Extensive experiments on scene text recognition benchmarks show that our augmentation framework significantly improves the performance of recognizers.
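
The abstract does not specify how difficulty is predicted or how it sets the augmentation strength, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a difficulty score already normalised to [0, 1] and assumes that harder samples should receive weaker augmentation; both the linear mapping and the noise augmentation are hypothetical choices made for this example.

```python
import random


def augmentation_strength(difficulty, max_strength=1.0):
    """Map a difficulty score in [0, 1] to an augmentation strength.

    Assumption (not taken from the paper): the harder the sample,
    the weaker the augmentation, using a simple linear mapping.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    return max_strength * (1.0 - difficulty)


def add_noise(pixels, strength, rng):
    """Hypothetical augmentation: additive uniform noise, clipped to [0, 1]."""
    return [
        min(1.0, max(0.0, p + rng.uniform(-strength, strength)))
        for p in pixels
    ]


def augment(pixels, difficulty, rng=None):
    """Augment one sample with a strength adapted to its difficulty."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    strength = augmentation_strength(difficulty)
    return add_noise(pixels, strength, rng)


# Usage: an easy sample is perturbed, the hardest sample is left unchanged.
easy = augment([0.2, 0.5, 0.9], difficulty=0.1)
hard = augment([0.2, 0.5, 0.9], difficulty=1.0)
```

In a real pipeline the difficulty score would come from a learned predictor and the augmentation would be drawn from a set of image transforms; here a flat list of pixel intensities stands in for an image so that the example needs only the standard library.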

Tags: Pattern recognition and classification

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
