Cross-Modal Deep Networks for Document Image Classification

Souhail Bakkali, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol

Length: 10:22

27 Oct 2020

As a fundamental step in document-related tasks, document classification is widely used in document image processing applications. Unlike general image classification in computer vision, text document images contain both visual cues and the corresponding text within the image. However, bridging these two modalities and leveraging both textual and visual features to classify text document images remains challenging. In this paper, we present a cross-modal deep network that captures both the textual content and the visual information in document images. We use NASNet-Large and BERT to extract image and text features, respectively. Thanks to efficient joint learning of text and image features, the proposed cross-modal approach outperforms state-of-the-art single-modal methods. Experimental results show that it achieves new state-of-the-art results for text document image classification on the benchmark Tobacco-3482 dataset, outperforming the previous state-of-the-art method by 3.91% in classification accuracy.
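
To make the cross-modal idea concrete, here is a minimal sketch of one common way to combine two modalities: concatenate an image feature vector with a text feature vector and pass the result through a linear softmax classifier. This is an illustration under stated assumptions, not the authors' architecture. The abstract does not say how the features are fused, so concatenation is an assumed choice; the function name `fuse_and_classify`, the toy feature sizes, and the random stand-in features (in place of real NASNet-Large and BERT outputs) are all hypothetical. The class count of 10 matches the number of categories in Tobacco-3482.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(image_features, text_features, weights, biases):
    """Hypothetical helper: concatenate both modalities, then apply a
    linear layer followed by softmax. Concatenation is an assumption;
    the abstract does not specify the fusion strategy."""
    fused = list(image_features) + list(text_features)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("weight row length must match fused feature length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy sizes for illustration only; real backbone outputs are much larger.
IMAGE_DIM, TEXT_DIM, NUM_CLASSES = 8, 6, 10

rng = random.Random(0)
# Random stand-ins for features an image backbone and a text encoder would produce.
image_features = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]
text_features = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]

# Untrained classifier parameters, randomly initialised for the sketch.
weights = [
    [rng.gauss(0.0, 0.1) for _ in range(IMAGE_DIM + TEXT_DIM)]
    for _ in range(NUM_CLASSES)
]
biases = [0.0] * NUM_CLASSES

probs = fuse_and_classify(image_features, text_features, weights, biases)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print(f"predicted class index: {predicted}")
```

In a trained system the two feature vectors would come from the image and text encoders, and the classifier weights would be learned jointly with them, so that the prediction can draw on both layout and content.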

Tags:

sps conference

icip 2020
