DISTILLING KNOWLEDGE OF BIDIRECTIONAL LANGUAGE MODEL FOR SCENE TEXT RECOGNITION

Shota Orihashi, Yoshihiro Yamazaki, Mihiro Uchida, Akihiko Takashima, Ryo Masumura

Lecture 09 Oct 2023

This paper proposes a knowledge distillation method that transfers knowledge from an external bidirectional language model trained by masked language modeling to achieve high accuracy in scene text recognition. In Asian languages such as Japanese, words are not separated by spaces, so text must be recognized in units of multiple words or sentences rather than as individual words, which requires high-level linguistic knowledge to recognize text correctly. To enhance linguistic knowledge, several methods that use an external language model have been proposed, but these methods cannot fully exploit future context because they only revise text candidates produced by autoregressive text recognition models, which rely mainly on past context. To overcome this deficiency, our key idea is to enhance the text recognition model with knowledge from an external bidirectional language model trained by masked language modeling, which reflects not only past but also future context. To actively incorporate future context into text recognition, the proposed method introduces a distillation loss term that brings the output probability distribution of the text recognition model closer to that of the bidirectional language model. Experiments on Japanese scene text recognition demonstrate the effectiveness of the proposed method.
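As a rough illustration of the distillation term described in the abstract, the sketch below shows one common way such a loss can be written: a KL-divergence term that pulls the recognizer's per-position character distribution toward the distribution produced by a masked (bidirectional) language model. This is a minimal sketch assuming PyTorch; the function name, tensor shapes, temperature, and the weighting hyperparameter `lambda_distill` are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def bidirectional_lm_distillation_loss(student_logits: torch.Tensor,
                                        teacher_probs: torch.Tensor,
                                        temperature: float = 1.0) -> torch.Tensor:
    """Distillation term pulling the recognizer's output distribution
    toward that of a bidirectional (masked) language model.

    student_logits: (batch, seq_len, vocab) raw logits from the text recognizer
    teacher_probs:  (batch, seq_len, vocab) probabilities from the masked LM,
                    e.g. obtained by masking each position and reading the
                    model's prediction for that position (illustrative setup)
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean KL divergence; the T^2 factor is the usual distillation scaling
    kl = F.kl_div(log_p_student, teacher_probs, reduction="batchmean")
    return (temperature ** 2) * kl

# Hypothetical total objective: the standard recognition loss plus the
# distillation term, weighted by an assumed hyperparameter lambda_distill.
# loss = recognition_loss + lambda_distill * bidirectional_lm_distillation_loss(
#     student_logits, teacher_probs)
```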
