TrOMR: Transformer-based Polyphonic Optical Music Recognition
Yixuan Li (Hangzhou NetEase Cloud Music Technology Co., Ltd); Huaping Liu (Hangzhou NetEase Cloud Music Technology Co., Ltd); Qiang Jin (Hangzhou NetEase Cloud Music Technology Co., Ltd); Miaomiao Cai (Hangzhou NetEase Cloud Music Technology Co., Ltd); Peng Li (NetEase Cloud Music)
SPS
Optical Music Recognition (OMR) is an important technology in music and has long been an active research topic. Previous OMR approaches are usually based on convolutional neural networks (CNNs) for image understanding and recurrent neural networks (RNNs) for music symbol classification. In this paper, we propose TrOMR, a transformer-based approach to end-to-end polyphonic OMR that benefits from the transformer's strong global perception. We also introduce a novel consistency loss function and a data annotation scheme designed to improve recognition accuracy on complex music scores. Extensive experiments demonstrate that TrOMR outperforms current OMR methods, especially in real-world scenarios. We also develop a TrOMR system and build a dataset of camera-captured, full-page music scores from real-world scenes. The code and datasets will be made available for reproducibility.