Toward Universal Text-to-Music Retrieval

Seungheon Doh (KAIST); Minz Won (ByteDance); Keunwoo Choi (Gaudio Lab); Juhan Nam (KAIST)

08 Jun 2023

This paper introduces effective design choices for text-to-music retrieval systems. An ideal text-based retrieval system would support a variety of input queries, including pre-defined tags, unseen tags, and sentence-level descriptions. In practice, most previous work has focused on a single query type (tag or sentence), and such systems may not generalize to the other input type. We therefore review recent text-based music retrieval systems using our proposed benchmark, focusing on two aspects: input text representation and training objectives. Our findings enable a universal text-to-music retrieval system that achieves comparable retrieval performance on both tag- and sentence-level inputs. Furthermore, the proposed multimodal representation generalizes to nine different downstream music classification tasks. We present the code and demo online.
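
As a rough illustration of what retrieval with either a tag or a sentence query can look like, the sketch below ranks tracks by cosine similarity in a shared text-audio embedding space. This is a generic joint-embedding setup assumed for illustration, not the authors' implementation: the abstract does not specify the encoders or the training objective, the function names `l2_normalize` and `retrieve` are hypothetical, and the 4-dimensional vectors are made-up stand-ins for the outputs of a text encoder and an audio encoder.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def retrieve(query_emb, track_embs, top_k=3):
    """Return (indices, scores) of the top_k tracks by cosine similarity, best first."""
    q = l2_normalize(query_emb)
    t = l2_normalize(track_embs)
    scores = t @ q                        # one cosine similarity per track
    order = np.argsort(-scores)[:top_k]   # sort descending, keep the best top_k
    return order, scores[order]


# Made-up track embeddings (assumption: produced by some audio encoder).
track_embs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
])

# Made-up query embeddings (assumption: a tag and a sentence are both
# mapped into the same space by one text encoder).
tag_query = np.array([0.9, 0.1, 0.0, 0.0])
sentence_query = np.array([0.1, 0.9, 0.0, 0.0])

tag_top, tag_scores = retrieve(tag_query, track_embs, top_k=2)
sentence_top, sentence_scores = retrieve(sentence_query, track_embs, top_k=2)
```

Because both query types land in the same space, the ranking code does not need to know whether the input was a tag or a sentence; any difference in quality between the two would come from the text encoder and the training objective, which are the two aspects the abstract says the paper studies.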

Tags:

Audio for multimedia and audio processing systems

Presented at IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference).
