SIMULTANEOUSLY LEARNING ROBUST AUDIO EMBEDDINGS AND BALANCED HASH CODES FOR QUERY-BY-EXAMPLE
Anup Singh (Ghent University); Kris Demuynck (Ghent University); Vipul Arora (IIT Kanpur)
Audio fingerprinting systems must efficiently and robustly identify
query snippets in an extensive database. To this end, state-of-the-art
systems use deep learning to generate compact audio fingerprints.
These systems deploy indexing methods that quantize fingerprints
into hash codes in an unsupervised manner to expedite the
search. However, these methods generate imbalanced hash codes,
which degrades their performance. Therefore, we propose
a self-supervised learning framework to compute fingerprints and
balanced hash codes in an end-to-end manner to achieve both fast
and accurate retrieval performance. We model hash code generation
as a balanced clustering process, which we cast as an instance of the
optimal transport problem. Experimental results indicate that the
proposed approach improves retrieval efficiency while preserving
high accuracy, particularly at high distortion levels, compared to
competing methods. Moreover, our system is efficient and scalable
in terms of computational load and memory storage.
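
To make the balanced-clustering idea concrete, the following is a minimal sketch of how assignments of embeddings to hash buckets can be balanced with an entropy-regularized optimal transport step. The abstract does not say which solver the authors use; the Sinkhorn-Knopp iteration, the function name `sinkhorn_balanced_assign`, and the parameters `epsilon` and `n_iters` are assumptions made purely for illustration.

```python
import numpy as np

def sinkhorn_balanced_assign(scores, epsilon=0.1, n_iters=100):
    """Soft-assign N embeddings to K buckets with near-equal bucket mass.

    scores: (N, K) similarities between embeddings and bucket prototypes
            (hypothetical inputs; cosine similarities work well here).
    Returns Q of shape (N, K): each row sums to 1, and each column sums
    to approximately N / K once the iteration has converged.
    """
    n, k = scores.shape
    # Shift by the maximum so the largest exponent is 0 (numerical stability).
    q = np.exp((scores - scores.max()) / epsilon)
    q /= q.sum()
    for _ in range(n_iters):
        # Give every bucket (column) the same total mass 1 / K.
        q /= q.sum(axis=0, keepdims=True)
        q /= k
        # Give every embedding (row) the same total mass 1 / N.
        q /= q.sum(axis=1, keepdims=True)
        q /= n
    # Rescale so that each row is a probability distribution over buckets.
    return q * n

# Toy data: 200 unit-norm embeddings and 4 unit-norm bucket prototypes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
protos = rng.normal(size=(4, 8))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)

q = sinkhorn_balanced_assign(emb @ protos.T, epsilon=0.5, n_iters=200)
bucket_ids = q.argmax(axis=1)  # hard bucket index per embedding
print(q.sum(axis=0))           # soft bucket masses, each close to 200 / 4
```

In this sketch the soft assignment matrix is balanced by construction, whereas the hard indices obtained with `argmax` are only approximately balanced. A smaller `epsilon` sharpens the assignments but typically needs more iterations to converge.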