04 May 2020

This paper addresses the use of word embeddings for segments found in audio and real-time magnetic resonance imaging (rtMRI) videos. Word embeddings are created to store and retrieve data efficiently, and their ability to represent the original data is evaluated with the same-different word-discrimination task, defined here for both unimodal and cross-view settings. For the unimodal setting, a Siamese neural network is designed to create word embeddings for the two data modalities independently; for the rtMRI videos, inputs to the network are generated by a correspondence autoencoder. In the cross-view setting, a recurrent neural network (RNN) that takes data of both modalities as input is trained to generate embeddings jointly for the two data sources. The choice of objective function for the RNN is also investigated. Results on the USC-TIMIT rtMRI dataset outperform the conventional dynamic time warping (DTW) baseline by a clear margin, demonstrating that the proposed word embeddings can be a step towards faster unimodal and cross-view query-by-example search.
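The same-different word-discrimination task mentioned above is typically scored by ranking all pairwise distances between embeddings and measuring how well same-word pairs separate from different-word pairs. The sketch below is a minimal, hypothetical illustration of that evaluation (it is not the paper's code): it assumes each segment is already mapped to a fixed-dimensional embedding vector, computes cosine distances over all pairs, and reports average precision over the ranked pair list.

```python
import numpy as np

def same_different_ap(embeddings, labels):
    """Average precision for the same-different word-discrimination task.

    embeddings: (n, d) array, one fixed-dimensional embedding per segment.
    labels:     length-n sequence of word labels for those segments.

    All segment pairs are ranked by cosine distance; a good embedding
    places same-word pairs (positives) ahead of different-word pairs.
    """
    # Normalize rows so dot products are cosine similarities.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(X)
    dists, same = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(1.0 - float(X[i] @ X[j]))  # cosine distance
            same.append(labels[i] == labels[j])
    order = np.argsort(dists)            # closest pairs ranked first
    same = np.asarray(same)[order]
    hits = np.cumsum(same)               # positives retrieved so far
    ranks = np.arange(1, len(same) + 1)
    # Mean of precision values at the rank of each positive pair.
    return float(np.sum(hits[same] / ranks[same]) / max(1, same.sum()))
```

In a cross-view evaluation, the same scoring applies, except that one element of each pair comes from the audio embeddings and the other from the rtMRI embeddings.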
