  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:08:05
08 May 2022

Speech separation datasets typically consist of hard and non-hard samples, with hard samples in the minority. This data imbalance biases the model towards non-hard samples and weakens its generalization capability. Given that average separation performance is already sufficiently good, improving performance on hard samples may contribute more to back-end tasks. In this paper, we propose two methods to alleviate data imbalance in the speech separation task, based on local and global hard sample mining. For the local method, we propose a weighted loss that compensates for hard samples by increasing their weights in each batch. For the global method, we perform global hard sample mining and re-sample to increase the proportion of hard samples in the training set. Because hard sample mining using the objective loss under dynamic mixing yields only local results, we propose an indirect method using speaker-specific parameters, based on the fact that the pitch median difference and the x-vector cosine distance of the two speakers in a mixture are closely correlated with separation SI-SNRi. Experimental results show that both methods reduce the percentage of hard samples in the test set compared with using dynamic mixing only, while keeping the average SI-SNRi comparable, and that the global method shows more promising results than the local one.
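
The abstract does not give the exact weighting or re-sampling formulas, so the following is only a minimal sketch of the two ideas under stated assumptions. The softmax weighting in `weighted_batch_loss`, the `temperature` and `repeat` parameters, and the `is_hard` predicate are all hypothetical choices made for illustration, not the authors' method. `cosine_distance` shows the kind of speaker-specific measure (distance between two x-vectors) that could feed such a predicate.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def weighted_batch_loss(losses, temperature=1.0):
    """Local method (assumed form): softmax weights over per-sample losses,
    so samples with a higher loss contribute more to the batch loss."""
    peak = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - peak) / temperature) for loss in losses]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * loss for w, loss in zip(weights, losses))


def oversample_hard(pairs, is_hard, repeat=2):
    """Global method (assumed form): repeat each hard speaker pair `repeat`
    times so hard mixtures make up a larger share of the training set."""
    resampled = []
    for pair in pairs:
        resampled.extend([pair] * (repeat if is_hard(pair) else 1))
    return resampled
```

With equal per-sample losses the weighted loss reduces to the plain mean; when one sample is much harder, the batch loss moves towards that sample's loss. In practice `is_hard` might threshold the pitch median difference and the x-vector cosine distance of a speaker pair, but any thresholds would have to be chosen empirically.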
