Singing Voice Separation

This page is an online demo of our recent research on singing voice separation with recurrent inference and skip-filtering connections.

The manuscript and results can be found in our paper, "Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask", submitted to ICASSP 2018.

Code can be found here.

Paper can be found here.


This web page demonstrates our work on singing voice separation via a recurrent inference algorithm and skip-filtering connections. We propose a method that learns time-frequency masks directly from the observed mixture magnitude spectra and optimizes them according to how well they separate the singing voice. Our method does not require post-processing steps such as generalized Wiener filtering.
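The idea of a skip-filtering connection can be sketched in a few lines of NumPy. This is a minimal illustration, not our trained model: `toy_mask_estimator`, its random weights, the toy spectrogram sizes, and the mean squared error loss are all assumptions made for the example. The point it shows is that the mask is applied to the input mixture magnitude, and the loss is computed on the filtered output rather than on the mask itself.

```python
import numpy as np


def toy_mask_estimator(mix_mag, weights):
    """Hypothetical stand-in for the trained network.

    A single linear layer followed by a ReLU, so the mask is non-negative.
    """
    return np.maximum(mix_mag @ weights, 0.0)


def apply_skip_filtering(mix_mag, mask):
    """Skip-filtering connection: filter the input mixture with the mask."""
    return mask * mix_mag


rng = np.random.default_rng(0)

# Toy magnitude spectrograms with shape (time frames, frequency bins).
mix_mag = np.abs(rng.standard_normal((4, 6)))
target_voice_mag = 0.5 * mix_mag  # made-up target, for illustration only

weights = 0.1 * rng.standard_normal((6, 6))  # assumed, untrained parameters
mask = toy_mask_estimator(mix_mag, weights)
voice_mag = apply_skip_filtering(mix_mag, mask)

# The loss is measured on the separated voice, so the mask is optimized
# for its separation quality. MSE is an assumption made for this sketch.
loss = np.mean((voice_mag - target_voice_mag) ** 2)
```

Because the estimate is the product of the mask and the mixture, no separate filtering stage is needed after the network in this sketch.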

Figure: An illustration of our proposed method.

The audio demonstration contains files from our three best models:

  • Recurrent inference model using 3 iterations and a higher threshold, denoted as GRU-RIS-S.
  • Model without the recurrent inference algorithm, denoted as GRU-NRI.
  • Recurrent inference model using 10 iterations and a low threshold, denoted as GRU-RIS-L.
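The role of the iteration count and the threshold in the list above can be sketched as a generic iterative refinement loop. This is a hedged illustration only: `recurrent_inference`, `toy_step`, the mean-squared-change stopping rule, and the threshold values `1e-1` and `1e-3` are assumptions made for the example, not the exact settings of the models listed.

```python
import numpy as np


def recurrent_inference(step_fn, state, max_iters, threshold):
    """Repeatedly refine `state` with `step_fn`.

    Stops when the mean squared change between two consecutive states
    drops below `threshold`, or after `max_iters` iterations.
    Returns the final state and the number of iterations performed.
    """
    n_iters = 0
    for n_iters in range(1, max_iters + 1):
        new_state = step_fn(state)
        change = np.mean((new_state - state) ** 2)
        state = new_state
        if change < threshold:
            break
    return state, n_iters


def toy_step(h):
    """Hypothetical refinement step: a contraction with fixed point 2.0."""
    return 0.5 * h + 1.0


start = np.zeros(8)

# Few iterations and a higher threshold (values assumed for the sketch).
state_short, n_short = recurrent_inference(toy_step, start, max_iters=3, threshold=1e-1)

# More iterations and a lower threshold (values assumed for the sketch).
state_long, n_long = recurrent_inference(toy_step, start, max_iters=10, threshold=1e-3)
```

In this toy setting the second configuration runs for more iterations and ends closer to the fixed point, which mirrors the trade-off between the short and long configurations: more refinement at a higher computational cost.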

Audio demo on the evaluation subset:

Our best proposed method processing commercial music mixtures:


The research leading to these results has received funding from the European Union's H2020 Framework Programme (H2020-MSCA-ITN-2014) under grant agreement no. 642685 (MacSeNet).