Source-Filter Based Clustering for Monaural Blind Source Separation
DAFx 2009
Conference homepage
Abstract
In monaural blind audio source separation scenarios, a signal mixture is usually separated into more signals than there are active sources. It is therefore necessary to group the separated signals into the final source estimates. Traditional grouping methods are supervised and thus require a learning step on appropriate training data. In contrast, we discuss unsupervised clustering of the separated channels using Mel frequency cepstral coefficients (MFCC). We show that replacing the decorrelation step of the MFCC with non-negative matrix factorization (NMF) improves the separation quality significantly. The algorithms have been evaluated on a large test set consisting of melodies played on different instruments, vocals, speech, and noise.
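In the standard MFCC pipeline, the decorrelation step is a discrete cosine transform of the log mel energies; the abstract replaces it with NMF. The following is a minimal sketch of NMF-based channel clustering under stated assumptions, not the authors' implementation (the downloadable Matlab code is authoritative). It assumes that each separated channel is described by one non-negative feature column (for example a mel-filtered magnitude spectrum), that NMF with as many components as sources is applied to this feature matrix, and that each channel is assigned to the component with the largest normalized activation. The function names `nmf` and `cluster_channels_nmf` are hypothetical. The `divergence` switch mirrors the variants PNMF,Euc and PNMF,Div in the tables, assuming these denote the Euclidean and the (Kullback-Leibler) divergence cost functions.

```python
import numpy as np

def nmf(V, n_components, n_iter=1000, divergence="euc", seed=0):
    """Factor a non-negative matrix V (features x channels) as V ~= W @ H
    with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, n_components)) + 1e-3
    H = rng.random((n_components, n_cols)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        if divergence == "euc":
            # Updates minimizing the squared Euclidean distance
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        elif divergence == "div":
            # Updates minimizing the generalized Kullback-Leibler divergence
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        else:
            raise ValueError("divergence must be 'euc' or 'div'")
    return W, H

def cluster_channels_nmf(features, n_sources, divergence="euc"):
    """Hypothetical helper: assign each channel (a column of `features`)
    to the NMF component with the largest activation."""
    W, H = nmf(features, n_sources, divergence=divergence)
    # Normalize basis vectors to unit norm so activations are comparable
    norms = np.linalg.norm(W, axis=0) + 1e-12
    H = H * norms[:, None]
    return np.argmax(H, axis=0)
```

With four synthetic channels built from two spectral envelopes, channels sharing an envelope should receive the same label. The assignment rule (largest activation after normalization) is a simple choice made for this sketch; the paper may assign channels differently.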
Keywords:
Clustering, Monaural Blind Sound Source Separation, NMF, Audio
Paper: SpGn09a.pdf
Slides: Talk_DAFx09.pdf
Matlab code:
An example implementation is available under the GNU General Public License:
download
 
Sound Examples with 2 active Sources
Mixture | Prand | PMFCC | PNMF,Div | PNMF,Euc | Pref |
Bass Guitar | 4.53 | 20.30 | 13.59 | 13.59 | 20.30 |
Bass Keyboard | 1.61 | 14.37 | 14.37 | 14.37 | 14.45 |
Bass Drums | 1.67 | 1.76 | 1.76 | 1.76 | 3.15 |
Guitar Keyboard | 3.19 | 2.85 | 1.47 | 4.83 | 5.88 |
Guitar Drums | 1.53 | 8.34 | 8.34 | 18.60 | 19.14 |
Keyboard Drums | 4.32 | 8.71 | 15.87 | 15.88 | 15.92 |
Remarks:
- Results are shown in dB.
- Mixtures are created with a dynamic difference of 0 dB.
- For such mixing scenarios, PNMF,Euc generally leads to good clustering results, as mentioned in the paper.
- The Bass Drums mixture is separated well except for the bass drum, which is separated but clustered into the bass output. The very low SER can be explained by the high energy of the bass drum.
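The remarks quote the separation quality as a signal-to-error ratio (SER) in dB. A minimal sketch, assuming the usual definition (energy of the reference source divided by the energy of the estimation error); the paper's evaluation may differ in details such as alignment or gain compensation, and the name `ser_db` is hypothetical. It also illustrates the bass drum remark: a high-energy component clustered into the wrong output adds directly to the error energy of that output.

```python
import numpy as np

def ser_db(reference, estimate):
    """Signal-to-error ratio in dB: energy of the reference source divided
    by the energy of the difference between reference and estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(error**2))
```

For example, an estimate whose error carries 1 % of the reference energy yields an SER of 20 dB.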
 
Sound Examples with 2 active Sources and dynamic differences
DD | Source | Prand | PMFCC | PNMF,Div | PNMF,Euc | Pref |
DD 0dB | Piccolo | 3.50 | 3.63 | 9.60 | 9.63 | 10.06 |
DD 0dB | Horn | 3.53 | 3.89 | 9.71 | 9.71 | 10.15 |
DD 10dB | Piccolo | 6.01 | 3.44 | 17.12 | 5.69 | 17.43 |
DD 10dB | Horn | -3.96 | -6.55 | 7.17 | -4.29 | 7.42 |
Remarks:
- DD stands for the dynamic difference between the two input signals.
- Results are shown in dB.
- Sound files can be found here.
- For a dynamic difference of 0 dB, PNMF,Euc leads to slightly better separation results than PNMF,Div.
- For a dynamic difference of 10 dB, PNMF,Div is significantly better than PNMF,Euc.
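A mixture with a given dynamic difference can be produced by rescaling one source before summation. A minimal sketch, assuming the dynamic difference is the energy ratio of the two scaled sources in dB; the exact convention used for the test set (and which source is attenuated in the 10 dB examples) is not stated here, so `mix_with_dynamic_difference` is a hypothetical helper that simply makes `s1` the louder source.

```python
import numpy as np

def mix_with_dynamic_difference(s1, s2, dd_db):
    """Mix two sources so that s1 is dd_db decibels louder than s2
    (energy ratio). Returns the mixture and both scaled sources."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    e1 = np.sum(s1**2)
    e2 = np.sum(s2**2)
    # Gain g for s2 such that 10*log10(e1 / (g**2 * e2)) == dd_db
    g = np.sqrt(e1 / (e2 * 10.0 ** (dd_db / 10.0)))
    s2_scaled = g * s2
    return s1 + s2_scaled, s1, s2_scaled
```

With `dd_db=0.0` both sources end up with equal energy, which corresponds to the 0 dB mixtures above.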
 
Sound Examples with 3 active Sources: Bass, Harp, and Piccolo
Source | Prand | PMFCC | PNMF,Div | PNMF,Euc | PMFCC,Hier | PNMF,Div,Hier | PNMF,Euc,Hier | Pref |
mean | 2.83 | 7.98 | 20.60 | 20.62 | 1.92 | 20.57 | 20.62 | 20.95 |
Bass | 4.94 | 18.75 | 18.69 | 18.69 | 2.66 | 18.72 | 18.69 | 18.85 |
Harp | 1.35 | 2.52 | 17.84 | 17.93 | 3.13 | 17.75 | 17.93 | 18.68 |
Piccolo | 2.22 | 2.67 | 25.26 | 25.25 | -0.01 | 25.26 | 25.25 | 25.32 |
Remarks:
- Results are shown in dB.
- Sound files can be found here.
 
Sound Examples with 3 active Sources: Castanets, Violoncello, and Flute
Source | Prand | PMFCC | PNMF,Div | PNMF,Euc | PMFCC,Hier | PNMF,Div,Hier | PNMF,Euc,Hier | Pref |
mean | 0.30 | 10.66 | 3.74 | 11.04 | 2.32 | 10.77 | 6.09 | 11.25 |
Castanets | -0.28 | 14.10 | 10.89 | 14.93 | 5.46 | 14.31 | 5.46 | 14.94 |
Violoncello | 1.74 | 8.63 | 0.04 | 8.85 | 1.43 | 8.65 | 8.64 | 9.20 |
Flute | -0.57 | 9.25 | 0.29 | 9.35 | 0.07 | 9.35 | 4.16 | 9.60 |
Remarks:
- Results are shown in dB.
- Sound files can be found here.
- For PNMF,Div, the violoncello and the flute could not be separated.
- The hierarchical clustering PNMF,Div,Hier increases the separation quality by first splitting off an obvious source (the castanets). The remaining channels are then clustered into the other two sources (violoncello and flute).
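The hierarchical idea (split off the most obvious group first, then re-cluster the rest) can be illustrated with bisecting k-means, a common divisive clustering scheme. This is a swapped-in stand-in, not the paper's hierarchical procedure: it assumes one feature vector per separated channel (for example MFCC or NMF activations) and always splits the cluster with the largest within-cluster sum of squares. The names `two_means` and `bisecting_clusters` are hypothetical.

```python
import numpy as np

def two_means(X, n_iter=50):
    """Split the rows of X into two groups with k-means (k=2).
    Deterministic init: the two rows that are farthest apart."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = X[[i, j]].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None], axis=-1)
        labels = np.argmin(dist, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def bisecting_clusters(X, n_clusters):
    """Divisive clustering: repeatedly split the cluster with the largest
    within-cluster sum of squares until n_clusters groups exist."""
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        sse = [np.sum((X[idx] - X[idx].mean(axis=0)) ** 2) for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        labels = two_means(X[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    out = np.empty(len(X), dtype=int)
    for c, idx in enumerate(clusters):
        out[idx] = c
    return out
```

In the castanets example, the first split would ideally isolate the castanets channels, and the second split would then separate the violoncello channels from the flute channels. The sketch assumes there are at least as many distinct feature vectors as requested clusters.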
 
(C) by Martin Spiertz - 07. September 2009 - spiertz@ient.rwth-aachen.de