Detection of Clicks in Audio Signals Using Warped Linear Prediction
Paulo A. A. Esquef, Matti Karjalainen, and Vesa Välimäki
Companion webpage with sound examples for the paper of the same title, presented at the 14th IEEE International Digital Signal Processing Conference (DSP 2002), Santorini, Greece, July 2002.
1. Original Audio Signals
The test signals are high-quality excerpts of orchestral pieces. The excerpts were sampled at 44100 Hz and converted to .WAV files (monaural, 16 bits, signed).
S1.wav: Excerpt of Ravel's Alborada del gracioso.
S2.wav: Excerpt of Scriabin's Poem of Fire.
The rich orchestration of the first excerpt makes it a good test of the detection method under non-stationary conditions. The second excerpt is challenging because of its wide dynamic range and because of the loud brass section, whose modeling usually leaves periodic pulses in the excitation signal; these pulses are a problem for a threshold-based detection scheme.
2. Corrupted Versions
Two corrupting noise signals, with different numbers and types of impulsive disturbances, were used to artificially corrupt the original signals.
S1c.wav: The corrupted version of S1.wav, which has approximately 0.6% of its samples corrupted.
S2c.wav: The corrupted version of S2.wav, which has approximately 5% of its samples corrupted.
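As an illustration of how a signal can be corrupted artificially while keeping track of which samples were touched, the following minimal sketch adds short bursts of impulsive noise until a target fraction of samples is corrupted. The function name corrupt_with_clicks, the burst length, the noise amplitude, and the uniform noise distribution are all assumptions made for this example; the noise signals actually used for S1c.wav and S2c.wav are not described here.

```python
import random


def corrupt_with_clicks(signal, fraction, max_burst=5, amplitude=20000, seed=0):
    """Add impulsive noise bursts to a list of 16-bit integer samples.

    Returns (corrupted, mask), where mask[n] is True for every sample
    that was marked as corrupted. All parameters except `signal` and
    `fraction` are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(signal)
    target = min(n, int(round(fraction * n)))  # number of samples to corrupt
    corrupted = list(signal)
    mask = [False] * n
    count = 0
    while count < target:
        start = rng.randrange(n)            # random burst position
        length = rng.randint(1, max_burst)  # random burst duration
        for i in range(start, min(start + length, n)):
            if count >= target:
                break
            if not mask[i]:
                noise = rng.uniform(-amplitude, amplitude)
                # Clip to the signed 16-bit range of the .WAV files.
                corrupted[i] = max(-32768, min(32767, int(signal[i] + noise)))
                mask[i] = True
                count += 1
    return corrupted, mask


# Fractions matching the figures quoted above: about 0.6% and about 5%.
clean = [0] * 10000
s1c_like, mask1 = corrupt_with_clicks(clean, 0.006)
s2c_like, mask2 = corrupt_with_clicks(clean, 0.05)
```

Keeping the mask alongside the corrupted signal is what makes it possible to count missed and false detections later.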
3.1 The effect of the warping factor
The detection method was first calibrated to produce restored versions of reasonable perceptual quality. During this initial stage, the warping factor was set to 0, which corresponds to conventional linear prediction. Then all parameters except the warping factor were frozen, and the warping factor was varied from -0.9 to 0.9 in steps of 0.1. For each value of the warping factor, the percentages of missed and false detections associated with the corresponding restored signal were computed. In general, negative values of the warping factor were observed to favor the detection scheme, since the percentage of missed detections is reduced. A small increase in false detections was also observed, but it does not seem to harm the perceptual quality of the restored versions. Some of the restored versions are available as .WAV files in Table 1.
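The remark that a warping factor of 0 gives conventional linear prediction can be illustrated with a small sketch. In the usual autocorrelation formulation of warped linear prediction, each unit delay is replaced by a first-order all-pass section D(z) = (z^-1 - lambda) / (1 - lambda z^-1), and the warped autocorrelation is obtained by correlating the signal with its successively all-pass-filtered versions. The code below assumes this formulation and this sign convention for lambda, which may differ from the one used in the paper; the function names are hypothetical.

```python
def allpass(x, lam):
    """First-order all-pass D(z) = (z^-1 - lam) / (1 - lam z^-1).

    Difference equation: y[n] = lam * y[n-1] + x[n-1] - lam * x[n].
    With lam = 0 this is a plain one-sample delay.
    """
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for xn in x:
        yn = lam * y_prev + x_prev - lam * xn
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def warped_autocorrelation(x, order, lam):
    """Warped autocorrelation r[0..order] of one frame.

    Simplification (assumption): the all-pass outputs are truncated to
    the frame length and no window is applied.
    """
    r = []
    xk = list(x)
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(x, xk)))
        xk = allpass(xk, lam)  # one more all-pass "delay" per lag
    return r


def autocorrelation(x, order):
    """Conventional autocorrelation, for comparison."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


frame = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
r_warped_zero = warped_autocorrelation(frame, 3, 0.0)  # equals autocorrelation(frame, 3)
r_warped_neg = warped_autocorrelation(frame, 3, -0.7)
```

The warped predictor coefficients would then follow from these values through the same normal equations as in the conventional case, for example via the Levinson-Durbin recursion.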
3.2 Two case studies
The aim here is to evaluate the detection scheme by comparing the percentages of missed and false detections obtained with conventional linear prediction (lambda = 0) and with warped linear prediction with lambda = -0.7.
The chosen strategy is to set a target for the percentage of missed detections, adjust the threshold gain, K, to meet that target, and then compare the percentages of false detections in each case. Provided the target can be met, the lower the percentage of false alarms, the better the click detection performance. In this particular simulation, the corrupted version of signal S1 was used, the frame length was set to 1024 samples, the model order was set to 40, and neither the prefiltering W(z) nor the time-reverse filtering (TRF) was employed. The results obtained for a missed-detection target below 1% are shown in Table 2.
| warping factor | missing [%] | false alarm [%] | restored signal |
Listening to the results in Table 2 shows that the 14% false-alarm rate of wS1p0TRF0.wav is unacceptable, since this restored version is severely distorted. The perceptual quality of the restored signal wS1mp7TRF0.wav is much better, owing to its lower level of false detection. Even so, some distortion can still be perceived in the cymbal passages.
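The calibration strategy used in this case study (fix a target for missed detections, adjust K to meet it, then read off the false alarms) can be sketched as follows. This is only an illustration under stated assumptions: the detector flags samples of the excitation whose magnitude exceeds K times a median-based scale estimate, missed detections are counted relative to the truly corrupted samples, and false alarms relative to the clean samples. The page does not spell out these definitions, and the function names are hypothetical.

```python
def detect(excitation, k_gain):
    """Flag samples whose magnitude exceeds k_gain times a robust scale.

    The scale is median(|e|) / 0.6745, a common robust estimate of the
    standard deviation for Gaussian data (an assumption of this sketch).
    """
    mags = sorted(abs(e) for e in excitation)
    sigma = mags[len(mags) // 2] / 0.6745
    return [abs(e) > k_gain * sigma for e in excitation]


def rates(detected, truth):
    """Return (missing %, false alarm %) under the assumed definitions."""
    n_corrupt = sum(truth)
    n_clean = len(truth) - n_corrupt
    if n_corrupt == 0 or n_clean == 0:
        raise ValueError("need both corrupted and clean samples")
    missed = sum(1 for d, t in zip(detected, truth) if t and not d)
    false = sum(1 for d, t in zip(detected, truth) if d and not t)
    return 100.0 * missed / n_corrupt, 100.0 * false / n_clean


def calibrate(excitation, truth, target_missing, k_grid):
    """Pick the largest K in k_grid that keeps missing <= target_missing.

    A larger K can only lower the false-alarm rate, so the largest
    admissible K gives the fewest false alarms for the given target.
    Returns (K, missing %, false alarm %), or None if no K qualifies.
    """
    for k in sorted(k_grid, reverse=True):
        missing, false_alarm = rates(detect(excitation, k), truth)
        if missing <= target_missing:
            return k, missing, false_alarm
    return None


# Toy excitation with a single click at index 4.
excitation = [1, -1, 1, -1, 50, 1, -1, 2, -2, 1]
truth = [i == 4 for i in range(len(excitation))]
result = calibrate(excitation, truth, target_missing=1.0, k_grid=[1, 2, 3, 5, 10])
```

Running the same calibration on the excitation of each predictor (lambda = 0 and lambda = -0.7) with the same target would give directly comparable false-alarm figures.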
Now, if the TRF is included in the detection procedure and the percentage of missed detections is targeted to stay below 2%, the results shown in Table 3 are obtained.
| warping factor | missing [%] | false detection [%] | restored signal |
For the signal S1c, allowing about 2% of missed detections does not lead to substantial losses in the perceptual quality of the restored versions. The 7% of false detections in wS1p0TRF1.wav still produces some distortion, for instance during the cymbal part. Again, the perceptual quality is slightly better for the restored version wS1mp7TRF1.wav, which was obtained using the WLP-based scheme with a negative warping factor.
This URL: http://www.acoustics.hut.fi/publications/papers/dsp2002-declick
Last modified: 01.03.2002