Companion page for the paper published in IEEE Trans. on Audio, Speech and Language Processing, vol. 21, no. 11, pp. 2356-2367, Nov. 2013 [1].
Abstract
A parametric spatial filtering algorithm with a fixed beam direction is proposed in this paper. The algorithm uses the normalized cross-spectral density between signals from microphones of different orders as a criterion for focusing in specific directions. The correlation between the microphone signals is estimated in the time-frequency domain. A post-filter is calculated from a multichannel input and is used to assign attenuation values to a coincidentally captured audio signal. The proposed algorithm is simple to implement and can cope with interfering sources at different azimuthal locations, with or without the presence of diffuse sound. It is implemented using directional microphones that share the same look direction and have the same magnitude and phase response. Experiments are conducted with simulated and real microphone arrays employing the proposed post-filter, and the results are compared to previous coherence-based approaches, such as the McCowan post-filter. A significant improvement is demonstrated in terms of objective quality measures. Formal listening tests, conducted to assess the audibility of artifacts of the proposed algorithm in real acoustical scenarios, show that no annoying artifacts are present for certain spectral floor values. Examples of the output of the proposed algorithm are provided here.
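To make the idea in the abstract concrete, the following is a minimal sketch of a cross-spectrum-based post-filter with a spectral floor. It is not the implementation from the paper: the function name `cross_pattern_gain`, the recursive smoothing constant `alpha`, and the use of simple clipping to apply the floor are all assumptions made for illustration, and the exact rectification and floor rule in [1] may differ. The inputs are assumed to be the STFTs of two coincident microphone signals of different orders with the same look direction.

```python
import numpy as np

def cross_pattern_gain(X1, X2, floor=0.1, alpha=0.8):
    """Illustrative post-filter gain from two STFTs of shape (frames, bins).

    floor and alpha are hypothetical parameters chosen for this sketch.
    """
    X1 = np.asarray(X1, dtype=complex)
    X2 = np.asarray(X2, dtype=complex)
    n_frames, n_bins = X1.shape
    cross = np.zeros(n_bins)   # smoothed 2*Re{X1^* X2}
    power = np.zeros(n_bins)   # smoothed |X1|^2 + |X2|^2
    gains = np.empty((n_frames, n_bins))
    for t in range(n_frames):
        inst_cross = 2.0 * np.real(np.conj(X1[t]) * X2[t])
        inst_power = np.abs(X1[t]) ** 2 + np.abs(X2[t]) ** 2
        # Recursive averaging approximates the expectation over time.
        cross = alpha * cross + (1.0 - alpha) * inst_cross
        power = alpha * power + (1.0 - alpha) * inst_power
        # Normalized cross-spectrum, bounded in [-1, 1].
        g = cross / np.maximum(power, 1e-12)
        # Assumed floor rule: values below the floor are raised to it.
        gains[t] = np.clip(g, floor, 1.0)
    return gains
```

Under these assumptions, the gains would be multiplied bin by bin with the STFT of the coincidentally captured signal before resynthesis. A larger floor limits the maximum attenuation, which trades interference suppression against the audibility of artifacts.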
Demos
All files are real multichannel recordings, processed with the CroPaC spatial filtering algorithm as described in the published paper. A simultaneous dual-talker scenario was recorded in a room, with loudspeakers acting as the talkers. The original recordings were made with an 8-microphone uniform cylindrical array of 1.3 cm radius in a reverberant space (500 ms). The following examples demonstrate the performance of CroPaC with different spectral floor values. The English talker is at 0° and the Danish talker is at 90°. Three scenarios are generated, with SNRs of 10, 1 and -10 dB, and the CroPaC algorithm is used to focus first on the English talker and then on the Danish talker.
Note: the samples have been updated to reflect the latest additions to the algorithm, as proposed in [2].
Instructions: Click on the || button to listen to a single sample. Click on a different case to switch to the corresponding sample for direct comparison.
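The page does not state how the three SNR scenarios were produced. As an illustration only, the sketch below scales an interfering talker so that the target-to-interferer power ratio matches a requested value in dB. The function name `mix_at_snr` and the interpretation of SNR as a broadband power ratio between the two talkers are assumptions, not details taken from the paper.

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so the target-to-interferer power ratio is snr_db.

    Hypothetical helper: returns the mixture and the applied scale factor.
    """
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2)
    # Solve p_target / (scale^2 * p_interf) = 10^(snr_db / 10) for scale.
    scale = np.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))
    return target + scale * interferer, scale
```

For example, calling `mix_at_snr(english, danish, -10)` with two equal-length recordings (hypothetical variable names) would yield a mixture in which the interfering talker carries ten times the power of the target.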
Focusing at the position of the English talker (0°)
SNR = 10 dB:
microphone noisy input, SNR = 10
CroPaC output (spectral floor = 0), SNR = 10
CroPaC output (spectral floor = 0.1), SNR = 10
CroPaC output (spectral floor = 0.2), SNR = 10
CroPaC output (spectral floor = 0.3), SNR = 10
SNR = 1 dB:
microphone noisy input, SNR = 1
CroPaC output (spectral floor = 0), SNR = 1
CroPaC output (spectral floor = 0.1), SNR = 1
CroPaC output (spectral floor = 0.2), SNR = 1
CroPaC output (spectral floor = 0.3), SNR = 1
SNR = -10 dB:
microphone noisy input, SNR = -10
CroPaC output (spectral floor = 0), SNR = -10
CroPaC output (spectral floor = 0.1), SNR = -10
CroPaC output (spectral floor = 0.2), SNR = -10
CroPaC output (spectral floor = 0.3), SNR = -10
Focusing at the position of the Danish talker (90°)