Spatial filtering with microphone arrays can be used to obtain the signal of a target sound source from a specific direction. Typical approaches underperform in practical environments with multiple sound sources and diffuse sound. In this contribution we propose a post-filtering technique that suppresses the effect of interferers and diffuse sound. The proposed technique uses cross-spectral estimates of the outputs of two beamformers to formulate a time-frequency soft masker. The beamformers' outputs are used only for parameter estimation and not for generating an audio signal. Two sets of beamformer weights, one constant and one adaptive, are applied to the microphone array signals for the parameter estimation. The weights of the constant beamformer are designed to provide a spatially narrow beam pattern that is time- and frequency-invariant, with unity gain towards the direction of interest. The weights of the adaptive beamformer are formulated using linearly constrained optimization with two constraints: weighted orthogonality with respect to the constant beamformer weights, and unity gain towards the look direction. The orthogonality constraint provides diffuse sound suppression, while the unity gain constraint provides a distortionless response. The cross-spectrum of these two beamformers provides the target energy at a given look direction for the post-filter. The study focuses on compact microphone arrays, for which typical beamforming techniques feature a trade-off between noise amplification and spatial selectivity, especially at low frequencies. The proposed method is evaluated with instrumental measures and listening tests under different reverberation times, in dual- and multi-talker scenarios.
The evaluation shows that the proposed method performs better than a previous state-of-the-art spatial filter based on cross-pattern coherence (CroPaC), a linearly constrained beamformer and a Wiener post-filter.
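The parameter estimation described in the abstract can be illustrated for a single frequency bin with a small NumPy sketch. This is not the implementation from the paper; it only shows the general idea under several assumptions. A delay-and-sum beamformer stands in for the constant narrow-beam design. A diffuse-field coherence matrix (`Gamma`) is assumed as the weighting in the orthogonality constraint. The mask is assumed to be the cross-spectrum normalised by the mean microphone power. A 4-microphone linear array replaces the spherical array. All function and variable names are hypothetical.

```python
import numpy as np

def lcmv_weights(R, C, g):
    """Minimise w^H R w subject to C^H w = g (standard LCMV closed form)."""
    Rinv_C = np.linalg.solve(R, C)                      # R^{-1} C
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

def soft_mask(x, w_c, w_a, eps=1e-12):
    """Per-frame soft mask for one frequency bin; x has shape (mics, frames)."""
    y_c = w_c.conj() @ x                                # constant beamformer output
    y_a = w_a.conj() @ x                                # adaptive beamformer output
    cross = np.real(y_c * np.conj(y_a))                 # instantaneous cross-spectrum
    ref = np.mean(np.abs(x) ** 2, axis=0)               # assumed normalisation energy
    return np.clip(cross / (ref + eps), 0.0, 1.0)       # rectify and limit to [0, 1]

# --- Toy setup (all values are illustrative assumptions) ---
rng = np.random.default_rng(0)
c, f = 343.0, 3000.0                                    # speed of sound (m/s), frequency (Hz)
pos = 0.02 * np.arange(4)                               # 4 mics on a line, 2 cm spacing
k = 2 * np.pi * f / c                                   # wavenumber

def steering(theta_deg):
    """Far-field steering vector of the toy linear array."""
    return np.exp(-1j * k * pos * np.cos(np.deg2rad(theta_deg)))

d = steering(60.0)                                      # look direction
d_int = steering(130.0)                                 # interferer direction
n_frames = 200
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
v = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
noise = 0.1 * (rng.standard_normal((4, n_frames))
               + 1j * rng.standard_normal((4, n_frames)))
x = np.outer(d, s) + np.outer(d_int, v) + noise         # microphone signals

# Sample covariance with a small diagonal loading for numerical stability.
R = x @ x.conj().T / n_frames + 1e-3 * np.eye(4)

# Assumed weighting matrix: diffuse-field coherence, sin(kr) / (kr).
Gamma = np.sinc(2 * f * np.abs(pos[:, None] - pos[None, :]) / c)

w_c = d / 4                                             # delay-and-sum, w_c^H d = 1

# Constraints: unity gain (w^H d = 1) and weighted orthogonality (w^H Gamma w_c = 0).
C = np.column_stack([d, Gamma @ w_c])
g = np.array([1.0, 0.0], dtype=complex)
w_a = lcmv_weights(R, C, g)

mask = soft_mask(x, w_c, w_a)                           # one gain per frame
```

In this sketch the mask would be applied to a reference signal rather than to either beamformer output, in line with the statement that the beamformers are used only for parameter estimation. In practice the cross-spectrum would typically be averaged over time before the mask is formed; the sketch omits that step for brevity.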
Demos
All files are real multichannel recordings, processed with the proposed spatial filtering algorithm as described in the submitted paper. A simultaneous two-talker scenario is generated in a virtual room. The simulated recordings were generated with a 16-channel uniform spherical microphone array of 1.5 cm radius in reverberant environments. A selection of two-talker scenarios is shown here, each consisting of an interfering sound source at 50, 70, 90 or 140 degrees off the look direction, at different reverberation times.
Instructions: Click on the || button to listen to a single sample. Click on a different case to switch to the corresponding sample for direct comparison.
Simultaneous talker scenario
clean target signal
Interferer at 50 degrees with RT = 0.4 s:
omnidirectional
CroPaC output
Proposed method output
Interferer at 70 degrees with RT = 0.2 s:
omnidirectional
CroPaC output
Proposed method output
Interferer at 90 degrees with RT = 0.3 s:
omnidirectional
CroPaC output
Proposed method output
Interferer at 140 degrees with RT = 0.5 s:
omnidirectional
CroPaC output
Proposed method output
Updated on Saturday March 12, 2016