http://www.acoustics.hut.fi/research/robustness/

Robustness and noise reduction

Demos and implementations of robust audio and speech signal processing and analysis methods.

Overview
Multi-scale modulation filtering parametrizes the typical long-term temporal dynamics of short-term acoustic parameters and features within a given context or signal class. Several autoregressive filters, each operating on a different time scale, are trained to represent that typical behavior. The resulting multi-scale filter can then be applied to new data to emphasize class-specific modulation frequencies and to reduce the effect of noise.
References
[1] J. Pohjalainen, P. Alku: "Multi-scale modulation filtering in automatic detection of emotions in telephone speech", in Proc. ICASSP, Florence, Italy, May 4-9, 2014. pdf
[2] J. Pohjalainen, P. Alku: "Filtering and subspace selection for spectral features in detecting speech under physical stress", in Proc. Interspeech, Singapore, September 14-18, 2014. pdf
Implementations
- Filter coefficient estimation [1][2]
- Filtering and residual computation [1][2]
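
The two steps listed above (coefficient estimation, then filtering and residual computation) can be illustrated with a minimal Python sketch. It is not the authors' Matlab code and makes several assumptions that the page does not state: a "time scale" is modelled as a lag step between the samples used for prediction, each scale gets its own least-squares autoregressive predictor, and the per-scale predictions are simply averaged. All function names and the default order and steps are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ar(track, order, step):
    """Least-squares predictor of track[t] from track[t - k*step], k = 1..order."""
    r = [[0.0] * order for _ in range(order)]
    q = [0.0] * order
    for t in range(order * step, len(track)):
        past = [track[t - k * step] for k in range(1, order + 1)]
        for i in range(order):
            q[i] += past[i] * track[t]
            for j in range(order):
                r[i][j] += past[i] * past[j]
    for i in range(order):
        r[i][i] += 1e-9  # tiny ridge term so degenerate (e.g. all-zero) tracks stay solvable
    return solve(r, q)

def train_multiscale(track, order=2, steps=(1, 2, 4)):
    """Assumed training step: one AR predictor per time scale (lag step)."""
    return {step: fit_ar(track, order, step) for step in steps}

def apply_multiscale(filters, track):
    """Assumed filtering step: average the per-scale predictions.

    Returns (filtered, residual), where residual = track - filtered.
    Samples without enough history for any scale are passed through unchanged.
    """
    filtered = []
    for t in range(len(track)):
        preds = [
            sum(c * track[t - (k + 1) * step] for k, c in enumerate(coefs))
            for step, coefs in filters.items()
            if t >= len(coefs) * step
        ]
        filtered.append(sum(preds) / len(preds) if preds else track[t])
    residual = [x - y for x, y in zip(track, filtered)]
    return filtered, residual

# Toy usage: a clean sinusoidal feature trajectory stands in for class-typical dynamics.
train = [math.sin(2 * math.pi * t / 20) for t in range(200)]
filters = train_multiscale(train)
filtered, residual = apply_multiscale(filters, train)
```

In this toy case the training trajectory is perfectly predictable at every scale, so the residual is close to zero. On new data, a large residual would indicate dynamics that the trained filters do not explain.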

Overview
A Matlab toolbox of feature selection algorithms applicable to high-dimensional problems, evaluated in paralinguistic speech analysis tasks.
References
[1] J. Pohjalainen, O. Räsänen, S. Kadioglu: "Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits", Computer Speech and Language 29(1), 2015. pdf
[2] O. Räsänen, J. Pohjalainen: "Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech", in Proc. Interspeech, Lyon, France, 2013. pdf
[3] J. Pohjalainen, S. Kadioglu, O. Räsänen: "Feature selection for speaker traits", in Proc. Interspeech, Portland, Oregon, 2012. pdf
Implementations
- Feature selection Matlab code toolbox [1][2][3]
- For an overview and further developments on feature selection for audio machine learning applications, see also here.

Overview
Some examples of smoothing gain functions for musical noise reduction in multi-microphone enhancement applications are shown here.

(by Symeon Delikaris-Manias)
Demos
All files are rendered for a monophonic setup using conventional postfiltering methods.
1. Single talker (English) at 0° with different levels of background noise
- SNR = 27: unsmoothed | AR smoothed
- SNR = 21: unsmoothed | AR smoothed
- SNR = 19: unsmoothed | AR smoothed
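
The difference between the "unsmoothed" and "AR smoothed" variants above can be illustrated with a short Python sketch. It assumes that "AR smoothed" means first-order recursive (autoregressive) smoothing of the spectral gain along time, a common way to suppress the rapid gain fluctuations heard as musical noise. The function name, the `alpha` parameter and its default value are hypothetical and not taken from the demos.

```python
def ar_smooth_gains(gains, alpha=0.8):
    """First-order recursive smoothing of gain values along time, per frequency bin.

    gains: list of frames, each a list of per-bin gains.
    alpha: smoothing factor in [0, 1); larger values smooth more (assumed parameter).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    smoothed = []
    prev = None
    for frame in gains:
        if prev is None:
            cur = list(frame)  # the first frame has no history and passes through
        else:
            cur = [alpha * p + (1.0 - alpha) * g for p, g in zip(prev, frame)]
        smoothed.append(cur)
        prev = cur
    return smoothed

# A single-bin gain that flips between 1 and 0 in every frame.
example = ar_smooth_gains([[1.0], [0.0], [1.0], [0.0]], alpha=0.5)
# example == [[1.0], [0.5], [0.75], [0.375]]
```

The trade-off is the usual one for recursive smoothing: a larger `alpha` removes more gain fluctuation but makes the gain react more slowly to speech onsets.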

Overview
This is an experimental, interactive web system for speech and audio noise reduction and enhancement that applies many of the research ideas listed on this page. Users can upload audio files and process them with different methods, including novel ones based on noise modulation rate.
References
[1] J. Pohjalainen, P. Alku: "Multi-scale modulation filtering in automatic detection of emotions in telephone speech", in Proc. ICASSP, Florence, Italy, May 4-9, 2014. pdf
[2] J. Pohjalainen, F. Ringeval, Z. Zhang, B. Schuller: "Spectral and cepstral audio noise reduction techniques in speech emotion recognition", in Proc. ACM Multimedia, Amsterdam, The Netherlands, 2016. pdf
To the online noise reduction system
- https://audiodenoise.com/

Overview
Temporally weighted linear predictive methods have been studied for improving the robustness of speech feature extraction in many applications. Matlab implementations can be found below.
References
[1] J. Pohjalainen, P. Alku: "Gaussian mixture linear prediction", in Proc. ICASSP, Florence, Italy, May 4-9, 2014. pdf
[2] J. Pohjalainen, C. Hanilçi, T. Kinnunen and P. Alku: "Mixture linear prediction in speaker verification under vocal effort mismatch", IEEE Signal Processing Letters 21(12), 2014. pdf
[3] J. Pohjalainen, P. Alku: "Extended weighted linear prediction using the autocorrelation snapshot - a robust speech analysis method and its application to recognition of vocal emotions", in Proc. Interspeech, Lyon, France, August 25-29, 2013. pdf
[4] J. Pohjalainen, R. Saeidi, T. Kinnunen, P. Alku: "Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions", in Proc. Interspeech, Makuhari, Japan, September 26-30, 2010. pdf
[5] C. Magi, J. Pohjalainen, T. Bäckström, P. Alku: "Stabilised weighted linear prediction", Speech Communication 51(5), pp. 401-411, April 2009.
[6] C. Ma, Y. Kamp, L. F. Willems: "Robust signal selection for linear prediction analysis of voiced speech", Speech Communication 12(2), pp. 69-81, 1993.
Implementations
- Matlab code (see also README file) for
- For further material and discussion on robust audio feature extraction, see also here.
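
As a rough illustration of temporal weighting in linear prediction, the Python sketch below minimizes a weighted squared prediction error, where each sample's weight is the short-time energy of the samples just before it. This is a simplified sketch, not the Matlab code referred to above: the covariance-style summation range, the default energy window length and all names are assumptions. The returned coefficients a_k follow the convention x[n] ≈ a_1 x[n-1] + ... + a_p x[n-p].

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def weighted_lp(frame, order, ste_len=None):
    """Predictor coefficients minimizing sum_n w[n] * (x[n] - sum_k a_k x[n-k])**2.

    w[n] is the energy of the ste_len samples preceding n (assumed weighting;
    ste_len defaults to the prediction order).
    """
    m = ste_len if ste_len is not None else order
    r = [[0.0] * order for _ in range(order)]
    q = [0.0] * order
    # Simplification: only samples with full history inside the frame are used.
    for n in range(max(order, m), len(frame)):
        w = sum(frame[n - i] ** 2 for i in range(1, m + 1))
        past = [frame[n - k] for k in range(1, order + 1)]
        for i in range(order):
            q[i] += w * past[i] * frame[n]
            for j in range(order):
                r[i][j] += w * past[i] * past[j]
    for i in range(order):
        r[i][i] += 1e-12  # tiny ridge term so silent frames stay solvable
    return solve(r, q)

# Toy usage: a pure sinusoid is exactly predictable with order 2.
frame = [math.sin(0.3 * n) for n in range(80)]
coefs = weighted_lp(frame, order=2)
```

Replacing `w` with a constant 1.0 reduces the sketch to ordinary covariance-method linear prediction, which makes the effect of the temporal weighting easy to compare on noisy frames.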

Last modified: Thu Dec 29 16:38:22 EET 2022