Speech Synthesis using Reverberant and Feature-Enhanced Data

Dhananjaya Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle Palomäki, Mircea Giurgiu, Mikko Kurimo, "On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech", submitted to Interspeech 2014.

[Ref-3] H. Kallasjoki, J. F. Gemmeke, K. J. Palomäki, A. V. Beeston, and G. J. Brown, "Recognition of reverberant speech by missing data imputation and NMF feature enhancement," in Proc. REVERB Challenge Workshop, 2014.

Audio samples (Sample 1 to Sample 5) are provided for each of the following conditions: Clean, Reverb, Cln-SpkDep, Rev-SpkAda, Enh-SpkAda, RevLSF-SpkAda, EnhLSF-SpkAda.

Clean - Original clean recording

Reverb - Reverberant speech generated by convolving the clean recording with the room impulse response of a lecture room.
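
The convolution step behind the Reverb condition can be sketched as below. This is only a minimal illustration, not the procedure used to produce the samples on this page: the function names `convolve` and `add_reverb`, the toy signal, and the four-tap impulse response are all hypothetical, and trimming the output to the length of the clean signal is an assumption made here to keep the two signals time-aligned.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the response.
            out[n + k] += x * h
    return out


def add_reverb(clean, rir):
    """Convolve a clean signal with a room impulse response (RIR).

    The reverberant tail beyond the original length is dropped
    (an assumption of this sketch, not something the page specifies).
    """
    return convolve(clean, rir)[: len(clean)]


# Toy data: a unit impulse as the "clean" signal and a made-up RIR with
# a direct path followed by two weaker reflections.
clean = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
rir = [1.0, 0.0, 0.5, 0.25]
reverb = add_reverb(clean, rir)
```

With real recordings the impulse response has many thousands of taps, so an FFT-based routine such as `scipy.signal.fftconvolve` would normally replace the nested loop.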

Cln-SpkDep - Speaker-dependent voice built using clean data

Rev-SpkAda - Speaker-adapted from a clean average male voice using reverberant data

Enh-SpkAda - Speaker-adapted from a clean average male voice using the enhanced LSF stream, with all other streams taken from the reverberant data

RevLSF-SpkAda - Adaptation of only the reverberant LSF stream. All other streams of the average model are left untouched.

EnhLSF-SpkAda - Adaptation of only the enhanced LSF stream. All other streams of the average model are left untouched.