Speech Synthesis using Reverberant and Feature-Enhanced Data
- Dhananjaya Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle Palomäki, Mircea Giurgiu, Mikko Kurimo, "On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech", in Proc. Interspeech, Singapore, 2014. [Poster]
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 |
| Clean | | | | | |
| Cln-SpkDep | | | | | |
| Meeting Room Data |
| Reverb | | | | | |
| Rev-SpkDep | | | | | |
| Rev-SpkAda | | | | | |
| Enh-SpkAda | | | | | |
| EnhLSF-SpkAda | | | | | |
| Lecture Room Data |
| Reverb | | | | | |
| Rev-SpkDep | | | | | |
| Rev-SpkAda | | | | | |
| Enh-SpkAda | | | | | |
| EnhLSF-SpkAda | | | | | |
Clean - Original clean recording
Reverb - Reverberant speech generated by convolving the clean recording with the room impulse response of a meeting room or a lecture room.
Cln-SpkDep - Speaker-dependent voice built using clean data
Rev-SpkDep - Speaker-dependent voice built using reverberant data
Rev-SpkAda - Speaker-adapted from a clean average male voice using reverberant data
Enh-SpkAda - Speaker-adapted from a clean average male voice using reverberant data, but with an enhanced LSF stream
EnhLSF-SpkAda - Adaptation with only the enhanced LSF stream; all other streams are taken from the average model
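
The Reverb condition above is described as the clean recording convolved with a room impulse response (RIR). The following is a minimal sketch of that operation, not the authors' actual pipeline. It assumes a mono signal held in a NumPy array; the function names `make_toy_rir` and `reverberate`, the 16 kHz sample rate, the 0.5 s decay time, and the synthetic exponentially decaying RIR are all hypothetical stand-ins, since the measured meeting-room and lecture-room responses are not given here.

```python
import numpy as np


def make_toy_rir(sample_rate=16000, rt60=0.5, seed=0):
    """Hypothetical stand-in for a measured RIR: exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # Amplitude envelope that falls by 60 dB over rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct path
    return rir


def reverberate(clean, rir):
    """Convolve a clean signal with an RIR and normalise the peak to 1."""
    wet = np.convolve(clean, rir, mode="full")
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet


# Toy usage: a one-second 220 Hz tone stands in for a clean recording.
sr = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
rir = make_toy_rir(sr)
reverb = reverberate(clean, rir)
```

With `mode="full"` the output is `len(clean) + len(rir) - 1` samples long, so the reverberant tail after the end of the utterance is kept. Peak normalisation is one possible choice for avoiding clipping when writing the result to a file; the source does not state how levels were handled.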
Reference(s):