Ian McLoughlin, Jingjie Li, Yan Song, Hamid R Sharifzadeh
Abstract
Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models (GMMs) and, more recently, restricted Boltzmann machine (RBM) arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding very good results compared with the current state of the art.
Keywords: Boltzmann machines; DNN structure; Gaussian mixture models; deep partially supervised neural network; larynx related dysphonia; medical disorders; medical signal processing; partially supervised training approach; restricted Boltzmann machine arrays; speech processing; statistical speech reconstruction; voice-loss patients
Year: 2017 PMID: 28868149 PMCID: PMC5569940 DOI: 10.1049/htl.2016.0103
Source DB: PubMed Journal: Healthc Technol Lett ISSN: 2053-3713
Fig. 1 RBM networks first trained on time-aligned whisper and speech spectral features (top), then used to train feature-mapping network weights (bottom)
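Fig. 1 trains on time-aligned whisper and speech features, but the record does not say how the alignment is obtained. Dynamic time warping (DTW) is one common way to pair frames of parallel utterances, so the Python sketch below assumes classic DTW with Euclidean frame distance. The function name `dtw_align` is hypothetical, and this is not presented as the authors' procedure.

```python
import numpy as np


def dtw_align(x: np.ndarray, y: np.ndarray) -> list[tuple[int, int]]:
    """Align two feature sequences (frames x dims) with classic DTW.

    Returns the warping path as (i, j) frame-index pairs running from
    (0, 0) to (len(x) - 1, len(y) - 1), using Euclidean frame distance.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distance between every x frame and every y frame.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    # Accumulated cost; the infinite border forces the path to start at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in x only
                acc[i, j - 1],      # advance in y only
                acc[i - 1, j - 1],  # advance in both (diagonal)
            )

    # Backtrack from the final cell along the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


# Toy one-dimensional "whisper" and "speech" sequences.
whisper = np.array([[0.0], [1.0], [2.0]])
speech = np.array([[0.0], [0.0], [1.0], [2.0]])
path = dtw_align(whisper, speech)
print(path)
```

Paired training frames can then be gathered as `whisper[[i for i, _ in path]]` and `speech[[j for _, j in path]]`, giving equal-length input and target matrices for whichever mapping model is trained next.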
Fig. 2 Standard DNN implementation for performing whisper-to-speech conversion (WSC)
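As a rough picture of a standard DNN feature mapper of the kind shown in Fig. 2, the sketch below maps each whisper feature frame to a speech feature frame with a one-hidden-layer network trained on mean squared error. The layer sizes, sigmoid activation, full-batch gradient descent and the class name `MappingDNN` are illustrative assumptions, since the record gives none of these details; the sketch also omits the RBM initialisation of Fig. 1 and the two-pass semi-DNN training of Fig. 3.

```python
import numpy as np


class MappingDNN:
    """Toy feed-forward regressor from whisper frames to speech frames.

    One sigmoid hidden layer and a linear output layer, trained on mean
    squared error with full-batch gradient descent (assumed settings).
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def _hidden(self, x: np.ndarray) -> np.ndarray:
        # Sigmoid hidden activations.
        return 1.0 / (1.0 + np.exp(-(x @ self.W1 + self.b1)))

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self._hidden(x) @ self.W2 + self.b2

    def train_step(self, x: np.ndarray, y: np.ndarray, lr: float = 0.1) -> float:
        """One gradient step on MSE; returns the loss before the update."""
        h = self._hidden(x)
        err = h @ self.W2 + self.b2 - y
        loss = float(np.mean(err ** 2))
        # d(loss)/d(output): the mean runs over all n * n_out elements.
        g_out = 2.0 * err / err.size
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        # Backpropagate through W2 and the sigmoid derivative h * (1 - h).
        g_h = (g_out @ self.W2.T) * h * (1.0 - h)
        g_W1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Apply all updates only after every gradient has been computed.
        self.W2 -= lr * g_W2
        self.b2 -= lr * g_b2
        self.W1 -= lr * g_W1
        self.b1 -= lr * g_b1
        return loss


# Usage on synthetic stand-in data (not real whisper or speech features).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
Y = np.tanh(X @ rng.normal(size=(8, 4)))
net = MappingDNN(n_in=8, n_hidden=32, n_out=4)
losses = [net.train_step(X, Y, lr=0.1) for _ in range(500)]
print(f"MSE before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

With only a few patient recordings, a network like this has many more weights than training frames, which is the data-scarcity problem the abstract describes.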
Fig. 3 Semi-DNN training methodology, showing the two-pass training arrangement in the shaded box to the right
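Results of the four binary subjective evaluation tests (percentage of listener preferences in each pairwise comparison)
| Comparison | GMM | RBM | DNN | Semi-DNN | No preference |
|---|---|---|---|---|---|
| GMM vs Semi-DNN | 13.3 | — | — | 70.0 | 16.7 |
| DNN vs Semi-DNN | — | — | 3.3 | 77.8 | 18.9 |
| RBM vs Semi-DNN | — | 13.3 | — | 43.3 | 43.3 |
| GMM vs DNN | 43.3 | — | 32.2 | — | 24.4 |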
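Each row of the subjective-test table is a two-way listening test in which every judgement goes to one of the two systems or to "no preference". The sketch below shows how such per-row percentages might be tallied; the vote labels and the function name `preference_percentages` are hypothetical, and the example votes are invented for illustration, not taken from the study.

```python
from collections import Counter


def preference_percentages(votes, options):
    """Convert binary-preference listener votes into percentages.

    `votes` holds one entry per judgement: either one of the two system
    names in `options`, or "none" for no preference.
    """
    if not votes:
        raise ValueError("no votes supplied")
    counts = Counter(votes)
    unknown = set(counts) - set(options) - {"none"}
    if unknown:
        raise ValueError(f"unexpected vote labels: {sorted(unknown)}")
    total = len(votes)
    # Each share is rounded to one decimal place, as in the table.
    return {
        label: round(100.0 * counts[label] / total, 1)
        for label in (*options, "none")
    }


# Invented votes, for illustration only.
votes = ["Semi-DNN"] * 7 + ["GMM"] * 2 + ["none"]
result = preference_percentages(votes, ("GMM", "Semi-DNN"))
print(result)
```

Because each share is rounded independently, a row need not total exactly 100; this is consistent with the 99.9 totals in the third and fourth rows.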
Cepstral distortion measure
| | Whispers | GMM | RBM | DNN | Semi-DNN |
|---|---|---|---|---|---|
| Mean | 8.45 | 5.37 | 6.43 | 5.76 | 6.06 |
| Standard deviation | 4.3 | 2.61 | 2.89 | 2.68 | 2.62 |
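The record does not define its cepstral distortion measure. A widely used form (often called mel-cepstral distortion) computes, for each time-aligned frame, (10 / ln 10) * sqrt(2 * sum over d of (c_d - c'_d)^2) in dB, then averages over frames. The sketch below assumes that form, skips the energy coefficient c_0, and uses the hypothetical name `cepstral_distortion`; the paper may use a different variant.

```python
import numpy as np


def cepstral_distortion(c_ref: np.ndarray, c_test: np.ndarray) -> float:
    """Mean cepstral distortion in dB between two time-aligned sequences.

    Both inputs are (frames x coefficients). Column 0 (the energy term c0)
    is excluded; the remaining columns enter the per-frame formula
    (10 / ln 10) * sqrt(2 * sum of squared differences).
    """
    if c_ref.shape != c_test.shape:
        raise ValueError("sequences must be time-aligned and equal in shape")
    diff = c_ref[:, 1:] - c_test[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

For example, a single frame whose cepstra differ by 1.0 in one non-energy coefficient gives (10 / ln 10) * sqrt(2), roughly 6.14 dB, which is the same order as the values in the table.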
Fig. 4 Mean objective performance scores obtained from the symmetrical Itakura-Saito (IS) distance measure, segmental SNR, and log-likelihood ratio (LLR)
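Two of the three objective measures in Fig. 4 can be sketched compactly. The Python below assumes non-overlapping frames with per-frame SNR clamped to [-10, 35] dB for segmental SNR, and a symmetrised Itakura-Saito distance computed as the average of both directions on generic power spectra. Implementations often evaluate IS on LPC spectra and differ in framing and symmetrisation, so these are assumptions rather than the paper's settings. LLR, which needs LPC analysis, is left out, and the function names are hypothetical.

```python
import numpy as np


def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB of `processed` relative to `clean`.

    Frames do not overlap, and each frame's SNR is clamped to [lo, hi] dB so
    that silent or near-perfect frames do not dominate the average.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = min(len(clean), len(processed)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    eps = np.finfo(float).eps
    snrs = []
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        noise_energy = np.sum((clean[seg] - processed[seg]) ** 2)
        snr = 10.0 * np.log10((signal_energy + eps) / (noise_energy + eps))
        snrs.append(min(max(snr, lo), hi))
    return float(np.mean(snrs))


def symmetric_is_distance(p_ref, p_test):
    """Symmetrised Itakura-Saito distance between two power spectra.

    Averages d_IS(P, Q) and d_IS(Q, P), where
    d_IS(P, Q) = mean over bins of (P/Q - ln(P/Q) - 1).
    """
    eps = np.finfo(float).eps
    p = np.asarray(p_ref, dtype=float) + eps
    q = np.asarray(p_test, dtype=float) + eps

    def d_is(a, b):
        r = a / b
        return float(np.mean(r - np.log(r) - 1.0))

    return 0.5 * (d_is(p, q) + d_is(q, p))
```

Higher segmental SNR is better, while lower IS distance is better, so the two measures in Fig. 4 should be read in opposite directions.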
Fig. 5 Comparison of the spectral envelope shape of each method for a short section of voiced speech
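A spectral envelope like those compared in Fig. 5 can be estimated in several ways, and the record does not say which was used. One standard option, assumed here, is cepstral smoothing: keep only the low-quefrency part of the real cepstrum so that harmonic fine structure is removed. The Hann window, FFT size, lifter length and the name `cepstral_envelope` are illustrative choices.

```python
import numpy as np


def cepstral_envelope(frame, n_lifter=30, n_fft=512):
    """Cepstrally smoothed log-magnitude spectrum (dB) of one frame.

    Takes the real cepstrum of a Hann-windowed frame, keeps the first
    `n_lifter` quefrency bins plus their mirror image, and transforms
    back, leaving the broad envelope without harmonic ripple.
    Requires n_lifter <= n_fft // 2 + 1.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-12)
    # Real, even-symmetric cepstrum of length n_fft.
    cepstrum = np.fft.irfft(log_mag, n_fft)
    # Symmetric low-quefrency lifter.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0
    smoothed = np.fft.rfft(cepstrum * lifter, n_fft).real
    # Convert natural-log magnitude to dB.
    return 20.0 / np.log(10.0) * smoothed
```

A shorter lifter gives a smoother envelope; when the lifter keeps every quefrency bin, the function simply returns the raw log spectrum in dB.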
Fig. 6 Spectrogram plots of (a) original whispers; speech reconstructed using the (b) GMM, (c) DNN, (d) RBM and (e) Semi-DNN methods; and (f) matching speech aligned below