Chunying Fang, Haifeng Li, Lin Ma, Mancai Zhang.
Abstract
Pathological speech usually refers to speech distortion resulting from illness or other biological insults. Assessing pathological speech plays an important role in assisting experts, but automatic evaluation of speech intelligibility is difficult because such speech is usually nonstationary and subject to abrupt changes. In this paper, we develop a new approach to feature extraction and reduction, and we describe a multigranularity combined feature scheme that is optimized by a hierarchical visual method. A novel method of generating the feature set based on the S-transform and chaotic analysis is proposed. The feature set comprises BAFS (basic acoustic features, 430 dimensions), the local spectral characteristics MSCC (Mel S-transform cepstrum coefficients, 84 dimensions), and chaotic features (12 dimensions). Finally, a radar chart and the F-score are used to optimize the features through hierarchical visual fusion. The feature set is reduced from 526 dimensions to 96 on the NKI-CCRT corpus and to 104 on the SVD corpus. The experimental results indicate that the new features, classified with a support vector machine (SVM), perform best, with a recognition rate of 84.4% on the NKI-CCRT corpus and 78.7% on the SVD corpus. The proposed method is thus shown to be effective and reliable for pathological speech intelligibility evaluation.
Year: 2017 PMID: 28194222 PMCID: PMC5282458 DOI: 10.1155/2017/2431573
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
BAFS Feature Set Construction.
| Types | Feature | Dimension |
|---|---|---|
| Prosodic features | Fundamental frequency | 15 |
| Sound quality features | Jitter | 15 |
| | Shimmer | 15 |
| | HNR | 15 |
| Spectrum-related features | Spectral centroid | 10 |
| | Spectral entropy | 10 |
| | Spectral flux | 10 |
| | Spectral asymmetry | 10 |
| | Spectral slope | 10 |
| | Spectral kurtosis | 10 |
| | Spectral roll-off | 40 |
Figure 1. MSCC extraction based on S-transform.
Figure 2. Feature fusion and optimization.
Figure 3. Speech intelligibility evaluation schema.
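As a rough illustration of the transform that Figure 1 builds on, the sketch below computes a discrete S-transform (Stockwell transform) with NumPy, using the standard frequency-domain formulation. It is not the authors' implementation: the function name `stockwell_transform` is hypothetical, and the Mel filtering and cepstral steps that would presumably turn the result into MSCC are omitted because this record does not describe them.

```python
import numpy as np

def stockwell_transform(x):
    """Discrete S-transform of a real 1-D signal (hypothetical helper).

    Returns an array of shape (n // 2 + 1, n): one row per frequency
    index k, one column per time sample.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Signed frequency offsets m = 0, 1, ..., -2, -1 in FFT order.
    m = np.fft.fftfreq(n) * n
    half = n // 2
    out = np.empty((half + 1, n), dtype=complex)
    out[0] = x.mean()  # zero-frequency row is the signal mean
    for k in range(1, half + 1):
        # Gaussian window in the frequency domain; it narrows in time as k grows.
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / k**2)
        # shifted[m] == spectrum[m + k]
        shifted = np.roll(spectrum, -k)
        out[k] = np.fft.ifft(shifted * gauss)
    return out

# Usage: a pure tone at frequency index 8 of a 128-sample frame.
n = 128
t = np.arange(n)
tone = np.cos(2 * np.pi * 8 * t / n)
s = stockwell_transform(tone)
peak_row = int(np.argmax(np.abs(s).mean(axis=1)))
print("strongest frequency row:", peak_row)
```

For the test tone, the row with the largest average magnitude should be the one at the tone's own frequency index, which is a quick sanity check on the windowing.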
The NKI-CCRT corpus.
| NCSC | Training set | Test set |
|---|---|---|
| I (intelligible) | 384 | 341 |
| NI (non-intelligible) | 517 | 405 |
The SVD corpus.
| SVD | Training set | Test set |
|---|---|---|
| Healthy | 434 | 198 |
| Pathology | 651 | 211 |
Comparison of MSCC and MFCC on the NKI-CCRT corpus.
| Feature | Sensitivity | Specificity | UA | Accuracy |
|---|---|---|---|---|
| MSCC | 67.15% | 62.36% | 64.76% | 63.67% |
| MFCC | 56.25% | 46.90% | 51.58% | 50.54% |
Comparison of MSCC and MFCC on the SVD corpus.
| Feature | Sensitivity | Specificity | UA | Accuracy |
|---|---|---|---|---|
| MSCC | 70.62% | 69.20% | 69.91% | 68.95% |
| MFCC | 61.61% | 56.56% | 59.09% | 59.17% |
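The UA column in the two comparison tables appears to be the unweighted average of sensitivity and specificity. The short check below recomputes it from the reported percentages; the function name is hypothetical, and reading UA this way is an assumption that the reported values happen to support.

```python
def unweighted_average(sensitivity, specificity):
    """Mean of the two per-class recalls, in percent."""
    return (sensitivity + specificity) / 2.0

# (sensitivity, specificity, reported UA), copied from the two tables.
rows = {
    "MSCC, NKI-CCRT": (67.15, 62.36, 64.76),
    "MFCC, NKI-CCRT": (56.25, 46.90, 51.58),
    "MSCC, SVD": (70.62, 69.20, 69.91),
    "MFCC, SVD": (61.61, 56.56, 59.09),
}

for name, (sens, spec, reported) in rows.items():
    ua = unweighted_average(sens, spec)
    print(f"{name}: computed {ua:.3f}, reported {reported:.2f}")
```

Each computed value matches the reported one up to rounding in the second decimal place.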
Figure 4. F-score of MSCC and MFCC.
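This record does not give the F-score formula. A minimal sketch, assuming the commonly used two-class definition (between-class separation of the feature means divided by the sum of the within-class sample variances), could look like the following. The function names and the toy feature values are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """F-score of one feature from its values in the two classes.

    Assumed definition: squared distances of the class means from the
    overall mean, divided by the sum of the class sample variances.
    """
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # n - 1 denominators
    return between / within

def rank_features(features):
    """Sort feature names by descending F-score."""
    scores = {name: f_score(pos, neg) for name, (pos, neg) in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: one feature that separates the classes and one that does not.
toy = {
    "separating": ([1, 2, 3], [7, 8, 9]),
    "overlapping": ([1, 5, 9], [2, 5, 8]),
}
print(rank_features(toy))
```

A higher score means the feature's class means lie further apart relative to the spread within each class, so a reduction step would keep the top-ranked features.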
Results for basic acoustic features and chaotic features on the NKI-CCRT corpus.
| Feature | Sensitivity | Specificity | UA | Accuracy |
|---|---|---|---|---|
| BAFS (430) | 63.70% | 57.77% | 60.74% | 60.99% |
| CF (12) | 55.31% | 61.00% | 58.16% | 57.91% |
| MSCC + BAFS (514) | 82.72% | 65.10% | 73.91% | 74.66% |
| CF + MSCC + BAFS (526) | 82.96% | 65.68% | 74.10% | 75.07% |
Results for basic acoustic features and chaotic features on the SVD corpus.
| Feature | Sensitivity | Specificity | UA | Accuracy |
|---|---|---|---|---|
| BAFS (430) | 73.93% | 69.70% | 69.91% | 68.95% |
| CF (12) | 62.56% | 57.58% | 60.07% | 60.15% |
| MSCC + BAFS (514) | 80.09% | 71.21% | 75.65% | 75.79% |
| CF + MSCC + BAFS (526) | 79.15% | 73.23% | 76.19% | 76.28% |
Feature optimization results.
| Feature | Sensitivity | Specificity | UA | Accuracy |
|---|---|---|---|---|
| Re_fea (96, NKI-CCRT) | 84.44% | 65.69% | 75.07% | 75.87% |
| Re_fea (104, SVD) | 78.67% | 79.30% | 78.99% | 78.97% |
| Baseline (NKI-CCRT) | — | — | 61.40% | — |
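The accuracy column can be related to sensitivity, specificity, and the test-set sizes listed in the corpus tables. The sketch below assumes that sensitivity is measured on the NI class (405 test items) for NKI-CCRT and on the Pathology class (211 test items) for SVD. This record does not state which class is treated as positive, but the reduced-feature rows are numerically consistent with that assumption. The helper name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy (percent) from per-class recalls and class sizes."""
    correct = sensitivity / 100 * n_positive + specificity / 100 * n_negative
    return 100 * correct / (n_positive + n_negative)

# Reduced feature sets; class sizes come from the test-set columns.
nki = accuracy_from_rates(84.44, 65.69, n_positive=405, n_negative=341)
svd = accuracy_from_rates(78.67, 79.30, n_positive=211, n_negative=198)

print(f"NKI-CCRT: computed {nki:.2f}, reported 75.87")
print(f"SVD: computed {svd:.2f}, reported 78.97")
```

Because accuracy weights each class by its size while UA weights both classes equally, the two figures differ whenever the test classes are unbalanced.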