Ling He, Yin Liu, Heng Yin, Junpeng Zhang, Jing Zhang, Jiang Zhang.
Abstract
Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, a syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, initials and finals are the smallest speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which has the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, syllables are extracted from the speech utterances. The proposed syllable extraction method requires no training stage and performs well on both voiced and unvoiced speech. Then, the syllables are classified as having either "quasi-unvoiced" or "quasi-voiced" initials, and a separate initial/final segmentation method is proposed for each of these two syllable types. Segmentation is carried out in two steps: the rough locations of the syllable and initial/final boundaries are refined in the second step to make the segmentation more robust. The experiments show that initial/final segmentation accuracy is higher for syllables with quasi-unvoiced initials than for syllables with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91.24%.
Year: 2017 PMID: 28926572 PMCID: PMC5604964 DOI: 10.1371/journal.pone.0184267
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The flowchart of automatic initial and final segmentation system in cleft palate speech.
Fig 2. The flowchart of automatic syllable segmentation.
Fig 3. An example of automatic syllable extraction method.
Fig 4. The flowchart of automatic initials/finals segmentation in a Mandarin syllable.
Fig 5. An example of I/F segmentation for syllables with quasi-unvoiced initials.
Fig 6. An example of I/F segmentation for syllables with quasi-voiced initials.
The I/F segmentation accuracy for cleft palate speech data and control samples.
| Metric | Quasi-unvoiced initials, cleft palate speech | Quasi-unvoiced initials, control samples | Quasi-voiced initials, cleft palate speech | Quasi-voiced initials, control samples | All syllables, cleft palate speech | All syllables, control samples |
|---|---|---|---|---|---|---|
| Mean time errors (ms) | 4.4 | 5.3 | 25.7 | 32.1 | 9.6 | 10.6 |
| Deviation of time errors (ms) | 12.3 | 14.7 | 56.4 | 64.2 | 31.1 | 24.5 |
| P10 (%) | 91.24 | 89.77 | 61.86 | 58.21 | 84.14 | 82.86 |
| P20 (%) | 94.52 | 93.44 | 70.3 | 74.63 | 88.68 | 89.24 |
| P30 (%) | 96.22 | 95.27 | 77.47 | 76.21 | 91.69 | 91.24 |
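The metrics in this table can be illustrated with a small sketch. It assumes, since the abstract does not define them, that P10, P20 and P30 are the percentages of boundaries whose absolute time error is at most 10, 20 and 30 ms, and that the deviation is a population standard deviation. The function name `segmentation_metrics` and the sample boundary times are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def segmentation_metrics(detected_ms, reference_ms, tolerances_ms=(10, 20, 30)):
    """Summarise boundary errors between detected and reference times (ms).

    Assumption: P<t> is the share of boundaries with |error| <= t ms.
    """
    if not detected_ms or len(detected_ms) != len(reference_ms):
        raise ValueError("need two non-empty boundary lists of equal length")

    # Absolute time error of each detected boundary against its reference.
    errors = [abs(d - r) for d, r in zip(detected_ms, reference_ms)]

    metrics = {
        "mean_error_ms": mean(errors),
        "std_error_ms": pstdev(errors),  # population deviation (assumed)
    }
    for tol in tolerances_ms:
        hits = sum(1 for e in errors if e <= tol)
        metrics[f"P{tol}"] = 100.0 * hits / len(errors)
    return metrics


if __name__ == "__main__":
    # Hypothetical initial/final boundaries for four syllables, in ms.
    detected = [100, 215, 330, 480]
    reference = [104, 200, 335, 440]
    for name, value in segmentation_metrics(detected, reference).items():
        print(f"{name}: {value:.2f}")
```

Under this reading, a single badly misplaced boundary lowers every P value by the same amount but inflates the mean and deviation much more, which may help explain why the table reports both kinds of figure.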
The syllable extraction accuracies using state-of-the-art methods and our proposed method (%).
| Data | short-time ZCR + energy + amplitude | wavelet transformation + entropy | Spectrum entropy + filtering | double sliding window energy + short-time ZCR | Power + filtering + short-time ZCR | Our proposed method |
|---|---|---|---|---|---|---|
| Cleft palate speech | 50.16 | 56.34 | 50.30 | 75.6 | 78.3 | 90.62 |
| Control samples | 60.13 | 70.97 | 74.19 | 90.5 | 94.2 | 93.93 |
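Several of the baseline methods in this table combine short-time energy with the short-time zero-crossing rate (ZCR). The sketch below illustrates only those two generic features; it is not the proposed method and not a reimplementation of any cited baseline. The frame length, hop size, thresholds and the function names (`split_frames`, `label_frame`) are hypothetical choices made for the example.

```python
import math


def split_frames(samples, frame_len, hop):
    """Cut a list of samples into overlapping fixed-length frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_frame(frame, energy_floor=1e-4, zcr_split=0.25):
    """Crude three-way label; both thresholds are illustrative assumptions."""
    if short_time_energy(frame) < energy_floor:
        return "silence"
    if zero_crossing_rate(frame) > zcr_split:
        return "unvoiced-like"  # noise-like: many sign changes
    return "voiced-like"        # periodic: few sign changes


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    tone = [0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
    hiss = [0.5 if n % 2 == 0 else -0.5 for n in range(400)]
    quiet = [0.0] * 400
    for name, signal in (("tone", tone), ("hiss", hiss), ("quiet", quiet)):
        frames = split_frames(signal, frame_len=200, hop=100)
        print(name, [label_frame(f) for f in frames])
```

Fixed thresholds like these are fragile on real recordings, which is consistent with the lower accuracies the table reports for the simpler energy and ZCR baselines on cleft palate speech.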
The I/F segmentation accuracies P30 using state-of-the-art methods and our proposed method (%).
| Data | auditory model (UV) | auditory model (V) | short-time energy + amplitude + ZCR (UV) | short-time energy + amplitude + ZCR (V) | discrete wavelet transform (UV) | discrete wavelet transform (V) | entropy (UV) | entropy (V) | auditory event detection (UV) | auditory event detection (V) | short-time energy + filtering (UV) | short-time energy + filtering (V) | Our proposed method (UV) | Our proposed method (V) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cleft palate speech | 78.5 | 62.8 | 71.4 | 75.5 | 92.8 | 61.0 | 60.0 | 50.3 | 68.5 | 52.8 | 78.4 | 64.2 | 96.2 | 77.5 |
| Control samples | 88.2 | 69.1 | 89.1 | 68.7 | 94.1 | 63.7 | 63.7 | 55.0 | 78.2 | 59.1 | 88.2 | 68.7 | 95.3 | 76.2 |
UV: syllables with unvoiced initials.
V: syllables with voiced initials.