Jun-Su Jang, Young-Su Kim, Boncho Ku, Jong Yeol Kim.
Abstract
Sasang constitutional medicine is a unique form of tailored medicine in traditional Korean medicine. Voice features have been regarded as an important cue for diagnosing Sasang constitution types. Many studies have tried to extract quantitative voice features and standardize diagnosis methods; however, they had flaws such as unstable voice features that vary considerably for the same individual, limited data collected from only a few sites, and low diagnosis accuracy. In this paper, we propose a stable diagnosis model that has good repeatability for the same individual. None of the past studies evaluated the repeatability of their diagnosis models. Whereas many previous studies used voice features calculated by averaging feature values over all valid frames of monotonic utterances such as vowels, we analyse every single feature value from each frame of a sentence voice signal. A Gaussian mixture model is employed to handle the large number of per-frame voice features, and a total of 15 Gaussian models are used to represent the voice characteristics of each constitution. To evaluate the repeatability of the proposed diagnosis model, we introduce a test dataset consisting of voice recordings from 10 individuals, with 50 recordings per individual. Our results show that the proposed method has better repeatability than the previous study, which used averaged features from vowels and the sentence.
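The abstract describes classifying a speaker by scoring every frame of a sentence against one Gaussian mixture per constitution. The sketch below illustrates that general scoring step only, under stated assumptions: diagonal covariances, mixture parameters already trained (for example by EM), and a decision rule that picks the class with the highest average per-frame log-likelihood. The function names and the toy two-dimensional models are hypothetical, and the excerpt does not say how the authors turn scores into the probabilities reported in their results table.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One diagonal-covariance Gaussian component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_frame_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log p(x | GMM) = log sum_k w_k * N(x; mu_k, diag(var_k))."""
    return logsumexp([math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm])


def classify(frames: Sequence[Sequence[float]], models: Dict[str, GMM]) -> Tuple[str, Dict[str, float]]:
    """Score every frame under each class GMM (assumed decision rule).

    Returns the class with the highest average per-frame log-likelihood
    and the average score of every class.
    """
    scores = {
        label: sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy models with hypothetical parameters (illustration only, not from the paper).
models = {
    "TE": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "SE": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
label, scores = classify([[0.2, 0.1], [0.8, 0.9]], models)
print(label)  # TE
```

Averaging rather than summing the frame scores does not change which class wins; it only keeps the scores comparable between sentences of different lengths.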
Year: 2013 PMID: 24062794 PMCID: PMC3770004 DOI: 10.1155/2013/920384
Source DB: PubMed Journal: Evid Based Complement Alternat Med ISSN: 1741-427X Impact factor: 2.629
Figure 1: Frame window shift for extracting voice features for each frame.
Figure 2: Gaussian mixture model along the time axis to cover each part of the voice signal.
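Figure 1 amounts to sliding a fixed-length window along the sentence signal and extracting one feature vector per window position. The sketch below shows only the framing step, with hypothetical window length and shift values, because the excerpt does not give the ones the authors used. Figure 2 indicates that the mixture is laid out along the time axis; one possible way to do that, which is purely our assumption and not taken from the paper, is to attach each frame's relative position in the utterance (from 0 to 1) to its feature vector. That reading would be consistent with the mean ts values between 0 and 1 listed in the table of Gaussian models, but the excerpt does not define ts.

```python
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames.

    The window covers frame_len samples and is shifted by hop samples at
    each step; samples that cannot fill a final complete frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        list(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def normalized_times(num_frames: int) -> List[float]:
    """Relative position of each frame in the utterance, from 0.0 to 1.0.

    Hypothetical time coordinate, assumed for illustration only.
    """
    if num_frames <= 0:
        return []
    if num_frames == 1:
        return [0.0]
    return [i / (num_frames - 1) for i in range(num_frames)]


# 10 samples with a 4-sample window shifted by 2 samples gives 4 frames.
frames = frame_signal(list(range(10)), frame_len=4, hop=2)
times = normalized_times(len(frames))
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

In a real pipeline each frame would be turned into features (for example MFCCs) before modelling; that step is omitted here.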
Mean values of ts in each Gaussian model.
| Gaussian model | Female TE | Female SE | Female SY | Male TE | Male SE | Male SY |
|---|---|---|---|---|---|---|
| 1 | 0.050 | 0.046 | 0.035 | 0.062 | 0.064 | 0.065 |
| 2 | 0.133 | 0.149 | 0.127 | 0.116 | 0.109 | 0.109 |
| 3 | 0.172 | 0.179 | 0.177 | 0.170 | 0.178 | 0.187 |
| 4 | 0.279 | 0.268 | 0.235 | 0.225 | 0.227 | 0.214 |
| 5 | 0.332 | 0.283 | 0.307 | 0.296 | 0.279 | 0.279 |
| 6 | 0.421 | 0.338 | 0.396 | 0.407 | 0.332 | 0.339 |
| 7 | 0.502 | 0.412 | 0.432 | 0.459 | 0.417 | 0.416 |
| 8 | 0.526 | 0.524 | 0.582 | 0.529 | 0.488 | 0.478 |
| 9 | 0.590 | 0.592 | 0.590 | 0.590 | 0.591 | 0.549 |
| 10 | 0.660 | 0.654 | 0.655 | 0.691 | 0.645 | 0.637 |
| 11 | 0.687 | 0.694 | 0.721 | 0.773 | 0.649 | 0.700 |
| 12 | 0.694 | 0.731 | 0.794 | 0.798 | 0.817 | 0.774 |
| 13 | 0.820 | 0.819 | 0.814 | 0.832 | 0.837 | 0.823 |
| 14 | 0.926 | 0.922 | 0.923 | 0.885 | 0.932 | 0.896 |
| 15 | 0.981 | 0.978 | 0.979 | 0.894 | 0.980 | 0.937 |

TE, SE, and SY denote the Tae-eum, So-eum, and So-yang constitution types.
Comparison results of diagnosis stability between the previous study and the proposed method.
| Subject | Do et al. [4] repeatability (%) | Do et al. [4] probability | Do et al. [4] probability SD | Proposed repeatability (%) | Proposed probability | Proposed probability SD |
|---|---|---|---|---|---|---|
| 1 | 68 | 0.561 | 0.098 | 100 | 0.460 | 0.016 |
| 2 | 76 | 0.632 | 0.119 | 100 | 0.434 | 0.039 |
| 3 | 100 | 0.732 | 0.090 | 98 | 0.406 | 0.027 |
| 4 | 54 | 0.500 | 0.059 | 100 | 0.397 | 0.025 |
| 5 | 98 | 0.735 | 0.112 | 100 | 0.451 | 0.031 |
| 6 | 100 | 0.672 | 0.090 | 70 | 0.381 | 0.020 |
| 7 | 84 | 0.603 | 0.100 | 82 | 0.362 | 0.013 |
| 8 | 56 | 0.567 | 0.082 | 76 | 0.381 | 0.013 |
| 9 | 70 | 0.689 | 0.130 | 90 | 0.384 | 0.019 |
| 10 | 76 | 0.629 | 0.140 | 100 | 0.408 | 0.017 |
| Average | 78.2 | 0.632 | 0.102 | 91.6 | 0.406 | 0.022 |
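The excerpt does not define repeatability. Since each individual contributed 50 recordings and every repeatability value in the table is a multiple of 2, one natural reading, assumed here rather than taken from the paper, is the percentage of an individual's recordings assigned to that individual's most frequent constitution type. The sketch below computes that quantity and the mean and standard deviation of per-recording probabilities (sample SD is also an assumption), and checks that the Average row is the plain mean of the ten per-subject values. The function names are hypothetical.

```python
import statistics
from collections import Counter
from typing import Sequence, Tuple


def repeatability(predictions: Sequence[str]) -> float:
    """Percentage of an individual's recordings assigned to that
    individual's most frequent predicted type (assumed definition)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    _, count = Counter(predictions).most_common(1)[0]
    return 100.0 * count / len(predictions)


def probability_summary(probs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-recording probabilities."""
    return statistics.mean(probs), statistics.stdev(probs)


# Under the assumed definition, 34 agreeing recordings out of 50 give 68%.
print(repeatability(["TE"] * 34 + ["SE"] * 16))  # 68.0

# The Average row is the mean of the ten per-subject repeatability values.
do_rep = [68, 76, 100, 54, 98, 100, 84, 56, 70, 76]
proposed_rep = [100, 100, 98, 100, 100, 70, 82, 76, 90, 100]
print(statistics.mean(do_rep), statistics.mean(proposed_rep))  # 78.2 91.6
```

Note that repeatability and probability measure different things: the proposed method assigns lower probabilities to its chosen type (about 0.41 versus 0.63) but those probabilities fluctuate far less across recordings (average SD 0.022 versus 0.102).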
Standard deviations of feature values from the sentence and vowels.
| Feature set | Feature | Subject 1 | Subject 2 | Subject 3 | Subject 4 | Subject 5 | Subject 6 | Subject 7 | Subject 8 | Subject 9 | Subject 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sentence features | MFCC1 | 0.290 | 1.287 | 0.535 | 0.414 | 0.333 | 0.317 | 0.432 | 0.269 | 0.395 | 0.245 |
| | MFCC2 | 0.292 | 0.358 | 0.270 | 0.407 | 0.356 | 0.353 | 0.363 | 0.391 | 0.264 | 0.290 |
| | MFCC3 | 0.325 | 0.597 | 0.302 | 0.350 | 0.635 | 0.413 | 0.511 | 0.205 | 0.313 | 0.259 |
| | MFCC4 | 0.246 | 0.200 | 0.223 | 0.434 | 0.286 | 0.180 | 0.448 | 0.322 | 0.260 | 0.139 |
| | MFCC5 | 0.451 | 0.320 | 0.352 | 0.423 | 0.479 | 0.309 | 0.708 | 0.470 | 0.288 | 0.280 |
| | MFCC6 | 0.213 | 0.292 | 0.199 | 0.227 | 0.258 | 0.240 | 0.428 | 0.523 | 0.329 | 0.160 |
| | MFCC7 | 0.336 | 0.287 | 0.428 | 0.266 | 0.341 | 0.294 | 0.311 | 0.278 | 0.262 | 0.307 |
| | MFCC8 | 0.340 | 0.346 | 0.257 | 0.289 | 0.269 | 0.293 | 0.303 | 0.347 | 0.258 | 0.204 |
| | MFCC9 | 0.268 | 0.399 | 0.279 | 0.521 | 0.374 | 0.307 | 0.480 | 0.420 | 0.362 | 0.211 |
| | MFCC10 | 0.359 | 0.279 | 0.284 | 0.461 | 0.387 | 0.412 | 0.381 | 0.485 | 0.286 | 0.193 |
| | MFCC11 | 0.251 | 0.334 | 0.304 | 0.412 | 0.375 | 0.411 | 0.432 | 0.410 | 0.251 | 0.235 |
| | MFCC12 | 0.329 | 0.380 | 0.178 | 0.325 | 0.590 | 0.398 | 0.669 | 0.278 | 0.265 | 0.313 |
| Vowel features | aENG | 0.069 | 0.878 | 0.208 | 0.417 | 0.394 | 0.129 | 0.912 | 0.244 | 1.549 | 1.659 |
| | aF1 | 0.634 | 0.336 | 0.453 | 0.380 | 0.282 | 0.552 | 0.489 | 0.652 | 0.366 | 0.346 |
| | aSHIM | 0.848 | 1.030 | 0.875 | 1.167 | 1.094 | 1.094 | 2.843 | 2.195 | 1.448 | 0.749 |
| | eSHIM | 1.127 | 1.397 | 0.986 | 1.555 | 0.873 | 1.120 | 2.870 | 3.048 | 1.393 | 0.541 |
| | iDTF0 | 5.234 | 7.824 | 1.775 | 9.167 | 3.757 | 4.080 | 10.320 | 6.106 | 9.598 | 6.307 |
| | iJITT | 0.933 | 1.066 | 0.695 | 5.042 | 1.623 | 1.154 | 2.869 | 2.789 | 1.096 | 0.915 |
| | oDTF0 | 2.330 | 6.399 | 1.296 | 3.616 | 3.767 | 3.623 | 9.513 | 8.807 | 1.316 | 1.178 |
| | oPW | 0.046 | 0.454 | 0.206 | 0.500 | 0.294 | 0.272 | 1.133 | 0.225 | 1.198 | 2.080 |
| | uF1 | 0.263 | 0.286 | 0.874 | 0.580 | 0.188 | 0.356 | 0.422 | 0.273 | 0.216 | 0.659 |
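The vowel features xENG (energy), xF1 (first formant), xSHIM (shimmer), xDTF0 (average pitch difference over the time interval), xJITT (jitter), and xPW (power), where x ∈ {a, e, i, o, u} denotes the vowel, are those used in the study of Do et al. [4]. The sentence features MFCC1 to MFCC12 are mel-frequency cepstral coefficients 1 to 12.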
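Each cell in the table above summarizes how much one feature varies across one subject's repeated recordings. A minimal sketch of building such a table from per-recording feature values follows; the input layout, the function name, and the use of the sample standard deviation are assumptions, since the excerpt does not specify them. Because the excerpt also does not state whether features were normalized, raw SDs of different features (for example iDTF0 versus MFCC1) are not necessarily on a common scale, so the safer comparison is of one feature across subjects.

```python
import statistics
from typing import Dict, List


def feature_std_table(
    recordings: Dict[int, List[Dict[str, float]]],
) -> Dict[int, Dict[str, float]]:
    """Map each subject to the standard deviation of every feature.

    recordings[subject] holds one {feature_name: value} dict per recording;
    the sample SD is taken over that subject's recordings (an assumption).
    """
    table: Dict[int, Dict[str, float]] = {}
    for subject, rows in recordings.items():
        if len(rows) < 2:
            raise ValueError(f"subject {subject} needs at least two recordings")
        names = rows[0].keys()
        table[subject] = {
            name: statistics.stdev([row[name] for row in rows]) for name in names
        }
    return table


# Hypothetical values for one subject with two recordings (illustration only).
example = feature_std_table({1: [{"MFCC1": 1.0, "aF1": 2.0}, {"MFCC1": 3.0, "aF1": 2.0}]})
print(example[1]["aF1"])  # 0.0
```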