Hae-Yeon Park, DoGyeom Park, Seungchul Lee, Sun Im, Hye Seon Kang, HyunBum Kim.
Abstract
Abnormal voice may identify those at risk of post-stroke aspiration. This study aimed to determine whether machine learning algorithms applied to voice recorded via a mobile device can accurately classify those with dysphagia at risk of tube feeding and post-stroke aspiration pneumonia, and whether voice can be used as a digital biomarker. Voice samples from patients referred for swallowing disturbance in a university-affiliated hospital were collected prospectively using a mobile device. Subjects who required tube feeding were further classified as being at high risk of respiratory complications, based on voluntary cough strength and abnormal chest x-ray images. A total of 449 samples were obtained, with 234 requiring tube feeding and 113 showing high risk of respiratory complications. The eXtreme gradient boosting multimodal models that included abnormal acoustic features and clinical variables showed high sensitivity levels of 88.7% (95% CI 82.6-94.7) and 84.5% (95% CI 76.9-92.1) in the classification of those at risk of tube feeding and at high risk of respiratory complications, respectively. In both cases, voice features proved to be the strongest contributing factors in these models. Voice features may be considered viable digital biomarkers in those at risk of respiratory complications related to post-stroke dysphagia.
Year: 2022 PMID: 36202829 PMCID: PMC9537337 DOI: 10.1038/s41598-022-20348-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Algorithm development. Raw data from voice signals were preprocessed after normalization. Clinical data were concatenated to the Praat features in the machine learning models. A two-step process was then used to first classify those with oral feeding versus tube feeding (algorithm 1) and, among the latter, classify those at high risk of respiratory complications (algorithm 2). ML machine learning, SVM support vector machine, GMM Gaussian mixture model, XGBoost extreme gradient boosting.
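The two-step process in Figure 1 can be sketched as a simple cascade. This is a minimal illustration, not the authors' implementation: `concat_features`, `two_step_classify`, and the two toy classifiers (with made-up thresholds) are hypothetical stand-ins for the trained algorithm 1 and algorithm 2.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], bool]


def concat_features(voice: Features, clinical: Features) -> list[float]:
    """Concatenate voice (Praat) features with clinical variables."""
    return list(voice) + list(clinical)


def two_step_classify(x: Features, needs_tube: Classifier,
                      high_resp_risk: Classifier) -> str:
    """Algorithm 1 decides oral vs. tube feeding; algorithm 2 runs only
    on samples that algorithm 1 assigns to tube feeding."""
    if not needs_tube(x):
        return "oral feeding"
    if high_resp_risk(x):
        return "tube feeding, high respiratory risk"
    return "tube feeding, low respiratory risk"


# Toy stand-ins for trained models; thresholds are invented for illustration.
def toy_algorithm_1(x: Features) -> bool:
    return x[0] > 1.5      # x[0]: local jitter (%)


def toy_algorithm_2(x: Features) -> bool:
    return x[2] > 10.0     # x[2]: NIHSS score


sample = concat_features([2.0, 13.5], [12.0])  # [jitter, HNR] + [NIHSS]
print(two_step_classify(sample, toy_algorithm_1, toy_algorithm_2))
```

Keeping the second classifier behind the first mirrors the caption: respiratory risk is only assessed among those classified as needing tube feeding.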
Demographic features and acoustic parameters in mild versus severe dysphagia.
| Variable | Mild dysphagia (oral feeding) (N = 215) | Severe dysphagia (tube feeding) (N = 234) | p-value |
|---|---|---|---|
| Age (years) | 65.7 ± 13.2 | 72.2 ± 11.2 | < 0.001* |
| Male | 63.5 ± 13.9 | 71.4 ± 11.4 | < 0.001* |
| Female | 69.5 ± 11.1 | 73.4 ± 10.9 | 0.020* |
| Gender (Male) | 135 (62.8%) | 137 (58.5%) | 0.411 |
| Weight (kg) | 61.3 ± 11.6 | 57.2 ± 10.8 | < 0.001* |
| PAS | 3.7 ± 1.9 | 7.2 ± 1.2 | < 0.001* |
| Aspiration (Yes) | 37 (17.2%) | 222 (94.9%) | < 0.001* |
| FOIS | 5.0 ± 1.0 | 1.7 ± 1.0 | < 0.001* |
| MASA | 182.6 ± 13.3 | 157.2 ± 17.3 | < 0.001* |
| PCF (L/min) | 231.7 ± 130.0 | 115.8 ± 87.8 | < 0.001* |
| MMSE | 22.9 ± 6.2 | 16.5 ± 8.7 | < 0.001* |
| MBI | 65.9 ± 29.4 | 29.8 ± 29.3 | < 0.001* |
| NIHSS | 5.3 ± 4.4 | 10.3 ± 5.1 | < 0.001* |
| BBS | 35.4 ± 20.1 | 15.2 ± 18.8 | < 0.001* |
| F0 (Hz) | 199.9 ± 58.8 | 206.0 ± 73.4 | 0.324 |
| Male | 183.7 ± 57.2 | 185.9 ± 67.7 | |
| Female | 227.1 ± 51.2 | 234.6 ± 71.9 | |
| F0_SD (Hz) | 5.0 ± 11.1 | 7.2 ± 12.5 | 0.051 |
| HNR (dB) | 16.45 ± 4.91 | 13.46 ± 5.36 | < 0.001* |
| LocalJitter (%) | 1.21 ± 0.98 | 2.03 ± 1.49 | < 0.001* |
| LocalAbsoluteJitter (μs) | 64.42 ± 55.12 | 112.64 ± 96.42 | < 0.001* |
| RAP (%) | 0.60 ± 0.52 | 1.05 ± 0.86 | < 0.001* |
| PPQ5Jitter (%) | 0.68 ± 0.60 | 1.18 ± 0.99 | < 0.001* |
| DdpJitter (%) | 1.80 ± 1.56 | 3.16 ± 2.59 | < 0.001* |
| LocalShimmer (%) | 6.71 ± 4.08 | 9.74 ± 5.47 | < 0.001* |
| LocaldbShimmer | 0.65 ± 0.34 | 0.91 ± 0.44 | < 0.001* |
| APQ3Shimmer (%) | 3.30 ± 2.02 | 5.04 ± 3.21 | < 0.001* |
| APQ5Shimmer (%) | 4.12 ± 3.05 | 6.12 ± 3.81 | < 0.001* |
| APQ11Shimmer (%) | 5.63 ± 4.83 | 8.05 ± 5.50 | < 0.001* |
| DdaShimmer (%) | 9.91 ± 6.07 | 15.14 ± 9.62 | < 0.001* |
| CPP | 23.36 ± 0.18 | 24.09 ± 0.15 | < 0.001* |
Values are presented in mean ± standard deviation (SD) or number (%).
*p < 0.05 is used for statistical significance.
PAS penetration-aspiration scale, FOIS functional oral intake scale, MASA mann assessment of swallowing ability, PCF peak cough flow, MMSE mini-mental state examination, MBI modified barthel index, NIHSS national institutes of health stroke scale, BBS berg balance scale, F0 fundamental frequency, HNR harmonic to noise ratio, RAP relative average perturbation, PPQ period perturbation quotient, APQ amplitude perturbation quotient, CPP cepstral peak prominence.
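The jitter and shimmer rows above are cycle-to-cycle perturbation measures. The sketch below assumes the standard Praat definitions (the excerpt does not restate them): local jitter and local shimmer are the mean absolute difference between consecutive periods or amplitudes divided by the mean, and RAP is the mean deviation from a three-point moving average. The function names and the toy period values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values over the mean value, in %.

    Applied to glottal periods this gives local jitter; applied to
    peak amplitudes it gives local shimmer.
    """
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def rap(periods):
    """Relative average perturbation: mean absolute deviation of each period
    from the average of itself and its two neighbours, over the mean period, in %."""
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * (sum(devs) / len(devs)) / (sum(periods) / len(periods))


def ddp(periods):
    """Difference of differences of periods; by definition three times RAP."""
    return 3.0 * rap(periods)


# Toy glottal periods in seconds (invented values, roughly 200 Hz).
periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052]
print(round(local_perturbation(periods), 2), round(rap(periods), 2), round(ddp(periods), 2))
```

The factor of three is visible in the table itself: DdpJitter is about three times RAP (0.60 versus 1.80), and DdaShimmer is about three times APQ3Shimmer (3.30 versus 9.91).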
Demographic features and acoustic parameters according to respiratory complication risk among those with tube feeding.
| Variable | Low risk (N = 121) | High risk (N = 113) | p-value |
|---|---|---|---|
| Age (years) | 70.2 ± 12.1 | 74.4 ± 9.8 | 0.004* |
| Male | 70.1 ± 12.3 | 73.7 ± 9.5 | 0.078 |
| Female | 70.4 ± 11.7 | 75.0 ± 10.1 | 0.044* |
| Gender (Male) | 86 (71.1%) | 51 (45.1%) | 1.000 |
| Weight (kg) | 59.2 ± 9.9 | 55.0 ± 11.3 | 0.003* |
| PAS | 7.1 ± 1.1 | 7.3 ± 1.2 | 0.131 |
| FOIS | 1.9 ± 1.0 | 1.6 ± 0.9 | 0.061 |
| MASA | 165.1 ± 13.4 | 148.6 ± 16.9 | < 0.001* |
| PCF (L/min) | 175.1 ± 85.2 | 52.4 ± 20.0 | < 0.001* |
| MMSE | 20.7 ± 7.5 | 11.9 ± 7.5 | < 0.001* |
| MBI | 41.6 ± 31.5 | 17.1 ± 20.3 | < 0.001* |
| NIHSS | 9.1 ± 5.3 | 11.6 ± 4.6 | < 0.001* |
| BBS | 21.9 ± 20.9 | 7.9 ± 12.8 | < 0.001* |
| F0 (Hz) | 199.0 ± 68.7 | 213.5 ± 77.7 | 0.13 |
| Male | 181.92 ± 60.32 | 192.48 ± 78.89 | |
| Female | 241.04 ± 70.78 | 230.88 ± 72.83 | |
| F0_SD (Hz) | 5.00 ± 9.27 | 9.48 ± 14.89 | 0.007* |
| HNR (dB) | 14.29 ± 4.83 | 12.57 ± 5.76 | 0.013* |
| LocalJitter (%) | 1.63 ± 1.08 | 2.45 ± 1.74 | < 0.001* |
| LocalAbsoluteJitter (μs) | 93.15 ± 74.20 | 133.51 ± 112.21 | 0.001* |
| RAP (%) | 0.83 ± 0.62 | 1.29 ± 1.02 | < 0.001* |
| PPQ5Jitter (%) | 0.93 ± 0.71 | 1.44 ± 1.17 | < 0.001* |
| DdpJitter (%) | 2.49 ± 1.85 | 3.88 ± 3.05 | < 0.001* |
| LocalShimmer (%) | 8.38 ± 4.31 | 11.19 ± 6.19 | < 0.001* |
| LocaldbShimmer | 0.80 ± 0.36 | 1.03 ± 0.49 | < 0.001* |
| APQ3Shimmer (%) | 4.24 ± 2.37 | 5.91 ± 3.73 | < 0.001* |
| APQ5Shimmer (%) | 5.18 ± 2.80 | 7.13 ± 4.45 | < 0.001* |
| APQ11Shimmer (%) | 6.64 ± 3.25 | 9.55 ± 6.86 | < 0.001* |
| DdaShimmer (%) | 12.72 ± 7.10 | 17.73 ± 11.20 | < 0.001* |
| CPP | 23.81 ± 0.30 | 24.52 ± 0.37 | < 0.001* |
Values are presented in mean ± standard deviation (SD) or number (%).
*p < 0.05 is used for statistical significance.
PAS penetration-aspiration scale, FOIS functional oral intake scale, MASA mann assessment of swallowing ability, PCF peak cough flow, MMSE mini-mental state examination, MBI modified barthel index, NIHSS national institutes of health stroke scale, BBS berg balance scale, F0 fundamental frequency, HNR harmonic to noise ratio, RAP relative average perturbation, PPQ period perturbation quotient, APQ amplitude perturbation quotient, CPP cepstral peak prominence.
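Group comparisons like those in the table can be roughly reproduced from the reported means, SDs, and group sizes. The sketch below uses Welch's t statistic with a normal approximation for the two-sided p-value; both choices are assumptions, since the excerpt does not state which test produced the p-values. The function name is hypothetical.

```python
import math


def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic from summary statistics, with a two-sided p-value
    from the normal approximation (reasonable for groups of this size)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (m2 - m1) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Age row: low risk 70.2 ± 12.1 (N = 121) vs. high risk 74.4 ± 9.8 (N = 113).
t, p = welch_from_summary(70.2, 12.1, 121, 74.4, 9.8, 113)
print(f"t = {t:.2f}, approximate p = {p:.4f}")
```

For the age row this gives a p-value of the same order as the reported 0.004; exact agreement is not expected, because the approximation and possibly the test itself differ from what the authors used.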
Figure 2. Correlation analysis between the Praat features and the clinical parameters. The correlation graph shows that nearly all the voice features were significantly associated with the clinical parameters, especially those related to swallowing and peak cough flow. The exception was fundamental frequency, which showed no association with the clinical parameters. *p < 0.05; **p < 0.01; ***p < 0.001. F0 fundamental frequency, SD standard deviation, APQ amplitude perturbation quotient, PPQ period perturbation quotient, RAP relative average perturbation, PAS penetration-aspiration scale, NIHSS national institutes of health stroke scale, HNR harmonic to noise ratio, MASA mann assessment of swallowing ability, FOIS functional oral intake scale, PCF peak cough flow, MMSE mini-mental state examination, MBI modified barthel index.
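The star notation in Figure 2 maps p-values from a correlation analysis to significance levels. As a hedged sketch: the excerpt does not say which correlation coefficient or test was used, so the code below assumes Pearson's r with a permutation test, and the HNR and PCF values are invented toy data.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value: share of shuffles whose |r| is at
    least as large as the observed |r| (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def stars(p):
    """Significance stars as used in the figure legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Invented toy values: HNR (dB) and peak cough flow (L/min).
hnr = [10.1, 12.3, 13.0, 14.8, 16.2, 17.5, 18.9, 20.4]
pcf = [60, 95, 90, 150, 170, 210, 230, 260]
r = pearson_r(hnr, pcf)
p = permutation_p(hnr, pcf)
print(f"r = {r:.2f} {stars(p)}")
```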
Evaluation metrics for voice signals in classifying tube feeding.
| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | NPV (%) | PPV (%) | F1 | AUC |
|---|---|---|---|---|---|---|---|
| LR | 68.2 (64.3–72.1) | 65.7 (58.2–73.1) | 70.7 (65.4–75.9) | 67.8 (63.1–72.6) | 69.3 (65.0–73.7) | 0.67 (0.62–0.72) | 0.69 (0.64–0.74) |
| DT | 69.0 (64.5–73.5) | 62.0 (56.5–67.5) | 76.0 (67.2–84.8) | 66.6 (63.2–70.0) | 73.3 (66.1–80.6) | 0.67 (0.62–0.71) | 0.70 (0.65–0.75) |
| RF | 73.7 (70.2–77.1) | 70.7 (66.1–75.3) | 76.7 (70.5–82.8) | 72.5 (69.2–75.7) | 75.7 (70.8–80.6) | 0.73 (0.69–0.76) | 0.78 (0.73–0.82) |
| SVM | 69.7 (65.9–73.5) | 71.0 (66.9–75.1) | 68.3 (62.8–73.9) | 70.2 (66.4–74.0) | 69.4 (65.2–73.5) | 0.70 (0.67–0.74) | 0.68 (0.63–0.73) |
| GMM | 66.2 (61.3–71.0) | 64.7 (51.3–78.1) | 67.7 (60.1–75.2) | 67.5 (61.2–73.7) | 66.3 (61.1–71.5) | 0.64 (0.55–0.74) | 0.64 (0.55–0.72) |
| XGBoost | |||||||
| LR | 77.2 (74.4–80.0) | 76.7 (67.1–86.2) | 78.3 (72.5–84.2) | 78.7 (73.5–83.8) | 0.77 (0.73–0.81) | 0.82 (0.79–0.85) | |
| DT | 74.5 (70.6–78.4) | 80.0 (72.9–87.1) | 69.0 (64.8–73.2) | 78.3 (72.6–84.0) | 72.1 (68.8–75.3) | 0.76 (0.71–0.80) | 0.75 (0.71–0.80) |
| RF | 79.7 (75.9–83.4) | 85.0 (78.6–91.4) | 74.3 (67.0–81.7) | 83.9 (78.9–89.0) | 77.5 (72.3–82.7) | 0.81 (0.77–0.84) | 0.84 (0.80–0.88) |
| SVM | 77.0 (73.8–80.2) | 84.3 (78.7–90.0) | 69.7 (64.1–75.2) | 82.3 (77.4–87.1) | 73.8 (70.4–77.2) | 0.79 (0.75–0.82) | 0.81 (0.77–0.84) |
| GMM | 73.2 (68.9–77.5) | 76.3 (69.5–83.1) | 70.0 (63.6–76.4) | 75.4 (69.4–81.3) | 72.1 (67.6–76.6) | 0.74 (0.70–0.78) | 0.75 (0.71–0.79) |
| XGBoost | 76.3 (71.4–81.3) | ||||||
Values are presented in mean (95% confidence interval). Values with bold-text represent the highest values among the models.
NPV negative predictive value, PPV positive predictive value, AUC area under curve, LR logistic regression, DT decision tree, RF random forest, SVM support vector machine, GMM Gaussian mixture model, XGBoost extreme gradient boosting.
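All of the columns above except AUC derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "ppv": ppv,
        "f1": f1,
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A model can score high on sensitivity and lower on specificity, as several rows in the table do, because the two metrics are computed over different subsets of samples (true positives and true negatives, respectively).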
Figure 3. AUC-ROC curve of the XGBoost model for classifying tube feeding and risk of respiratory complications. AUC-ROC curves show that multimodal models that combine phonation and clinical data demonstrate high levels of AUC in classifying (a) risk of tube feeding and (b) respiratory complications. AUC area under curve, ROC receiver operating characteristic.
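The AUC reported in Figure 3 can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name and toy scores are hypothetical, and real pipelines would typically use a library routine instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct ranking."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```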
Evaluation metrics for voice signals in classifying risk of respiratory complications.
| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | NPV (%) | PPV (%) | F1 | AUC |
|---|---|---|---|---|---|---|---|
| LR | 66.2 (61.5–71.0) | 55.0 (49.5–60.5) | 77.5 (70.5–84.5) | 63.3 (59.4–67.2) | 72.2 (65.1–79.3) | 0.62 (0.56–0.67) | 0.64 (0.59–0.70) |
| DT | 70.0 (64.3–75.7) | 72.5 (65.3–79.7) | 67.5 (57.5–77.5) | 71.5 (64.6–78.4) | 70.2 (62.5–77.9) | 0.71 (0.66–0.76) | 0.71 (0.64–0.77) |
| RF | 70.5 (66.5–74.5) | 71.5 (60.4–82.6) | 69.5 (57.9–81.1) | 72.7 (66.5–78.8) | 72.0 (65.1–79.0) | 0.70 (0.65–0.75) | 0.73 (0.67–0.78) |
| SVM | 67.0 (63.4–70.6) | 56.0 (47.9–64.1) | 64.5 (60.3–68.7) | 72.2 (66.8–77.6) | 0.63 (0.57–0.68) | 0.65 (0.61–0.69) | |
| GMM | 62.2 (57.8–66.7) | 69.0 (62.7–75.3) | 55.5 (45.9–65.1) | 64.3 (59.7–68.9) | 61.5 (56.5–66.4) | 0.65 (0.61–0.68) | 0.61 (0.55–0.66) |
| XGBoost | 71.5 (59.3–83.7) | ||||||
| LR | 75.8 (71.3–80.2) | 78.0 (70.4–85.6) | 73.5 (64.1–82.9) | 78.2 (71.1–85.3) | 75.9 (68.9–82.8) | 0.76 (0.72–0.80) | 0.79 (0.74–0.84) |
| DT | 73.8 (68.7–78.8) | 74.5 (66.7–82.3) | 73.0 (66.6–79.4) | 74.8 (68.3–81.4) | 73.7 (68.2–79.1) | 0.74 (0.68–0.79) | 0.73 (0.68–0.79) |
| RF | 76.5 (73.1–79.9) | 82.0 (77.6–86.4) | 71.0 (64.1–77.9) | 80.3 (75.7–84.9) | 74.4 (69.8–79.1) | 0.78 (0.75–0.81) | 0.81 (0.77–0.86) |
| SVM | 74.5 (70.4–78.6) | 80.5 (70.0–91.0) | 68.5 (59.6–77.4) | 80.1 (72.0–88.2) | 72.7 (68.2–77.2) | 0.76 (0.71–0.80) | 0.76 (0.72–0.80) |
| GMM | 74.2 (69.7–78.8) | 81.5 (75.1–87.9) | 67.0 (57.6–76.4) | 79.6 (73.3–85.9) | 72.8 (65.5–80.2) | 0.76 (0.72–0.80) | 0.76 (0.71–0.81) |
| XGBoost | |||||||
Values are presented in mean (95% confidence interval). Values with bold-text represent the highest values among the models.
NPV negative predictive value, PPV positive predictive value, AUC area under curve, LR logistic regression, DT decision tree, RF random forest, SVM support vector machine, GMM Gaussian mixture model, XGBoost extreme gradient boosting.
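Each cell above is a mean with a 95% confidence interval. The excerpt does not say how the intervals were obtained, so the sketch below assumes one common approach: a metric measured over repeated evaluation runs (for example cross-validation folds), summarized with a normal-approximation interval. The function name and the per-run values are invented.

```python
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Mean and normal-approximation confidence interval across runs."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


# Invented per-run sensitivities (%), for illustration only.
runs = [80.0, 85.0, 90.0, 82.0, 88.0]
m, lo, hi = mean_ci(runs)
print(f"{m:.1f} ({lo:.1f}-{hi:.1f})")
```

With only a handful of runs, a t-based interval would be somewhat wider than this normal approximation.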
Figure 4. Feature importance analysis. Feature importance plots from the XGBoost models show that APQ11Shimmer and RAP values remain the major features even after the clinical variables are included, in classifying (a) those with tube feeding and (b) those at risk of respiratory complications. XGBoost extreme gradient boosting, RAP relative average perturbation, APQ amplitude perturbation quotient, HNR harmonic to noise ratio, F0 fundamental frequency, MBI modified barthel index, NIHSS national institutes of health stroke scale.
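Figure 4 ranks features by the importance scores of the XGBoost models. To illustrate the general idea of ranking features by their contribution, the sketch below uses permutation importance, a model-agnostic technique that is not the same as XGBoost's built-in importance: each feature column is shuffled in turn and the resulting drop in accuracy is recorded. The toy model, data, and function names are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that relies only on feature 0 (say, a shimmer value) and
# ignores feature 1; the threshold is invented.
def toy_model(row):
    return int(row[0] > 5.0)


X = [[1.0, 3.0], [2.0, 9.0], [3.0, 1.0], [4.0, 7.0],
     [6.0, 2.0], [7.0, 8.0], [8.0, 4.0], [9.0, 6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
importance = permutation_importance(toy_model, X, y)
print(importance)
```

Shuffling a feature the model ignores leaves accuracy unchanged, so its importance is zero, while shuffling a feature the model depends on lowers accuracy.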