Sylvester O Orimaye1, Jojo S-M Wong2, Karen J Golden3, Chee P Wong3, Ireneous N Soyiri4.
Abstract
BACKGROUND: The manual diagnosis of neurodegenerative disorders such as Alzheimer's disease (AD) and related dementias has been a challenge. Currently, these disorders are diagnosed using specific clinical diagnostic criteria and neuropsychological examinations. The use of several Machine Learning algorithms to build automated diagnostic models using low-level linguistic features resulting from verbal utterances could aid diagnosis of patients with probable AD from a large population. For this purpose, we developed different Machine Learning models on the DementiaBank language transcript clinical dataset, consisting of 99 patients with probable AD and 99 healthy controls.
Keywords: Alzheimer’s disease; Clinical diagnostics; Machine learning; Neurolinguistics; Prediction
Year: 2017 PMID: 28088191 PMCID: PMC5237556 DOI: 10.1186/s12859-016-1456-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistical analysis of syntactic and lexical features from the PrADG and HEG based on Student’s t-test
| Feature | PrADG MEAN(SD) | HEG MEAN(SD) | t | df | p | 95% CI(Difference) |
|---|---|---|---|---|---|---|
| Confounding feature | ||||||
| Age | 70.45(8.916) | 65.26(8.388) | 3.621 | 148 | <0.000* | 2.36 to 8.01 |
| Syntactic features | ||||||
| Coordinated sentences | 5.09(3.22) | 4.85(2.99) | 0.55 | 196 | 0.584 | -0.63 to 1.11 |
| Subordinated sentences | 5.42(3.63) | 5.13(3.19) | 0.60 | 196 | 0.547 | -0.66 to 1.25 |
| Reduced sentences | 2.95(2.48) | 4.08(2.57) | -3.15 | 196 | 0.002* | -1.84 to -0.42 |
| Number of Predicates | 5.54(3.44) | 6.94(3.53) | -2.83 | 196 | 0.005* | -2.38 to -0.43 |
| Avg. predicates per sentence | 0.42(0.19) | 0.58(0.22) | -5.48 | 196 | <0.000* | -0.22 to -0.10 |
| Number of dependencies | 100.90(53.36) | 100.81(51.44) | 0.01 | 196 | 0.990 | -14.60 to 14.78 |
| Avg. dependencies per sentence | 8.21(2.69) | 8.78(2.36) | -1.58 | 196 | 0.115 | -1.28 to 0.14 |
| Dependency distance | 16.21(7.75) | 17.09(7.05) | -0.83 | 196 | 0.405 | -2.95 to 1.197 |
| Production rules | 128.61(52.00) | 126.75(46.35) | 0.26 | 196 | 0.791 | -11.95 to 15.67 |
| Lexical features | ||||||
| Utterances | 50.52(35.61) | 31.05(15.49) | 4.99 | 196 | <0.000* | 11.77 to 27.16 |
| MLU | 2.65(1.70) | 4.03(2.25) | -4.86 | 196 | <0.000* | -1.94 to -0.82 |
| Function words | 58.00(35.84) | 59.71(35.33) | -0.34 | 196 | 0.736 | -11.68 to 8.27 |
| Unique words | 115.92(63.96) | 116.03(59.92) | -0.01 | 196 | 0.990 | -17.48 to 17.26 |
| Word count | 127.79(72.62) | 127.69(68.45) | 0.01 | 196 | 0.992 | -19.68 to 19.88 |
| Character length | 562.35(313.33) | 583.21(316.65) | -0.47 | 196 | 0.642 | -109.15 to 67.44 |
| Total sentences | 14.01(8.33) | 12.48(5.56) | 1.52 | 196 | 0.131 | -0.46 to 3.51 |
| Repetitions | 2.09(3.08) | 0.70(1.03) | 4.27 | 196 | <0.000* | 0.75 to 2.04 |
| Revision | 4.54(5.27) | 2.02(2.20) | 4.38 | 196 | <0.000* | 1.38 to 3.65 |
| Number of morphemes | 117.35(76.56) | 117.09(69.65) | 0.02 | 196 | 0.980 | -20.25 to 20.78 |
| Trailing off | 0.85(1.18) | 0.14(0.38) | 5.67 | 196 | <0.000* | 0.46 to 0.95 |
| Word replacement | 1.28(1.37) | 0.44(0.77) | 5.30 | 196 | <0.000* | 0.53 to 1.15 |
| Incomplete words | 5.56(4.05) | 3.11(3.42) | 4.59 | 196 | <0.000* | 1.39 to 3.49 |
| Filler words | 6.47(6.89) | 4.30(3.53) | 2.79 | 196 | 0.006* | 0.64 to 3.71 |
*PrADG Probable Alzheimer’s Disease Group (N=99), HEG Healthy Elderly Group (N=99), SD Standard Deviation, df degree of freedom, CI Confidence Interval
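
The t, df, and CI columns above can be recomputed from the reported means and standard deviations alone. The following is a minimal sketch, assuming a pooled-variance (equal variances) Student's t-test and a hard-coded approximate critical value for df = 196; the function name `pooled_t_test` is illustrative and not from the paper. With the rounded inputs of the "Reduced sentences" row it reproduces that row's reported values.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t-test (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    diff = mean1 - mean2
    return diff / se, df, diff, se


# "Reduced sentences" row: PrADG 2.95(2.48), HEG 4.08(2.57), N=99 per group.
t_stat, df, diff, se = pooled_t_test(2.95, 2.48, 99, 4.08, 2.57, 99)

# Assumption: approximate two-sided 95% critical value of t for df = 196.
T_CRIT = 1.972
ci_low, ci_high = diff - T_CRIT * se, diff + T_CRIT * se

print(f"t = {t_stat:.2f}, df = {df}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

Because the inputs are the rounded table values, other rows may differ from the reported statistics in the last decimal place.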
Multiple logistic regression analysis on confounding, syntactic, and lexical features from the PrADG and HEG
| Features | β | S.E | Wald | p | OR | 95% CI of OR |
|---|---|---|---|---|---|---|
| Age | -0.095 | 0.030 | 9.70 | 0.002* | 0.91 | 0.86 to 0.96 |
| Reduced Sentences | 0.185 | 0.083 | 5.02 | 0.025* | 1.20 | 1.02 to 1.41 |
| MLU | 0.300 | 0.142 | 4.49 | 0.034* | 1.35 | 1.02 to 1.78 |
| Trailing off | -1.300 | 0.437 | 8.85 | 0.003* | 0.27 | 0.12 to 0.64 |
| Constant | 2.319 | 2.108 | 1.21 | 0.271 | 10.17 | - |
*PrADG(N=99); HEG(N=99); S.E Standard Error, OR Odds Ratio or Exp(β), CI Confidence Interval
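
The Wald, OR, and CI columns follow from the coefficient and its standard error. The sketch below assumes the first numeric column is the coefficient β, that Wald = (β / S.E)², that OR = exp(β) as the footnote states, and that the interval is the usual normal-approximation exp(β ± 1.96·S.E). The helper name `wald_summary` is illustrative.

```python
import math


def wald_summary(beta, se, z=1.96):
    """Wald statistic, odds ratio, and 95% CI of the odds ratio."""
    wald = (beta / se) ** 2
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - z * se), math.exp(beta + z * se))
    return wald, odds_ratio, ci


# "Trailing off" row: coefficient -1.300, S.E 0.437.
wald, odds_ratio, (ci_low, ci_high) = wald_summary(-1.300, 0.437)

print(f"Wald = {wald:.2f}, OR = {odds_ratio:.2f}, "
      f"95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

For this row the rounded inputs give back the tabulated Wald, OR, and interval; rows whose coefficients were rounded more coarsely (for example Age) can differ slightly in the Wald value.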
Best hyperparameters found for SVM on PrADG/HEG validation dataset (PrADG=40;HEG=40) with Auto-Weka
| Algorithm | Seed | Training time | Optimisation method | Hyperparameters |
|---|---|---|---|---|
| SVM-top-1000-PrADG/HEG | 2 | 3 hours | SMAC | -C 1.4786727172414378 -N 1 -K "RBFKernel -G 0.0014243946679106075" |
seed = random integer for randomising the data during training; SMAC is a Bayesian optimisation method proposed as part of Auto-Weka
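
To make the hyperparameter string concrete, the sketch below parses it and evaluates a radial basis function kernel with the tuned gamma. It assumes the common Weka reading of the flags (`-C` as the complexity constant, `-K` as the kernel specification, `-G` as the RBF gamma) and the standard kernel form K(x, y) = exp(-gamma · ||x - y||²). The helper names and the toy vectors are hypothetical.

```python
import math
import shlex

# Option string as reported in the table (straight quotes assumed).
WEKA_OPTIONS = '-C 1.4786727172414378 -N 1 -K "RBFKernel -G 0.0014243946679106075"'


def parse_options(option_string):
    """Split a flag/value option string into a dict (quoted values stay whole)."""
    tokens = shlex.split(option_string)
    return dict(zip(tokens[0::2], tokens[1::2]))


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


options = parse_options(WEKA_OPTIONS)
C = float(options["-C"])

# The kernel spec is itself "<name> -G <gamma>".
kernel_tokens = shlex.split(options["-K"])
GAMMA = float(kernel_tokens[kernel_tokens.index("-G") + 1])

# Toy feature vectors (hypothetical), only to show the kernel's behaviour.
print(C, GAMMA, rbf_kernel([0.0, 0.0], [1.0, 0.0], GAMMA))
```

A gamma this small means the kernel value decays slowly with distance, so the decision surface is comparatively smooth.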
Classification AUC and standard deviation comparison between the proposed and baseline features on the PrADG/HEG
| Models | AUC(s.d.) |
|---|---|
| 23-syntactic-lexical-only | |
| 23-syntactic-lexical-Roark-7 | |
| 11-t-test-syntactic-lexical-sig. | |
| 3-MLR-syntactic-lexical-sig. | 0.70(3.71) |
| top-combined-1000 | |
| top-1000-n-gram-only | |
| Orimaye-5-baseline | 0.75(3.43) |
| Delira-3-baseline | 0.54(4.06) |
| Roark-7-baseline | 0.73(3.53) |
23-syntactic-lexical-only = proposed syntactic and lexical features; 23-syntactic-lexical-Roark-7 = proposed syntactic and lexical features combined with Roark’s Wechsler Logical Memory I 7 significant features; 3-MLR-syntactic-lexical-sig. = MLR significant features; 11-t-test-syntactic-lexical-sig. = t-test significant features; top-combined-1000 = top ranked 1000 features consisting of syntactic, lexical, and n-gram features; top-1000-n-gram-only = top 1000 bigrams and trigrams without syntactic and lexical features
Boldfaced means better results
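
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the scores are hypothetical toy values, not results from the study, and the function name is illustrative.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for three positive and three negative cases.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]

auc = auc_from_scores(positives, negatives)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.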
Statistical analysis of the top 20 n-gram features from the PrADG and HEG based on Student’s t-test
| n-gram | PrADG MEAN(SD) | HEG MEAN(SD) | t | df | p | 95% CI(Difference) |
|---|---|---|---|---|---|---|
| the window | 0.12(0.38) | 0.62(0.82) | -5.45 | 196 | <0.000* | -0.67 to -0.32 |
| mother is | 0.15(0.44) | 0.60(0.70) | -5.37 | 196 | <0.000* | -0.61 to -0.28 |
| be quiet | 0.01(0.10) | 0.22(0.44) | -4.66 | 196 | <0.000* | -0.30 to -0.12 |
| is open | 0.04(0.24) | 0.31(0.58) | -4.29 | 196 | <0.000* | -0.40 to -0.15 |
| the mother | 0.22(0.46) | 0.61(0.74) | -4.37 | 196 | <0.000* | -0.56 to -0.21 |
| tipping over | 0.00(0.00) | 0.17(0.43) | -3.98 | 196 | <0.000* | -0.26 to -0.09 |
| window is | 0.01(0.10) | 0.19(0.40) | -4.43 | 196 | <0.000* | -0.26 to -0.10 |
| girl is | 0.14(0.40) | 0.44(0.57) | -4.29 | 196 | <0.000* | -0.44 to -0.16 |
| is tipping | 0.00(0.00) | 0.14(0.35) | -4.02 | 196 | <0.000* | -0.21 to -0.07 |
| the window is | 0.01(0.10) | 0.18(0.39) | -4.27 | 196 | <0.000* | -0.25 to -0.09 |
| the mother is | 0.11(0.35) | 0.42(0.61) | -4.45 | 196 | <0.000* | -0.45 to -0.17 |
| of the cookie | 0.08(0.31) | 0.32(0.49) | -4.16 | 196 | <0.000* | -0.36 to -0.13 |
| the stool | 0.33(0.74) | 0.58(0.67) | -2.41 | 196 | 0.017* | -0.44 to -0.04 |
| is overflowing | 0.05(0.33) | 0.22(0.42) | -3.20 | 196 | 0.002* | -0.28 to -0.07 |
| the sink | 0.68(0.91) | 1.17(1.01) | -3.62 | 196 | <0.000* | -0.76 to -0.22 |
| this is | 0.25(0.61) | 0.02(0.14) | 3.68 | 196 | <0.000* | 0.11 to 0.36 |
| cookie out | 0.00(0.00) | 0.11(0.32) | -3.50 | 196 | 0.001* | -0.17 to -0.05 |
| cookie out of | 0.00(0.00) | 0.11(0.32) | -3.50 | 196 | 0.001* | -0.17 to -0.05 |
| a cookie out | 0.00(0.00) | 0.10(0.30) | -3.32 | 196 | 0.001* | -0.16 to -0.04 |
| off the cookie | 0.01(0.10) | 0.14(0.35) | -3.59 | 196 | <0.000* | -0.20 to -0.06 |
*PrADG Probable Alzheimer’s Disease Group (N=99), HEG Healthy Elderly Group (N=99), SD Standard Deviation, df degree of freedom, CI Confidence Interval
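
The values in this table are per-transcript counts of bigrams and trigrams. The following is a minimal sketch of how such counts might be produced, assuming lowercased whitespace tokenisation (the study's actual preprocessing is not described in this record). The function name and the example sentence are hypothetical.

```python
from collections import Counter


def ngram_counts(text, sizes=(2, 3)):
    """Count word n-grams of the given sizes in one transcript."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenisation
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical picture-description utterance, not taken from the dataset.
transcript = "the mother is washing dishes and the sink is overflowing"
counts = ngram_counts(transcript)

for feature in ("the mother", "the sink", "is overflowing", "the mother is"):
    print(feature, counts[feature])
```

Averaging such counts over all transcripts in a group gives the kind of per-group mean and SD reported in the table.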
Multiple logistic regression analysis on confounding and n-gram features from the PrADG and HEG
| Features | β | S.E | Wald | p | OR | 95% CI of OR |
|---|---|---|---|---|---|---|
| Age | -0.152 | 0.040 | 14.56 | <0.000* | 0.86 | 0.79 to 0.93 |
| the window | 1.946 | 0.519 | 14.08 | <0.000* | 7.00 | 2.53 to 19.35 |
| the mother | 1.526 | 0.674 | 5.13 | 0.024* | 4.60 | 1.23 to 17.23 |
| be quiet | 4.263 | 1.299 | 10.77 | 0.001* | 71.00 | 5.57 to 905.46 |
| girl is | 1.905 | 0.541 | 12.40 | 0.000* | 6.72 | 2.33 to 19.39 |
| this is | -2.967 | 1.256 | 5.58 | 0.018* | 0.05 | 0.00 to 0.60 |
| Constant | 7.742 | 2.512 | 9.50 | 0.002* | 2302.57 | - |
*PrADG(N=99); HEG(N=99); SE Standard Error, OR Odds Ratio or Exp(β), CI Confidence Interval
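
A fitted logistic regression turns a feature vector into a probability through the logistic function. The sketch below plugs the tabulated coefficients into p = 1 / (1 + exp(-z)), where z is the constant plus the weighted feature sum. It assumes the first numeric column holds the coefficients; the table does not state which group is coded as the outcome, so the result is only "the probability of the class coded 1". The subject values are hypothetical.

```python
import math

# Coefficients and constant as listed in the table above.
COEFFICIENTS = {
    "Age": -0.152,
    "the window": 1.946,
    "the mother": 1.526,
    "be quiet": 4.263,
    "girl is": 1.905,
    "this is": -2.967,
}
CONSTANT = 7.742


def predicted_probability(features):
    """Logistic model output; features absent from the dict count as zero."""
    z = CONSTANT + sum(COEFFICIENTS[name] * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical subject: 70 years old, says "the window" once, no other n-grams.
subject = {"Age": 70, "the window": 1}
p = predicted_probability(subject)
print(f"p = {p:.3f}")
```

Each additional occurrence of an n-gram multiplies the odds by that feature's OR, which is why a large coefficient such as the one for "be quiet" has such a wide confidence interval when the n-gram is rare.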