Chonghua Xue, Cody Karjadi, Ioannis Ch Paschalidis, Rhoda Au, Vijaya B Kolachalama.
Abstract
BACKGROUND: Identification of reliable, affordable, and easy-to-use strategies for detection of dementia is sorely needed. Digital technologies, such as individual voice recordings, offer an attractive modality to assess cognition, but methods that can automatically analyze such data are not readily available.
Keywords: Dementia; Digital health; Machine learning; Neuropsychological testing; Voice recording
Year: 2021 PMID: 34465384 PMCID: PMC8409004 DOI: 10.1186/s13195-021-00888-3
Source DB: PubMed Journal: Alzheimers Res Ther Impact factor: 8.823
Demographics and participant characteristics. For each participant, digital voice recordings of neuropsychological examinations were collected. A, B, and C show the demographics of the participants with normal cognition, mild cognitive impairment (MCI), and dementia, respectively, at the time of the voice recordings. N is the number of unique participants. Mean age (± standard deviation) is reported at the time of the recordings. Mean MMSE scores (± standard deviation) were computed from the assessment closest to the time of the voice recording. For cognitively normal participants, ApoE data was unavailable for one Generation 1 (Gen 1) participant and eight Generation 2 (Gen 2) participants; MMSE data was not collected for Generation 3 (Gen 3) participants. For MCI participants, ApoE data was unavailable for one Gen 1 participant, six Gen 2 participants, and one New Offspring Spouse Cohort (NOS) participant; MMSE data was also not collected for OmniGen 2 and NOS participants and was not available for one Gen 1 participant. For participants with dementia, ApoE data was unavailable for six Gen 1 participants and three Gen 2 participants; MMSE data was not collected for Gen 3, OmniGen 2, and NOS participants and was not available for one Gen 1 participant.
| Group | Cohort | N | Female | ApoE4+ | Recordings | Age (years) | Mean MMSE |
|---|---|---|---|---|---|---|---|
| A (normal cognition) | Gen 1 | 42 | 28 | 6 | 75 | 90.9 ± 3.0 | 27.5 ± 2.0 |
| A (normal cognition) | Gen 2 | 238 | 117 | 40 | 392 | 73.9 ± 7.8 | 28.1 ± 2.0 |
| A (normal cognition) | Gen 3 | 4 | 1 | 0 | 7 | 60.4 ± 11.5 | NA |
| A (normal cognition) | OmniGen 1 | 7 | 2 | 2 | 9 | 71.4 ± 9.3 | 26.6 ± 2.6 |
| A (normal cognition) | Total | 291 | 148 | 48 | 483 | 76.3 ± 9.8 | 27.9 ± 2.0 |
| B (MCI) | Gen 1 | 64 | 47 | 13 | 85 | 91.4 ± 3.1 | 26.0 ± 2.4 |
| B (MCI) | Gen 2 | 235 | 124 | 67 | 353 | 79.3 ± 6.8 | 27.2 ± 2.2 |
| B (MCI) | Gen 3 | 1 | 0 | 0 | 1 | 60.0 ± 0.0 | NA |
| B (MCI) | OmniGen 1 | 6 | 1 | 1 | 7 | 73.1 ± 6.8 | 25.3 ± 2.3 |
| B (MCI) | OmniGen 2 | 1 | 1 | 1 | 1 | 74.0 ± 0.0 | NA |
| B (MCI) | NOS | 2 | 1 | 0 | 4 | 86.5 ± 5.3 | NA |
| B (MCI) | Total | 309 | 174 | 82 | 451 | 81.5 ± 8.0 | 26.9 ± 2.3 |
| C (dementia) | Gen 1 | 78 | 56 | 16 | 99 | 92.2 ± 3.1 | 21.3 ± 5.7 |
| C (dementia) | Gen 2 | 139 | 84 | 40 | 224 | 82.1 ± 6.7 | 23.3 ± 5.5 |
| C (dementia) | Gen 3 | 1 | 0 | 0 | 1 | 80.0 ± 0.0 | NA |
| C (dementia) | OmniGen 1 | 3 | 1 | 0 | 4 | 71.5 ± 1.7 | 24.8 ± 1.5 |
| C (dementia) | OmniGen 2 | 1 | 1 | 1 | 1 | 77.0 ± 0.0 | NA |
| C (dementia) | NOS | 1 | 1 | 0 | 1 | 80.0 ± 0.0 | NA |
| C (dementia) | Total | 223 | 143 | 57 | 330 | 84.9 ± 7.6 | 22.7 ± 5.6 |
Fig. 1 Time spent on the neuropsychological tests. Boxplots show the time spent by the FHS participants on each neuropsychological test. For each test, boxplots were generated for participants with normal cognition (NC), those with mild cognitive impairment (MCI), and those with dementia (DE); the non-demented (NDE) group combines the NC and MCI individuals. The number of recordings processed to generate each boxplot is also indicated. Pairwise statistical significance was computed between groups (NC vs. MCI, MCI vs. DE, NC vs. DE, and DE vs. NDE) by comparing mean durations with a pairwise t-test. The symbol “*” indicates statistical significance at p < 0.05, “**” at p < 0.01, and “***” at p < 0.001; “n.s.” indicates p > 0.05. Logical Memory (LM) tests marked with a (†) symbol indicate that an alternative story prompt was administered. It is possible that one participant may receive a prompt under each of the LM recall conditions (one recording). Because many neuropsychological tests were administered, we chose a representation scheme that combines colors and hatches. Each colored hatch represents an individual neuropsychological test, and the same scheme is used to aid visualization in subsequent figures.
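The group comparisons and significance symbols described in the caption can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the durations are hypothetical, the choice of Welch's t statistic (unequal variances) is an assumption because the caption only says "pairwise t-test", and the p-value itself is left to a statistics library.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def significance_symbol(p_value: float) -> str:
    """Map a p-value to the symbols used in the caption."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    # The caption defines "n.s." for p > 0.05; p == 0.05 is treated the same here.
    return "n.s."


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical durations for one test, grouped by cognitive status.
durations = {
    "NC": [60.0, 72.0, 65.0, 70.0],
    "MCI": [80.0, 85.0, 78.0, 90.0],
    "DE": [120.0, 110.0, 135.0, 128.0],
}
# NDE pools the NC and MCI recordings, as described in the caption.
durations["NDE"] = durations["NC"] + durations["MCI"]

# The four comparisons listed in the caption.
pairs = list(combinations(["NC", "MCI", "DE"], 2)) + [("DE", "NDE")]
t_stats = {pair: welch_t(durations[pair[0]], durations[pair[1]]) for pair in pairs}
```

A negative statistic for `("NC", "DE")` means the NC group spent less time on the test than the DE group in this toy data.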
Time spent on the neuropsychological tests. Average time spent (± standard deviation) by the FHS participants on each neuropsychological test is shown. For each test, average values (± standard deviation) were computed on participants with normal cognition (NC), those with mild cognitive impairment (MCI), and those who had dementia (DE); the no dementia group (NDE) combined the NC and MCI individuals. All reported time values are in minutes. Logical Memory (LM) tests with a (†) symbol denote that an alternative story prompt was administered for the test. It is possible that one participant may receive a prompt under each of the LM recall conditions (one recording)
| NC | MCI | NDE | DE | |
|---|---|---|---|---|
| 244.4 ± 151.0 | 178.7 ± 106.4 | 231.0 ± 144.4 | 295.5 ± 163.2 | |
| 135.0 ± 28.6 | 116.9 ± 19.5 | 131.6 ± 27.8 | 128.9 ± 38.5 | |
| 123.0 ± 19.3 | ||||
| 219.3 ± 79.2 | 190.8 ± 37.9 | 213.9 ± 73.8 | 236.1 ± 67.9 | |
| 367.6 ± 80.4 | 366.4 ± 65.3 | 367.4 ± 77.3 | 414.3 ± 155.1 | |
| 115.9 ± 31.4 | 100.1 ± 32.3 | 113.0 ± 31.9 | 107.5 ± 36.7 | |
| 109.0 ± 42.5 | 128.2 ± 44.7 | 112.6 ± 43.1 | 132.6 ± 52.6 | |
| 86.3 ± 37.8 | 57.1 ± 20.2 | 80.8 ± 36.8 | 54.0 ± 25.3 | |
| 33.0 ± 7.1 | ||||
| 89.4 ± 18.3 | 103.8 ± 31.2 | 91.9 ± 21.4 | 138.9 ± 62.3 | |
| 159.5 ± 14.8 | ||||
| 140.6 ± 73.2 | 114.8 ± 28.3 | 136.0 ± 67.9 | 79.1 ± 48.2 | |
| 63.5 ± 26.5 | 60.9 ± 27.8 | 63.0 ± 26.5 | 77.5 ± 37.1 | |
| 66.2 ± 23.5 | 85.6 ± 43.7 | 69.5 ± 28.4 | 90.3 ± 48.7 | |
| 75.8 ± 21.6 | 90.0 ± 26.4 | 78.3 ± 22.9 | 126.0 ± 52.2 | |
| 227.8 ± 89.6 | 217.3 ± 79.5 | 225.9 ± 87.3 | 228.0 ± 120.4 | |
| 80.9 ± 31.2 | 83.3 ± 29.1 | 81.3 ± 30.5 | 133.0 ± 82.2 | |
| 325.5 ± 39.0 | 330.0 ± 29.2 | 326.3 ± 37.2 | 336.1 ± 65.4 | |
| 405.9 ± 176.8 | 321.2 ± 93.8 | 390.7 ± 167.4 | 611.1 ± 260.2 | |
| 70.1 ± 33.5 | 54.7 ± 14.0 | 67.0 ± 31.1 | 90.9 ± 44.4 | |
| 115.4 ± 58.3 | 105.3 ± 23.7 | 113.5 ± 53.3 | 199.0 ± 107.1 | |
| 219.2 ± 107.6 | 241.0 ± 117.5 | 223.4 ± 108.5 | 431.2 ± 278.0 | |
| 119.7 ± 36.8 | 115.1 ± 52.3 | 118.8 ± 39.7 | 142.8 ± 70.4 | |
| 398.5 ± 182.1 | 284.1 ± 90.8 | 376.6 ± 173.7 | 424.6 ± 187.4 | |
| 357.3 ± 213.6 | 389.7 ± 145.0 | 360.9 ± 205.1 | 487.3 ± 214.3 | |
| 294.4 ± 106.9 | 323.8 ± 87.6 | 300.3 ± 102.3 | 289.0 ± 130.6 | |
| 392.7 ± 181.2 | 335.8 ± 156.9 | 385.1 ± 176.7 | 372.7 ± 153.6 | |
| 198.9 ± 75.3 | 244.6 ± 101.7 | 206.5 ± 79.6 | 186.9 ± 147.0 | |
| 204.6 ± 76.1 | 204.6 ± 76.1 | 269.5 ± 116.2 | ||
| 125.0 ± 75.7 | 125.0 ± 75.7 | 90.6 ± 78.7 | ||
| 67.4 ± 29.8 | 67.4 ± 29.8 | 91.4 ± 122.7 | ||
| 47.6 ± 40.6 | 51.8 ± 20.5 | 48.5 ± 37.2 | 81.6 ± 71.5 | |
| 44.0 ± 26.9 | 37.9 ± 7.6 | 42.8 ± 24.5 | 75.3 ± 59.3 | |
| 53.1 ± 66.0 | 53.1 ± 66.0 | 187.4 ± 84.2 | ||
| 341.2 ± 80.3 | 341.2 ± 80.3 | 202.5 ± 71.4 |
Fig. 2 Schematics of the deep learning frameworks. A The hierarchical long short-term memory (LSTM) network model, which encodes an entire audio file into a single vector to predict the dementia status of the individual. All LSTM cells within the same row share parameters. Note that the hidden layer dimension is user-defined (e.g., 64 in our approach). B The convolutional neural network, which uses the entire audio file as input to predict the dementia status of the individual. Each convolutional block reduces the input length by a common factor (e.g., 2), while the very top layer aggregates all remaining vectors into one by averaging them.
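The length arithmetic described for the CNN (each block shrinks the sequence by a common factor, then the top layer averages what remains) can be illustrated with a toy sketch. It is not the authors' network: a real convolutional block applies learned filters and nonlinearities, whereas the hypothetical `downsample_block` below simply averages neighbouring frames so that the shape changes are easy to follow.

```python
from statistics import mean


def downsample_block(frames, factor=2):
    """Toy stand-in for one convolutional block: average each group of
    `factor` consecutive frame vectors, shrinking the sequence length."""
    out = []
    for start in range(0, len(frames) - factor + 1, factor):
        group = frames[start:start + factor]
        out.append([mean(column) for column in zip(*group)])
    return out


def encode(frames, n_blocks=3, factor=2):
    """Apply the blocks, then average the remaining vectors into one."""
    for _ in range(n_blocks):
        frames = downsample_block(frames, factor)
    return [mean(column) for column in zip(*frames)]


# 16 hypothetical audio frames with 2 features each.
frames = [[float(i), 1.0] for i in range(16)]

# Track the sequence length after each block: 16 -> 8 -> 4 -> 2.
lengths = [len(frames)]
x = frames
for _ in range(3):
    x = downsample_block(x)
    lengths.append(len(x))

vector = encode(frames)  # a single vector regardless of input length
```

Because the final step averages over whatever length remains, the same model can accept recordings of different durations.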
Fig. 3 Receiver operating characteristic (ROC) and precision-recall (PR) curves of the deep learning models. The long short-term memory (LSTM) network and the convolutional neural network (CNN) models were constructed for two tasks: classifying participants with normal cognition versus those with dementia, and classifying non-demented participants versus those with dementia. For each model, 5-fold cross-validation was performed and the model predictions (mean ± standard deviation) were generated on the test data (see Figure S1), followed by the creation of the ROC and PR curves. Plots A and B show the ROC and PR curves of the LSTM and CNN models for the classification of normal versus demented cases. Plots C and D show the ROC and PR curves of the LSTM and CNN models for the classification of non-demented versus demented cases.
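As a reminder of what the area under an ROC curve measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are hypothetical, and this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs: 1 = dementia, 0 = no dementia.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations use sorting instead.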
Performance of the deep learning models. The long short-term memory (LSTM) network and the convolutional neural network (CNN) models were constructed for two tasks: classifying participants with normal cognition versus those with dementia, and classifying non-demented participants versus those with dementia. For each model, 5-fold cross-validation was performed and the model predictions (mean ± standard deviation) were generated on the test data (see Figure S1). A and B report the performance of the LSTM and CNN models for the classification of participants with normal cognition versus those with dementia. C and D report the performance of the LSTM and CNN models for the classification of participants who are non-demented versus those who have dementia.
| | 0.581 ± 0.039 | 0.578 ± 0.037 | 0.593 ± 0.051 | |
| | 0.642 ± 0.029 | 0.641 ± 0.027 | 0.647 ± 0.027 | |
| | 0.420 ± 0.065 | 0.412 ± 0.067 | 0.442 ± 0.093 | |
| | 0.865 ± 0.022 | 0.859 ± 0.034 | 0.824 ± 0.025 | |
| | 0.844 ± 0.019 | 0.846 ± 0.025 | 0.824 ± 0.010 | |
| | 0.558 ± 0.061 | 0.551 ± 0.062 | 0.575 ± 0.083 | |
| | 0.573 ± 0.046 | 0.569 ± 0.046 | 0.586 ± 0.061 | |
| | 0.294 ± 0.050 | 0.294 ± 0.049 | 0.294 ± 0.046 | |
| | 0.814 ± 0.016 | 0.803 ± 0.029 | 0.805 ± 0.022 | |
| | 0.742 ± 0.017 | 0.737 ± 0.020 | 0.740 ± 0.017 | |
| | 0.666 ± 0.035 | 0.674 ± 0.052 | 0.710 ± 0.021 | |
| | 0.587 ± 0.054 | 0.650 ± 0.035 | 0.698 ± 0.015 | |
| | 0.738 ± 0.118 | 0.740 ± 0.045 | 0.735 ± 0.094 | |
| | 0.300 ± 0.160 | 0.562 ± 0.095 | 0.656 ± 0.038 | |
| | 0.691 ± 0.036 | 0.750 ± 0.025 | 0.792 ± 0.013 | |
| | 0.769 ± 0.028 | 0.738 ± 0.064 | 0.765 ± 0.023 | |
| | 0.623 ± 0.061 | 0.672 ± 0.047 | 0.712 ± 0.019 | |
| | 0.207 ± 0.106 | 0.308 ± 0.077 | 0.389 ± 0.034 | |
| | 0.743 ± 0.038 | 0.801 ± 0.024 | 0.837 ± 0.012 | |
| | 0.640 ± 0.054 | 0.716 ± 0.038 | 0.759 ± 0.019 | |
| | 0.651 ± 0.016 | 0.659 ± 0.022 | 0.648 ± 0.023 | |
| | 0.651 ± 0.016 | 0.659 ± 0.022 | 0.648 ± 0.023 | |
| | 0.576 ± 0.048 | 0.565 ± 0.062 | 0.556 ± 0.059 | |
| | 0.726 ± 0.031 | 0.753 ± 0.024 | 0.740 ± 0.035 | |
| | 0.677 ± 0.016 | 0.694 ± 0.012 | 0.680 ± 0.025 | |
| | 0.621 ± 0.027 | 0.621 ± 0.040 | 0.610 ± 0.038 | |
| | 0.649 ± 0.016 | 0.655 ± 0.024 | 0.644 ± 0.025 | |
| | 0.306 ± 0.031 | 0.324 ± 0.040 | 0.302 ± 0.046 | |
| | 0.685 ± 0.012 | 0.682 ± 0.019 | 0.670 ± 0.025 | |
| | 0.720 ± 0.013 | 0.726 ± 0.009 | 0.711 ± 0.019 | |
| | 0.555 ± 0.022 | 0.624 ± 0.030 | 0.628 ± 0.042 | |
| | 0.555 ± 0.023 | 0.623 ± 0.030 | 0.627 ± 0.042 | |
| | 0.546 ± 0.101 | 0.486 ± 0.076 | 0.457 ± 0.106 | |
| | 0.447 ± 0.188 | 0.701 ± 0.065 | 0.769 ± 0.038 | |
| | 0.543 ± 0.011 | 0.646 ± 0.034 | 0.674 ± 0.053 | |
| | 0.576 ± 0.120 | 0.563 ± 0.063 | 0.560 ± 0.068 | |
| | 0.528 ± 0.035 | 0.619 ± 0.030 | 0.619 ± 0.045 | |
| | 0.128 ± 0.055 | 0.253 ± 0.062 | 0.265 ± 0.085 | |
| | 0.597 ± 0.041 | 0.643 ± 0.033 | 0.655 ± 0.044 | |
| | 0.595 ± 0.043 | 0.663 ± 0.033 | 0.683 ± 0.037 | |
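The "mean ± standard deviation" entries above summarize a metric across the five cross-validation folds. A minimal sketch of that bookkeeping is given below, under stated assumptions: the fold assignment (shuffled, dealt round-robin), the use of the sample standard deviation, and the per-fold accuracies are all hypothetical, and this excerpt does not say whether the original split was made per participant or per recording.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k disjoint test folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def mean_sd(values):
    """Format fold-wise values as 'mean ± SD' with three decimals."""
    return f"{mean(values):.3f} ± {stdev(values):.3f}"


# Each item appears in exactly one test fold.
folds = k_fold_indices(23, k=5)

# Hypothetical accuracy measured on each of the five test folds.
fold_accuracy = [0.60, 0.55, 0.62, 0.58, 0.57]
summary = mean_sd(fold_accuracy)
```

Each fold serves once as the test set while the remaining four are used for training, so every item contributes to exactly one test-set prediction.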
Salient administered fractions derived from the CNN model. The average salient administered fraction (SAF) and standard deviation for true positive (SAF[+]) and true negative (SAF[−]) cases are listed in descending order based on the SAF[+] value. SAF[+] is calculated by summing up the time spent in a given neuropsychological test that intersects with a segment of time that is DE[+] salient and dividing by the total time spent in a given neuropsychological test. SAF[−] is calculated by summing up the time spent in a given neuropsychological test that intersects with a segment of time that is not DE[+] salient and dividing by the total time spent in a given neuropsychological test. The number of samples for SAF[+] and SAF[−] indicate the number of true positive and true negative recordings that contain each neuropsychological test
| Test | SAF[+] | SAF[−] | SAF[+] samples | SAF[−] samples |
|---|---|---|---|---|
| | 0.88 ± 0.26 | 0.32 ± 0.42 | 37 | 32 |
| | 0.83 ± 0.22 | 0.23 ± 0.22 | 51 | 33 |
| | 0.82 ± 0.18 | 0.30 ± 0.29 | 47 | 31 |
| | 0.78 ± 0.37 | 0.16 ± 0.33 | 53 | 33 |
| | 0.76 ± 0.36 | 0.57 ± 0.43 | 30 | 27 |
| | 0.73 ± 0.28 | 0.36 ± 0.30 | 36 | 16 |
| | 0.72 ± 0.41 | 0.33 ± 0.41 | 38 | 32 |
| | 0.71 ± 0.33 | 0.47 ± 0.33 | 11 | 15 |
| | 0.70 ± 0.42 | 0.36 ± 0.43 | 53 | 32 |
| | 0.69 ± 0.33 | 0.22 ± 0.35 | 55 | 25 |
| | 0.68 ± 0.30 | 0.65 ± 0.35 | 34 | 28 |
| | 0.66 ± 0.42 | 0.72 ± 0.40 | 56 | 32 |
| | 0.66 ± 0.34 | 0.39 ± 0.27 | 49 | 31 |
| | 0.66 ± 0.38 | 0.56 ± 0.34 | 50 | 32 |
| | 0.66 ± 0.48 | 0.48 ± 0.51 | 24 | 25 |
| | 0.65 ± 0.44 | 0.74 ± 0.36 | 50 | 32 |
| | 0.64 ± 0.41 | 0.65 ± 0.39 | 19 | 14 |
| | 0.64 ± 0.27 | 0.70 ± 0.33 | 26 | 14 |
| | 0.63 ± 0.45 | 0.49 ± 0.49 | 26 | 26 |
| | 0.60 ± 0.47 | 0.44 ± 0.48 | 41 | 30 |
| | 0.58 ± 0.29 | 0.48 ± 0.26 | 39 | 28 |
| | 0.57 ± 0.42 | 0.87 ± 0.28 | 50 | 32 |
| | 0.55 ± 0.35 | 0.85 ± 0.21 | 48 | 32 |
| | 0.52 ± 0.41 | 0.51 ± 0.44 | 40 | 28 |
| | 0.52 ± 0.47 | 0.77 ± 0.33 | 40 | 30 |
| | 0.44 ± 0.45 | 0.36 ± 0.47 | 41 | 26 |
| | 0.39 ± 0.45 | 0.76 ± 0.37 | 47 | 31 |
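The SAF definition in the table caption reduces to interval arithmetic: the time a test overlaps with DE[+]-salient segments, divided by the total time spent on that test. The sketch below works through one hypothetical recording; the interval values are invented, the function names are illustrative, and how a segment is judged DE[+] salient is not covered here, so the salient segments are taken as given input.

```python
def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def salient_fraction(test_intervals, salient_intervals):
    """Fraction of a test's administered time that intersects the given
    segments. Assumes the segments do not overlap one another."""
    total = sum(end - start for start, end in test_intervals)
    if total <= 0:
        raise ValueError("test has no administered time")
    covered = sum(
        overlap(t, s) for t in test_intervals for s in salient_intervals
    )
    return covered / total


# Hypothetical timings, in seconds from the start of one recording.
test_intervals = [(300.0, 450.0)]               # when the test was administered
salient = [(150.0, 300.0), (375.0, 525.0)]      # DE[+]-salient segments

saf_pos = salient_fraction(test_intervals, salient)  # 75 s of 150 s
# Time not covered by DE[+]-salient segments is the complement
# within this single recording.
saf_neg = 1.0 - saf_pos
```

In the table, SAF[+] is averaged over true positive recordings and SAF[−] over true negative recordings, so the two columns come from different recordings and need not sum to 1.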
Fig. 4 Saliency maps highlighted by the CNN model. A Key that maps the colored hatches to the neuropsychological tests. B Saliency map of a recording (62 min in duration) from a participant with normal cognition (NC) who was classified as NC by the convolutional neural network (CNN) model. C Saliency map of a recording (94 min in duration) from a participant with dementia (DE) who was classified as having dementia by the CNN model. For both B and C, the colormap on the left half corresponds to a neuropsychological test. The color on the right half represents the DE[+] value, ranging from dark blue (low DE[+]) to dark red (high DE[+]). Each DE[+] rectangle represents roughly 2 min and 30 s.