Tingyan Wang, Robin G Qiu, Ming Yu.
Abstract
The number of medical visits differs from one Alzheimer's disease (AD) patient to another, and the time intervals between visits are non-uniform. Although the literature describes many approaches to disease progression modeling, they fail to leverage this time-relevant part of patients' medical records when predicting a disease's future status. This paper investigates how to predict AD progression at a patient's next medical visit by leveraging heterogeneous medical data. The data, provided by the National Alzheimer's Coordinating Center, cover 5432 patients with probable AD from August 31, 2005 to May 25, 2017. Long short-term memory (LSTM) recurrent neural networks (RNN) are adopted. The approach relies on an enhanced "many-to-one" RNN architecture that supports the shift of time steps, so it can handle patients' varying numbers of visits and uneven time intervals. The results show that the proposed approach predicts patients' AD progression at their next visits with over 99% accuracy, significantly outperforming classic baseline methods. This study confirms that RNN can effectively solve the AD progression prediction problem by fully leveraging the inherent temporal and medical patterns derived from patients' historical visits. More promisingly, the approach can be adapted to other chronic disease progression problems.
Year: 2018 PMID: 29907747 PMCID: PMC6003986 DOI: 10.1038/s41598-018-27337-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The distribution of time intervals between two successive medical visits.
Ratios of Global CDR scores that are changed between two consecutive visits.
| Global CDR score | 0 | 0.5 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 0 | 0.5286 | 0.4320 | 0.0384 | 0.0010 | 0 |
| 0.5 | 0.0201 | 0.6304 | 0.3151 | 0.0299 | 0.0045 |
| 1 | 0.0004 | 0.0488 | 0.6134 | 0.2982 | 0.0392 |
| 2 | 0 | 0.0023 | 0.0454 | 0.6165 | 0.3358 |
| 3 | 0 | 0 | 0.0019 | 0.0254 | 0.9727 |
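Ratios of this kind can be obtained by counting, over every pair of consecutive visits, how often one Global CDR score is followed by another, and then normalizing each row so that it sums to 1. The following is a minimal sketch of that computation. The function name `transition_ratios` and the toy input are hypothetical and not taken from the paper; `CDR_LEVELS` is the standard five-level Global CDR scale.

```python
from collections import Counter

CDR_LEVELS = [0, 0.5, 1, 2, 3]  # standard Global CDR scale


def transition_ratios(patients):
    """Row-normalized ratios of score changes between consecutive visits.

    `patients` is a list of per-patient score sequences in visit order.
    Returns {from_score: {to_score: ratio}}; a row with no observed
    transitions is filled with zeros.
    """
    counts = Counter()
    for scores in patients:
        # Pair each visit with the one that follows it.
        for prev, nxt in zip(scores, scores[1:]):
            counts[(prev, nxt)] += 1

    table = {}
    for a in CDR_LEVELS:
        row_total = sum(counts[(a, b)] for b in CDR_LEVELS)
        table[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in CDR_LEVELS
        }
    return table


# Toy data (hypothetical): three patients with 4, 3, and 4 visits.
toy = [[0.5, 0.5, 1, 2], [0.5, 1, 1], [1, 2, 3, 3]]
ratios = transition_ratios(toy)
```

In the toy data, `ratios[0.5][1]` is 2/3 because two of the three transitions that start at 0.5 end at 1.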
Figure 2. The distribution of patients at different stages with respect to their first visits and last visits.
Missing data analysis.
| Ratio of missing data | Number of variables |
|---|---|
| ≤ 5% | 29 |
| > 5% and ≤ 10% | 21 |
| > 10% and ≤ 15% | 4 |
| > 15% and ≤ 20% | 6 |
| > 20% and ≤ 25% | 6 |
| > 25% and ≤ 30% | 13 |
Imputation schemes for missing data.
| Characteristics of variables | Continuous variables | Ordinal variables | Nominal variables |
|---|---|---|---|
| The variable would not change with different visits for a patient | Imputed with the mean value of first visits of all the patients | Imputed with the median value of first visits of all the patients | Imputed with the mode value of first visits of all the patients |
| The variable would change with different visits for a patient | Imputed with the mean value of all the visits of all the patients | Imputed with the median value of all the visits of all the patients | Imputed with the mode value of all the visits of all the patients |
| The variable would change with different visits for a patient and is only related to a specific patient | Imputed with the mean value of other visits of the same patient | Imputed with the median value of other visits of the same patient | Imputed with the mode value of other visits of the same patient |
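The three rows of the table can be read as three choices of the pool from which the fill value is computed, and the three columns as the statistic applied to that pool. The sketch below illustrates this reading for a single variable. The function `impute`, the scheme names (`first_visits`, `all_visits`, `same_patient`), and the sample data are hypothetical. Using `median_low` for ordinal variables, and leaving a value missing when the pool is empty, are assumptions made here rather than details stated in the table.

```python
from statistics import mean, median_low, mode

# One statistic per variable type, following the table's columns.
# median_low keeps the imputed value on the ordinal scale (assumption).
STATS = {"continuous": mean, "ordinal": median_low, "nominal": mode}


def impute(visits_by_patient, kind, scheme):
    """Fill None entries of one variable.

    visits_by_patient: {patient_id: [value or None per visit, in order]}
    kind:   "continuous" | "ordinal" | "nominal"
    scheme: "first_visits" (first visits of all patients),
            "all_visits"   (all visits of all patients),
            "same_patient" (other visits of the same patient)
    """
    stat = STATS[kind]
    if scheme == "first_visits":
        pool = [v[0] for v in visits_by_patient.values()
                if v and v[0] is not None]
    elif scheme == "all_visits":
        pool = [x for v in visits_by_patient.values()
                for x in v if x is not None]
    elif scheme == "same_patient":
        pool = None  # computed per patient below
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    shared_fill = stat(pool) if pool else None

    filled = {}
    for pid, values in visits_by_patient.items():
        fill = shared_fill
        if scheme == "same_patient":
            observed = [x for x in values if x is not None]
            # Stays missing if the patient has no observed value (assumption).
            fill = stat(observed) if observed else None
        filled[pid] = [fill if x is None else x for x in values]
    return filled


# Hypothetical continuous variable recorded over three visits.
weights = {"p1": [70.0, None, 72.0], "p2": [None, 80.0, 82.0]}
```

For example, `impute(weights, "continuous", "same_patient")` fills the gap for `p1` with 71.0, the mean of that patient's other two visits.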
Figure 3. The architecture of the proposed RNN model for AD stage prediction.
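One way to picture a "many-to-one" setup that copes with varying visit counts and uneven intervals is to turn each patient's history into (sequence, target) pairs, where every time step carries the visit's features plus the time gap to the following visit, and the target is the stage at the visit after the last step. The sketch below illustrates that idea only; it is an assumption about how such samples might be built, not the paper's actual preprocessing. The function `many_to_one_samples` and the tuple layout are hypothetical.

```python
def many_to_one_samples(visits, min_history=1):
    """Build (sequence, target_stage) pairs from one patient's visits.

    visits: list of (days_since_first_visit, features, stage), in time order.
    Each sequence step is the visit's features with the interval (in days)
    to the following visit appended; the target is the stage at the visit
    that follows the last step.
    """
    samples = []
    for k in range(min_history, len(visits)):
        steps = []
        for j in range(k):
            gap = visits[j + 1][0] - visits[j][0]  # uneven time interval
            steps.append(list(visits[j][1]) + [gap])
        samples.append((steps, visits[k][2]))
    return samples


# Hypothetical patient with three visits at days 0, 400, and 700.
patient = [(0, [1.0], 0.5), (400, [2.0], 1), (700, [3.0], 1)]
samples = many_to_one_samples(patient)
```

Because sequences of different lengths are produced, a real model would still need padding or per-length batching, which this sketch leaves out.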
Figure 4. An LSTM module in the proposed RNN model.
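For reference, the widely used LSTM formulation with a forget gate is given below. It is the standard textbook form and may differ in detail from the module drawn in the figure. The symbols are assumptions introduced here: $x_t$ is the input at time step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W_\ast$, $U_\ast$, $b_\ast$ the learned weights and biases.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

In a many-to-one arrangement, only the hidden state after the final time step is passed on to the output layer.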
The performance comparison of the proposed models and the baseline models.
| Models | Accuracy | PPIA | SPIA |
|---|---|---|---|
| LSTM with TI | | | |
| LSTM w/o TI | 0.9843 ± 0.0057 | 0.9792 ± 0.0117 | 0.9849 ± 0.0053 |
| LR with average aggregation | | | |
| LR with two most recent visits | 0.6652 ± 0.0162 | 0.5057 ± 0.0409 | 0.6674 ± 0.0163 |
| LR with the most recent visit | 0.6803 ± 0.0243 | 0.5209 ± 0.0370 | 0.6825 ± 0.0243 |
| SVM with average aggregation | | | |
| SVM with two most recent visits | 0.6533 ± 0.0165 | 0.4825 ± 0.0445 | 0.6552 ± 0.0163 |
| SVM with the most recent visit | 0.6746 ± 0.0209 | 0.4931 ± 0.0245 | 0.6757 ± 0.0208 |
| DT with average aggregation | | | |
| DT with two most recent visits | 0.5810 ± 0.0199 | 0.4463 ± 0.0470 | 0.5829 ± 0.0196 |
| DT with the most recent visit | 0.5916 ± 0.0204 | 0.4705 ± 0.0458 | 0.5934 ± 0.0196 |
| RF with average aggregation | | | |
| RF with two most recent visits | 0.6373 ± 0.0181 | 0.4517 ± 0.0430 | 0.6399 ± 0.0179 |
| RF with the most recent visit | 0.6416 ± 0.0183 | 0.4570 ± 0.0422 | 0.6441 ± 0.0186 |
LSTM with TI and LSTM w/o TI are implemented based on the dataset of patients with more than 3 visits. Baseline models with average aggregation are trained with aggregated features derived from the longitudinal data. Baseline models with two most recent visits are trained directly with the information of the (N−1)th visit and the Nth visit among the historical visits. Baseline models with the most recent visit are trained directly with the Nth visit. The LSTM with TI model and all the baseline models are trained with time intervals (TI), while LSTM w/o TI is trained without them. The results presented here are the means and standard deviations over 10-fold cross-validation; the performance of each fold is provided in Supplementary Tables S2–S7.
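The three baseline input constructions described in this note can be sketched as follows, where `history` holds the feature vectors of a patient's N historical visits in time order. The function names are hypothetical, and reading "average aggregation" as an element-wise mean over visits is an assumption based on the name.

```python
def average_aggregation(history):
    """Element-wise mean of the feature vectors over all historical visits."""
    n = len(history)
    return [sum(column) / n for column in zip(*history)]


def two_most_recent(history):
    """Features of the (N-1)th visit followed by those of the Nth visit."""
    return list(history[-2]) + list(history[-1])


def most_recent(history):
    """Features of the Nth (latest) visit only."""
    return list(history[-1])


# Hypothetical history: three visits with two features each.
history = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
```

Each construction yields a fixed-length vector regardless of N, which is what classifiers such as LR, SVM, DT, and RF require, at the cost of discarding part of the visit sequence.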
Results of the LSTM with TI model trained with different combinations of feature categories.
| Models | Accuracy | PPIA | SPIA |
|---|---|---|---|
| Full Model | | | |
| Model without CDR | 0.9557 ± 0.0121 | 0.9470 ± 0.0273 | 0.9565 ± 0.0125 |
| Model without GDS | 0.9900 ± 0.0057 | 0.9832 ± 0.0103 | 0.9906 ± 0.0054 |
| Model without FAQ | 0.9615 ± 0.0140 | 0.9429 ± 0.0262 | 0.9624 ± 0.0143 |
| Model without CDR, GDS | 0.9699 ± 0.0094 | 0.9637 ± 0.0154 | 0.9708 ± 0.0091 |
| Model without CDR, FAQ | 0.7191 ± 0.0282 | 0.6867 ± 0.0471 | 0.7207 ± 0.0278 |
| Model without GDS, FAQ | 0.9689 ± 0.0144 | 0.9620 ± 0.0229 | 0.9698 ± 0.0141 |
| Model without CDR, GDS, FAQ | 0.7148 ± 0.0355 | 0.6868 ± 0.0437 | 0.7147 ± 0.0353 |
Note that the results presented here are the means and standard deviations of the metrics over 10-fold cross-validation; the performance of each fold is provided in Supplementary Tables S8–S14.
Improvement by incorporating a specific feature of CDR/FAQ category compared with the basic model.
| Category | Incorporated Feature | ΔAccuracy | ΔPPIA | ΔSPIA |
|---|---|---|---|---|
| CDR | HOMEHOBB | | | |
| CDR | COMMUN | | | |
| CDR | ORIENT | | | |
| CDR | JUDGMENT | | | |
| CDR | CDRSUM | | | |
| CDR | MEMORY | 0.0905 | 0.1083 | 0.0920 |
| FAQ | GAMES | 0.0844 | 0.1017 | 0.0825 |
| CDR | PERSCARE | 0.0817 | 0.1001 | 0.0820 |
| FAQ | BILLS | 0.0776 | 0.0990 | 0.0756 |
| FAQ | REMDATES | 0.0764 | 0.0929 | 0.0728 |
| FAQ | MEALPREP | 0.0731 | 0.0918 | 0.0723 |
| FAQ | TAXES | 0.0729 | 0.0914 | 0.0717 |
| FAQ | STOVE | 0.0717 | 0.0881 | 0.0679 |
| FAQ | TRAVEL | 0.0686 | 0.0825 | 0.0647 |
| FAQ | SHOPPING | 0.0631 | 0.0773 | 0.0621 |
| FAQ | PAYATTN | 0.0497 | 0.0650 | 0.0548 |
| CDR | COMPORT | 0.0350 | 0.0476 | 0.0329 |
| FAQ | EVENTS | 0.0311 | 0.0472 | 0.0321 |
| CDR | CDRLANG | 0.0225 | 0.0366 | 0.0147 |
Note that the results presented here are the mean differences over the 10-fold cross-validation runs. The means and standard deviations for the 10-fold cross-validation runs are provided in Supplementary Table S15.
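As an illustration of how the "mean ± standard deviation" entries and the Δ columns can be derived from per-fold scores, consider the minimal sketch below. The function names and fold values are hypothetical. Pairing folds one-to-one between the two models, and using the sample standard deviation, are assumptions made here; the source does not say which variant was used.

```python
from statistics import mean, stdev


def summarize(fold_scores):
    """Mean and sample standard deviation of a metric across folds."""
    return mean(fold_scores), stdev(fold_scores)


def mean_improvement(model_folds, basic_folds):
    """Mean of the fold-wise differences between two models.

    With paired folds this equals the difference of the two means.
    """
    return mean([m - b for m, b in zip(model_folds, basic_folds)])


# Hypothetical per-fold accuracies (three folds shown for brevity).
with_feature = [0.80, 0.82, 0.78]
basic = [0.70, 0.73, 0.70]
delta_accuracy = mean_improvement(with_feature, basic)
```

Here `delta_accuracy` comes out at about 0.09, the average of the fold-wise gains 0.10, 0.09, and 0.08.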