Piers Johnson, Luke Vandewater, William Wilson, Paul Maruff, Greg Savage, Petra Graham, Lance S Macaulay, Kathryn A Ellis, Cassandra Szoeke, Ralph N Martins, Christopher C Rowe, Colin L Masters, David Ames, Ping Zhang.
Abstract
BACKGROUND: Assessment of risk and early diagnosis of Alzheimer's disease (AD) are key to preventing the disease or slowing its progression. Previous research on risk factors for AD has typically used statistical comparison tests or stepwise selection with regression models. Outcomes of these methods tend to emphasize single risk factors rather than a combination of risk factors. However, a combination of factors, rather than any one alone, is likely to affect disease development. Genetic algorithms (GA) can be useful and efficient for finding the combination of variables that best achieves an objective (e.g. accuracy of diagnosis), especially when the search space is large, complex or poorly understood, as is the case in predicting AD development.
Year: 2014 PMID: 25521394 PMCID: PMC4290638 DOI: 10.1186/1471-2105-15-S16-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The architecture of the system that combined the GA and LR in our study. The search process involved three principal steps. LR was used with each set of features to make a predictive model for each instance within the possible solutions in GA. The subsequent cycles of the GA search find better solutions that replace less fit solutions found previously. This process is iteratively repeated until the goal solutions are found.
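The coupling described in this caption, where each candidate feature set is scored by a regression model, can be sketched as a fitness function. This is a minimal illustration, not the authors' implementation: it assumes LR means logistic regression, fits it by plain gradient descent, and scores the AUC on the training data itself. The names `fit_logistic`, `auc` and `fitness`, and the learning rate and epoch count, are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent (assumed settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fitness(mask, X, y):
    """Score one GA solution: mask[j] == 1 means feature j is selected."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot predict anything
    Xs = [[row[j] for j in cols] for row in X]
    w, b = fit_logistic(Xs, y)
    scores = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in Xs]
    return auc(scores, y)

# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.8, 2], [0.9, 5], [1.0, 1]]
y = [0, 0, 0, 1, 1, 1]
informative = fitness([1, 0], X, y)
noisy = fitness([0, 1], X, y)
```

In practice the AUC would be estimated on held-out data rather than on the data used for fitting; the sketch omits that step for brevity.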
A complete set of neuropsychological tests used in this study.
| Source | Cognitive test - feature | # | Source | Cognitive test - feature | # |
|---|---|---|---|---|---|
| MMSE | Mini Mental State Exam * | 1 | BNT | No Cue Australian | 20 |
| LM | Logical Memory I * | 2 | | No Cue US | 21 |
| | Logical Memory II * | 3 | WAIS-III | Digit span | 22 |
| | Logical Memory Pass/Fail * | 4 | | Digit symbol-Coding | 23 |
| CVLT-II | Total Learning (List A Trials 1-5) | 5 | | UK Pred Full Score IQ * | 24 |
| | List A T6 Retention | 6 | | US Pred Full Score IQ * | 25 |
| | List A 30 min Delayed Recall | 7 | Stroop | Dots time | 26 |
| | List A Recognition | 8 | | Dots errs * | 27 |
| | List A False Positives | 9 | | Words time | 28 |
| | Total Recognition d' | 10 | | Words errs * | 29 |
| RCFT | Rey Complex Figure Copy | 11 | | Colours time | 30 |
| | Rey Complex Figure Copy time | 12 | | Colours errs * | 31 |
| | Rey Complex Figure 3 min delay | 13 | | C/D | 32 |
| | Rey Complex Figure 30 min delay | 14 | CDT | Clock score * | 33 |
| | Rey Complex Figure Recog | 15 | WTAR | WTAR IQ score * | 34 |
| D-KEFS | Letter Fluency | 16 | CDRSoB | CDR Sum of Boxes * | 35 |
| | Category Fluency | 17 | HADS | Depression | 36 |
| | Category Switching Total | 18 | | Anxiety | 37 |
| | Category Switching (switches) | 19 | | | |
Each test is given a number (column #) that is used to refer to it in the rest of the paper. * Z scores were not available or not applicable.
Figure 2. Diagram representing the GA search. Each solution within a generation is evaluated until the target solution is found. A new generation of the population is produced using selection, crossover, and mutation operators that create new solutions. The "best" solution is returned when the fitness function reaches the target value.
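The loop in this diagram can be sketched as follows. The source names the operators (selection, crossover, mutation) but not their settings, so the tournament selection, elitism, population size, mutation rate and the function name `run_ga` are all assumptions made for illustration.

```python
import random

def run_ga(fitness, n_genes=37, pop_size=20, generations=30,
           mutation_rate=0.02, target=None, seed=0):
    """Minimal GA over bit-string genomes; returns the best solution found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Stop early once the fitness function reaches the target value.
        if target is not None and fitness(best) >= target:
            break
        new_pop = [best[:]]  # elitism (assumed): carry the best forward
        while len(new_pop) < pop_size:
            # Tournament selection (assumed): best of 3 random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Two-point crossover: outer segments from p2, middle from p1.
            a, b = sorted(rng.sample(range(1, n_genes), 2))
            child = p2[:a] + p1[a:b] + p2[b:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best

# Toy fitness (invented): fraction of bits that match a fixed pattern.
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

def toy_fitness(genome):
    return sum(g == t for g, t in zip(genome, pattern)) / len(pattern)

best = run_ga(toy_fitness, n_genes=10, target=1.0)
```

Because the best solution is always copied into the next generation, the best fitness in this sketch never decreases from one generation to the next.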
Figure 3. An example of a 2-point crossover. Two parent solutions exchange segments of their genomes. The swapped segments run from position 1 to point 1, and from point 2 to position 37 of the genomes. Subsequently, the fitness of each solution is assessed, and if the fitness is improved, the new "best" solution is added to the population of solutions.
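A small sketch of the crossover in this caption, with the two outer segments exchanged between parents. Whether the cut points themselves belong to the swapped segments is not stated in the caption, so the slicing convention here (and the name `two_point_crossover`) is an assumption.

```python
def two_point_crossover(parent_a, parent_b, point1, point2):
    """Swap the segments before point1 and from point2 onward.

    point1 and point2 are 0-based slice indices (assumed convention).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < point1 < point2 < len(parent_a):
        raise ValueError("need 0 < point1 < point2 < genome length")
    child_a = parent_b[:point1] + parent_a[point1:point2] + parent_b[point2:]
    child_b = parent_a[:point1] + parent_b[point1:point2] + parent_a[point2:]
    return child_a, child_b

# Two 37-gene parents, one all ones and one all zeros, to make the swap visible.
ones, zeros = [1] * 37, [0] * 37
child_a, child_b = two_point_crossover(ones, zeros, 10, 25)
```

Here `child_a` keeps only the middle 15 genes of the all-ones parent, and `child_b` is its complement.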
Figure 4. Schematic showing a single point mutation. In this example, position 35 was mutated from 1 to 0. The meaning of this transformation is that the corresponding feature (HADS Anxiety) was selected before the mutation and unselected after mutation.
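The mutation in this caption amounts to flipping one bit of the genome. A minimal sketch, assuming 1-based positions as in the caption (the function name `point_mutation` is hypothetical):

```python
def point_mutation(genome, position):
    """Return a copy of genome with the bit at a 1-based position flipped."""
    mutated = list(genome)
    mutated[position - 1] = 1 - mutated[position - 1]
    return mutated

# A 37-gene genome in which only position 35 is selected.
genome = [0] * 37
genome[34] = 1
mutated = point_mutation(genome, 35)  # position 35 goes from 1 to 0
```

The original genome is left unchanged, so the parent solution can stay in the population alongside the mutant.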
GA results, HC to MCI or AD Conversion over 36 months.
| MC_AUC | Number of variables | Variables | MC_AUC | Number of variables | Variables | | |
|---|---|---|---|---|---|---|---|
| 0.89 | 3 | 3;5;18 | 0.89 | 7 | 3;5;6;15;18;21;35 | ||
| 0.90 | 4 | 1;3;5;18 | 0.88 | 7 | 3;5;6;13;15;18;35 | ||
| 0.88 | 4 | 3;5;6;30 | 0.89 | 7 | 3;5;7;18;20;33;35 | ||
| 0.86 | 4 | 3;6;16;33 | 0.88 | 7 | 3;5;6;16;18;20;35 | ||
| 0.89 | 5 | 1;3;5;6;18 | 0.89 | 7 | 3;5;7;15;18;33;35 | ||
| 0.90 | 5 | 1;3;5;18;29 | 0.89 | 7 | 3;5;7;18;20;35 | ||
| 0.89 | 5 | 1;3;5;18;35 | 0.87 | 7 | 3;5;6;13;15;35;37 | ||
| 0.89 | 5 | 1;3;5;18;35 | 0.88 | 7 | 3;5;7;15;18;21;35 | ||
| 0.89 | 5 | 1;3;5;18;28 | 0.89 | 7 | 2;3;5;7;8;18;35 | ||
| 0.89 | 5 | 3;5;18;22;35 | 0.90 | 7 | 3;5;6;16;17;18;35 | ||
| 0.89 | 5 | 3;5;15;18;35 | 0.88 | 7 | 1;3;5;7;18;33;28 | ||
| 0.89 | 5 | 3;5;6;18;33 | 0.88 | 7 | 1;3;5;7;12;19;33 | ||
| 0.87 | 5 | 1;3;6;18;33 | 0.89 | 7 | 2;3;5;9;15;18;35 | ||
| 0.88 | 5 | 1;3;6;18;20 | 0.89 | 8 | 1;2;3;5;6;15;18;35 | ||
| 0.88 | 5 | 2;3;5;30;35 | 0.88 | 8 | 1;3;5;7;15;18;20;35 | ||
| 0.87 | 5 | 1;3;6;9;16 | 0.89 | 8 | 2;3;5;7;18;20;33;35 | ||
| 0.89 | 6 | 3;5;6;18;20;35 | 0.90 | 8 | 1;3;5;12;13;14;17;18 | ||
| 0.89 | 6 | 3;5;6;18;21;35 | 0.88 | 8 | 1;2;3;5;7;15;18;35 | ||
| 0.89 | 6 | 1;3;5;7;8;18 | 0.89 | 8 | 1;3;5;7;8;18;20;33 | ||
| 0.89 | 6 | 3;5;6;15;18;35 | 0.89 | 8 | 1;3;5;13;14;18;33;35 | ||
| 0.89 | 6 | 3;5;6;18;35 | 0.88 | 8 | 3;7;12;18;24;30;35;36 | ||
| 0.89 | 6 | 3;5;6;18;33;28 | 0.91 | 9 | 1;3;5;7;8;13;16;17;18 | ||
| 0.88 | 6 | 2;3;5;6;15;28 | 0.89 | 9 | 2;3;5;7;8;16;18;20;35 | ||
| 0.88 | 6 | 3;5;7;15;19;23 | 0.90 | 9 | 1;3;5;7;8;13;18;33;29 | ||
| 0.90 | 7 | 1;3;5;16;17;18;35 | 0.90 | 11 | 1;3;5;6;8;12;16;17;18;20;35 | ||
MC_AUC is the AUC produced by the LR model for a given set of selected features. The GA found variable sets containing between 3 and 11 variables.
Figure 5. Frequencies of the variables selected by GA, for prediction of conversion from HC to MCI or AD in 36 months. Four features with frequencies greater than 50% (3, 5, 18 and 35; see Table 1 for details) were selected by the GA (in 50 runs).
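Selection frequencies of this kind can be tallied directly from the per-run variable lists. The sketch below uses only the first five variable sets from the GA results table for HC conversion, so its frequencies differ from the 50-run figure; the name `selection_frequencies` is hypothetical.

```python
from collections import Counter

def selection_frequencies(runs):
    """Fraction of runs in which each variable number appears."""
    counts = Counter(v for variables in runs for v in set(variables))
    return {v: c / len(runs) for v, c in counts.items()}

# First five variable sets from the GA results table (HC to MCI or AD).
runs = [
    [3, 5, 18],
    [1, 3, 5, 18],
    [3, 5, 6, 30],
    [3, 6, 16, 33],
    [1, 3, 5, 6, 18],
]
freq = selection_frequencies(runs)
frequent = sorted(v for v, f in freq.items() if f > 0.5)
```

On this five-run subset, `frequent` contains variables 3, 5, 6 and 18.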
AUC values from the LR models with each single variable or with any 2 of the most selected 3 features (variables).
| Variable# | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AUC | 0.69 | 0.71 | 0.80 | 0.59 | 0.77 | 0.78 | 0.78 | 0.54 | 0.75 | 0.70 | 0.61 | 0.58 | 0.59 | 0.58 | 0.57 | 0.54 | 0.54 | 0.71 | 0.71 |
| Variable# | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | |
| AUC | 0.52 | 0.51 | 0.53 | 0.61 | 0.51 | 0.57 | 0.55 | 0.50 | 0.61 | 0.51 | 0.64 | 0.51 | 0.59 | 0.53 | 0.55 | 0.59 | 0.60 | 0.53 | |
| Variable set | 5, 18 | 3, 5 | 3, 18 | ||||||||||||||||
| AUC | 0.80 | 0.85 | 0.82 | ||||||||||||||||
The best value for a single variable was AUC = 0.80 and the best value for two variables was AUC = 0.85, while the best value for a small set of GA-selected features was AUC = 0.90 (see Table 2).
Variable selection from GA for prediction of conversion from HC to MCI/AD, compared with random selections of features.
| # of features | Average AUC (GA selection) | Average AUC (random selection) |
|---|---|---|
| 3 | 0.89 | 0.69 |
| 4 | 0.88 | 0.71 |
| 5 | 0.89 | 0.74 |
| 6 | 0.89 | 0.75 |
| 7 | 0.89 | 0.77 |
| 8 | 0.89 | 0.77 |
| 9 | 0.90 | 0.77 |
| 11 | 0.90 | 0.79 |
The best performance is associated with sets comprising 3 to 11 features and can be classified as good to excellent (AUC≈0.9). Random selection of features resulted in poor to borderline predictions (AUC≤0.79).
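A random-selection baseline of this kind can be sketched as the average score over randomly drawn feature subsets of a fixed size. The source does not say how many random subsets were drawn, so `n_trials`, the seed and the function name `random_subset_baseline` are assumptions, and `score` stands in for whatever model-based AUC is being compared.

```python
import random
from statistics import mean

def random_subset_baseline(score, n_features=37, k=5, n_trials=100, seed=0):
    """Average score over n_trials random subsets of k variable numbers."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # Variable numbers are 1-based, matching the test table.
        subset = sorted(rng.sample(range(1, n_features + 1), k))
        results.append(score(subset))
    return mean(results)

# Toy score (invented): 1.0 if variable 3 is in the subset, else 0.0.
def toy_score(subset):
    return 1.0 if 3 in subset else 0.0

baseline = random_subset_baseline(toy_score, k=5)
full_set = random_subset_baseline(toy_score, k=37)
```

With all 37 variables drawn, every subset contains variable 3, so the toy score averages to 1.0; smaller subsets give a value between 0 and 1.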
GA results, MCI to AD Conversion over 36 months.
| MC_AUC | Number of variables | Variables | MC_AUC | Number of variables | Variables | | |
|---|---|---|---|---|---|---|---|
| 0.86 | 4 | 10;19;24;31 | 0.85 | 6 | 10;15;19;31;34;35 | ||
| 0.82 | 5 | 1;8;10;19;24 | 0.86 | 6 | 1;10;15;19;34;31 | ||
| 0.83 | 5 | 10;18;23;25;31 | 0.86 | 6 | 1;10;15;19;25;31 | ||
| 0.83 | 5 | 1;10;14;19;23 | 0.86 | 6 | 9;15;19;24;31;35 | ||
| 0.83 | 5 | 5;9;24;23;35 | 0.86 | 6 | 9;10;15;18;23;31 | ||
| 0.84 | 5 | 10;19;21;25;31 | 0.86 | 6 | 10;15;19;24;31;35 | ||
| 0.84 | 5 | 10;16;19;20;35 | 0.87 | 6 | 1;10;15;19;24;31 | ||
| 0.85 | 5 | 10;16;19;23;35 | 0.84 | 7 | 8;10;18;25;31;33;34 | ||
| 0.85 | 5 | 10;15;16;18;31 | 0.85 | 7 | 10;16;19;23;25;31;34 | ||
| 0.85 | 5 | 10;19;20;25;31 | 0.85 | 7 | 1;10;15;19;24;31;33 | ||
| 0.85 | 5 | 9;23;25;31;35 | 0.85 | 7 | 1;8;10;15;19;25;31 | ||
| 0.85 | 5 | 10;15;19;31;35 | 0.86 | 7 | 9;10;15;18;24;31;35 | ||
| 0.86 | 5 | 10;19;24;31;35 | 0.86 | 7 | 1;10;15;19;31;34;35 | ||
| 0.86 | 5 | 10;19;24;31;35 | 0.86 | 7 | 1;8;10;15;18;25;31 | ||
| 0.83 | 6 | 5;10;15;19;31;35 | 0.86 | 7 | 1;10;15;19;31;34;35 | ||
| 0.84 | 6 | 1;3;10;19;25;31 | 0.86 | 7 | 1;10;15;19;24;31;35 | ||
| 0.84 | 6 | 1;10;19;24;30;31 | 0.87 | 7 | 1;10;15;19;31;34;35 | ||
| 0.84 | 6 | 1;15;19;20;34;35 | 0.87 | 7 | 1;10;15;19;25;31;35 | ||
| 0.84 | 6 | 10;19;23;24;31;35 | 0.87 | 7 | 1;10;15;19;25;31;35 | ||
| 0.84 | 6 | 3;10;19;23;31;35 | 0.87 | 7 | 1;10;15;19;25;31;35 | ||
| 0.85 | 6 | 1;10;19;23;31;35 | 0.87 | 7 | 10;15;16;18;20;31;35 | ||
| 0.85 | 6 | 10;16;19;24;31;35 | 0.87 | 7 | 1;10;15;19;24;31;35 | ||
| 0.85 | 6 | 9;15;18;23;31;35 | 0.85 | 8 | 3;10;15;16;19;21;25;35 | ||
| 0.85 | 6 | 8;10;15;18;25;31 | 0.86 | 8 | 1;10;15;19;24;31;35;36 | ||
| 0.85 | 6 | 1;10;15;19;23;31 | 0.87 | 8 | 1;10;15;19;24;31;32;35 | ||
The MC_AUC is the accuracy of predictions measured by the AUC value. The best results involve feature sets with 4-8 variables, while longer solutions (more variables in the models) were rejected by the GA selection criteria (worse performance).
Figure 6. Frequencies of variables selected by GA, for prediction of conversion from MCI to AD in 36 months. Nine features occurred with frequencies above 20%. The pattern of frequencies is dominated by five features, each present in more than 60% of feature sets.
Variable selection from GA for prediction of conversion from MCI to AD, compared with random selections of features.
| # of features | Average AUC (GA selection) | Average AUC (random selection) |
|---|---|---|
| 4 | 0.86 | 0.67 |
| 5 | 0.85 | 0.67 |
| 6 | 0.85 | 0.68 |
| 7 | 0.86 | 0.69 |
| 8 | 0.86 | 0.69 |
The best performance is associated with feature sets of lengths 7 and 8, which can be classified as good performance (0.9>AUC>0.8). Random selection of features resulted in poor prediction (AUC<0.7).
Results from the models selected by GA and from the stepwise algorithm.
| Case | Size | GA variables | GA AUC | Stepwise variables | Stepwise AUC | P value |
|---|---|---|---|---|---|---|
| HC | 3 | 3;5;18 | 0.89 | 3;5;19 | 0.88 | 0.366 |
| | 4 | 1;3;5;18 | 0.90 | 3;5;8;18 | 0.89 | |
| | 5 | 1;3;5;18;29 | 0.90 | 3;5;8;17;18 | 0.89 | |
| | 6 | 1;3;5;7;8;18 | 0.89 | 3;5;8;16;17;18 | 0.90 | |
| | 7 | 3;5;6;16;17;18;35 | 0.90 | 3;5;8;9;16;17;18 | 0.90 | |
| | 8 | 1;3;5;12;13;14;17;18;35 | 0.90 | 3;5;8;9;16;17;18;33 | 0.91 | |
| | 9 | 1;3;5;7;8;13;16;17;18 | 0.91 | 3;5;8;9;16;17;18;33;25 | 0.90 | |
| | 11 | 1;3;5;6;8;12;16;17;18;20;35 | 0.90 | 1;3;5;8;9;16;17;18;25;33;34 | 0.91 | |
| MCI | 4 | 10;19;24;31 | 0.86 | 10;16;19;35 | 0.83 | 0.002 |
| | 5 | 10;19;24;31;35 | 0.86 | 10;16;19;27;35 | 0.80 | |
| | 6 | 1;10;15;19;24;31 | 0.87 | 10;16;19;27;28;35 | 0.80 | |
| | 7 | 1;10;15;19;24;31;35 | 0.87 | 10;16;19;24;27;28;35 | 0.79 | |
| | 8 | 1;10;15;19;24;31;32;35 | 0.87 | 10;16;18;19;24;27;28;35 | 0.77 | |
No significant difference was observed between the GA and the stepwise algorithm for prediction of progression from HC to MCI/AD, while the GA was superior in predicting progression from MCI to AD.
P values of each variable in the different models.
| Conversion from HC to MCI/AD | Conversion from MCI to AD | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| V1 | 0.07 | V1 | 0.170 | ||||||
| V3 | V10 | 0.056 | |||||||
| V5 | V15 | ||||||||
| V7 | V16 | 0.117 | |||||||
| V8 | 0.251 | V18 | 0.245 | ||||||
| V9 | V19 | ||||||||
| V13 | 0.139 | V24 | 0.124 | ||||||
| V16 | 0.109 | 0.087 | V27 | 0.284 | 0.233 | ||||
| V17 | V28 | 0.092 | |||||||
| V18 | V31 | 0.142 | 0.154 | ||||||
| V25 | 0.828 | V32 | 0.549 | ||||||
| V33 | 0.063 | V35 | 0.104 | 0.181 | |||||
The results provide statistical support for our finding that variables 3, 5 and 18 dominate prediction of progression of HC to MCI/AD. The prediction of progression from MCI to AD consistently showed the importance of variable 19, and also of variables 10 and 35. These results indicate that for conversion of MCI to AD, the combinations of variables are more important than the contribution of individual variables.
Mean values of each variable for the different clinical groups.
| Variable | HC | MCI | AD | HC convert to MCI/AD: group means | | P value | MCI convert to AD: group means | | P value |
|---|---|---|---|---|---|---|---|---|---|
| v1 | 28.84 | 26.20 | 19.10 | 28.96 | 28.23 | <0.001 | 27.37 | 25.79 | 0.002 |
| v2 | 12.90 | 6.51 | 3.24 | 13.24 | 9.84 | <0.001 | 7.53 | 5.91 | 0.034 |
| v3 | 11.38 | 3.79 | 1.02 | 11.80 | 7.10 | <0.001 | 5.03 | 3.04 | 0.012 |
| v4 | 0.92 | 0.31 | 0.09 | 0.07 | 0.27 | <0.001 | 0.60 | 0.70 | 0.498 |
| v5 | 60.55 | 38.30 | 26.56 | 61.35 | 51.23 | <0.001 | 41.07 | 35.40 | 0.001 |
| v6 | 0.87 | -1.37 | -2.21 | 0.95 | -0.03 | <0.001 | -1.18 | -1.81 | 0.001 |
| v7 | 0.79 | -1.62 | -2.53 | 0.89 | -0.05 | <0.001 | -1.38 | -2.02 | 0.002 |
| v8 | 0.09 | -1.16 | -1.90 | 0.12 | 0.21 | 0.269 | -0.98 | -1.22 | 0.247 |
| v9 | -0.21 | 1.16 | 2.14 | -0.26 | 0.58 | <0.001 | 0.72 | 1.65 | 0.001 |
| v10 | 0.46 | -1.21 | -2.06 | 0.52 | -0.10 | <0.001 | -0.70 | -1.52 | <0.001 |
| v11 | -0.54 | -1.49 | -3.23 | -0.49 | -0.84 | 0.052 | -1.42 | -1.61 | 0.364 |
| v12 | -1.01 | -0.78 | -0.49 | -1.00 | -1.00 | 0.481 | -0.76 | -0.76 | 0.488 |
| v13 | 0.48 | -0.82 | -1.92 | 0.50 | 0.06 | 0.019 | -0.52 | -1.13 | 0.015 |
| v14 | 0.53 | -1.02 | -2.15 | 0.56 | 0.06 | 0.016 | -0.58 | -1.39 | 0.010 |
| v15 | 0.31 | -1.15 | -2.91 | 0.33 | 0.34 | 0.490 | -0.36 | -1.88 | <0.001 |
| v16 | 12.04 | 10.04 | 7.32 | 12.17 | 11.06 | 0.066 | 9.23 | 9.57 | 0.338 |
| v17 | 12.38 | 9.03 | 5.31 | 12.50 | 11.87 | 0.120 | 9.23 | 8.83 | 0.305 |
| v18 | 12.13 | 8.25 | 4.46 | 12.33 | 9.71 | <0.001 | 8.87 | 7.57 | 0.054 |
| v19 | 12.15 | 8.60 | 4.89 | 12.34 | 9.87 | 0.001 | 9.33 | 7.59 | 0.015 |
| v20 | 0.74 | 0.18 | -1.12 | 0.78 | 0.72 | 0.329 | 0.44 | 0.03 | 0.047 |
| v21 | 0.72 | 0.27 | -0.82 | 0.76 | 0.66 | 0.260 | 0.50 | 0.27 | 0.206 |
| v22 | 12.00 | 11.11 | 9.10 | 12.11 | 12.06 | 0.465 | 10.73 | 11.09 | 0.285 |
| v23 | 11.68 | 9.85 | 6.53 | 11.76 | 10.65 | 0.019 | 10.13 | 9.33 | 0.131 |
| v24 | 108.33 | 105.92 | 100.84 | 108.46 | 107.32 | 0.227 | 103.47 | 107.53 | 0.072 |
| v25 | 111.62 | 108.91 | 104.22 | 111.82 | 110.32 | 0.132 | 106.60 | 110.32 | 0.072 |
| v26 | -0.05 | 0.48 | 2.10 | -0.03 | 0.08 | 0.280 | 0.63 | 0.38 | 0.237 |
| v27 | 0.02 | 0.08 | 0.28 | 0.03 | 0.06 | 0.277 | 0.26 | 0.04 | 0.145 |
| v28 | 0.06 | 0.94 | 4.57 | 0.02 | 0.64 | 0.035 | 0.90 | 1.09 | 0.362 |
| v29 | 0.08 | 0.17 | 0.64 | 0.07 | 0.23 | 0.147 | 0.19 | 0.29 | 0.254 |
| v30 | -0.34 | 0.35 | 1.83 | -0.36 | 0.13 | 0.004 | 0.31 | 0.39 | 0.424 |
| v31 | 0.73 | 1.55 | 3.08 | 0.67 | 1.06 | 0.105 | 1.00 | 1.76 | 0.074 |
| v32 | -0.32 | 0.05 | 0.57 | -0.34 | -0.02 | 0.046 | -0.10 | 0.02 | 0.299 |
| v33 | 9.75 | 9.31 | 7.22 | 9.79 | 9.90 | 0.083 | 9.60 | 9.33 | 0.117 |
| v34 | 43.12 | 40.48 | 36.32 | 43.31 | 41.84 | 0.111 | 38.40 | 41.79 | 0.076 |
| v35 | 0.04 | 1.21 | 5.74 | 0.03 | 0.15 | 0.025 | 0.73 | 1.37 | <0.001 |
| v36 | 2.62 | 3.71 | 4.06 | 2.56 | 3.07 | 0.116 | 3.90 | 3.24 | 0.148 |
| v37 | 4.33 | 4.94 | 4.98 | 4.28 | 4.20 | 0.428 | 4.97 | 4.52 | 0.238 |
Columns HC, MCI and AD are the mean values of each variable (cognitive test). The other columns are the mean values for the groups of converters and non-converters from HC or from MCI, together with the P value from a t-test comparing the two groups (a chi-square test for variable 4, a dichotomous variable with values Pass and Fail).