Development of a Late-Life Dementia Prediction Index with Supervised Machine Learning in the Population-Based CAIDE Study.

Timo Pekkala, Anette Hall, Jyrki Lötjönen, Jussi Mattila, Hilkka Soininen, Tiia Ngandu, Tiina Laatikainen, Miia Kivipelto, Alina Solomon.

Abstract

BACKGROUND AND OBJECTIVE: This study aimed to develop a late-life dementia prediction model using a novel validated supervised machine learning method, the Disease State Index (DSI), in the Finnish population-based CAIDE study.
METHODS: The CAIDE study was based on previous population-based midlife surveys. CAIDE participants were re-examined twice in late-life, and the first late-life re-examination was used as baseline for the present study. The main study population included 709 cognitively normal subjects at first re-examination who returned to the second re-examination up to 10 years later (incident dementia n = 39). An extended population (n = 1009, incident dementia 151) included non-participants/non-survivors (national registers data). DSI was used to develop a dementia index based on first re-examination assessments. Performance in predicting dementia was assessed as area under the ROC curve (AUC).
RESULTS: AUCs for DSI were 0.79 and 0.75 for main and extended populations. Included predictors were cognition, vascular factors, age, subjective memory complaints, and APOE genotype.
CONCLUSION: The supervised machine learning method performed well in identifying comprehensive profiles for predicting dementia development up to 10 years later. DSI could thus be useful for identifying individuals who are most at risk and may benefit from dementia prevention interventions.

Keywords:  Computer-assisted decision making; dementia; prediction; prevention; supervised machine learning

Year:  2017        PMID: 27802228      PMCID: PMC5147511          DOI: 10.3233/JAD-160560

Source DB:  PubMed          Journal:  J Alzheimers Dis        ISSN: 1387-2877            Impact factor:   4.472


INTRODUCTION

Dementia prevention is a high public health priority. With many reported modifiable risk factors [1], and several ongoing large multimodal prevention trials [2, 3], the interest in dementia prediction models has grown during the past years. Similar to risk scores for cardiovascular disease [4], dementia risk scores could be used to identify at-risk individuals who would benefit most from preventive interventions. Dementia risk profiling could additionally facilitate the tailoring of preventive interventions to target the most relevant risk factors for a specific individual or group. Several dementia prediction models have been reported [5, 6]. Model development has been based mainly on a data analytical approach (logistic or Cox proportional hazards regression analyses), and in one case on an Evidence-Based Medicine approach [5, 7]. The increasing number and complexity of factors and biomarkers related to dementia risk, and limitations in visualizing and interpreting individual risk profiles represent major challenges for such methods of developing dementia prediction models. One of the few validated dementia risk scores [8, 9] has already been used to select at-risk elderly from the general population participating in a successful prevention trial [2], and is available for use with both pen-and-paper and computer-based technology (mobile app, online tool) [10]. The usefulness of computerized dementia prediction tools for prevention-related decision-making is only starting to be explored.
As comprehensive online prevention research resources and e-Health solutions are starting to be developed for both health care professionals and general public (e.g., Brain Health Registry, multinational data discovery and sharing platforms, internet-based prevention trials [11], clinical decision support systems integratable with electronic health records [12]), it is increasingly important to find suitable methods for developing, updating, and easily visualizing and interpreting complex dementia risk profiles. The Disease State Index (DSI) is a supervised machine learning method designed for practical implementation as a clinical decision support system [12]. DSI has been extensively tested and shown to perform well in the context of improving early diagnosis of Alzheimer’s disease and differential diagnosis of neurodegenerative diseases [12-20]. However, the use of DSI in a public health/dementia prevention context has so far not been investigated, i.e., predicting dementia in a general population without cognitive impairment. Compared to previously used methods for developing dementia risk scores [5], the main strengths of DSI are its ability to deal with larger amounts of heterogeneous data, to handle missing data well, and to use unprocessed data (i.e., without any pre-specified cut-offs for clinical or biomarker variables). In addition, DSI is accompanied by the Disease State Fingerprint (DSF), a method for presenting DSI data in an easily and quickly interpretable visual form. The present study aims to develop a late-life dementia prediction model using DSI in the longitudinal population-based CAIDE study.

MATERIALS AND METHODS

The CAIDE study

The CAIDE study has been previously described in detail [21-23]. In brief, participants were first evaluated at midlife (1972, 1977, 1982, or 1987) in cardiovascular surveys. A random sample of 2,000 individuals aged 65–79 at the end of 1997, and living in or close to Kuopio and Joensuu regions in Eastern Finland were invited for a first late-life re-examination in 1998 (Fig. 1). Altogether 1,449 (72.5%) individuals participated. A second late-life re-examination was conducted in 2005–2008. Of the initial 2,000 persons, 1,426 were still alive and living in the region in the beginning of 2005, and 909 (63.7%) participated. Mean age (SD) was 50.6 (6.0) years at midlife, 71.3 (4.0) years at the first re-examination, and 78.6 (3.7) years at the second re-examination. The CAIDE study was approved by the local ethics committee of Kuopio University Hospital and written informed consent was obtained from all participants.
Fig. 1. Formation of the study populations.

In both late-life re-examinations, cognition was assessed using a three-step protocol (screening, clinical, and differential diagnostic phases). In 1998, participants with ≤24 points on the Mini-Mental State Examination (MMSE) [24] at screening were referred for further evaluations. In 2005–2008, subjects with ≤24 points or a decline of ≥3 points on MMSE, <70% delayed recall in the CERAD word list [25], or with informant concerns about the participant’s cognition were referred for further evaluations. In both re-examinations, the clinical phase included detailed medical and neuropsychological assessments, and the differential diagnostic phase included brain imaging (MRI/CT), blood tests, and if needed cerebrospinal fluid analysis. A review board including the study physician, the study neuropsychologist, a senior neuropsychologist, and a senior neurologist ascertained the primary diagnosis based on all available information. Dementia and mild cognitive impairment (MCI) diagnoses were made according to established criteria [26-28].

Design of the present study

The present study focused on CAIDE participants without dementia or MCI in 1998 (first late-life re-examination, used here as baseline). The main study population included 709 individuals who also participated in the 2005–2008 re-examination (39 diagnosed with dementia). Mean follow-up (SD) was 8.3 (1.0) years. To account for non-participants/non-survivors in 2005–2008, an extended study population (n = 1,009) was defined using additional data on dementia diagnoses until the end of 2008 from the Hospital Discharge Register, Drug Reimbursement Register and Causes of Death Register [22]. Dementia cases in the extended population (n = 151) were defined according to CAIDE or register diagnoses (CAIDE diagnoses had priority, except when registers indicated dementia diagnoses after the second re-examination and before the end of 2008). Mean follow-up (SD) was 9.0 (1.4) years, and mean time (SD) to dementia diagnosis was 7.1 (1.9) years. Non-participants in 2005–2008 who had died without a recorded dementia diagnosis (n = 244) could not be classified as cases or controls and were excluded. Additionally 13 subjects without cognitive impairment in 1998 who had a dementia diagnosis in any register before the end of 2000 were excluded (they were considered too close to dementia onset).

Factors included in prediction models

Survey methods were carefully standardized and complied with international recommendations [29]. Cognitive performance in 1998 was included in prediction models. Six cognitive measures were assessed as previously described [30]: global cognition (MMSE), episodic memory (mean number of recalled words from three 10-word lists), verbal expression (one-minute animal naming test), psychomotor speed (mean of normalized scores from Letter Digit Substitution and bimanual Purdue Pegboard tests), executive functioning (time difference between the color word interference and naming tasks in the Stroop test), and prospective memory (reminding the investigator to make a phone call at the end of the testing session; score 1–4 from not remembering to remembering without reminders). Vascular factors (blood pressure (BP), body mass index (BMI), waist-hip ratio, total cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides) were assessed at each examination. Assessments from 1998 were included in the basic model. Changes in BP, BMI, and total cholesterol from midlife to the first re-examination in 1998 were included in an additional model. Diagnoses of stroke, transient ischemic attack, myocardial infarction, coronary heart disease, atrial fibrillation, heart failure, or diabetes (Hospital Discharge Register) were combined into a dichotomous comorbidity variable. Other assessments from 1998 used in the present study included data from a self-administered questionnaire on sociodemographic characteristics, medical history and health-related behavior, e.g., leisure-time physical activity, alcohol use, smoking, self-rated health and fitness, feelings of hopelessness [31], Beck Depression Inventory [32], and Subjective Memory Questionnaire (SMQ) [33]. Apolipoprotein E (APOE) genotypes were assessed from blood leucocytes using polymerase chain reaction and HhaI digestion [34].
APOE was modeled as a dichotomous variable (ɛ4 allele carrier/non-carrier), and also as an ordered variable (genotype 23 < 24 and 33 < 34 < 44) [35, 36].

Disease state index and disease state fingerprint

DSI has been previously described in detail [12, 13]. In brief, DSI is a validated supervised machine learning method that provides numeric index values ranging from 0 to 1. The DSI value is computed by comparing an individual to a previously known population (training data). The DSI value can be interpreted as the share of data corresponding to a subsequent dementia profile. DSI value 0 corresponds to an ideal control, and 1 to an ideal subsequent dementia case. Higher DSI values thus denote greater profile similarity to individuals known to subsequently develop dementia in the training population. DSI values are computed in three steps. First, each measurement is compared with the training data using a monotonically increasing fitness function that provides a likelihood of the measured factor belonging to an individual who will develop dementia. The fitness as a function of measurement value x is defined as f(x) = FN(x) / (FN(x) + FP(x)), where FN(x) is the false negative error rate and FP(x) the false positive error rate in the training data, when using x as the classification threshold. Second, the relevance of each measurement is calculated, indicating how well the measurement can discriminate between individuals who will develop dementia and those who will not. Relevance is computed as relevance = sensitivity + specificity - 1, where sensitivity and specificity are obtained by classifying the diagnosed population. Third, fitness and relevance values are combined into a composite factor group DSI value using a weighted average, where the fitness values are weighted according to their relevance: DSI = Σ(relevance_i × fitness_i) / Σ(relevance_i), with both sums taken over the measurements i in the group. The process of evaluating fitness and relevance and combining measurements into a composite group DSI is repeated recursively until an overall DSI value from all available data is obtained for the individual.
DSI can process heterogeneous data, and the measured factors/biomarkers are structured into groups, e.g., different cognitive tests into a Cognition group or vascular factors into a Vascular group. A composite DSI value is calculated for each group based on the included individual factors. Grouping is thus useful for assessing the combined effect of conceptually related measurements, and it has other effects such as filtering out noise at group level, and ensuring that strongly correlated factors are not added into the model multiple times. Missing data does not affect model building as long as there is enough data for each factor to give a reliable distribution. The DSF visualization gives a comprehensive overview of an individual’s predictive profile [13], showing which factors are most relevant and to what extent they correspond to a subject who will develop dementia. An example with explanations is shown in the Supplementary Material.
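The three computation steps can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes the fitness function has the form f(x) = FN(x) / (FN(x) + FP(x)) reported for DSI [12, 13], that higher measurement values point toward dementia (a measure where lower is worse would be negated first), and that relevance is obtained by labelling a training subject a case when fitness exceeds 0.5. All function names and the toy values are hypothetical.

```python
from bisect import bisect_left


def fitness(x, controls, cases):
    """f(x) = FN(x) / (FN(x) + FP(x)), using x as the classification threshold."""
    controls, cases = sorted(controls), sorted(cases)
    fn = bisect_left(cases, x) / len(cases)  # share of cases below x
    fp = (len(controls) - bisect_left(controls, x)) / len(controls)  # controls at or above x
    return 0.5 if fn + fp == 0 else fn / (fn + fp)


def relevance(controls, cases):
    """sensitivity + specificity - 1, floored at 0 (assumed rule: fitness > 0.5 means case)."""
    sens = sum(fitness(v, controls, cases) > 0.5 for v in cases) / len(cases)
    spec = sum(fitness(v, controls, cases) <= 0.5 for v in controls) / len(controls)
    return max(0.0, sens + spec - 1)


def composite_dsi(measurements, training):
    """Relevance-weighted average of fitness values; missing (None) factors are skipped."""
    num = den = 0.0
    for name, value in measurements.items():
        if value is None:
            continue
        controls, cases = training[name]
        rel = relevance(controls, cases)
        num += rel * fitness(value, controls, cases)
        den += rel
    return num / den if den > 0 else 0.5


# Toy training data as (controls, cases); hypothetical values, not CAIDE data.
training = {
    "age": ([66, 68, 69, 70, 71, 72], [70, 72, 74, 75, 76]),
    "stroop_seconds": ([20, 25, 30, 35, 40, 45], [35, 45, 50, 55, 60]),
}
person = {"age": 75, "stroop_seconds": None}  # one factor is missing
dsi_value = composite_dsi(person, training)
```

In a grouped model the same weighted average would be applied again one level up, with each group's composite value treated as a new measurement whose relevance is estimated from the training population.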

Data analysis

Differences between control and dementia groups were determined with the Mann-Whitney U test for continuous or ordinal variables, and the χ2 test for other categorical variables. Significance level was set at p < 0.05. Only factors significantly different between control and dementia groups were pre-selected into the DSI model. Additional p-value significance thresholds for selecting factors into the model were also tested to assess effects on predictive performance. Performance of DSI in predicting dementia was evaluated using a stratified cross-validation procedure, with 5-fold cross-validation repeated 50 times (50×5 folds). The performance of DSI was measured as the area under the receiver operating characteristic curve (AUC), by averaging AUCs from individual folds. DSI classification results were validated by comparison with a commonly used machine learning model, support vector machine (SVM), using the same data. Analyses were conducted using Matlab R2014a.
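The evaluation procedure (stratified 5-fold cross-validation repeated 50 times, with fold AUCs averaged) can be sketched as follows. This is an illustrative Python version and not the Matlab code used in the study. The `fit` callback is a hypothetical stand-in for any model-fitting routine that returns a scoring function, and the toy data are invented.

```python
import random


def auc(scores, labels):
    """AUC as the share of case/control pairs ranked correctly (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split indices into k folds, dealing out each class separately."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_auc(x, labels, fit, repeats=50, k=5, seed=0):
    """Mean AUC over repeats * k held-out folds."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            model = fit([x[i] for i in train_idx], [labels[i] for i in train_idx])
            scores = [model(x[i]) for i in test_idx]
            aucs.append(auc(scores, [labels[i] for i in test_idx]))
    return sum(aucs) / len(aucs)


# Toy usage: a hypothetical "model" that returns the raw value as its score.
x = list(range(20))
y = [0] * 10 + [1] * 10
mean_auc = repeated_cv_auc(x, y, fit=lambda train_x, train_y: (lambda v: v))
```

Stratification matters here because cases are rare (39 of 709 in the main population): it keeps roughly eight cases in every fold, so each fold AUC is defined. The sketch assumes each class has at least k members.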

RESULTS

Population characteristics

Population characteristics in 1998 by dementia status until the end of 2008 are shown in Table 1. In the main study population, individuals with subsequent dementia were older, had significantly poorer performance on four of the six cognitive tests, had lower systolic blood pressure (SBP) and diastolic blood pressure (DBP), higher frequency of cardio/cerebrovascular comorbidity and the APOE ɛ4 allele, and more pronounced subjective memory complaints (total SMQ score and four items about forgetting phone numbers, clothing size, names of actors, and forgetting what to say in mid-sentence). SBP, DBP, and BMI decreased more between midlife and 1998 in subjects with subsequent dementia compared with controls.
Table 1. Characteristics of the study populations

Main = main study population (participants/survivors); Extended = extended study population.

| Characteristic | Main: n | Main: Control | Main: Dementia | Main: p-value | Extended: n | Extended: Control | Extended: Dementia | Extended: p-value |
|---|---|---|---|---|---|---|---|---|
| Socio-demographic characteristics | | | | | | | | |
| Age | 670/39 | 70.0 (3.4) | 72.4 (4.1) | <0.001 | 858/151 | 70.2 (3.6) | 72.4 (3.9) | <0.001 |
| Education (years) | 659/39 | 9.3 (3.5) | 9.2 (4.1) | 0.47 | 845/148 | 9.0 (3.5) | 9.0 (3.8) | 0.38 |
| Women | 670/39 | 64.2% | 74.4% | 0.20 | 858/151 | 65.4% | 66.9% | 0.72 |
| Cognition | | | | | | | | |
| MMSE (0-30p) | 669/39 | 26.6 (1.6) | 26.2 (1.8) | 0.33 | 853/149 | 26.5 (1.7) | 26.1 (1.9) | 0.044 |
| Verbal expression | 670/39 | 21.3 (6.0) | 19.5 (5.0) | 0.15 | 854/148 | 20.9 (6.0) | 19.5 (5.6) | 0.028 |
| Prospective memory (1-4p) | 585/34 | 2.9 (0.8) | 2.4 (0.9) | 0.001 | 756/129 | 2.8 (0.8) | 2.4 (0.8) | <0.001 |
| Episodic memory (0-10p) | 662/39 | 5.4 (1.1) | 4.8 (1.0) | 0.002 | 846/150 | 5.3 (1.1) | 4.7 (1.3) | <0.001 |
| Psychomotor speed | 635/38 | 0.02 (0.8) | –0.4 (1.0) | 0.011 | 809/136 | 0.1 (0.8) | –0.4 (0.9) | <0.001 |
| Executive functioning | 649/38 | 36.2 (17.5) | 48.3 (21.5) | <0.001 | 824/143 | 37.5 (18.7) | 46.6 (25.9) | <0.001 |
| Vascular & lifestyle factors | | | | | | | | |
| SBP (mmHg) | 669/39 | 151.1 (22.3) | 140.4 (22.7) | 0.004 | 857/151 | 150.9 (22.6) | 152.6 (25.8) | 0.48 |
| DBP (mmHg) | 669/39 | 81.3 (10.8) | 75.4 (10.3) | 0.002 | 857/151 | 80.9 (11.0) | 79.9 (11.1) | 0.40 |
| BMI (kg/m2) | 670/39 | 27.8 (4.0) | 27.0 (4.0) | 0.29 | 858/151 | 27.8 (4.0) | 27.2 (4.1) | 0.09 |
| Waist-hip ratio | 669/39 | 0.9 (0.1) | 0.9 (0.1) | 0.12 | 852/150 | 0.9 (0.1) | 0.9 (0.1) | 0.35 |
| Total cholesterol (mmol/l) | 668/39 | 5.9 (1.0) | 5.6 (1.1) | 0.08 | 855/150 | 5.9 (1.0) | 5.9 (1.0) | 0.56 |
| HDL (mmol/l) | 668/39 | 1.4 (0.4) | 1.5 (0.4) | 0.48 | 855/150 | 1.4 (0.4) | 1.5 (0.4) | 0.09 |
| Triglycerides (mmol/l) | 668/39 | 1.5 (0.7) | 1.4 (0.7) | 0.38 | 855/150 | 1.5 (0.7) | 1.5 (0.8) | 0.35 |
| Physical activity (1-6p) | 666/39 | 2.0 (1.1) | 2.5 (1.8) | 0.47 | 849/149 | 2.1 (1.2) | 2.3 (1.5) | 0.32 |
| Alcohol use (1-3p) | 663/39 | 1.8 (0.8) | 1.9 (0.7) | 0.23 | 848/148 | 1.8 (0.8) | 1.9 (0.8) | 0.22 |
| Smoker | 665/38 | 33.1% | 29.0% | 0.60 | 848/148 | 32.9% | 35.8% | 0.49 |
| Presence of comorbidity | 670/39 | 20.9% | 41.0% | 0.003 | 858/151 | 21.1% | 31.1% | 0.007 |
| APOE genotype | | | | | | | | |
| ɛ23/24 or 33/34/44 (N) | 669/39 | 8/63/27/2 | 5/41/44/10 | 0.001 | 850/150 | 8/62/28/2 | 3/48/38/11 | <0.001 |
| ɛ4 carrier | 670/39 | 30.4% | 53.8% | 0.002 | 858/151 | 31.1% | 50.3% | <0.001 |
| Self-rated health measures | | | | | | | | |
| Self-rated health (1-5p) | 667/39 | 2.6 (0.7) | 2.7 (0.7) | 0.71 | 852/150 | 2.6 (0.7) | 2.8 (0.8) | 0.06 |
| Self-rated fitness (1-5p) | 666/37 | 2.7 (0.7) | 2.7 (0.8) | 0.84 | 852/148 | 2.7 (0.7) | 2.8 (0.8) | 0.10 |
| Hopelessness (0-8p) | 624/38 | 5.2 (1.8) | 4.7 (2.1) | 0.16 | 782/127 | 5.1 (1.8) | 4.9 (1.8) | 0.44 |
| BDI (0-63p) | 572/29 | 9.2 (6.2) | 10.3 (7.3) | 0.49 | 719/111 | 9.2 (6.3) | 10.0 (7.0) | 0.38 |
| Subjective Memory Questionnaire (1-4p/question) | | | | | | | | |
| Total score | 510/28 | 46.1 (8.1) | 50.1 (9.0) | 0.021 | 648/98 | 45.7 (8.2) | 48.1 (10.0) | 0.020 |
| Forgetting phone numbers | 659/38 | 2.5 (0.8) | 2.9 (0.7) | 0.003 | 842/143 | 2.5 (0.8) | 2.75 (0.77) | 0.001 |
| Forgetting clothing size | 632/32 | 2.3 (0.9) | 2.8 (1.1) | 0.023 | 808/132 | 2.3 (1.0) | 2.5 (1.0) | 0.19 |
| Forgetting name of actors | 638/37 | 2.8 (0.8) | 3.2 (0.8) | 0.005 | 807/142 | 2.8 (0.8) | 2.9 (0.8) | 0.10 |
| Forgetting what to say in mid-sentence | 662/38 | 1.8 (0.6) | 2.1 (0.7) | 0.009 | 844/141 | 1.8 (0.6) | 1.9 (0.7) | 0.09 |
| Changes in vascular factors (late-life minus midlife) | | | | | | | | |
| SBP (mmHg) | 669/39 | 10.3 (23.2) | –1.2 (25.2) | 0.002 | 857/151 | 9.4 (23.3) | 7.6 (26.5) | 0.29 |
| DBP (mmHg) | 669/39 | –7.0 (12.1) | –13.3 (14.0) | 0.017 | 857/151 | –7.4 (12.1) | –9.8 (12.7) | 0.031 |
| Total cholesterol (mmol/l) | 668/39 | –0.8 (1.2) | –1.0 (1.4) | 0.24 | 855/150 | –0.8 (1.2) | –0.9 (1.2) | 0.035 |
| BMI (kg/m2) | 670/39 | 1.6 (2.7) | –0.1 (2.3) | <0.001 | 858/151 | 1.6 (2.7) | 0.2 (2.9) | <0.001 |

All characteristics shown were assessed at the first late-life re-examination (1998), except for Changes in vascular factors, which show differences between 1998 and midlife (21 years earlier). Values are means (standard deviations) unless otherwise specified. For Cognition, higher results indicate better performance, except for executive functioning where lower results indicate better performance. Physical activity was assessed as 6 ordered categories: 1 = daily; 2 = 2-3 times a week; 3 = once a week; 4 = 2-3 times a month; 5 = a few times a year; and 6 = not at all. Alcohol use was assessed as 3 ordered categories: 1 = monthly; 2 = less than monthly; and 3 = not at all. Self-rated health and fitness were assessed as 5 ordered categories: 1 = very good; 2 = good; 3 = satisfactory; 4 = relatively poor; and 5 = very poor. For hopelessness, a higher score indicates less hopelessness. For BDI (Beck Depression Inventory), a higher score indicates more pronounced depressive symptoms. In the Subjective Memory Questionnaire, each question had 4 ordered answer categories: 1 = never; 2 = sometimes; 3 = often; and 4 = almost all the time (i.e., a higher score indicates more pronounced memory complaints). Only questions with significant differences between groups are shown here.

In the extended study population, individuals with dementia were older, had significantly poorer performance on all six cognitive tests, higher frequency of cardio/cerebrovascular comorbidity and the APOE ɛ4 allele, and more pronounced subjective memory complaints (total SMQ score and one item about forgetting phone numbers). No differences were found in SBP or DBP. Changes in DBP, total cholesterol, and BMI (but not SBP) between midlife and 1998 were different between controls and subsequent dementia cases.

Performance of DSI in predicting dementia

Table 2 shows AUCs (95% CI) for the composite DSI including factor groups Cognition, Vascular factors, Demographics, Subjective memory questionnaire, and APOE genotype (basic model). The composite DSI achieved an AUC of 0.79 (0.79–0.80) in the main study population, and 0.75 (0.74–0.75) in the extended study population. Training the DSI on the entire main or extended population and using it to classify the same cases yielded AUCs of 0.84 and 0.76, respectively.
Table 2. Performance of DSI, included individual factors and factor groups in predicting dementia

| Factor group / factor | Main study population (participants/survivors), AUC (95% CI) | Extended study population, AUC (95% CI) |
|---|---|---|
| Basic model | | |
| Total DSI | 0.79 (0.79–0.80) | 0.75 (0.74–0.75) |
| Cognition | 0.73 (0.73–0.74) | 0.69 (0.69–0.70) |
| Executive functioning | 0.68 (0.67–0.69) | 0.62 (0.62–0.63) |
| Episodic memory | 0.64 (0.62–0.65) | 0.61 (0.61–0.62) |
| Prospective memory | 0.62 (0.61–0.63) | 0.63 (0.62–0.63) |
| Psychomotor speed | 0.62 (0.61–0.63) | 0.67 (0.66–0.68) |
| MMSE | | 0.54 (0.54–0.55) |
| Verbal Expression | | 0.55 (0.55–0.56) |
| Socio-demographic characteristics | 0.67 (0.65–0.68) | 0.66 (0.66–0.67) |
| Age | 0.67 (0.65–0.68) | 0.66 (0.66–0.67) |
| Vascular factors | 0.65 (0.64–0.66) | 0.53 (0.52–0.53) |
| DBP | 0.64 (0.63–0.65) | |
| SBP | 0.63 (0.62–0.64) | |
| Presence of comorbidity | 0.56 (0.55–0.57) | 0.53 (0.52–0.53) |
| Subjective Memory Questionnaire | 0.64 (0.63–0.66) | 0.58 (0.57–0.58) |
| Total score | 0.62 (0.61–0.64) | 0.57 (0.56–0.58) |
| Forgetting phone numbers | 0.61 (0.60–0.62) | 0.57 (0.56–0.57) |
| Forgetting name of actors | 0.60 (0.59–0.61) | |
| Forgetting clothing size | 0.59 (0.57–0.60) | |
| Forgetting what to say in mid-sentence | 0.58 (0.57–0.59) | |
| APOE genotype | 0.59 (0.58–0.60) | 0.60 (0.59–0.61) |
| Genotype risk order | 0.60 (0.59–0.61) | 0.60 (0.60–0.61) |
| ɛ4 carrier | 0.57 (0.55–0.58) | 0.57 (0.57–0.58) |
| Basic model + changes in vascular factors from midlife | | |
| Total DSI | 0.80 (0.79–0.81) | 0.78 (0.77–0.79) |
| Vascular changes | 0.68 (0.66–0.69) | 0.65 (0.64–0.66) |
| BMI change | 0.68 (0.67–0.69) | 0.68 (0.67–0.69) |
| SBP change | 0.65 (0.63–0.66) | |
| DBP change | 0.61 (0.59–0.62) | 0.61 (0.59–0.62) |
| Total cholesterol change | | 0.55 (0.54–0.57) |

Values are AUC (95% CI) for the composite DSI, factor groups (Cognition, Vascular factors, Demographics, Subjective memory questionnaire, and APOE genotype), and individual factors within each group. In the basic model + changes in vascular factors from midlife, the total DSI value includes all factors and factor groups from the basic model plus the Vascular changes group. Only factors with significant differences between control and dementia groups (as per Table 1 p-values) are shown here.

There was an overall pattern of similar to somewhat lower AUCs for individual factors and factor groups in the extended population compared with the main study population. ROC curves for the composite DSI in both populations are shown in Fig. 2. Accuracy, sensitivity and specificity for different composite DSI cut-off values are shown in Table 3.
Fig. 2. ROC curves for the late-life DSI dementia index in the main and extended study populations.

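Table 3. Late-life DSI dementia index cut-offs (basic model) with accuracy, sensitivity, specificity, and the percentage of individuals classified as developing dementia in the future

| Cut-off | Main: Accuracy | Main: Sensitivity | Main: Specificity | Main: Classified dementia + (%) | Extended: Accuracy | Extended: Sensitivity | Extended: Specificity | Extended: Classified dementia + (%) |
|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.06 | 1.00 | 0.01 | 99 | 0.15 | 1.00 | 0.00 | 100 |
| 0.20 | 0.15 | 1.00 | 0.10 | 91 | 0.19 | 1.00 | 0.05 | 96 |
| 0.30 | 0.33 | 0.93 | 0.30 | 72 | 0.34 | 0.95 | 0.23 | 80 |
| 0.40 | 0.56 | 0.85 | 0.54 | 48 | 0.51 | 0.83 | 0.45 | 59 |
| 0.45 | 0.66 | 0.82 | 0.65 | 38 | 0.60 | 0.78 | 0.57 | 48 |
| 0.50 | 0.74 | 0.73 | 0.74 | 29 | 0.67 | 0.69 | 0.67 | 38 |
| 0.55 | 0.81 | 0.61 | 0.83 | 20 | 0.74 | 0.59 | 0.76 | 29 |
| 0.60 | 0.87 | 0.45 | 0.89 | 13 | 0.80 | 0.49 | 0.85 | 20 |
| 0.70 | 0.93 | 0.24 | 0.97 | 5 | 0.85 | 0.26 | 0.95 | 8 |
| 0.80 | 0.94 | 0.09 | 0.99 | 1 | 0.85 | 0.07 | 0.99 | 2 |
| 0.90 | 0.95 | 0.03 | 1.00 | 0 | 0.85 | 0.00 | 1.00 | 0 |
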
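The quantities in Table 3 follow from thresholding the DSI value. The sketch below is illustrative Python, not the study code. It assumes that values at or above the cut-off count as "dementia +", and the function names are hypothetical. It also checks that reported accuracy is consistent with sensitivity, specificity, and the share of cases in each population (39/709 and 151/1009).

```python
def cutoff_metrics(dsi_values, labels, cutoff):
    """Accuracy, sensitivity, specificity and % classified positive at one cut-off."""
    predicted = [v >= cutoff for v in dsi_values]  # assumption: ties count as positive
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    tn = sum((not p) and y == 0 for p, y in zip(predicted, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "classified_positive_pct": 100 * sum(predicted) / len(labels),
    }


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Consistency check against the 0.50 cut-off row of Table 3.
main_check = accuracy_from_rates(0.73, 0.74, 39 / 709)
extended_check = accuracy_from_rates(0.69, 0.67, 151 / 1009)
```

Because dementia cases are a small share of both populations, accuracy is dominated by specificity. This is why accuracy is very low at small cut-offs, where nearly everyone is classified as "dementia +", and approaches the share of controls at large cut-offs.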
Results were validated by comparison with an SVM classification, trained with a linear kernel using the same set of factors and cross-validation procedure. We used the MATLAB fitcsvm function with parameter values that empirically gave the best results (kernel scale 10^3 and box constraint 10^-3 for both models). Population mean values were used for missing values, and factors were entered into the model as individual standardized values. The SVM achieved an AUC of 0.77 (0.76–0.78) for the main study population, and 0.74 (0.73–0.74) for the extended population, a slightly lower performance compared with DSI.
AUCs (95% CI) for the composite DSI including the basic model plus changes in vascular factors from midlife to late-life are shown in Table 2. There was a slight increase in AUCs for composite DSI compared with the basic model. AUCs for changes in vascular factors considered together were slightly higher than AUCs for the group of late-life vascular factors, and this difference was most pronounced in the extended study population. Change in BMI had the highest AUC (0.68) for both main and extended study populations.

Sensitivity analyses

Table 4 shows the effects of p-value threshold filtering on the number of factors included in the prediction model, and on AUCs (95% CI) for the composite DSI. Analyses focused on p-values from Mann-Whitney U-tests comparing controls and subsequent dementia cases, and on factors showing significant differences at various p-value thresholds. Results suggest that the model is not improved after adding variables with p > 0.01.
Table 4. Effects of p-value threshold filtering on the number of factors included in the model, and on the predictive performance (AUC) of the DSI dementia index

| p-value threshold | Main: No. of factors included in model | Main: AUC (95% CI) | Extended: No. of factors included in model | Extended: AUC (95% CI) |
|---|---|---|---|---|
| p < 0.000001 | 0 | | 5 | 0.76 (0.75–0.76) |
| p < 0.001 | 4 | 0.76 (0.75–0.78) | 9 | 0.77 (0.76–0.77) |
| p < 0.01 | 14 | 0.82 (0.81–0.83) | 10 | 0.77 (0.76–0.77) |
| p < 0.05 | 18 | 0.80 (0.79–0.81) | 15 | 0.75 (0.75–0.76) |
| p < 0.1 | 21 | 0.79 (0.79–0.80) | 23 | 0.75 (0.74–0.75) |
| p < 0.2 | 30 | 0.79 (0.78–0.80) | 27 | 0.75 (0.74–0.75) |
| no threshold | 49 | 0.74 (0.73–0.76) | 49 | 0.73 (0.72–0.73) |

p-values calculated from Mann-Whitney U-tests comparing controls and subsequent dementia cases were used for the thresholds shown. Only factors showing significant differences between groups below a specific threshold are included in the model and factors not showing significant differences are filtered out of the model.
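The filtering step can be sketched as follows. This is an illustrative Python version that assumes a two-sided Mann-Whitney U test with a tie-corrected normal approximation and no continuity correction. The exact test variant used in the study is not stated, and the function names and toy data are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_p(a, b):
    """Two-sided p-value from the normal approximation with tie correction."""
    combined = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n = len(combined)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # midrank of positions i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return erfc(abs(z) / sqrt(2))


def select_factors(factors, threshold):
    """Keep factor names whose control-vs-case p-value is below the threshold."""
    return [name for name, (controls, cases) in factors.items()
            if mann_whitney_p(controls, cases) < threshold]


# Toy data as (controls, cases); hypothetical values, not CAIDE data.
factors = {
    "separating": (list(range(1, 11)), list(range(11, 21))),
    "uninformative": ([1, 2, 3, 4], [1, 2, 3, 4]),
}
kept = select_factors(factors, threshold=0.05)
```

In a cross-validated setting, one design choice is whether p-values are computed on the full data or inside each training fold; the sketch leaves that to the caller.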

Additional analyses were conducted to account for previously described J- or U-shaped associations between BMI, BP, cholesterol, and dementia [1] (the current DSI version includes a monotonically increasing fitness function). Dichotomous variables were created for values higher or lower than chosen cut-offs for BMI, BP, and total cholesterol, and the variables were added to the models to investigate the significance of the distribution tails. Several cut-offs were tested, but the combined predictive performance of these variables was low and did not affect the overall performance of the model (results not shown).
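One way to picture the tail variables is as a pair of indicators per factor, so that a monotonic fitness function can treat each tail separately. The sketch below is illustrative Python only. The function name is hypothetical, and the BMI cut-offs of 20 and 30 are arbitrary placeholders, since the cut-offs actually tested are not reported.

```python
def tail_indicators(values, low_cutoff, high_cutoff):
    """Return (below_low, above_high) 0/1 indicators; missing values stay None."""
    result = []
    for v in values:
        if v is None:
            result.append((None, None))
        else:
            result.append((int(v < low_cutoff), int(v > high_cutoff)))
    return result


# Hypothetical BMI values with placeholder cut-offs.
bmi_flags = tail_indicators([18.5, 25.0, None, 32.0], low_cutoff=20, high_cutoff=30)
```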

DISCUSSION

The late-life DSI dementia index developed using a supervised machine learning method performed well in predicting dementia up to 10 years later in an older general population without MCI or dementia at baseline. Performance was in the upper range of reported performance for previous dementia risk scores [5], and close to the performance level of established risk scores for cardiovascular conditions [4, 37, 38]. The late-life DSI dementia index and midlife CAIDE Dementia Risk Score, both developed within the CAIDE study but with very different methods, had similar predictive power [8, 9]. As emphasized by a recent multidomain vascular care trial to prevent dementia [39], preventive interventions may not be effective in unselected older populations. A risk-based selection could facilitate targeting preventive interventions to individuals who are most likely to benefit. The midlife CAIDE Dementia Risk Score has been used for this purpose in another population-based multidomain lifestyle trial that showed significant beneficial intervention effects on cognitive performance [2]. However, the selection required data pre-processing according to pre-set cut-offs, and additional cognitive testing referenced to population norms (separate from the dementia risk score). The late-life DSI dementia index could facilitate faster and more detailed risk assessment, with easier to interpret individual risk profiles, thus enabling risk-based selection of target populations, and also potential tailoring of preventive interventions based on the most relevant risk factors. Such advantages derive from the ability of DSI to quickly handle large amounts of heterogeneous data in raw form (i.e., as collected from subjects), and the provision of DSI data to human readers in an easily interpretable visual form. 
While many available classifiers process data as a ‘black box’ requiring machine learning expertise to scrutinize, DSF clearly discloses the factors contributing to the results, and supports clinical judgment by highlighting what is most relevant. Such characteristics are particularly important for dementia risk assessment tools in the context of recent database developments such as large population-based online Brain Health Registries, multinational data discovery and sharing platforms, or internet-based prevention trials [11].

Factors included in the DSI index

A large number of heterogeneous factors were tested in the present study, and DSI performed well in identifying the main types of late-life risk factors related to subsequent dementia: objective and subjective measures of cognition, age, vascular factors, and APOE genotype, in overall agreement with previous studies using other statistical methods [5]. Detailed, factor-specific comparisons with available dementia risk scores are difficult because these have often pre-processed raw data according to different cut-offs, and/or combined variables in different ways, leading to variability in individual factors and their weights. However, some general patterns can be observed. Long-term (i.e., decades) dementia prediction models tend to differ from shorter-term (i.e., <10 years) prediction models, and they also tend to perform poorly when applied outside the age groups they were designed for [5, 6]. The relatively long pre-clinical stage of dementia-related diseases (e.g., Alzheimer’s disease or cerebrovascular disease) is a major challenge for dementia risk scores, particularly at older ages [5, 6]. The links between risk factors and dementia development can be bidirectional, i.e., a factor may increase dementia risk, but it may also be influenced by ongoing disease processes once the dementia-related disease starts [1]. While the mechanisms are not yet fully clear, a pattern of more pronounced decline in, for example, BP, BMI, and total cholesterol from midlife to late-life has been consistently described in people who subsequently develop dementia [1]. Whereas traditional vascular risk factors (e.g., high BP, BMI, and/or total cholesterol) are important for midlife dementia risk scores, their predictive value decreases in late-life risk scores (some of which may even include low BP and/or low BMI as predictors) [1, 5]. AUCs for the vascular factors group in the DSI dementia index are in agreement with this pattern.
Interestingly, group AUCs for changes in vascular factors prior to baseline were slightly higher than group AUCs for vascular factors at baseline in the DSI model. Declining BMI from midlife to late-life was the most important predictor in the vascular changes group, while BMI in late-life was not predictive of subsequent dementia. The predictive value of one-time late-life measurements versus midlife-to-late-life changes has so far not been investigated in late-life dementia risk scores. However, overall performance of the DSI dementia index was not greatly affected by leaving out changes in vascular factors. The most important predictor was cognitive performance, which is perhaps not surprising for late-life dementia risk scores [5]. Cognitive performance was also more predictive of subsequent dementia than age. As our study focused on individuals aged 65–79 years, it remains to be determined whether this finding applies to other age groups or populations. APOE genotype had the lowest AUCs compared to the other groups of factors included in the DSI models. While in some previous dementia prediction models APOE genotype appeared to be somewhat informative, other models have excluded it as not informative enough [5].

Strengths, limitations, and future directions

The main strengths of the present study are the population-based design, long follow-up time, and detailed late-life cognitive assessments at two time points, which increased diagnostic accuracy. Mortality and non-participation were at least partly taken into account by including both the main population (survivors/participants) and extended population (additional register dementia diagnoses for non-survivors/non-participants) in analyses. Results for both populations were relatively similar, although in the extended population AUCs tended to be somewhat lower, and some factors were excluded from the models. Individuals who do not participate in studies or die during follow-up usually have poorer health, and are more likely to either develop dementia or die at younger ages, before dementia onset. Although dementia diagnoses in Finnish national registers were accurate (positive predictive values above 90%), their combined sensitivity was around 70% [22], so the registers underestimate the actual number of cases. Also, individuals who died without recorded dementia diagnoses had to be excluded from analyses. The comorbidity variable used in DSI models was based on Hospital Discharge Register diagnoses, thus including only cardio/cerebrovascular conditions severe enough to require hospitalization (data on pharmacological treatment and conditions diagnosed in outpatient clinics were not available). Also, brain MRI measurements were not included in the present study due to insufficient sample size. A previous late-life risk index including MRI measurements had somewhat better predictive performance (AUC 0.81) [40], but the shorter version without MRI had similar predictive performance to DSI (AUC 0.77) [41]. The present study tested many heterogeneous factors, and results from the p-value threshold filtering analyses indicated that the DSI dementia index benefited from selection of factors.
DSI was originally built with the assumption that all included factors are already established as likely classifiers, and their effectiveness is ranked by relevance. If several factors with unclear predictive value for dementia are included, the need for factor selection arises. A large number of poor classifiers with little relevance can overpower the factors with higher relevance and skew the final results. Also, if the training groups are too small, a non-significant difference between controls and cases can lead to a higher relevance by chance. The late-life DSI dementia prediction model was designed for shorter-term dementia prediction (up to 10 years). External validation is needed to verify its predictive performance. Long-term predictive performance will also need to be tested. In addition, analyses of changes in overall risk level over time are essential for determining whether the DSI dementia index can be used for longitudinal risk monitoring and assessing response to preventive interventions.
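
The dilution effect described above can be illustrated with a small Python sketch. It assumes, as a simplification, that the composite index is a relevance-weighted mean of per-factor values in [0, 1]; the exact DSI formulation is not reproduced here. The names `composite_index` and `select_by_p_value`, the 0.05 threshold, and all numbers are hypothetical and chosen only to show the mechanism.

```python
def composite_index(factors):
    """Relevance-weighted mean of per-factor index values.

    factors: list of (value, relevance, p_value) tuples, where value lies in
    [0, 1] and relevance is a non-negative weight. This weighted-mean form is
    an assumption made for illustration.
    """
    total_relevance = sum(relevance for _, relevance, _ in factors)
    if total_relevance == 0:
        raise ValueError("at least one factor needs non-zero relevance")
    weighted = sum(value * relevance for value, relevance, _ in factors)
    return weighted / total_relevance


def select_by_p_value(factors, threshold):
    """Keep only factors whose case/control difference has p <= threshold."""
    return [f for f in factors if f[2] <= threshold]


# Two informative factors and twenty weak ones (hypothetical values).
strong = [(0.85, 0.40, 0.001), (0.80, 0.35, 0.004)]
weak = [(0.30, 0.05, 0.60)] * 20

unfiltered = composite_index(strong + weak)
filtered = composite_index(select_by_p_value(strong + weak, 0.05))
print(round(unfiltered, 3), round(filtered, 3))
```

Each weak factor carries little relevance, but twenty of them together outweigh the two informative factors and pull the unfiltered index toward the inconclusive range; applying the p-value filter restores the value driven by the informative factors.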

Conclusion

DSI performed well in identifying comprehensive profiles for predicting dementia development up to 10 years later. The DSI dementia index could thus be useful for identifying individuals who are most at risk and may benefit from dementia prevention interventions. The detailed and visually easy-to-interpret individual risk profiles may also facilitate tailoring of preventive interventions based on the most relevant risk factors.

Supplementary Figure

Disease State Fingerprint for an individual who later developed dementia. Low DSI values, depicting similarity to controls, are shown in blue, and high DSI values, predicting dementia, in red. Inconclusive values are shown in white. The size of each box shows relevance: larger boxes contribute more to the total value than small ones.
  36 in total

Review 1.  Cardiovascular risk prediction: basic concepts, current status, and future directions.

Authors:  Donald M Lloyd-Jones
Journal:  Circulation       Date:  2010-04-20       Impact factor: 29.690

2.  Prediction of coronary heart disease using risk factor categories.

Authors:  P W Wilson; R B D'Agostino; D Levy; A M Belanger; H Silbershatz; W B Kannel
Journal:  Circulation       Date:  1998-05-12       Impact factor: 29.690

3.  Advances in the prevention of Alzheimer's disease and dementia.

Authors:  A Solomon; F Mangialasche; E Richard; S Andrieu; D A Bennett; M Breteler; L Fratiglioni; B Hooshmand; A S Khachaturian; L S Schneider; I Skoog; M Kivipelto
Journal:  J Intern Med       Date:  2014-03       Impact factor: 8.989

4.  A disease state fingerprint for evaluation of Alzheimer's disease.

Authors:  Jussi Mattila; Juha Koikkalainen; Arho Virkki; Anja Simonsen; Mark van Gils; Gunhild Waldemar; Hilkka Soininen; Jyrki Lötjönen
Journal:  J Alzheimers Dis       Date:  2011       Impact factor: 4.472

5.  Dementia: Risk prediction models in dementia prevention.

Authors:  Alina Solomon; Hilkka Soininen
Journal:  Nat Rev Neurol       Date:  2015-05-19       Impact factor: 42.937

6.  Determination by PCR-RFLP of apo E genotype in a Japanese population.

Authors:  K Tsukamoto; T Watanabe; T Matsushima; M Kinoshita; H Kato; Y Hashimoto; K Kurokawa; T Teramoto
Journal:  J Lab Clin Med       Date:  1993-04

7.  Hopelessness and risk of mortality and incidence of myocardial infarction and cancer.

Authors:  S A Everson; D E Goldberg; G A Kaplan; R D Cohen; E Pukkala; J Tuomilehto; J T Salonen
Journal:  Psychosom Med       Date:  1996 Mar-Apr       Impact factor: 4.312

8.  The Consortium to Establish a Registry for Alzheimer's Disease (CERAD). Part I. Clinical and neuropsychological assessment of Alzheimer's disease.

Authors:  J C Morris; A Heyman; R C Mohs; J P Hughes; G van Belle; G Fillenbaum; E D Mellits; C Clark
Journal:  Neurology       Date:  1989-09       Impact factor: 9.910

9.  Predicting risk of dementia in older adults: The late-life dementia risk index.

Authors:  D E Barnes; K E Covinsky; R A Whitmer; L H Kuller; O L Lopez; K Yaffe
Journal:  Neurology       Date:  2009-05-13       Impact factor: 9.910

10.  The CAIDE Dementia Risk Score App: The development of an evidence-based mobile application to predict the risk of dementia.

Authors:  Shireen Sindi; Elisabeth Calov; Jasmine Fokkens; Tiia Ngandu; Hilkka Soininen; Jaakko Tuomilehto; Miia Kivipelto
Journal:  Alzheimers Dement (Amst)       Date:  2015-07-02
  8 in total

1.  Prediction models for dementia and neuropathology in the oldest old: the Vantaa 85+ cohort study.

Authors:  Anette Hall; Timo Pekkala; Tuomo Polvikoski; Mark van Gils; Miia Kivipelto; Jyrki Lötjönen; Jussi Mattila; Mia Kero; Liisa Myllykangas; Mira Mäkelä; Minna Oinas; Anders Paetau; Hilkka Soininen; Maarit Tanskanen; Alina Solomon
Journal:  Alzheimers Res Ther       Date:  2019-01-22       Impact factor: 6.982

Review 2.  Machine Learning in Acute Ischemic Stroke Neuroimaging.

Authors:  Haris Kamal; Victor Lopez; Sunil A Sheth
Journal:  Front Neurol       Date:  2018-11-08       Impact factor: 4.003

3.  Predicting Global Cognitive Decline in the General Population Using the Disease State Index.

Authors:  Lotte G M Cremers; Wyke Huizinga; Wiro J Niessen; Gabriel P Krestin; Dirk H J Poot; M Arfan Ikram; Jyrki Lötjönen; Stefan Klein; Meike W Vernooij
Journal:  Front Aging Neurosci       Date:  2020-01-23       Impact factor: 5.750

4.  Detection of child depression using machine learning methods.

Authors:  Umme Marzia Haque; Enamul Kabir; Rasheda Khanam
Journal:  PLoS One       Date:  2021-12-16       Impact factor: 3.240

Review 5.  Artificial Intelligence Models in the Diagnosis of Adult-Onset Dementia Disorders: A Review.

Authors:  Gopi Battineni; Nalini Chintalapudi; Mohammad Amran Hossain; Giuseppe Losco; Ciro Ruocco; Getu Gamo Sagaro; Enea Traini; Giulio Nittari; Francesco Amenta
Journal:  Bioengineering (Basel)       Date:  2022-08-05

6.  Third follow-up of the Cardiovascular Risk Factors, Aging and Dementia (CAIDE) cohort investigating determinants of cognitive, physical, and psychosocial wellbeing among the oldest old: the CAIDE85+ study protocol.

Authors:  Mariagnese Barbera; Jenni Kulmala; Inna Lisko; Eija Pietilä; Anna Rosenberg; Ilona Hallikainen; Merja Hallikainen; Tiina Laatikainen; Jenni Lehtisalo; Elisa Neuvonen; Minna Rusanen; Hilkka Soininen; Jaakko Tuomilehto; Tiia Ngandu; Alina Solomon; Miia Kivipelto
Journal:  BMC Geriatr       Date:  2020-07-10       Impact factor: 3.921

7.  Secular changes in dementia risk indices among 70-year-olds: a comparison of two Finnish cohorts born 20 years apart.

Authors:  Jenni Vire; Marika Salminen; Paula Viikari; Tero Vahlberg; Seija Arve; Matti Viitanen; Laura Viikari
Journal:  Aging Clin Exp Res       Date:  2019-05-04       Impact factor: 3.636

8.  A Risk Prediction Model Based on Machine Learning for Cognitive Impairment Among Chinese Community-Dwelling Elderly People With Normal Cognition: Development and Validation Study.

Authors:  Maritta Välimäki; Hui Feng; Mingyue Hu; Xinhui Shu; Gang Yu; Xinyin Wu
Journal:  J Med Internet Res       Date:  2021-02-24       Impact factor: 5.428

