Marcos D Machado-Fragua, Benjamin Landré, Mathilde Chen, Aurore Fayosse, Aline Dugravot, Mika Kivimaki, Séverine Sabia, Archana Singh-Manoux.
Abstract
BACKGROUND: Age is the strongest risk factor for dementia, and there is considerable interest in identifying scalable, blood-based biomarkers for predicting dementia. We examined the role of midlife serum metabolites using a machine learning approach and determined whether the selected metabolites improved prediction accuracy beyond the effect of age.
Keywords: Biomarkers; C-statistic; Dementia; Longitudinal study; Metabolites; Predictive accuracy; Risk score
Year: 2022 PMID: 36163029 PMCID: PMC9513883 DOI: 10.1186/s12916-022-02519-6
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 11.150
Fig. 1 Scheme of the repeated nested cross-validation procedure. The following procedure was repeated 100 times to account for variation in results due to random partitioning of the cross-validation folds. The steps in the analyses are (1) Partition dataset into 10 outer folds with the same dementia rate in each fold. (2) Further partition each training outer fold (blue boxes) into 5 inner folds (same dementia rate) to build the inner loop. Grey boxes represent the validation folds (outer loop) which are not involved in the inner loop. (3) Use inner folds to tune the hyperparameters, select best combination of α and λ (model with the lowest partial likelihood deviance in the inner loop). (4) Apply selected hyperparameters to the corresponding training outer fold. (5) Evaluate model performance in the corresponding outer validation fold (red box). (6) Choose the best of 10 outer models (lowest partial likelihood deviance). (7) Identify predictors (variables with non-zero beta-coefficients) in the training fold of the best model in the outer fold. (8) Apply the best outer model hyperparameters to the corresponding validation outer fold. (9) Compare the c-statistic of the prediction model to the c-statistic of an age-specific model in the same validation outer fold.
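The partitioning in steps (1) and (2) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `stratified_folds` and `nested_cv_splits`, the round-robin way of balancing events across folds, and the toy data are all assumptions made for the example. Model fitting and hyperparameter tuning (steps 3 to 9) are left out.

```python
import random

def stratified_folds(indices, events, k, rng):
    """Split `indices` into k folds with near-equal event rates."""
    cases = [i for i in indices if events[i] == 1]
    controls = [i for i in indices if events[i] == 0]
    rng.shuffle(cases)
    rng.shuffle(controls)
    folds = [[] for _ in range(k)]
    # Deal cases, then controls, round-robin so every fold gets its share of both.
    for pos, i in enumerate(cases):
        folds[pos % k].append(i)
    for pos, i in enumerate(controls):
        folds[pos % k].append(i)
    return folds

def nested_cv_splits(events, n_outer=10, n_inner=5, seed=0):
    """Yield (outer_train, outer_validation, inner_folds) for one repetition."""
    rng = random.Random(seed)
    all_idx = list(range(len(events)))
    outer = stratified_folds(all_idx, events, n_outer, rng)
    for v, validation in enumerate(outer):
        # The outer training set is every outer fold except the validation fold.
        train = [i for f, fold in enumerate(outer) if f != v for i in fold]
        # Inner folds are built from the outer training set only.
        inner = stratified_folds(train, events, n_inner, rng)
        yield train, validation, inner

# Toy data (hypothetical): 1000 participants, 6% with the event.
events = [1 if i < 60 else 0 for i in range(1000)]
splits = list(nested_cv_splits(events, seed=1))
```

Repeating the procedure 100 times would amount to calling `nested_cv_splits` with 100 different seeds, so that each repetition uses a different random partition.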
Sample characteristics in 1997–1999 as a function of dementia status at the end of follow-up (31st March 2019)
| Characteristic | No dementia | Dementia* | p-value† |
|---|---|---|---|
| N | 5045 (93.9) | 329 (6.1) | |
| Age at baseline, M (SD) | 55.4 (5.9) | 61.2 (4.7) | <0.001 |
| Sex | |||
| Men | 3662 (72.6) | 220 (66.9) | 0.03 |
| Women | 1383 (27.4) | 109 (33.1) | |
| Education | |||
| Low | 2198 (43.6) | 181 (55.0) | |
| Medium | 1335 (26.4) | 66 (20.1) | <0.001 |
| High | 1512 (30.0) | 82 (24.9) | |
| Ethnicity | |||
| White | 4656 (92.3) | 288 (87.5) | 0.002 |
| Non-white | 389 (7.7) | 41 (12.5) | |
Data are n (%), unless otherwise specified
*Data on dementia subtype was as follows: Alzheimer’s disease (N=137), vascular dementia (N=47), Parkinson’s dementia (N=17), mixed Alzheimer’s and vascular dementia (N=21), mixed vascular and Parkinson’s dementia (N=1), mixed Alzheimer’s and Parkinson’s dementia (N=5), other/missing subtype (N=101)
†p-value for difference in χ² test (categorical data) or Student’s t test (continuous data)
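As an illustration of the χ² test named in the footnote, the sketch below recomputes the test for sex by dementia status from the counts in the table. The helper `chi2_2x2` is a hypothetical name, and whether the authors applied Yates' continuity correction is not stated in the source, so both variants are computed.

```python
import math

def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Optional continuity correction (an assumption, see lead-in).
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: men, women. Columns: no dementia, dementia (counts from the table).
stat, p = chi2_2x2(3662, 220, 1383, 109)
stat_y, p_y = chi2_2x2(3662, 220, 1383, 109, yates=True)
```

Both variants give a p-value between 0.02 and 0.04, in line with the reported 0.03.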
Elastic net penalized Cox regression with repeated nested cross-validation: models (out of 100 repetitions) that improved prediction accuracy for incident dementia compared to an age-only model
| Repetition number | α* | λ* | c-statistic of the best model† | c-statistic of the age-only model‡ | p-value§ | Number of predictors in the selected model¶ |
|---|---|---|---|---|---|---|
| 2 | 0.9 | 0.00617437 | 0.760 | 0.749 | 0.01 | 4 |
| 4 | 1 | 0.00617437 | 0.724 | 0.715 | 0.007 | 4 |
| 10 | 1 | 0.00677636 | 0.747 | 0.741 | 0.04 | 4 |
| 16 | 1 | 0.00512607 | 0.775 | 0.765 | 0.02 | 7 |
| 18 | 0.7 | 0.01299645 | 0.742 | 0.738 | 0.007 | 3 |
| 22 | 1 | 0.00816215 | 0.703 | 0.696 | 0.02 | 2 |
| 23 | 1 | 0.00425575 | 0.779 | 0.763 | 0.001 | 8 |
| 30 | 1 | 0.00617437 | 0.746 | 0.736 | 0.009 | 5 |
| 38 | 1 | 0.00617437 | 0.718 | 0.711 | 0.04 | 4 |
| 50 | 1 | 0.00562585 | 0.745 | 0.734 | 0.01 | 5 |
| 57 | 1 | 0.00467068 | 0.747 | 0.731 | 0.0006 | 9 |
| 67 | 0.5 | 0.00983134 | 0.755 | 0.743 | 0.004 | 6 |
| 74 | 0.8 | 0.00816215 | 0.735 | 0.726 | 0.02 | 3 |
| 91 | 1 | 0.00677636 | 0.724 | 0.714 | 0.008 | 3 |
| 94 | 0.7 | 0.00677636 | 0.762 | 0.751 | 0.03 | 9 |
| 96 | 0.5 | 0.00743705 | 0.735 | 0.722 | 0.04 | 12 |
*These are hyperparameters, allowing selection of the model with the lowest partial likelihood deviance in the inner loop; α ranges from 0 to 1, and when it is 0 all predictors are retained in the model; λ controls the coefficient shrinkage
†c-statistic, in the validation fold of the outer loop, of the best model (lowest partial likelihood deviance in the training folds of the outer loop)
‡c-statistic of the age-only model in the validation fold of the best outer loop model
§p-value for difference in c-statistic between the best model and the age-only model
¶Age was forced to be selected in all models.
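To make the roles of α and λ concrete, here is a small sketch of an elastic net penalty term. It assumes the widely used glmnet-style parameterisation, λ·[α·‖β‖₁ + (1−α)/2·‖β‖₂²]; the source does not state which software or exact parameterisation was used, and the coefficient values below are hypothetical.

```python
def elastic_net_penalty(beta, alpha, lam):
    """Penalty lam * (alpha * L1 + (1 - alpha) / 2 * L2^2), glmnet-style (assumed)."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)

beta = [0.5, -0.2, 0.0]  # hypothetical coefficients, not study estimates

# alpha = 1: pure lasso penalty, which can shrink coefficients exactly to zero.
lasso = elastic_net_penalty(beta, alpha=1.0, lam=0.00617437)

# alpha = 0: pure ridge penalty, which shrinks but keeps every predictor.
ridge = elastic_net_penalty(beta, alpha=0.0, lam=0.00617437)
```

Under this parameterisation, most of the selected models in the table (α = 1) correspond to the lasso end of the range, which is consistent with the small number of retained predictors.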
Frequency of metabolites identified by elastic net penalized Cox regression in the sixteen selected models
| Name of metabolite | Frequency of selection, n/N (%) |
|---|---|
| Glucose (mmol/l) | 16/16 (100) |
| Phospholipids to total lipids ratio in medium HDL (%) | 15/16 (93.8) |
| Creatinine (mmol/l) | 10/16 (62.5) |
| Triglycerides to total lipids ratio in very large VLDL (%) | 8/16 (50.0) |
| Phospholipids to total lipids ratio in medium VLDL (%) | 5/16 (31.3) |
| Alanine (mmol/l) | 5/16 (31.3) |
| β-hydroxybutyrate (mmol/l) | 3/16 (18.8) |
| Free cholesterol to total lipids ratio in small HDL (%) | 2/16 (12.5) |
| Citrate (mmol/l) | 2/16 (12.5) |
| Free cholesterol to total lipids ratio in very large VLDL (%) | 1/16 (6.3) |
| Free cholesterol to total lipids ratio in large HDL (%) | 1/16 (6.3) |
| Triglycerides to total lipids ratio in medium HDL (%) | 1/16 (6.3) |
| Phospholipids to total lipids ratio in small HDL (%) | 1/16 (6.3) |
| Sphingomyelins (mmol/l) | 1/16 (6.3) |
| Albumin (signal area) | 1/16 (6.3) |
VLDL very low-density lipoproteins, HDL high-density lipoprotein
Predictive performance of risk scores for incident dementia (N=5374)
| Risk scores | HR (95% confidence interval) | R² (95% confidence interval) | AIC | Δ AIC | Sensitivity % | Specificity % | c-statistic (95% confidence interval) | p-value* |
|---|---|---|---|---|---|---|---|---|
| Age-only model | 3.04 (2.66, 3.47) | 0.525 (0.450, 0.593) | 5198.2 | Ref. | 75.5 | 70.3 | 0.780 (0.757, 0.802) | Ref. |
| Risk score 1† | 3.13 (2.75, 3.57) | 0.545 (0.468, 0.636) | 5181.2 | −17.0 | 80.9 | 64.7 | 0.786 (0.763, 0.808) | 0.05 |
| Risk score 2‡ | 3.16 (2.77, 3.60) | 0.551 (0.475, 0.620) | 5176.1 | −22.1 | 81.5 | 64.2 | 0.787 (0.764, 0.809) | 0.05 |
| Risk score 3§ | 3.17 (2.78, 3.61) | 0.557 (0.482, 0.630) | 5170.8 | −27.4 | 77.0 | 69.1 | 0.788 (0.766, 0.811) | 0.03 |
| Risk score 4¶ | 3.19 (2.80, 3.63) | 0.565 (0.492, 0.636) | 5163.6 | −34.6 | 72.4 | 72.7 | 0.790 (0.767, 0.813) | 0.02 |
| Risk score 5# | 3.26 (2.87, 3.71) | 0.582 (0.511, 0.649) | 5147.7 | −50.5 | 74.0 | 72.0 | 0.796 (0.774, 0.819) | <0.001 |
R², Royston’s R²; AIC, Akaike information criterion; c-statistic, Harrell’s C-index; VLDL, very low-density lipoproteins; HDL, high-density lipoprotein
*p-value for difference in c-statistic using age-only model as reference
†Risk score 1 includes age and glucose
‡Risk score 2 includes age, glucose and phospholipids to total lipids ratio in medium HDL (%)
§Risk score 3 includes age, glucose, phospholipids to total lipids ratio in medium HDL (%) and creatinine (mmol/l)
¶Risk score 4 includes age, glucose, phospholipids to total lipids ratio in medium HDL (%), creatinine (mmol/l) and triglycerides to total lipids ratio in very large VLDL (%)
#Risk score 5 includes age, glucose, phospholipids to total lipids ratio in medium HDL (%), creatinine (mmol/l), triglycerides to total lipids ratio in very large VLDL (%), phospholipids to total lipids ratio in medium VLDL (%), alanine (mmol/l), 3-hydroxybutyrate (mmol/l), free cholesterol to total lipids ratio in small HDL (%), citrate (mmol/l), free cholesterol to total lipids ratio in very large VLDL (%), free cholesterol to total lipids ratio in large HDL (%), triglycerides to total lipids ratio in medium HDL (%), phospholipids to total lipids ratio in small HDL (%), sphingomyelins (mmol/l) and albumin (signal area)
Youden index cutoff points for the calculation of the sensitivity and specificity are as follows: 0.541 for the age-only model; 0.310 for risk score 1; 0.296 for risk score 2; 0.468 for risk score 3; 0.604 for risk score 4; and 0.564 for risk score 5
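The Youden index is J = sensitivity + specificity − 1, and the cutoff is the score value that maximises J. The sketch below shows one simple way to find it; `youden_cutoff` is a hypothetical helper, the scores and labels are made-up toy values rather than study data, and the rule "positive when score ≥ cutoff" is an assumption. The last lines compute J from the sensitivity and specificity reported in the table.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1.

    A participant is classified positive when score >= cutoff.
    Ties in J are resolved in favour of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Toy, made-up risk scores and outcomes (not study data).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(scores, labels)

# Youden's J implied by the sensitivity and specificity reported in the table.
j_age_only = (75.5 + 70.3 - 100) / 100
j_score_5 = (74.0 + 72.0 - 100) / 100
```

On the reported figures, J is about 0.458 for the age-only model and 0.460 for risk score 5, so the two models are close at their respective optimal cutoffs.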
Fig. 2 Observed and predicted rate of dementia per 1000 person-years (calibration-in-the-large) as a function of deciles of predictors (age, risk score 3, risk score 4 and risk score 5). VLDL, very low-density lipoproteins; HDL, high-density lipoprotein. The first and second decile were collapsed due to a small number of events in these deciles. Risk score 3 includes age, glucose, phospholipids to total lipids ratio in medium HDL (%) and creatinine (mmol/l). Risk score 4 includes age, glucose, phospholipids to total lipids ratio in medium HDL (%), creatinine (mmol/l), and triglycerides to total lipids ratio in very large VLDL (%). Risk score 5 includes age, glucose, phospholipids to total lipids ratio in medium HDL (%), creatinine (mmol/l), triglycerides to total lipids ratio in very large VLDL (%), phospholipids to total lipids ratio in medium VLDL (%), alanine (mmol/l), 3-hydroxybutyrate (mmol/l), free cholesterol to total lipids ratio in small HDL (%), citrate (mmol/l), free cholesterol to total lipids ratio in very large VLDL (%), free cholesterol to total lipids ratio in large HDL (%), triglycerides to total lipids ratio in medium HDL (%), phospholipids to total lipids ratio in small HDL (%), sphingomyelins (mmol/l) and albumin (signal area)
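A rough sketch of how rates per 1000 person-years by decile might be tabulated is given below. The source does not describe how the predicted rates were derived, so the example simply assumes each participant has a model-based expected number of events (`predicted_events`, a hypothetical input); the function names and toy data are also illustrative only.

```python
def decile_groups(values, collapse_first_two=True):
    """Assign each value to a rank-based decile (0..9); optionally merge deciles 1 and 2."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    group = [0] * len(values)
    for rank, i in enumerate(order):
        d = rank * 10 // len(values)
        if collapse_first_two and d == 1:
            d = 0  # fold the second decile into the first
        group[i] = d
    return group

def rates_per_1000_py(groups, events, person_years, predicted_events):
    """Observed and predicted event rates per 1000 person-years in each group."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        py = sum(person_years[i] for i in idx)
        obs = sum(events[i] for i in idx)
        pred = sum(predicted_events[i] for i in idx)
        out[g] = (1000 * obs / py, 1000 * pred / py)
    return out

# Toy data (hypothetical): 100 participants, 20 person-years of follow-up each.
score = list(range(100))
events = [1 if s >= 90 else 0 for s in score]
person_years = [20.0] * 100
predicted_events = [s / 1000 for s in score]

groups = decile_groups(score)
rates = rates_per_1000_py(groups, events, person_years, predicted_events)
```

Comparing the observed and predicted value within each group gives the kind of decile-by-decile calibration view the figure describes.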