Salvatore Tedesco1, Martina Andrulli1, Markus Åkerlund Larsson2, Daniel Kelly3, Antti Alamäki4, Suzanne Timmons5, John Barton1, Joan Condell3, Brendan O'Flynn1, Anna Nordström2,6.
Abstract
As global demographics change, ageing is of increasing interest to our modern and rapidly changing society. Thus, the application of proper prognostic indices in clinical decisions regarding mortality prediction has assumed significant importance for personalized risk management (i.e., identifying patients who are at high or low risk of death) and for helping ensure effective healthcare services to patients. Consequently, prognostic modelling expressed as all-cause mortality prediction is an important step for effective patient management. Machine learning has the potential to transform prognostic modelling. In this paper, results on the development of machine learning models for all-cause mortality prediction in a cohort of healthy older adults are reported. The models are based on features covering anthropometric variables, physical and lab examinations, questionnaires, and lifestyles, as well as wearable data collected in free-living settings, obtained from the "Healthy Ageing Initiative" study conducted on 2291 recruited participants. Several machine learning techniques, including feature engineering, feature selection, data augmentation and resampling, were investigated for this purpose. A detailed empirical comparison of the impact of the different techniques is presented and discussed. The achieved performances were also compared with a standard epidemiological model. This investigation showed that, for the dataset under consideration, the best results were achieved with Random UnderSampling in conjunction with Random Forest (either with or without probability calibration). However, while including probability calibration slightly reduced the average performance, it increased the model's robustness, as indicated by the narrower 95% confidence intervals.
The analysis showed that machine learning models could provide results comparable to standard epidemiological models while being completely data-driven and disease-agnostic, thus demonstrating the opportunity of building machine learning models on health-record data for research and clinical practice. However, further testing is required to significantly improve the model performance and its robustness.
Keywords: ageing; all-cause mortality; imbalanced data; machine learning; mortality prediction; older adults; prediction models
Year: 2021 PMID: 34886532 PMCID: PMC8657506 DOI: 10.3390/ijerph182312806
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
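The best-performing pipeline reported in the abstract pairs Random UnderSampling with a Random Forest. As a minimal, library-free sketch of the undersampling step only (the data and field layout here are toy assumptions; in a real pipeline the balanced set would then be fed to a calibrated Random Forest):

```python
import random

def random_undersample(X, y, seed=42):
    """Balance a binary dataset by discarding majority-class samples at random."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = rng.sample(majority, k=len(minority))  # downsample majority to minority size
    idx = sorted(minority + kept)
    return [X[i] for i in idx], [y[i] for i in idx]

# Toy imbalanced data: 9 survivors (label 0) vs. 3 deaths (label 1)
X = [[i] for i in range(12)]
y = [0] * 9 + [1] * 3
Xb, yb = random_undersample(X, y)
```

After resampling, both classes contribute equally, which is what lets classifiers such as Random Forest stop defaulting to the majority ("survived") class.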
Cox Model.
| Variables | Hazard Ratio (HR) | HR 95% C.I. | p-value |
|---|---|---|---|
| Sex | 0.47 | 0.27–0.83 | 0.01 |
| HDL Cholesterol | 0.88 | 0.83–0.93 | <0.005 |
| Rheumatoid arthritis | 0.20 | 0.02–2.43 | 0.21 |
| Total T-score | 0.77 | 0.62–0.96 | 0.02 |
| Secondary osteoporosis | 1.54 | 0.8–2.95 | 0.2 |
| Glucocorticoids | 2.95 | 0.8–10.8 | 0.1 |
| Parent Fractured Hip | 0.68 | 0.31–1.5 | 0.34 |
| Sway trace length (no vision) | 1 | 1.0–1.0 | 0.02 |
| IPAQ MET-min per week | 1 | 1.0–1.0 | 0.01 |
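In a Cox model, each hazard ratio is the exponential of the fitted coefficient, and the reported 95% confidence interval is exp(β ± 1.96·SE). As a rough consistency check on the Sex row above (the standard error is back-derived from the published CI, so this is an approximation, not the authors' computation):

```python
import math

hr = 0.47            # published HR for Sex
beta = math.log(hr)  # underlying Cox coefficient

# Back-derive the standard error from the published 95% CI (0.27-0.83)
se = (math.log(0.83) - math.log(0.27)) / (2 * 1.96)

lo = math.exp(beta - 1.96 * se)
hi = math.exp(beta + 1.96 * se)
```

The reconstructed bounds land close to the published 0.27–0.83, the small mismatch coming only from rounding in the reported values.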
Base classifiers performance.
| Model | Set | AUC-ROC | AUC-PR | Brier score | F1 score | Accuracy | Recall | Precision |
|---|---|---|---|---|---|---|---|---|
| | Test | 0.509 (0.500–0.527) | 0.222 (0.020–0.520) | 0.746 (0.741–0.753) | 0.048 (0.000–0.095) | 0.954 (0.945–0.959) | 0.036 (0.000–0.071) | 0.072 (0.000–0.143) |
| | Train | 0.568 (0.534–0.601) | 0.280 (0.251–0.297) | 0.543 (0.543–0.544) | 0.079 (0.072–0.087) | 0.957 (0.956–0.957) | 0.536 (0.429–0.679) | 0.043 (0.038–0.048) |
| | Test | 0.511 (0.507–0.514) | 0.094 (0.020–0.148) | 0.596 (0.566–0.634) | 0.068 (0.025–0.120) | 0.904 (0.856–0.930) | 0.095 (0.000–0.143) | 0.056 (0.000–0.096) |
| | Train | 0.522 (0.518–0.527) | 0.104 (0.095–0.121) | 0.587 (0.577–0.593) | 0.079 (0.069–0.086) | 0.913 (0.907–0.918) | 0.093 (0.079–0.112) | 0.076 (0.059–0.092) |
| | Test | 0.509 (0.508–0.509) | 0.187 (0.020–0.520) | 0.044 (0.041–0.045) | 0.040 (0.000–0.080) | 0.956 (0.955–0.959) | 0.030 (0.000–0.060) | 0.052 (0.000–0.104) |
| | Train | 0.570 (0.551–0.590) | 0.106 (0.092–0.117) | 0.044 (0.043–0.045) | 0.028 (0.010–0.046) | 0.956 (0.955–0.957) | 0.015 (0.000–0.031) | 0.167 (0.000–0.300) |
| | Test | | 0.054 (0.020–0.121) | 0.547 (0.547–0.548) | 0.020 (0.000–0.059) | 0.953 (0.952–0.953) | 0.012 (0.000–0.036) | 0.056 (0.000–0.167) |
| | Train | 0.582 (0.553–0.611) | 0.113 (0.068–0.138) | 0.549 (0.546–0.552) | 0.056 (0.039–0.074) | 0.951 (0.949–0.953) | 0.036 (0.015–0.047) | 0.138 (0.040–0.196) |
LR: Logistic Regression, DT: Decision Tree, RF: Random Forest, AdaBoost: Adaptive Boosting, AUC-ROC: Area Under the Receiver Operating Characteristic Curve, AUC-PR: Area Under the Precision–Recall Curve. Results obtained on the test and training sets. For each metric, the mean value and its 95% confidence interval are reported. Bold: best-performing model, as before.
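Among the metrics above, the Brier score is the one that directly rewards calibrated probabilities: it is the mean squared error between the predicted probability and the binary outcome, so lower is better. A minimal reference implementation:

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# A perfectly confident, correct model scores 0; a 50/50 guesser scores 0.25.
perfect = brier_score([0, 1, 1], [0.0, 1.0, 1.0])
coin    = brier_score([0, 1, 1], [0.5, 0.5, 0.5])
```

This is why probability calibration can improve the Brier score even when ranking metrics such as AUC-ROC barely move: calibration reshapes the probabilities without changing their order.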
Figure 1. Enhanced base classifiers performance: summary.
Figure 2. Enhanced base classifiers performance with Monte Carlo data augmentation: summary.
Figure 3. Confusion matrix: original vs. synthetic samples.
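Figure 2 refers to Monte Carlo data augmentation. The paper's exact scheme is not shown in this excerpt, but a common variant (an assumption here) draws synthetic minority samples by adding small Gaussian perturbations to real ones:

```python
import random

def mc_augment(samples, n_new, sigma=0.05, seed=0):
    """Generate n_new synthetic samples by jittering real ones with Gaussian noise."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        base = rng.choice(samples)
        out.append([v + rng.gauss(0.0, sigma) for v in base])
    return out

# Toy minority-class feature vectors (hypothetical, 2 features each)
minority = [[1.0, 2.0], [1.2, 1.8]]
synthetic = mc_augment(minority, n_new=5)
```

Because each synthetic point stays close to a real one, a confusion matrix comparing original vs. synthetic samples (Figure 3) is a natural sanity check that the augmented data remains class-consistent.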
Enhanced base learners performance (selected cases summary).
| Model | AUC-ROC | AUC-PR | Precision | Recall |
|---|---|---|---|---|
| | 0.573 | 0.311 | 0.055 | 0.548 |
| | 0.541 | 0.317 | 0.045 | 0.572 |
| | 0.543 | 0.364 | 0.047 | 0.667 |
| | 0.539 | 0.262 | 0.049 | 0.453 |
| | 0.539 | 0.273 | 0.049 | 0.476 |
| | 0.530 | 0.448 | 0.044 | 0.845 |
| | 0.535 | 0.280 | 0.050 | 0.488 |
| | 0.529 | 0.172 | 0.046 | 0.476 |
LR: Logistic Regression, DT: Decision Tree, RF: Random Forest, AdaBoost: Adaptive Boosting, AUC-ROC: Area Under the Receiver Operating Characteristic Curve, AUC-PR: Area Under the Precision–Recall Curve, SMOTE: Synthetic Minority Over-sampling Technique, ADASYN: Adaptive Synthetic sampling, RUS: Random Under-Sampling, SMOTEENN: Synthetic Minority Over-sampling Technique with Edited Nearest Neighbours, w/: with, w/o: without, prob. cal.: probability calibration, D.A.: data augmentation. Only results obtained on the test set are reported. For each metric, the mean value and its 95% confidence interval are reported. Bold: best-performing model, as before.
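Unlike Random UnderSampling, SMOTE creates new minority samples rather than discarding majority ones: each synthetic point lies on the segment between a real minority sample and one of its minority-class neighbours. A stripped-down sketch of that interpolation (for brevity, the nearest-neighbour search of real SMOTE is replaced here by a random minority partner):

```python
import random

def smote_like(minority, n_new, seed=0):
    """Create synthetic samples by interpolating between pairs of minority samples."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # interpolation factor in [0, 1)
        out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return out

# Toy minority-class points inside the unit square
minority = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
synthetic = smote_like(minority, n_new=4)
```

Every synthetic point stays inside the convex hull of the real minority samples, which is the property that distinguishes SMOTE-style interpolation from noise-based augmentation.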