Qing Yang1, Sunan Gao2, Junfen Lin1, Ke Lyu3, Zexu Wu3, Yuhao Chen3, Yinwei Qiu1, Yanrong Zhao1, Wei Wang1, Tianxiang Lin1, Huiyun Pan4, Ming Chen5,6.
Abstract
BACKGROUND: Biological age (BA) has been recognized as a more accurate indicator of aging than chronological age (CA). However, current work has three limitations: insufficient attention to the incompleteness of the medical data used to construct BA; a lack of machine learning-based BA (ML-BA) for the Chinese population; and neglect of how the degree of model overfitting affects the stability of the association results. METHODS AND
Keywords: Biological age; Biological features; Health status; Interpolation; Machine learning; Stacking
Year: 2022 PMID: 36192681 PMCID: PMC9528174 DOI: 10.1186/s12859-022-04966-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 The analytical flowchart of our study. *ML-BA, machine learning-based biological age; STK-BA, stacking model-based biological age; XGB-BA, XGBoost-based biological age; ABSI, A Body Shape Index; WHtR, Waist-to-height ratio
Fig. 2 Imputation results of different methods on missing completely at random (MCAR; A, B) and missing not at random (MNAR; C, D) simulation datasets. Correlation between biological features and chronological age (E)
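The two missingness mechanisms named in Fig. 2 can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the study's procedure: the function names `mask_mcar` and `mask_mnar`, the missing rate, and the linear dependence of the MNAR probability on the value itself are all hypothetical choices.

```python
import random


def mask_mcar(values, rate, rng):
    """MCAR: hide each value with the same probability, whatever its size."""
    return [None if rng.random() < rate else v for v in values]


def mask_mnar(values, rate, rng):
    """MNAR (assumed form): larger values are more likely to be hidden.

    The hiding probability rises linearly from 0 at the minimum to
    2 * rate at the maximum (capped at 1), so roughly uniform data lose
    about `rate` of their entries overall.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    masked = []
    for v in values:
        p_hide = min(1.0, 2.0 * rate * (v - lo) / span)
        masked.append(None if rng.random() < p_hide else v)
    return masked


def observed_mean(masked):
    """Mean of the entries that survived masking."""
    kept = [v for v in masked if v is not None]
    return sum(kept) / len(kept)
```

Under MCAR the observed mean stays close to the true mean, whereas under this MNAR scheme it is pulled downwards because high values are preferentially lost, which is why the two settings are usually evaluated separately.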
RMSE, R2, MAE, and Pearson’s correlation of ML-BA models
| Model | Training set (80%) RMSE | Training R2 | Training MAE | Training Pearson’s correlation | Test set (20%) RMSE | Test R2 | Test MAE | Test Pearson’s correlation |
|---|---|---|---|---|---|---|---|---|
| Stacking (SVM) | 5.765 | 0.438 | 4.349 | 0.661 | 5.776 | 0.435 | 4.352 | 0.659 |
| Stacking (MLR) | 5.788 | 0.431 | 4.418 | 0.657 | 5.786 | 0.431 | 4.414 | 0.656 |
| Stacking (RF) | 2.786 | 0.900 | 2.094 | 0.949 | 5.828 | 0.422 | 4.444 | 0.650 |
| XGBoost | 4.988 | 0.578 | 3.780 | 0.760 | 5.869 | 0.414 | 4.489 | 0.643 |
| CatBoost | 3.674 | 0.771 | 2.739 | 0.878 | 5.893 | 0.409 | 4.494 | 0.640 |
| LGBM | 4.128 | 0.711 | 3.097 | 0.843 | 5.926 | 0.403 | 4.538 | 0.634 |
| GBDT | 5.513 | 0.484 | 4.239 | 0.696 | 5.951 | 0.397 | 4.579 | 0.630 |
| Extra Trees | 0.000 | 1.000 | 0.000 | 1.000 | 6.319 | 0.321 | 4.889 | 0.566 |
| DNN | 6.251 | 0.341 | 4.869 | 0.584 | 6.419 | 0.299 | 5.014 | 0.547 |
| CNN | 5.918 | 0.409 | 4.583 | 0.640 | 6.467 | 0.289 | 5.016 | 0.537 |
| GAM | 6.516 | 0.279 | 5.094 | 0.529 | 6.509 | 0.280 | 5.072 | 0.529 |
| MLR | 6.692 | 0.240 | 5.238 | 0.490 | 6.691 | 0.239 | 5.224 | 0.489 |
| AdaBoost | 6.986 | 0.172 | 5.499 | 0.414 | 6.994 | 0.168 | 5.501 | 0.409 |
Bold indicates the performance of the final selected model
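The four metrics reported in the table follow standard definitions, and the difference between training and test R2 gives one simple reading of overfitting. The sketch below assumes those standard definitions; the function names are illustrative, the R2 values in `TABLE_R2` are copied from three rows of the table, and the train-minus-test gap is not necessarily the overfitting measure the authors used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, R2, MAE and Pearson's r for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_tot == 0 or var_p == 0:
        raise ValueError("R2 and Pearson's r are undefined for constant input")
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "pearson": cov / math.sqrt(ss_tot * var_p),
    }


# (training R2, test R2) copied from the table above.
TABLE_R2 = {
    "Stacking (SVM)": (0.438, 0.435),
    "Stacking (RF)": (0.900, 0.422),
    "Extra Trees": (1.000, 0.321),
}


def r2_gap(model_name):
    """Training R2 minus test R2; a larger gap suggests more overfitting."""
    train_r2, test_r2 = TABLE_R2[model_name]
    return train_r2 - test_r2
```

Read this way, Stacking (SVM) has a gap of about 0.003, while Extra Trees fits the training set perfectly and loses about 0.68 of R2 on the test set.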
Distribution of BA in male and female study populations
| BA | Group | Min | Max | Median | Mean (SD) | Correlation with CA (P value) |
|---|---|---|---|---|---|---|
| STK-BA | Male | 47.23 | 88.57 | 68.17 | 68.51 (4.16) | 0.604–0.617 (< 0.001) |
| | Female | 43.59 | 88.39 | 67.02 | 67.16 (5.58) | 0.682–0.692 (< 0.001) |
| | Total | 43.59 | 88.57 | 67.61 | 67.77 (5.03) | 0.660–0.668 (< 0.001) |
| XGB-BA1 | Male | 43.48 | 90.94 | 68.17 | 68.47 (4.39) | 0.695–0.706 (< 0.001) |
| | Female | 36.45 | 99.75 | 66.99 | 67.18 (5.68) | 0.756–0.764 (< 0.001) |
| | Total | 36.45 | 99.75 | 67.60 | 67.76 (5.16) | 0.738–0.745 (< 0.001) |
| XGB-BA2 | Male | 44.39 | 92.43 | 68.08 | 68.48 (4.82) | 0.791–0.799 (< 0.001) |
| | Female | 35.37 | 99.66 | 66.96 | 67.17 (6.23) | 0.836–0.842 (< 0.001) |
| | Total | 35.37 | 99.66 | 67.54 | 67.76 (5.67) | 0.822–0.827 (< 0.001) |
Fig. 3 Correlation (A–C) between chronological age (CA) and biological age (BA) and distribution (D–F) of BA in the whole sample
Fig. 4 Associations of STK-BA and XGB-BAs with health risk indicators (A Body Shape Index (ABSI), Waist-to-height ratio (WHtR)). Health risk indicators as continuous variables (ABSI: A, WHtR: B). Health risk indicators as categorical variables (Model 2, ABSI: C, WHtR: D). Model 1 was a crude model; Model 2 was adjusted for CA, BMI, and family disease status
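ABSI and WHtR are named here without formulas. Assuming the commonly used definitions, WHtR = waist circumference / height and ABSI = waist circumference / (BMI^(2/3) × height^(1/2)), with lengths in metres and weight in kilograms, the two indicators can be computed as follows. The function names and example values are illustrative, and the source does not state which units or category cut-offs the study applied.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def whtr(waist_m, height_m):
    """Waist-to-height ratio (both lengths in the same unit)."""
    return waist_m / height_m


def absi(waist_m, weight_kg, height_m):
    """A Body Shape Index, assuming the usual definition:
    waist / (BMI^(2/3) * height^(1/2)), lengths in metres."""
    return waist_m / (bmi(weight_kg, height_m) ** (2.0 / 3.0) * height_m ** 0.5)


# Hypothetical person: 0.85 m waist, 70 kg, 1.70 m tall.
example_whtr = whtr(0.85, 1.70)
example_absi = absi(0.85, 70.0, 1.70)
```

Because ABSI divides waist circumference by powers of BMI and height, it is meant to capture body shape beyond overall body size, which fits its use alongside a BMI-adjusted model in the figure.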
Fig. 5 Associations of STK-BA and XGB-BAs with disease counts (A: Model 1, B: Model 2). The associations between each disease and STK-BA, XGB-BAs (C: Model 2). Model 1 was a crude model; Model 2 was adjusted for CA and family disease status
Associations of STK-BA and XGB-BAs with disease counts
| BA | Model 1* Coef (SE) | Model 1* t-value | Model 1* P | Model 2** Coef (SE) | Model 2** t-value | Model 2** P |
|---|---|---|---|---|---|---|
| STK-BA | 0.025 (0.001) | 24.20 | < 0.001 | 0.008 (0.001) | 5.981 | < 0.001 |
| XGB-BA1 | 0.025 (0.001) | 25.08 | < 0.001 | 0.006 (0.002) | 4.130 | < 0.001 |
| XGB-BA2 | 0.023 (0.001) | 26.76 | < 0.001 | 0.005 (0.002) | 3.205 | 0.001 |
*Model 1 was a crude model
**Model 2 was adjusted for CA and family disease status
Fig. 6 The schematic diagram of the Stacking method
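The general idea of stacking can be sketched in a few lines: base learners are fitted inside a K-fold loop, their out-of-fold predictions become the input features of a meta-learner, and the base learners are then refitted on all data for prediction. The sketch below is a generic pure-Python illustration, not the configuration shown in Fig. 6. The base learners (linear fits on single features), the fold assignment by index, the linear meta-learner, and all function names are assumptions made for the example.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if A is singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear(X, y):
    """Ordinary least squares with intercept; returns predict(row)."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    coef = solve_linear_system(A, b)
    return lambda row: coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def on_features(indices):
    """Toy base learner: a linear fit restricted to the given columns."""
    def fit(X, y):
        model = fit_linear([[row[i] for i in indices] for row in X], y)
        return lambda row: model([row[i] for i in indices])
    return fit


def fit_stacking(X, y, base_fitters, meta_fitter, n_folds=5):
    """Fit a stacked model and return predict(row)."""
    n = len(X)
    # Out-of-fold predictions: one column per base learner.
    oof = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        for m, fitter in enumerate(base_fitters):
            model = fitter([X[i] for i in kept], [y[i] for i in kept])
            for i in held_out:
                oof[i][m] = model(X[i])
    # The meta-learner only ever sees predictions made on unseen rows.
    meta = meta_fitter(oof, y)
    # Refit every base learner on the full data for use at prediction time.
    base_models = [fitter(X, y) for fitter in base_fitters]
    return lambda row: meta([model(row) for model in base_models])
```

A call such as `fit_stacking(X, y, [on_features([0]), on_features([1])], fit_linear)` returns a prediction function. Training the meta-learner on out-of-fold predictions rather than in-sample ones is what keeps it from simply rewarding the base learner that memorised the training data.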