Xingqi Cao, Guanglai Yang, Xurui Jin, Liu He, Xueqin Li, Zhoutao Zheng, Zuyun Liu, Chenkai Wu.
Abstract
Objective: Biological age (BA) has been accepted as a more accurate proxy of aging than chronological age (CA). This study aimed to use machine learning (ML) algorithms to estimate BA in the Chinese population.

Materials and methods: We used data from 9,771 middle-aged and older Chinese adults (≥45 years) who participated in the 2011/2012 wave of the China Health and Retirement Longitudinal Study and were followed until 2018. We used several ML algorithms (e.g., Gradient Boosting Regressor, Random Forest, CatBoost Regressor, and Support Vector Machine) to develop new measures of biological aging (ML-BAs) based on physiological biomarkers. R-squared value and mean absolute error (MAE) were used to identify the best-performing ML-BA. We used logistic regression models to examine the associations of the best ML-BA, and of a conventional aging measure that we previously developed, the Klemera and Doubal method-BA (KDM-BA), with physical disability and mortality.
Keywords: aging measure; biological age; disability; machine learning; mortality
Year: 2021 PMID: 34926482 PMCID: PMC8671693 DOI: 10.3389/fmed.2021.698851
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
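The general idea behind a model-based biological age can be sketched as follows: fit a regression of chronological age on biomarkers, then treat the predicted age as the BA estimate. This is only an illustration under stated assumptions. Ordinary least squares on a single hypothetical biomarker stands in for the ML algorithms named in the abstract, the helper `fit_line` is made up for this sketch, and the data are synthetic rather than taken from CHARLS.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic data (assumption): one hypothetical biomarker and
# chronological age in years for five people.
biomarker = [110.0, 120.0, 130.0, 140.0, 150.0]
chronological_age = [48.0, 55.0, 57.0, 66.0, 69.0]

intercept, slope = fit_line(biomarker, chronological_age)

# In this sketch the model's predicted age plays the role of biological age.
biological_age = [intercept + slope * b for b in biomarker]

# Positive gap: predicted (biological) age exceeds chronological age.
age_gap = [ba - ca for ba, ca in zip(biological_age, chronological_age)]
```

With a least-squares fit that includes an intercept, the gaps between predicted and chronological age average to zero over the fitting sample, so the gap is informative about individuals relative to their peers rather than about the sample as a whole.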
Figure 1. Flow chart of the analytic sample. CHARLS, the China Health and Retirement Longitudinal Study; BADL, basic activities of daily living; IADL, instrumental activities of daily living.
Baseline characteristics of the study population.
| Characteristics | Total | Male | Female | |
|---|---|---|---|---|
| Age, years | 59.1 ± 9.2 | 59.8 ± 9.1 | 58.5 ± 9.2 | |
| <60 years | 5,414 (55.4) | 2,361 (52.0) | 3,053 (58.4) | |
| ≥60 years | 4,357 (44.6) | 2,184 (48.1) | 2,173 (41.6) | |
| ML-BA | 59.4 (5.8) | 60.0 (5.8) | 58.8 (5.8) | |
| KDM-BA | 57.0 (9.9) | 58.2 (9.4) | 56.1 (10.3) | |
| Sex, female | 5,226 (53.5) | – | – | |
| Residence, rural | 6,366 (65.2) | 3,005 (66.1) | 3,361 (64.3) | |
| Education | ||||
| No schooling | 2,882 (29.5) | 601 (13.2) | 2,281 (43.7) | |
| Primary school | 4,018 (41.1) | 2,182 (48.0) | 1,836 (35.1) | |
| Middle school | 1,923 (19.7) | 1,160 (25.5) | 763 (14.6) | |
| High school or more | 948 (9.7) | 602 (13.3) | 346 (6.6) | |
| Marital status | ||||
| Currently married | 8,156 (83.5) | 3,984 (87.7) | 4,172 (79.8) | |
| Others | 1,615 (16.5) | 561 (12.3) | 1,054 (20.2) | |
| Smoking status | ||||
| Non-smoker | 6,797 (69.6) | 1,897 (41.7) | 4,900 (93.8) | |
| Smoker | 2,974 (30.4) | 2,648 (58.3) | 326 (6.2) | |
| Alcohol consumption | ||||
| Non-drinker | 5,973 (61.1) | 1,522 (33.5) | 4,451 (85.2) | |
| Drinker | 3,798 (38.9) | 3,023 (66.5) | 775 (14.8) | |
| BMI (kg/m²) | 23.5 ± 3.9 | 23.0 ± 3.6 | 24.0 ± 4.1 | |
| BMI category | ||||
| Underweight | 650 (6.8) | 315 (7.1) | 335 (6.5) | |
| Normal | 4,990 (52.0) | 2,627 (58.8) | 2,363 (46.1) | |
| Overweight | 2,828 (29.5) | 1,149 (25.7) | 1,679 (32.7) | |
| Obese | 1,130 (11.8) | 377 (8.4) | 753 (14.7) | |
| Disease counts | ||||
| 0 | 2,938 (30.1) | 1,469 (32.3) | 1,469 (28.1) | |
| 1 | 3,110 (31.8) | 1,482 (32.6) | 1,628 (31.2) | |
| 2 | 2,132 (21.8) | 943 (20.8) | 1,189 (22.8) | |
| 3 | 1,591 (16.3) | 651 (14.3) | 940 (18.0) | |
ML-BA, Machine Learning method-biological age; KDM-BA, Klemera and Doubal method-biological age; BMI, body mass index. The continuous variables and categorical variables were expressed as mean ± standard deviation, and number (percentage), respectively.
BMI was calculated as weight in kilograms divided by height in meters squared. Underweight was defined as BMI < 18.5 kg/m².
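The BMI definition in the footnote can be written as a short sketch. The function names and the example weights and heights are illustrative assumptions. Only the underweight cut-off appears in the text above, so the other categories are deliberately not encoded.

```python
UNDERWEIGHT_CUTOFF = 18.5  # kg/m^2, the only threshold stated in the footnote

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

def is_underweight(weight_kg, height_m):
    """True when BMI falls below the underweight cut-off."""
    return bmi(weight_kg, height_m) < UNDERWEIGHT_CUTOFF

# Hypothetical participants, not study data.
example_bmi = bmi(65.0, 1.70)
example_flag = is_underweight(50.0, 1.75)
```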
MAE, MSE, RMSE, and R-squared value of machine learning models.
| Model | MAE | MSE | RMSE | R-squared |
|---|---|---|---|---|
| Gradient boosting regressor | 6.519 | 64.127 | 8.001 | 0.270 |
| Light gradient boosting machine | 6.532 | 64.875 | 8.049 | 0.261 |
| CatBoost regressor | 6.527 | 65.121 | 8.063 | 0.258 |
| Random forest | 6.557 | 65.126 | 8.065 | 0.258 |
| Extra trees regressor | 6.576 | 65.330 | 8.075 | 0.256 |
| Support vector machine | 6.655 | 68.141 | 8.248 | 0.224 |
| AdaBoost regressor | 6.877 | 68.804 | 8.289 | 0.217 |
MAE, Mean Absolute Error; MSE, Mean Square Error; RMSE, Root Mean Square Error.
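For reference, the four metrics in the table can be computed from paired observed and predicted ages as in the minimal sketch below. The function name `regression_metrics` and the toy values are assumptions made for illustration and are not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R-squared for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy chronological ages and hypothetical model predictions.
ages = [50.0, 60.0, 70.0, 80.0]
predicted = [55.0, 58.0, 66.0, 75.0]
metrics = regression_metrics(ages, predicted)
```

Lower MAE, MSE, and RMSE and a higher R-squared indicate predictions closer to chronological age, which is how the rows of the table are ordered.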
Figure 2. Correlations of chronological age with machine learning method-biological age and Klemera and Doubal method-biological age. CA, chronological age; ML-BA, Machine Learning method-biological age; KDM-BA, Klemera and Doubal method-biological age. (A) and (B) show the correlation between CA and the two measures (ML-BA and KDM-BA), respectively.
Unadjusted associations of CA, ML-BA, or KDM-BA with disability and mortality in the full sample and by sex.
| Sample | Model | BADL | IADL | Lower extremity mobility | Upper extremity mobility | Mortality |
|---|---|---|---|---|---|---|
| No. of events/No. of participants | | 1,860/7,797 | 1,935/7,490 | 2,380/4,375 | 1,947/7,698 | 882/9,771 |
| Total | CA only | 1.048 (1.042, 1.054) | 1.045 (1.039, 1.051) | 1.03 (1.02, 1.04) | 1.05 (1.04, 1.06) | 1.11 (1.10, 1.12) |
| | ML-BA only | 1.06 (1.05, 1.07) | 1.06 (1.05, 1.07) | 1.04 (1.03, 1.05) | 1.07 (1.06, 1.08) | 1.16 (1.14, 1.17) |
| | KDM-BA only | 1.043 (1.037, 1.048) | 1.037 (1.031, 1.043) | 1.024 (1.017, 1.031) | 1.04 (1.03, 1.05) | 1.104 (1.096, 1.113) |
| Male | CA only | 1.06 (1.05, 1.07) | 1.06 (1.05, 1.07) | 1.04 (1.03, 1.05) | 1.05 (1.04, 1.04) | 1.10 (1.09, 1.12) |
| | ML-BA only | 1.07 (1.06, 1.09) | 1.08 (1.07, 1.10) | 1.06 (1.05, 1.07) | 1.08 (1.06, 1.09) | 1.14 (1.13, 1.16) |
| | KDM-BA only | 1.06 (1.05, 1.07) | 1.05 (1.04, 1.06) | 1.04 (1.03, 1.05) | 1.05 (1.04, 1.06) | 1.10 (1.09, 1.11) |
| Female | CA only | 1.046 (1.038, 1.054) | 1.043 (1.035, 1.051) | 1.03 (1.02, 1.04) | 1.055 (1.047, 1.064) | 1.13 (1.11, 1.14) |
| | ML-BA only | 1.06 (1.05, 1.08) | 1.06 (1.05, 1.07) | 1.04 (1.02, 1.05) | 1.08 (1.06, 1.09) | 1.17 (1.15, 1.19) |
| | KDM-BA only | 1.04 (1.03, 1.05) | 1.036 (1.029, 1.044) | 1.03 (1.02, 1.04) | 1.045 (1.037, 1.052) | 1.11 (1.10, 1.12) |
CA, chronological age; ML-BA, Machine Learning method-biological age; KDM-BA, Klemera and Doubal method-biological age; BADL, basic activities of daily living; IADL, instrumental activities of daily living; OR, odds ratio; CI, confidence interval.
Participants with prevalent disability in BADL/IADL/lower extremity mobility/upper extremity mobility were excluded for analyses of BADL/IADL/lower extremity mobility/upper extremity mobility, respectively.
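To read the odds ratios in the table on a larger scale, an OR can be rescaled to a multi-unit increment. The sketch below assumes the ORs are per 1-year increase in each age measure, which this excerpt does not state, and that the last column of the table is mortality. The function name `scale_odds_ratio` is hypothetical.

```python
import math

def scale_odds_ratio(or_per_unit, units):
    """Rescale an odds ratio to a larger increment of the predictor.

    Valid when the predictor enters the logistic model linearly,
    so the log-odds change by ln(OR) for every one-unit increase.
    """
    beta = math.log(or_per_unit)   # log-odds change per unit
    return math.exp(beta * units)  # equivalent to or_per_unit ** units

# Assumption: ORs are per 1-year increase in the age measure.
or_ml_ba_mortality = 1.16  # ML-BA only, full sample, last column of the table
or_ca_mortality = 1.11     # CA only, full sample, last column of the table

five_year_ml_ba = scale_odds_ratio(or_ml_ba_mortality, 5)
five_year_ca = scale_odds_ratio(or_ca_mortality, 5)
```

Because odds ratios multiply across increments, small per-unit differences between measures become more visible over a 5-unit span.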
Risk estimates of physical disability and mortality predicted by ML-BA and KDM-BA adjusting for CA.
| Model | Variable | BADL | IADL | Lower extremity mobility | Upper extremity mobility | Mortality |
|---|---|---|---|---|---|---|
| No. of events/No. of participants | | 1,860/7,797 | 1,935/7,490 | 2,380/4,375 | 1,947/7,698 | 882/9,771 |
| CA + ML-BA | CA | 1.04 (1.03, 1.05) | 1.04 (1.03, 1.05) | 1.02 (1.01, 1.03) | 1.04 (1.03, 1.05) | 1.08 (1.06, 1.09) |
| | ML-BA | 1.01 (1.00, 1.03) | 1.02 (1.00, 1.03) | 1.02 (1.00, 1.03) | 1.02 (1.01, 1.03) | 1.07 (1.05, 1.09) |
| CA + KDM-BA | CA | 1.04 (1.02, 1.05) | 1.04 (1.03, 1.06) | 1.03 (1.01, 1.04) | 1.05 (1.03, 1.06) | 1.06 (1.05, 1.08) |
| | KDM-BA | 1.01 (1.00, 1.03) | 1.00 (0.99, 1.01) | 1.00 (0.99, 1.02) | 1.00 (0.99, 1.01) | 1.05 (1.04, 1.07) |
CA, chronological age; ML-BA, machine learning method-biological age; KDM-BA, Klemera and Doubal method-biological age; BADL, basic activities of daily living; IADL, instrumental activities of daily living; OR, odds ratio; CI, confidence interval.
Participants with prevalent disability in BADL/IADL/lower extremity mobility/upper extremity mobility were excluded for analyses of BADL/IADL/lower extremity mobility/upper extremity mobility, respectively.
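The CA-adjusted models summarized in the table can be written in the usual logistic form. The notation below is introduced here as a sketch: the symbols $p$ and $\beta$ do not come from the source, and any covariates beyond CA and the BA measure are not shown in this excerpt.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0 + \beta_{\mathrm{CA}}\,\mathrm{CA} + \beta_{\mathrm{BA}}\,\mathrm{BA},
\qquad
\mathrm{OR}_{\mathrm{BA}} = e^{\beta_{\mathrm{BA}}}
\]
```

Here $p$ is the probability of the outcome (disability or death) and BA is either ML-BA or KDM-BA. Under this form, the OR of 1.07 reported for ML-BA in the last column corresponds to $\beta_{\mathrm{BA}} = \ln 1.07 \approx 0.068$, which is the change in log-odds per unit of ML-BA with CA held fixed.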