Md Adnan Arefeen, Sumaiya Tabassum Nimi, M Sohel Rahman, S Hasan Arshad, John W Holloway, Faisal I Rezwan.
Abstract
Epigenetic aging has been associated with a number of phenotypes and diseases. A few studies have investigated its effect on lung function in relatively older people, but this effect has not been explored in younger populations. This study examines whether lung function in adolescence can be predicted from epigenetic age accelerations (AAs) using machine learning techniques. DNA methylation-based AAs were estimated in 326 matched samples at two time points (ages 10 and 18 years) from the Isle of Wight Birth Cohort. Five machine learning regression models (linear, lasso, ridge, elastic net, and Bayesian ridge) were used to predict FEV1 (forced expiratory volume in one second) and FVC (forced vital capacity) at 18 years from predictor variables chosen by mutual information-based feature selection, together with the change in AA between the two time points. The best models were ridge regression for FEV1 (R² = 75.21% ± 7.42%; RMSE = 0.3768 ± 0.0653) and elastic net regression for FVC (R² = 75.38% ± 6.98%; RMSE = 0.445 ± 0.069). This study suggests that applying machine learning while tracking changes in AA over the life span can be beneficial for assessing lung health in adolescence.
Keywords: epigenetic aging; feature selection; hyperparameter tuning; lung function; machine learning
Year: 2020 PMID: 33182250 PMCID: PMC7712054 DOI: 10.3390/mps3040077
Source DB: PubMed Journal: Methods Protoc ISSN: 2409-9279
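The abstract describes comparing five linear regression models and reporting mean ± spread of R² and RMSE. The following is a minimal sketch of such a comparison, assuming scikit-learn, standardized predictors and shuffled 5-fold cross-validation; the resampling scheme and tuned hyperparameters are not given in this section, so `alpha=0.01`, `alpha=1.0` and `l1_ratio=0.5` are placeholders, not the authors' values. The helper `evaluate_models` and the synthetic data are hypothetical, and "±" is taken here to be the standard deviation across folds.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_models(X, y, n_splits=5, seed=0):
    """Cross-validate five regression models; summarize R² (%) and RMSE.

    Returns {model name: (mean R², SD R², mean RMSE, SD RMSE)} across folds.
    Hyperparameter values are placeholders, not tuned values from the study.
    """
    models = {
        "Linear": LinearRegression(),
        "Lasso": Lasso(alpha=0.01),
        "Ridge": Ridge(alpha=1.0),
        "Elastic Net": ElasticNet(alpha=0.01, l1_ratio=0.5),
        "Bayesian Ridge": BayesianRidge(),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # The scaler is fitted inside each training fold, so no test data leaks in.
        pipeline = make_pipeline(StandardScaler(), model)
        scores = cross_validate(
            pipeline, X, y, cv=cv,
            scoring=("r2", "neg_root_mean_squared_error"),
        )
        r2 = 100 * scores["test_r2"]                        # R² as a percentage
        rmse = -scores["test_neg_root_mean_squared_error"]  # undo sklearn's sign convention
        results[name] = (r2.mean(), r2.std(), rmse.mean(), rmse.std())
    return results


if __name__ == "__main__":
    # Synthetic stand-in: 326 subjects, 4 predictors (not cohort data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(326, 4))
    y = X @ np.array([0.6, 0.3, 0.2, 0.5]) + rng.normal(scale=0.4, size=326)
    for name, (r2_m, r2_sd, rmse_m, rmse_sd) in evaluate_models(X, y).items():
        print(f"{name:15s} R2 = {r2_m:.2f} ± {r2_sd:.2f}  RMSE = {rmse_m:.4f} ± {rmse_sd:.4f}")
```

Putting the scaler inside the pipeline keeps each fold's preprocessing independent of its held-out data, which matters for the penalized models because their penalties depend on feature scale.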
Figure 1. Mutual information score between each feature and the target, forced expiratory volume in one second (FEV1) at age 18. A mutual information score > 0.1 was used as a threshold for selecting the best features. AA, age acceleration; IEAA, intrinsic epigenetic age acceleration.
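The thresholding rule in Figure 1 can be sketched with scikit-learn's `mutual_info_regression`, which estimates mutual information between each column and a continuous target using a k-nearest-neighbour method. This is an illustrative sketch, not necessarily the estimator the authors used; `select_by_mutual_information` is a hypothetical helper and the demo data are synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def select_by_mutual_information(X, y, feature_names, threshold=0.1, seed=0):
    """Return (selected names, {name: MI score}) for features with MI > threshold."""
    # k-nearest-neighbour estimate of MI between each column of X and y.
    scores = mutual_info_regression(X, y, random_state=seed)
    score_by_name = dict(zip(feature_names, scores))
    selected = [name for name, score in score_by_name.items() if score > threshold]
    return selected, score_by_name


if __name__ == "__main__":
    # Synthetic data: two informative columns and two pure-noise columns.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)
    names = ["informative_a", "informative_b", "noise_a", "noise_b"]
    selected, scores = select_by_mutual_information(X, y, names)
    print(selected)
    print({k: round(float(v), 3) for k, v in scores.items()})
```

For a binary predictor such as sex, `mutual_info_regression` also accepts a `discrete_features` mask so that column is treated as discrete rather than continuous.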
Results of five regression models predicting FEV1 using the best features.
| Regression Model | R² (%) | RMSE |
|---|---|---|
| Linear | 74.98 ± 7.45 | 0.3781 ± 0.0638 |
| Lasso | 74.99 ± 7.45 | 0.3801 ± 0.0519 |
| Ridge | 75.03 ± 7.37 | 0.3780 ± 0.0639 |
| Elastic Net | 75.00 ± 7.41 | 0.3781 ± 0.0640 |
| Bayesian Ridge | 75.01 ± 7.42 | 0.3780 ± 0.0639 |
The models were developed using the four best features (height, sex, and weight at age 18, and FEV1 at age 10) as predictors of FEV1. Here, R² is the average goodness-of-fit measure for the regression models, expressed as a percentage, and RMSE is the average root mean squared error.
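For reference, the two metrics can be written per evaluation fold and then averaged. This assumes the averages are taken over $K$ cross-validation folds with held-out sets $\mathcal{T}_k$, which this section does not state explicitly; $\bar{y}_k$ is the mean observed value in fold $k$ and $\hat{y}_i$ the model prediction.

```latex
\begin{aligned}
R^2_k &= 100\left(1 - \frac{\sum_{i \in \mathcal{T}_k} (y_i - \hat{y}_i)^2}{\sum_{i \in \mathcal{T}_k} (y_i - \bar{y}_k)^2}\right), &
\mathrm{RMSE}_k &= \sqrt{\frac{1}{|\mathcal{T}_k|} \sum_{i \in \mathcal{T}_k} (y_i - \hat{y}_i)^2}, \\
\overline{R^2} &= \frac{1}{K} \sum_{k=1}^{K} R^2_k, &
\overline{\mathrm{RMSE}} &= \frac{1}{K} \sum_{k=1}^{K} \mathrm{RMSE}_k .
\end{aligned}
```

Under this reading, the ± values in the tables would be the spread of $R^2_k$ and $\mathrm{RMSE}_k$ across folds.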
Results of five regression models predicting FEV1 using the best features and AAdiff.
| Regression Model | R² (%) | RMSE |
|---|---|---|
| Linear | 75.16 ± 7.49 | 0.3770 ± 0.0652 |
| Lasso | 75.16 ± 7.49 | 0.3770 ± 0.0652 |
| Ridge | 75.21 ± 7.42 | 0.3768 ± 0.0653 |
| Elastic Net | 75.16 ± 7.49 | 0.3770 ± 0.0653 |
| Bayesian Ridge | 75.19 ± 7.46 | 0.3768 ± 0.0652 |
The models were developed using the four best features (height, sex, and weight at age 18, and FEV1 at age 10) together with AAdiff as predictors of FEV1. Here, AAdiff = AA at 18 − AA at 10, R² is the average goodness-of-fit measure for the regression models, expressed as a percentage, and RMSE is the average root mean squared error.
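Computing AAdiff and appending it to the selected features is a per-participant subtraction. A minimal sketch, assuming the feature matrix and both AA vectors are already aligned row by row on the same matched participants; `add_aa_diff` is a hypothetical helper and the example values are made up.

```python
import numpy as np


def add_aa_diff(X_best, aa_age10, aa_age18):
    """Append AAdiff = AA at 18 - AA at 10 as a new last column of X_best.

    Rows of all three inputs must refer to the same participant (matched samples).
    """
    X_best = np.asarray(X_best, dtype=float)
    aa_age10 = np.asarray(aa_age10, dtype=float)
    aa_age18 = np.asarray(aa_age18, dtype=float)
    if not (len(X_best) == len(aa_age10) == len(aa_age18)):
        raise ValueError("feature matrix and AA vectors must have one row per participant")
    aa_diff = aa_age18 - aa_age10  # positive: age acceleration increased between 10 and 18
    return np.column_stack([X_best, aa_diff])


# Hypothetical values for two participants: height, sex (0/1), weight, FEV1 at age 10.
X_best = [[172.0, 1, 65.0, 2.1], [160.0, 0, 54.0, 1.8]]
print(add_aa_diff(X_best, aa_age10=[0.5, -1.0], aa_age18=[1.5, -0.5]))
```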
Results of five regression models predicting FVC using the best features.
| Regression Model | R² (%) | RMSE |
|---|---|---|
| Linear | 75.24 ± 7.10 | 0.4455 ± 0.0692 |
| Lasso | 75.25 ± 7.08 | 0.4456 ± 0.0680 |
| Ridge | 75.24 ± 7.00 | 0.4458 ± 0.0673 |
| Elastic Net | 75.35 ± 6.88 | 0.4450 ± 0.0673 |
| Bayesian Ridge | 75.25 ± 7.07 | 0.4456 ± 0.0678 |
The models were developed using the four best features (height, sex, and weight at age 18, and FVC at age 10) as predictors of FVC. Here, R² is the average goodness-of-fit measure for the regression models, expressed as a percentage, and RMSE is the average root mean squared error.
Results of five regression models predicting FVC using the best features and AAdiff.
| Regression Model | R² (%) | RMSE |
|---|---|---|
| Linear | 75.26 ± 7.14 | 0.4456 ± 0.0693 |
| Lasso | 75.27 ± 7.12 | 0.4456 ± 0.0692 |
| Ridge | 75.28 ± 7.12 | 0.4455 ± 0.0691 |
| Elastic Net | 75.38 ± 6.98 | 0.4448 ± 0.0690 |
| Bayesian Ridge | 75.28 ± 7.13 | 0.4455 ± 0.0692 |
The models were developed using the four best features (height, sex, and weight at age 18, and FVC at age 10) together with AAdiff as predictors of FVC. Here, AAdiff = AA at 18 − AA at 10, R² is the average goodness-of-fit measure for the regression models, expressed as a percentage, and RMSE is the average root mean squared error.
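How much AAdiff adds can be read off by differencing the mean R² values reported in the four tables above, model by model. This is plain arithmetic on the published numbers, not a re-analysis; the names `REPORTED_R2` and `r2_gain_from_aa_diff` are illustrative.

```python
# Mean R² (%) transcribed from the four result tables.
REPORTED_R2 = {
    "FEV1": {
        "best features": {"Linear": 74.98, "Lasso": 74.99, "Ridge": 75.03,
                          "Elastic Net": 75.00, "Bayesian Ridge": 75.01},
        "best features + AAdiff": {"Linear": 75.16, "Lasso": 75.16, "Ridge": 75.21,
                                   "Elastic Net": 75.16, "Bayesian Ridge": 75.19},
    },
    "FVC": {
        "best features": {"Linear": 75.24, "Lasso": 75.25, "Ridge": 75.24,
                          "Elastic Net": 75.35, "Bayesian Ridge": 75.25},
        "best features + AAdiff": {"Linear": 75.26, "Lasso": 75.27, "Ridge": 75.28,
                                   "Elastic Net": 75.38, "Bayesian Ridge": 75.28},
    },
}


def r2_gain_from_aa_diff(target):
    """Percentage-point change in mean R² when AAdiff is added, per model."""
    base = REPORTED_R2[target]["best features"]
    with_aa = REPORTED_R2[target]["best features + AAdiff"]
    return {model: round(with_aa[model] - base[model], 2) for model in base}


for target in REPORTED_R2:
    # FEV1 gains are 0.16 to 0.18 points; FVC gains are 0.02 to 0.04 points.
    print(target, r2_gain_from_aa_diff(target))
```

Every model improves slightly when AAdiff is added, by amounts well below the fold-to-fold standard deviations of roughly 7 points reported alongside them; the tables alone do not indicate whether these gains are statistically significant.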
Figure 2. Impact of the hyperparameter α on (a) ridge regression and (b) elastic net regression.
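A curve like the one in Figure 2 can be sketched with scikit-learn's `validation_curve`, which refits a model across cross-validation folds for each candidate value of one parameter. Assumptions, none stated in this section: scikit-learn's `alpha` plays the role of α, 5-fold CV, R² as the score, a log-spaced grid, `l1_ratio=0.5` for the elastic net, and synthetic data. `alpha_curve` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import KFold, validation_curve


def alpha_curve(model, X, y, alphas, n_splits=5, seed=0):
    """Mean cross-validated R² (%) of `model` for each regularization strength in `alphas`."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    _, test_scores = validation_curve(
        model, X, y, param_name="alpha", param_range=alphas, cv=cv, scoring="r2"
    )
    # test_scores has shape (len(alphas), n_splits); average over folds.
    return 100 * test_scores.mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(326, 4))
    y = X @ np.array([0.6, 0.3, 0.2, 0.5]) + rng.normal(scale=0.4, size=326)
    alphas = np.logspace(-3, 3, 7)  # 0.001 to 1000
    for name, model in {"Ridge": Ridge(), "Elastic Net": ElasticNet(l1_ratio=0.5)}.items():
        curve = alpha_curve(model, X, y, alphas)
        print(name, "best alpha:", alphas[np.argmax(curve)], "R2 (%):", np.round(curve, 2))
```

As α grows, both models shrink their coefficients toward zero and held-out R² eventually falls; the elastic net also sets coefficients exactly to zero at large α, which is why its curve can drop more sharply than ridge's.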