Atef Zaguia1, Deepak Pandey2, Sandeep Painuly2, Saurabh Kumar Pal2, Vivek Kumar Garg3, Neelam Goel2.
Abstract
PURPOSE: Age can be an important clue in uncovering the identity of persons who left biological evidence at crime scenes. With the availability of DNA methylation data, several age prediction models have been developed using statistical and machine learning methods. Epigenetic studies have demonstrated a close association between aging and DNA methylation. Most existing studies focused on healthy samples, whereas diseases may have a significant impact on human age. Therefore, this article proposes an age prediction model that uses DNA methylation biomarkers for both healthy and diseased samples.
Year: 2022 PMID: 35111213 PMCID: PMC8803417 DOI: 10.1155/2022/8393498
Source DB: PubMed Journal: Comput Intell Neurosci
Data collection for healthy individuals.
| DNA origin | Platform (K) | No. of samples | Age range (years) | GEO accession |
|---|---|---|---|---|
| Blood PBMC 1 | 27 | 80 | 3.6–18 | GSE27097 |
| Whole blood | 27 | 93 | 49–74 | GSE20236 |
| Blood CD4 + CD14 | 27 | 50 | 16–69 | GSE20242 |
| White blood | 27 | 60 | 18–89 | GSE32396 |
| Blood PBMC | 27 | 80 | 24–45 | GSE37008 |
| Whole blood | 450 | 91 | 26–101 | GSE40279 |
CD: cluster differentiation; PBMC: peripheral blood mononuclear cell.
Data collection for unhealthy individuals.
| DNA origin | Platform (K) | No. of samples | Age range (years) | GEO accession |
|---|---|---|---|---|
| Blood | 27 | 80 | 23–85 | GSE49904 |
| Whole blood | 27 | 100 | 50–85 | GSE19711 |
| Whole blood | 27 | 120 | 1–32 | GSE20067 |
| White blood | 27 | 62 | 16–86 | GSE41037 |
| Blood | 450 | 38 | 34–72 | GSE51032 |
Figure 1: (a) Relation between beta-value of CpG site and age. (b) Relation between beta-value of CpG site and age.
Figure 2: Proposed methodology.
Figure 3: Histogram of the age distribution for (a) healthy individuals; (b) diseased individuals.
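The beta-value plotted against age in Figure 1 is a methylation level between 0 and 1. The record does not say how it was computed, so the sketch below assumes the usual array-based definition, in which the methylated signal is divided by the total signal plus a small stabilizing offset. The function name, the offset default of 100, and the intensities are illustrative assumptions, not values from the study.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Return M / (M + U + offset), a methylation level in [0, 1).

    `offset` is an assumed stabilizing constant that avoids division by
    very small totals; 100 is a commonly used default for array data.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Made-up probe intensities for a single CpG site (illustration only)
beta = beta_value(methylated=3000.0, unmethylated=900.0)
print(f"beta = {beta:.2f}")
```

A site with a beta-value near 1 is almost fully methylated in the sample, and a value near 0 is almost unmethylated.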
Results of four algorithms on the healthy dataset, training split.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.80 | 5.24 | 7.43 |
| Support Vector Regression | 0.70 | 6.63 | 9.08 |
| Multiple Linear Regression | 0.71 | 6.78 | 8.97 |
| Random Forest Regression | 0.96 | 2.51 | 3.48 |
Best result: Random Forest Regressor.
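The error columns in these results tables can be computed from predicted and chronological ages. The sketch below assumes that MAD denotes the mean absolute deviation between predicted and true age and that RMSE is the root mean squared error, both in years. The function names and the sample ages are hypothetical and do not come from the study.

```python
import math


def mean_absolute_deviation(true_ages, predicted_ages):
    """Mean of |predicted - true| in years (assumed meaning of MAD)."""
    errors = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def root_mean_squared_error(true_ages, predicted_ages):
    """Square root of the mean squared prediction error, in years."""
    squared = [(p - t) ** 2 for t, p in zip(true_ages, predicted_ages)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up ages for illustration only
true_ages = [20.0, 35.0, 50.0, 65.0]
predicted_ages = [23.0, 31.0, 50.0, 70.0]

mad = mean_absolute_deviation(true_ages, predicted_ages)
rmse = root_mean_squared_error(true_ages, predicted_ages)
print(f"MAD = {mad:.2f} years, RMSE = {rmse:.2f} years")
```

RMSE is never smaller than MAD on the same predictions because squaring weights large errors more heavily, which matches the pattern in every row of the tables.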
Results of four algorithms on the healthy dataset, unseen independent (testing) split.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.77 | 5.28 | 7.67 |
| Support Vector Regression | 0.72 | 5.83 | 8.47 |
| Multiple Linear Regression | 0.78 | 4.92 | 7.59 |
| Random Forest Regression | 0.78 | 5.02 | 7.49 |
Best result: Random Forest Regressor.
Results on the healthy dataset, training split, after hyperparameter tuning.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.84 | 4.96 | 6.69 |
| Random Forest Regression | 0.87 | 4.51 | 6.08 |
Results on the healthy dataset, testing split, after hyperparameter tuning.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.76 | 5.32 | 7.84 |
| Random Forest Regression | 0.81 | 4.85 | 7.01 |
Figure 4: Results for healthy data with optimized Random Forest model: (a) training data; (b) testing data.
Results of four algorithms on the unhealthy dataset, training split.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.75 | 8.0 | 10.68 |
| Support Vector Regression | 0.40 | 12.94 | 16.48 |
| Multiple Linear Regression | 0.56 | 11.49 | 14.10 |
| Random Forest Regression | 0.94 | 3.83 | 5.18 |
Best result: Random Forest Regressor.
Results of four algorithms on the unhealthy dataset, unseen independent (testing) split.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.53 | 10.40 | 13.45 |
| Support Vector Regression | 0.37 | 12.05 | 15.58 |
| Multiple Linear Regression | 0.46 | 11.52 | 14.40 |
| Random Forest Regression | 0.57 | 9.53 | 12.88 |
Best result: Random Forest Regressor.
Results on the unhealthy dataset, training split, after hyperparameter tuning.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.92 | 4.61 | 6.00 |
| Random Forest Regression | 0.92 | 4.75 | 6.18 |
Results on the unhealthy dataset, testing split, after hyperparameter tuning.
| Algorithm | | MAD | RMSE |
|---|---|---|---|
| Gradient Boosting Regression | 0.62 | 10.28 | 13.17 |
| Random Forest Regression | 0.56 | 9.67 | 13.07 |
Figure 5: Results for unhealthy data with optimized Random Forest model: (a) training data; (b) testing data.
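One way to read the tuned Random Forest results is to compare training and testing error for each dataset. The sketch below takes the MAD values from the post-tuning tables above and computes how far the testing error exceeds the training error. The dictionary layout and the helper name `generalization_gap` are illustrative choices, and the comparison is a simple subtraction rather than an analysis reported by the authors.

```python
# (training MAD, testing MAD) in years for the tuned Random Forest,
# copied from the post-tuning results tables.
tuned_rf_mad = {
    "healthy": (4.51, 4.85),
    "unhealthy": (4.75, 9.67),
}


def generalization_gap(train_error, test_error):
    """Testing error minus training error, rounded to two decimals."""
    return round(test_error - train_error, 2)


gaps = {name: generalization_gap(*pair) for name, pair in tuned_rf_mad.items()}
for name, gap in gaps.items():
    print(f"{name}: testing MAD exceeds training MAD by {gap} years")
```

The gap is much larger for the unhealthy dataset, which suggests the tuned model generalizes less well to diseased samples than to healthy ones.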