Imene Garali, Mourad Sahbatou, Antoine Daunay, Laura G Baudrin, Victor Renault, Yosra Bouyacoub, Jean-François Deleuze, Alexandre How-Kit.
Abstract
Several blood-based age prediction models have been developed using from fewer than a dozen to more than a hundred DNA methylation biomarkers. Only one model (Z-P1), based on pyrosequencing, has been developed using DNA methylation of a single locus located in the ELOVL2 promoter, which is considered one of the best age-prediction biomarkers. Although multi-locus models generally perform better than the single-locus model, they require more DNA and show more inter-laboratory variation, which affects the predictions. Here we developed 17,018 single-locus age prediction models based on DNA methylation of the ELOVL2 promoter from pooled data of four different studies (training set of 1,028 individuals aged from 0 to 91 years), using six different statistical approaches and testing every combination of the 7 CpGs, with the aim of improving prediction performance and reducing the effects of inter-laboratory variation. Compared to the Z-P1 model, three statistical models with the optimal combinations of CpGs showed improved performance (MAD of 4.41-4.77 in the testing set of 385 individuals) and no age-dependent bias. In an independent testing set of 100 individuals (19-65 years), we showed that prediction accuracy could be further improved by using different CpG combinations and increasing the number of technical replicates (MAD of 4.17).
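The abstract does not say how the figure of 17,018 models is reached. One reading that reproduces it exactly, offered here as an inference and not as the authors' stated procedure, is that five of the six approaches were fitted on every non-empty subset of the 7 CpGs, while the quadratic regression could additionally select each CpG's squared term independently. The sketch below simply enumerates those subsets; the names `linear_terms` and `squared_terms` are illustrative.

```python
from itertools import combinations

N_CPGS = 7  # CpGs measured in the ELOVL2 promoter


def nonempty_subsets(items):
    """Yield every non-empty combination of the given items."""
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)


linear_terms = [f"CpG{i}" for i in range(1, N_CPGS + 1)]
squared_terms = [f"CpG{i}^2" for i in range(1, N_CPGS + 1)]

# Assumption: five approaches use only the 7 CpGs, i.e. 2**7 - 1 subsets each.
per_linear_model = sum(1 for _ in nonempty_subsets(linear_terms))

# Assumption: quadratic regression picks linear and squared terms independently,
# i.e. 2**14 - 1 subsets of the 14 candidate predictors.
per_quadratic_model = sum(1 for _ in nonempty_subsets(linear_terms + squared_terms))

total_models = 5 * per_linear_model + per_quadratic_model
print(per_linear_model, per_quadratic_model, total_models)  # 127 16383 17018
```

Under these assumptions the count is 5 × 127 + 16,383 = 17,018, which matches the number quoted in the abstract.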
Year: 2020 PMID: 32973211 PMCID: PMC7515898 DOI: 10.1038/s41598-020-72567-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Correlation between chronological age and DNA methylation for the seven CpGs analyzed located in the ELOVL2 promoter.
| CpG | Chromosome location (GRCh38) | Bekaert [27] (n = 206) | Zbiec-Piekarska [31] (n = 420) | Park [28] (n = 692) | Cho [32] (n = 95) | All (n = 1414) |
|---|---|---|---|---|---|---|
| 1 | Chr6: 11,044,661 | 0.898 | 0.837 | 0.940 | 0.860 | 0.904 |
| 2 | Chr6: 11,044,655 | 0.915 | 0.799 | 0.920 | 0.834 | 0.884 |
| 3 | Chr6: 11,044,647 | 0.866 | 0.803 | 0.897 | 0.818 | 0.852 |
| 4 | Chr6: 11,044,644 | 0.912 | 0.841 | 0.902 | 0.872 | 0.851 |
| 5 | Chr6: 11,044,642 | 0.924 | 0.881 | 0.906 | 0.871 | 0.893 |
| 6 | Chr6: 11,044,640 | 0.939 | 0.877 | 0.935 | 0.821 | 0.911 |
| 7 | Chr6: 11,044,634 | 0.876 | 0.910 | 0.907 | 0.887 | 0.878 |
Age prediction performances of the different statistical models on the training and testing sets.
| Model | Best performance from Training (T)/Testing (V) sets (a) | Number of CpGs | CpG combination | Training set R | Training set MAD | Training set RMSE | Testing set R | Testing set MAD | Testing set RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Zbiec-Piekarska 1 | – | 2 | CpG5,7 | 0.918 | 6.885 | 9.127 | 0.932 | 6.397 | 8.803 |
| MQR | T | 9 | CpG1–2, 4–6 & CpG2², 4², 6²–7² | 0.945 | 5.133 | 6.975 | 0.950 | 4.773 | 6.730 |
| MQR | V | 8 | CpG4–6 & CpG2²–4², 6²–7² | 0.941 | 5.229 | 7.184 | 0.953 | 4.574 | 6.559 |
| SVMr | T | 6 | CpG1–3,5–7 | 0.956 | 4.555 | 6.229 | 0.953 | 4.464 | 6.544 |
| SVMr | V | 5 | CpG2–3,5–7 | 0.9546 | 4.6139 | 6.3257 | 0.9534 | 4.4101 | 6.4919 |
| SVMl | T | 7 | CpG1–7 | 0.935 | 5.575 | 7.531 | 0.943 | 5.221 | 7.194 |
| SVMl | V | 5 | CpG2–6 | 0.930 | 5.650 | 7.793 | 0.945 | 5.130 | 7.058 |
| SVMp | T | 7 | CpG1–7 | 0.799 | 9.946 | 13.046 | 0.830 | 9.734 | 12.124 |
| SVMp | V | 5 | CpG3–7 | 0.778 | 10.456 | 13.582 | 0.833 | 9.465 | 12.098 |
| GBR | T | 7 | CpG1–7 | 0.992 | 1.993 | 2.627 | 0.953 | 4.549 | 6.520 |
| GBR | V | 5 | CpG2,4–7 | 0.989 | 2.378 | 3.121 | 0.955 | 4.426 | 6.398 |
| mMDA | T | 3 | CpG1,5–6 | 0.933 | 5.650 | 7.625 | 0.940 | 5.320 | 7.357 |
| mMDA | V | 3 | CpG2,5–6 | 0.929 | 5.801 | 7.855 | 0.943 | 5.231 | 7.223 |

(a) For each statistical model, the two CpG combinations giving the best age prediction accuracy on the training (T) and testing (V) sets, respectively, are included in the table.
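The three estimators reported in the table (R, MAD, RMSE) can be computed from paired chronological and predicted ages as sketched below. This is a minimal illustration that assumes R is the Pearson correlation and MAD is the mean absolute deviation between predicted and chronological age; the record itself does not spell out either definition. The function name `age_prediction_metrics` and the four example ages are made up for illustration and are not study data.

```python
import numpy as np


def age_prediction_metrics(chronological, predicted):
    """Return R (assumed Pearson), MAD (assumed mean absolute deviation) and RMSE."""
    y = np.asarray(chronological, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - y
    return {
        "R": float(np.corrcoef(y, p)[0, 1]),
        "MAD": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }


# Made-up values for illustration only (not taken from the study).
ages = [20, 35, 50, 65]
preds = [24, 33, 47, 70]
metrics = age_prediction_metrics(ages, preds)
print(metrics)
```

Because RMSE squares the errors before averaging, it is never smaller than MAD on the same predictions, which is the pattern seen in every row of the table.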
Figure 1. Scatterplots of predicted age and chronological age of the training and testing samples obtained with ELOVL2 age-prediction models based on six different statistical approaches. The plotted data were obtained from the combination of CpGs giving the best age prediction accuracy on the training set. Z-P1, Zbiec-Piekarska model [25] using multiple linear regression; MQR, multiple quadratic regression; SVM, support vector machine with radial (r), linear (l) and polynomial (p) kernel functions; GBR, gradient boosting regressor; mMDA, missMDA. Four out-of-scale values (y-axis) are missing for SVMp.
Age prediction performances of the different statistical models on an independent validation set.
| Model | Number of CpGs | CpGs | Estimator | Training set (n = 1,028) | Testing set 1 (n = 385) | Independent testing set 2 (n = 100): 1 PCR and 1 PSQ/PCR (1 replicate) | Independent testing set 2 (n = 100): 1 PCR and 2 PSQ/PCR (2 replicates) | Independent testing set 2 (n = 100): 2 PCR and 1 PSQ/PCR (2 replicates) | Independent testing set 2 (n = 100): 3 PCR and 1 PSQ/PCR (3 replicates) | Independent testing set 2 (n = 100): 3 PCR and 2 PSQ/PCR (6 replicates) |
|---|---|---|---|---|---|---|---|---|---|---|
| Zbiec-Piekarska 1 | 2 | CpG5,7 | R | 0.918 | 0.932 | 0.880 | 0.893 | 0.902 | 0.909 | 0.914 |
| Zbiec-Piekarska 1 | 2 | CpG5,7 | MAD | 6.885 | 6.397 | 5.445 | 5.319 | 5.147 | 5.050 | 5.011 |
| Zbiec-Piekarska 1 | 2 | CpG5,7 | RMSE | 9.127 | 8.803 | 6.870 | 6.624 | 6.440 | 6.290 | 6.201 |
| MQR | 4 | CpG6 & CpG4², 6²–7² | R | 0.934 | 0.945 | 0.904 | 0.911 | 0.919 | 0.924 | 0.927 |
| MQR | 4 | CpG6 & CpG4², 6²–7² | MAD | 5.521 | 4.910 | 4.786 | 4.619 | 4.425 | 4.266 | 4.232 |
| MQR | 4 | CpG6 & CpG4², 6²–7² | RMSE | 7.574 | 7.057 | 6.225 | 5.996 | 5.765 | 5.598 | 5.504 |
| SVMr | 2 | CpG6,7 | R | 0.947 | 0.948 | 0.902 | 0.906 | 0.917 | 0.923 | 0.925 |
| SVMr | 2 | CpG6,7 | MAD | 5.051 | 4.701 | 4.784 | 4.668 | 4.388 | 4.211 | 4.174 |
| SVMr | 2 | CpG6,7 | RMSE | 6.843 | 6.833 | 6.287 | 6.140 | 5.771 | 5.581 | 5.515 |
| SVMl | 2 | CpG6,7 | R | 0.905 | 0.927 | 0.902 | 0.905 | 0.917 | 0.922 | 0.923 |
| SVMl | 2 | CpG6,7 | MAD | 6.246 | 6.036 | 5.536 | 5.484 | 5.289 | 5.211 | 5.197 |
| SVMl | 2 | CpG6,7 | RMSE | 9.078 | 8.095 | 6.874 | 6.796 | 6.525 | 6.404 | 6.375 |
| GBR | 2 | CpG6,7 | R | 0.976 | 0.947 | 0.900 | 0.904 | 0.913 | 0.919 | 0.920 |
| GBR | 2 | CpG6,7 | MAD | 3.471 | 4.772 | 4.892 | 4.842 | 4.577 | 4.436 | 4.469 |
| GBR | 2 | CpG6,7 | RMSE | 4.660 | 6.931 | 6.397 | 6.314 | 5.973 | 5.803 | 5.741 |
| mMDA | 1 | CpG6 | R | 0.906 | 0.927 | 0.902 | 0.905 | 0.917 | 0.922 | 0.923 |
| mMDA | 1 | CpG6 | MAD | 6.291 | 6.079 | 5.926 | 5.875 | 5.736 | 5.673 | 5.598 |
| mMDA | 1 | CpG6 | RMSE | 9.008 | 8.104 | 7.234 | 7.158 | 6.932 | 6.826 | 6.772 |
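To make the MQR row of the table above concrete, the sketch below builds a quadratic-regression design matrix from a linear term for CpG6 and squared terms for CpG4, CpG6 and CpG7. That reading gives four predictors, matching the "Number of CpGs" column, but it is an interpretation of the table entry. The helper `mqr_design_matrix`, the ordinary least-squares fit, the synthetic methylation values and the coefficients are all assumptions for illustration; none of them are the authors' data or fitted model.

```python
import numpy as np


def mqr_design_matrix(methylation, linear_cpgs, squared_cpgs):
    """Build columns: intercept, selected CpGs, selected squared CpGs.

    `methylation` is an (n_samples, 7) array; CpG numbers are 1-based.
    """
    m = np.asarray(methylation, dtype=float)
    cols = [np.ones(m.shape[0])]
    cols += [m[:, i - 1] for i in linear_cpgs]
    cols += [m[:, i - 1] ** 2 for i in squared_cpgs]
    return np.column_stack(cols)


# Synthetic methylation fractions, for illustration only.
rng = np.random.default_rng(0)
meth = rng.uniform(0.1, 0.9, size=(200, 7))

# Assumed reading of the MQR entry: CpG6 linear; CpG4, CpG6, CpG7 squared.
X = mqr_design_matrix(meth, linear_cpgs=[6], squared_cpgs=[4, 6, 7])

# Hypothetical coefficients used to generate noiseless synthetic ages.
true_coef = np.array([5.0, 40.0, 10.0, 30.0, 20.0])
age = X @ true_coef

# Ordinary least squares recovers the coefficients from the synthetic data.
coef, *_ = np.linalg.lstsq(X, age, rcond=None)
predicted_age = X @ coef
```

With real data the ages would contain noise, so the fitted coefficients would only approximate any underlying relationship, and performance would be judged on held-out samples as in the testing-set columns.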
Figure 2. Scatterplots of predicted age and chronological age of the independent testing set of 100 blood samples from individuals aged 19–65 years, obtained with ELOVL2 age-prediction models based on six different statistical approaches. The plotted data were obtained from the combination of CpGs giving the best age prediction accuracy on this independent testing set. Because each sample had replicate measures, and to allow comparison between conditions, only one randomly picked age prediction value per sample is shown. Z-P1, Zbiec-Piekarska model [25] using multiple linear regression; MQR, multiple quadratic regression; SVM, support vector machine with radial (r) and linear (l) kernel functions; GBR, gradient boosting regressor; mMDA, missMDA.