
Human Age Prediction Based on DNA Methylation Using a Gradient Boosting Regressor.

Xingyan Li1, Weidong Li2, Yan Xu3,4.   

Abstract

All tissues of an organism age over time. In recent years, epigenetic investigations have found a close correlation between DNA methylation and aging. As DNA methylation research has developed, a quantitative statistical relationship between DNA methylation and age has been established from the way methylation changes with age, which makes it possible to predict the age of individuals. All the data in this work were retrieved from the Illumina HumanMethylation BeadChip platform (27K or 450K). We analyzed 16 sets of healthy samples and 9 sets of diseased samples. The healthy samples included a total of 1899 publicly available blood samples (0–103 years old) and the diseased samples included 2395 blood samples. Six age-related CpG sites were selected by calculating Pearson correlation coefficients between age and DNA methylation values. We built a gradient boosting regressor model on these age-related CpG sites. In each dataset, 70% of the data was randomly selected as training data and the other 30% as independent data, for 25 runs in total. In the training dataset, the healthy samples showed that the correlation between predicted age and DNA methylation was 0.97, and the mean absolute deviation (MAD) was 2.72 years. In the independent dataset, the MAD was 4.06 years. The proposed model was further tested using the diseased samples. The MAD was 5.44 years for the training dataset and 7.08 years for the independent dataset. Furthermore, our model worked well when it was applied to saliva samples. These results show that age prediction from six DNA methylation markers with a gradient boosting regressor is effective.

Keywords:  DNA methylation; age prediction; aging; epigenetics; gradient boosting regressor

Year:  2018        PMID: 30134623      PMCID: PMC6162650          DOI: 10.3390/genes9090424

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

Aging is an irreversible natural process in human life which is influenced by many factors, such as genetic factors, living environment and diseases [1,2]. Aging can be modified and regulated by various mechanisms at a molecular level, such as oxidative damage of DNA, chemical modification of DNA, and shortened and dysfunctional telomeres [3]. Although many methods have been used to estimate individual age, their low sensitivity and prediction accuracy still need to be improved [4,5,6,7]. Recent studies have shown that human aging is related to the alteration of DNA methylation at specific genomic locations, and these epigenetic modifications can be used to estimate individual age [8,9]. DNA methylation (DNAm) refers to the chemical modification process that transfers an active methyl group to a specific base on the DNA chain under the catalysis of DNA methyltransferase (DNMT) [10]. DNA methylation can occur at the N-6 position of adenine, the N-7 position of guanine, the C-5 position of cytosine and so on. However, in the mammalian genome, DNA methylation often occurs on the cytosine (C) of 5’-CpG-3’ to generate 5-methyldeoxycytidine (5mC). Because of the close relationship between DNA methylation and human development and tumor diseases, especially the transcriptional inactivation of tumor suppressor genes induced by CpG island methylation, DNA methylation has become an important research topic in epigenetics and epigenomics. DNA methylation is an epigenetic modification that plays an important modulating role in individual growth, development, gene expression patterns and the stability of the genome without changing DNA sequences [11]. In addition, this modification can be stably transmitted during development and cell proliferation [12]. Some studies have shown that the level of DNA methylation is closely related to age. With age, the global DNA methylation level of the genome decreases [13,14,15]. 
It has been reported that 5mC is increased with age in some specific CpG sites, whereas at other CpG sites, the level of 5mC decreases with age [16,17]. For some CpG sites, the degree of DNA methylation is closely related to aging, therefore it can be used for age prediction [8,18,19,20,21,22]. In the past, an individual’s age could be predicted by measuring and analyzing skeletal markers such as bones and teeth [23,24]. This method is limited to the existence of the skeleton. In molecular biology, DNA damage, mitochondrial mutations, and the length of leukocyte telomere are related to aging, and can also be used to predict age [25,26]. However, these methods are less accurate or are technically difficult. Furthermore, in most crime scenes, the perpetrators have fled after the crime, with only piecemeal remains such as blood, saliva or semen to be found. Therefore, it is imperative to find other feasible methods for the prediction of individual age. It has long been known that the aging process can cause changes in the molecular level of tissues and organs. It has not been found until recently that changes in DNA methylation can be used to predict age. Some reports have translated age-related DNA methylation into an age prediction model to reveal individual age [8,18,20,27,28,29]. For example, Yi et al. reported a multiple linear regression to predict age in blood samples in 2014 [30]. The model showed that the average difference between predicted age and actual age was around 4 years. Zbiec-Piekarska et al. analyzed the CpG sites in blood and built a multiple linear regression model in 2015 [31]. Based on a combination of five DNA methylation markers, the mean absolute deviation (MAD) of prediction age was 3.9 years. Huang et al. selected five age-related CpG sites from 38 candidate markers by pyrosequencing and established a linear regression model to predict age in 2015 [32]. The accuracy of their model was slightly lower, and the MAD was 7.986 years. 
Park et al. selected three CpG sites and used DNA methylation markers in blood from the Asian population to predict age in 2016 [33]. They identified a root mean square error (RMSE) of 6.320 years and an MAD of 3.156 years. In addition, Hannum et al. established a quantitative model with 71 highly age-related markers in 2013 [19]. The correlation coefficient between the true age and the predicted age was 0.96, and the average error was 3.9 years. However, most of these studies were based on biological experiments to identify sites. They are time-consuming and complicated to operate. Therefore, it is necessary to develop a computational method to select the candidate CpG sites. Existing models primarily use linear regression models to interpret the complex relationship between DNA methylation and age [8,30,32]. For a limited number of CpG sites, it is necessary to find a reliable age prediction model to improve the accuracy. In this study, we adopted a gradient boosting regressor to predict age, and its results were better than the existing methods.

2. Materials and Methods

2.1. Data Collection and Processing

In this study, we obtained dozens of blood datasets from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi). All of these DNA methylation data were generated on two platforms, the HumanMethylation27 BeadChip and the HumanMethylation450 BeadChip. Some of the GEO datasets contained ethnicity information: GSE36064 (Caucasian, Chinese, and African American), GSE40279 (Caucasian, European), GSE65638 (Chinese), GSE51032 (Italian cohort), GSE41169 (Dutch population), GSE27317 (African-American, Caucasian and other), GSE34257 (Gambian), GSE37008 (European, Caucasian or other ethnicity), GSE41037 (Dutch population). Datasets that did not provide the age of individuals were excluded. Finally, 25 complete datasets were obtained, of which 16 were healthy and 9 were disease datasets. Diseases that affect DNA methylation would bias age prediction, so we divided the datasets into two categories: the healthy datasets (Table 1) and the disease datasets (Table 2). To illustrate the performance of our model, we randomly divided each dataset into training and independent subsets in a ratio of 7:3. The training subsets of all datasets were then pooled into one training dataset, and the independent subsets were pooled in the same way. A total of 1899 healthy individuals from different racial backgrounds with ages between 0 and 103 years were divided into 1322 training samples and 577 independent samples. The 9 disease datasets were divided into 1673 training samples and 722 independent samples.
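The per-dataset 7:3 split followed by pooling can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_and_pool`, the fixed random seed, the rounding rule for the cut point, and the toy dataset labels are all assumptions made for the example.

```python
import random

def split_and_pool(datasets, train_fraction=0.7, seed=0):
    """Split each dataset at random (7:3 by default), then pool the parts.

    `datasets` maps a dataset ID to a list of samples. Rounding the cut
    point with round() is an assumption; the paper does not state the rule.
    """
    rng = random.Random(seed)           # fixed seed for a reproducible split
    train, independent = [], []
    for name in sorted(datasets):       # sorted for a deterministic order
        samples = list(datasets[name])
        rng.shuffle(samples)
        cut = round(train_fraction * len(samples))
        train.extend(samples[:cut])         # pooled training data
        independent.extend(samples[cut:])   # pooled independent data
    return train, independent

# Toy input: placeholder sample labels, not real GEO records.
toy = {
    "GSE_A": [f"A{i}" for i in range(10)],
    "GSE_B": [f"B{i}" for i in range(20)],
}
train, independent = split_and_pool(toy)
print(len(train), len(independent))
```

Splitting inside each dataset before pooling keeps every dataset represented in both the training and the independent data, which a single split of the pooled samples would not guarantee.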
Table 1

Sixteen healthy DNA-methylation datasets.

DNA Origin | Platform | No. | Age Range | Author and Publication Year | Availability
Whole Blood | 27K | 93 | (49, 74) | Rakyan (2010) | GSE20236
Blood CD4+CD14 | 27K | 50 | (16, 69) | Rakyan (2010) | GSE20242
Blood PBMC ¹ | 27K | 398 | (3.6, 18) | Alisch (2012) | GSE27097
Blood Cord | 27K | 168 | (0, 0) | Adkins (2011) | GSE27317
Blood PBMC | 450K | 40 | (0, 103) | Heyn (2012) | GSE30870
Blood PBMC | 450K | 71 | (3.5, 76) | Harretal (2012) | GSE32149
Blood Cord | 27K | 84 | (0, 0) | Khulan (2012) | GSE34257
Blood Cord | 27K | 24 | (0, 0) | Mallon (2012) | GSE34869
Blood PBMC | 450K | 78 | (1, 16) | Alisch (2012) | GSE36064
Blood Cord | 27K | 123 | (0, 0) | Gordon (2012) | GSE36642
Blood Cord | 27K | 48 | (0, 0) | Turan (2012) | GSE36812
Blood PBMC | 27K | 91 | (24, 45) | Lam (2012) | GSE37008
Whole Blood | 450K | 500 | (26, 101) | Hannum (2012) | GSE40279
Whole Blood | 450K | 95 | (18, 65) | Horvath (2012) | GSE41169
Whole Blood | 450K | 43 | (47, 59) | Bell (2013) | GSE53128
Blood | 450K | 16 | (21, 32) | Xu (2015) | GSE65638

1 Peripheral blood mononuclear cell.

Table 2

Nine disease DNA-methylation datasets.

DNA Origin | Platform | No. | Age Range | Author and Publication Year | Availability
Whole Blood | 27K | 203 | (50, 85) | Song (2010) | GSE19711
Whole Blood | 27K | 194 | (1, 32) | Teschendorff (2010) | GSE20067
Peripheral Blood | 450K | 46 | (3.5, 76) | Harris (2011) | GSE32148
Blood | 450K | 24 | (52, 88) | Athanasios (2012) | GSE40005
Whole Blood | 27K | 498 | (16, 86) | Horvath (2012) | GSE41037
Whole Blood | 450K | 500 | (18, 70) | Liu (2013) | GSE42861
Blood | 27K | 71 | (23, 85) | Day (2013) | GSE49904
Blood | 450K | 499 | (34, 72) | Polidoro (2013) | GSE51032
Peripheral Blood | 450K | 383 | (34, 93) | Lwe (2013) | GSE53740

2.2. Methylation Quality Control

To account for common experimental biases and perform quality control on the DNA methylation datasets, we used principal component analysis (PCA) to identify and remove abnormal samples, using MATLAB R2014b (v8.4.0.150421 win64). First, we standardized each dataset, then performed PCA, extracted the first two principal components, and finally made a cluster diagram. Samples outside a circle with a radius of five were defined as outliers and removed. This filtering procedure was executed iteratively until no samples were determined to be outliers. A total of 22 healthy samples and 23 disease samples were removed.
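The iterative filtering described above can be illustrated with a small pure-Python sketch. It is not the MATLAB procedure used in the study: the power-iteration PCA, the function names, the choice to centre the circle at the origin of the PC1/PC2 plane, and the toy data are assumptions made for this example, and the implementation is only practical for a handful of features.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def standardize(rows):
    """Scale every column to mean 0 and population standard deviation 1."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # `or 1.0` leaves constant columns at zero instead of dividing by zero
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]

def first_two_components(z, iterations=200):
    """Approximate the two leading principal axes by power iteration."""
    n, d = len(z), len(z[0])
    cov = [[sum(row[a] * row[b] for row in z) / n for b in range(d)]
           for a in range(d)]
    axes = []
    for k in range(2):
        v = [1.0 if j == k else 0.5 for j in range(d)]  # arbitrary start
        for _ in range(iterations):
            v = [dot(cov[a], v) for a in range(d)]
            for axis in axes:   # Gram-Schmidt: stay orthogonal to earlier axes
                proj = dot(v, axis)
                v = [x - proj * y for x, y in zip(v, axis)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        axes.append(v)
    return axes

def remove_outliers(rows, radius=5.0):
    """Drop samples outside the circle in the PC1/PC2 plane until none remain."""
    kept = list(rows)
    while True:
        z = standardize(kept)
        pc1, pc2 = first_two_components(z)
        inside = [row for row, zr in zip(kept, z)
                  if math.hypot(dot(zr, pc1), dot(zr, pc2)) <= radius]
        if len(inside) == len(kept):    # nothing removed: converged
            return kept
        kept = inside

# Toy data: a 5 x 6 grid of made-up values plus one extreme sample.
toy = [[float(i % 5), float(i // 5)] for i in range(30)] + [[100.0, 100.0]]
clean = remove_outliers(toy)
print(len(toy), "->", len(clean))
```

Re-standardizing after every pass matters: an extreme sample inflates the standard deviations, so milder outliers can only be detected once it has been removed.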

2.3. Selection of Age-Related CpG Sites

For each CpG site, the value indicates the proportion of methylation: it is equal to one if the site is fully methylated and zero if it is completely unmethylated. There are batch effects between different data platforms. These can be partially overcome by Z-score conversion, so we used Z-scores to normalize the methylation levels between different datasets and used the normalized methylation values for age prediction analysis (processed with IBM SPSS v.22). All DNA methylation values reported below are therefore normalized values. To identify age-related DNA methylation markers, we calculated Pearson correlations between age and the DNA methylation value of each CpG site for every dataset covering ages from 1 to 103 years (Pearson correlation cannot be calculated for datasets in which all subjects have the same age). Based on the Pearson correlation analysis, we chose the sites with highly age-related r values (including positive and negative correlations) in each dataset and determined which sites were selected in multiple datasets. Finally, seven sites with high repetition frequency were selected. These sites were cg22736354, cg06493994, cg02228185, cg09809672, cg19761273, cg01820374 and cg19283806. Some datasets did not contain cg19283806, so it was rejected. To select the appropriate number of sites for age prediction, we used forward stepwise variable selection, which ranked the markers by importance (cg09809672, cg02228185, cg01820374, cg22736354, cg06493994, cg19761273). For this type of analysis, the markers were added to the age prediction model one by one [3]. The combination of all six markers had the highest accuracy. Finally, six age-related hypomethylated or hypermethylated CpG sites were determined (Table 3). Among them, cg22736354 and cg06493994 were positively correlated with age. 
However, cg02228185, cg09809672, cg19761273 and cg01820374 were negatively correlated with age. This is consistent with the results of Horvath’s research report [20]. To analyze the robustness of the six CpG sites, we split the data into 450K and 27K subsets, and obtained the same sites in the 27K data. Similar results were not obtained in the 450K data, which may be because relatively little 450K data was available (only 5 datasets); nevertheless, the selected six CpG sites showed good predictive ability in subsequent prediction.
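The Z-score normalization and the per-site Pearson screening can be sketched in a few lines. The study used IBM SPSS for this step; the version below is only an illustration, and the function names, the site labels and the methylation values are made up for the example.

```python
import math

def zscore(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_sites(ages, beta_by_site):
    """Return (site, r) pairs sorted by |r|, strongest correlation first."""
    scores = {site: pearson(ages, zscore(betas))
              for site, betas in beta_by_site.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up methylation values for three hypothetical sites in one dataset.
ages = [5, 20, 35, 50, 65, 80]
toy_betas = {
    "site_up": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],    # rises with age
    "site_down": [0.90, 0.80, 0.72, 0.60, 0.52, 0.40],  # falls with age
    "site_flat": [0.50, 0.52, 0.49, 0.51, 0.50, 0.51],  # little trend
}
ranking = rank_sites(ages, toy_betas)
for site, r in ranking:
    print(f"{site}: r = {r:+.3f}")
```

Within a single dataset the Z-score step does not change r, because Pearson correlation is invariant to shifting and scaling; its purpose is to put values from different datasets on a common scale before they are pooled. Ranking by |r| keeps both positively and negatively correlated sites, as in the text.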
Table 3

Information of 6 selected age-related CpG sites.

CpG ID | Gene ID | Chromosome Location ¹ | Gene Region ² | Relation to CpG Island ³ | Correlation Status | Reference
cg09809672 | EDARADD | 1:236557682 | TSS1500 | N_Shore | Negative | [1,17,33]
cg22736354 | NHLRC1 | 6:18122719 | 1stExon | Island | Positive | [2,7,18,19]
cg02228185 | ASPA | 17:3379567 | 1stExon | -- | Negative | [7,26,33]
cg01820374 | LAG3 | 12:6882083 | Body | N_Shore | Negative | [1]
cg06493994 | SCGN | 6:25652602 | 1stExon | Island | Positive | [2,7,18,19]
cg19761273 | CSNK1D | 17:80232096 | TSS1500 | S_Shore | Negative | [2]

1 Chromosome location refers to the human genome reference GRCh37. 2 TSS: transcription start site. TSS1500: 1500 bp flanking region from the TSS. 3 The CpG island table was downloaded from the University of California Santa Cruz (UCSC) browser. Regions within 2 kb of a CpG island were defined as CpG island shores (N_Shore: upstream of the CpG island; S_Shore: downstream of the CpG island).

2.4. Algorithm

In recent years, age prediction models in blood based on a small number of CpG sites have been studied [9,27,34]. Other tissues, such as saliva [18,35], semen [36] and teeth [37], have been investigated, too. Most of these models are linear regression models. However, a simple linear model may not capture the complex relationship between DNA methylation and age. To minimize the prediction error and improve the accuracy of the model, the gradient boosting regressor (GBR) model has been utilized [38]. GBR is an ensemble model with higher performance and better stability. Friedman proposed the GBR algorithm, which extends the boosting algorithm to solve the regression problem. The algorithm builds the model stage by stage, fitting each new tree to the negative gradient of the loss function in order to minimize the loss. GBR has been widely used in biological research; it can handle unclean and noisy data well, supports different loss functions, and has strong predictive ability for nonlinear data [38]. The gradient boosting regressor algorithm was executed with the sklearn package (scikit-learn 0.19.1, October 2017). It avoids the overfitting problem in decision tree learning by stopping tree growth as early as possible. The parameters of GBR are loss = ‘lad’, learning_rate = 0.03, n_estimators = 300, subsample = 0.6, λ = 0.6, min_samples_split = 2, max_depth = 4, verbose = 1, warm_start = True. The parameters of Support Vector Regression (SVR) are kernel = ‘rbf’, degree = 3, coef0 = 0.0, tol = 0.001, C = 1.0, ε = 0.1. The parameters of BayesianRidge are n_iter = 300, tol = 0.001.
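To make the role of the negative gradient concrete, the following toy sketch implements gradient boosting with the least-absolute-deviation (‘lad’) loss in plain Python. It is not the scikit-learn implementation used in the study: it uses depth-1 trees (stumps) instead of max_depth = 4, omits subsampling, and all function names and data values are assumptions made for illustration. Only learning_rate = 0.03 and n_estimators = 300 are taken from the text. In recent scikit-learn releases the same loss is, to the best of our knowledge, selected with loss = 'absolute_error' rather than 'lad'.

```python
import statistics

def sign(v):
    return (v > 0) - (v < 0)

def best_split(X, targets):
    """Return (feature, threshold) of the stump with least squared error."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for thr in values[:-1]:          # split between thr and the next value
            sse = 0.0
            for side in (True, False):
                group = [t for row, t in zip(X, targets)
                         if (row[j] <= thr) == side]
                mean = sum(group) / len(group)
                sse += sum((t - mean) ** 2 for t in group)
            if sse < best_sse:
                best, best_sse = (j, thr), sse
    return best

def fit_lad_boosting(X, y, n_estimators=300, learning_rate=0.03):
    base = statistics.median(y)          # best constant under absolute loss
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        # The negative gradient of |y - F| with respect to F is sign(y - F).
        split = best_split(X, [sign(r) for r in resid])
        if split is None:                # no feature has two distinct values
            break
        j, thr = split
        # Leaf values: median residual per leaf, shrunk by the learning rate.
        left = learning_rate * statistics.median(
            [r for row, r in zip(X, resid) if row[j] <= thr])
        right = learning_rate * statistics.median(
            [r for row, r in zip(X, resid) if row[j] > thr])
        stumps.append((j, thr, left, right))
        pred = [p + (left if row[j] <= thr else right)
                for row, p in zip(X, pred)]
    return base, stumps

def predict(model, row):
    base, stumps = model
    return base + sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps)

def mad(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up data: one normalized methylation value per sample, falling with age.
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
y = [80, 62, 48, 37, 28, 20, 13, 5]
model = fit_lad_boosting(X, y)
baseline_mad = mad(y, [statistics.median(y)] * len(y))
fitted_mad = mad(y, [predict(model, row) for row in X])
print(f"MAD: {baseline_mad:.2f} (median only) -> {fitted_mad:.2f} (boosted)")
```

Each stage shifts the predictions in a leaf by a fraction of the median residual, so the absolute loss on the training data cannot increase from one stage to the next; the small learning rate trades slower fitting for a smoother model.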

2.5. Statistical Measurements

In the age prediction model, we used 1899 samples from different races and evaluated the age prediction model by calculating the MAD. The MAD is the mean absolute deviation between the predicted age and the actual age. The degree of correlation between predicted age and true age is measured by calculating $R^2$. All statistical analyses were done in Python 3.6. The error measures are defined as below:

$$\mathrm{MAD} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}},$$

where $m$ denotes the number of target values, $y_i$ is the true value, $\hat{y}_i = f(x_i)$ is the prediction value, and $f(x_i)$ represents the regression function for feature vector $x_i$. MAD denotes the mean absolute deviation, MSE the mean square error, and RMSE the root mean square error.
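The four measures reported in the results tables can be computed as in the sketch below. The formulas for MAD, MSE and RMSE follow their standard definitions; computing R² as the coefficient of determination is an assumption, since the exact definition used in the study is not recoverable from the text. The function name and the example ages are made up.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAD, MSE, RMSE and R2 for paired true and predicted values."""
    m = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mad = sum(abs(e) for e in errors) / m       # mean absolute deviation
    mse = sum(e * e for e in errors) / m        # mean square error
    rmse = math.sqrt(mse)                       # root mean square error
    mean_true = sum(y_true) / m
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumption: R2 is the coefficient of determination, 1 - SS_res / SS_tot.
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return {"MAD": mad, "MSE": mse, "RMSE": rmse, "R2": r2}

# Made-up chronological and predicted ages (years).
ages_true = [10, 20, 30, 40]
ages_pred = [12, 18, 33, 39]
metrics = regression_metrics(ages_true, ages_pred)
print(metrics)
```

Because RMSE squares the errors before averaging, it weights large individual errors more heavily than MAD, which is why the two are reported side by side.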

3. Results

3.1. Healthy Blood Data Results

To verify the accuracy of the GBR model, three other models, namely BayesianRidge, multiple linear regression (MLR) and SVR, were also executed. On the training dataset, the correlation between predicted age and true age (R²) was 0.97 for the gradient boosting regressor, with RMSE and MAD being 4.55 and 2.72 years, respectively (Figure 1a). The RMSE and MAD were 12.58 and 10.26 years for BayesianRidge (Figure 1b), 7.75 and 5.13 years for Support Vector Regression (Figure 1c), and 12.58 and 10.24 years for multiple linear regression (Figure 1d). For the independent dataset of 583 samples, the MAD was 4.06 years for the gradient boosting regressor (Figure 2a), 10.56 years for BayesianRidge (Figure 2b), 5.93 years for Support Vector Regression (Figure 2c), and 10.55 years for multiple linear regression (Figure 2d). The detailed results are shown in Table 4. All the values were obtained on the same CpG sites. The results showed that the prediction accuracy of the gradient boosting regressor was better than that of the other three models.
Figure 1

Comparison between the real age and the age predicted by the four models in the training dataset of healthy data. GBR: gradient boosting regressor; MAD: mean absolute deviation; RMSE: root mean square error; SVR: support vector regression.

Figure 2

Comparison between the real age and the age predicted by the four models in the validation dataset of healthy data.

Table 4

Comparison of gradient boosting regressor (GBR) with the other three methods on healthy datasets.

Model | R² | MAD | MSE | RMSE
Training
Gradient Boosting Regressor | 0.9747 | 2.7171 | 20.7243 | 4.5524
BayesianRidge | 0.8055 | 10.2561 | 158.3044 | 12.5819
Support Vector Regression | 0.9267 | 5.1338 | 60.0420 | 7.7487
Multiple Linear Regression | 0.8055 | 10.2448 | 158.2800 | 12.5809
Testing
Gradient Boosting Regressor | 0.9523 | 4.0593 | 39.8269 | 6.3109
BayesianRidge | 0.8101 | 10.5654 | 157.8721 | 12.5647
Support Vector Regression | 0.9151 | 5.9267 | 71.2060 | 8.4384
Multiple Linear Regression | 0.8104 | 10.5510 | 157.6726 | 12.5568

MAD: mean absolute deviation; MSE: mean square error; RMSE: root mean square error.

3.2. Disease Blood Data Results

There was no significant correlation between age-related methylation and sex or race [3]; however, some genes were associated with age-related diseases, such as cancer and Alzheimer’s disease. DNA methylation is disordered in these diseases. Horvath et al. reported that the predicted age in cancer was poorly correlated with patient age [20]. Park et al. found that the correlation between age and methylation of three CpG sites disappeared in patients with acute myeloid leukemia (AML) [33]. Alzheimer’s disease is also known as senile dementia. The degree of methylation in the promoter region of the amyloid precursor protein gene declined with age in the patients [39,40]. We analyzed the nine disease datasets in Table 2 to further validate the proposed GBR. The correlation between predicted age and true age was 0.83 for GBR. The RMSE and MAD were 7.81 and 5.91 years, respectively (Figure 3a). For the independent set, the MAD was 6.99 years (Figure 4a). The results of the other models are shown in Table 5. As shown in Table 5, the diseases affect age prediction based on DNA methylation. However, GBR still performed well on these disease samples.
Figure 3

Comparison between the real age and the age predicted by the four models in the training dataset of disease data.

Figure 4

Comparison between the real age and the age predicted by the four models in the validation dataset of disease data.

Table 5

Results comparison of GBR with the other three methods on disease datasets.

Model | R² | MAD | MSE | RMSE
Training
Gradient Boosting Regressor | 0.8186 | 5.4401 | 63.0648 | 7.9413
BayesianRidge | 0.6844 | 7.8944 | 109.6227 | 10.4701
Support Vector Regression | 0.5333 | 9.8583 | 162.6949 | 12.7552
Multiple Linear Regression | 0.6844 | 7.8946 | 109.6222 | 10.4701
Testing
Gradient Boosting Regressor | 0.7374 | 7.0832 | 91.7887 | 9.5806
BayesianRidge | 0.6812 | 8.0786 | 111.2896 | 10.5494
Support Vector Regression | 0.5303 | 9.9573 | 164.6747 | 12.8326
Multiple Linear Regression | 0.6812 | 8.0795 | 111.3016 | 10.5500
We predicted the age per disease group to see whether there would be a systematic difference between predicted age and chronological age. For this purpose, we analyzed each disease dataset separately. The obtained MAD for each disease was as follows: 5.91 years for ovarian cancer; 5.33 years for type 1 diabetes mellitus (DM); 5.15 years for Crohn’s disease or ulcerative colitis; 7.04 years for head and neck squamous cell carcinoma (HNSCC); 4.54 years for schizophrenia; 4.45 years for rheumatoid arthritis; 6.51 years for breast cancer, colorectal cancer and other primary cancers; and 3.95 years for neurodegenerative tauopathy. Neurodegenerative tauopathy and schizophrenia were among the diseases with the lowest age prediction error, while HNSCC demonstrated the lowest correlation with age. Taken together, these results suggest that age-related DNA methylation is accelerated in these diseases, so there would not be a systematic difference between predicted age and true age.

3.3. Application of the Technique to Saliva

Some studies have shown that the pattern of DNA methylation is tissue-specific [41]. Koch et al. pointed out that it was difficult to define common markers that displayed general accuracy of prediction in a variety of tissues [42]. However, methylation of certain CpG sites is not always associated with tissue specificity [43]. To test the robustness of our selected age-related CpG sites when applied to body fluids other than blood, we studied the methylation data of 278 saliva samples (see Supplementary S1). The methylation values of the selected 6 CpG sites were collected from a total of 278 individuals aged between 21 and 55 years; 196 samples were used to train the GBR model and 82 samples were used as the independent group. The results showed that the correlation coefficient between predicted age and real age was 0.85, and the MAD was 2.1 years (training) and 5.3 years (independent). The results of the other models are shown in Table 6.
Table 6

Results comparison of GBR with the other three methods on saliva datasets.

Model | R² | MAD | MSE | RMSE
Training
Gradient Boosting Regressor | 0.8539 | 2.1040 | 13.7795 | 3.7121
BayesianRidge | 0.4310 | 5.7483 | 52.5169 | 7.2469
Support Vector Regression | 0.0227 | 7.9369 | 99.5273 | 9.9763
Multiple Linear Regression | 0.4333 | 5.6775 | 52.3045 | 7.2322
Testing
Gradient Boosting Regressor | 0.4298 | 5.3478 | 56.1291 | 7.4919
BayesianRidge | 0.5423 | 5.5389 | 43.8468 | 6.6217
Support Vector Regression | 0.0308 | 8.4729 | 104.4403 | 10.2196
Multiple Linear Regression | 0.5479 | 5.4662 | 43.3933 | 6.5874
To assess the performance of the GBR model, we also compared it with other studies. Bocklandt et al. identified 88 CpG sites in 80 genes [18]. Using a multiple linear regression model, they reported a correlation coefficient between age and DNA methylation of 0.73 and an average error of 5.2 years. We applied the six sites selected in this work to the same data (GSE28746), which included 84 individuals. The correlation coefficient between age and DNA methylation was 0.58, and the average error was 3.76 years, which is a lower average error than that of Bocklandt’s multiple linear regression, although the correlation is also lower (Table 7). These results highlight the robustness of the GBR model on non-blood tissue.
Table 7

Results of GBR and Multiple Linear Regression on saliva samples.

Model | No. of CpG Sites | R² | MAD
Multiple Linear Regression | 88 | 0.73 | 5.2
Gradient Boosting Regressor | 6 | 0.58 | 3.76

3.4. Analysis of the Selected Six CpG Sites

In existing studies, the ranking of age-related CpG sites differs considerably. This is probably due to differences in age range, methods and statistical techniques (the age range is shown in Figure 5). Furthermore, there is almost no overlap between the DNAm-based age predictors derived for different tissues. The six CpG loci extracted from the blood data can be applied to predict saliva data without any adjustment, and the prediction results are better than those of other predictors. Selecting the CpG sites for an age prediction model is therefore a complex task. In this work, we selected six age-related CpG sites (AR-CpGs). These six sites are from six specific genes, namely edaradd, nhlrc1, aspa, lag3, scgn and csnk1d. These genes play important roles in the regulation of developmental processes. We annotated these CpGs to their associated genes. The detailed locations of these CpGs are also included in Table 3. Two CpGs were located in the promoter region of genes (TSS1500), three were located in the first exon region and one in the gene body. Meanwhile, two CpGs were located within CpG island regions, three were located in CpG island shores, and one was far from the CpG island regulatory regions. For example, the CpG cg19761273 is located in the TSS1500 region of the gene csnk1d and overlaps with the south shore of the CpG island (Figure 6).
Figure 5

(a) A histogram of the age distribution for healthy individuals; (b) A histogram of the age distribution for disease individuals.

Figure 6

UCSC genome browser view of the genomic location of the CpG cg19761273.

4. Discussion

Many bioinformatics studies have established linear regression models to study the relationship between DNA methylation and age, because the linear model is fast, interpretable and easy to use. However, Alisch and colleagues used non-linear models for this purpose in children (3–17 years old). In addition, they found that DNA methylation did not change at a constant rate with age throughout life [44]. Bekaert et al. also noted that the relationship between DNA methylation and age in elovl2 was not a straight line [37], illustrating that the linear model does not always predict age very well, and that non-linear models can sometimes be a better fit. In this study, we selected six CpG sites by calculating the Pearson correlation between age and DNA methylation values. We adopted the gradient boosting regressor, which is an ensemble model. The correlation between predicted age and true age was strong (R² = 0.97), and the MAD was 2.72 years. In the combined independent datasets, the MAD of age prediction was 4.06 years. The MAD value was lower than those of the other three models. This indicates that the GBR is a more suitable model for age prediction. Studies have shown that the level of DNA methylation is closely related to age, with most CpGs in CpG islands becoming hypermethylated during aging [13,45]. Here we observed that the two sites located in CpG islands were hypermethylated, while the remaining ones, none of which lie in CpG islands, showed hypomethylation with aging. Previous studies have found no strong evidence that DNA methylation is associated with known aging-related mechanisms, but the aging-associated CpGs may represent a set of biomarkers for predicting the cellular chronological clock [3,8,46]. 
Specifically, we noted that the majority of the genes were not among the previously reported genes whose expression changes with aging [46,47], but all six genes are involved in age-related processes. Here are a few examples. edaradd was identified by its association with ectodermal dysplasia, and specifically with hypohidrotic ectodermal dysplasia, a genetic disorder characterized by defective development of hair, teeth, and eccrine sweat glands [48]. The nhlrc1 gene provides instructions for making a protein called malin. Although this protein is active in cells throughout the body, it appears to play a critical role in the survival of nerve cells (neurons) in the brain. The aspa gene provides instructions for making an enzyme called aspartoacylase. In the brain, this enzyme breaks down a compound called N-acetyl-L-aspartic acid (NAA) into aspartic acid (an amino acid which is a building block for many proteins) and another molecule called acetic acid. LAG3’s main ligand is MHC class II, to which it binds with higher affinity than CD4 [49]. The protein negatively regulates cellular proliferation, activation, and homeostasis of T cells, in a similar fashion to CTLA-4 and PD-1 [50,51], and has been reported to play a role in Treg suppressive function [52]. LAG3 also helps maintain CD8+ T cells in a tolerogenic state [53] and, working with PD-1, helps maintain CD8 exhaustion during chronic viral infection [54]. LAG3 is known to be involved in the maturation and activation of dendritic cells [55]. SCGN is a secreted calcium-binding protein which is found in the cytoplasm. It is related to calbindin D-28K and calretinin. This protein is thought to be involved in potassium chloride-stimulated calcium flux and cell proliferation [56]. The csnk1d gene encodes the casein kinase I isoform delta enzyme in humans [57]. 
This gene is a member of the casein kinase I (CKI) gene family, whose members have been implicated in the control of cytoplasmic and nuclear processes, including DNA replication and repair. Interestingly, expression of the selected hypomethylated genes aspa and csnk1d was reported to be positively associated with aging [58,59], which implies a potentially inverse correlation between methylation level and expression level, similar to that usually observed in promoter regions. Taken together, these genes have an important influence on development, and their methylation could play vital roles in the regulation of aging. Of course, our research also has some limitations. Firstly, we did not consider the impact of sex on age prediction. Some researchers have reported that age-related methylation may differ between the sexes [1]. However, in the study of Bekaert et al., there was no significant difference in age-related methylation level between males and females [37]. Secondly, because data on other tissues are limited, we focused on blood tissue. Each tissue has a different methylation pattern, and there are tissue-specific methylation changes during aging [60]. If more age-related methylation sites can be found in different tissues, many more methylation indicators will be available for age prediction. The combination of multiple age-related methylation markers should contribute to accurate age estimation.

5. Conclusions

Age prediction based on DNA methylation is a rapidly evolving field of epigenetics, and it has great potential to provide accurate results. In this study, we selected six highly age-related CpG sites by calculating the Pearson correlation between age and the DNA methylation value of each CpG site. A comparison of GBR with the other methods showed that GBR has better prediction accuracy for blood samples. In the healthy datasets, the MAD was 2.72 years for the training set and 4.06 years for the independent set. Furthermore, age-related DNA methylation was associated with specific age-related diseases. The MAD clearly increased on the disease data, to 5.44 years in the training set and 7.08 years in the independent set. GBR also achieved good results in saliva.
  57 in total

1.  Isolation and identification of age-related DNA methylation markers for forensic age-prediction.

Authors:  Shao Hua Yi; Long Chang Xu; Kun Mei; Rong Zhi Yang; Dai Xin Huang
Journal:  Forensic Sci Int Genet       Date:  2014-03-22       Impact factor: 4.882

2.  Telomeres shorten during ageing of human fibroblasts.

Authors:  C B Harley; A B Futcher; C W Greider
Journal:  Nature       Date:  1990-05-31       Impact factor: 49.962

3.  Role of epigenetics in human aging and longevity: genome-wide DNA methylation profile in centenarians and centenarians' offspring.

Authors:  Davide Gentilini; Daniela Mari; Davide Castaldi; Daniel Remondini; Giulia Ogliari; Rita Ostan; Laura Bucci; Silvia M Sirchia; Silvia Tabano; Francesco Cavagnini; Daniela Monti; Claudio Franceschi; Anna Maria Di Blasio; Giovanni Vitale
Journal:  Age (Dordr)       Date:  2012-08-25

4.  CD4/major histocompatibility complex class II interaction analyzed with CD4- and lymphocyte activation gene-3 (LAG-3)-Ig fusion proteins.

Authors:  B Huard; P Prigent; M Tournier; D Bruniquel; F Triebel
Journal:  Eur J Immunol       Date:  1995-09       Impact factor: 5.532

5.  Genome-wide methylation profiles reveal quantitative views of human aging rates.

Authors:  Gregory Hannum; Justin Guinney; Ling Zhao; Li Zhang; Guy Hughes; SriniVas Sadda; Brandy Klotzle; Marina Bibikova; Jian-Bing Fan; Yuan Gao; Rob Deconde; Menzies Chen; Indika Rajapakse; Stephen Friend; Trey Ideker; Kang Zhang
Journal:  Mol Cell       Date:  2012-11-21       Impact factor: 17.970

6.  Epigenetic predictor of age.

Authors:  Sven Bocklandt; Wen Lin; Mary E Sehl; Francisco J Sánchez; Janet S Sinsheimer; Steve Horvath; Eric Vilain
Journal:  PLoS One       Date:  2011-06-22       Impact factor: 3.240

7.  Molecular insights into the pathogenesis of Alzheimer's disease and its relationship to normal aging.

Authors:  Alexei A Podtelezhnikov; Keith Q Tanis; Michael Nebozhyn; William J Ray; David J Stone; Andrey P Loboda
Journal:  PLoS One       Date:  2011-12-28       Impact factor: 3.240

8.  Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population.

Authors:  Jordana T Bell; Pei-Chien Tsai; Tsun-Po Yang; Ruth Pidsley; James Nisbet; Daniel Glass; Massimo Mangino; Guangju Zhai; Feng Zhang; Ana Valdes; So-Youn Shin; Emma L Dempster; Robin M Murray; Elin Grundberg; Asa K Hedman; Alexandra Nica; Kerrin S Small; Emmanouil T Dermitzakis; Mark I McCarthy; Jonathan Mill; Tim D Spector; Panos Deloukas
Journal:  PLoS Genet       Date:  2012-04-19       Impact factor: 5.917

9.  AGEMAP: a gene expression database for aging in mice.

Authors:  Jacob M Zahn; Suresh Poosala; Art B Owen; Donald K Ingram; Ana Lustig; Arnell Carter; Ashani T Weeraratna; Dennis D Taub; Myriam Gorospe; Krystyna Mazan-Mamczarz; Edward G Lakatta; Kenneth R Boheler; Xiangru Xu; Mark P Mattson; Geppino Falco; Minoru S H Ko; David Schlessinger; Jeffrey Firman; Sarah K Kummerfeld; William H Wood; Alan B Zonderman; Stuart K Kim; Kevin G Becker
Journal:  PLoS Genet       Date:  2007-10-02       Impact factor: 5.917

10.  Integrative analysis of 111 reference human epigenomes.

Authors:  Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal:  Nature       Date:  2015-02-19       Impact factor: 69.504

