Improved Human Age Prediction by Using Gene Expression Profiles From Multiple Tissues.

Fayou Wang1,2, Jialiang Yang3,4,5, Huixin Lin4,5, Qian Li4,6, Zixuan Ye4, Qingqing Lu4,5, Luonan Chen2, Zhidong Tu3, Geng Tian4,5.   

Abstract

Studying chronological changes in the transcriptome of tissues across the whole body can provide valuable information for understanding aging and longevity. Although there has been research on the effect of single-tissue transcriptomes on human aging and on aging in mice across multiple tissues, a study of human body-wide multi-tissue transcriptomes in aging is not yet available. In this study, we propose a quantitative model to predict human age by using gene expression data from 46 tissues generated by the Genotype-Tissue Expression (GTEx) project. Specifically, the biological age of a person is first predicted via the gene expression profile of a single tissue. Then, we combine the gene expression profiles from two tissues and compare the predictive accuracy between single and two tissues. The best performance as measured by the root-mean-square error is 3.92 years for single tissue (pituitary), which decreased to 3.6 years when we combined two tissues (pituitary and muscle). Different tissues have different potential in predicting chronological age. The prediction accuracy is improved by combining multiple tissues, supporting the view that aging is a systemic process involving multiple tissues across the human body.
Copyright © 2020 Wang, Yang, Lin, Li, Ye, Lu, Chen, Tu and Tian.

Keywords:  RNA sequencing; age prediction; aging; gene expression; genotype-tissue expression (GTEx)

Year:  2020        PMID: 33101366      PMCID: PMC7546819          DOI: 10.3389/fgene.2020.01025

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


Introduction

Different people may age at different rates, as revealed by recent studies (Li et al., 2009; Horvath, 2013). Some people appear younger than their chronological age, and others appear older. In an extreme case, a 16-year-old girl without any known genetic syndromes or chromosomal abnormalities appeared to stop growing and looked like an infant (Walker et al., 2009). It is a challenge to identify her “actual” age. Many factors, for instance, lifestyle and environmental factors, can hasten or delay aging (Feldman et al., 1994; Hultsch et al., 1999). Thus, a set of biomarkers that can reliably reflect real age has practical value. There are special cases in which such age biomarkers are particularly useful. For example, people may need to verify an athlete's age in sporting events such as the Olympic Games or to determine a suspect's age in certain forensic cases. Different types of biomarkers have been proposed to quantify human age (Li et al., 2009). Physical parameters, such as visual acuity, auditory threshold, and maximum work rate, have been used as indicators of aging for more than three decades (Furukawa et al., 1975; Borkan and Norris, 1980). Other criteria, such as gray hair and skin wrinkles, can also reflect chronological age (Van Neste and Tobin, 2004). However, these parameters often do not provide an accurate estimate of age and cannot reveal the internal molecular changes of the human body or the underlying aging mechanisms. With the rapid development of high-throughput technologies, genomic and epigenetic data are accumulating at an unprecedented rate. This provides a new route for estimating aging at the molecular level. Associations between epigenetic variations (e.g., DNA methylation and histone modification) and age have been reported (Fraga and Esteller, 2007).
It has been shown that gene expression and the methylation profile of blood (Bocklandt et al., 2011; Hannum et al., 2013; Horvath, 2013), the gene expression profile of brain (Fraser et al., 2005), and telomere length (Harley et al., 1990; Benetos et al., 2001) are good indicators of age in humans and other primates. In addition, these biomarkers may also provide candidate targets for intervention to extend the human life span (Baker and Sprott, 1988). Previous studies on age prediction using gene expression mainly rely on single tissues, such as blood or brain. The predictive ability of different tissues has not been thoroughly studied. Because aging is a concordant process involving multiple tissues (Kujoth et al., 2005), it might be effective to build an age-prediction model with information from multiple tissues. In this study, we built an optimal age-prediction model by using the Genotype-Tissue Expression (GTEx) profiles of 46 human tissues and then compared the predictive performance of a single tissue with that of two tissues combined.

Methods

Tissue Gene Expression and Data Preprocessing

The gene expression profiles of 46 tissues from GTEx (V6) were used. A detailed description of sample collection, RNA preparation, RNA sequencing, gene expression estimation, etc., is given in the GTEx consortium paper (The GTEx Consortium, 2015). We first normalized the original gene expression data from GTEx via quantile normalization.

Pearson Correlation for Selecting Age-Associated Genes

The genes in each tissue were ranked based on the Pearson correlation between donor age and the corresponding gene expression. We then used the top N genes as model input, with N ranging from 50 to 6,400 in multiples of 2, and tuned N by 10-fold cross-validation (CV).
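The ranking step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, gene names, and toy values are hypothetical, and ranking by the absolute value of the correlation is an assumption, since the text does not say whether the sign of the correlation is taken into account.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene carries no age signal
    return sxy / math.sqrt(sxx * syy)

def top_age_genes(expression, ages, n_top):
    """Rank genes by |PCC| with donor age and return the n_top gene names.

    expression: dict mapping gene name -> list of expression values,
                one per sample, in the same order as `ages`.
    Assumption: genes are ranked by absolute correlation.
    """
    ranked = sorted(
        expression,
        key=lambda gene: abs(pearson(expression[gene], ages)),
        reverse=True,
    )
    return ranked[:n_top]

# Toy data (hypothetical gene names and values, for illustration only).
ages = [25, 40, 55, 70]
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],  # rises with age
    "geneB": [2.0, 2.1, 1.9, 2.0],  # roughly flat
    "geneC": [9.0, 7.5, 6.0, 4.5],  # falls with age
}
selected = top_age_genes(expression, ages, n_top=2)
```

In this toy case the two genes that change monotonically with age are selected and the flat gene is dropped. In a cross-validated setting the ranking would be recomputed on the training samples of each fold only.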

Accuracy of the Models

In this paper, we use the root-mean-square error (RMSE) to measure the accuracy of the models. RMSE is a frequently used measure of the differences between the values (sample or population values) predicted by a model or an estimator and the values observed. In the age-prediction models, the smaller the RMSE, the higher the accuracy of the model; conversely, the larger the RMSE, the lower the accuracy. The RMSE of the predicted values ŷ_i of a regression's dependent variable y_i, computed over n predictions, is the square root of the mean of the squares of the deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
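As a small worked example, the RMSE definition can be written directly in Python. The function name and the three predicted and observed ages below are made up for illustration.

```python
import math

def rmse(predicted, observed):
    """Square root of the mean squared deviation between two sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predicted vs. chronological ages (years).
# Deviations are 2, -4, and 0, so RMSE = sqrt((4 + 16 + 0) / 3).
example = rmse([52.0, 61.0, 38.0], [50.0, 65.0, 38.0])
```

Because the deviations are squared before averaging, a single large miss (here 4 years) weighs more heavily than several small ones.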

Prediction Based on Single Tissue

Our age-prediction model is based on the elastic net algorithm (Zhou and Hastile, 2005). The elastic net algorithm has a sparsity property and favors grouping effects, so that strongly correlated predictors tend to be in or out of the model together. These properties make the method well suited to our study because gene expression is highly interrelated and our prediction model relies on only a small number of genes. The age-prediction process is formulated as follows, written here in the standard elastic net form with α weighting the L1 term:

$$(\hat{\omega}_0, \hat{\omega}) = \underset{\omega_0,\,\omega}{\arg\min}\; \sum_{i=1}^{M}\Big(\mathrm{Age}_i - \omega_0 - \sum_{j=1}^{N}\omega_j x_{ij}\Big)^2 + \lambda\Big[\alpha\sum_{j=1}^{N}|\omega_j| + (1-\alpha)\sum_{j=1}^{N}\omega_j^2\Big]$$

where Age_i is the chronological age of the donor of sample i with 1 ≤ i ≤ M, M is the number of samples in a particular tissue, x_ij is the log2-transformed expression of gene j with 1 ≤ j ≤ N for sample i, N is the number of preselected genes in the tissue, ω_0 is the intercept, ω_j is the weight of gene j, ω̂ is the estimated value of ω, 0 ≤ α ≤ 1 is a parameter to balance the L1 (e.g., lasso) and L2 (e.g., ridge regression) penalty, and λ is the lasso parameter. The two parameters α and λ are optimized by a 10-fold CV. After ω̂_0 and ω̂_j (1 ≤ j ≤ N) are determined, the following equation is used to predict age for a new sample y with a known expression level x_yj for each selected gene j:

$$\widehat{\mathrm{Age}}(y) = \hat{\omega}_0 + \sum_{j=1}^{N}\hat{\omega}_j x_{yj}$$

It is worth noting that the main purpose of this study is to compare the predictive capability of a single tissue with that of two tissues. Because the main focus is not to identify the “best” predictive models, we do not compare the performance of elastic net with other machine learning methods. However, given the wide application of elastic net in age prediction (Hannum et al., 2013), we consider it to be an appropriate choice to serve the main purpose of this work.
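To make the roles of α and λ concrete, the following is a minimal pure-Python sketch of an elastic net fitted by coordinate descent. It is not the authors' implementation, and the text does not say which software was used. It assumes the objective "sum of squared errors + λ[α·Σ|ω_j| + (1 − α)·Σω_j²]" with an unpenalized intercept; other packages scale these terms differently, so α and λ values are not directly transferable. All function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(X, ages, alpha, lam, n_sweeps=200):
    """Coordinate descent for the assumed objective
        sum_i (age_i - w0 - sum_j w_j * x_ij)^2
        + lam * (alpha * sum_j |w_j| + (1 - alpha) * sum_j w_j^2).
    X is a list of samples, each a list of N expression values.
    """
    m, n = len(X), len(X[0])
    w0, w = sum(ages) / m, [0.0] * n
    for _ in range(n_sweeps):
        for j in range(n):
            # Correlate gene j with the residual computed without gene j.
            rho = 0.0
            for i in range(m):
                fit_without_j = w0 + sum(w[k] * X[i][k] for k in range(n) if k != j)
                rho += X[i][j] * (ages[i] - fit_without_j)
            z = sum(X[i][j] ** 2 for i in range(m))
            denom = z + lam * (1.0 - alpha)
            # L1 part shrinks by lam*alpha/2; L2 part inflates the denominator.
            w[j] = soft_threshold(rho, lam * alpha / 2.0) / denom if denom > 0 else 0.0
        # The intercept is not penalized: refit it as the mean residual.
        w0 = sum(ages[i] - sum(w[k] * X[i][k] for k in range(n)) for i in range(m)) / m
    return w0, w

def predict_age(w0, w, sample):
    """Predicted age = intercept + weighted sum of expression values."""
    return w0 + sum(wj * xj for wj, xj in zip(w, sample))

# Toy check (hypothetical numbers): age = 1 + 2 * x, fitted without penalty.
X = [[0.0], [1.0], [2.0], [3.0]]
ages = [1.0, 3.0, 5.0, 7.0]
w0_fit, w_fit = fit_elastic_net(X, ages, alpha=0.5, lam=0.0)
# A heavy pure-lasso penalty drives the single weight to exactly zero.
w0_sparse, w_sparse = fit_elastic_net(X, ages, alpha=1.0, lam=100.0)
```

With no penalty the fit recovers the generating line, and with a large L1 penalty the model collapses to the mean age, which illustrates the sparsity property mentioned above. For real data with hundreds of genes, an optimized library would be preferable to this didactic loop.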

Parameter Tuning and Model Selection

To identify the best age-prediction model, we applied a 10-fold CV strategy. In addition, we bootstrapped the CV process 100 times and averaged the validation RMSE and Pearson correlation coefficient (PCC) to reduce the potential bias arising from random sampling when splitting the samples into training and testing sets. As stated above, there are three model parameters, namely the preselection threshold N, the parameter α that balances the lasso and ridge regression penalties, and the lasso parameter λ. These parameters are tuned by 10-fold CV. Specifically, we let N increase from 50 to 6400 by multiples of 2, α increase from 0 to 1 in steps of 0.01, and λ increase from $2^{-10}$ to $2^{10}$ by multiples of 2. The set of parameters yielding the lowest averaged validation RMSE in the 100 bootstrapped 10-fold CV runs was chosen as the optimal parameter set for the single-tissue and two-tissue models. Note that we reranked and reselected genes in each CV round, based on the nine training folds only, to avoid overfitting.
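The search grid and the fold construction described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the random shuffling scheme is an assumption, and repeating the split with different seeds is only one possible reading of the 100 "bootstrapped" CV runs.

```python
import random

def parameter_grid():
    """Search grid as described in the text."""
    n_genes = [50 * 2 ** k for k in range(8)]      # 50, 100, ..., 6400
    alphas = [step / 100 for step in range(101)]   # 0.00, 0.01, ..., 1.00
    lambdas = [2.0 ** e for e in range(-10, 11)]   # 2^-10, ..., 2^10
    return n_genes, alphas, lambdas

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]

def cv_splits(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for each fold.

    Gene ranking and model fitting should use only `train`, so that the
    validation fold never influences feature selection.
    """
    folds = k_fold_indices(n_samples, k, seed)
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]

n_genes, alphas, lambdas = parameter_grid()
# 103 is the number of pituitary samples listed in Table 1.
splits = list(cv_splits(n_samples=103, k=10, seed=1))
```

The grid contains 8 × 101 × 21 parameter combinations. A full run would loop over them, evaluate each on every split, repeat with different seeds, and keep the combination with the lowest averaged validation RMSE.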

Prediction Using Gene Expression Data of Two Tissues

Because the number of overlapping samples among three tissues is often less than 70, we only analyzed donors with samples from two tissues. To balance the contribution of each tissue, an equal number of top genes from each tissue were combined as features in the prediction model. A similar analysis was then applied to tune the model parameters. The performance of each single tissue and each pair of tissues was evaluated by RMSE on both validation and testing data.
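One way to picture the two-tissue feature construction is to keep only donors sampled in both tissues and to concatenate the preselected genes of each tissue into a single feature vector. The sketch below assumes this simple concatenation; the data layout, function name, donor identifiers, and gene names are all hypothetical.

```python
def combine_two_tissues(tissue_a, tissue_b, genes_a, genes_b):
    """Build one feature vector per donor present in both tissues.

    tissue_a, tissue_b: dict donor_id -> dict gene -> expression value.
    genes_a, genes_b:   preselected top age-associated genes per tissue.
    Returns (donor_ids, feature_names, rows).
    """
    shared = sorted(set(tissue_a) & set(tissue_b))
    # Prefix names so the same gene symbol in both tissues stays distinct.
    names = [f"A:{g}" for g in genes_a] + [f"B:{g}" for g in genes_b]
    rows = [
        [tissue_a[d][g] for g in genes_a] + [tissue_b[d][g] for g in genes_b]
        for d in shared
    ]
    return shared, names, rows

# Toy data (hypothetical donors, genes, and values).
pituitary = {
    "d1": {"g1": 5.0, "g2": 3.0},
    "d2": {"g1": 6.0, "g2": 2.5},
    "d3": {"g1": 7.0, "g2": 2.0},
}
muscle = {
    "d2": {"h1": 1.0},
    "d3": {"h1": 1.5},
    "d4": {"h1": 2.0},
}
donors, names, rows = combine_two_tissues(pituitary, muscle, ["g1", "g2"], ["h1"])
```

Only donors d2 and d3 appear in both toy tissues, so only they contribute rows. In the study's setting, a tissue pair would be kept only if the number of shared donors reached the stated threshold of about 70.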

DAVID Analysis

The DAVID (6.7) (Huang et al., 2009) (https://david.ncifcrf.gov/tools.jsp) bioinformatics resource consists of an integrated biological knowledge base and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. We can use DAVID, a high-throughput and integrated data-mining environment, to analyze gene functional classification, functional annotation charts, or clustering and functional annotation tables through gene lists derived from our age-prediction models. By following this protocol, investigators are able to gain an in-depth understanding of the aging themes in lists of genes that are enriched in genome-scale studies.

Results

Using GTEx Gene Expression Profile as Data Input

We developed a computational framework to predict donor age from the gene expression profile of one or two tissues generated by GTEx (Version 6). GTEx contains expression profiles of more than 41,298 genes in 46 human tissues. In total, 34,443 genes and 8,375 samples passed the quality control and data processing procedure and were used as the benchmark data in this study. Detailed information on the samples for the 46 tissues is provided in Table 1. As can be seen from Table 1, the ages of donors range from 20 to 70, and the number of samples varies from 71 to 430 per tissue.
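Table 1

Sample information of 46 tissues in GTEx.

Tissue | Samples | Min age | Max age | Median age | Mean age | Men | Women | Men/women ratio
Adipose_subcutaneous | 350 | 21 | 70 | 55 | 52 | 219 | 131 | 1.672
Adipose_visceral_(omentum) | 227 | 21 | 70 | 54 | 52 | 145 | 82 | 1.768
Adrenal_gland | 145 | 21 | 70 | 51 | 51 | 81 | 64 | 1.266
Artery_aorta | 224 | 21 | 69 | 54 | 51 | 138 | 86 | 1.605
Artery_coronary | 133 | 21 | 69 | 54 | 52 | 77 | 56 | 1.375
Artery_tibial | 332 | 20 | 70 | 53 | 51 | 213 | 119 | 1.79
Brain_amygdala | 72 | 20 | 70 | 60 | 58 | 50 | 22 | 2.273
Brain_anterior_cingulate_cortex_(BA24) | 84 | 20 | 70 | 60 | 58 | 61 | 23 | 2.652
Brain_caudate_(basal_ganglia) | 117 | 20 | 70 | 60 | 58 | 85 | 32 | 2.656
Brain_cerebellar_hemisphere | 105 | 20 | 70 | 59 | 56 | 74 | 31 | 2.387
Brain_cerebellum | 125 | 20 | 70 | 59 | 57 | 84 | 41 | 2.049
Brain_cortex | 114 | 20 | 70 | 59 | 57 | 77 | 37 | 2.081
Brain_frontal_cortex_(BA9) | 108 | 23 | 70 | 60 | 58 | 77 | 31 | 2.484
Brain_hippocampus | 94 | 20 | 70 | 60 | 57 | 65 | 29 | 2.241
Brain_hypothalamus | 96 | 20 | 70 | 60 | 58 | 71 | 25 | 2.84
Brain_nucleus_accumbens_(basal_ganglia) | 113 | 20 | 70 | 60 | 57 | 79 | 34 | 2.324
Brain_putamen_(basal_ganglia) | 97 | 20 | 70 | 59 | 57 | 69 | 28 | 2.464
Brain_spinal_cord_(cervical_c-1) | 71 | 22 | 70 | 59 | 57 | 43 | 28 | 1.536
Breast_mammary_tissue | 214 | 21 | 70 | 53 | 51 | 124 | 90 | 1.378
Cells_EBV-transformed_lymphocytes | 118 | 21 | 70 | 50 | 48 | 75 | 43 | 1.744
Cells_transformed_fibroblasts | 284 | 21 | 70 | 53.5 | 51 | 181 | 103 | 1.757
Colon_sigmoid | 149 | 21 | 70 | 56 | 54 | 88 | 61 | 1.443
Colon_transverse | 196 | 21 | 70 | 50 | 48 | 115 | 81 | 1.42
Esophagus_gastroesophageal_junction | 153 | 21 | 70 | 53 | 51 | 94 | 59 | 1.593
Esophagus_mucosa | 286 | 21 | 70 | 52.5 | 50 | 179 | 107 | 1.673
Esophagus_muscularis | 247 | 21 | 70 | 50 | 49 | 157 | 90 | 1.744
Heart_atrial_appendage | 194 | 20 | 70 | 55 | 54 | 126 | 68 | 1.853
Heart_left_ventricle | 218 | 20 | 70 | 53 | 51 | 142 | 76 | 1.868
Liver | 119 | 21 | 69 | 55 | 54 | 78 | 41 | 1.902
Lung | 320 | 21 | 70 | 54 | 52 | 213 | 107 | 1.991
Muscle_skeletal | 430 | 20 | 70 | 54.5 | 52 | 274 | 156 | 1.756
Nerve_tibial | 304 | 20 | 70 | 54 | 52 | 199 | 105 | 1.895
Ovary | 97 | 21 | 69 | 51 | 50 | NA | 97 | NA
Pancreas | 171 | 21 | 70 | 51 | 50 | 102 | 69 | 1.478
Pituitary | 103 | 20 | 70 | 59 | 57 | 74 | 29 | 2.552
Prostate | 106 | 21 | 70 | 50.5 | 49 | 106 | NA | NA
Skin_not_sun_exposed_(suprapubic) | 250 | 20 | 70 | 55 | 53 | 164 | 86 | 1.907
Skin_sun_exposed_(lower_leg) | 357 | 21 | 70 | 55 | 52 | 226 | 131 | 1.725
Small_intestine_terminal_ileum | 88 | 21 | 70 | 49.5 | 48 | 51 | 37 | 1.378
Spleen | 104 | 21 | 68 | 50 | 48 | 60 | 44 | 1.364
Stomach | 193 | 21 | 70 | 51 | 48 | 111 | 82 | 1.354
Testis | 172 | 21 | 70 | 52 | 50 | 172 | NA | NA
Thyroid | 323 | 20 | 70 | 55 | 53 | 211 | 112 | 1.884
Uterus | 83 | 21 | 69 | 50 | 48 | NA | 83 | NA
Vagina | 96 | 21 | 69 | 51 | 50 | NA | 96 | NA
Whole_blood | 393 | 20 | 70 | 54 | 52 | 249 | 144 | 1.729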

Age Prediction Based on Single Tissue

As shown in Figure 1, our prediction framework has multiple steps. First, we rank the genes in each tissue based on the PCC of donor age and the corresponding gene expression. Top age-associated genes in one single or two tissues were then used to construct features in an elastic net regularization model, which is a sparse learning model capable of handling data with small sample sizes but numerous features (Zhou and Hastile, 2005). The parameters of the models were tuned through 10-fold CV according to the RMSE. Functions of genes were annotated by the DAVID Tools (see “Methods” for detailed information).
Figure 1

Overview of elastic net method for building age-prediction model. 1. Normalize the original gene expression data from GTEx via quantile normalization. 2. Select the top 50, 100, 200, 400, 600, 800, 1,600, 3,200, and 6,400 genes, obtained via the Pearson correlation of the age and corresponding gene expression, and build the age-prediction model for each of 46 tissues. 3. Construct age-prediction model for multiple tissues as was done for single tissues. Because overlapping samples among three tissues are often less than 70, only two-tissue studies are contained in the current study. 4. Use the selected genes for DAVID analysis.

Our method was first applied to each of the 46 tissues separately. The performance of each tissue is listed in Table 2. As mentioned above, the number of top age-associated genes was taken as a parameter of our model. We selected the top 50, 100, 200, 400, 600, 800, 1,600, 3,200, and 6,400 genes and tested their performance by 10-fold CV. The number of top genes has some influence on prediction accuracy. The lowest RMSE (i.e., 3.8 years) was achieved for pituitary when selecting 600 genes. Pituitary is one of the most studied tissues and is highly associated with human aging (Seeman and Robbins, 1994). Other good tissues for age prediction include small intestine terminal ileum, spleen, testis, and brain spinal cord. The most accessible tissue, whole blood, seems to be unsuitable for this task. Hannum et al. (2013) applied a blood gene expression profile to predict age based on a much larger sample size (488 in total). However, their RMSE was 7.22 years, which is comparable to our result. We also plotted the RMSEs for all other tissues (using the top 600 genes) in Figure 2A for a better view.
Table 2

Prediction accuracy by using single tissue.

Validation RMSE by number of top genes:

Tissue | 50 | 100 | 200 | 400 | 600 | 800 | 1,600 | 3,200 | 6,400
Adipose_subcutaneous | 7.76 | 7.35 | 7.28 | 7.17 | 6.97 | 7.03 | 6.97 | 7.05 | 7.2
Adipose_visceral_(omentum) | 8.49 | 8.35 | 8.02 | 7.86 | 7.69 | 7.78 | 7.67 | 7.95 | 7.6
Adrenal_gland | 7.82 | 7.3 | 6.97 | 6.06 | 5.66 | 5.46 | 5.25 | 5.38 | 5.53
Artery_aorta | 6.84 | 6.68 | 6.43 | 6.14 | 5.93 | 5.98 | 5.77 | 5.76 | 5.9
Artery_coronary | 8.28 | 8.02 | 7.32 | 7 | 5.89 | 6.12 | 5.78 | 5.84 | 6.06
Artery_tibial | 7.44 | 6.41 | 6.09 | 5.99 | 5.79 | 5.88 | 5.71 | 5.81 | 6.07
Brain_amygdala | 7.11 | 6.52 | 6.31 | 5.62 | 5.11 | 5.27 | 5.23 | 5.41 | 5.39
Brain_anterior_cingulate_cortex_(BA24) | 6.3 | 5.89 | 6.5 | 5.82 | 5.68 | 6 | 6.16 | 6.32 | 6.51
Brain_caudate_(basal_ganglia) | 6.64 | 6.62 | 6.26 | 5.61 | 5.46 | 5.63 | 5.07 | 4.65 | 4.65
Brain_cerebellar_hemisphere | 7.23 | 7.53 | 7.46 | 7.52 | 6.97 | 6.9 | 6.52 | 6.09 | 6.14
Brain_cerebellum | 7.13 | 6.73 | 6.21 | 5.82 | 5.51 | 5.25 | 5.01 | 4.69 | 4.63
Brain_cortex | 7.45 | 6.98 | 7.47 | 6.57 | 6.87 | 6.81 | 5.81 | 5.92 | 5.67
Brain_frontal_cortex_(BA9) | 7.2 | 7.39 | 6.56 | 6.25 | 5.97 | 5.9 | 5.9 | 5.32 | 5.34
Brain_hippocampus | 8.04 | 8.08 | 8.21 | 6.77 | 6.73 | 6.87 | 6.9 | 6.41 | 5.54
Brain_hypothalamus | 6.91 | 7.05 | 6.91 | 6.59 | 6.6 | 6.43 | 6.29 | 6.19 | 6.59
Brain_nucleus_accumbens_(basal_ganglia) | 7.22 | 6.56 | 6.15 | 6.53 | 5.98 | 5.51 | 5.73 | 5.33 | 5.43
Brain_putamen_(basal_ganglia) | 7.22 | 7.09 | 6.3 | 5.56 | 5.16 | 5.19 | 5.55 | 5.52 | 5.8
Brain_spinal_cord_(cervical_c-1) | 6.9 | 6.86 | 5.26 | 5.32 | 5.12 | 4.91 | 4.83 | 5 | 5.51
Breast_mammary_tissue | 10.38 | 10 | 9.5 | 9.06 | 8.77 | 7.98 | 6.86 | 6.28 | 6.4
Cells_EBV-transformed_lymphocytes | 8.86 | 8.18 | 7.56 | 6.29 | 6.04 | 5.68 | 5.64 | 5.87 | 6.78
Cells_transformed_fibroblasts | 10.38 | 9.91 | 9.14 | 9.25 | 8.83 | 8.74 | 8.26 | 7.76 | 7.74
Colon_sigmoid | 9.42 | 8.96 | 8.8 | 8.9 | 8.36 | 8.25 | 8.36 | 7.14 | 7.5
Colon_transverse | 9.58 | 9.37 | 9.04 | 8.83 | 8.6 | 8.6 | 8.42 | 8.37 | 7.98
Esophagus_gastroesophageal_junction | 8.94 | 9 | 8.91 | 8.61 | 8.44 | 8.35 | 7.56 | 7.18 | 6.86
Esophagus_mucosa | 8.49 | 8.37 | 8.28 | 7.95 | 7.85 | 7.58 | 7.56 | 7.69 | 7.58
Esophagus_muscularis | 7.78 | 7.65 | 7.81 | 7.69 | 7.06 | 6.91 | 6.55 | 6.04 | 6.38
Heart_atrial_appendage | 8.66 | 8.57 | 7.55 | 7.44 | 7.17 | 7.12 | 6.65 | 5.93 | 5.96
Heart_left_ventricle | 9.4 | 9.15 | 9.5 | 9.15 | 9.02 | 8.91 | 8.06 | 7.25 | 6.87
Liver | 7.49 | 6.76 | 6.13 | 5.92 | 6.03 | 5.69 | 5.48 | 5.77 | 6.08
Lung | 8.71 | 8.46 | 8.59 | 8.13 | 7.7 | 7.7 | 7.69 | 6.92 | 7.12
Muscle_skeletal | 8.45 | 7.83 | 7.43 | 7.28 | 7.4 | 7.52 | 7.37 | 6.96 | 6.86
Nerve_tibial | 6.81 | 6.54 | 6.19 | 5.88 | 6.05 | 6.22 | 5.96 | 5.71 | 5.74
Ovary | 6.09 | 6.14 | 5.89 | 5.78 | 5.81 | 5.46 | 5.39 | 5.22 | 5.41
Pancreas | 5.85 | 5.97 | 5.63 | 5.15 | 5.3 | 4.93 | 4.27 | 4.51 | 5.06
Pituitary | 5.53 | 5.11 | 4.57 | 4.23 | 3.8 | 3.98 | 3.92 | 4.11 | 4.55
Prostate | 8.86 | 8.91 | 8.68 | 8.04 | 7.45 | 7.4 | 6.88 | 6.87 | 6.57
Skin_not_sun_exposed_(suprapubic) | 9.04 | 8.58 | 8.24 | 8 | 7.49 | 7.35 | 7.24 | 6.19 | 6.24
Skin_sun_exposed_(lower_leg) | 7.73 | 7.35 | 7.11 | 6.79 | 6.74 | 6.8 | 6.52 | 6.25 | 6.11
Small_intestine_terminal_ileum | 7.57 | 7.07 | 5.54 | 4.24 | 4.16 | 4.03 | 4.16 | 4.59 | 5.49
Spleen | 6.83 | 6.16 | 6.22 | 5.18 | 4.77 | 4.52 | 4.71 | 5.1 | 5.3
Stomach | 9.7 | 8.6 | 8.01 | 7.38 | 7.01 | 6.82 | 6.15 | 6.2 | 6.71
Testis | 6.5 | 6.03 | 5.81 | 5.5 | 5.41 | 5.31 | 4.83 | 4.92 | 4.95
Thyroid | 7.91 | 7.56 | 6.91 | 6.77 | 6.51 | 6.54 | 6.22 | 6.39 | 6.1
Uterus | 6.64 | 6.86 | 7.59 | 7.67 | 7.91 | 7.76 | 7.53 | 7.24 | 7.23
Vagina | 8.55 | 8.42 | 8.06 | 7.29 | 7.03 | 6.66 | 6.94 | 6.64 | 6.99
Whole_blood | 10.67 | 10.6 | 10.68 | 10.53 | 10.58 | 10.48 | 10.19 | 10.03 | 10.08

Validation RMSE of the 46 single-tissue models by 10-fold CV. For each tissue, age-prediction models were built using the top 50, 100, 200, 400, 600, 800, 1,600, 3,200, and 6,400 genes most strongly associated with age.

Figure 2

The accuracy of 46 single tissues and five double tissues in age prediction. (A) The RMSE of single tissue age predictors for the top 600 genes. We select the top 50, 100, 200, 400, 600, 800, 1,600, 3,200, and 6,400 genes, which are obtained via Pearson correlation of age and gene expression, and then build the age-prediction model across the 46 single tissues. Because the best predictive model appears in the top 600 genes, here we show the RMSE of the top 600 gene model. As can be seen from the figure, the minimum RMSE is 3.8, which corresponds to the age-prediction model of pituitary tissue. (B) Blue represents the RMSE of the top 600 genes of pituitary and the top 50 genes of muscle, adipose subcutaneous, brain cerebellum, skin sun exposed, and whole blood, and brown represents RMSE of the first 50 genes of muscle, adipose subcutaneous, brain cerebellum, skin sun exposed, and whole blood.

Age Prediction Using Multiple Tissues

Because aging is a process associated with multiple tissues (Kujoth et al., 2005), it is reasonable to assume that combining multiple tissues can improve age-prediction accuracy. Because there are at least 71 samples in each single tissue, we selected tissue pairs with at least 70 donors sampled in both tissues for a relatively fair comparison, which yields 382 combinations in total. The combinations were used to train 382 elastic net models (Zhou and Hastile, 2005), whose performance was also evaluated by 10-fold CV. The results show that it is possible to improve age prediction by combining two tissues. As mentioned above, the best prediction RMSE for a single tissue (3.8 years) was achieved for pituitary with 600 genes. We added 50, 100, 200, and 400 selected genes from one other tissue (muscle skeletal, adipose subcutaneous, brain cerebellum, skin sun exposed, or whole blood); the resulting performance is listed in Table 3 and shown in Figure 2B. As can be seen, the validation RMSE decreases to 3.6 years when 50 genes from muscle skeletal are added (see also Figures 3A,B). However, the prediction accuracy is worse when adding the other tissues, indicating that different tissues might undergo aging at different rates or through different mechanisms. Generally speaking, the age-prediction accuracy improves as the number of tissues increases, which supports the view that aging is a concordant process involving multiple tissues (Kujoth et al., 2005).
Table 3

Prediction accuracy by combining double tissues.

Validation RMSE by number of genes (pituitary + second tissue):

Tissues | 600 + 50 | 600 + 100 | 600 + 200 | 600 + 400
Pituitary & muscle skeletal | 3.6 | 3.61 | 3.67 | 3.78
Pituitary & adipose subcutaneous | 4.16 | 4.23 | 4.36 | 4.36
Pituitary & brain cerebellum | 4.14 | 4.15 | 4.21 | 4.19
Pituitary & skin sun exposed | 4.01 | 4 | 4.03 | 4.08
Pituitary & whole blood | 4.32 | 4.31 | 4.45 | 4.64

Validation RMSE by 10-fold CV of two-tissue age-prediction models that combine pituitary with muscle skeletal, adipose subcutaneous, brain cerebellum, skin sun exposed, or whole blood. In each column header, 600 is the number of most age-related genes from pituitary, and 50, 100, 200, or 400 is the number of most age-related genes from the second tissue.

Figure 3

Scatterplot of age prediction and gene functional analysis. (A) Scatterplot of the pituitary age-prediction model for the top 600 genes in 46 single tissues. The RMSE is 3.8, and the PCC of real and predicted age is 0.93. (B) Scatterplot of pituitary for 600 genes and muscle skeletal for 50 genes age-prediction model. The RMSE is 3.6, and the PCC of real and predicted age is 0.95. (C) DAVID analysis of the age-prediction model in pituitary. (D) DAVID analysis of the age-prediction model in pituitary and muscle skeletal.

Effect of Model Parameters on Prediction Accuracy

In our model, we prefilter genes and only allow the top N genes as features to be selected by the elastic net model. There are two elastic net parameters, namely α, which controls the balance between lasso and ridge regression, and λ, the lasso parameter. Because the effects of α and λ have been extensively studied (Zhou and Hastile, 2005), we tested the effect of N on validation error in this study. For most prediction models with a small validation error, the number of genes involved in the model ranges from 300 to 1600. This indicates that only a small or moderate portion of genes is necessary to predict age. This finding is also supported by other studies (Bocklandt et al., 2011; Hannum et al., 2013), in which 200 methylation markers are used to predict the biological age of individuals. The parameters of the best model (i.e., “pituitary&muscle”) are α = 0, λ = 0.5, and ω0 = 49.1, that is, age = 49.1 − 0.5534609 × RF00019 + 0.4345046 × RASSF8 + 0.4238481 × ALOX15B + … The model has an intercept of 49.1 years, which is quite close to the mean age of the samples (50.81 years).

Optimal Gene Set of Predicted Age and Functional Analysis

For the best prediction model, we listed the top 50 genes (according to the absolute value of coefficients) and their coefficients in Table 4. Among the top 50 genes, 49 are from pituitary, and only 1 is from muscle (ranked at 15). Interestingly, most of the top genes are age-associated. For example, RASSF8 (ras association domain-containing protein 8), ranks second in the list. RASSF8 encodes a protein that is a member of the transmembrane 4 superfamily and is a lung tumor–suppressor gene candidate. It plays important roles in the regulation of localization, methylation, cell–cell adhesion, cell migration, cell death, response to hypoxia, mitosis, cell growth, wound healing, contact inhibition, and epithelial cell migration (Falvella et al., 2006; Wang et al., 2017; Karthik et al., 2018; Shi L. et al., 2018). Accumulated evidence suggests that RASSF8 is associated with aging (Geigl et al., 2004; Shi Z. et al., 2018; Pagliai et al., 2019). Similarly, ALOX15B (Arachidonate 15-Lipoxygenase Type B), which ranks third on the list, is a protein-coding gene. Diseases associated with ALOX15B include autosomal recessive congenital ichthyosis and prostate cancer (Bhatia et al., 2005; Ginsburg et al., 2016; GeneCards, 2020). This gene is a senescent gene, which can also affect human aging with its expression increasing when prostate epithelial cells become senescent (Bhatia et al., 2005; Alfardan et al., 2019). In addition to age-associated genes, there are also many genes whose association with aging is unknown. For example, no association with aging could be identified in the literature for the top gene RF00019 on the list. In the future, further studies might be needed to elucidate the mechanism for age-dependent functions of RF00019.
Table 4

Best models for age prediction using pituitary & muscle skeletal tissue.

Gene symbol | Coefficient | Tissue | Gene symbol | Coefficient | Tissue
Intercept | 49.1 | | | |
RF00019 | −0.5534609 | Pituitary | HMGN2P46 | −0.265154 | Pituitary
RASSF8 | 0.43450456 | Pituitary | AIPL1 | −0.262319 | Pituitary
ALOX15B | 0.42384809 | Pituitary | AC079922.1 | −0.2613869 | Pituitary
IGSF1 | −0.3815586 | Pituitary | CYP3A5 | 0.25593725 | Pituitary
MAOA | 0.3779751 | Pituitary | MIR3186 | −0.248713 | Pituitary
PIGP | −0.3643882 | Pituitary | FA2H | −0.2478653 | Pituitary
AC138904.1 | −0.3590232 | Pituitary | LZTS1 | −0.2453074 | Pituitary
ITGA10 | 0.34749327 | Pituitary | FKBP5 | −0.2403517 | Pituitary
CYP51A1P2 | −0.3468059 | Pituitary | HTN3 | 0.23757784 | Pituitary
FABP6 | 0.33526575 | Pituitary | VNN3 | 0.23713188 | Pituitary
AC007938.1 | −0.3287363 | Pituitary | MMP11 | −0.2370928 | Pituitary
LINC01315 | −0.3252791 | Pituitary | PADI2 | 0.23575174 | Pituitary
AL596325.2 | 0.32297086 | Pituitary | NANOGNBP3 | 0.23556292 | Pituitary
LINC00662 | 0.3151238 | Muscle | ST6GALNAC5 | −0.2348075 | Pituitary
CATSPERB | 0.31335041 | Pituitary | C7 | −0.2308648 | Pituitary
MUC1 | 0.31188538 | Pituitary | KCNMB2-AS1 | 0.22953261 | Pituitary
NBEAP3 | 0.29659649 | Pituitary | DQX1 | −0.2276446 | Pituitary
SNAI3 | −0.2943786 | Pituitary | GSTM4 | 0.22188874 | Pituitary
HIST1H1C | 0.29287356 | Pituitary | AC021016.1 | 0.22063205 | Pituitary
LINC02232 | 0.28356117 | Pituitary | FER1L4 | 0.2180329 | Pituitary
S100A1 | 0.28252535 | Pituitary | LY6G5B | 0.21750613 | Pituitary
KMO | 0.27801131 | Pituitary | ZBTB16 | −0.2170829 | Pituitary
HLA-DOB | 0.27540573 | Pituitary | FCF1P1 | −0.2147114 | Pituitary
AC124947.1 | 0.26677666 | Pituitary | CHRNA1 | 0.21457823 | Pituitary
KCNK4 | −0.2667203 | Pituitary | MGAT5 | −0.2125122 | Pituitary

This table lists the coefficients of the pituitary and muscle combination model.

Functional Annotation Clustering of Top Genes

To identify the biological processes associated with genes in the prediction model, we performed functional annotation analysis using the DAVID tools (Huang et al., 2009), a web-accessible set of tools that allow researchers to infer the biological meaning behind large lists of genes. Because our focus is on enriched functional categories rather than on individual genes, we selected the functional clustering with adjusted P < 0.05. The top cluster is related to glycoprotein (P = $1.79 \times 10^{-8}$). Histidine-rich glycoprotein (HRG) is present at high levels in plasma, and it is synthesized by parenchymal liver cells and transported as a free protein as well as being stored in α-granules of platelets and released after thrombin stimulation (Blank and Shoenfeld, 2008). Levels of HRG variants in human blood are associated with chronological age and predict mortality (Hong et al., 2019). Also noteworthy were clusters related to age, for instance, GO:0045926~negative regulation of growth (P = $1.08 \times 10^{-4}$) (Figures 3C,D).

Discussion

Each human individual has two “ages.” One is the chronological age defined by the time that has passed since birth, and the other is biological age, which describes a shortfall between a population cohort average life expectancy and the perceived life expectancy of an individual of the same age (Jackson et al., 2003). An accurate estimation of biological age is helpful in studying aging, and several approaches have been proposed so far (Borkan and Norris, 1980; Dubina et al., 1983; Hannum et al., 2013). The aging prediction strategy in this study reflects the donor's biological age, providing a possible way to identify key genetic or environmental factors that lead to a biological age younger than the chronological age. By constructing elastic net models, we can predict human age as well as identify genes strongly associated with human aging. For example, RASSF8 and ALOX15B have been reported to be associated with human aging and age-associated diseases. The function enrichment analysis revealed some common functions, such as glycoprotein and signal peptide, in prediction models of multiple tissues, suggesting their general association with aging. In the future, we will identify tissue-common and tissue-specific aging genes and functions. Our results suggest that the expression level of a small number of genes can reliably predict human age. In the single-tissue model, the predicted age showed a higher deviation from the true chronological age compared to predictions based on two tissues. This suggests that tissues within the same individual have heterogeneous aging rates. The tissue specificity of aging is reported by studies performed in model organisms (Herndon et al., 2002; Libina et al., 2003; Niedernhofer, 2008). On the other hand, aging is a concordant process involving multiple tissues.
Different tissues have different potentials for revealing the chronological age of the host, and jointly considering multiple tissues can reduce the variation derived from a single tissue. For instance, our results indicate that blood is a poor choice for age prediction although it is one of the most accessible tissues. In both validation and test data sets, the predicted age deviates more from chronological age in blood than in other tissues. The poor prediction performance of blood is also supported by another study using the human whole blood transcriptome (Hannum et al., 2013), suggesting that the blood transcriptome fluctuates more due to its frequent interactions with other tissues and environmental factors through circulation (Benetos et al., 1993; Franklin et al., 1997). Some improvements can be expected to increase the prediction accuracy. First, only two tissues were considered in this study due to sample size limitation. In the future, we may include more tissues. Second, we only use gene expression to predict age. Many other molecular biomarkers have also been reported to predict human age successfully, for example, methylation (Hannum et al., 2013) and telomere length (Harley et al., 1990; Benetos et al., 2001). Last, there are many choices of machine learning technologies that can be adopted, for example, support vector machine (Cortes and Vapnik, 1995) and neural network (Mcculloch and Pitts, 1990). Combining multiple types of genomics data and data analysis methods is likely to improve prediction performance considerably (Dobin et al., 2013).

Conclusions

We have developed a computational framework to predict individual age from age-associated gene expression in one or two tissues. The predicted age is an indicator of biological age, reflecting the life span and true functionality of a human body. Although gene expression from a single tissue can be used to estimate individual chronological age, the prediction accuracy is improved by properly combining it with gene expression from a second tissue. Different tissues have different potential for predicting age; more reliable gene expression–based age markers are obtained in pituitary and skeletal muscle than in blood.

Data Availability Statement

All datasets generated for this study are included in the article/supplementary material.

Author Contributions

ZT, LC, JY, and GT conceived, designed, and managed the study. FW and JY performed the experiments. HL, QLi, ZY, QLu, and GT provided computational support and technical assistance. All authors approved the final manuscript.

Conflict of Interest

JY, HL, QLi, ZY, QLu, and GT were employed by the company Geneis Beijing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References (38 in total; 10 shown)

1.  Assessment of biological age by multiple regression analysis.

Authors:  T Furukawa; M Inoue; F Kajiya; H Inada; S Takasugi
Journal:  J Gerontol       Date:  1975-07

2.  Telomere length as an indicator of biological aging: the gender effect and relation with pulse pressure and pulse wave velocity.

Authors:  A Benetos; K Okuda; M Lajemi; M Kimura; F Thomas; J Skurnick; C Labat; K Bean; A Aviv
Journal:  Hypertension       Date:  2001-02       Impact factor: 10.190

3.  Arterial alterations with aging and high blood pressure. A noninvasive study of carotid and femoral arteries.

Authors:  A Benetos; S Laurent; A P Hoeks; P H Boutouyrie; M E Safar
Journal:  Arterioscler Thromb       Date:  1993-01

4.  E4BP4/NFIL3 modulates the epigenetically repressed RAS effector RASSF8 function through histone methyltransferases.

Authors:  Isai Pratha Karthik; Pavitra Desai; Sudarkodi Sukumar; Aleksandra Dimitrijevic; Krishnaraj Rajalingam; Sundarasamy Mahalingam
Journal:  J Biol Chem       Date:  2018-02-21       Impact factor: 5.157

5.  Hemodynamic patterns of age-related changes in blood pressure. The Framingham Heart Study.

Authors:  S S Franklin; W Gustin; N D Wong; M G Larson; M A Weber; W B Kannel; D Levy
Journal:  Circulation       Date:  1997-07-01       Impact factor: 29.690

6.  Telomeres shorten during ageing of human fibroblasts.

Authors:  C B Harley; A B Futcher; C W Greider
Journal:  Nature       Date:  1990-05-31       Impact factor: 49.962

7.  Biological age and its estimation. II. Assessment of biological age of albino rats by multiple regression analysis.

Authors:  T L Dubina; V A Dyundikova; E V Zhuk
Journal:  Exp Gerontol       Date:  1983       Impact factor: 4.032

8.  Histidine-rich glycoprotein modulation of immune/autoimmune, vascular, and coagulation systems (review).

Authors:  Miri Blank; Yehuda Shoenfeld
Journal:  Clin Rev Allergy Immunol       Date:  2008-06       Impact factor: 8.667

9.  Impotence and its medical and psychosocial correlates: results of the Massachusetts Male Aging Study.

Authors:  H A Feldman; I Goldstein; D G Hatzichristou; R J Krane; J B McKinlay
Journal:  J Urol       Date:  1994-01       Impact factor: 7.450

10.  Genome-wide methylation profiles reveal quantitative views of human aging rates.

Authors:  Gregory Hannum; Justin Guinney; Ling Zhao; Li Zhang; Guy Hughes; SriniVas Sadda; Brandy Klotzle; Marina Bibikova; Jian-Bing Fan; Yuan Gao; Rob Deconde; Menzies Chen; Indika Rajapakse; Stephen Friend; Trey Ideker; Kang Zhang
Journal:  Mol Cell       Date:  2012-11-21       Impact factor: 17.970

