Evgeny Putin (1,2), Polina Mamoshina (1,3), Alexander Aliper (1), Mikhail Korzinkin (1), Alexey Moskalev (1,4), Alexey Kolosov (5), Alexander Ostrovskiy (5), Charles Cantor (6), Jan Vijg (7), Alex Zhavoronkov (1,3).
Abstract
One of the major impediments in human aging research is the absence of a comprehensive and actionable set of biomarkers that may be targeted and measured to track the effectiveness of therapeutic interventions. In this study, we designed a modular ensemble of 21 deep neural networks (DNNs) of varying depth, structure and optimization to predict human chronological age from a basic blood test. To train the DNNs, we used over 60,000 samples from common blood biochemistry and cell count tests from routine health exams performed by a single laboratory and linked to chronological age and sex. The best performing DNN in the ensemble predicted chronological age with 81.5% epsilon-accuracy within a 10-year frame (r = 0.90, R² = 0.80, MAE = 6.07 years), while the entire ensemble achieved 83.5% epsilon-accuracy (r = 0.91, R² = 0.82, MAE = 5.55 years). The ensemble also identified the 5 most important markers for predicting human chronological age: albumin, glucose, alkaline phosphatase, urea and erythrocytes. To allow for public testing and to evaluate the real-life performance of the predictor, we developed an online system available at http://www.aging.ai. The ensemble approach may facilitate the integration of multi-modal data linked to chronological age and sex, which may lead to simple, minimally invasive and affordable methods of tracking integrated biomarkers of aging in humans and of performing cross-species feature importance analysis.
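The abstract reports four metrics: epsilon-accuracy, Pearson's r, R² and MAE. As a minimal sketch, assuming epsilon-accuracy means the share of samples whose absolute age error does not exceed ε years (the record does not say whether the "10-year frame" is ±10 or ±5 years, so ε is left as a parameter), the metrics can be computed in plain Python. The function names are hypothetical and the ages in the usage note are made up for illustration, not study data.

```python
import math


def epsilon_accuracy(y_true, y_pred, eps):
    """Share of samples with |true age - predicted age| <= eps.
    Assumption: the window is +/- eps around the true age."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= eps)
    return hits / len(y_true)


def pearson_r(x, y):
    """Pearson correlation between actual and predicted ages."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error in years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with hypothetical true ages [30, 45, 60, 75] and predictions [35, 40, 70, 72], `mae` returns 5.75 and `epsilon_accuracy` with `eps=5` returns 0.75, while `eps=10` counts every sample as a hit.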
Keywords: aging biomarkers; biomarker development; deep learning; deep neural networks; human aging; machine learning
Year: 2016 PMID: 27191382 PMCID: PMC4931851 DOI: 10.18632/aging.100968
Source DB: PubMed Journal: Aging (Albany NY) ISSN: 1945-4589 Impact factor: 5.682
Figure 1. Project pipeline
Laboratory blood biochemistry data sets were normalized and cleaned of outliers and some abnormal markers. For biological age prediction, 21 DNNs with different parameters were combined into an ensemble using an ElasticNet model. For biological sex prediction, a single DNN was trained.
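The caption states that the 21 DNNs were combined with an ElasticNet model. A minimal stacking sketch, assuming the ElasticNet acts as a meta-learner fitted on the base models' predictions (one column per DNN, one row per blood sample): the coordinate-descent solver, the hyperparameters `alpha`, `l1_ratio` and `n_sweeps`, and all function names are illustrative assumptions, not the study's code.

```python
def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, alpha=0.001, l1_ratio=0.5, n_sweeps=500):
    """Coordinate descent for the objective
    (1 / 2n) * ||y - Xw - b||^2 + alpha * l1_ratio * ||w||_1
    + 0.5 * alpha * (1 - l1_ratio) * ||w||^2.
    X: rows = samples, columns = base-model (DNN) age predictions (hypothetical layout)."""
    n, p = len(X), len(X[0])
    # Center columns and target so the intercept can be recovered afterwards.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # residual of the all-zero model
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated weight.
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_wj
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, intercept


def stack_predict(w, intercept, base_preds):
    """Combine one sample's base-model predictions into a single age estimate."""
    return intercept + sum(wj * pj for wj, pj in zip(w, base_preds))
```

The L1 part of the penalty can zero out weights of redundant base models while the L2 part keeps strongly correlated DNN predictions from receiving unstable weights. In practice the meta-learner would be fitted on held-out (out-of-fold) predictions, so that base models which overfit the training set are not rewarded.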
Figure 2. Analysis of the best DNN model in the ensemble and of the whole ensemble
(A) Correlation between actual and predicted age values for the best DNN in the ensemble. (B) Epsilon-prediction accuracy plot of biological age for the best DNN. (C) Importance of biological age markers, computed with the FPI method. (D) Correlation between actual and predicted age values for the whole ensemble based on the ElasticNet model. (E) Epsilon-prediction accuracy plot of biological age for the ensemble. (F) Heat map of Pearson's correlation coefficients between the predictions of 40 DNNs. Scale bar colors indicate the sign and magnitude of Pearson's correlation coefficient between predictions of DNNs.
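The record does not define the FPI method behind panel (C). One widely used way to rank markers is permutation feature importance: shuffle one marker's column and measure how much the prediction error grows. The following sketch assumes that approach, MAE as the error measure and a generic `predict` callable; it is not necessarily the FPI procedure used in the study, and the function names are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MAE when each feature column is shuffled.
    predict: callable mapping a list of rows to a list of predictions (assumed interface)."""
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = mean_absolute_error(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between marker j and the target
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mean_absolute_error(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A marker the model ignores scores exactly zero, because shuffling it leaves every prediction unchanged; larger scores mean the model relies more on that marker.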
Figure 3. DNNs outperform baseline ML approaches in terms of R² statistics
DNNs were compared with 7 ML techniques: GBM (Gradient Boosting Machine), RF (Random Forests), DT (Decision Trees), LR (Linear Regression), kNN (k-Nearest Neighbors), ElasticNet and SVM (Support Vector Machines). (A) GBM shows the highest R² (0.72) among the ML models for biological age prediction. (B) All ML models have comparably high R² for biological sex prediction.
Figure 4. Comparison of sub-models for the stacking ensemble and evaluation of filling strategies
(A) The ElasticNet model has the highest epsilon-prediction accuracy among the stacking models. (B) ElasticNet is also the best stacking model in terms of R² statistics. (C) The median filling strategy has higher epsilon-prediction accuracy than the other strategies, reaching 64.5% epsilon-accuracy within a 10-year frame. (D) The median filling strategy also performs best in terms of R² statistics.
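Panel (C) compares strategies for filling missing marker values. A minimal sketch of median filling, assuming missing values are stored as `None` and that each marker's median is computed on the training set and then reused for new samples (a common precaution against data leakage that the record does not spell out); the function names are hypothetical.

```python
from statistics import median


def fit_medians(rows):
    """Per-column median of the observed (non-None) values in the training rows."""
    medians = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        medians.append(median(observed))
    return medians


def apply_medians(rows, medians):
    """Replace each missing value with its column's training median."""
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in rows]
```

For example, fitting on `[[1.0, None], [3.0, 4.0], [None, 6.0], [5.0, 8.0]]` gives medians `[3.0, 6.0]`, which then fill the two gaps. Unlike mean filling, the median is not pulled by the outlying values common in laboratory markers.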
Figure 5. Top features analysis
(A) Dependence of the epsilon-prediction accuracy on the number of features. (B) Dependence of R² statistics on the number of features.