Fabio Fabris, João Pedro de Magalhães, Alex A Freitas.
Abstract
Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works and to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: there is a link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; and ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advancing our knowledge and has provided novel insights on ageing, yet future work should place greater emphasis on validating the predictions.
Keywords: Ageing; Model interpretation; Supervised machine learning
Year: 2017 PMID: 28265788 PMCID: PMC5350215 DOI: 10.1007/s10522-017-9683-y
Source DB: PubMed Journal: Biogerontology ISSN: 1389-5729 Impact factor: 4.277
Fig. 1 Overview of the supervised learning process, adapted from (Kuncheva 2004)
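The supervised learning process described in the abstract (learn from annotated instances, then infer annotations for new instances) can be illustrated with a minimal sketch. The example below uses k-NN, one of the algorithms listed among the reviewed works, implemented with the Python standard library only. The feature values, labels, and the function name `knn_predict` are hypothetical and chosen purely for illustration; they are not taken from any of the reviewed papers.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training instances."""
    # Rank the annotated training instances by Euclidean distance to x.
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], x))
    # Count the labels of the k closest instances and return the most common one.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: two made-up numeric features per gene,
# each annotated as "ageing" or "non-ageing" (assumed toy data).
train_X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7),
           (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
train_y = ["ageing", "ageing", "ageing",
           "non-ageing", "non-ageing", "non-ageing"]

# New instances whose annotations are not known.
new_instances = [(0.85, 0.75), (0.15, 0.25)]
predictions = [knn_predict(train_X, train_y, x) for x in new_instances]
print(predictions)
```

In practice, predictive performance would be estimated on held-out annotated data (for example by cross-validation) rather than judged on a handful of hand-picked instances.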
Fig. 2 Categorisation of works using supervised machine learning applied to the biology of ageing
Main data analysis characteristics of papers that focus on applying some supervised machine learning algorithm to tackle a biological ageing problem and then interpret the results to get some type of biological insight about the ageing process
| Type of supervised machine learning problem | References | Paper’s title | Supervised learning algorithm | Feature type | Species |
|---|---|---|---|---|---|
| Binary classification problem (involving DNA repair and ageing-related proteins) | Freitas et al. ( | A data mining approach for classifying DNA repair genes into ageing-related or non-ageing-related | Decision tree (J48) | Protein–protein interactions (PPI), Gene Expression, Gene Ontology terms, type of DNA Repair, Dn/Ds ratio | Human |
| Binary classification problem (involving DNA repair and ageing-related proteins) | Jiang and Ching ( | Classifying DNA repair genes by kernel-based support vector machines | SVM (Support vector machine) | Gene expression levels | Human |
| Binary classification problem (involving DNA repair and ageing-related proteins) | Fang et al. ( | Classifying aging genes into DNA repair or non-DNA repair-related categories | Feature selection based on random forests | Protein-protein interactions (PPI) | Human |
| Binary classification using hierarchical features (pro-longevity vs. anti-longevity proteins) | Wan and Freitas ( | Prediction of the pro-longevity or anti-longevity effect of | Hierarchical feature selection used in the first phase of a naive Bayes algorithm | Gene Ontology terms | Worm |
| Binary classification using hierarchical features (pro-longevity vs. anti-longevity proteins) | Wan et al. ( | Predicting the pro-longevity or anti-longevity effect of model organism genes with new hierarchical feature selection methods | Hierarchical feature selection used in the first phase of a naive Bayes algorithm | Gene Ontology terms | Worm, fly, mouse, yeast |
| Hierarchical classification (using proteins as instances and ageing-related GO terms as classes) | Fabris et al. ( | An extensive empirical comparison of probabilistic hierarchical classifiers in datasets of ageing-related genes | Decision tree for hierarchical classification | Protein-protein interactions | Worm, fly, mouse, human, yeast |
| Binary classification (ageing-related vs. non-ageing-related mortality-related proteins) | Fabris and Freitas ( | New KEGG pathway-based interpretable features for classifying ageing-related mouse proteins | Decision table | KEGG pathway features | Mouse |
| Binary classification (ageing-related vs. non-ageing-related genes)<sup>a</sup> | Song et al. ( | Discovering aging-genes by topological features in | SVM | PPI Network features | Fruit fly |
| Binary classification (ageing-related vs. non-ageing-related genes)<sup>a</sup> | Feng et al. ( | Topological analysis and prediction of aging genes in | SVM | PPI Network features | Mouse |
| Binary classification (ageing-related vs. non-ageing-related genes)<sup>a</sup> | Li et al. ( | Computational prediction of aging genes in human | SVM, k-NN, Decision tree | PPI Network features | Human |
| Binary classification (Longevity vs. non-longevity genes) | Li et al. ( | Systematic analysis and prediction of longevity genes in | SVM, k-NN, Decision tree | Functional interaction network features, conservation score | Worm |
| Two-layer binary classification (life span change and then increase or decrease the life span of genes) | Huang et al. ( | Deciphering the effects of gene deletion on yeast longevity using network and machine learning approaches | Selected features using k-NN with Incremental feature selection | PPI Network, biochemical, physicochemical, functional, and deletion features | Yeast |
| Regression (prediction of rate of ageing) | Nakamura and Miyao ( | A method for identifying biomarkers of aging and constructing an index of biological age in humans | Logistic regression | Various physiological biomarkers | Human |
| Regression (prediction of chronological age) | Hannum et al. ( | Genome-wide Methylation Profiles Reveal Quantitative Views of Human Aging Rates | Elastic net | Methylome profile | Human |
| Regression (prediction of chronological age) | Horvath ( | DNA methylation age of human tissues and cell types | Elastic net | Methylome profile | Human |
| Regression (prediction of chronological age) | Weidner et al. ( | Aging of blood can be tracked by DNA methylation changes at just three CpG sites | Unspecified regression algorithm | Methylome profile | Human |
| Regression (prediction of chronological age) | Fortney et al. ( | Inferring the functions of longevity genes with modular subnetwork biomarkers of | SVR (support vector regression) | Modular features from gene interaction networks | Worm |
| Regression (prediction of chronological age) | Putin et al. ( | Deep biomarkers of human aging: Application of deep neural networks to biomarker development | Deep neural network | Features extracted from standard blood tests | Human |
| Regression (prediction of chronological age and survival) | Kerber et al. ( | Gene expression profiles associated with aging and mortality in humans | LASSO regression algorithm | Gene expression profiles | Human |
Columns 1 and 4 to 6 give, respectively, the type of supervised learning problem considered in the paper, the supervised learning algorithm whose results were interpreted, the feature type used by that algorithm, and the species considered in the interpretation.
Note that a paper may contain other machine learning algorithms, feature types and species whose results were not interpreted and which are therefore not listed in the table.
<sup>a</sup> These works also include analysis of individual features
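Several rows of the table frame age prediction as a regression problem: a model is fitted to instances with known chronological age and then used to estimate the age of new instances. The sketch below shows that setup with ordinary least squares on a single feature. This is a deliberately simpler technique swapped in for illustration and is not the elastic net, LASSO or SVR used in the reviewed papers. The data values and the names `fit_line` and `predict_age` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: one made-up biomarker level per individual
# (for example a methylation fraction) and the known chronological age.
biomarker = [0.20, 0.35, 0.50, 0.65, 0.80]
age_years = [25, 41, 52, 68, 79]

slope, intercept = fit_line(biomarker, age_years)

def predict_age(level):
    """Estimate chronological age for a new individual from the fitted line."""
    return slope * level + intercept

# Mean absolute error on the training data (an optimistic estimate;
# a real study would report error on held-out individuals).
mae = sum(abs(predict_age(x) - y) for x, y in zip(biomarker, age_years)) / len(age_years)
print(round(predict_age(0.60), 1), round(mae, 2))
```

Regularised methods such as the elastic net differ from this sketch mainly in that they handle many features at once and shrink or discard uninformative ones, which matters when the input is a genome-wide profile rather than a single measurement.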