F William Townes, Kareem Carr, Jeffrey W Miller.
Abstract
Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (C. elegans and S. cerevisiae) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.
Year: 2020 PMID: 33253142 PMCID: PMC7728194 DOI: 10.1371/journal.pcbi.1008429
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Ranking machine learning algorithms based on AUC.
Numeric values indicate the fraction of times the row algorithm has higher classification performance than the column algorithm. pglm: elastic net penalized logistic regression, svm: support vector machine with radial basis function, xgb: gradient boosted trees, nb: naive Bayes, knn: k-nearest neighbors.
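Each Fig 1 entry is a pairwise win fraction. A minimal sketch of that tabulation, assuming each algorithm has a list of AUC values aligned by run (index i is the same cross-validation fold and feature set for every algorithm); the pairing scheme, the function name `win_fraction_matrix`, and the toy AUC values are hypothetical and not taken from the paper:

```python
def win_fraction_matrix(auc_by_run):
    """Fraction of paired runs in which the row algorithm's AUC beats the column's.

    auc_by_run maps algorithm name -> list of AUCs. Index i must refer to the
    same run (fold / feature set) for every algorithm (assumption of this sketch).
    Ties count as a loss for both algorithms.
    """
    lengths = {len(v) for v in auc_by_run.values()}
    if len(lengths) != 1:
        raise ValueError("all algorithms need the same number of paired runs")
    n_runs = lengths.pop()
    matrix = {}
    for row, row_aucs in auc_by_run.items():
        for col, col_aucs in auc_by_run.items():
            if row == col:
                continue  # the diagonal is left undefined
            wins = sum(1 for r, c in zip(row_aucs, col_aucs) if r > c)
            matrix[(row, col)] = wins / n_runs
    return matrix


# Toy AUCs for illustration only (not results from the paper).
toy = {"pglm": [0.80, 0.75, 0.82], "knn": [0.70, 0.78, 0.65]}
m = win_fraction_matrix(toy)
print(m[("pglm", "knn")], m[("knn", "pglm")])  # pglm wins 2 of 3 paired runs
```

Because ties count for neither side, the entries (row, col) and (col, row) sum to 1 only when no run is tied.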
Fig 2. Combining gene expression (archs4, gxp) with gene ontology (GO) features yields improved classification performance in terms of AUC.
pglm: elastic net penalized logistic regression, svm: support vector machine with radial basis function. An AUC value of 1 indicates perfect classification, whereas an AUC of 0.5 signifies performance no better than random.
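The AUC equals the probability that a randomly chosen positive gene receives a higher score than a randomly chosen negative gene, with ties counted as one half (the Mann-Whitney formulation). That is why 1 means perfect separation and 0.5 means chance. A minimal pairwise sketch, assuming labels are encoded as 1 for the positive class (for example, pro-longevity) and 0 otherwise; the function name is hypothetical, and the O(n_pos × n_neg) loop is meant for illustration rather than large datasets:

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
print(auc(labels, [0.9, 0.8, 0.3, 0.1]))  # 1.0: every positive ranked above every negative
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))  # 0.5: uninformative constant scores
```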
Fig 3. Receiver operating characteristic (ROC) curves for the best-performing algorithm (pglm: elastic net penalized logistic regression) with the best-performing feature sets (GO+GXP for yeast and GO+ARCHS4 for worm).
Each curve represents predictive performance on the held-out data from a single cross-validation fold. The dotted gray diagonal line marks the theoretical performance of an untrained random classifier, as a baseline.
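Each curve of this kind can be traced from one fold's held-out predictions: sort the held-out genes by predicted score, lower the threshold step by step, and record (false positive rate, true positive rate) after each distinct score, moving tied scores together. The trapezoidal area under these points equals the pairwise AUC. A minimal sketch, assuming 0/1 labels; the function names and the toy fold are hypothetical:

```python
def roc_points(labels, scores):
    """(FPR, TPR) points for one held-out fold; labels are 1 (positive) or 0."""
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("fold needs both classes to define an ROC curve")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# One hypothetical held-out fold (toy values).
fold_labels = [1, 0, 1, 0]
fold_scores = [0.9, 0.8, 0.7, 0.1]
curve = roc_points(fold_labels, fold_scores)
print(curve, trapezoid_auc(curve))
```

A random classifier's expected curve is the diagonal from (0, 0) to (1, 1), whose trapezoidal area is 0.5.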
Top pro-longevity and anti-longevity genes not in GenAge predicted using GO terms and ARCHS4 gene expression for worm and yeast with the pglm (GLM-Net) algorithm.
| Species | Effect | Gene | Predicted probability | ID | Description from ENSEMBL |
|---|---|---|---|---|---|
| worm | pro-longevity | CLEC-196 | 0.868 | WBGene00009156 | C-type LECtin |
| worm | pro-longevity | F44E5.4 | 0.866 | WBGene00009691 | |
| worm | pro-longevity | CEH-13 | 0.859 | WBGene00000437 | Homeobox protein ceh-13 |
| worm | pro-longevity | LPR-3 | 0.853 | WBGene00012261 | LiPocalin-Related protein |
| worm | pro-longevity | HIL-7 | 0.845 | WBGene00001858 | HIstone H1 Like; Histone H1.Q |
| worm | pro-longevity | W04A8.4 | 0.836 | WBGene00012239 | |
| worm | pro-longevity | TTH-1 | 0.816 | WBGene00006649 | Thymosin beta |
| worm | pro-longevity | GST-1 | 0.814 | WBGene00001749 | Glutathione S-transferase P |
| worm | pro-longevity | F44E5.5 | 0.812 | WBGene00009692 | |
| worm | pro-longevity | F20C5.6 | 0.807 | WBGene00008971 | |
| worm | anti-longevity | RPL-34 | 0.986 | WBGene00004448 | Ribosomal Protein, Large subunit |
| worm | anti-longevity | MSP-59 | 0.985 | WBGene00003452 | Major sperm protein |
| worm | anti-longevity | Y59E9AR.7 | 0.982 | WBGene00022002 | Major sperm protein |
| worm | anti-longevity | RPL-39 | 0.982 | WBGene00004453 | 60S ribosomal protein L39 |
| worm | anti-longevity | MSP-57 | 0.981 | WBGene00003450 | Major sperm protein |
| worm | anti-longevity | MSP-81 | 0.981 | WBGene00003467 | Major sperm protein |
| worm | anti-longevity | MSP-113 | 0.979 | WBGene00003468 | Major sperm protein |
| worm | anti-longevity | MSP-19 | 0.978 | WBGene00003426 | Major sperm protein |
| worm | anti-longevity | NLP-27 | 0.977 | WBGene00003765 | Neuropeptide-Like Protein |
| worm | anti-longevity | RPL-11.1 | 0.977 | WBGene00004422 | 60S ribosomal protein L11-1 |
| yeast | pro-longevity | ACS1 | 0.882 | YAL054C | Acetyl-coA synthetase isoform |
| yeast | pro-longevity | UBC5 | 0.863 | YDR059C | Ubiquitin-conjugating enzyme |
| yeast | pro-longevity | ETR1 | 0.824 | YBR026C | 2-enoyl thioester reductase |
| yeast | pro-longevity | UBI4 | 0.779 | YLL039C | Ubiquitin |
| yeast | pro-longevity | PDI1 | 0.72 | YCL043C | Protein disulfide isomerase |
| yeast | pro-longevity | PRE3 | 0.713 | YJL001W | Beta 1 subunit of the 20S proteasome |
| yeast | pro-longevity | POR1 | 0.705 | YNL055C | Mitochondrial porin (voltage-dependent anion channel) |
| yeast | pro-longevity | PRE7 | 0.701 | YBL041W | Beta 6 subunit of the 20S proteasome |
| yeast | pro-longevity | HSP12 | 0.698 | YFL014W | Plasma membrane protein involved in maintaining membrane organization |
| yeast | pro-longevity | SBA1 | 0.695 | YKL117W | Co-chaperone that binds and regulates Hsp90 family chaperones |
| yeast | anti-longevity | RPS30B | 1 | YOR182C | Protein component of the small (40S) ribosomal subunit |
| yeast | anti-longevity | TMA23 | 1 | YMR269W | Nucleolar protein implicated in ribosome biogenesis |
| yeast | anti-longevity | URA3 | 1 | YEL021W | Orotidine-5’-phosphate (OMP) decarboxylase |
| yeast | anti-longevity | RPS29B | 0.999 | YDL061C | Protein component of the small (40S) ribosomal subunit |
| yeast | anti-longevity | RLP24 | 0.999 | YLR009W | Essential protein required for ribosomal large subunit biogenesis |
| yeast | anti-longevity | COX9 | 0.999 | YDL067C | Subunit VIIa of cytochrome c oxidase (Complex IV) |
| yeast | anti-longevity | HOR7 | 0.999 | YMR251W-A | Protein of unknown function |
| yeast | anti-longevity | TOM7 | 0.999 | YNL070W | Component of the TOM (translocase of outer membrane) complex |
| yeast | anti-longevity | MFA1 | 0.999 | YDR461W | Mating pheromone a-factor |
| yeast | anti-longevity | TAR1 | 0.999 | YLR154W-C | Protein potentially involved in regulation of respiratory metabolism |
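A table like this one can be produced by scoring every gene, discarding genes already labeled in GenAge, and ranking the rest. A minimal sketch, assuming the classifier outputs P(pro-longevity) and that the anti-longevity probability is its complement (a binary-classifier assumption not spelled out in the source); the function name, gene names, and probabilities are hypothetical:

```python
def top_novel_predictions(prob_pro, known_genes, k=10):
    """Return the k strongest pro- and anti-longevity predictions outside a known set.

    prob_pro: gene -> predicted probability of being pro-longevity.
    known_genes: genes already labeled (e.g. in GenAge); these are excluded.
    """
    novel = {g: p for g, p in prob_pro.items() if g not in known_genes}
    pro = sorted(novel.items(), key=lambda gp: gp[1], reverse=True)[:k]
    # Assumes a binary classifier, so P(anti-longevity) = 1 - P(pro-longevity).
    anti = sorted(((g, 1.0 - p) for g, p in novel.items()),
                  key=lambda gp: gp[1], reverse=True)[:k]
    return pro, anti


# Hypothetical genes and probabilities, for illustration only.
probs = {"geneA": 0.90, "geneB": 0.20, "geneC": 0.60, "geneD": 0.05}
pro, anti = top_novel_predictions(probs, known_genes={"geneA"}, k=1)
print(pro, anti)
```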
Fig 4. Predicted probability of a gene being pro-aging versus the effect of its deletion on replicative lifespan (RLS) in yeast.
Probabilities are from the pglm classifier trained on the full GenAge dataset. The solid curve is a nonparametric smoother.
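The caption does not name the smoother behind the solid curve (LOESS is a common choice). As a stand-in, the sketch below uses a Gaussian-kernel Nadaraya-Watson estimator, which estimates the local average of the predicted probability at each point on the RLS axis as a kernel-weighted mean of nearby genes. The function name, bandwidth, and data values are hypothetical:

```python
import math


def kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each grid point, Gaussian kernel."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        total = sum(weights)
        # If every weight underflows to zero, the estimate is undefined here.
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total if total > 0 else math.nan)
    return fitted


# Hypothetical x: effect of deletion on RLS; y: predicted probability of being pro-aging.
rls_effect = [-0.4, -0.2, 0.0, 0.1, 0.3]
prob = [0.9, 0.7, 0.5, 0.4, 0.2]
fitted_curve = kernel_smooth(rls_effect, prob, grid=[-0.3, 0.0, 0.2], bandwidth=0.15)
print(fitted_curve)
```

The bandwidth controls the trade-off between a wiggly curve (small bandwidth) and one that flattens toward the overall mean (large bandwidth).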
Top GO terms identified by the pglm (GLM-Net) algorithm.
logOR: log-odds ratio. OR: odds ratio. A positive logOR indicates that a gene annotated to that GO term is more likely to be pro-longevity; a negative logOR indicates that it is more likely to be anti-longevity. BP: biological process, CC: cellular component, MF: molecular function.
| Species | GO ID | logOR | OR | Type | Description |
|---|---|---|---|---|---|
| worm | GO:0006412 | -0.98 | 0.38 | BP | translation |
| worm | GO:0005634 | 0.89 | 2.4 | CC | nucleus |
| worm | GO:0015031 | 0.82 | 2.3 | BP | protein transport |
| worm | GO:0005789 | 0.77 | 2.1 | CC | endoplasmic reticulum membrane |
| worm | GO:0005840 | -0.73 | 0.48 | CC | ribosome |
| worm | GO:0009792 | 0.68 | 2 | BP | embryo development ending in birth or egg hatching |
| worm | GO:0006511 | 0.66 | 1.9 | BP | ubiquitin-dependent protein catabolic process |
| worm | GO:0009408 | 0.66 | 1.9 | BP | response to heat |
| worm | GO:0043005 | 0.65 | 1.9 | CC | neuron projection |
| worm | GO:0030150 | 0.6 | 1.8 | BP | protein import into mitochondrial matrix |
| worm | GO:0055120 | -0.6 | 0.55 | CC | striated muscle dense body |
| worm | GO:0005783 | 0.6 | 1.8 | CC | endoplasmic reticulum |
| worm | GO:0046872 | -0.53 | 0.59 | MF | metal ion binding |
| worm | GO:0005739 | -0.52 | 0.59 | CC | mitochondrion |
| worm | GO:0006281 | -0.52 | 0.59 | BP | DNA repair |
| worm | GO:0035556 | -0.52 | 0.59 | BP | intracellular signal transduction |
| worm | GO:0045893 | 0.52 | 1.7 | BP | positive regulation of transcription, DNA-templated |
| worm | GO:0008289 | 0.52 | 1.7 | MF | lipid binding |
| worm | GO:0048477 | 0.5 | 1.6 | BP | oogenesis |
| worm | GO:0003824 | 0.49 | 1.6 | MF | catalytic activity |
| yeast | GO:0001302 | 1.8 | 5.8 | BP | replicative cell aging |
| yeast | GO:0006915 | 0.87 | 2.4 | BP | apoptotic process |
| yeast | GO:0016020 | -0.82 | 0.44 | CC | membrane |
| yeast | GO:0005634 | 0.73 | 2.1 | CC | nucleus |
| yeast | GO:0000183 | 0.72 | 2.1 | BP | chromatin silencing at rDNA |
| yeast | GO:0005624 | 0.71 | 2 | CC | membrane fraction |
| yeast | GO:0007049 | 0.67 | 1.9 | BP | cell cycle |
| yeast | GO:0005739 | 0.64 | 1.9 | CC | mitochondrion |
| yeast | GO:0005515 | -0.64 | 0.53 | MF | protein binding |
| yeast | GO:0003824 | 0.61 | 1.8 | MF | catalytic activity |
| yeast | GO:0031307 | 0.56 | 1.7 | CC | integral component of mitochondrial outer membrane |
| yeast | GO:0000723 | 0.55 | 1.7 | BP | telomere maintenance |
| yeast | GO:0005758 | 0.53 | 1.7 | CC | mitochondrial intermembrane space |
| yeast | GO:0055085 | 0.53 | 1.7 | BP | transmembrane transport |
| yeast | GO:0017111 | 0.52 | 1.7 | MF | nucleoside-triphosphatase activity |
| yeast | GO:0006811 | 0.51 | 1.7 | BP | ion transport |
| yeast | GO:0006281 | 0.5 | 1.7 | BP | DNA repair |
| yeast | GO:0034599 | 0.48 | 1.6 | BP | cellular response to oxidative stress |
| yeast | GO:0008270 | -0.48 | 0.62 | MF | zinc ion binding |
| yeast | GO:0045861 | 0.47 | 1.6 | BP | negative regulation of proteolysis |
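The logOR values above are on the scale of logistic regression coefficients. For a binary feature such as annotation to a GO term, exponentiating the coefficient gives the odds ratio; for example, exp(-0.98) ≈ 0.38 for worm "translation". The caption does not say exactly how logOR was extracted, so treating it as the fitted coefficient is an assumption here. The sketch below fits an elastic net penalized logistic regression with the glmnet-style penalty λ[α‖β‖₁ + (1−α)/2 ‖β‖₂²] and an unpenalized intercept. It uses plain proximal gradient descent rather than glmnet's coordinate descent, and the function names, toy data, λ, α, and iteration count are hypothetical:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v| (the L1 part of the penalty)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_elastic_net_logistic(X, y, lam=0.01, alpha=0.5, n_iter=2000):
    """Minimize mean logistic loss + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part:
    # 0.25 * ||[1, X]||_F^2 / n (logistic loss) + lam*(1-alpha) (ridge term).
    frob = sum(1.0 + sum(v * v for v in row) for row in X)
    step = 1.0 / (0.25 * frob / n + lam * (1.0 - alpha))
    for _ in range(n_iter):
        # Residuals p_i - y_i, computed once so all updates use the same point.
        resid = [sigmoid(b0 + sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        b0 -= step * sum(resid) / n  # intercept: plain gradient step, no penalty
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n + lam * (1.0 - alpha) * beta[j]
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam * alpha)
    return b0, beta


# Toy binary features (think: annotated to GO term 1, GO term 2); label 1 = pro-longevity.
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
b0, beta = fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5)
odds_ratios = [math.exp(b) for b in beta]  # OR = exp(logOR)
print(beta, odds_ratios)
```

With a large λ the L1 part sets weak coefficients exactly to zero, which is how elastic net selects a short list of GO terms like the one above; in practice λ would be chosen by cross-validation.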