Diogo G Barardo, Danielle Newby, Daniel Thornton, Taravat Ghafourian, João Pedro de Magalhães, Alex A Freitas.
Abstract
Increasing age is a risk factor for many diseases; therefore developing pharmacological interventions that slow down ageing and consequently postpone the onset of many age-related diseases is highly desirable. In this work we analyse data from the DrugAge database, which contains chemical compounds and their effect on the lifespan of model organisms. Predictive models were built using the machine learning method random forests to predict whether or not a chemical compound will increase Caenorhabditis elegans' lifespan, using as features Gene Ontology (GO) terms annotated for proteins targeted by the compounds and chemical descriptors calculated from each compound's chemical structure. The model with the best predictive accuracy used both biological and chemical features, achieving a prediction accuracy of 80%. The top 20 most important GO terms include those related to mitochondrial processes, to enzymatic and immunological processes, and terms related to metabolic and transport processes. We applied our best model to predict compounds which are more likely to increase C. elegans' lifespan in the DGIdb database, where the effect of the compounds on an organism's lifespan is unknown. The top hit compounds can be broadly divided into four groups: compounds affecting mitochondria, compounds for cancer treatment, anti-inflammatories, and compounds for gonadotropin-releasing hormone therapies.
Keywords: C. elegans; ageing; anti-ageing drugs; bioinformatics; longevity; machine learning; pharmaceutical interventions
Year: 2017 PMID: 28783712 PMCID: PMC5559171 DOI: 10.18632/aging.101264
Source DB: PubMed Journal: Aging (Albany NY) ISSN: 1945-4589 Impact factor: 5.682
Predictive accuracy (median AUC values on 10-fold cross validation) obtained by random forest with parameters optimized for each DrugAge dataset version (each with a different feature type combination)
| Dataset features | RF optimized ntrees | RF optimized mtry | Median AUC |
|---|---|---|---|
| GO terms only | 300 | 52 | 0.716 |
| Chemical descriptors only | 100 | 16 | 0.781 |
| GO terms and chemical descriptors | 900 | 210 | 0.800 |
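Each median AUC in the table above summarises the per-fold AUC values from 10-fold cross validation. The following is a minimal sketch of that summary step only, not the authors' code: the function names `auc` and `median_auc` and the toy fold data are hypothetical, and the AUC is computed by direct pairwise comparison (the probability that a positive compound is scored above a negative one, with ties counted as one half).

```python
from statistics import median


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc(folds):
    """Median of per-fold AUCs; each fold is a (labels, scores) pair."""
    return median(auc(labels, scores) for labels, scores in folds)


# Hypothetical toy data: three small folds instead of ten real ones.
toy_folds = [
    ([1, 0, 0, 1], [0.9, 0.2, 0.4, 0.7]),
    ([1, 0, 0, 1], [0.3, 0.2, 0.6, 0.7]),
    ([1, 0, 1, 0], [0.5, 0.5, 0.1, 0.8]),
]

print(median_auc(toy_folds))
```

The pairwise loop is quadratic in fold size, which is acceptable for illustration; a rank-based formulation would be the usual choice for larger folds.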
Top 20 selected features with highest median variable importance
| Median Variable Importance | Feature | Feature type | Feature Description |
|---|---|---|---|
| 14.4 | a_nN | MD | Number of nitrogen atoms in the molecule |
| 12.8 | isomerase activity | GO | Catalysis of the geometric or structural changes within one molecule |
| 11.8 | macromitophagy | GO | Degradation of a mitochondrion by macroautophagy |
| 11.6 | macroautophagy | GO | Process in which cellular contents are degraded by lysosomes |
| 11.1 | protein disulfide isomerase activity | GO | Catalysis of the rearrangement of both intrachain and interchain disulfide bonds in proteins. |
| 11.0 | dipeptidase activity | GO | Catalysis of the hydrolysis of a dipeptide. |
| 9.72 | pyruvate metabolic process | GO | The chemical reactions and pathways involving pyruvate |
| 9.47 | PEOE_VSA+4 | MD | Total positive van der Waals surface area of atoms with atomic charge in the range of 0.20-0.25. |
| 9.31 | fatty acid transport | GO | The directed movement of fatty acids into, out of or within a cell, or between cells |
| 8.79 | mitochondrial electron transport, NADH to ubiquinone | GO | The transfer of electrons from NADH to ubiquinone mediated by the multisubunit enzyme known as complex I |
| 8.64 | vsurf_Wp2 | MD | Polar volume at -0.5, a descriptor reflecting the polarizability of a molecule |
| 8.57 | isotype switching | GO | The switching of activated B cells from IgM biosynthesis to biosynthesis of other isotypes |
| 8.40 | translation | GO | The cellular metabolic process in which a protein is formed |
| 8.18 | Q_RPC- | MD | Relative negative partial charge, defined as the most negative atomic charge divided by the sum of all negative atomic charges in the molecule. |
| 8.09 | aerobic respiration | GO | The enzymatic release of energy from inorganic and organic compounds |
| 7.98 | a_IC | MD | Atom information content (total), defined as the entropy of the element distribution in the molecule multiplied by the number of atoms. |
| 7.95 | PEOE_VSA_FPPOS | MD | Fractional polar positive van der Waals surface area |
| 7.86 | triglyceride mobilization | GO | The release of triglycerides from storage within cells or tissues, making them available for metabolism. |
| 7.79 | chi1v | MD | Valence corrected molecular connectivity index (order 1) |
| 7.70 | bpol | MD | Sum of the absolute value of the difference between atomic polarizabilities of all bonded atoms in the molecule |
GO: Gene ontology term; MD: Chemical Molecular descriptor
Top 20 chemical compounds with the highest lifespan-increase class probability from the external screening dataset
| Chemical Compound Name | Predicted Probability |
|---|---|
| acrolein | 0.691 |
| valspodar | 0.683 |
| ganirelix | 0.674 |
| acetaldehyde | 0.669 |
| mmk-1 | 0.667 |
| rdp-58 | 0.665 |
| cetrorelix | 0.657 |
| gal-b5 | 0.656 |
| m40 | 0.654 |
| DB03393 | 0.650 |
| bortezomib | 0.650 |
| ro 25-1392 | 0.650 |
| gv1001 | 0.650 |
| lactose | 0.650 |
| ergotamine | 0.650 |
| cardiolipin | 0.642 |
| dactinomycin | 0.642 |
| abt-510 | 0.640 |
| aplyronine a | 0.637 |
| valinomycin | 0.637 |
Number of compounds in the DGIdb dataset and in each DrugAge dataset version, with the type of features (biological and/or chemical descriptors) used for each in this work
| Dataset | n Positive | n Negative | n Total | Type of features used |
|---|---|---|---|---|
| DrugAge_1 | 190 | 783 | 973 | GO terms only |
| DrugAge_2 | 229 | 1163 | 1392 | Chemical descriptors only |
| DrugAge_3 | 190 | 783 | 973 | GO terms and chemical descriptors |
| DGIdb | - | - | 6802 | GO terms and chemical descriptors |
Notation used in the table: n – number of compounds; Positive – increases longevity; Negative – no effect or decrease in longevity; Biological descriptors – GO terms (all three types); Chemical descriptors – molecular descriptors calculated from the chemical structure of compound entries using cheminformatics software.
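The counts in the table show that positive (lifespan-increasing) compounds are a minority in every DrugAge dataset version. As a small worked example of that arithmetic, the sketch below computes the positive fraction for each version; the counts are copied from the table, while the variable and function names are hypothetical.

```python
# Counts copied from the table above: (n positive, n negative).
datasets = {
    "DrugAge_1": (190, 783),
    "DrugAge_2": (229, 1163),
    "DrugAge_3": (190, 783),
}


def positive_fraction(n_pos, n_neg):
    """Share of compounds labelled as increasing lifespan."""
    return n_pos / (n_pos + n_neg)


fractions = {name: positive_fraction(*counts) for name, counts in datasets.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} positive")
```

This gives roughly 19.5% positives for DrugAge_1 and DrugAge_3 and roughly 16.5% for DrugAge_2.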