Daniel M Bean, Ammar Al-Chalabi, Richard J B Dobson, Alfredo Iacoangeli.
Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease of the upper and lower motor neurons resulting in death from neuromuscular respiratory failure, typically within two to five years of first symptoms. Several rare disruptive gene variants have been associated with ALS and are responsible for about 15% of all cases. Although our knowledge of the genetic landscape of this disease is improving, it remains limited. Machine learning models trained on the available protein-protein interaction and phenotype-genotype association data can use our current knowledge of the disease genetics to predict novel candidate genes. Here, we describe a knowledge-based machine learning method for this purpose. We trained our model on protein-protein interaction data from IntAct, gene function annotation from Gene Ontology, and known disease-gene associations from DisGeNet. Using several sets of known ALS genes from public databases and a manual review as input, we generated a list of new candidate genes for each input set. We investigated the relevance of the predicted genes in ALS by using the available summary statistics from the largest ALS genome-wide association study and by performing functional and phenotype enrichment analysis. The predicted sets were enriched for genes associated with other neurodegenerative diseases known to overlap with ALS genetically and phenotypically, as well as for biological processes associated with the disease. Moreover, using ALS genes from ClinVar and our manual review as input, the predicted sets were enriched for ALS-associated genes (ClinVar p = 0.038 and manual review p = 0.060) when used for gene prioritisation in a genome-wide association study.
Keywords: amyotrophic lateral sclerosis; gene discovery; gene prioritisation; knowledge graph; machine learning; motor neurone disease
Year: 2020; PMID: 32575372; PMCID: PMC7349022; DOI: 10.3390/genes11060668
Source DB: PubMed; Journal: Genes (Basel); ISSN: 2073-4425; Impact factor: 4.096
Figure 1. Overlap between lists of ALS-linked genes from different sources. Note that areas are not to scale. There are no genes that are listed in both ClinVar and DisGeNet but not found in either ALSoD or the manual review. In total, there are 199 unique genes linked to ALS across all lists.
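The overlap counts summarised in this figure reduce to plain set operations. The following is a minimal sketch, assuming each source is available as a set of gene symbols; the function names and the toy placeholder genes (`G1` to `G5`) are illustrative assumptions, not the actual contents of any list.

```python
from itertools import combinations


def summarize_overlap(gene_lists):
    """Return the union of all lists and the size of every pairwise overlap."""
    union = set().union(*gene_lists.values())
    pairwise = {
        (a, b): len(gene_lists[a] & gene_lists[b])
        for a, b in combinations(sorted(gene_lists), 2)
    }
    return union, pairwise


def exclusive_to(gene_lists, names):
    """Genes present in every list in `names` and in none of the other lists."""
    inside = set.intersection(*(gene_lists[n] for n in names))
    others = [genes for n, genes in gene_lists.items() if n not in names]
    outside = set().union(*others)
    return inside - outside


# Toy placeholder data (hypothetical, NOT the real gene lists).
toy_lists = {
    "ALSoD": {"G1", "G2", "G3"},
    "ClinVar": {"G1", "G4"},
    "DisGeNet": {"G2", "G4", "G5"},
    "Manual": {"G1", "G4"},
}

union, pairwise = summarize_overlap(toy_lists)
only_clinvar_disgenet = exclusive_to(toy_lists, ["ClinVar", "DisGeNet"])
```

With the real lists as input, `len(union)` would correspond to the total number of unique genes, and an empty result from `exclusive_to` for ClinVar and DisGeNet would correspond to the empty region noted in the caption.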
Cross-validation performance for each definition of ALS-linked genes. Precision, recall, and fold-change enrichment are given as the mean (standard deviation) over five folds. The numbers of genes are given as minimum-maximum across folds. The fold-change enrichment is relative to the expected precision of random guessing. A fold is considered significantly enriched if the p-value for predicting at least as many genes correctly is < 0.05 under the hypergeometric distribution.
| Gene list | Training: # of Genes | Training: Precision | Training: Recall | Validation: # of Genes | Validation: Precision | Validation: Recall | Fold-Change Enrichment | Number of Significantly Enriched Folds |
|---|---|---|---|---|---|---|---|---|
| ALSoD | 60–61 | 0.23 (0.16) | 0.55 (0.09) | 15–16 | 0.07 (0.08) | 0.45 (0.17) | 23.33 (23.53) | 5/5 |
| ClinVar | 29–30 | 0.16 (0.09) | 0.86 (0.04) | 7–8 | 0.05 (0.02) | 0.82 (0.17) | 30.05 (15.64) | 5/5 |
| DisGeNet | 54–55 | 0.44 (0.33) | 0.23 (0.16) | 13–14 | 0.15 (0.22) | 0.09 (0.08) | 55.90 (81.66) | 2/5 |
| Manual | 21–22 | 0.27 (0.01) | 0.85 (0.04) | 5–6 | 0.09 (0.01) | 0.86 (0.14) | 84.54 (13.27) | 5/5 |
| Union | 96 | 0.16 (0.04) | 0.73 (0.05) | 24 | 0.04 (0.02) | 0.67 (0.07) | 8.92 (4.28) | 5/5 |
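The per-fold quantities described in the table caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "random guessing" means drawing the same number of predictions uniformly from a candidate pool, and the function and variable names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def fold_metrics(predicted, held_out, candidates):
    """Precision, recall, fold-change enrichment and p-value for one fold.

    predicted  : genes the model predicts as disease-linked
    held_out   : known disease genes hidden from the model in this fold
    candidates : every gene the model could have predicted (assumed pool)
    """
    predicted, held_out, candidates = set(predicted), set(held_out), set(candidates)
    hits = len(predicted & held_out)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    # Expected precision of random guessing: fraction of candidates that are held out.
    expected = len(held_out) / len(candidates)
    fold_change = precision / expected if expected else float("nan")
    p_value = hypergeom_sf(hits, len(candidates), len(held_out), len(predicted))
    return {
        "precision": precision,
        "recall": recall,
        "fold_change": fold_change,
        "p_value": p_value,
    }


# Toy example with placeholder gene identifiers (hypothetical data).
candidates = [f"g{i}" for i in range(100)]
held_out = [f"g{i}" for i in range(5)]
predicted = ["g0", "g1"] + [f"g{i}" for i in range(10, 18)]
metrics = fold_metrics(predicted, held_out, candidates)
```

Because the expected precision of random guessing is small when few genes are held out, even a low absolute precision can correspond to a large fold-change enrichment, as in the validation columns of the table.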
Figure 2. Overlap of predicted ALS-linked genes from the five gene lists. “Union prediction” is the new predictions made, based on the union of all other lists of known linked genes, not the union of all predictions. Note that areas are not to scale.
Genes predicted to be linked to ALS by the model trained on the manually curated list. For each gene, we report the gene symbol, the Ensembl gene and transcript names, the coordinates (hg19) of the longest transcript, the strand, and whether the gene is present in the other lists of genes or among their model predictions. Note that if a gene is present in a list, it cannot be predicted by the corresponding model.
| Gene Symbol | Ensembl Gene Name | Ensembl Transcript Name | Chr | Transcript Start | Transcript End | Strand | ClinVar Genes | ClinVar Predictions | DisGeNet Genes | DisGeNet Predictions | ALSoD Genes | ALSoD Predictions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALS2 | ENSG00000003393 | ENST00000489440 | 2 | 202581364 | 202591275 | − | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| BCL2L1 | ENSG00000171552 | ENST00000307677 | 20 | 30252254 | 30310701 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| BSG | ENSG00000172270 | ENST00000573216 | 19 | 572571 | 581376 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| CASP1 | ENSG00000137752 | ENST00000436863 | 11 | 104896234 | 104905977 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| CHMP2B | ENSG00000083937 | ENST00000466696 | 3 | 87302198 | 87303063 | + | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| CLU | ENSG00000120885 | ENST00000522413 | 8 | 27463898 | 27472209 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| CNTF | ENSG00000242689 | ENST00000361987 | 11 | 58390145 | 58393198 | + | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| CREBBP | ENSG00000005339 | ENST00000574740 | 16 | 3786508 | 3794958 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| CST3 | ENSG00000101439 | ENST00000398409 | 20 | 23614293 | 23619110 | − | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE |
| CTSD | ENSG00000117984 | ENST00000438213 | 11 | 1775253 | 1782770 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| DPP6 | ENSG00000130226 | ENST00000377770 | 7 | 153749764 | 154685161 | + | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE |
| ERBB4 | ENSG00000178568 | ENST00000484594 | 2 | 212426486 | 213403306 | − | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| FOS | ENSG00000170345 | ENST00000556324 | 14 | 75745530 | 75746234 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| GDI1 | ENSG00000203879 | ENST00000465640 | X | 153670112 | 153671075 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| GFAP | ENSG00000131095 | ENST00000253408 | 17 | 42982993 | 42992920 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| GLE1 | ENSG00000119392 | ENST00000309971 | 9 | 131266978 | 131304567 | + | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE |
| GSR | ENSG00000104687 | ENST00000221130 | 8 | 30535582 | 30585443 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| GSTP1 | ENSG00000084207 | ENST00000489040 | 11 | 67351604 | 67352535 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| GSX2 | ENSG00000180613 | ENST00000326902 | 4 | 54966197 | 54968672 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| HSF1 | ENSG00000185122 | ENST00000529630 | 8 | 145532954 | 145533780 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| INA | ENSG00000148798 | ENST00000369849 | 10 | 105036919 | 105050108 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| JAK3 | ENSG00000105639 | ENST00000526008 | 19 | 17949078 | 17958841 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| JUND | ENSG00000130522 | ENST00000600972 | 19 | 18390828 | 18391739 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| KIF3C | ENSG00000084731 | ENST00000455394 | 2 | 26149470 | 26205366 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| LAT | ENSG00000213658 | ENST00000566415 | 16 | 29000897 | 29001776 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| LDLR | ENSG00000130164 | ENST00000560467 | 19 | 11215982 | 11224300 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| PARK7 | ENSG00000116288 | ENST00000465354 | 1 | 8021807 | 8031581 | + | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE |
| PLA2G4A | ENSG00000116711 | ENST00000466600 | 1 | 186823417 | 186908362 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| PPARGC1A | ENSG00000109819 | ENST00000264867 | 4 | 23793643 | 23891700 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| PRPH | ENSG00000135406 | ENST00000551194 | 12 | 49687034 | 49687780 | + | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| RXRA | ENSG00000186350 | ENST00000484822 | 9 | 137208943 | 137298240 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| SELPLG | ENSG00000110876 | ENST00000228463 | 12 | 109016604 | 109025854 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| SHC1 | ENSG00000160691 | ENST00000448116 | 1 | 154934773 | 154943223 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| SLC1A2 | ENSG00000110436 | ENST00000531628 | 11 | 35287147 | 35323075 | − | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE |
| SNAI1 | ENSG00000124216 | ENST00000244050 | 20 | 48599535 | 48605423 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| SOD2 | ENSG00000112096 | ENST00000541573 | 6 | 160103513 | 160113110 | − | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE |
| TIAM1 | ENSG00000156299 | ENST00000455508 | 21 | 32638611 | 32716594 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| TLE3 | ENSG00000140332 | ENST00000557815 | 15 | 70341315 | 70351129 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| TMSB4X | ENSG00000205542 | ENST00000451311 | X | 12993226 | 12995346 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| TNF | ENSG00000232810 | ENST00000449264 | 6 | 31543344 | 31546113 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| TP53 | ENSG00000141510 | ENST00000574684 | 17 | 7577571 | 7578437 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| TRPM7 | ENSG00000092439 | ENST00000558444 | 15 | 50867155 | 50874661 | − | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| VIM | ENSG00000026025 | ENST00000544301 | 10 | 17270257 | 17279584 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| WNT7A | ENSG00000154764 | ENST00000285018 | 3 | 13857754 | 13921618 | − | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
| XIAP | ENSG00000101966 | ENST00000496602 | X | 123046609 | 123047465 | + | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE |
Figure 3. Overlap of the predicted ALS-linked genes from each input list with the other lists of known genes. All the predictions from the Manual list are already contained in at least one other list. By definition, none of the predictions from the union list are in any other list.
Top 20 most enriched Revigo-grouped biological processes (ranked by fold enrichment) for each set of predicted ALS-linked genes. Enrichment was calculated using Pantherdb. All terms are significant at the p < 0.05 level after Bonferroni correction.
| Rank | DisGeNet | ALSoD | Manual | ClinVar | Union |
|---|---|---|---|---|---|
| 1 | Smooth muscle contraction | Peripheral nervous system development | Erythrocyte differentiation | Cardiac muscle tissue development | Peripheral nervous system development |
| 2 | Response to xenobiotic stimulus | Response to xenobiotic stimulus | Translational termination | Nuclear migration | Protein localization to Golgi apparatus |
| 3 | Response to antibiotic | Phosphatidylcholine metabolic process | Phospholipid catabolic process | Nucleus localization | Response to antibiotic |
| 4 | Tissue remodelling | Inactivation of MAPK activity | Regulation of transcription from RNA polymerase II promoter in response to stress | Heart contraction | Regulation of neuron death |
| 5 | Sprouting angiogenesis | Ammonium transport | Response to heat | Cellular component assembly involved in morphogenesis | Mitochondrial fusion |
| 6 | Peripheral nervous system development | Protein localization to Golgi apparatus | Endosome transport via multivesicular body sorting pathway (GO:0032509) | Myofibril assembly | Response to mechanical stimulus |
| 7 | Response to hypoxia | Response to antibiotic | Multivesicular body sorting pathway | Response to mechanical stimulus | Regulation of phosphatidylinositol 3-kinase signalling |
| 8 | Regulation of blood pressure | Triglyceride homeostasis | Glutathione metabolic process | Skeletal muscle contraction | Ammonium transport |
| 9 | Ammonium transport | Behaviour | Regulation of cysteine-type endopeptidase activity | Protein homooligomerization | Regulation of phospholipase activity |
| 10 | Regulation of phospholipase activity | Ammonium ion metabolic process | Cellular modified amino acid metabolic process | Muscle structure development | Regulation of lipase activity |
| 11 | Phospholipase C-activating G protein-coupled receptor signalling pathway | Positive regulation of lipase activity | Wnt signalling pathway | Cellular lipid catabolic process | Protein targeting to the vacuole |
| 12 | Regulation of mitochondrion organization | Organophosphate ester transport | Apoptotic process | Synapse organization | Phospholipase C-activating G protein-coupled receptor signalling pathway |
| 13 | Positive regulation of protein kinase B signalling | Phospholipid transport | Cell death | Lipid catabolic process | Positive regulation of blood pressure |
| 14 | Response to oxidative stress | Phospholipase C-activating G protein-coupled receptor signalling pathway | Sulfur compound metabolic process | Circulatory system process | Action potential |
| 15 | Regulation of lipase activity | Positive regulation of the cellular catabolic process | Response to abiotic stimulus | Actomyosin structure organization | Membrane protein proteolysis |
| 16 | Superoxide metabolic process | Regulation of neurotransmitter levels | Positive regulation of DNA-binding transcription factor activity | Cell-cell adhesion via plasma-membrane adhesion molecules | Behaviour |
| 17 | Membrane protein proteolysis | Regulation of trans-synaptic signalling | Response to cytokine | Regulation of canonical Wnt signalling pathway | Protein localization to the vacuole |
| 18 | Behaviour | Regulation of lipase activity | Positive regulation of molecular function | Negative regulation of the apoptotic process | Vacuolar transport |
| 19 | Glycolytic process | Glutamate receptor signalling pathway | Cell surface receptor signalling pathway | Circulatory system development | Ammonium ion metabolic process |
| 20 | Cyclic nucleotide metabolic process | Response to drug | Regulation of molecular function | Maintenance of location | Regulation of trans-synaptic signalling |
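The selection rule stated in the table caption (keep terms significant after Bonferroni correction, then rank by fold enrichment) can be sketched as below. This is an illustrative sketch only: it assumes the number of tests equals the number of terms supplied, which may differ from how Pantherdb counts tests, and the term names and values are hypothetical.

```python
def top_enriched(results, alpha=0.05, top_n=20):
    """Filter enrichment results by Bonferroni-adjusted p-value, rank by fold.

    results : list of (term, fold_enrichment, raw_p_value) tuples
    """
    n_tests = len(results)
    kept = []
    for term, fold, p_raw in results:
        # Bonferroni: multiply each raw p-value by the number of tests, cap at 1.
        p_adj = min(1.0, p_raw * n_tests)
        if p_adj < alpha:
            kept.append((term, fold, p_adj))
    # Rank the surviving terms by fold enrichment, largest first.
    kept.sort(key=lambda row: row[1], reverse=True)
    return kept[:top_n]


# Hypothetical enrichment results (placeholder terms and numbers).
toy_results = [
    ("term_a", 3.0, 0.001),
    ("term_b", 9.0, 0.02),
    ("term_c", 5.0, 0.0001),
    ("term_d", 1.2, 0.5),
]
ranked = top_enriched(toy_results)
```

In this toy input, `term_b` has the largest fold enrichment but does not survive the correction, which shows why ranking is applied only after the significance filter.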
Top 5 most enriched human diseases in Enrichr analysis based on the OMIM disease database for each set of predicted ALS-linked genes. Overlapping genes are given in brackets. Only diseases significant at adjusted p-value < 0.05 are reported.
| Rank | DisGeNet | ALSoD | Manual | ClinVar | Union |
|---|---|---|---|---|---|
| 1 | Parkinson disease (PRKN;NR4A2;PINK1;UCHL1;TBP;HTRA2;MAPT;SNCAIP;FBXO7;SNCA) | Parkinson disease (PRKN;NR4A2;PINK1;UCHL1;TBP;HTRA2;DBH;SNCAIP;FBXO7;SNCA) | Amyotrophic lateral sclerosis (ALS2;CHMP2B;TRPM7;PRPH) | Cardiomyopathy (DSP;MYBPC3;CAV3;ACTN2;TPM1;LDB3;ABCC9;PSEN1;TAZ;TTN;PLN;SGCD;DES;ACTC1;MYL2;LMNA;MYL3;TCAP;TNNI3;DMD;SCN5A;MYH6;VCL;MYH7) | Cardiomyopathy (DSP;MYBPC3;CAV3;ACTN2;TPM1;PSEN2;LDB3;ABCC9;TAZ;TTN;PLN;SGCD;DES;ACTC1;MYL2;LMNA;MYL3;TCAP;TNNI3;DMD;SCN5A;MYH6;VCL;MYH7) |
| 2 | Dystonia (SGCE;GCH1;PRKRA;ATP1A3;DRD2;THAP1;TAF1) | Alzheimer's disease (APP;NOS3;PSEN2;APBB2;BLMH;A2M;MPO;SORL1) | Frontotemporal dementia (CHMP2B;TRPM7;TNF) | Cardiomyopathy, dilated (DSP;MYBPC3;ACTN2;TPM1;LDB3;ABCC9;PSEN1;TAZ;TTN;PLN;SGCD;DES;ACTC1;LMNA;TCAP;TNNI3;DMD;SCN5A;VCL;MYH7) | Cardiomyopathy, dilated (DSP;MYBPC3;ACTN2;TPM1;PSEN2;LDB3;ABCC9;TAZ;TTN;PLN;SGCD;DES;ACTC1;LMNA;TCAP;TNNI3;DMD;SCN5A;VCL;MYH7) |
| 3 | Diabetes (IL6;EPO;IRS1;HFE;INSR;IRS2;PPARG;SLC2A4;GCK) | Dystonia (SGCE;GCH1;PRKRA;ATP1A3;DRD2;THAP1;TAF1) | | Charcot-Marie-Tooth disease (PRPS1;MTMR2;EGR2;HSPB8;LITAF;NDRG1;DNM2;MPZ;LMNA;MFN2;NEFL;KIF1B;GARS;SBF2) | Ataxia (PRKCG;TBP;ABCB7;FMR1;KCNA1;ITPR1;SLC1A3;CP;APTX;SYNE1;TTBK2;ATCAY;CACNB4;ATXN1;ATXN7;PPP2R2B;TDP1;SACS;ATXN10;FXN;POLG;SPTBN2) |
| 4 | Diabetes mellitus, type 2 (IRS1;INSR;IRS2;PPARG;SLC2A4;GCK) | Schizophrenia (CHRNA7;DTNBP1;AKT1;MTHFR;NRG1;HTR2A;COMT) | | Neuropathy (EGR2;CTDP1;HSPB8;BSCL2;SPTLC1;GAN;WNK1;MPZ;TDP1;PRX;MFN2;GARS;CCT5;POLG) | Charcot-Marie-Tooth disease (PRPS1;MTMR2;EGR2;HSPB8;LITAF;NDRG1;DNM2;MPZ;LMNA;MFN2;NEFL;KIF1B;SBF2) |
| 5 | Alzheimer disease (NOS3;HFE;PSEN1;MPO) | Colorectal cancer (PLA2G2A;AKT1;BAX;CTNNB1;TLR4;TP53) | | Cardiomyopathy, hypertrophic (MYBPC3;ACTC1;CAV3;MYL2;TPM1;MYL3;TNNI3;MYH6;TTN;MYH7) | Cardiomyopathy, hypertrophic (MYBPC3;ACTC1;CAV3;MYL2;TPM1;MYL3;TNNI3;MYH6;TTN;MYH7) |
Validation of predicted ALS-linked genes. *ZFP91-CNTF is a read-through transcript of both ZFP91 and CNTF. CNTF was predicted to be ALS-linked by the model. The p-value and number of validated genes shown in brackets indicate results if the ZFP91-CNTF transcript is not considered.
| Gene list | Significant genes (p) | Predicted (n) | Mapped (n) | Validated (n) | Magma Gene set (p) |
|---|---|---|---|---|---|
| DisGeNet | 0.49 | 176 | 166 | 1 | 0.44 |
| ALSoD | 0.33 | 327 | 305 | 2 | 0.24 |
| Manual | 0.060 | 45 | 41 | 2 (1) | 0.057 |
| ClinVar | 0.038 | 192 | 170 | 3 | 0.065 |
| Union | 0.67 | 575 | 534 | 1 | 0.72 |