Marina L Garcia-Vaquero, Margarida Gama-Carvalho, Javier De Las Rivas, Francisco R Pinto.
Abstract
Discovering disease-associated genes (DGs) is strategic for understanding pathological mechanisms. DGs form modules in protein interaction networks, and diseases with common phenotypes share more DGs or have more closely interacting DGs. This prompted the development of Specific Betweenness (S2B) to find genes associated with two related diseases. S2B prioritizes genes frequently and specifically present in shortest paths linking two disease modules. Top S2B scores identified genes in the overlap of artificial network modules more than 80% of the time, even with incomplete or noisy knowledge. Applied to Amyotrophic Lateral Sclerosis (ALS) and Spinal Muscular Atrophy (SMA), S2B candidates were enriched in biological processes previously associated with motor neuron degeneration. Some S2B candidates closely interacted in network cliques, suggesting common molecular mechanisms for the two diseases. S2B is a valuable tool for DG prediction, bringing new insights into pathological mechanisms. More generally, S2B can be applied to infer the overlap between other types of network modules, such as functional modules or context-specific subnetworks. An R package implementing S2B is publicly available at https://github.com/frpinto/S2B.
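The idea of scoring nodes by how often they fall on shortest paths linking two disease modules can be illustrated with a small sketch. The Python below is not the algorithm of the S2B R package: it is a simplified count, assuming an unweighted, undirected network, and it leaves out the specificity step entirely. The function names, the toy graph and the normalization by the number of seed pairs are all illustrative assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def cross_module_path_score(adj, seeds_a, seeds_b):
    """For each node, the fraction of (seed A, seed B) pairs that have at
    least one shortest path through it. Path endpoints count as well.
    Simplified illustration only, not the published S2B score."""
    dist = {s: bfs_distances(adj, s) for s in set(seeds_a) | set(seeds_b)}
    counts = {v: 0 for v in adj}
    n_pairs = 0
    for a in seeds_a:
        for b in seeds_b:
            if a == b or b not in dist[a]:
                continue  # skip identical or disconnected seed pairs
            n_pairs += 1
            d_ab = dist[a][b]
            for v in adj:
                # v is on a shortest a-b path iff d(a,v) + d(v,b) == d(a,b)
                if v in dist[a] and v in dist[b] and dist[a][v] + dist[b][v] == d_ab:
                    counts[v] += 1
    return {v: (c / n_pairs if n_pairs else 0.0) for v, c in counts.items()}


# Hypothetical toy network: "x" bridges both modules, "p" is a dead end.
edges = [("a1", "x"), ("a2", "x"), ("x", "b1"), ("x", "b2"), ("a1", "p")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = cross_module_path_score(adj, ["a1", "a2"], ["b1", "b2"])
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 2))
```

In this toy case the bridging node `x` lies on a shortest path for every seed pair, while `p` lies on none, which is the kind of contrast a betweenness-style ranking is meant to expose.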
Year: 2018 PMID: 30068933 PMCID: PMC6070533 DOI: 10.1038/s41598-018-29990-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. S2B performance with artificial disease modules. (A) Fraction of candidates that were in the overlap between modules as a function of decreasing S2B rank. (B) Fraction of candidates that are direct neighbors of proteins in the overlap. (C) Recall as a function of decreasing S2B rank. Recall is the fraction of proteins in the overlap between the two modules that have an S2B rank lower than or equal to the candidate rank plotted. In A, B and C, three models of disease modules were tested: shell, connectivity (conn) and random walk with restart (rwr) based modules. The impact on method performance of excluding seeds known to be part of both modules was evaluated in A and C. Hereafter, results were computed excluding seeds known to be part of both modules. (D) S2B robustness upon reduction of the fraction of module proteins used as seeds. (E) S2B robustness upon randomly rewiring a fraction of network edges. (F) S2B robustness upon replacing a fraction of input seeds by random proteins. In plots A, B, D, E and F, values are averages of S2B candidates in three consecutive ranks. In A, B and C, 95 pairs of shell modules, 355 pairs of conn modules and 200 pairs of rwr modules were evaluated. In D, E and F, 50 pairs of shell modules were used. Shell modules have between 200 and 400 nodes, while conn and rwr modules have 250 nodes. The overlap between two modules is always between 50 and 125 nodes. In A, B, C, E and F, a 50% random sample of each module was used as seeds.
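The recall definition used in panel C can be written down in a few lines. This is a minimal Python sketch under assumptions: the candidate list is already sorted from best to worst score, and the function name and protein labels are hypothetical.

```python
def recall_at_rank(ranked, overlap, k):
    """Fraction of overlap proteins that appear among the top-k candidates.

    ranked  -- candidates ordered from best to worst score
    overlap -- set of proteins belonging to both modules
    k       -- rank cutoff (1-based)
    """
    top_k = set(ranked[:k])
    return len(top_k & overlap) / len(overlap)


# Hypothetical ranking and overlap set; "g6" is never retrieved.
ranked = ["g1", "g2", "g3", "g4", "g5"]
overlap = {"g1", "g3", "g6"}

for k in (1, 3, 5):
    print(k, round(recall_at_rank(ranked, overlap, k), 2))
```

Recall can only stay flat or increase as the cutoff moves down the ranking, which is why it is plotted against rank.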
Precision of DIAMOnD and S2B predictions of proteins in the overlap between pairs of artificial modules.
| Module type | # Candidates retrieved by DIAMOnD (equal to # top S2B candidates), median [1stQ–3rdQ] | DIAMOnD precision, median [1stQ–3rdQ] | S2B precision, median [1stQ–3rdQ] |
|---|---|---|---|
| Shell | 4 [1–9] | 0.00 [0.00–0.18] | 1.00 [0.75–1.00] |
| Connectivity | 135 [104–149] | 0.60 [0.54–0.73] | 0.18 [0.16–0.22] |
| RWR | 8 [1–26] | 0.13 [0.00–0.25] | 1.00 [0.88–1.00] |
For each pair of modules, the number of top S2B candidates considered was matched to the number of candidates generated by DIAMOnD for the same pair. 50 module pairs of each type were evaluated.
Figure 2. Comparison of functional enrichments between S2B candidates and Disease Genes (MND-DGs) sets. Two independent Functional Enrichment Analyses (FEAs) were performed for S2B candidates and DG sets. FEA results were simplified by merging GO terms into GO groups by gene co-occurrence (if they have 70% of associated genes in common) and semantic similarity (if they have a Lin similarity score higher than 0.70). To further simplify the results, each GO group was assigned to a single GO class by counting the most frequent keywords in the GO term descriptions (supplementary text). 67 GO groups were not related to any GO class and were therefore discarded. (A) GO groups related only to S2B candidate genes. (B) GO groups related to both S2B candidates and MND-DGs. (C) GO groups related only to MND-DGs. Each dot represents a single GO group characterized by the sum of gene frequencies (dot size). GO groups with a 3rd quartile fold enrichment higher than 7 are highlighted with a bold border.
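The co-occurrence part of the GO term merging can be sketched as follows. This Python example covers only the gene-overlap criterion (the Lin semantic similarity step is omitted), and it rests on two assumptions not stated in the caption: the shared fraction is measured relative to the smaller gene set, and terms are grouped transitively (single linkage). Term and gene labels are hypothetical, and gene sets are assumed non-empty.

```python
def shared_fraction(genes_a, genes_b):
    """Shared genes relative to the smaller set (an assumed definition)."""
    return len(genes_a & genes_b) / min(len(genes_a), len(genes_b))


def group_terms(term_genes, threshold=0.70):
    """Merge terms whose gene sets overlap by at least `threshold`,
    using union-find so that merging is transitive."""
    terms = list(term_genes)
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if shared_fraction(term_genes[a], term_genes[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical terms: term_1 and term_2 share most genes, term_3 shares none.
term_genes = {
    "term_1": {"g1", "g2", "g3", "g4"},
    "term_2": {"g1", "g2", "g3"},
    "term_3": {"g7", "g8"},
}
groups = group_terms(term_genes)
print(groups)
```

A different overlap definition (for example, relative to the union of both sets) would give a stricter grouping for the same 70% threshold.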
Enrichment of S2B candidates in ALS and SMA DGs from different evidence sources.
| Source (DGs not present in DisGeNet or OMIM) | Disease | S2B candidates (206 proteins) | APID3HuRI network (10991 proteins) | Fold enrichment | p-value |
|---|---|---|---|---|---|
| Open Targets | ALS | 44 | 1242 | 1.89 | <10⁻⁵ |
| Open Targets | SMA | 8 | 152 | 2.80 | 0.005 |
| Open Targets | Both | 6 | 72 | 4.45 | 0.001 |
| DISEASES | ALS | 4 | 77 | 2.77 | 0.043 |
| DISEASES | SMA | 3 | 13 | 12.31 | 0.017 |
| DISEASES | Both | 1 | 1 | 53.35 | <10⁻⁶ |
| PubMed abstracts | ALS | 72 | 1482 | 2.59 | <10⁻⁶ |
| PubMed abstracts | SMA | 48 | 641 | 3.99 | <10⁻⁶ |
| PubMed abstracts | Both | 37 | 413 | 4.78 | <10⁻⁶ |
The Open Targets and DISEASES platforms were queried for ALS and SMA DGs. For the PubMed abstracts category, a gene was considered associated with a disease if at least one abstract contained the gene symbol and the disease name (“Amyotrophic Lateral Sclerosis” or “Spinal Muscular Atrophy”). The abstract search was performed with the reutils R package. S2B candidates and interactome network nodes that were DGs identified through DisGeNet or OMIM were excluded from this analysis. Fold enrichment is the ratio between the DG frequency in S2B candidates and the DG frequency in the APID3HuRI network. p-values were computed with a hypergeometric test. S2B candidates that are DGs according to these sources, and the PMIDs of the associated abstracts, are available in supplementary data.
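The fold enrichment defined above, together with a hypergeometric tail probability, can be computed as in the following Python sketch. The fold enrichment follows the stated ratio directly. The p-value function assumes an upper-tail test, P(X ≥ k); the source does not state the tail convention, so its output should not be expected to reproduce the reported p-values exactly. Function and argument names are illustrative.

```python
from math import comb


def fold_enrichment(hits, sample_size, background_hits, background_size):
    """Ratio of the DG frequency in the sample to that in the background."""
    return (hits / sample_size) / (background_hits / background_size)


def hypergeom_upper_tail(hits, sample_size, background_hits, background_size):
    """P(X >= hits) when drawing `sample_size` items without replacement from
    a background containing `background_hits` successes (assumed convention)."""
    total = comb(background_size, sample_size)
    tail = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, sample_size - i)
        for i in range(hits, min(sample_size, background_hits) + 1)
    )
    return tail / total  # exact integer arithmetic until the final division


# Open Targets / ALS row: 44 of 206 candidates, 1242 of 10991 network proteins.
fe = fold_enrichment(44, 206, 1242, 10991)
p = hypergeom_upper_tail(44, 206, 1242, 10991)
print(round(fe, 2))  # 1.89
print(p)
```

Using integer binomial coefficients avoids floating-point underflow for large networks, at the cost of some speed; a statistics library would normally be preferred in practice.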
Figure 3. S2B candidate interaction network. Edges represent direct physical interactions between S2B proteins retrieved from the APID3HuRI interactome. Cliques of at least 4 proteins are highlighted with black edges. Clusters formed by proteins that appear frequently together in the shortest paths used by the S2B method (supplementary text) are labeled by node color. Boxes A, B and C outline examples in which cliques and clusters overlap. S2B candidates simultaneously identified as ALS or SMA Disease Genes are denoted by a square node shape. Node size is proportional to the S2B score.
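Finding cliques of at least 4 proteins in an interaction network can be illustrated with the basic Bron-Kerbosch recursion. This Python sketch is an assumption about how such cliques could be enumerated, not a description of the authors' procedure: it lists maximal cliques and filters them by size, and the protein labels and edges are hypothetical.

```python
def maximal_cliques(adj):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield clique
            return
        for v in list(candidates):
            yield from expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}  # v is fully explored
            excluded = excluded | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical network: P1-P4 are fully connected, P4-P6 form a triangle.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P2", "P4"), ("P3", "P4"),
    ("P4", "P5"), ("P4", "P6"), ("P5", "P6"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

large_cliques = [sorted(c) for c in maximal_cliques(adj) if len(c) >= 4]
print(large_cliques)
```

Only the fully connected group of four passes the size filter here; the triangle is a maximal clique too, but falls below the 4-protein threshold.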