Sebastian Vlaic, Theresia Conrad, Christian Tokarski-Schnelle, Mika Gustafsson, Uta Dahmen, Reinhard Guthke, Stefan Schuster.
Abstract
The identification of disease-associated modules based on protein-protein interaction networks (PPINs) and gene expression data has provided new insights into the mechanistic nature of diverse diseases. However, their identification is hampered by the difficulty of detecting protein communities within large-scale, whole-genome PPINs. One successful published strategy detects a PPIN's community structure by solving the maximal clique enumeration (MCE) problem, which is NP-hard (non-deterministic polynomial-time hard). This makes the approach computationally challenging for large PPINs and implies the need for new strategies. We present ModuleDiscoverer, a novel approach for the identification of regulatory modules from PPINs and gene expression data. Following the MCE-based approach, ModuleDiscoverer approximates the community structure using a randomization-based heuristic. Given a PPIN of Rattus norvegicus and public gene expression data, we identify the regulatory module underlying a rodent model of non-alcoholic steatohepatitis (NASH), a severe form of non-alcoholic fatty liver disease (NAFLD). The module is validated using single-nucleotide polymorphism (SNP) data from independent genome-wide association studies and gene enrichment tests. Based on gene enrichment tests, we find that ModuleDiscoverer performs comparably to three existing module-detecting algorithms. However, only our NASH-module is significantly enriched with genes linked to NAFLD-associated SNPs. ModuleDiscoverer is available at http://www.hki-jena.de/index.php/0/2/490 (Others/ModuleDiscoverer).
Year: 2018 PMID: 29323246 PMCID: PMC5764996 DOI: 10.1038/s41598-017-18370-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The concept of disease modules exemplified using a sample PPIN. One or more topological modules (highlighted red) contain proteins involved in similar biological processes forming functional modules (highlighted blue). A disease module (highlighted green) is a sub-network of proteins enriched with disease-relevant proteins, e.g., known disease-associated proteins.
Figure 2. Given a PPIN and gene expression data (Input), the algorithm works in three steps. Step I) The community structure underlying the PPIN is approximated by the identification of protein cliques. Step II) Identification of cliques significantly enriched with differentially expressed genes (DEGs). Step III) Assembly of the regulatory module based on the union of significantly enriched cliques.
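Steps II and III can be illustrated with a minimal sketch. The record does not state which enrichment statistic is used, so the one-sided hypergeometric test, the function names, and the `alpha` threshold below are assumptions made purely for illustration and are not the authors' implementation. No multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, of which K are DEG-associated.

    Assumed statistic: a one-sided hypergeometric test (not confirmed by the source).
    """
    total = comb(N, n)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper / total


def assemble_module(cliques, degs, background_size, alpha=0.05):
    """Step II: keep cliques enriched with DEGs. Step III: return their union."""
    degs = set(degs)
    module = set()
    for clique in cliques:
        clique = set(clique)
        hits = len(clique & degs)
        p_value = hypergeom_upper_tail(hits, len(clique), len(degs), background_size)
        if p_value <= alpha:
            module |= clique  # union of significantly enriched cliques
    return module


# Toy data (hypothetical protein names, background of 100 proteins).
toy_cliques = [{"a", "b", "c"}, {"x", "y", "z"}]
toy_degs = {"a", "b", "c", "d", "e"}
toy_module = assemble_module(toy_cliques, toy_degs, background_size=100)
```

In this toy case the first clique consists only of DEG-associated proteins and passes the threshold, while the second contains none and is discarded.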
Figure 3. Clique enumeration using ModuleDiscoverer. (A) Sample PPIN with 10 proteins and 26 known relations. (B) Representation of the PPIN as an undirected labeled graph with each vertex representing one of the proteins in (A). The edge weight denotes the number of existing relations between the nodes it connects. (C–F) Red vertices denote seed nodes. Yellow vertices are first neighbors of seed nodes. Green vertices represent cliques; their labels list the clique-forming proteins.
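The seed-and-neighbor idea in panels (C–F) can be sketched as a randomized greedy search. This is one plausible reading of the caption and not the published algorithm: the helper names `grow_clique` and `sample_cliques`, the adjacency-dictionary input, and the number of runs are all assumptions.

```python
import random


def grow_clique(adjacency, seed, rng):
    """Grow a maximal clique from one seed by adding random common neighbors.

    `adjacency` maps each protein to the set of its interaction partners
    (undirected graph; self-loops, if any, are ignored).
    """
    clique = {seed}
    candidates = set(adjacency[seed]) - {seed}  # first neighbors of the seed
    while candidates:
        chosen = rng.choice(sorted(candidates))  # sorted for reproducibility
        clique.add(chosen)
        # Only vertices adjacent to every clique member remain candidates.
        candidates = (candidates & adjacency[chosen]) - {chosen}
    return frozenset(clique)


def sample_cliques(adjacency, runs, rng):
    """Approximate the clique structure by repeated randomly seeded runs."""
    nodes = sorted(adjacency)
    return {grow_clique(adjacency, rng.choice(nodes), rng) for _ in range(runs)}


# Toy graph: a triangle A-B-C plus a pendant edge C-D.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
found = sample_cliques(toy_graph, runs=50, rng=random.Random(0))
```

Because each run stops only when no vertex is adjacent to all current members, every returned clique is maximal. Which of the maximal cliques are found depends on the random seeds and the number of runs.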
Figure 4. The identified NASH-regulatory module. Nodes (proteins) are labeled with the official gene symbol. Their membership in a sub-module is shape-coded.
Node-wise overlap between identified regulatory modules of DEGAS, MATISSE, KeyPathwayMiner (KPM), ModuleDiscoverer single-seed (MD-SS) and multi-seed (MD-MS) as well as the set of DEG-associated proteins, i.e., differentially regulated proteins (DRPs).
| | DRPs | MD-SS | MD-MS | DEGAS | MATISSE | KPM |
|---|---|---|---|---|---|---|
| DRPs | 410 | 9.08% | 8.84% | 2.26% | 23.55% | 23.79% |
| MD-SS | | 311 | 74.49% | 3.22% | 22.79% | 15.77% |
| MD-MS | | | 415 | 2.47% | 21.50% | 13.44% |
| DEGAS | | | | 42 | 3.19% | 8.40% |
| MATISSE | | | | | 314 | 26.22% |
| KPM | | | | | | 100 |
The overlap (given in %) between two modules is defined as the size of the intersection of their node sets divided by the size of the union of their node sets (i.e., the Jaccard index). The diagonal of the matrix contains the total number of proteins in each module.
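As a small illustration of this overlap measure, the following sketch computes the intersection-over-union percentage for two node sets. The function name `node_overlap` and the toy protein labels are hypothetical.

```python
def node_overlap(module_a, module_b):
    """Return |A ∩ B| / |A ∪ B| as a percentage for two collections of nodes."""
    a, b = set(module_a), set(module_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty modules
    return 100.0 * len(a & b) / len(union)


# Toy example: two shared nodes out of five distinct nodes in total.
toy_overlap = node_overlap({"P1", "P2", "P3", "P4"}, {"P3", "P4", "P5"})
```

Here `toy_overlap` evaluates to 40.0, since 2 of the 5 distinct nodes occur in both modules.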
Figure 5. Similarity of modules given by the correlation-based distance measure of ranked lists of significantly enriched GO-terms. The height corresponds to the correlation-based distance (see methods), where values <1 denote a positive average correlation.
Figure 6. Enrichment of FLD-related diseases with proteins of modules produced by ModuleDiscoverer (single-seed and multi-seed), DEGAS, KeyPathwayMiner and MATISSE as well as the set of DEGs. Higher values correspond to lower p-values.