Rui Henriques, Sara C Madeira.
Abstract
BACKGROUND: Despite the recognized importance of module discovery in biological networks to enhance our understanding of complex biological systems, existing methods generally suffer from two major drawbacks. First, there is a focus on modules where biological entities are strongly connected, leading to the discovery of trivial/well-known modules and to the inaccurate exclusion of biological entities with subtler yet relevant roles. Second, there is a generalized intolerance towards different forms of noise, including uncertainty associated with less-studied biological entities (in the context of literature-driven networks) and experimental noise (in the context of data-driven networks). Although state-of-the-art biclustering algorithms are able to discover modules with varying coherency and robustness to noise, their application for the discovery of non-dense modules in biological networks has been poorly explored and it is further challenged by efficiency bottlenecks.
Keywords: Biclustering; Flexible module discovery; Large-scale biological networks
Year: 2016 PMID: 27213009 PMCID: PMC4875761 DOI: 10.1186/s13015-016-0074-8
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Fig. 1 Structured view on the existing challenges, proposed contributions (and their applicability) for an effective and efficient (pattern-based) biclustering of network data
Fig. 2 Illustrative discrete biclusters with varying coherency and quality
Fig. 3 Pattern-based discovery of biclusters with constant and order-preserving coherency
Fig. 4 Pattern-based biclustering of (heterogeneous) biological networks using real-valued matrices derived from minimal weighted bipartite graphs
Fig. 5 Biclustering non-dense modules: the constant model and the relevance of tolerating noise
Fig. 6 Non-dense biclustering modules: the symmetric and plaid models
Fig. 7 Non-dense biclustering modules: the order-preserving model
Fig. 8 Illustrative symbolic network with annotations
Fig. 9 Tackling the existing limitations with BicNET: (1) addressing inconsistencies and guaranteeing the applicability towards different types of network; (2) enabling for the first time the discovery of modules with varying coherency criteria; (3) guaranteeing the robustness of the searches and the possibility to parameterize the desirable quality of the modules; (4) surpassing efficiency bottlenecks of state-of-the-art and peer pattern-based biclustering algorithms; and (5) benefiting from the guidance of available background knowledge
Fig. 10 Simplified illustration of BicNET behavior: efficient storage of multi-item discrete adjacency matrices mapped from network data; iterative application of distinct pattern mining searches with decreasing support for the discovery of modules with varying coherency criteria; and postprocessing of the discovered modules
Fig. 11 Advanced aspects of BicNET: (1) allowing symmetries within the discovered modules through iterative sign adjustments to model biological entities simultaneously involved in up- and down-regulatory interactions, and (2) allowing plaid effects through the guided inclusion of new interactions explained by cumulative contributions to model biological entities involved in multiple biological processes (commonly associated with overlapping regions or hub-nodes within a network)
Fig. 12 BicNET graphical interface for sound parameterizations and visual analyses of results
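The constant and order-preserving coherencies named in the captions of Figs. 2, 3, 5 and 7 can be made concrete with a small check on a discrete submatrix. The sketch below is illustrative only and is not BicNET's implementation: the function names `is_constant` and `is_order_preserving` are hypothetical, the definitions follow the common row-wise reading of these coherencies (identical symbols per column, or an identical ordering of columns by value), and neither noise tolerance nor the symmetric and plaid models are covered.

```python
from typing import List, Sequence

Row = Sequence[int]


def is_constant(block: Sequence[Row]) -> bool:
    """True if every row of the submatrix carries the same symbols
    on the selected columns (constant coherency across rows)."""
    first = list(block[0])
    return all(list(row) == first for row in block)


def column_ranking(row: Row) -> List[int]:
    """Column indices sorted by increasing value.
    Assumption: ties are broken by column index (a simplification)."""
    return sorted(range(len(row)), key=lambda j: row[j])


def is_order_preserving(block: Sequence[Row]) -> bool:
    """True if the columns follow the same ordering by value in every row."""
    first = column_ranking(block[0])
    return all(column_ranking(row) == first for row in block)


# Hypothetical toy submatrices (rows = nodes, columns = nodes, values = discretized weights).
constant_block = [[1, 3, 2], [1, 3, 2]]
ordered_block = [[1, 3, 2], [2, 6, 4], [0, 9, 5]]
incoherent_block = [[1, 3, 2], [3, 1, 2]]

for name, block in [("constant", constant_block),
                    ("ordered", ordered_block),
                    ("incoherent", incoherent_block)]:
    print(name, is_constant(block), is_order_preserving(block))
```

A constant block is trivially order-preserving, while the second toy block is order-preserving without being constant, which is the sense in which the order-preserving model is the more flexible of the two.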
Default synthetic data benchmarks for network data analyses
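The first five data columns vary the number of network nodes (at 10 % density); the last four vary the network density (at 2000 nodes).

| | 200 nodes | 500 nodes | 1000 nodes | 2000 nodes | 10,000 nodes | 1 % density | 5 % density | 10 % density | 25 % density |
|---|---|---|---|---|---|---|---|---|---|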
| Nr. of hidden modules | 5 | 10 | 15 | 20 | 30 | 3 | 5 | 10 | 20 |
| Nr. of nodes per module | [20, 30] | [30, 40] | [40, 50] | [50, 70] | [100, 140] | [50, 70] | [50, 70] | [50, 70] | [50, 70] |
| % interactions in modules | 19.5 | 12.2 | 7.6 | 4.5 | 1.1 | 22.5 | 9.0 | 4.5 | 2.3 |
Biological networks used to assess the relevance and efficiency of BicNET
| Type | Organism | Nodes | Interactions | Density (%) | Notes |
|---|---|---|---|---|---|
| GI | Yeast | 4455 | 191,309 | 1.0 | Links (65 % negative) from double-mutant arrays [ |
| GI | Yeast | 6314 | 423,335 | 1.1 | Known and predicted associations benchmarked from multiple data sources and text mining, and combined through an integrative score [ |
| PPI | | 8428 | 3,293,416 | 4.6 | |
| PPI | Human | 19,247 | 8,548,002 | 2.3 | |
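The density column of the table above can be cross-checked from the two count columns. The sketch below assumes that density is the number of interactions divided by the square of the number of nodes, and that the grouped digits are plain integers (for example 191309 for the first row); both are assumptions inferred from the table values, not definitions stated in this record. The labels in `NETWORKS` and the helper `density_pct` are hypothetical names used only for illustration.

```python
# (label, number of nodes, number of interactions) as read from the table above.
# Assumption: the two unlabeled numeric columns are node and interaction counts.
NETWORKS = [
    ("GI yeast (double-mutant arrays)", 4455, 191_309),
    ("GI yeast (integrative score)", 6314, 423_335),
    ("PPI (organism not stated)", 8428, 3_293_416),
    ("PPI human", 19_247, 8_548_002),
]


def density_pct(nodes: int, interactions: int) -> float:
    """Density in percent, assuming density = interactions / nodes**2."""
    return 100.0 * interactions / (nodes * nodes)


for label, nodes, interactions in NETWORKS:
    print(f"{label}: {density_pct(nodes, interactions):.1f} %")
```

Under this assumption the computed values round to the densities listed in the table, which supports reading the two count columns as nodes and interactions.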
Comparison of widely-used tasks for modular analysis of networks using the introduced synthetic and real datasets
| Approach | Method | Solution aspects and concerns | Efficiency |
|---|---|---|---|
| Clustering (exhaustive and non-overlapping node coverage) | k-Means | Majority of clusters show loose connectedness; high variation in the size of modules (1-to-3 clusters covering almost all nodes and the remaining clusters being statistically non-significant [ | Efficiency problems for networks with >100,000 interactions |
| | Spectral | Able to isolate modules where the degree of connectedness is approximately constant per module; only a small subset of clusters is relevant (medium-to-high degree of connectedness) | Medusa implementation only scales for networks with <10,000 interactions |
| | Affinity propagation | The clusters collected from (small samples of) the target biological networks show a generalized lack of biological relevance | Time and memory bottlenecks for small nets (<1000 interactions) |
| Clustering (non-exhaustive and possibly overlapping node coverage) | CPMw (weighted | Intolerance to noise; intractably large solutions (explosion of similar clusters) with strict coherency criterion ( | Only scales for nets with <5000 nodes (5–10 % density). Bottlenecks for the target biological data even when removing >95 % interactions |
| Biclustering (bi-sets of nodes) | Hypercliques (unweighted) | Intolerant to missing interactions; large number of highly similar modules; dense coherency only | BicNET implementation efficient for large networks (>10,000 nodes) with density up to 25 % |
| | Hypercliques (differential) | Intolerant to noise and prone to the item-boundaries problem during the selection of differential weights; dense coherency only | BicNET implementation scales for large dense networks |
| | BicNET (dense assumption) | Focus on dissimilar modules robust to noise and missing interactions, with possibly distinct forms of coherency strength ( | Efficiency bounded by the search for unweighted hypercliques ( |
Fig. 13 Efficiency of biclustering algorithms able to discover non-dense modules for synthetic networks with varying size and density
Fig. 14 Efficiency gains of BicNET when using sparse data structures, pattern mining searches providing robust alternatives to bitset vectors, and noise handlers
Fig. 15 Accuracy of BicNET against pattern-based biclustering algorithms on networks for the discovery of dense modules with varying degree of noisy and missing interactions (networks with 2000 nodes and 10 % density)
Fig. 16 Assessment of BicNET’s ability to recover planted modules with constant, symmetric, plaid and order-preserving coherencies from noisy networks (networks with 2000 nodes according to Table 1)
Fig. 17 Properties of BicNET solutions against hypercliques discovered in GI and PPI networks (described in Table 2) when considering varying coherency criteria
Description of the biological role of an illustrative set of BicNET’s modules with varying properties
| | ID | Homogeneity | | Putative functionality: group of enriched terms ( |
|---|---|---|---|---|
| STRING (yeast) | Y1 | Dense (high noise-tolerance) | 231 × 14 | Metabolic processes with incidence on protein, peptide and amide metabolism and biosynthesis |
| | Y2 | Dense (medium noise-tolerance) | 217 × 9 | Metabolism of nitrogen compounds and some organic substances |
| | Y3 | Constant (few high | 103 × 8 | Amino acid activation and tRNA metabolism for tRNA aminoacylation |
| | Y4 | Constant (few high | 206 × 6 | Organic acid metabolic process and its subterms |
| | Y5 | Constant (few high or low | 55 × 7 | Signal transduction and its subterms |
| | Y6 | Constant (few high or low | 43 × 6 | Phosphorylation related terms (with incidence on protein phosphorylation) |
| | Y7 | Order-preserving | 176 × 12 | Transport of organic acids (with incidence on amino acid transmembrane transport) |
| | Y8 | Order-preserving | 235 × 9 | Oxidation-reduction process and metabolism of amino acids. Assembly of ribonucleoprotein |
| | Y9 | Order-pres. (few high | 146 × 8 | Transport of molecules (highest enrichment found for drug transmembrane) |
| STRING (human) | H1 | Dense (high noise-tolerance) | 811 × 28 | Multiple metabolic processes with incidence on transcription activity |
| | H2 | Dense (high noise-tolerance) | 787 × 25 | Regulation of metabolic processes (both positive and negative regulation) |
| | H3 | Constant (few high | 693 × 14 | Regulation of intracellular signal transduction (over 20 highly enriched terms) |
| | H4 | Constant (few high | 645 × 10 | Regulation of molecular functions (incidence on catalytic activity) |
| | H5 | Order-preserving | 720 × 24 | Establishment of protein localization (protein targeting to ER and membrane) |
| | H6 | Order-preserving | 733 × 29 | Protein phosphorylation and its subterms |
| DryGIN | D1 | Dense (high noise-tolerance) | 28 × 17 | Organelle localization (establishment of spindle and nuclear localization) |
| | D2 | Constant (with pos&neg | 22 × 10 | Chromatin remodeling and nucleosome organization |
| | D3 | Constant (with pos&neg | 21 × 7 | Transport processes for the establishment of protein localization |
| | D4 | Constant (with pos&neg | 19 × 9 | Regulation of growth (incidence on filamentous growth) |
| | D5 | Order-preserving | 39 × 7 | Organelle and nucleus organization |
| | D6 | Order-preserving | 54 × 6 | Regulation of cellular metabolic processes (both positive and negative regulation) |
Illustrative set of biologically significant BicNET’s modules: description of the highly enriched terms in the modules presented in Table 5 [74, 75]
| | ID | Terms description ( | | |
|---|---|---|---|---|
| DryGIN | G1 | Histone modification; regulation of histone H3-K79 methylation, histone H2B ubiquitination, H2B conserved C-terminal lysine ubiquitination, H3-K4 methylation (4) | 6 | 27 |
| | G2 | Regulation of gluconeogenesis; glutamate metabolic and catabolic processes (2); nicotinamide riboside metabolic process; nicotinamide nucleotide biosynthetic process | 6 | 13 |
| | G3 | Positive and negative regulation of transcription from RNA polymerase II; Invasive growth response to glucose limitation and hyperosmotic salinity response by regulating RNA polymerase II (5) | 5 | 12 |
| | G4 | Meiotic anaphase I; activation of anaphase-promoting complex activity involved in meiotic cell cycle | 4 | 12 |
| | G5 | Negative reg. of phospholipid biosynthesis; lipid homeostasis; isopropylmalate and oxaloacetate transport | 4 | 11 |
| | G6 | Cotranslational protein targeting to membrane; protein insertion into mitochondrial membrane; protein import into peroxisome membrane; reg. sporulation; actin filament bundle assembly involved in cytokinesis | 5 | 25 |
| | G7 | Acetate fermentation, acetyl-CoA biosynthesis (from acetate), reg. transcription on exit from mitosis | 7 | 50 |
| STRING | S1 | Response to hypoxia; oxidation-dependent protein catabolic process; anaerobic respiration; age-dependent response to reactive oxygen species; cellular response to oxidative stress | 36 | 169 |
| | S2 | Positive and negative reg. of mitotic and nuclear cell cycle, DNA replication, budding cell apical bud growth | 16 | 98 |
| | S3 | Transport of aerobic electron, acetyl-CoA, vacuolar transmembrane, amine, transport (5); ribose phosphate metabolic process; D-ribose metabolic and catabolic processes (2) | 22 | 93 |
| | S4 | Heterochromatin maintenance involved in chromatin silencing; sister chromatid segregation | 6 | 70 |
| | S5 | Cytoplasmic and mitochondrial translation (4); regulation of translational fidelity; ADP biosynthesis | 6 | 76 |
| | S6 | rRNA processing; separation, cleavage and maturation of SSU-rRNA (5); ribosomal (large subunit) biogenesis | 14 | 143 |
Exclusivity and relevance of BicNET solutions: properties of found modules
| | ID | Type | | Items | | Notes |
|---|---|---|---|---|---|---|
| DryGIN | G1 | Constant | 18 × 9 | {−4,..,−1} | 27 | Module with coherent strong (−4) and soft (−1) negative interactions |
| | G2 | Symmetric | 4 × 9 | {−3,..,3} | 13 | Varying levels of strong (mainly positive) interactions ({ |
| | G3 | Symmetric | 5 × 6 | {−2,−1,1,2} | 12 | Module with either all positive or negative interactions per “row”-node ({ |
| | G4 | Constant | 7 × 5 | {1,2} | 12 | Module with coherent strong (2) and soft (1) positive interactions |
| | G5 | Symmetric | 7 × 5 | {−2,−1,1,2} | 11 | Module with either all positive or negative interactions per “row”-node ({ |
| | G6 | Order | 14 × 11 | {−3,..,3} | 25 | Preserved precedences and co-occurrences per “row”-node before postprocessing |
| | G7 | Order | 42 × 8 | {−2,−1,1,2} | 50 | Noise-tolerant module with mostly preserved orderings per “row”-node |
| STRING | S1 | Order | 155 × 14 | {1,2,3,4} | 169 | Preserved precedences and co-occurrences per “row”-node before postprocessing |
| | S2 | Constant | 80 × 18 | {1,2,3} | 98 | Module with mostly non-dense interactions ({1,2}) |
| | S3 | Constant | 83 × 10 | {1,2} | 93 | Module with non-dense positive interactions before postprocessing ({1}) |
| | S4 | Constant | 50 × 20 | {1,2,3} | 70 | Module with non-dense positive interactions ({1,2}) before postprocessing |
| | S5 | Constant | 45 × 31 | {1,2,3} | 76 | Module with mostly dense interactions (scores in {2,3}) |
| | S6 | Constant | 55 × 85 | {1,2} | 143 | Module with mostly dense interactions ({2}) |
Sets of modules with meaningful overlapping areas (satisfying the in-between plaid assumption [21])
| ID | Modules with meaningful overlapping regions | Pattern | | % Overlapping interactions |
|---|---|---|---|---|
| G6 | G7 from Table | Order | 42 × 8 | 21 |
| | G8: tRNA re-export from nucleus; nuclear mRNA surveillance of mRNP export | Constant | 12 × 10 | 62 |
| | G9: More general module (background) including cellular responses to pH | Constant | 41 × 6 | 16 |
| S4 | S2 from Table | Constant | 80 × 18 | 42 |
| | S7: Telomere maintenance; translocation; protein import into nucleus | Constant | 104 × 20 | 37 |
| | S8: Response to ionizing radiation; ribose phosphate metabolic process | Constant | 59 × 31 | 45 |
| | S9: Positive regulation of mitochondrial translation in response to stress | Constant | 50 × 20 | 89 |
Fig. 18 Taxonomy of enriched terms for BicNET’s modules from yeast GIs (on STRING and DryGIN networks)
Fig. 19 Taxonomy of enriched terms of BicNET’s modules discovered from human PPIs (see Table 4)