Rui Henriques, Francisco L Ferreira, Sara C Madeira.
Abstract
BACKGROUND: Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entities). However, given its computational complexity, only recent breakthroughs on pattern-based biclustering enabled efficient searches without the restrictions that state-of-the-art biclustering algorithms place on the structure and homogeneity of biclusters. As a result, pattern-based biclustering provides the unprecedented opportunity to discover non-trivial yet meaningful biological modules with putative functions, whose coherency and tolerance to noise can be tuned and made problem-specific.
Year: 2017 PMID: 28153040 PMCID: PMC5290636 DOI: 10.1186/s12859-017-1493-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Symbolic pattern-based biclusters with varying coherency assumptions
Recent breakthroughs on pattern-based biclustering: algorithms and tackled limitations
| Contribution | Biological output | Behavior | Tackled limitations |
|---|---|---|---|
| | Putative functional modules robust to noise, such as co-expressed genes with a regulatory pattern given by possibly different expression levels across a subset of conditions. | Algorithms consistently combining preprocessing, pattern mining (itemsets and association rules) and postprocessing procedures to guarantee the flexibility and robustness of the outputs. | Flexible structures; Exhaustive (yet efficient) searches; Tolerance to noise; Parameterizable coherence strength. |
| | Modules with shifting and scaling factors to deal with the distinct responsiveness of biological entities and handle biases introduced by the applied measurement. | Iterative discovery of pattern differences (shifts) and least common divisors (scales), together with pruning strategies, to learn additive and multiplicative models. | Precise modeling of shifting and scaling factors across rows; Flexible structure and parameterizable quality. |
| | Coherent variation of gene expression or molecular concentrations across samples or within a temporal progression (such as stages of a disease or drug response). | Biclustering is parameterized with enhanced sequential pattern miners (by ordering column indexes per row according to the observed values) to flexibly discover noise-tolerant orderings. | Surpasses efficiency and robustness issues of exhaustive peers; Flexible structures with guarantees of optimality, addressing the problems of greedy peers. |
| | Modules associated with biological processes simultaneously capturing activation and repression mechanisms within transcriptomic, proteomic or metabolic data. | Combinatorial sign-adjustments (together with pruning principles) to model symmetries and integrate them with scales, shifts and orderings. | Discovery of non-constant biclusters with symmetries; Parameterizable properties. |
| | Coherent modules in homo/heterogeneous biological networks with weighted/labeled interactions. Modules able to capture non-trivial forms of behavior and accommodate less-studied biological entities. | Extension of previous contributions towards biological networks. To this end, new data structures and searches are proposed to deal effectively and efficiently with the inherent sparsity of network data. | Discovery of non-dense modules; Robustness to noisy and missing interactions; Scalable for large networks. |
| | Overlapping regulatory influence in expression data (cumulative effects that multiple biological processes have on a gene at a particular time) and network data (cumulative effects in interactions belonging to multiple modules). | Extended searches to recover excluded areas (due to cumulative contributions on regions where biclusters overlap) and to remove noisy areas. New composition functions and relaxations to deal with noise and non-linear cumulative effects. | Addresses the exact additive plaid assumption with relaxations; No need for all the data elements to follow a plaid assumption; Models non-constant biclusters. |
| | Biological modules in accordance with user expectations (e.g. non-trivial homogeneity, satisfying a given pattern or preferred regulatory behavior (such as repression)) or with consistent functional terms. | Extended searches to benefit from background knowledge, including: constraints with succinct, anti-monotone and convertible properties, and incorporation of terms from knowledge repositories. | Focus on regions of interest; Efficiency gains; Removal of uninformative values. |
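
The order-preserving behavior described in the table above (ordering column indexes per row according to the observed values) can be illustrated with a small sketch. This is a toy check under stated assumptions, not the sequential pattern miner used by BicPAMS: the function names `column_ordering` and `is_order_preserving` and the example matrix are hypothetical, and ties and noise tolerance are ignored.

```python
def column_ordering(row, columns):
    """Return the selected column indexes sorted by the row's observed values."""
    return tuple(sorted(columns, key=lambda c: row[c]))


def is_order_preserving(matrix, rows, columns):
    """True if every selected row induces the same ordering of the selected columns."""
    orderings = {column_ordering(matrix[r], columns) for r in rows}
    return len(orderings) == 1


# Hypothetical expression matrix: rows are genes, columns are conditions.
matrix = [
    [0.1, 0.9, 0.5, 0.3],
    [1.0, 4.0, 2.0, 7.0],
    [0.2, 0.8, 0.6, 0.1],
]

# The three genes agree on the ordering of conditions 0, 1 and 2,
# but not once condition 3 is included.
coherent = is_order_preserving(matrix, rows=[0, 1, 2], columns=[0, 1, 2])
extended = is_order_preserving(matrix, rows=[0, 1, 2], columns=[0, 1, 2, 3])
```

Note that the rows differ widely in scale yet still form a coherent module, which is the point of an order-preserving assumption.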
Fig. 2 BicPAMS: sound and parameterizable behavior (annotations in purple)
Fig. 3 BicPAMS: textual and visual display of results
Default and dynamic/data-driven parameterizations of BicPAMS
| Parameter | Value | Notes | |
|---|---|---|---|
| Major parameters | P3 Coherency assumption | Constant assumption | A default assumption considers a (possibly noise-tolerant) constant pattern on a subset of rows/columns/nodes, providing an adequate degree of flexibility (superior to biclusters with differential/dense values or constant values overall) well suited for initial analyses. |
| P4 Coherency strength | | Adequate sensitivity to different levels of expression ({-2,-1}, {0} and {1,2} sets of symbols correspond to down-regulation, preserved and up-regulation) or association strength. Multiple symbols can be assigned to a single real-valued element to guarantee robustness to noise. | |
| P5 Quality | 80% | Guarantees an adequate tolerance to noise, allowing biclusters to have up to 20% of noisy values. | |
| P15 Pattern representation | Closed | Closed pattern representations enable the discovery of maximal biclusters (biclusters that cannot be extended without removing rows or columns). | |
| P16 Orientation | Patterns on rows | In accordance with Def. 2. Considering expression data where rows correspond to genes, a bicluster with coherency across rows is defined by a group of genes with the same pattern along a subset of conditions. When rows correspond to conditions, a less-trivial bicluster is given by a group of genes with preserved expression spanning a subset of conditions. | |
| Mapping options | P6 Normalization | Row | Normalization of values per biological entity or sample. |
| P7 Discretization | Gaussian | Cut-off points of a learned Gaussian curve to minimize imbalanced distributions of items. | |
| P8 Noise handler | None | By default multi-item assignments are deactivated for an easy interpretation of results. Nevertheless, we suggest the selection of multi-item assignments to guarantee a heightened robustness to discretization drawbacks and noise. | |
| P9 Symmetries | Dynamic | Symmetries are dynamically selected if the inputted data has negative values. This option can be deactivated to force the biclustering task to not distinguish positive from negative values. | |
| P10 Missings handler | Remove | Remove is suggested since Quality P5 is already in place to accommodate missing values within biclusters. Nevertheless, Replace option is suggested for data with a considerable amount of missing values. | |
| P11 Remove uninformative elements | None | By default, no items are removed. Alternative options should be only selected in the presence of knowledge regarding uninformative elements, such as non-differential expression or loose interactions. | |
| Mining options | P12 Stopping criteria | 50 biclusters | A minimum number of 50 biclusters (before postprocessing) is suggested by default since the combination of this option with the quality and dissimilarity criteria leads to a compact set of dissimilar biclusters. This number (as well as the number of iterations) can be increased to guarantee more complete solutions for complex or large datasets. |
| P13 Min. | 4 | Although maximal biclusters have at least 4 columns by default, this number should be increased for datasets where biclusters have a significantly higher number of columns. | |
| P14 | 2 | Guarantees the removal of small and highly coherent regions in the dataset (after the 1st iteration) to enable the discovery of less-trivial biclusters. This number can be increased to promote a more even distribution of biclusters across the regions of the inputted data. | |
| P17 Pattern miner | Dynamic | From empirical evidence, CharmDiff is suggested for closed patterns, CharmMFI for maximal patterns, and F2G for simple patterns. When order-preserving coherency is inputted, IndexSpan is suggested by default. | |
| P18 Scalability | Dynamic | Option activated in the presence of very large datasets (>20 million elements under a constant assumption and >1 million elements for the remaining coherency assumptions). | |
| Closing | P19 Merging | Heuristic | Guarantees an efficient yet quasi-exact postprocessing. |
| P20 Filtering | 40% dissimilar elements | Guarantees an adequate level of dissimilarity. Biclusters sharing more than 60% of their elements with a larger bicluster are removed. |
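
The default mapping options above (row normalization, Gaussian discretization, and the {-2,-1}, {0}, {1,2} sets of symbols) can be sketched as follows. This is a minimal illustration under assumptions rather than the BicPAMS implementation: it assumes that the cut-off points split a standard Gaussian into equiprobable bins, it covers only an odd number of items, and the function names `gaussian_cutoffs` and `discretize_row` are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def gaussian_cutoffs(n_items):
    """Cut-off points splitting a standard Gaussian into n_items equiprobable bins (assumption)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / n_items) for k in range(1, n_items)]


def discretize_row(values, n_items=5):
    """Z-score one row, then map each value to a symbol in {-2,...,2} when n_items=5."""
    mu, sigma = mean(values), pstdev(values)
    cutoffs = gaussian_cutoffs(n_items)
    offset = n_items // 2  # centers the symbols around 0 for odd n_items
    symbols = []
    for v in values:
        z = (v - mu) / sigma if sigma > 0 else 0.0
        bin_index = sum(z > c for c in cutoffs)  # number of cut-offs below z
        symbols.append(bin_index - offset)
    return symbols


# Hypothetical expression levels of one gene across five conditions.
symbols = discretize_row([1.0, 2.0, 3.0, 4.0, 5.0])
```

Equiprobable bins are one way to limit imbalanced item distributions; fixed ranges of values, the alternative mentioned for P7, would replace `gaussian_cutoffs` with equally spaced cut-offs.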
BicPAMS: input data, major parameters, and output models
| Input: Data | P1 Matrix | The accepted file formats include attribute-relation files (.ARFF) and standard matrix files (such as .TXT). The first line of standard matrix files should specify the column identifiers, while the first entry of each line should specify the row identifier. Tabular data can be delimited by tabs, spaces or commas. |
| P2 Network | BicPAMS accepts any input file format (such as .TXT or .SIG) assuming that: the first line specifies the column identifiers, and each other line specifies an interaction/entry within the network. An entry specifies the nodes and the association strength. Entries can be delimited by tabs, spaces or commas. In addition to the file, the column indexes identifying the first node, second node and association strength need to be inputted. For example, for a network with header “idProteinA,nameProteinA,idProteinB,nameProteinB,weight”, the user should fix (node1,node2,score) indexes as (0,2,4) or (1,3,4). Finally, the user can specify whether each entry is directional from the first node towards the second node or bidirectional. Bidirectional entries increase the density of the network. | |
| Desirable Biclustering Models | P3 Coherency Assumption | The coherency assumption defines the correlation of values within a bicluster. In constant models, an observed pattern (possibly containing different items) is preserved across rows (or columns). In additive or multiplicative models, shifting or scaling factors are allowed per row (or column) in order to allow meaningful variations of the original pattern. In order-preserving models, the values per row induce the same ordering across columns. A plaid model considers the cumulative effect of the contributions from multiple biclusters on areas where their rows and columns overlap. Previous models can further accommodate symmetric factors. |
| P4 Coherency Strength | The number of items determines the allowed deviations from expected values. For example, a gene expression matrix parameterized with 5 items will have 2 levels of activation ({1,2}), 2 levels of repression ({-1,-2}) and 1 level of unchanged expression ({0}). By going beyond the differential values, BicPAMS enables the discovery of non-trivial yet coherent and meaningful correlations. To maintain consistency, additive (multiplicative) models should be used with an odd (even) number of items. When considering order-preserving models, the number of items should be increased to balance the degree of co-occurrences versus precedences. | |
| P5 Quality | This field specifies the maximum number of allowed noisy/missing elements (determining the minimum overlapping threshold for merging procedures). The tolerance of biclusters to noise can be additionally addressed using noise handlers (see mapping options) and alternative postprocessing procedures. | |
| P15 Pattern Representation | Closed patterns (default option) enable the discovery of maximal biclusters (biclusters that cannot be extended without removing rows and columns). Maximal patterns give a preference towards flattened biclusters, possibly neglecting both vertical and smaller biclusters. Finally, the use of simple/all frequent patterns leads to biclustering solutions with a high number of biclusters (possibly contained by another bicluster), which can be useful to guide postprocessing steps. As the user specifies one of these three options, the available pattern miners are dynamically updated. | |
| P16 Orientation | Coherency can be either observed across rows (default) or columns (searches are applied on the transposed matrix). When the number of columns highly exceeds the number of rows (or vice-versa when searches are applied on the transposed matrix), pattern miners with vertical data formats such as Eclat should be preferred. | |
| Output | Upon successfully running BicPAMS, a textual and graphical display of the outputs is provided. The user can select the level of details associated with the outputted biclustering solution (statistics only, list of rows and columns per bicluster, disclosure of values per bicluster). |
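
The network input convention described for P2 (a header line, one interaction per line, and user-supplied (node1,node2,score) indexes) can be sketched with a small parser. This is an illustrative reading of the description, not BicPAMS code: the function name `read_network`, its parameters and the sample protein identifiers are hypothetical, and only a single delimiter is handled.

```python
import csv
import io


def read_network(handle, node1=0, node2=2, score=4, delimiter=",", bidirectional=False):
    """Parse interaction lines into (source, target, weight) tuples."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # the first line holds the column identifiers
    edges = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        a, b, w = fields[node1], fields[node2], float(fields[score])
        edges.append((a, b, w))
        if bidirectional and a != b:
            edges.append((b, a, w))  # mirrored entry, which raises the density
    return edges


# Hypothetical file content using the header quoted in the table above.
SAMPLE = (
    "idProteinA,nameProteinA,idProteinB,nameProteinB,weight\n"
    "P1,alpha,P2,beta,0.8\n"
    "P2,beta,P3,gamma,0.4\n"
)

by_id = read_network(io.StringIO(SAMPLE), node1=0, node2=2, score=4)
by_name = read_network(io.StringIO(SAMPLE), node1=1, node2=3, score=4)
```

Choosing (0,2,4) keys the network by identifier and (1,3,4) keys it by name; both index triples read the same association strength.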
Additional parameters of BicPAMS along the mapping, mining and closing steps
| Mapping Options (includes P4 from Table | P6 Normalization | Depending on the properties of the input data, the user can either normalize data per Row, Column or for the Overall data elements or ignore normalization by selecting the None option. Both outliers and missing values are handled separately. | |
| P7 Discretization | Real-valued data needs to be discretized to apply pattern-based biclustering (see noise handling to understand how BicPAMS guarantees robustness to discretization drawbacks). The user can select the cut-off points of a Gaussian distribution (default) or fixed ranges of values (equal sized intervals after excluding outliers). Note that fixed ranges can lead to an imbalanced distribution of items. The user can bypass this option for symbolic data by selecting the None option. | ||
| P8 Noise Handler | Multi-item assignments can be considered to handle deviations on the expected values within a bicluster caused by noise or discretization issues. By selecting this option, 2 items are assigned to elements with a value near a boundary of discretization (value in range | ||
| P9 Symmetries | This option is dynamically selected if the input data is composed of positive and negative values (as it naturally affects the properties of the outputted biclusters). When using symmetric ranges, additive (multiplicative) models should be parameterized with an odd (even) number of items to guarantee consistent shifts (scales). | ||
| P10 Missings Handler | The user can specify what happens in the presence of missing values. Since BicPAMS is natively prepared to analyze sparse data, the Remove option (default) simply signals the algorithms to exclude missings from the searches. Alternatively, the Replace option uses WEKA’s imputation methods to fill missings (the error of imputations can be minimized by simultaneously activating a noise handler). We suggest the use of Remove option for network data and other meaningfully sparse datasets since BicPAMS is able to discover biclusters with missing interactions (see Quality parameter). | ||
| P11 Remove Uninformative Elements | This option supports the possibility to remove uninformative data elements. Zero Entries can be selected to remove the {0}-items, while the Differential option is used to focus on items with high absolute value (e.g. {-3,-2,2,3} when | ||
| Mining Options(includes P3, P15 and P16 from Table | P12 Stopping Criteria | The search algorithm runs until any of the available stopping criteria is met. The available options are: 1) minimum number of biclusters before merging (default), 2) minimum covered area by the discovered biclusters (as a percentage of the elements of the input data matrix or network), and 3) minimum support threshold (minimum number of rows per bicluster specified as a fraction of overall rows). The value associated with the selected option should be additionally specified. We suggest the definition of a high number of biclusters (>50) as the default option, in order to guarantee an adequate exploration of the input dataset. | |
| P13 Minimum | The minimum number of columns per bicluster can be optionally inputted to promote efficiency and align the outputs according to user expectations. A good principle to fix this value is to use the square root of the number of columns (interactions per node) of the input matrix (network). | ||
| P14 | BicPAMS default behavior relies on two iterations. For data with large coherent regions that may prevent the discovery of smaller (yet relevant) regions, the number of iterations can be increased to guarantee their discovery. On every new iteration, 25% of the most selected data elements (from the biclusters discovered from the previous iteration) are removed to guarantee a focus on new regions. Three iterations already guarantee an adequate space exploration for hard data settings. | ||
| P17 Pattern Miner | The available pattern mining algorithms are dynamically provided based on the selected coherency assumption and pattern representation. Sequential pattern miners (SPM) are provided for order-preserving models: PrefixSpan and IndexSpan (an optimized algorithm able to explore gains in efficiency from the item-indexable properties) are made available for simple pattern representations, while BIDE+ is provided for closed pattern representations. Frequent itemset miners (FIM) are selected for the remaining coherency assumptions. AprioriTID, F2G (pattern-growth method for data with a large extent of coherent areas) and Eclat (vertical method for data with a high number of columns) are made available for simple pattern representations. CharmDiffsets, AprioriTID and CharmTID are made available for closed pattern representations, while CharmMFI with diffsets is provided for maximal pattern representations. | ||
| P18 Scalability | This option specifies whether data partitioning principles are applied or not to guarantee the scalability of the searches (only suggested for data with >100 Mb). | ||
| Closing Opt.(includes P5) | P19 Merging | Different merging procedures are made available (according to [ | |
| P20 Filtering | Filtering is essential to guarantee compact solutions (applied after merging). A bicluster is filtered out if it does not have enough Dissimilar Elements, Dissimilar Rows or Dissimilar Columns against a larger bicluster. Consider a filtering option with 20% of dissimilar elements: in this context, biclusters sharing more than 80% of their elements with a larger bicluster are removed. |
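
The Dissimilar Elements criterion of P20 can be sketched as below. This is a minimal illustration, not the BicPAMS postprocessing code: biclusters are represented as hypothetical (rows, columns) pairs of non-empty sets, the function names are made up for the example, and a candidate is compared only against larger biclusters that were already kept, which is an assumption about the procedure.

```python
from itertools import product


def elements(bicluster):
    """Set of (row, column) cells covered by a bicluster given as (rows, columns)."""
    rows, cols = bicluster
    return set(product(rows, cols))


def filter_biclusters(biclusters, min_dissimilarity=0.2):
    """Drop biclusters with fewer than min_dissimilarity of their cells outside a larger kept one."""
    ordered = sorted(biclusters, key=lambda b: len(elements(b)), reverse=True)
    kept = []
    for candidate in ordered:
        cells = elements(candidate)
        redundant = any(
            len(cells - elements(larger)) / len(cells) < min_dissimilarity
            for larger in kept
        )
        if not redundant:
            kept.append(candidate)
    return kept


# Hypothetical biclusters: `nested` lies fully inside `big`, `distinct` only partly.
big = ({0, 1, 2, 3, 4}, {0, 1, 2, 3})
nested = ({0, 1, 2}, {0, 1, 2})
distinct = ({3, 4, 5}, {2, 3, 4})

compact = filter_biclusters([nested, big, distinct], min_dissimilarity=0.2)
```

With a 20% threshold the nested bicluster is removed because it shares all of its elements with the larger one, while the partly overlapping bicluster (5 of its 9 cells are new) is kept.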
Fig. 4 Illustrative application of BicPAMS: input data and output biclusters
Analysis of the highly enriched terms (p-value below 0.01 after correction using Enrichr [33]) for the 182 pattern-based biclusters found with BicPAMS in the dlbcl dataset (human cellular responses to chemotherapy) against multiple repositories: pathway databases (KEGG, WIKI, Reactome and BioCarta), human PPIs, GO, NCI-60 and Cancer Cell Line Encyclopedia, Human Gene Atlas and MSigDB
| Database | Avg. | Summary | |
|---|---|---|---|
| Pathways | KEGG Pathways | 23±11 | Each of the 182 biclusters has a compact set of coherent and significantly enriched pathways in the KEGG database. There is a high dissimilarity (low overlapping) of enriched pathways between biclusters. To illustrate the relevance of enriched pathways to characterize the putative biological role per bicluster, consider the following four discovered biclusters {BK1,BK2,BK3,BK4} with terms showing a corrected |
| WIKI Pathways | 20±7 | Although dissimilarity of WIKI pathways between biclusters is also observed, the overlapping degree of pathways is higher than in the previous KEGG-based analysis. Consider the highly enriched terms (corrected | |
| Reactome Pathways | 69±37 | The found biclusters have on average a higher number of enriched pathways in Reactome than in peer databases. Considering two randomly selected biclusters {BR1,BR2} and pathways with enriched | |
| BioCarta Pathways | 5.5±2.5 | The found biclusters are associated with small and dissimilar sets of enriched pathways in the BioCarta database. BioCarta provides unique pathway knowledge, being essential to guarantee a more complete view of the putative roles of biclusters. Let us consider the enriched pathways for 3 randomly selected biclusters, {BW1,BW2,BW3}. BW1 is associated with T-cell receptor (TCR) pathways, including TCR activation by tyrosine kinases, TCR apoptosis and TCR signaling. Similarly to WIKI pathways, BW2 is associated with the signaling of B-cell receptor (BCR) and BW3 with the control of cancerous cell proliferation (inc. regulation of DNA replication and p53 signaling). | |
| Cell lines | NCI-60 Cancer cell lines | 5.3±2.1 | The majority of biclusters show a compact set of enriched cell lines (groups of genes with unexpectedly high or low expression against the remaining cell lines) with few overlapping cell lines between pairs of biclusters. This analysis is key to unravel unique properties of the lymphoma targeted by each bicluster. To illustrate, consider three randomly selected biclusters, {BN1,BN2,BN3}: BN1 was found to be primarily related to follicular lymphoma (RS11846 cell line with corrected 7.9E-9 |
| Cancer Cell Line Encyclopedia | 47±30 | The majority of enriched cancer cell lines were found to be associated with tumors of the hematopoietic and lymphoid tissues. In general, each bicluster shows a unique set of enriched cell lines. Consider 3 randomly selected biclusters {BC1,BC2,BC3} with enriched cell lines (corrected | |
| Human Gene Atlas | 4±1.4 | The analysis of terms enriched in the Human Gene Atlas is pertinent to understand the types of cells more likely to be affected by the putative biological responses modeled per bicluster. A few biclusters were found to be associated with effects on the whole blood cells, while the remaining majority of biclusters model more specific biological responses, thus showing enrichment on specific types of cells. Considering four randomly selected biclusters {BH1,BH2,BH3,BH4}, we found 721 B lymphoblasts and CD19+ B cells (with | |
| MSigDB Oncogenic Signatures | 9±1.5 | The Molecular Signatures Database (MSigDB) tests the enrichment of genes with potential to cause cancer. Interestingly, the majority of the discovered biclusters have a single delineated oncogene (signature with considerably higher enrichment than peer signatures). A few illustrative signatures include: VEGFA UP with V1 DN (8.2E-8) corresponding to genes down-regulated by treatment with angiogenic factor VEGFA; RPS14 DN with V1 UP (4.3E-11) corresponding to genes up-regulated in CD34+ hematopoietic progenitor cells after knockdown of RPS14; or CAMP UP with V1 UP (3.4E-9) associated with genes up-regulated in primary thyrocyte cultures in response to cAMP signaling. This knowledge further discriminates the putative role of each bicluster. | |
| Regulation | Transcription factors | 11±3 | Compact and dissimilar sets of TFs were found to be associated with the found biclusters. Illustrating {STAT5A,STAT3,NFKB1}, {AIRE,ESR1,FOXP3,POU5F1,TP53} and {ILF2,CDKN1B,CCND1,UPF1} sets of TFs (with corrected |
| PPI Hub Proteins | 83±14 | This analysis shows the proteins enriched per bicluster acting as hubs in interaction networks. Despite the large number of enriched hubs per bicluster, it is interesting to notice that biclusters show a low number of overlapping hub-proteins with each other. The analysis of four randomly selected biclusters revealed the {PTPN6,JAK2,CBL}, {GABARAPL1,GABARAPL2, GABARAP}, {SHC1,IL7R,SRC} and {MCC,SLC2A4,CDK1} sets of hub proteins with corrected | |
| Gene Ontology | GO Biological processes | 298±90 | All biclusters show a high number of functionally coherent terms associated with cellular biological processes. An analysis of the enrichment for some biclusters is provided in Table |
| GO Cellular component | 28±16 | The analysis of the enriched cellular components provides complementary information to characterize the putative biological role of each bicluster. Given two randomly selected biclusters from the set of 182 biclusters: one bicluster was associated with cytosol and chromatin-related components (corrected | |
| GO Molecular function | 21±7 | Similarly to cellular components, the knowledge of the enriched molecular functions can be used to enlarge the GO-based analysis of biclusters. Each bicluster was found to be associated with a compact set of molecular functions consistently related with the molecular mechanisms underlying immunological responses to chemotherapy. Considering two randomly selected biclusters: the first bicluster showed enriched terms (with corrected | |
Illustrative set of terms highly enriched in BicPAMS biclusters
| Dataset | ID | Terms | Bicluster with best | |
|---|---|---|---|---|
| | Dl1 | Translation processes (including translational initiation and elongation) | 4.49E-5 | 81 |
| Dl2 | Transmembrane-related processes (including Golgi apparatus and MHC protein complex) | 5.40E-5 | 83 | |
| Dl3 | Defense response; processes related with intra-cellular communication, including receptor activity | 4.91E-5 | 162 | |
| Dl4 | Innate immune responses, including response to interferon-gamma | 1.06E-4 | 58 | |
| Dl5 | Cellular responses to chemical stimulus, including response to cytokine stimulus | 0.001 | 60 | |
| Dl6 | Processes targeting the membrane-enclosed lumen associated with the cell cycle process | 2.92E-12 | 81 | |
| Dl7 | Immune system processes | 1.27E-4 | 52 | |
| | H1 | Mitochondrion organization and translation; mitochondrial matrix | 2.70E-39 | 416 |
| H2 | Processes concerning the cell periphery and sporulation; cell wall constituent and organization | 1.73E-4 | 370 | |
| H3 | Ribonucleoprotein complex biogenesis | 3.61E-30 | 426 | |
| H4 | Metabolic and biosynthetic processes of cellular amino acids and carboxylic acids | 1.3E-25 | 581 | |
| H5 | Metabolic processes of organonitrogen and sulfur compounds | 1.62E-4 | 504 | |
| | G1 | Cellular response to oxidative stress; generation of precursor metabolites and energy | 2.37E-4 | 296 |
| G2 | Processes to generate precursor metabolites and energy, including the tricarboxylic acid cycle | 1.16E-14 | 954 | |
| G3 | Retrotransposon nucleocapsid; viral procapsid maturation | 4.34E-6 | 102 | |
| G4 | Processes targeting the intracellular organelle lumen and nuclear lumen | 1.17E-47 | 263 | |
| G5 | Nucleolus; ncRNA metabolic processes | 1.03E-61 | 611 | |
| G6 | Intracellular non-membrane-bounded organelle; structural molecule activity | 5.33E-76 | 293 | |
| G7 | Processes targeting the cytosolic part and, in particular, the ribosomal subunit | 1.61E-88 | 460 | |
| G8 | Mitochondrion organization; mitochondrial part; biogenesis of certain protein complexes | 2.06E-26 | 592 | |
| G9 | Regulation of macromolecular biosynthetic processes; protein modification | 2.28E-13 | 1019 | |
| G10 | Organic substance catabolic and metabolic processes (including carbohydrates) | 1.02E-15 | 648 | |
| G11 | General processes associated with ribonucleoprotein complex biogenesis | 1.08E-94 | 784 |
Illustrative set of biologically relevant biclusters with different properties
| Dataset | ID | Pattern | | Coherency assumption | Postprocessing | | | | | Best |
|---|---|---|---|---|---|---|---|---|---|---|
| | B1 | FAABFFF | 6 (A-F) | constant | Merging with tight overlapping | 83 | 7 | 41 | 21 | 1.97E-10 |
| B2 | AAAABCAA | 3 (A-C) | constant | Extensions allowed (tight merging) | 153 | 8 | 9 | 1 | 2.27E-12 | |
| B3 | AAAAA/../EEEEE | 5 (A-E) | multiplicative | Reducing with high homogeneity | 119 | 5 | 5 | 18 | 4.12E-8 | |
| | B4 | EEECEE | 5 (A-E) | constant | Merging allowed | 581 | 6 | 12 | 7 | 1.31E-25 |
| B5 | CCDCBCBCCC | 5 (A-E) | constant | Merging with relaxed overlapping | 654 | 10 | 16 | 4 | 1.31E-17 | |
| B6 | AAAAAA/... | 7 (A-G) | additive | Merging with tight overlapping | 476 | 6 | 12 | 10 | 1.92E-6 | |
| | B7 | AAAGGGA/... | 7 (A-G) | multiplicative | Merging with tight overlapping | 483 | 7 | 57 | 10 | 1.24E-81 |
| B8 | AAABACCCAA/... | 5 (A-E) | additive | Merging allowed | 521 | 10 | 17 | 5 | 4.57E-12 |
Fig. 5 Pattern-based biclusters retrieved from gasch data following a constant assumption with symmetries (a), multiplicative assumption with symmetries (b), and additive assumption (c and d)
Biological networks used to experimentally assess BicPAMS
| Type | Source | Organism | Nodes | Interactions | Density | Notes on the weight of interactions |
|---|---|---|---|---|---|---|
| GI (gene interactions) | DryGIN | Yeast | 4455 | 191309 | 1.0% | Weights (65% negative) from double-mutant arrays [ |
| GI (gene interactions) | STRING | Yeast | 6314 | 3759902 | 1.1% | Known and predicted associations benchmarked from multiple data sources and literature (text mining), and combined through an integrative score [ |
| PPI (protein interactions) | STRING | E. coli | 8428 | 3293416 | 4.6% | |
| PPI (protein interactions) | STRING | Human | 19247 | 8548002 | 2.3% |
Biological role of a subset of BicPAMS’ modules with varying properties
| ID | Homogeneity | | Putative biological functions: enriched terms ( | |
|---|---|---|---|---|
| STRING (yeast) | Y1 | dense (high noise-tolerance) | 231 ×14 | Metabolic processes with incidence on peptide, protein and amide metabolism and biosynthesis. |
| Y2 | dense (medium noise-tolerance) | 217 ×9 | Metabolism of nitrogen compounds and other organic substances. | |
| Y3 | constant (few high | 103 ×8 | Amino acid activation and tRNA metabolism for aminoacylation. | |
| Y4 | constant (few high or low | 55 ×7 | Signal transduction and its related subterms. | |
| Y5 | constant (few high or low | 43 ×6 | Phosphorylation terms (with more incidence on protein phosphorylation). | |
| Y6 | order-preserving | 176 ×12 | Transport of organic acids (incidence on amino acid transmembrane transport). | |
| Y7 | order-preserving | 235 ×9 | Oxidation-reduction process and metabolism of amino acids. | |
| Y8 | order-pres. (few high | 146 ×8 | Transport of molecules (highest enrichment found for drug transmembrane). | |
| STRING (human) | H1 | dense (high noise-tolerance) | 811 ×28 | Multiple metabolic processes with incidence on transcription activity. |
| H2 | constant (few high | 693 ×14 | Regulation of intracellular signal transduction (over twenty highly enriched terms). | |
| H3 | constant (few high | 645 ×10 | Regulation of molecular functions with incidence on catalytic activity. | |
| H4 | order-preserving | 720 ×24 | Establishment of protein localization (protein targeting to ER and membrane). | |
| H5 | order-preserving | 733 ×29 | Protein phosphorylation and its subterms. | |
| DryGIN | D1 | dense (high noise-tolerance) | 28 ×17 | Organelle localization (establishment of spindle and nuclear localization). |
| D2 | constant (with pos&neg | 22 ×10 | Chromatin remodeling and nucleosome organization. | |
| D3 | constant (with pos&neg | 21 ×7 | Transport processes for the establishment of protein localization. | |
| D4 | constant (with pos&neg | 19 ×9 | Regulation of growth (with incidence on filamentous growth). | |
| D5 | order-preserving | 39 ×7 | Organelle and nucleus organization. | |
| D6 | order-preserving | 54 ×6 | Negative and positive regulation of cellular metabolic processes. |
Relevance and exclusivity of BicPAMS’ solutions: properties of some of the found modules in DryGIN
| ID | Type | | Items | | Details | |
|---|---|---|---|---|---|---|
| DryGIN | G1 | constant | 18 ×9 | {-4,..,-1} | 27 | Module with coherently strong (–4) and soft (–1) negative interactions. |
| G2 | symmetric | 4 ×9 | {-3,..,3} | 13 | Module with multiple levels of strong (mainly positive) interactions ({ ±3,±2}). | |
| G3 | symmetric | 5 ×6 | {-2,-1,1,2} | 12 | Module with either all negative or positive interactions per “row”-node ({ ±1,±2}). | |
| G4 | constant | 7 ×5 | {1,2} | 12 | Module with coherent strong (2) and soft (1) positive interactions. | |
| G5 | symmetric | 7 ×5 | {-2,-1,1,2} | 11 | Module with either all negative or positive interactions per “row”-node ({ ±1,±2}). | |
| G6 | order | 14 ×11 | {-3,..,3} | 25 | Preserved precedences and co-occurrences per “row”-node before postprocessing. | |
| G7 | order | 42 ×8 | {-2,-1,1,2} | 50 | Noise-tolerant module with mostly preserved orderings per “row”-node. |