Abstract
BACKGROUND: Biclustering of gene expression data searches for local patterns of gene expression. A bicluster (or a two-way cluster) is defined as a set of genes whose expression profiles are mutually similar within a subset of experimental conditions/samples. Although several biclustering algorithms have been studied, few are based on rigorous statistical models.
Year: 2008 PMID: 18366617 PMCID: PMC2386069 DOI: 10.1186/1471-2164-9-S1-S4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
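The bicluster definition in the abstract can be made concrete with a small NumPy sketch. This is not the authors' method. It assumes a bicluster is stored as a list of gene (row) indices and a list of condition (column) indices, and it measures "mutually similar" as the mean pairwise Pearson correlation of the gene profiles restricted to those conditions. The helper `bicluster_coherence` and the toy data are hypothetical.

```python
import numpy as np

def bicluster_coherence(data, genes, conditions):
    """Hypothetical helper: mean pairwise Pearson correlation of the selected
    genes' profiles, computed only over the selected conditions."""
    sub = data[np.ix_(genes, conditions)]            # genes x conditions submatrix
    corr = np.corrcoef(sub)                          # gene-by-gene correlation matrix
    pairs = corr[np.triu_indices(len(genes), k=1)]   # each distinct gene pair once
    return float(pairs.mean())

# Toy matrix: 50 genes x 20 conditions of background noise.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 20))

# Plant a bicluster: genes 0-4 share one profile, but only on conditions 0-5.
genes = [0, 1, 2, 3, 4]
conditions = [0, 1, 2, 3, 4, 5]
profile = np.linspace(-2.0, 2.0, len(conditions))
data[np.ix_(genes, conditions)] = profile + 0.1 * rng.normal(size=(5, 6))

within = bicluster_coherence(data, genes, conditions)       # high: shared profile
across = bicluster_coherence(data, genes, list(range(20)))  # diluted by 14 unrelated conditions
print(within, across)
```

Scoring the same genes over all 20 conditions lowers their correlation, which is why biclustering searches over subsets of conditions rather than whole expression profiles.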
Figure 1. Simulated data with two biclusters and the results of the Bayesian biclustering (BBC) analysis. (a) A dataset with two non-overlapping clusters. (b), (c) The two clusters found by the Bayesian biclustering model from (a). (d) A dataset with two clusters that share genes. (e)-(g) The three clusters found by the Bayesian biclustering model from (d). (h) A dataset with two clusters that share both samples and genes. (i)-(k) The three clusters found by the Bayesian biclustering model from (h).
Figure 2. Datasets simulated according to the plaid model, used for comparing methods. (a) A dataset with a single cluster. (b) A dataset with two clusters that overlap in both genes and samples.
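In the plaid model, the data matrix is a background plus a sum of additive "layers", each layer being a bicluster with its own mean, gene effects, and condition effects: Y_ij = μ_0 + Σ_k (μ_k + α_ik + β_jk) ρ_ik κ_jk + ε_ij, where ρ_ik and κ_jk are 0/1 gene and condition memberships. The sketch below draws data with the structure of Figure 2(b). The simulation settings used for the figure are not given in this excerpt, so the layer sizes, effect sizes, noise level, and the function name `simulate_plaid` are all hypothetical.

```python
import numpy as np

def simulate_plaid(n_genes, n_conds, layers, mu0=0.0, noise_sd=1.0, seed=0):
    """Simulate Y_ij = mu0 + sum_k (mu_k + alpha_ik + beta_jk) * rho_ik * kappa_jk + eps_ij.

    `layers` is a list of (gene_indices, condition_indices, mu_k) tuples.
    Effect and noise scales are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    y = mu0 + rng.normal(0.0, noise_sd, size=(n_genes, n_conds))  # background + noise
    for genes, conds, mu_k in layers:
        rho = np.zeros(n_genes, dtype=bool)
        rho[genes] = True                                # gene membership rho_ik
        kappa = np.zeros(n_conds, dtype=bool)
        kappa[conds] = True                              # condition membership kappa_jk
        alpha = rng.normal(0.0, 0.5, size=n_genes)       # gene effects alpha_ik
        beta = rng.normal(0.0, 0.5, size=n_conds)        # condition effects beta_jk
        theta = mu_k + alpha[:, None] + beta[None, :]    # layer effect theta_ijk
        y += theta * np.outer(rho, kappa)                # added only inside layer k
    return y

# Two layers that overlap in genes 20-29 and conditions 5-9 (structure of Figure 2(b)).
layers = [
    (np.arange(0, 30), np.arange(0, 10), 3.0),
    (np.arange(20, 50), np.arange(5, 15), 3.0),
]
y = simulate_plaid(100, 40, layers)
```

Cells in the overlap receive the sum of both layer effects, which is what makes overlapping biclusters additive under this model. The identifiability constraints usually placed on α and β within each layer are omitted here for brevity.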
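Biclustering results of different methods for simulated data using the plaid model ((a) and (b) refer to the datasets in Figure 2)
| Method | Sensitivity (a) | Sensitivity (b) | Specificity (a) | Specificity (b) | Overlapping rate (a) | Overlapping rate (b) | # of clusters (a) | # of clusters (b) |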
| ISA (0.6, 1) | 1 | 0.84 | 0.99 | 0.84 | 0 | 0.12 | 1 | 3 |
| ISA (0.6, 1.2) | 0.95 | 0.53 | 0.84 | 0.90 | 0.06 | 0.08 | 10 | 8 |
| ISA (0.7, 1.1) | 0.84 | 0.68 | 0.91 | 0.84 | 0 | 0.16 | 10 | 8 |
| SAMBA | 0.43 | 0.39 | 0.99 | 0.99 | 0.31 | 0.3 | 7 | 14 |
| CC* | 1 | 0.98 | 0 | 0 | 0.02 | 0 | 10 | 10 |
| OPSMs | 0.38 | 0.25 | 0.94 | 0.96 | 0.3 | 0.5 | 11 | 12 |
| Plaid | 1 | 1 | 1 | 0.73 | 0 | 0.63 | 1 | 11 |
| BBC** | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 3 |
Note: *For CC, the number of clusters is preset to 10. **For BBC, the overlapping rate is 0 by construction.
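The tables report sensitivity, specificity, and overlapping rate, but this excerpt does not define them. The sketch below uses one common cell-level reading, which is an assumption: a matrix cell is "positive" if it lies in a true bicluster, sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP) over cells, and the overlapping rate is the fraction of covered cells claimed by more than one detected bicluster. Under this reading a method that never returns overlapping biclusters scores 0, consistent with the note on BBC. The names `coverage` and `evaluate` are hypothetical.

```python
import numpy as np

def coverage(shape, biclusters):
    """Count how many biclusters (gene_idx, cond_idx) cover each matrix cell."""
    counts = np.zeros(shape, dtype=int)
    for genes, conds in biclusters:
        counts[np.ix_(genes, conds)] += 1   # indices are unique within one bicluster
    return counts

def evaluate(shape, true_biclusters, found_biclusters):
    """Cell-level sensitivity, specificity, and overlapping rate (assumed definitions)."""
    truth = coverage(shape, true_biclusters) > 0
    counts = coverage(shape, found_biclusters)
    found = counts > 0
    tp = np.sum(truth & found)
    tn = np.sum(~truth & ~found)
    sensitivity = tp / truth.sum()
    specificity = tn / (~truth).sum()
    overlapping_rate = np.sum(counts > 1) / max(int(found.sum()), 1)
    return float(sensitivity), float(specificity), float(overlapping_rate)

# Toy example on a 10 x 8 matrix: one true bicluster, two detected ones that overlap.
true_bcs = [([0, 1, 2, 3], [0, 1, 2, 3])]
found_bcs = [([0, 1, 2, 3], [0, 1, 2]), ([2, 3, 4], [2, 3, 4])]
sens, spec, overlap = evaluate((10, 8), true_bcs, found_bcs)
# sens = 14/16, spec = 59/64, overlap = 2/19
```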
Figure 3. The simulated dataset with realistic characteristics
Biclustering results of different methods for simulated data with realistic characteristics
| Method | Sensitivity | | Specificity | | Overlapping rate | | # of clusters | |
| ISA (0.6, 1) | 0.98 | 0.70 | 0.76 | 0.78 | 0.51 | 0.65 | 7.2 | 9.8 |
| ISA (0.6, 1.2) | 0.90 | 0.75 | 0.79 | 0.73 | 0.57 | 0.57 | 11.2 | 13 |
| ISA (0.7, 1.1) | 0.94 | 0.76 | 0.80 | 0.79 | 0.48 | 0.59 | 8.3 | 10.9 |
| SAMBA | 0.38 | 0.28 | 0.99 | 0.99 | 0.37 | 0.37 | 5.8 | 5.3 |
| CC* | 0.84 | 0.70 | 0.15 | 0.25 | 0.02 | 0.01 | 10 | 10 |
| OPSMs | 0.21 | 0.16 | 0.91 | 0.91 | 0.35 | 0.35 | 9.3 | 8.9 |
| Plaid | 1.00 | 0.99 | 0.48 | 0.61 | 0.30 | 0.18 | 5 | 2.9 |
| BBC** | 1 | 0.97 | 0.99 | 0.97 | 0 | 0 | 2 | 2 |
Note: *For CC, the number of clusters is preset to 10. **For BBC, the overlapping rate is 0 by construction.
Comparison of normalization methods for the Bayesian biclustering model
| Normalization | Sensitivity | Specificity | Overlapping rate | # of clusters |
| RSN | 0.84 | 0.85 | 0 | 3 |
| CSN | 0.95 | 0.58 | 0 | 3 |
| QN | 1 | 0.44 | 0 | 4 |
| IQRN | 1 | 1 | 0 | 3 |
| SQRN | 1 | 1 | 0 | 3 |
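The normalization abbreviations in the table above are not expanded in this excerpt. The sketch below assumes that RSN and CSN standardize each row (gene) or each column (condition) to mean 0 and standard deviation 1, that QN is quantile normalization, and that IQRN centers each column at its median and divides by its interquartile range. These mappings are assumptions, SQRN is not sketched, and all function names are hypothetical.

```python
import numpy as np

def row_standardize(x):
    """Assumed RSN: each gene (row) to mean 0, standard deviation 1."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def col_standardize(x):
    """Assumed CSN: each condition (column) to mean 0, standard deviation 1."""
    return (x - x.mean(axis=0, keepdims=True)) / x.std(axis=0, keepdims=True)

def quantile_normalize(x):
    """QN: give every column the same empirical distribution (the row-wise
    mean of the column-sorted matrix); ties are broken arbitrarily."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)        # shared target distribution
    return reference[ranks]

def iqr_normalize(x):
    """Assumed IQRN: center each column at its median, scale by its interquartile range."""
    q1, median, q3 = np.percentile(x, [25, 50, 75], axis=0)
    return (x - median) / (q3 - q1)

# 200 genes x 6 conditions with condition-specific shifts and scales.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 6)) * np.arange(1, 7) + np.arange(6)
rsn = row_standardize(x)
csn = col_standardize(x)
qn = quantile_normalize(x)
iqrn = iqr_normalize(x)
```

Row-wise methods remove gene-level scale differences, whereas column-wise methods (CSN, QN, IQRN) remove condition-level differences. The quartile-based scaling is less sensitive than the standard deviation to a few extreme values.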
Bayesian Biclustering results for yeast expression data
| Cluster name | Size* | Significant conditions (P-value) | Enriched TFBS (P-value) | Enriched gene functions (P-value) |
| ribosome proteins | 213,85 | nitrogen depletion (7.1e-3), steady state (3.9e-4) | RAP1 (2.9e-60) | ribosomal protein (2.1e-160) |
| rRNA processing | 329,113 | steady state (8.9e-4) | ABF1 (5.2e-4), PAC (1.2e-127), RRPE (2.7e-63) | rRNA processing (4.3e-77), nucleic acid binding (1.6e-25) |
| ubiquitin | 113,88 | diamide stress (4.2e-3), menadione stress (2.7e-2) | RPN4 (4e-12) | ubiquitin/proteasomal pathway (8.3e-12) |
| oxidative stress | 40,38 | hydrogen peroxide stress (4.8e-8), menadione stress (4e-7), diamide stress (3.2e-6) | CAD1 (5.7e-15), YAP1 (1.9e-15) | oxidative stress response (9.3e-8), metabolism of phenylalanine (4.2e-8), metabolism of tyrosine (2.7e-8) |
| respiration | 55,97 | steady state (1.8e-7) | HAP4 (1.3e-16), SKN7 (6.3e-8), MSN24a (7.4e-4) | respiration (2.5e-38), electron transport and membrane-associated energy conservation (5.1e-45) |
| purine metabolism | 42,48 | menadione stress (4.1e-6), amino acid starvation (4.8e-3) | BAS1 (3.2e-5) | purine nucleotide/nucleoside/nucleobase anabolism (6.2e-10) |
| stress response and protein folding | 48,46 | heat shock (4.5e-7), diamide stress (1.7e-4), osmolarity stress (6.5e-4), MSN2/4 and YAP1 deletion (3.8e-3) | HSF1 (4.7e-3) | protein folding and stabilization (8e-8), stress response (3.0e-5) |
| stress response and heat shock | 87,191 | heat shock (5.2e-3) | HSF1 (1.5e-3), MSN24 (6.1e-11), MSN24a (9.6e-11), STRE (1.0e-5), GIS1 (1.9e-4) | C-compound and carbohydrate metabolism (1.0e-3), energy (7.4e-4) |
| cell cycle | 86,87 | α factor (3.5e-8), cdc15 (3.7e-8), cdc28 (4.5e-2), elu (4.0e-6) | MCM1 (1.0e-10), SWI4 (4.16e-7), FKH1 (6.6e-7), MBP1 (3.6e-4), TATA (1.3e-4) | cell cycle and DNA processing (5.1e-9), cytokinesis (cell division) (2.9e-6), pheromone response (7.6e-4) |
| DNA topology | 35,45 | cln3, clb2 (2.1e-2) | GCN4 (4.3e-6), MBP1 (2.0e-5), MCM1 (3.2e-3), SWI4 (1.1e-3), XBP1 (1.3e-5) | DNA topology (1.3e-22), somatic/mitotic recombination (8.9e-9) |
| cell cycle (G1 phase) | 108,62 | α factor (3.35e-11), cdc15 (2.5e-10), cdc28 (7.8e-6) | MBP1 (3.7e-14), SWI4 (6.4e-5) | cell cycle and DNA processing (1.4e-12) |
| nitrogen, sulfur & selenium metabolism | 37,16 | amino acid starvation (1.2e-5), nitrogen depletion (4.2e-2) | CBF1 (3.3e-7), GCN4 (7.3e-5), MET31 (8.7e-4), MET4 (1e-7) | amino acid metabolism (1.5e-30), nitrogen, sulfur and selenium metabolism (1.3e-13) |
| glycolysis regulation | 38,78 | disulfide-reducing agent stress (1.6e-4), diamide (1.5e-3) | GCR1 (4.6e-3) | sugar, glucoside, polyol and carboxylate catabolism (3.3e-10), glycolysis and gluconeogenesis (3.1e-11) |
*Size: (number of genes, number of conditions) in the cluster.
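The yeast table reports enrichment P-values for binding motifs (TFBS), functional categories, and conditions, but this excerpt does not state which test produced them. A one-sided hypergeometric test is a common choice for this kind of annotation enrichment. The sketch below is that assumption, not the authors' documented procedure. It uses only the standard library, and the function name and example numbers are hypothetical. Conditions could be scored the same way by treating them, rather than genes, as the items being drawn.

```python
from math import comb

def enrichment_p(k, cluster_size, n_annotated, n_total):
    """One-sided hypergeometric P-value P(X >= k): the chance that a random set of
    `cluster_size` genes drawn from `n_total` contains at least `k` of the
    `n_annotated` genes carrying an annotation (e.g. a motif or functional category)."""
    upper = min(cluster_size, n_annotated)
    numerator = sum(comb(n_annotated, i) * comb(n_total - n_annotated, cluster_size - i)
                    for i in range(k, upper + 1))
    return numerator / comb(n_total, cluster_size)  # exact integers, one final division

# Hypothetical numbers: 6000 genes, 120 carry a motif, 40 of them fall in an 80-gene cluster.
p = enrichment_p(40, 80, 120, 6000)
```

Summing exact integer binomial coefficients and dividing once avoids the cancellation that affects differences of floating-point tail probabilities, which matters for P-values as small as those in the table.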