Wassim Ayadi, Mourad Elloumi, Jin-Kao Hao.
Abstract
BACKGROUND: In several domains, such as DNA microarray data analysis, we need to cluster the rows (genes) and the columns (conditions) of a data matrix simultaneously in order to identify groups of rows that are coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are used extensively in DNA microarray data analysis, and more effective biclustering algorithms are needed.
Year: 2009 PMID: 20015398 PMCID: PMC2804695 DOI: 10.1186/1756-0381-2-9
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Figure 1. Different typical biclusters. Data matrix M1 represents a constant bicluster, M2 a constant-rows bicluster, M3 a constant-columns bicluster, M4 coherent values (additive model), M5 coherent values (multiplicative model), M6 coherent values (multiplicative model, where the first row of M5 is multiplied by 10) and M7 a coherent evolution.
ASR versus MSR and ACV.
| Evaluation function | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| MSR | 0 | 0 | 0 | 0 | 0.62 | 2.425 | 131.87 |
| ACV | 1 | 1 | 1 | 1 | 1 | 1 | 0.84 |
| ASR | 1 | 1 | 1 | 1 | 1 | 1 | 0.99 |
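The contrast in the table above (MSR is zero only up to the additive model, while a rank-based score stays at 1 for the multiplicative model too) can be reproduced with a minimal sketch. The MSR function follows the standard Cheng and Church mean squared residue. The ASR function is an assumption about how the average Spearman's rho is aggregated (twice the larger of the normalized row-pair and column-pair sums), and `spearman_rho` assumes vectors without ties. The two small matrices are hypothetical examples, not M4 and M5 from Figure 1.

```python
from itertools import combinations


def mean_squared_residue(m):
    """Cheng and Church mean squared residue of a full matrix (list of rows)."""
    n_rows, n_cols = len(m), len(m[0])
    row_mean = [sum(row) / n_cols for row in m]
    col_mean = [sum(m[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    all_mean = sum(row_mean) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = m[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


def spearman_rho(x, y):
    """Spearman's rho for two equally long vectors WITHOUT ties (assumption)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda k: v[k])
        r = [0] * len(v)
        for rank, k in enumerate(order, start=1):
            r[k] = rank
        return r

    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def average_spearman_rho(m):
    """Assumed ASR: 2 * max(normalized row-pair sum, normalized column-pair sum)."""
    def score(vectors):
        k = len(vectors)
        pair_sum = sum(spearman_rho(a, b) for a, b in combinations(vectors, 2))
        return 2 * pair_sum / (k * (k - 1))

    columns = [list(c) for c in zip(*m)]
    return max(score(m), score(columns))


# Hypothetical biclusters: each row is the first row plus / times a constant.
additive = [[1, 2, 5, 0], [2, 3, 6, 1], [4, 5, 8, 3]]
multiplicative = [[1, 2, 4, 3], [2, 4, 8, 6], [5, 10, 20, 15]]

for name, matrix in (("additive", additive), ("multiplicative", multiplicative)):
    print(name, mean_squared_residue(matrix), average_spearman_rho(matrix))
```

In this sketch the multiplicative matrix gets a non-zero MSR even though its rows are perfectly coherent, whereas the rank-based score treats both matrices the same way, which is the behaviour the table reports for M4 to M6.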
Data matrix M'.
| I1 | 10 | 20 | 5 | 15 | 40 | 18 |
| I2 | 20 | 40 | 10 | 30 | 24 | 20 |
| I3 | 23 | 12 | 8 | 15 | 29 | 50 |
| I4 | 4 | 8 | 2 | 6 | 5 | 5 |
| I5 | 15 | 25 | 8 | 12 | 29 | 50 |
Data matrix M after preprocessing.
| I1 | 10 | 20 | 5 | 15 | 40 | - |
| I2 | 20 | 40 | 10 | 30 | - | 20 |
| I3 | - | 12 | 8 | 15 | 29 | 50 |
| I4 | 4 | 8 | 2 | 6 | - | - |
| I5 | 15 | - | 8 | 12 | 29 | 50 |
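The record does not state the preprocessing rule, so the following is only a sketch of one rule that happens to reproduce the table above from matrix M': a value is kept when its relative deviation from its row average exceeds a threshold, and is masked otherwise. The function name `preprocess`, the parameter `delta` and its value 0.1 are assumptions chosen to fit these two tables, not values taken from the source.

```python
def preprocess(matrix, delta=0.1):
    """Mask (with None) every value close to its row average.

    Assumed rule: keep m[i][j] only if |m[i][j] - avg_i| / avg_i > delta.
    """
    result = []
    for row in matrix:
        avg = sum(row) / len(row)
        result.append([v if abs(v - avg) / avg > delta else None for v in row])
    return result


# Matrix M' as listed in the table "Data matrix M'." (rows I1 to I5).
m_prime = [
    [10, 20, 5, 15, 40, 18],
    [20, 40, 10, 30, 24, 20],
    [23, 12, 8, 15, 29, 50],
    [4, 8, 2, 6, 5, 5],
    [15, 25, 8, 12, 29, 50],
]

for label, row in zip(("I1", "I2", "I3", "I4", "I5"), preprocess(m_prime)):
    # Print masked entries as "-" to match the table layout.
    print(label, " ".join("-" if v is None else str(v) for v in row))
```

Under this assumed rule, the masked positions coincide with the "-" entries in the preprocessed table; other rules or thresholds could fit the same data.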
Figure 2.
Figure 3. First level of BET.
Figure 4. Children construction of the first node of the second level of BET.
Figure 5. Second level of BET.
Figure 6. Last level of BET.
Figure 7. Extracted biclusters are shown with bold lines.
BiMine results and comparison with other algorithms on synthetic data without overlapping biclusters.
| Algorithms | | |
|---|---|---|
| CC | 18.21% | 36.57% |
| OPSM | 46.39% | 74.42% |
| ISA | 39.38% | 5.31% |
| | 58.18% | 21.39% |
| | 100% | 33.03% |
BiMine results and comparison with other algorithms on synthetic data with overlapping biclusters.
| Algorithms | | |
|---|---|---|
| CC | 9.21% | 47.94% |
| OPSM | 42.87% | 49.31% |
| ISA | 23.28% | 23.97% |
| | 34.07% | 3.43% |
| | 85.35% | 41.78% |
Proportions of biclusters significantly enriched by GO annotations.
| Algorithms / p-value | 5% | 1% | 0.5% | 0.1% | 0.001% |
|---|---|---|---|---|---|
| | 100 | 100 | 93 | 82 | 51 |
| OPSM | 100 | 100 | 86 | 36 | 22 |
| | 100 | 100 | 89 | 79 | 64 |
| ISA | 89 | 89 | 87 | 69 | 32 |
| CC | 80 | 70 | 60 | 20 | 10 |
Most significant shared GO terms (process, function, component) for two biclusters on Yeast data.
| Bicluster volume (genes × conditions) | Process Ontology | Function Ontology | Component Ontology |
|---|---|---|---|
| (12 × 13) | cellular response to DNA damage stimulus (66.7%, 1.87e-08) | chromatin binding (25%, 0.00037) | microtubule organizing center part (16.7%, 0.00742) |
| (11 × 11) | cell cycle process (63.6%, 2.93e-05) | GTPase activator activity (18.2%, 0.00994) | microtubule cytoskeleton (45.5%, 6.33e-06) |
Figure 8. Two biclusters found by . (a): Bicluster of size (12 × 13) with ASR = 0.8873. (b): Bicluster of size (11 × 11) with ASR = 0.8690.