Abstract
BACKGROUND: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge into the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and domain knowledge.
Keywords: Consensus clustering; Gene expression; Semi-supervised clustering; Semi-supervised consensus clustering
Year: 2014 PMID: 24920961 PMCID: PMC4036113 DOI: 10.1186/1756-0381-7-7
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Attributes of four clustering algorithms
| k-means | Simple clustering | - | - | No | |
| LCE | Consensus clustering | SC | HBGF | No | |
| SSC | Semi-supervised clustering | SC | - | - | Yes |
| SSCC | Semi-supervised consensus clustering | SSC | SC | HBGF | Yes |
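The consensus idea shared by the LCE and SSCC rows can be illustrated with a co-association matrix: for every pair of samples, record the fraction of base clusterings that place them in the same cluster. This is a deliberately simpler stand-in, not the link-based or HBGF approach named in the table; the function name `co_association` and the toy partitions are hypothetical.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same samples")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of five samples (cluster ids are arbitrary).
base = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
C = co_association(base)
```

A final partition could then be obtained by clustering the samples with `C` as a similarity matrix; that step is left out here.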
Cancer gene expression datasets used in experiments
| CNS | 42 | 7129 | 1379 | 5 | 20 | 2.2% |
| Leukemia1 | 72 | 7129 | 1877 | 2 | 20 | 0.77% |
| Leukemia2 | 72 | 7129 | 1877 | 3 | 20 | 0.77% |
| Leukemia3 | 72 | 12582 | 2194 | 3 | 20 | 0.77% |
| LungCancer | 203 | 12600 | 1543 | 5 | 100 | 0.48% |
| St.Jude | 248 | 12625 | 2526 | 6 | 100 | 0.32% |
| Multi-Tissue1 | 174 | 12533 | 1571 | 10 | 100 | 0.66% |
| Multi-Tissue2 | 190 | 16063 | 1363 | 14 | 100 | 0.55% |
Figure 1. Normalized mutual information with various numbers of constraints on the (A) CNS, (B) Leukemia1, (C) Leukemia2, (D) Leukemia3, (E) LungCancer, (F) St.Jude, (G) Multi-Tissue1, and (H) Multi-Tissue2 datasets (error bars show the 95% confidence interval).
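For readers unfamiliar with the metric, normalized mutual information (NMI) between a predicted clustering and the known classes can be sketched as follows. The record does not say which normalization the paper uses; dividing by the geometric mean of the two entropies is an assumption here, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of the same samples."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(nij * n / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed variant)."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # Degenerate case: at least one assignment puts everything in one cluster.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)
```

NMI is invariant to relabeling, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` scores the two partitions as identical.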
Figure 2. Adjusted Rand index with various numbers of constraints on the (A) CNS, (B) Leukemia1, (C) Leukemia2, (D) Leukemia3, (E) LungCancer, (F) St.Jude, (G) Multi-Tissue1, and (H) Multi-Tissue2 datasets (error bars show the 95% confidence interval).
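The adjusted Rand index (ARI) corrects the pair-counting Rand index for chance agreement. A minimal sketch of the standard contingency-table formula follows; the function name and the handling of degenerate inputs are choices made for this illustration, not details taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two label assignments of the same samples."""
    n = len(a)
    if n != len(b):
        raise ValueError("label lists must have the same length")
    if n < 2:
        raise ValueError("need at least two samples to form a pair")
    # Pairs co-clustered in both, in a only, and in b only.
    sum_joint = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Both partitions are trivial (all one cluster or all singletons).
        return 1.0
    return (sum_joint - expected) / (max_index - expected)
```

Unlike NMI, ARI can be negative when two partitions agree less than expected by chance.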
Without prior knowledge, comparison among SSCC, SSC, LCE, and k-means
| | SSC/SC | LCE | k-means | SSC/SC | LCE | k-means |
|---|---|---|---|---|---|---|
| SSCC | 4/4/0 | 7/1/0 | 8/0/0 | 4/3/1 | 7/1/0 | 8/0/0 |
| SSC/SC | - | 6/2/0 | 8/0/0 | - | 6/2/0 | 6/2/0 |
| LCE | - | - | 6/2/0 | - | - | 5/3/0 |
All results are summarized in w/t/l, i.e. the first algorithm wins w times, ties t times and loses l times.
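A w/t/l summary of this kind can be tallied from per-dataset scores as sketched below. How the paper decides that two results tie is not stated in this record, so the tolerance parameter `tol` is a hypothetical stand-in, as are the function name and the example scores.

```python
def win_tie_loss(first, second, tol=0.0):
    """Count datasets where `first` beats, ties with, or loses to `second`."""
    if len(first) != len(second):
        raise ValueError("both score lists must cover the same datasets")
    wins = ties = losses = 0
    for x, y in zip(first, second):
        if abs(x - y) <= tol:  # assumed tie rule: difference within tolerance
            ties += 1
        elif x > y:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses


# Hypothetical scores for two algorithms on three datasets.
w, t, l = win_tie_loss([0.9, 0.5, 0.7], [0.8, 0.5, 0.75])
summary = f"{w}/{t}/{l}"
```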
With prior knowledge, paired t-test for the mean difference between SSCC and SSC
| CNS | 0.041* | 0.097* |
| Leukemia1 | 0.056* | 0.053* |
| Leukemia2 | 0.094* | 0.143* |
| Leukemia3 | 0.024* | 0.031* |
| LungCancer | 0.018* | -0.037* |
| St.Jude | 0.009* | 0.0144* |
| Multi-Tissue1 | 0.002 | 0.007 |
| Multi-Tissue2 | 0.012* | 0.035* |
| | SSCC vs. SSC | SSCC vs. SSC |
| w/t/l | 7/1/0 | 6/1/1 |
*The mean difference (SSCC - SSC) is significant at the p < 0.05 level. The results are summarized as w/t/l, i.e. the first algorithm wins w times, ties t times, and loses l times.
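The paired t-test behind the table compares two algorithms on matched runs. The sketch below computes the mean difference, the t statistic, and the degrees of freedom using only the standard library; the function name is illustrative. It does not compute the p-value, which would come from a Student's t distribution with n - 1 degrees of freedom (for example via a statistics package).

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired scores."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    d_bar = mean(diffs)
    s = stdev(diffs)  # sample standard deviation of the differences
    if s == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = d_bar / (s / math.sqrt(n))
    return d_bar, t, n - 1
```

A positive mean difference with a large t statistic corresponds to a starred entry in the table, where the first algorithm (SSCC) outperforms the second (SSC).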
Figure 3. Normalized mutual information of SSCC and LCE as the ensemble size changes on eight datasets.
Figure 4. Normalized mutual information of SSCC and LCE with two ensemble types on eight datasets.
Figure 5. Normalized mutual information of SSC and SSCC with various neighbor sizes on eight datasets.