Eun-Youn Kim, Seon-Young Kim, Daniel Ashlock, Dougu Nam.
Abstract
BACKGROUND: Uncovering subtypes of disease from microarray samples has important clinical implications for survival time and for the sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of high-dimensional microarray clusters, which limits their performance.
Year: 2009 PMID: 19698124 PMCID: PMC2743671 DOI: 10.1186/1471-2105-10-260
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The data sets with complicated clusters: (a) Donut & Ball, (b) Horse Shoe, and (c) Spiral, and their cut-plots.
The adjusted Rand index (ARI) values for the geometric data sets.
| 0.6018 (2) | 0 (2) | 0.0510 (2) | ||
| 0.2553 (6) | 0.1219 (5) | 0.1487 (6) | ||
| 0.2893 (10) | 0.2596 (11) | 0.2303 (20) | ||
| 0.4895 (2) | 0.3434 (6) | 0.1532 (29) | ||
| 0.1752 (12) | 0.3080 (7) | 0.2072 (18) | ||
| 0.3326 (6) | 0.1460 (29) | |||
Each parenthesis shows the predicted number of clusters; the known number is given in the top header. ARI values that exceed 0.7 are shown in bold.
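For readers unfamiliar with the score reported in these tables, the following is a minimal sketch of the adjusted Rand index using its standard pair-counting definition. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions; the source does not show the authors' own implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same samples.

    Illustrative helper (not from the source): standard pair-counting form.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: nothing to disagree on

    # Contingency counts: joint cells, true-class sizes, predicted-cluster sizes.
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Only happens when both labelings are one cluster or all singletons,
        # in which case they are identical partitions.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical, not taken from the paper's data).
truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, relabeled
print(adjusted_rand_index(truth, [0, 0, 0, 0]))  # everything in one cluster
```

Under this definition, a prediction that places every sample in a single cluster scores 0 whenever the known classes are not themselves a single cluster, which is consistent with the table entries reported as 0.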
Test results for high dimensional synthetic data sets.
| 0.5367 (5/10) | |||
| 0.0993 (0/10) | |||
| 0 (1/10) | 0.2277 (1/10) | ||
| 0.0839 (9/10) | 0.3444 (4/10) | ||
| 0.3715 (1/10) | |||
The adjusted Rand index (ARI) values are averaged over ten randomly generated data sets. Each parenthesis shows the number of cases out of ten in which the number of clusters was correctly identified. ARI values that exceed 0.7 are shown in bold.
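The cell format used in this table, a mean ARI followed by the count of runs that recovered the correct number of clusters, can be reproduced with a short sketch like the one below. The helper name `summarize_runs` and the example values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def summarize_runs(runs, known_k):
    """Format one table cell from per-data-set results.

    runs: list of (ari, predicted_k) pairs, one per generated data set.
    known_k: the true number of clusters.
    Illustrative helper; names and inputs are assumptions, not from the source.
    """
    if not runs:
        raise ValueError("at least one run is required")
    mean_ari = mean(ari for ari, _ in runs)
    n_correct = sum(1 for _, k in runs if k == known_k)
    return f"{mean_ari:.4f} ({n_correct}/{len(runs)})"


# Hypothetical results for three runs with three known clusters.
example_runs = [(1.0, 3), (0.5, 2), (0.0, 3)]
print(summarize_runs(example_runs, known_k=3))
```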
Description of microarray data sets tested
| Data set | Acquisition | Description of known subclasses | DNA Chip |
| Leukemia | | ALL B-cell (38), ALL T-cell (9), AML (25) | Affymetrix HuFL |
| Lymphoma | GEO (GSE60) | B-CLL (12), FL (9), DLBCL (68) | cDNA |
| Colon tumor | GEO (GSE5206) | Colon tumor (100), Normal colon (5) | Affymetrix U133 Plus 2.0 |
| Thyroid tumor I | GEO (GSE3467) | Thyroid tumor (9), Normal thyroid (9) | Affymetrix U133 Plus 2.0 |
| Thyroid tumor II | GEO (GSE3678) | Thyroid tumor (7), Normal thyroid (7) | Affymetrix U133 Plus 2.0 |
| St. Jude | | BCR-ABL (15), E2A-PBX1 (27), Hyperdip-50 (64), | Affymetrix U95A |
| Normal tissue I | GEO (GDS422) | Bone (2), Liver (2), Heart (2), Spleen (2), | Affymetrix U95A |
| Normal tissue II | | Bladder (7), Breast (5), Cerebellum (3), Colon (11), Germinal Center (6), Kidney (12), Lung (7), Ovary (4), Pancreas (10), PBM (5), Prostate (9), Uterus (6), | Affymetrix HuFL |
Figure 2. Cut- and entropy-plots for the eight microarray gene expression data sets: (a) Leukemia, (b) Lymphoma, (c) Colon cancer, (d) Thyroid I, (e) Thyroid II, (f) St. Jude, (g) Normal I, and (h) Normal II. The upper panels show cut-plots and the lower panels show entropy-plots for each data set. Cut-intervals in three cut-plots, (a), (f), and (h), were merged as indicated by arrows. The dotted intervals indicate the finalized numbers of clusters. Negligible entropy jumps are also indicated by vertical arrows in the entropy-plots.
Test results for real expression data sets.
| Data set | MULTI-K | GCCk | GCCc | Hier. | |||
| gap | Sil | gap | Sil | ||||
| Leukemia ( | 0.4604 (6) | 0.6930 (2) | 0.4881 (2) | ||||
| Lymphoma ( | 0.3016 (5) | 0.1454 (6) | |||||
| Colon ( | 0.0349 (5) | 0.0386 (4) | 0.0540 (6) | 0.0088 (25) | |||
| Thyroid I ( | 0.2396 (5) | 0.1351 (4) | 0.3462 (3) | 0.4183 (2) | |||
| Thyroid II ( | 0.4468 (4) | 0.3986 (5) | 0.5895 (4) | ||||
| St. Jude ( | 0.6697 (4) | 0.5579 (9) | 0.1985 (2) | 0.1985 (2) | 0.1985 (2) | ||
| Normal I ( | 0.4964 (11) | 0 (1) | |||||
| Normal II ( | 0.2476 (13) | 0.6875 (10) | 0.1985 (2) | 0.6830 (16) | |||
Each parenthesis in the left header shows the known number of clusters, and each parenthesis in a table cell shows the predicted number. ARI values that exceed 0.7 are shown in bold.
Figure 3. The cut-plot (left) and entropy-plot (right) for a randomized leukemia data set.
Test results for known number of classes.
| Data set | MULTI-K | GCCk | GCCc | Hier. | |
| Lymphoma | 0.4027 | ||||
| Colon | 0.0252 | 0.0342 | 0.0390 | ||
| Thyroid I | 0.5815 | 0.4138 | |||
| Thyroid II | |||||
| St. Jude | 0.6352 | 0.1958 | |||
| Normal I | 0.4544 | ||||
| Normal II | 0.4590 | 0.4934 |
ARI values that exceed 0.7 are shown in bold.