Mark Smolkin, Debashis Ghosh.
Abstract
BACKGROUND: A potential benefit of profiling tissue samples with microarrays is the generation of molecular fingerprints that define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability is a major complication in analyzing the output of clustering procedures. While most work has focused on estimating the number of clusters in a dataset, the question of the stability of individual-level clusters has not been addressed.
Year: 2003 PMID: 12959646 PMCID: PMC200969 DOI: 10.1186/1471-2105-4-36
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
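Table 1. Distance measures used for hierarchical cluster analysis
| Name | Formula |
| --- | --- |
| Euclidean | $\sqrt{\sum_{k=1}^{p} (x_k - y_k)^2}$ |
| Manhattan | $\sum_{k=1}^{p} \lvert x_k - y_k \rvert$ |
| Canberra | $\sum_{k=1}^{p} \lvert x_k - y_k \rvert / (\lvert x_k \rvert + \lvert y_k \rvert)$ |
| Maximum | $\max_{1 \le k \le p} \lvert x_k - y_k \rvert$ |
Note: x and y denote the expression profiles of two samples across p genes.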
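The four distance measures named in the table can be illustrated with a minimal pure-Python sketch. The function names and the toy vectors are hypothetical, and the formulas are the standard textbook definitions rather than anything confirmed by this record. The Canberra denominator convention in particular varies between implementations; the version below assumes |a| + |b| and skips terms whose denominator is zero.

```python
from math import sqrt


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    """Sum of |a - b| / (|a| + |b|); zero-denominator terms are skipped
    (an assumed convention, implementations differ on this point)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def maximum(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical expression profiles of two samples over p = 3 genes.
x = [1.0, 2.0, 3.0]
y = [4.0, 0.0, 3.0]

for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                 ("canberra", canberra), ("maximum", maximum)]:
    print(name, fn(x, y))
```

Any of these functions could be passed to a hierarchical clustering routine as the pairwise dissimilarity between samples.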
Table 2. Cluster stability scores for Khan et al. [3] data (average linkage)
| Gene % | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
| --- | --- | --- | --- | --- | --- |
| 85 | 0.12 (0.66) | 0.63 (0.82) | 0.29 (0.66) | 0.87 (0.95) | 1.00 (1.00) |
| 75 | 0.07 (0.59) | 0.56 (0.78) | 0.23 (0.61) | 0.86 (0.95) | 1.00 (1.00) |
| 50 | 0.03 (0.51) | 0.31 (0.61) | 0.07 (0.41) | 0.85 (0.95) | 0.97 (0.98) |
| 25 | 0.00 (0.00) | 0.10 (0.38) | 0.03 (0.30) | 0.58 (0.83) | 0.88 (0.93) |
Note: Average linkage hierarchical clustering used here. The sizes of clusters 1–5 are 66, 4, 7, 7 and 2, respectively. Gene % represents percentage of p = 2308 genes used for calculating cluster stability scores. Numbers in parentheses represent cluster size-adjusted stability scores.
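The record above reports stability scores per cluster and per percentage of genes but does not spell out the procedure. One plausible reading, sketched below, is to recluster the samples on random subsets of the genes and report, for each original cluster, the fraction of resamples in which exactly the same set of samples reappears as a cluster. This is an assumption, not the authors' confirmed method; the function names, the naive average-linkage routine, and the toy data are all hypothetical, and the size-adjusted scores in parentheses are not reproduced because their formula is not given here.

```python
import random
from math import sqrt


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hclust_average(samples, k):
    """Naive average-linkage agglomerative clustering cut at k clusters.
    Returns a set of frozensets of sample indices."""
    n = len(samples)
    d = [[euclidean(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return {frozenset(c) for c in clusters}


def stability_scores(samples, k, gene_fraction, n_resamples=100, seed=0):
    """Assumed scheme: fraction of gene subsamples in which each original
    cluster is recovered exactly."""
    rng = random.Random(seed)
    p = len(samples[0])
    m = max(1, round(gene_fraction * p))
    original = hclust_average(samples, k)
    hits = {c: 0 for c in original}
    for _ in range(n_resamples):
        genes = rng.sample(range(p), m)
        sub = [[row[g] for g in genes] for row in samples]
        found = hclust_average(sub, k)
        for c in original:
            if c in found:
                hits[c] += 1
    return {c: hits[c] / n_resamples for c in original}


# Hypothetical toy data: 6 samples x 8 genes, two well-separated groups.
group_a = [
    [0.0, 0.1, 0.2, 0.0, 0.1, 0.2, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.2, 0.0, 0.1, 0.2, 0.0],
    [0.2, 0.2, 0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
]
group_b = [[v + 10.0 for v in row] for row in group_a]
samples = group_a + group_b

scores = stability_scores(samples, k=2, gene_fraction=0.5, n_resamples=20)
for cluster, score in scores.items():
    print(sorted(cluster), score)
```

On this toy data the two groups are far apart on every gene, so both clusters are recovered in every resample. On real expression data, scores would be expected to fall as the gene percentage shrinks, which is the pattern the tables show.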
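Table 3. Cluster stability scores for Khan et al. [3] data (complete linkage)
| Gene % | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |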
| 85 | 0.63 (0.89) | 0.53 (0.83) | 0.04 (0.43) | 0.79 (0.92) | 0.15 (0.64) | 0.67 (0.87) | 0.62 (0.7) |
| 75 | 0.61 (0.88) | 0.42 (0.77) | 0.02 (0.37) | 0.71 (0.88) | 0.04 (0.47) | 0.64 (0.86) | 0.60 (0.7) |
| 50 | 0.17 (0.64) | 0.05 (0.41) | 0.00 (0.00) | 0.31 (0.66) | 0.01 (0.33) | 0.36 (0.71) | 0.69 (0.8) |
| 25 | 0.06 (0.49) | 0.01 (0.26) | 0.00 (0.00) | 0.14 (0.49) | 0.00 (0.00) | 0.21 (0.59) | 0.47 (0.6) |
Note: Complete linkage hierarchical clustering used here. The sizes of clusters 1–7 are 19, 11, 18, 6, 26, 7 and 2, respectively. Gene % represents percentage of p = 2308 genes used for calculating cluster stability scores. See note to Table 2.
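Table 4. Cluster stability scores for Alizadeh et al. [1] data (average linkage)
| Gene % | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |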
| 85 | 1.00 (1.00) | 0.19 (0.70) | 1.00 (1.00) | 0.39 (0.64) | 0.42 (0.75) | 1.00 (1.00) | 0.99 (0.99) | 1.00 (1.00) |
| 75 | 1.00 (1.00) | 0.12 (0.70) | 0.99 (1.00) | 0.35 (0.61) | 0.44 (0.76) | 1.00 (1.00) | 0.92 (0.95) | 1.00 (1.00) |
| 50 | 0.97 (0.99) | 0.11 (0.62) | 0.95 (0.99) | 0.28 (0.55) | 0.34 (0.69) | 1.00 (1.00) | 0.73 (0.83) | 0.84 (0.90) |
| 25 | 0.90 (0.95) | 0.02 (0.43) | 0.77 (0.94) | 0.08 (0.30) | 0.37 (0.71) | 1.00 (1.00) | 0.41 (0.59) | 0.63 (0.76) |
Note: Average linkage hierarchical clustering used here. The sizes of clusters 1–8 are 3, 40, 26, 3, 7, 5, 2 and 2, respectively. Gene % represents percentage of p = 4026 genes used for calculating cluster stability scores. See note to Table 2.
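Table 5. Cluster stability scores for Alizadeh et al. [1] data (complete linkage)
| Gene % | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |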
| 85 | 0.98 (0.99) | 0.19 (0.70) | 0.98 (0.99) | 0.72 (0.87) | 0.99 (1.00) | 1.00 (1.00) | 1.00 (1.00) | 1.00 (1.00) |
| 75 | 0.89 (0.96) | 0.10 (0.61) | 0.95 (0.99) | 0.57 (0.79) | 0.92 (0.96) | 0.98 (0.99) | 1.00 (1.00) | 0.98 (0.99) |
| 50 | 0.62 (0.86) | 0.08 (0.59) | 0.71 (0.90) | 0.36 (0.65) | 0.75 (0.87) | 0.82 (0.93) | 0.97 (0.99) | 0.88 (0.97) |
| 25 | 0.35 (0.71) | 0.03 (0.48) | 0.49 (0.81) | 0.13 (0.43) | 0.53 (0.74) | 0.66 (0.86) | 0.82 (0.91) | 0.72 (0.92) |
Note: Complete linkage hierarchical clustering used here. The sizes of clusters 1–8 are 8, 41, 11, 4, 3, 6, 3 and 15, respectively. Gene % represents percentage of p = 4026 genes used for calculating cluster stability scores. See note to Table 2.
Table 6. Cluster stability scores for Bittner et al. [2] data (complete linkage)
| Gene % | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
| --- | --- | --- | --- | --- |
| 85 | 0.09 (0.48) | 0.98 (0.99) | 0.09 (0.49) | 0.52 (0.73) |
| 75 | 0.03 (0.35) | 0.90 (0.96) | 0.04 (0.39) | 0.47 (0.70) |
| 50 | 0.03 (0.35) | 0.71 (0.88) | 0.03 (0.36) | 0.34 (0.60) |
| 25 | 0.00 (0.00) | 0.48 (0.77) | 0.01 (0.26) | 0.28 (0.55) |
Note: Complete linkage hierarchical clustering used here. The sizes of clusters 1–4 are 10, 6, 11, and 3, respectively. Gene % represents percentage of p = 3613 genes used for calculating cluster stability scores. See note to Table 2.
Table 7. Cluster stability scores for Bittner et al. [2] data (average linkage)
| Gene % | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
| --- | --- | --- | --- | --- |
| 85 | 0.47 (0.83) | 1.00 (1.00) | 0.29 (0.28) | 0.16 (0.34) |
| 75 | 0.36 (0.78) | 1.00 (1.00) | 0.34 (0.53) | 0.09 (0.24) |
| 50 | 0.14 (0.62) | 0.98 (0.99) | 0.44 (0.62) | 0.06 (0.19) |
| 25 | 0.07 (0.52) | 0.87 (0.90) | 0.33 (0.52) | 0.05 (0.17) |
Note: Average linkage hierarchical clustering used here. The sizes of clusters 1–4 are 21, 2, 2, and 2, respectively. Gene % represents percentage of p = 3613 genes used for calculating cluster stability scores. See note to Table 2.