Jessica C Mar, Christine A Wells, John Quackenbush.
Abstract
MOTIVATION: Unsupervised 'cluster' analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples in which the elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is a problem that, while important for understanding the underlying phenotypes, is one for which there is no robust, widely accepted solution.
Year: 2011 PMID: 21330289 PMCID: PMC3072547 DOI: 10.1093/bioinformatics/btr074
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overall workflow of the informativeness metric-based approach.
Fig. 2. Schematic diagram of the assumptions underlying calculation of the informativeness metric. Individual dots represent data points that represent an expression profile; by definition, an informative profile will be one whose points sit far away from the overall mean in each condition.
Fig. 3. Average expression profiles for the MAPK pathway with different numbers of clusters applied. These clusters were generated from complete linkage agglomerative hierarchical clustering, using Pearson's correlation metric. (A) Two clusters. (B) Average cluster expression profiles for the MAPK pathway with the number of clusters prescribed by the informativeness metric. Note the appearance of the first cluster, which has an expression profile distinct from those identified using other methods.
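The clustering setup named in the Fig. 3 caption (complete linkage agglomerative hierarchical clustering with Pearson's correlation) can be illustrated with a minimal pure-Python sketch. It assumes a distance of 1 minus the Pearson correlation, which the caption does not spell out, and it uses made-up toy profiles rather than the MAPK data. The function names `pearson_distance`, `complete_linkage` and `average_profile` are hypothetical and do not come from the paper.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation (assumed distance transformation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # assumption: a flat profile is treated as uncorrelated
    return 1.0 - cov / (sx * sy)


def complete_linkage(profiles, k):
    """Merge the closest pair of clusters until k clusters remain.

    Returns a list of clusters, each a list of row indices into profiles.
    """
    n = len(profiles)
    d = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)]
         for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between their members.
                link = max(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def average_profile(profiles, members):
    """Mean expression per condition over the given member rows."""
    n_cond = len(profiles[0])
    return [sum(profiles[i][c] for i in members) / len(members)
            for c in range(n_cond)]


# Toy expression profiles (rows: genes, columns: conditions); illustrative only.
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # rising
    [2.0, 4.1, 6.0, 8.2],  # rising
    [4.0, 3.0, 2.0, 1.0],  # falling
    [8.0, 6.1, 3.9, 2.0],  # falling
]
clusters = complete_linkage(profiles, k=2)
for members in clusters:
    print(sorted(members), average_profile(profiles, members))
```

Because the distance depends on correlation rather than magnitude, the two rising profiles group together even though their absolute levels differ, which is why average cluster profiles like those in Fig. 3 summarize shape rather than scale.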
Optimal number of clusters inferred for the simulated datasets
| Number of simulated clusters | Compactness metrics | Stability metrics | Informativeness metric | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Gap statistic | Connectivity | Dunn index | Silhouette width | APN | AD | ADM | FOM | |||
| 4 | 5 | 2 | 3 | 3 | 2 | 6 | 2 | 6 | 1 | 4 |
| 6 | 6 | 2 | 5 | 4 | 2 | 13 | 2 | 11 | 2 | 6 |
| 8 | 8 | 2 | 7 | 6 | 2 | 9 | 2 | 8 | 2 | 8 |
Optimal number of clusters inferred for the Müller dataset
| Experimental datasets | Compactness metrics | Stability metrics | Informativeness metric | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Gap statistic | Connectivity | Dunn index | Silhouette width | APN | AD | ADM | FOM | |||
| MAPK (KEGG ID 04010) | 2 | 2 | 2 | 2 | 4 | 6 | 4 | 6 | 2 | 3 |
| Focal adhesion (KEGG ID 04510) | 2 | 2 | 4 | 2 | 2 | 7 | 2 | 5 | 2 | 4 |
| Regulation of actin cytoskeleton (KEGG ID 04810) | 2 | 2 | 10 | 2 | 2 | 11 | 2 | 5 | 3 | 5 |
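As an illustration of how one of the comparison metrics in these tables selects a cluster number, the sketch below computes the average silhouette width using its standard definition: for each point, s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. This is not the informativeness metric and not the paper's code. The Euclidean distance, the toy points and the hand-supplied candidate labelings are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_width(points, labels):
    """Average silhouette width of a labeling (needs at least 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy 2-D points forming three well-separated pairs (illustrative only).
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]

# Hypothetical candidate labelings, one per candidate number of clusters.
candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
scores = {k: silhouette_width(points, labels)
          for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In practice the candidate labelings would come from running the clustering algorithm at each value of k; here they are fixed by hand so that the scoring step stays visible.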