Yijing Shen, Wei Sun, Ker-Chau Li.
Abstract
MOTIVATION: Various clustering methods have been applied to microarray gene expression data to identify genes with similar expression profiles. As biological annotation data have accumulated, more and more genes have been organized into functional categories. Functionally related genes may be regulated by common cellular signals and are thus likely to be co-expressed. Consequently, using rapidly growing functional annotation resources such as Gene Ontology (GO) to improve the performance of clustering methods is of great interest. At the other extreme from clustered genes, some genes have distinct expression profiles and do not co-express with other genes. Identifying these scattered genes could enhance the performance of clustering methods.
Year: 2009 PMID: 20007256 PMCID: PMC2815660 DOI: 10.1093/bioinformatics/btp671
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Flow chart of the DWCN algorithm.
Fig. 2. (a) The original clusters of the simulated dataset. Gene IDs are shown on the y-axis and clusters on the x-axis. Each true cluster is assigned its own color/shape. Gene IDs 501 to 1000 are noise points. (b) Tight clustering. (c) Standard K-means with K=10. (d) PW-Kmeans. (e) DWCN.
Rand indices comparisons on simulation
| Panel A | Rand index 1 | Rand index 2 | Weighted rand |
| DWCN | 0.53 | 0.72 | 0.59 |
| K-means | 0.10 | 0.47 | 0.29 |
| Tight | 0.25 | 0.35 | 0.29 |
| PWK-means | 0.37 | 0.23 | 0.32 |
| Panel B | Rand index 3 | Total entropy | No. of over-represented GO terms ( |
| DWCN | 0.136 | 57.72 | 27 |
| R-DWCN | 0.026 | 83.47 | 7 |
| K-means | 0.06 | 69.57 | 18 |
| Tight | 9.5×10⁻⁵ | 109.95 | 12 |
| PWK-means | 0.004 | 85.3 | 1 |
| Panel C | Rand index 3 | Total entropy | No. of over-represented GO terms ( |
| DWCN | 0.23 | 71.33 | 32 |
| K-means | 0.026 | 87.46 | 21 |
| Tight | 0.01 | 76.7 | 18 |
| PWK-means | 0.0019 | 101.83 | 0 |
| Panel D | Rand index 3 | Total entropy | No. of over-represented GO terms ( |
| DWCN | 0.14 | 47.4 | 27 |
| K-means | 0.049 | 55.63 | 19 |
| Tight | 0.013 | 53.56 | 14 |
| PWK-means | 0.008 | 85.05 | 2 |
Panel A: Comparison of the DWCN, K-means, tight clustering and PWK-means algorithms on the simulated data. Higher index values imply better consistency between the identified clusters and the underlying true clusters. Panels B and C: Evaluation of the clusters identified from the yeast cell-cycle data. Rand index 3, total entropy and total number of over-represented GO terms are compared across DWCN, K-means, tight clustering and PWK-means, using 27 GO slim terms (Panel B) and 214 GO terms (Panel C) as functional categories. Panel D: The same evaluation scheme as in Panels B and C, applied to the clusters identified from the yeast segregants data using 27 GO slim terms.
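To make the idea behind a Rand-type index concrete, the following is a minimal sketch of the standard (unadjusted) Rand index, which measures the fraction of item pairs on which two partitions agree. It is not the paper's definition of Rand index 1, 2, 3 or the weighted Rand index, which are not given in this excerpt; the function name `rand_index` and the toy labels are assumptions made for illustration only.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Standard Rand index: share of item pairs treated the same way
    (grouped together or kept apart) by both partitions.

    Illustrative only; the paper's Rand index variants may differ.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both partitions make the
        # same decision about it: together in both, or apart in both.
        if same_true == same_pred:
            agree += 1
        total += 1
    return agree / total


# Hypothetical toy labels: four genes, two true clusters.
truth = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same partition, different label names
crossed = [0, 1, 0, 1]     # splits both true clusters

score_same = rand_index(truth, relabeled)
score_crossed = rand_index(truth, crossed)
```

The index depends only on which items are grouped together, not on the label names, so a relabeled copy of the true partition scores the maximum value. This matches the reading in the table note that higher values indicate closer agreement with the true clusters.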
Fig. 3. Distribution of prediction accuracy percentages for (a) yeast cell cycle and (b) yeast segregants data with 27 GO slim terms.