Yitan Zhu, Huai Li, David J Miller, Zuyi Wang, Jianhua Xuan, Robert Clarke, Eric P Hoffman, Yue Wang.
Abstract
BACKGROUND: The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables.
Year: 2008 PMID: 18801195 PMCID: PMC2566986 DOI: 10.1186/1471-2105-9-383
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. VISDA's flowchart.
Figure 2. The flowchart including the algorithm extension for phenotype clustering. The green blocks with dashed borders indicate the algorithm extensions, i.e. the modified visualization scheme and decomposition scheme.
Figure 3. An illustration of VISDA on sample clustering. (a) The five different projections obtained at the top level. Red circles are brain cancer; green triangles are colon cancer; blue squares are lung cancer; and brown diamonds are ovary cancer. (b) The user's initialization of cluster means (indicated by the numbers in the small circles) and the resulting clusters (indicated by the green dashed ellipses). The left, middle, and right panels show the models with one, two, and three clusters, respectively. (c) The hierarchical data structure detected by VISDA. The Sub-Cluster Number (CN) and the corresponding Description Length (DL) are shown under each visualization.
Comparison of clustering performance
| VISDA | HC | KMC | SOM (MSC) | SOM (CLL) | SFNM Fitting | |
| Average mean of partition accuracy | 58.89% | 76.47% | 76.52% | 79.39% | 64.47% | |
| Average standard deviation of partition accuracy | 4.01% | 5.03% | 3.92% | 4.73% | 5.07% |
Bold font indicates the best performance with respect to a particular measure.
Figure 4. Analysis results of the detected gene cluster. (a) Top scoring gene regulation network indicated by the gene cluster. Grey colour indicates that the gene is in the detected gene cluster. Solid lines indicate direct interactions. Dashed lines indicate indirect interactions. (b) The negative log p-values of the most significant functional categories associated with the gene cluster. These two figures are from the IPA system.
Figure 5. The TOP found by VISDA on the muscular dystrophy dataset. Rectangles contain individual phenotypes. Ellipses contain a group of phenotypes.
Figure 6. Comparison between the most frequent TOP and the pathological relationships among the cancer classes. (a) Published developmental/morphological relationships among the cancer classes. (b) The most frequent TOP constructed by VISDA. Rectangles contain one cancer type. Ellipses contain a group of cancers.