Michael Sekula, Somnath Datta, Susmita Datta.
Abstract
Numerous programs and packages validate a given clustering solution; however, clustering algorithms fare differently depending on the validation measure used to judge them. When more than one performance measure is used to evaluate multiple clustering partitions, an optimal result is often difficult to determine by visual inspection alone. This paper introduces optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or numbers of clusters) and obtain a "best" option for a given dataset. The package uses weighted rank aggregation to objectively combine the scores of the various performance measures, thereby removing the guesswork that often follows a visual inspection of cluster results. The optCluster package contains biological validation measures as well as clustering algorithms developed specifically for RNA sequencing data, making it a useful tool for clustering genomic data.
AVAILABILITY: This package is available for free through the Comprehensive R Archive Network (CRAN) at http://cran.rproject.org/web/packages/optCluster/.
Keywords: Clustering; Gene Expression; RNA-Seq; Validation
Year: 2017 PMID: 28584451 PMCID: PMC5450252 DOI: 10.6026/97320630013101
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
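The idea of rank aggregation mentioned in the abstract can be illustrated with a small Python sketch. It is a simplified stand-in, not the package's algorithm: it assumes each validation measure has already produced a ranked list of clustering partitions (best first), and it brute-forces the ordering that minimises an importance-weighted Spearman footrule distance to those lists. The partition labels, the example rankings, and the function names `footrule` and `aggregate_ranks` are hypothetical.

```python
from itertools import permutations


def footrule(candidate, ranked_list):
    """Spearman footrule distance: sum of absolute position differences."""
    position = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - position[item]) for i, item in enumerate(candidate))


def aggregate_ranks(ranked_lists, importance=None):
    """Return the ordering closest, in weighted footrule distance, to all lists.

    Brute force over every permutation, so only feasible for a handful of
    items. `importance` holds one weight per ranked list (assumed equal
    when omitted).
    """
    if importance is None:
        importance = [1.0] * len(ranked_lists)
    items = sorted(ranked_lists[0])
    best = min(
        permutations(items),
        key=lambda cand: sum(
            w * footrule(cand, lst) for w, lst in zip(importance, ranked_lists)
        ),
    )
    return list(best)


# Hypothetical rankings of four partitions (algorithm-number of clusters),
# one list per validation measure, best partition first.
ranked_lists = [
    ["kmeans-3", "pam-3", "hier-2", "hier-3"],
    ["kmeans-3", "hier-2", "pam-3", "hier-3"],
    ["pam-3", "kmeans-3", "hier-3", "hier-2"],
]
consensus = aggregate_ranks(ranked_lists)
print(consensus[0])  # top-ranked partition in the consensus ordering
```

Exhaustive search is used here only because it is easy to verify by hand; with many algorithms and cluster counts the number of permutations grows factorially, so a stochastic or heuristic search would be needed in practice.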
Clustering algorithm and validation measure options offered by the optCluster package. Clustering algorithms are selected individually and divided into two categories: continuous data and count data. Validation measures are selected in groups and divided into three classifications: internal, stability, and biological.
| Clustering algorithms | |
| Continuous data | Hierarchical |
| | Agnes |
| | Diana |
| | K-means |
| | Pam |
| | Clara |
| | Fanny |
| | Model-based |
| | SOM |
| | SOTA |
| Count data | EM negative binomial |
| | DA negative binomial |
| | SA negative binomial |
| | EM Poisson |
| | DA Poisson |
| | SA Poisson |
| Validation measures | |
| Internal | Connectivity |
| | Dunn index |
| | Silhouette width |
| Stability | Average proportion of non-overlap |
| | Average distance |
| | Average distance between means |
| | Figure of merit |
| Biological | Biological homogeneity index |
| | Biological stability index |
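To make one of the internal measures listed above concrete, the following Python sketch computes the Dunn index using its standard definition: the smallest distance between observations in different clusters divided by the largest distance between observations in the same cluster, with larger values indicating a better partition. It is an illustration only and not the package's implementation; Euclidean distance, the toy data, and the function name `dunn_index` are assumptions.

```python
from itertools import combinations
from math import dist, inf


def dunn_index(points, labels):
    """Dunn index: minimum between-cluster distance / maximum cluster diameter."""
    # Group the observations by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Largest distance between two observations in the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two observations in different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    # All clusters are single points (or duplicates): treat as perfectly compact.
    return inf if max_diameter == 0 else min_separation / max_diameter


# Toy data (assumed): two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = dunn_index(points, [0, 0, 1, 1])  # clusters match the two pairs
bad = dunn_index(points, [0, 1, 0, 1])   # clusters cut across the pairs
print(good > bad)
```

Scores like these, computed for every candidate partition under each measure, are what would be turned into the ranked lists that a rank aggregation step then combines.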