Alok Sharma, Yosvany López, Tatsuhiko Tsunoda.
Abstract
BACKGROUND: Biological data comprises various topologies or a mixture of forms, which makes its analysis extremely complicated. With this data increasing on a daily basis, the design and development of efficient and accurate statistical methods have become absolutely necessary. Specific analyses, such as those related to genome-wide association studies and multi-omics information, are often aimed at clustering sub-conditions of cancers and other diseases. Hierarchical clustering methods, which can be categorized into agglomerative and divisive, have been widely used in such situations. However, unlike agglomerative methods, divisive clustering approaches have consistently proved to be computationally expensive.
Keywords: Divisive approach; Hierarchical clustering; Maximum likelihood
Year: 2017 PMID: 29297297 PMCID: PMC5751574 DOI: 10.1186/s12859-017-1965-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 An illustration of the DRAGON method. This procedure results in the formation of one cluster. At each iteration, the single sample whose removal maximally increases the likelihood function is removed. The likelihood starts at $L_1$ and, after two iterations, becomes $L_3$, where $L_3 > L_2 > L_1$
DRAGON Method
| 1. Given a sample set |
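The greedy step described in the Fig. 1 caption (remove, one at a time, the sample whose removal most increases the likelihood) can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not give the likelihood function, so the code assumes a diagonal-covariance Gaussian fitted by maximum likelihood to the remaining samples. The names `gaussian_loglik` and `greedy_removal`, the variance floor, and the toy data are all hypothetical.

```python
import math


def gaussian_loglik(samples, var_floor=1e-9):
    """Total log-likelihood of `samples` under a diagonal Gaussian that is
    itself fitted to them by maximum likelihood (an assumed model)."""
    n = len(samples)
    dim = len(samples[0])
    total = 0.0
    for j in range(dim):
        column = [s[j] for s in samples]
        mean = sum(column) / n
        # ML variance estimate, floored to avoid log(0) on degenerate data.
        var = max(sum((x - mean) ** 2 for x in column) / n, var_floor)
        total += -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return total


def greedy_removal(samples, n_remove):
    """Remove `n_remove` samples, one per iteration, each time picking the
    sample whose removal gives the highest log-likelihood of the remainder."""
    remaining = list(samples)
    removed = []
    history = [gaussian_loglik(remaining)]  # likelihood before any removal
    for _ in range(n_remove):
        best_idx = max(
            range(len(remaining)),
            key=lambda i: gaussian_loglik(remaining[:i] + remaining[i + 1:]),
        )
        removed.append(remaining.pop(best_idx))
        history.append(gaussian_loglik(remaining))
    return remaining, removed, history


# Toy data (made up): four nearby points and two distant ones.
points = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (-6.0, 4.0)]
cluster, removed, history = greedy_removal(points, n_remove=2)
print("removed:", removed)
print("log-likelihood per iteration:", history)
```

On this toy data the two distant points are the ones removed and the log-likelihood rises at each iteration, mirroring the $L_3 > L_2 > L_1$ ordering in the caption. Under the assumed model this increase is not guaranteed for arbitrary data, and the sketch does not include a stopping rule.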
Fig. 2 Average clustering accuracy (over 20 attempts) on synthetic data with four clusters for (a) DRAGON and various agglomerative hierarchical methods, and (b) divisive hierarchical methods
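This excerpt does not say how clustering accuracy is computed. One common convention, assumed here, is to score a clustering by the best one-to-one matching between cluster IDs and true class labels. The sketch below does this by brute force over permutations, which is only practical for a handful of clusters. The function name `clustering_accuracy` and the example labels are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Percentage of samples labelled correctly under the best one-to-one
    assignment of cluster IDs to class labels (assumed definition)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # If there are more clusters than classes, pad with None, which never
    # matches a true label, so surplus clusters count as errors.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best_hits = 0
    for perm in permutations(padded, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return 100.0 * best_hits / len(true_labels)


# Made-up labels: cluster IDs are arbitrary, so 1 is matched to "A", 0 to "B".
truth = ["A", "A", "A", "B", "B", "B"]
found = [1, 1, 0, 0, 0, 0]
print(round(clustering_accuracy(truth, found), 1))
```

For larger numbers of clusters, an assignment solver such as the Hungarian algorithm would replace the permutation loop.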
Clustering accuracy (%) on acute leukemia dataset
| Methods | Dim 2 | Dim 3 | Dim 4 | Dim 5 |
|---|---|---|---|---|
| SLINK | 66.7 | 66.7 | 66.7 | 66.7 |
| CLINK | 84.7 | 81.9 | 81.9 | 81.9 |
| ALINK | 76.4 | 81.9 | 84.7 | 84.7 |
| Wa-LINK |  | 81.9 | 81.9 | 81.9 |
| Wt-LINK |  | 81.9 | 81.9 | 81.9 |
| MLINK |  | 81.9 | 81.9 | 81.9 |
| SLINK (Divisive) | 66.7 | 66.7 | 66.7 | 66.7 |
| CLINK (Divisive) | 80.6 | 80.6 | 80.6 | 80.6 |
| ALINK (Divisive) | 66.7 | 66.7 | 66.7 | 66.7 |
| Dunn’s original (Divisive) | 76.4 | 80.6 | 80.6 | 80.6 |
| Dunn’s variant (Divisive) | 72.2 | 70.8 | 70.8 | 72.2 |
| Macnaughton-Smith (Divisive) | 86.1 | 81.9 | 81.9 | 81.9 |
| Principal Direction (Divisive) | 89.4 | 88.9 | 88.9 | 88.9 |
| K-means | 90.3 | 89.5 | 81.9 | 81.9 |
| DRAGON | 93.1 |  |  |  |
Clustering accuracy (%) on MLL dataset
| Methods | Dim 2 | Dim 3 | Dim 4 | Dim 5 |
|---|---|---|---|---|
| SLINK | 40.3 | 40.3 | 43.1 | 43.1 |
| CLINK | 45.8 | 50.0 | 54.2 | 72.2 |
| ALINK | 50.0 | 50.0 | 50.0 | 72.2 |
| Wa-LINK | 62.5 |  | 62.5 |  |
| Wt-LINK | 45.8 | 50.0 | 43.1 | 69.4 |
| MLINK | 45.8 | 50.0 | 43.1 | 69.4 |
| SLINK (Divisive) | 41.7 | 41.7 | 43.1 | 43.1 |
| CLINK (Divisive) | 54.2 | 45.8 | 56.9 | 72.2 |
| ALINK (Divisive) | 41.7 | 41.7 | 43.1 | 72.2 |
| Dunn’s original (Divisive) | 44.4 | 44.4 | 45.8 | 72.2 |
| Dunn’s variant (Divisive) | 41.7 | 41.7 | 43.1 | 73.6 |
| Macnaughton-Smith (Divisive) | 54.2 | 48.6 | 50.0 | 72.2 |
| Principal Direction (Divisive) | 62.5 |  | 62.5 | 81.9 |
| K-means | 56.0 | 57.0 | 58.1 | 61.6 |
| DRAGON |  |  |  |  |
Clustering accuracy (%) on mutation dataset
| Methods | Dim 2 | Dim 3 | Dim 4 | Dim 5 |
|---|---|---|---|---|
| SLINK |  | 77.9 | 77.9 | 82.4 |
| CLINK | 77.3 | 82.0 |  |  |
| ALINK | 77.3 | 77.3 |  | 82.4 |
| Wa-LINK | 77.3 | 77.7 | 77.9 | 77.9 |
| Wt-LINK | 77.3 | 82.0 |  | 54.6 |
| MLINK | 54.6 | 54.6 | 54.6 | 82.4 |
| SLINK (Divisive) | 54.5 | 54.5 | 54.5 | 82.2 |
| CLINK (Divisive) | 54.4 | 54.5 | 77.3 | 77.3 |
| ALINK (Divisive) |  | 82.6 | 82.6 | 82.4 |
| Dunn’s original (Divisive) | 77.3 |  | 77.3 | 82.4 |
| Dunn’s variant (Divisive) | 54.5 | 54.5 | 54.5 | 77.3 |
| Macnaughton-Smith (Divisive) |  |  |  | 82.4 |
| Principal Direction (Divisive) | 54.5 | 54.5 | 54.5 | 54.5 |
| K-means | 64.9 | 67.1 | 65.7 | 63.3 |
| DRAGON |  | 82.2 | 82.2 | 82.2 |