Nan Lin, Junhai Jiang, Shicheng Guo, Momiao Xiong.
Abstract
Advances in sensor technology have produced a rapidly growing volume of medical image data that can visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. Image signals contain a large number of redundant features that provide no additional information for cluster analysis. The most widely used methods for removing irrelevant features are sparse clustering algorithms, which use a lasso-type penalty to select features. However, the accuracy of clustering with a lasso-type penalty depends on the choice of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image cluster analysis. The proposed method is applied to both liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms current sparse clustering algorithms in image cluster analysis.
Year: 2015 PMID: 26196383 PMCID: PMC4510534 DOI: 10.1371/journal.pone.0132945
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. (a) One of the 121 original histology images of kidney cancer cells downloaded from the TCGA database, (b) reconstruction of the original kidney histology image using its 133 FPCA scores, (c) reconstruction of the original kidney histology image using its first 133 Fourier expansion coefficients, (d) reconstruction of the original kidney histology image using its first 4,357 Fourier expansion coefficients.
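Panels (c) and (d) of Fig 1 reconstruct an image from a truncated Fourier expansion. The record does not say how the coefficients are ordered, so the following is only a minimal sketch under the assumption that "first" means lowest spatial frequency. The function names (`dft2`, `idft2`, `reconstruct_from_first_k`, `rmse`) and the toy 6 x 6 image are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft2(img):
    """Naive 2D discrete Fourier transform of a real image (list of rows)."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x / rows + v * y / cols))
                 for x in range(rows) for y in range(cols))
             for v in range(cols)]
            for u in range(rows)]


def idft2(coef):
    """Inverse 2D DFT; only the real part is returned because the input image is real."""
    rows, cols = len(coef), len(coef[0])
    return [[sum(coef[u][v] * cmath.exp(2j * math.pi * (u * x / rows + v * y / cols))
                 for u in range(rows) for v in range(cols)).real / (rows * cols)
             for y in range(cols)]
            for x in range(rows)]


def reconstruct_from_first_k(img, k):
    """Zero all but the k lowest-frequency coefficients, then invert."""
    rows, cols = len(img), len(img[0])
    coef = dft2(img)
    # Assumed ordering: squared radial frequency, with each index wrapped
    # so that u and rows - u count as the same frequency magnitude.
    order = sorted(((u, v) for u in range(rows) for v in range(cols)),
                   key=lambda uv: min(uv[0], rows - uv[0]) ** 2
                   + min(uv[1], cols - uv[1]) ** 2)
    keep = set(order[:k])
    truncated = [[coef[u][v] if (u, v) in keep else 0j for v in range(cols)]
                 for u in range(rows)]
    return idft2(truncated)


def rmse(a, b):
    """Root-mean-square pixel difference between two equally sized images."""
    n = len(a) * len(a[0])
    return math.sqrt(sum((p - q) ** 2
                         for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n)


# Toy image standing in for a histology image (illustrative values only).
image = [[(3 * x + 5 * y * y) % 7 for y in range(6)] for x in range(6)]
for k in (1, 9, 36):
    print(k, round(rmse(image, reconstruct_from_first_k(image, k)), 4))
```

Keeping more coefficients can only lower the reconstruction error in this scheme, and keeping all of them recovers the image exactly. This mirrors the gap between panels (c) and (d), where far more Fourier coefficients are used than FPCA scores in panel (b).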
Performance of standard and randomized sparse k-means clustering algorithms with FPCA, MPCA, GPCA, SIFT and Fourier expansion features.
| Method | Feature extraction | Ovarian Cancer Accuracy | Ovarian Cancer Sensitivity | Ovarian Cancer Specificity | KIRC Accuracy | KIRC Sensitivity | KIRC Specificity |
|---|---|---|---|---|---|---|---|
| Standard k-means | FPCA | 0.570 | 0.660 | 0.400 | 0.809 | 0.917 | 0.612 |
| Standard k-means | MPCA | 0.529 | 0.538 | 0.522 | 0.803 | 0.901 | 0.627 |
| Standard k-means | GPCA | 0.522 | 0.519 | 0.529 | 0.787 | 0.901 | 0.582 |
| Standard k-means | SIFT | 0.557 | 0.547 | 0.547 | 0.681 | 0.587 | 0.701 |
| Standard k-means | Fourier | 0.557 | 0.557 | 0.557 | 0.803 | 0.917 | 0.597 |
| Randomized sparse k-means | FPCA | 0.653 | 0.793 | 0.486 | 0.835 | 0.926 | 0.672 |
| Randomized sparse k-means | MPCA | 0.539 | 0.538 | 0.543 | 0.819 | 0.918 | 0.642 |
| Randomized sparse k-means | GPCA | 0.527 | 0.538 | 0.507 | 0.803 | 0.918 | 0.597 |
| Randomized sparse k-means | SIFT | 0.608 | 0.708 | 0.457 | 0.729 | 0.818 | 0.567 |
| Randomized sparse k-means | Fourier | 0.608 | 0.679 | 0.500 | 0.814 | 0.884 | 0.687 |
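The tables report accuracy, sensitivity and specificity for two-cluster results, but this record does not state how cluster labels were matched to the true classes. The sketch below shows one common convention: try both label assignments and keep the one with higher accuracy. The function name `binary_clustering_metrics` and the toy labels are hypothetical, and the matching rule is an assumption rather than the authors' documented procedure.

```python
def binary_clustering_metrics(truth, clusters):
    """Return (accuracy, sensitivity, specificity) for a 2-cluster result.

    truth    : list of 0/1 class labels (1 = positive class), both classes present
    clusters : list of 0/1 cluster ids, whose numbering is arbitrary
    """
    best = None
    for flip in (False, True):
        # Cluster ids carry no meaning, so evaluate both possible mappings.
        pred = [1 - c if flip else c for c in clusters]
        tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
        tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
        fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
        accuracy = (tp + tn) / len(truth)
        sensitivity = tp / (tp + fn)   # true positive rate
        specificity = tn / (tn + fp)   # true negative rate
        if best is None or accuracy > best[0]:
            best = (accuracy, sensitivity, specificity)
    return best


# Toy example (not data from the paper): 4 positives, 4 negatives.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
clusters = [0, 0, 0, 1, 1, 1, 0, 1]
print(binary_clustering_metrics(truth, clusters))  # (0.75, 0.75, 0.75)
```

Under this convention accuracy can never fall below 0.5 for two clusters, which is consistent with every accuracy value in the tables being above 0.5.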
Performance of standard K-means, sparse K-means and randomized K-means clustering algorithms using the SIFT descriptor.
| Method | Ovarian Cancer Features | Ovarian Cancer Accuracy | Ovarian Cancer Sensitivity | Ovarian Cancer Specificity | KIRC Features | KIRC Accuracy | KIRC Sensitivity | KIRC Specificity |
|---|---|---|---|---|---|---|---|---|
| K-means | 2,560 | 0.547 | 0.547 | 0.547 | 2,560 | 0.681 | 0.587 | 0.701 |
| Sparse K-means | 574 | 0.545 | 0.472 | 0.657 | 597 | 0.585 | 0.62 | 0.522 |
| Randomized K-means | 70 | 0.608 | 0.708 | 0.457 | 100 | 0.729 | 0.818 | 0.567 |
Performance of standard k-means, sparse k-means and randomized sparse k-means clustering algorithms using FPC scores.
| Method | Ovarian Cancer Features | Ovarian Cancer Accuracy | Ovarian Cancer Sensitivity | Ovarian Cancer Specificity | KIRC Features | KIRC Accuracy | KIRC Sensitivity | KIRC Specificity |
|---|---|---|---|---|---|---|---|---|
| K-means | 176 | 0.574 | 0.660 | 0.400 | 188 | 0.809 | 0.917 | 0.612 |
| Sparse K-means | 81 | 0.585 | 0.670 | 0.457 | 92 | 0.819 | 0.819 | 0.642 |
| Randomized sparse K-means | 23 | 0.653 | 0.793 | 0.486 | 5 | 0.835 | 0.926 | 0.672 |
Performance of standard spectral clustering, sparse K-means clustering and sparse spectral clustering with randomized feature selection, using Fourier expansion coefficients.
| Method | Ovarian Cancer Features | Ovarian Cancer Accuracy | Ovarian Cancer Sensitivity | Ovarian Cancer Specificity | KIRC Features | KIRC Accuracy | KIRC Sensitivity | KIRC Specificity |
|---|---|---|---|---|---|---|---|---|
| Spectral clustering | 65,025 | 0.557 | 0.557 | 0.557 | 65,025 | 0.803 | 0.917 | 0.597 |
| Sparse K-means | 959 | 0.545 | 0.500 | 0.614 | 161 | 0.819 | 0.917 | 0.642 |
| Randomized Spectral clustering | 100 | 0.642 | 0.576 | 0.743 | 10 | 0.835 | 0.926 | 0.672 |
Performance of standard k-means, sparse k-means and randomized k-means algorithms for clustering KIRC tumor cell grades.
| Method | Assigned | True Group 1 | True Group 2 | True Group 3 |
|---|---|---|---|---|
| K-means | Group 1 | 17 (58.6%) | 15 (53.6%) | 7 (50.0%) |
| K-means | Group 2 | 12 (41.4%) | 12 (42.9%) | 7 (50.0%) |
| K-means | Group 3 | 0 | 1 (3.4%) | 0 |
| K-means | Accuracy | 40.80% | | |
| Sparse K-means | Group 1 | 10 (34.5%) | 6 (21.4%) | 3 (21.4%) |
| Sparse K-means | Group 2 | 13 (44.8%) | 17 (60.7%) | 7 (50.0%) |
| Sparse K-means | Group 3 | 6 (20.7%) | 5 (17.9%) | 4 (28.6%) |
| Sparse K-means | Accuracy | 43.70% | | |
| Randomized sparse K-means | Group 1 | 14 (48.3%) | 4 (14.3%) | 2 (14.3%) |
| Randomized sparse K-means | Group 2 | 8 (27.6%) | 20 (71.4%) | 8 (57.1%) |
| Randomized sparse K-means | Group 3 | 7 (24.1%) | 4 (14.3%) | 4 (28.6%) |
| Randomized sparse K-means | Accuracy | 53.50% | | |
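The accuracy rows in the tumor-grade table can be checked from the counts themselves. If the rows are read as assigned groups and the columns as true groups, accuracy is the sum of the diagonal divided by the total number of images (71 for each method). The sketch below uses the counts from the table. The variable and function names are illustrative, and it assumes that assigned group i corresponds to true group i, as the table layout suggests.

```python
# Confusion matrices copied from the table: rows = assigned group, columns = true group.
confusions = {
    "K-means": [[17, 15, 7], [12, 12, 7], [0, 1, 0]],
    "Sparse K-means": [[10, 6, 3], [13, 17, 7], [6, 5, 4]],
    "Randomized sparse K-means": [[14, 4, 2], [8, 20, 8], [7, 4, 4]],
}


def accuracy(matrix):
    """Fraction of images whose assigned group equals the true group."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for method, matrix in confusions.items():
    print(f"{method}: {accuracy(matrix):.1%}")
# K-means: 40.8%
# Sparse K-means: 43.7%
# Randomized sparse K-means: 53.5%
```

These match the reported values: 29/71, 31/71 and 38/71 correct assignments respectively.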
Fig 2. Historic pathology images.
(a) Pathology grades 1 and 2, (b) pathology grade 3 and (c) pathology grade.
Percentage of simulations sharing the same FPC features in the KIRC study.
| Number of Features | 2 | 1 | 2 | 1 | 1 | 1 | 1 |
|---|---|---|---|---|---|---|---|
| Percentage of simulation sharing same features | 100% | 96% | 94% | 7% | 5% | 3% | 1% |
Stability of the estimated accuracy using randomized sparse k-means clustering and FPC scores in the KIRC study.
| Percentage | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| 93% | 0.835 | 0.926 | 0.672 |
| 6% | 0.824 | 0.909 | 0.672 |
| 1% | 0.819 | 0.917 | 0.642 |