Alan R Dabney, John D Storey.
Abstract
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
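The two ingredients named in the abstract, a nearest-centroid rule and a greedy search over features, can be illustrated with a minimal sketch. This is not the authors' algorithm: the selection criterion used here (training-set misclassification rate) is only a stand-in for the theoretically optimal criterion the abstract refers to, the pooled diagonal variance is an assumption, the toy data are made up, and the names `fit_centroids`, `predict`, `training_error`, and `greedy_select` are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Per-class mean vectors and pooled per-feature variances (diagonal covariance assumed)."""
    classes = sorted(set(y))
    p = len(X[0])
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    dof = len(X) - len(classes)
    var = [sum((x[j] - centroids[label][j]) ** 2 for x, label in zip(X, y)) / dof
           for j in range(p)]
    return centroids, var

def predict(x, centroids, var, features):
    """Label of the centroid nearest to x, using only the listed features."""
    def dist2(c):
        # Variance-scaled squared distance restricted to the chosen features.
        return sum((x[j] - centroids[c][j]) ** 2 / var[j] for j in features)
    return min(centroids, key=dist2)

def training_error(X, y, centroids, var, features):
    """Fraction of training samples assigned to the wrong class."""
    wrong = sum(predict(x, centroids, var, features) != label for x, label in zip(X, y))
    return wrong / len(X)

def greedy_select(X, y, k):
    """Forward selection: at each step add the feature giving the lowest training error.

    The criterion is a stand-in chosen for illustration, not the paper's criterion.
    """
    centroids, var = fit_centroids(X, y)
    selected, remaining = [], list(range(len(X[0])))
    for _ in range(k):
        best = min(remaining,
                   key=lambda j: training_error(X, y, centroids, var, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data, made up for illustration: 3 classes, 6 features,
# and only features 0 and 1 carry any class signal.
rng = random.Random(0)
y = [c for c in (0, 1, 2) for _ in range(50)]
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in y]
for x, label in zip(X, y):
    if label == 0:
        x[0] += 6.0
    elif label == 1:
        x[1] += 6.0

selected = greedy_select(X, y, k=2)
print(sorted(selected))
```

The point of the greedy loop is that each candidate feature is judged by how the whole subset classifies, not by a score computed for that feature in isolation. On this toy data a single informative feature separates only one class from the other two, so the second informative feature is needed before the error drops.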
Year: 2007 PMID: 17912341 PMCID: PMC1991588 DOI: 10.1371/journal.pone.0001002
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Class means with 10 features and 3 classes.
| Feature | μ1 | μ2 | μ3 | Score |
| 1 | 3.00 | 0.00 | 0.00 | 2.00 |
| 2 | 2.00 | 0.00 | 0.00 | 0.89 |
| 3 | 1.50 | 0.00 | 0.00 | 0.50 |
| 4 | 1.25 | 0.00 | 0.00 | 0.35 |
| 5 | 0.00 | 1.10 | 0.00 | 0.27 |
| 6 | 0.00 | 1.00 | 0.00 | 0.22 |
| 7 | 0.00 | 0.90 | 0.00 | 0.18 |
| 8 | 0.00 | 0.00 | 0.85 | 0.16 |
| 9 | 0.00 | 0.00 | 0.75 | 0.12 |
| 10 | 0.00 | 0.00 | 0.65 | 0.09 |
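The Score column above is numerically consistent with the population variance of the three class means for each feature. That is an observation from the tabulated numbers, not a definition stated in this record, so the sketch below should be read as a check under that assumption. The variable names are illustrative.

```python
from statistics import pvariance

# Class means copied from the table: one tuple (mu1, mu2, mu3) per feature 1-10.
means = [
    (3.00, 0.00, 0.00),
    (2.00, 0.00, 0.00),
    (1.50, 0.00, 0.00),
    (1.25, 0.00, 0.00),
    (0.00, 1.10, 0.00),
    (0.00, 1.00, 0.00),
    (0.00, 0.90, 0.00),
    (0.00, 0.00, 0.85),
    (0.00, 0.00, 0.75),
    (0.00, 0.00, 0.65),
]
# Score column copied from the table, rounded to two decimals there.
table_scores = [2.00, 0.89, 0.50, 0.35, 0.27, 0.22, 0.18, 0.16, 0.12, 0.09]

# Assumed score: population variance (divide by 3, not 2) of the class means.
scores = [pvariance(row) for row in means]

for j, (s, t) in enumerate(zip(scores, table_scores), start=1):
    print(f"feature {j:2d}: computed {s:.3f}, table {t:.2f}")

# Features ordered by this univariate score, highest first (1-based labels).
ranking = sorted(range(1, len(scores) + 1), key=lambda j: -scores[j - 1])
print("top five by score:", ranking[:5])
```

Ranking by this score alone puts features 1 to 5 on top, whereas the covariance table reports features 1, 5, 6, 7, 8 as the selected subset when there is no covariance. That contrast is the abstract's point about univariate scores ignoring how a subset performs as a whole.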
Table 2. The effect of covariance on the optimal feature-selection procedure.
| | | Error | | |
| Covariance | Selected Features | | | Rank |
| None | 1, 5, 6, 7, 8 | 13.1% | 13.1% | 1 |
| Block 1 | 1, 5, 6, 7, 8 | 13.1% | 13.1% | 1 |
| Block 2 | 1, 5, 8, 9, 10 | 14.9% | 18.2% | 64 |
| Block 3 | 1, 5, 6, 7, 8 | 13.1% | 13.1% | 1 |
| Block 1∼2 | 1, 4, 5, 8, 9 | 6.1% | 14.0% | 143 |
| Block 1∼3 | 1, 5, 6, 7, 8 | 11.6% | 11.6% | 1 |
| Block 2∼3 | 1, 2, 3, 7, 8 | 2.2% | 3.5% | 18 |
Table 3. Test error rates (standard errors): Classes equidistant.
| | | | Absolute Value of Correlation | | | |
| Algorithm | Centroids | Covariance | 0.00 | 0.40 | 0.65 | 0.90 |
| PAM | shrunken | diagonal | 0.15 (0.07) | 0.15 (0.06) | 0.18 (0.07) | 0.29 (0.08) |
| Clanc | unshrunken | unrestricted | 0.30 (0.08) | 0.28 (0.08) | 0.28 (0.07) | 0.11 (0.07) |
| Clanc | shrunken | unrestricted | 0.27 (0.09) | 0.27 (0.08) | 0.26 (0.10) | 0.09 (0.07) |
| Clanc | unshrunken | diagonal | 0.07 (0.04) | 0.08 (0.04) | 0.10 (0.04) | 0.19 (0.07) |
| Clanc | shrunken | diagonal | 0.06 (0.04) | 0.08 (0.04) | 0.10 (0.05) | 0.19 (0.07) |
| Clanc | unshrunken | shrunken | 0.30 (0.08) | 0.28 (0.08) | 0.28 (0.07) | 0.06 (0.04) |
| Clanc | shrunken | shrunken | 0.27 (0.09) | 0.27 (0.08) | 0.26 (0.10) | 0.06 (0.04) |
Shrinkage in this case takes place across classes rather than across features.
Table 4. Test error rates (standard errors): Classes not equidistant.
| | | | Absolute Value of Correlation | | | |
| Algorithm | Centroids | Covariance | 0.00 | 0.40 | 0.65 | 0.90 |
| PAM | shrunken | diagonal | 0.33 (0.02) | 0.33 (0.02) | 0.34 (0.03) | 0.37 (0.03) |
| Clanc | unshrunken | unrestricted | 0.25 (0.07) | 0.26 (0.08) | 0.25 (0.08) | 0.12 (0.08) |
| Clanc | shrunken | unrestricted | 0.25 (0.08) | 0.28 (0.09) | 0.22 (0.07) | 0.13 (0.09) |
| Clanc | unshrunken | diagonal | 0.04 (0.03) | 0.04 (0.03) | 0.06 (0.04) | 0.14 (0.06) |
| Clanc | shrunken | diagonal | 0.03 (0.03) | 0.03 (0.03) | 0.06 (0.03) | 0.13 (0.06) |
| Clanc | unshrunken | shrunken | 0.25 (0.07) | 0.26 (0.08) | 0.25 (0.08) | 0.12 (0.08) |
| Clanc | shrunken | shrunken | 0.25 (0.08) | 0.28 (0.09) | 0.22 (0.07) | 0.13 (0.09) |
Shrinkage in this case takes place across classes rather than across features.
Figure 1. Results for SRBCT data.
Classifiers are identical to those in Tables 3 and 4, with Clanc v1-v4 corresponding to the last four variants reported there, respectively.
Figure 2. Results for Lymphoma data.
Classifiers are identical to those in Tables 3 and 4, with Clanc v1-v4 corresponding to the last four variants reported there, respectively.
Figure 3. Results for NCI data.
Classifiers are identical to those in Tables 3 and 4, with Clanc v1-v4 corresponding to the last four variants reported there, respectively.