| Literature DB >> 26137474 |
Abstract
The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, the usual relevance and redundancy criteria have the disadvantage of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. In particular, redundancy is measured by a new regularized version of the coefficient of multiple correlation, and relevance is measured by a highly robust correlation coefficient based on least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we also perform the computations for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
Year: 2015 PMID: 26137474 PMCID: PMC4468284 DOI: 10.1155/2015/320385
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
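The MRMR idea summarized in the abstract, greedily adding the variable with the largest relevance to the class label while penalizing redundancy with already-selected variables, can be sketched as follows. This is only an illustrative sketch: it substitutes plain absolute Pearson correlation for both the paper's robust relevance measure and its regularized redundancy measure, and `select_mrmr` and its parameters are hypothetical names, not the authors' code.

```python
import numpy as np

def select_mrmr(X, y, k, gamma=1.0):
    """Greedy MRMR-style forward selection (difference criterion).

    Relevance: |Pearson correlation| of each variable with the label.
    Redundancy: mean |correlation| with the already-selected variables.
    gamma weights the redundancy penalty (gamma = 0 ignores redundancy).
    """
    n, p = X.shape
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant variable
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - gamma * redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy data: variables 0 and 1 are informative, 2 is a near-duplicate of 0,
# and 3 is pure noise; the redundancy penalty should avoid picking both 0 and 2.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 50)
x0 = y + 0.1 * rng.standard_normal(100)
x1 = y + 0.1 * rng.standard_normal(100)
x2 = x0 + 0.01 * rng.standard_normal(100)  # redundant copy of x0
x3 = rng.standard_normal(100)              # uninformative noise
X = np.column_stack([x0, x1, x2, x3])
chosen = select_mrmr(X, y, k=2, gamma=0.5)
print(chosen)
```

With gamma = 0, the criterion degenerates to pure relevance ranking, which would happily select the redundant pair; this mirrors the role of the tuning parameter γ in the tables below.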
Leave-one-out cross-validation performance of various classification methods for the data of Section 4.1. MRMR is used in version (1) or (20) to find 10 variables, while the optimal γ over all γ ≥ 0 is used. Sensitivity (SE) and specificity (SP) are given for selected fixed values of γ. Relevance/redundancy measures and criterion numbers that appeared as formula images in the source record are shown as "—".

| Measure of relev. | Measure of redund. | Classif. method | Classif. accuracy | | γ = 0 | 0.1 | 0.2 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mutual info. | Mutual info. | LDA | 0.92 | SE | 0.75 | 0.83 | 0.92 | 0.88 | 0.96 | 0.96 | 0.96 |
| | | | | SP | 0.67 | 0.92 | 0.88 | 0.92 | 0.96 | 0.92 | 0.92 |
| — | — | LDA | 1.00 | SE | 0.92 | 0.92 | 0.83 | 0.88 | 0.96 | 0.96 | 0.96 |
| | | | | SP | 0.88 | 0.96 | 0.96 | 0.96 | 0.96 | 1.00 | 1.00 |
| — | — | LDA | 0.96 | SE | 0.83 | 0.83 | 0.96 | 0.83 | 0.92 | 0.96 | 0.96 |
| | | | | SP | 0.88 | 0.88 | 0.83 | 0.96 | 1.00 | 0.96 | 1.00 |
| — | K-S | LDA | 0.82 | SE | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.88 | 0.88 |
| | | | | SP | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.96 | 0.96 |
| — | Sign test | LDA | 0.82 | SE | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.88 | 0.88 |
| | | | | SP | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.96 | 0.96 |
| — | — | LDA | 1.00 | SE | 0.92 | 0.92 | 0.88 | 0.88 | 0.92 | 0.96 | 1.00 |
| | | | | SP | 0.88 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 1.00 |
| — | — | LDA | 1.00 | SE | 0.92 | 0.92 | 0.96 | 0.96 | 0.96 | 0.96 | 1.00 |
| | | | | SP | 0.88 | 0.88 | 0.88 | 0.88 | 0.92 | 0.96 | 1.00 |
| — | — | LDA | 1.00 | SE | 0.92 | 0.92 | 0.96 | 0.96 | 0.96 | 0.96 | 1.00 |
| | | | | SP | 0.88 | 0.88 | 0.92 | 0.92 | 0.92 | 0.96 | 1.00 |
| — | — | LDA | 1.00 | SE | 0.92 | 0.92 | 0.96 | 0.96 | 0.96 | 0.96 | 1.00 |
| | | | | SP | 0.88 | 0.88 | 0.92 | 0.92 | 0.96 | 0.96 | 1.00 |
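The SE and SP values reported above follow the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)). A small helper, with hypothetical names not taken from the paper, computes both from true and predicted labels:

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# One false negative and one false positive out of 4 positives / 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
se, sp = sensitivity_specificity(y_true, y_pred)
print(se, sp)  # 0.75 0.75
```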
Leave-one-out cross-validation performance evaluated by classification accuracy for the data of Sections 4.1, 4.2, and 4.3. MRRMRR uses |r_LWS| as the relevance measure and the regularized coefficient of multiple correlation as the redundancy measure.
| Dimensionality reduction | Classification method | Sec. 4.1 data | Sec. 4.2 data | Sec. 4.3 data |
|---|---|---|---|---|
| — | SVM | 1.00 | 1.00 | 0.93 |
| — | Classification tree | 0.94 | 0.97 | 0.55 |
| — | LDA | Infeasible | Infeasible | Infeasible |
| — | PAM | 0.85 | 0.98 | 0.75 |
| — | SCRDA | 1.00 | 1.00 | 0.79 |
| | Number of principal components | 10 | 20 | 4 |
| PCA | SVM | 0.75 | 1.00 | 0.90 |
| PCA | Classification tree | 0.72 | 0.97 | 0.59 |
| PCA | LDA | 0.57 | 0.90 | 0.79 |
| PCA | PAM | 0.64 | 0.81 | 0.77 |
| PCA | SCRDA | 0.71 | 0.92 | 0.79 |
| | Number of variables for MRRMRR | 10 | 20 | 4 |
| MRRMRR | SVM | 1.00 | 1.00 | 0.93 |
| MRRMRR | Classification tree | 0.76 | 0.97 | 0.55 |
| MRRMRR | LDA | 0.95 | 1.00 | 0.79 |
| MRRMRR | PAM | 0.82 | 0.97 | 0.75 |
| MRRMRR | SCRDA | 1.00 | 1.00 | 0.79 |
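The accuracies in this table come from leave-one-out cross-validation, which fits the classifier n times, each time holding out a single observation and predicting it. A minimal sketch of the protocol, with a simple nearest-centroid classifier standing in for the methods in the table (all function names here are illustrative, not from the paper):

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose training-set mean vector is closest."""
    classes = np.unique(y_train)
    dists = [np.linalg.norm(x - X_train[y_train == c].mean(axis=0))
             for c in classes]
    return classes[int(np.argmin(dists))]

def loocv_accuracy(X, y):
    """Leave-one-out CV: hold out each observation once and predict it."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i          # train on all observations but i
        pred = nearest_centroid_predict(X[mask], y[mask], X[i])
        correct += (pred == y[i])
    return correct / n

# Two well-separated Gaussian groups in 5 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 5)),
               rng.normal(2.0, 0.5, size=(30, 5))])
y = np.repeat([0, 1], 30)
print(loocv_accuracy(X, y))
```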
Leave-one-out cross-validation performance evaluated by classification accuracy for the data of Section 4.1 contaminated by noise of three different types. MRMR is used in version (1) or (20) in the same way as in Table 1 to find 10 variables, while the optimal γ over all γ ≥ 0 is used. Measures that appeared as formula images in the source record are shown as "—".

| Dimensionality reduction | | Classif. method | Noise 1 (normal) | Noise 2 (contam. normal) | Noise 3 (Cauchy) |
|---|---|---|---|---|---|
| *MRMR variable selection (relev. / redund.)* | | | | | |
| Mutual info. | Mutual info. | LDA | 0.79 | 0.88 | 0.92 |
| — | — | LDA | 0.92 | 0.85 | 0.96 |
| — | — | LDA | 0.92 | 0.92 | 0.96 |
| — | K-S | LDA | 0.92 | 0.83 | 0.89 |
| — | Sign test | LDA | 0.84 | 0.91 | 0.87 |
| — | — | LDA | 0.90 | 0.86 | 0.94 |
| — | — | LDA | 1.00 | 1.00 | 0.98 |
| — | — | LDA | 1.00 | 1.00 | 0.98 |
| — | — | LDA | 1.00 | 1.00 | 1.00 |
| *Unsupervised dimensionality reduction* | | | | | |
| PCA (with 10 princ. components) | | LDA | 0.79 | 0.74 | 0.78 |
| *No dimensionality reduction* | | | | | |
| — | | LDA | Infeasible | Infeasible | Infeasible |
| — | | PAM | 0.79 | 0.73 | 0.79 |
| — | | SCRDA | 1.00 | 1.00 | 1.00 |
| — | | lasso-LR | 1.00 | 1.00 | 1.00 |
| — | | SVM | 1.00 | 1.00 | 1.00 |
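The three contamination schemes named in the caption (normal, contaminated normal, and Cauchy noise) can be generated as below. The specific parameters (standard deviations, the 10% contamination fraction) are illustrative choices only, not the ones used in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Noise 1: plain Gaussian noise.
noise_normal = rng.normal(0.0, 1.0, size=n)

# Noise 2: contaminated normal -- mostly N(0, 1), but a small fraction
# (here 10%) of observations is replaced by draws from a wide N(0, 10).
outlier = rng.random(n) < 0.10
noise_contam = np.where(outlier,
                        rng.normal(0.0, 10.0, size=n),
                        rng.normal(0.0, 1.0, size=n))

# Noise 3: heavy-tailed standard Cauchy noise (no finite mean or variance).
noise_cauchy = rng.standard_cauchy(n)

print(noise_normal.std(), noise_contam.std(), np.median(np.abs(noise_cauchy)))
```

The Cauchy case is the most severe: because its variance is infinite, sample standard deviations are meaningless, which is exactly the setting where robust relevance measures are expected to pay off.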