Ruzhang Zhao, Pengyu Hong, Jun S. Liu.
Abstract
Traditional hypothesis-margin research focuses on obtaining large margins and on feature selection. In this work, we show that the robustness of margins is also critical and can be measured using entropy. In addition, our approach provides clear mathematical formulations and explanations to uncover feature interactions, which are often lacking in large hypothesis-margin based approaches. We design an algorithm, termed IMMIGRATE (Iterative max-min entropy margin-maximization with interaction terms), for training the weights associated with the interaction terms. IMMIGRATE simultaneously utilizes both local and global information and can be used as a base learner in Boosting. We evaluate IMMIGRATE on a wide range of tasks, in which it demonstrates exceptional robustness and achieves state-of-the-art results with high interpretability.
Keywords: IMMIGRATE; entropy; feature selection; hypothesis-margin
Year: 2020 PMID: 33286064 PMCID: PMC7516747 DOI: 10.3390/e22030291
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
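The hypothesis-margin the abstract builds on is conventionally defined, for a sample, as its distance to the nearest sample of a different class (nearmiss) minus its distance to the nearest sample of the same class (nearhit). A minimal sketch of that quantity (an illustration of the general concept, not the authors' implementation):

```python
import numpy as np

def hypothesis_margin(X, y, i):
    """Hypothesis-margin of sample i: ||x_i - nearmiss|| - ||x_i - nearhit||.

    A positive margin means sample i sits closer to its own class
    than to any other class.
    """
    d = np.linalg.norm(X - X[i], axis=1)  # distances from sample i to all samples
    d[i] = np.inf                         # exclude the sample itself
    nearhit = d[y == y[i]].min()          # closest same-class sample
    nearmiss = d[y != y[i]].min()         # closest other-class sample
    return nearmiss - nearhit
```

Margin-based feature selection methods (e.g., Relief-family algorithms and the hypothesis-margin approaches discussed here) weight features so as to enlarge this quantity over the training set.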
Figure 1. Flow chart of IMMIGRATE. Step 0: Initialize the weights randomly, subject to the stated constraints. Step 1: Fix one block of parameters and update the others. Step 2: Fix the updated blocks and update the remaining one. Steps 1 and 2 are iterated to optimize the cost function in (5), stopping once the change of the cost function falls below a pre-set limit.
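The fix-one-block, update-the-other alternation in Steps 1 and 2 is a block-coordinate optimization scheme. A runnable toy sketch of that pattern (using a simple quadratic cost for illustration, not the paper's cost function (5)):

```python
def alternate_minimize(tol=1e-8, max_iter=100):
    """Toy block-coordinate descent mirroring the flow chart:
    initialize (Step 0), then alternately fix one variable and update
    the other in closed form (Steps 1-2), stopping when the change in
    the cost falls below a pre-set limit."""
    cost = lambda a, b: (a - b) ** 2 + (a - 1.0) ** 2
    a, b = 5.0, -3.0                 # Step 0: (arbitrary) initialization
    prev = cost(a, b)
    for _ in range(max_iter):
        a = (b + 1.0) / 2.0          # Step 1: fix b, minimize the cost over a
        b = a                        # Step 2: fix a, minimize the cost over b
        cur = cost(a, b)
        if prev - cur < tol:         # change in cost below the pre-set limit
            break
        prev = cur
    return a, b
```

Each step solves an easier subproblem exactly, so the cost is non-increasing across iterations; IMMIGRATE applies the same principle to its weight and margin-distribution updates.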
Figure 2The synthesized dataset with 10% noise.
Figure 3IMMIGRATE (IGT) is more robust than LFE.
Summary of the UCI datasets and the gene expression datasets.
| Data | # of Features | # of Instances | Full Name |
|---|---|---|---|
| BCW | 9 | 116 | Breast Cancer Wisconsin (Prognostic) |
| CRY | 6 | 90 | Cryotherapy |
| CUS | 7 | 440 | Wholesale customers |
| ECO | 5 | 220 | Ecoli |
| GLA | 9 | 146 | Glass Identification |
| HMS | 3 | 306 | Haberman’s Survival |
| IMM | 7 | 90 | Immunotherapy |
| ION | 32 | 351 | Ionosphere |
| LYM | 16 | 142 | Lymphograph |
| MON | 6 | 432 | MONK’s Problems |
| PAR | 22 | 194 | Parkinsons |
| PID | 8 | 768 | Pima-Indians-Diabetes |
| SMR | 60 | 208 | Connectionist Bench (Sonar, Mines vs. Rocks) |
| STA | 12 | 256 | Statlog (Heart) |
| URB | 147 | 238 | Urban Land Cover |
| USE | 5 | 251 | User Knowledge Modeling |
| WIN | 13 | 130 | Wine |
| CRO * | 28 | 9003 | Crowdsourced Mapping |
| ELE * | 12 | 10,000 | Electrical Grid Stability Simulated |
| WAV * | 21 | 3304 | Waveform Database Generator |
| GLI | 22,283 | 85 | Gliomas Strongly Predicts Survival |
| COL | 2000 | 62 | Tumor and Normal Colon Tissues |
| ELO | 12,625 | 173 | Myeloma |
| BRE | 24,481 | 78 | Breast Cancer |
| PRO | 12,600 | 136 | Clinical Prostate Cancer Behavior |
* Large-scale datasets.
Summary of the accuracies on five high-dimensional gene expression datasets.
| Data | SV1 | SV2 | LAS | DT | NBC | 1NN | 3NN | SOD | RF | XGB | IM4 | EGT | B4G |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GLI | 85.1 | 86.0 | 85.2 | 83.8 | 83.0 | 88.7 | 87.7 | 88.7 | 87.6 | 86.3 | 87.5 | 89.1 | 89.9 |
| COL | 73.7 | 82.0 | 80.6 | 69.2 | 71.1 | 72.1 | 77.9 | 78.1 | 82.6 | 79.5 | 84.3 | 78.6 | 82.5 |
| ELO | 72.9 | 90.2 | 74.6 | 77.3 | 76.3 | 85.6 | 91.3 | 86.9 | 79.2 | 77.9 | 88.9 | 88.6 | 88.4 |
| BRE | 76.0 | 88.7 | 91.4 | 76.4 | 69.4 | 83.0 | 73.6 | 82.6 | 86.3 | 87.3 | 88.1 | 90.2 | 91.5 |
| PRO | 71.3 | 69.9 | 87.9 | 86.4 | 68.0 | 83.2 | 82.7 | 83.2 | 91.8 | 90.5 | 88.0 | 89.5 | 89.7 |
| W,T,L | 5,0,0 | 4,0,1 | 4,1,0 | 5,0,0 | 5,0,0 | 5,0,0 | 4,0,1 | 5,0,0 | 3,1,1 | 4,0,1 | 3,1,1 | -,-,- | -,-,- |
Ten-fold cross-validation is repeated ten times, i.e., 100 trials are carried out for each dataset. The average accuracy is reported for each dataset in Table 1, Table 2 and Table 3. A paired Student's t-test is carried out to compare the results of the Boosted IM4E-IMMIGRATE (B4G) with those of each other algorithm. Under the chosen significance level, one algorithm is significantly better than another (i.e., the first algorithm wins) on a dataset if the p-value of the paired Student's t-test falls below that level. The same rule is applied to the results reported in Table 2 and Table 3. The last row (W,T,L) shows the number of times the Boosted IM4E-IMMIGRATE (B4G) wins/ties/loses compared with each algorithm in the table, based on the paired t-test.
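The win/tie/loss protocol above can be sketched as follows (a hedged illustration, not the authors' code; the critical value 1.984 assumes a two-sided 5% test with 99 degrees of freedom, matching 100 paired trials):

```python
import math
import statistics as st

T_CRIT = 1.984  # approx. two-sided 5% critical value of Student's t, df = 99

def compare_methods(acc_a, acc_b, t_crit=T_CRIT):
    """Paired t-test on per-trial accuracies (e.g., the 100 trials from
    ten repeats of ten-fold CV). Returns 'win', 'tie', or 'loss' from
    method A's perspective at the significance level implied by t_crit."""
    d = [a - b for a, b in zip(acc_a, acc_b)]   # per-trial differences
    t = st.mean(d) / (st.stdev(d) / math.sqrt(len(d)))
    if abs(t) > t_crit:
        return "win" if t > 0 else "loss"
    return "tie"
```

Pairing the test on per-trial differences (rather than comparing unpaired means) removes the trial-to-trial variance shared by both methods, which is why it is the standard choice for cross-validation comparisons.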
Figure 4Results of paired t-test on gene expression datasets (top subplot) and UCI datasets (bottom subplot). The top plot shows how well (i.e., “Win” (red bars), “Tie” (green bars), and “Lose” (blue bars)) our Boosted IM4E-IMMIGRATE performs compared with other approaches. In the bottom plot, the results of methods labeled in black are the comparisons with our IMMIGRATE, and the results of methods (ABD, RF, and XGB) labeled in blue are the comparisons with our BIM.
Summary of the accuracies on the UCI datasets.
| Data | SV1 | SV2 | LAS | DT | NBC | RBF | 1NN | 3NN | LMN | REL | RFF | SIM | LFE | LDA | SOD | hIN | IM4 | IGT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BCW | 61.4 | 66.6 | 71.4 | 70.5 | 62.4 | 56.9 | 68.2 | 72.2 | 69.5 | 66.4 | 67.1 | 67.7 | 67.1 | 73.9 | 65.2 | 71.8 | 66.4 | 74.5 |
| CRY | 72.9 | 90.6 | 87.4 | 85.3 | 84.4 | 89.7 | 89.1 | 85.4 | 87.8 | 73.8 | 77.2 | 79.7 | 86.0 | 88.6 | 86.0 | 87.9 | 86.2 | 89.8 |
| CUS | 86.5 | 88.9 | 89.6 | 89.6 | 89.5 | 86.8 | 86.5 | 88.7 | 88.8 | 82.1 | 84.7 | 84.3 | 86.4 | 90.3 | 90.8 | 90.3 | 87.5 | 90.1 |
| ECO | 92.9 | 96.9 | 98.6 | 98.6 | 97.8 | 94.6 | 96.0 | 97.8 | 97.8 | 89.0 | 90.7 | 91.2 | 93.1 | 99.0 | 97.9 | 98.7 | 97.5 | 98.2 |
| GLA | 64.2 | 76.7 | 72.3 | 79.4 | 69.5 | 73.0 | 81.1 | 78.1 | 79.4 | 64.1 | 63.5 | 67.1 | 81.2 | 72.0 | 75.3 | 75.0 | 78.0 | 87.5 |
| HMS | 63.8 | 64.5 | 67.7 | 72.5 | 67.2 | 66.8 | 66.0 | 69.3 | 71.2 | 65.3 | 66.0 | 65.7 | 64.9 | 69.0 | 67.4 | 69.4 | 66.6 | 69.2 |
| IMM | 74.3 | 70.6 | 74.4 | 84.1 | 77.9 | 67.3 | 69.4 | 77.9 | 76.7 | 69.9 | 71.8 | 69.0 | 75.0 | 75.2 | 72.3 | 70.2 | 80.7 | 83.8 |
| ION | 80.5 | 93.5 | 83.6 | 87.4 | 89.4 | 79.9 | 86.7 | 84.1 | 84.5 | 85.8 | 86.2 | 84.2 | 91.0 | 83.3 | 90.3 | 92.6 | 88.3 | 92.9 |
| LYM | 83.6 | 81.5 | 85.2 | 75.2 | 83.6 | 71.1 | 77.2 | 82.8 | 86.6 | 64.9 | 71.0 | 70.4 | 79.6 | 85.2 | 79.3 | 84.8 | 83.3 | 87.2 |
| MON | 74.4 | 91.7 | 75.0 | 86.4 | 74.0 | 68.2 | 75.1 | 84.4 | 84.9 | 61.4 | 61.8 | 65.0 | 64.8 | 74.4 | 91.9 | 97.2 | 75.6 | 99.5 |
| PAR | 72.7 | 72.5 | 77.1 | 84.8 | 74.1 | 71.5 | 94.6 | 91.4 | 91.8 | 87.3 | 90.3 | 84.6 | 94.0 | 85.6 | 88.2 | 89.5 | 83.2 | 93.8 |
| PID | 65.6 | 73.1 | 74.7 | 74.3 | 71.2 | 70.3 | 70.3 | 73.5 | 74.0 | 64.8 | 68.0 | 67.0 | 67.8 | 74.5 | 75.7 | 74.1 | 72.1 | 74.7 |
| SMR | 73.5 | 83.9 | 73.6 | 72.3 | 70.3 | 67.1 | 86.9 | 84.7 | 86.1 | 69.5 | 78.3 | 81.0 | 84.3 | 73.1 | 70.5 | 83.0 | 76.4 | 86.5 |
| STA | 69.8 | 71.6 | 70.8 | 68.9 | 71.0 | 69.5 | 67.8 | 70.8 | 71.3 | 59.7 | 64.0 | 63.0 | 66.7 | 71.3 | 71.8 | 69.2 | 70.8 | 75.9 |
| URB | 85.2 | 87.9 | 88.1 | 82.6 | 85.8 | 75.3 | 87.2 | 87.5 | 87.9 | 81.9 | 83.2 | 73.0 | 87.9 | 73.0 | 87.9 | 88.3 | 87.4 | 89.9 |
| USE | 95.7 | 95.2 | 97.2 | 93.2 | 90.6 | 84.9 | 90.5 | 91.5 | 92.0 | 54.5 | 63.7 | 69.5 | 85.8 | 96.9 | 96.2 | 96.5 | 94.1 | 96.4 |
| WIN | 98.3 | 99.3 | 98.6 | 93.1 | 97.3 | 97.2 | 96.4 | 96.6 | 96.5 | 87.2 | 95.0 | 95.0 | 93.8 | 99.7 | 92.9 | 98.9 | 98.2 | 99.0 |
| CRO * | 75.4 | 97.5 | 89.9 | 91.0 | 88.8 | 75.4 | 98.4 | 98.5 | 98.6 | 98.5 | 98.7 | 95.1 | 98.6 | 89.1 | 95.2 | 95.5 | 81.9 | 98.2 |
| ELE * | 72.3 | 95.7 | 79.9 | 80.0 | 82.5 | 70.8 | 81.1 | 83.9 | 89.7 | 64.6 | 75.4 | 76.2 | 79.8 | 79.9 | 93.7 | 93.6 | 83.2 | 93.7 |
| WAV * | 90.0 | 91.9 | 92.2 | 86.2 | 91.4 | 84.0 | 86.5 | 88.3 | 88.8 | 77.6 | 80.0 | 83.6 | 84.7 | 91.8 | 92.0 | 92.1 | 91.1 | 92.4 |
| W,T,L | 20,0,0 | 16,2,2 | 15,4,1 | 16,3,1 | 19,1,0 | 20,0,0 | 17,2,1 | 18,2,0 | 16,3,1 | 19,1,0 | 19,1,0 | 19,1,0 | 18,2,0 | 15,4,1 | 13,4,3 | 12,7,1 | 19,0,1 | -,-,- |
* Large-scale datasets. The last row (W,T,L) shows the number of times that IMMIGRATE (IGT) wins/ties/loses against the corresponding algorithm based on the paired t-test on the cross-validation results.
Summary of the accuracies of ensemble methods on the UCI datasets.
| Data | ADB | RF | XGB | BIM |
|---|---|---|---|---|
| BCW | 78.2 | 78.6 | 78.6 | 78.3 |
| CRY | 90.4 | 92.9 | 89.9 | 91.5 |
| CUS | 90.8 | 91.1 | 91.4 | 91.0 |
| ECO | 98.0 | 98.9 | 98.2 | 98.6 |
| GLA | 85.0 | 87.0 | 87.9 | 86.8 |
| HMS | 65.8 | 72.1 | 70.0 | 72.0 |
| IMM | 77.2 | 84.2 | 81.7 | 86.1 |
| ION | 92.1 | 93.5 | 92.5 | 93.1 |
| LYM | 84.8 | 87.0 | 87.4 | 88.1 |
| MON | 98.4 | 95.8 | 99.1 | 99.7 |
| PAR | 90.5 | 91.0 | 91.9 | 93.2 |
| PID | 73.5 | 76.0 | 75.1 | 76.2 |
| SMR | 81.4 | 82.8 | 83.3 | 86.6 |
| STA | 69.0 | 71.3 | 69.5 | 74.1 |
| URB | 87.9 | 88.6 | 88.8 | 91.4 |
| USE | 96.0 | 95.3 | 94.9 | 96.1 |
| WIN | 97.5 | 99.1 | 98.2 | 99.1 |
| CRO * | 97.3 | 97.4 | 98.5 | 98.6 |
| ELE * | 91.1 | 92.3 | 95.2 | 94.1 |
| WAV * | 89.5 | 91.2 | 90.8 | 93.3 |
| W,T,L | 17,3,0 | 11,8,1 | 14,4,2 | -,-,- |
* Large-scale datasets. The last row (W,T,L) shows the number of times that the Boosted IMMIGRATE (BIM) wins/ties/loses against the corresponding algorithm based on the paired t-test on the cross-validation results.
Figure A1. Heat maps of feature weights learned by IMMIGRATE. The color bars show the values corresponding to the colors in the plots.