Abstract
A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm that selects the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was also extended to multi-task microarray classification problems. We compared the single-task version of the proposed algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray data sets. The multi-task feature selection algorithm achieved significantly higher accuracy than the single-task feature selection methods.
Year: 2013 PMID: 24564944 PMCID: PMC4043987 DOI: 10.1186/1753-6561-7-S7-S5
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
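The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the objective form (relevance minus `lam` times pairwise redundancy), the use of absolute Pearson correlation for both terms, the projected-gradient solver, and every function and parameter name below are assumptions chosen for illustration, and the low-rank approximation of the quadratic term is omitted. The binary constraint x_i ∈ {0, 1} with exactly m selected features is relaxed to 0 ≤ x_i ≤ 1 with sum(x) = m, and the m largest entries of the relaxed solution are kept.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale a sequence to zero mean and unit (population) variance."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def abs_corr(a, b):
    """Absolute Pearson correlation of two already-standardized sequences."""
    return abs(sum(p * q for p, q in zip(a, b)) / len(a))


def project(v, m, iters=60):
    """Euclidean projection onto {x : 0 <= x_i <= 1, sum(x) = m} by bisection."""
    def clipped(tau):
        return [min(1.0, max(0.0, vi - tau)) for vi in v]

    lo, hi = min(v) - 1.0, max(v)  # sum is len(v) at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if sum(clipped(tau)) > m:
            lo = tau
        else:
            hi = tau
    return clipped(0.5 * (lo + hi))


def select_features(columns, labels, m, lam=1.0, steps=200):
    """Pick m feature indices via a relaxed relevance/redundancy trade-off.

    Assumed objective (illustrative): maximize r.x - (lam / 2) * x'Qx,
    where r_i = |corr(feature_i, label)| and Q_ij = |corr(feature_i, feature_j)|.
    """
    cols = [standardize(c) for c in columns]
    y = standardize(labels)
    d = len(cols)
    r = [abs_corr(c, y) for c in cols]
    Q = [[0.0 if i == j else abs_corr(cols[i], cols[j]) for j in range(d)]
         for i in range(d)]
    # The largest row sum bounds the spectral norm of Q, giving a safe step size.
    step = 1.0 / (lam * max(sum(row) for row in Q) + 1e-12)
    x = [m / d] * d  # start from the uniform relaxed point
    for _ in range(steps):
        grad = [r[i] - lam * sum(Q[i][j] * x[j] for j in range(d))
                for i in range(d)]
        x = project([x[i] + step * grad[i] for i in range(d)], m)
    # Round the relaxed solution: keep the m largest entries.
    return sorted(sorted(range(d), key=lambda i: -x[i])[:m])


if __name__ == "__main__":
    labels = [-1, -1, -1, -1, 1, 1, 1, 1]
    informative = [-1, -1, -1, 1, 1, 1, 1, -1]
    duplicate = list(informative)            # fully redundant with the first
    complementary = [-1, -1, 1, -1, 1, 1, -1, 1]
    noise = [-1, 1, -1, 1, -1, 1, -1, 1]
    print(select_features([informative, duplicate, complementary, noise],
                          labels, m=2))
```

In the toy data, the duplicated feature adds relevance but also a full redundancy penalty, so the sketch favors pairing one copy with the complementary feature rather than selecting both copies. Because the assumed redundancy matrix need not be positive semidefinite, projected gradient should be read here as a heuristic for the relaxed problem, not a guaranteed global solver.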
Summary of the Microarray Datasets
| | Colon | Lung | DLBCL | Myeloma |
|---|---|---|---|---|
| # Samples | 62(40/22) | 86(24/62) | 77(58/19) | 173(137/36) |
| # Genes | 2000 | 5469 | 5469 | 12558 |
Average AUC of 10 different feature selection algorithms on 4 different microarray datasets
| m | Method | Colon | Lung | DLBCL | Myeloma | Average |
|---|---|---|---|---|---|---|
| m = 20 | PC | .775 ± .159 | .657 ± .184 | .945 ± .051 | .689 ± .094 | .767 |
| | ChiSquare | .763 ± .189 | .573 ± .146 | .945 ± .043 | .639 ± .121 | .730 |
| | GINI | .760 ± .217 | .590 ± .170 | .948 ± .054 | .653 ± .096 | .738 |
| | InfoGain | .758 ± .197 | .546 ± .160 | .948 ± .054 | .639 ± .111 | .723 |
| | KW | .735 ± .145 | .548 ± .165 | .858 ± .099 | .582 ± .112 | .681 |
| | Relief | .775 ± .149 | | .949 ± .043 | .671 ± .104 | |
| | mRMR | .785 ± .163 | .556 ± .164 | .938 ± .074 | .649 ± .126 | .732 |
| | SASMIF | .710 ± .168 | .560 ± .145 | .931 ± .052 | .612 ± .076 | .703 |
| | QSFS | .793 ± .129 | .579 ± .186 | .942 ± .043 | | .763 |
| | ST-BIP | | .612 ± .108 | | .701 ± .048 | |
| m = 50 | PC | .763 ± .170 | .648 ± .184 | .958 ± .025 | .709 ± .071 | .770 |
| | ChiSquare | .740 ± .189 | .600 ± .173 | .965 ± .035 | .676 ± .076 | .745 |
| | GINI | .742 ± .183 | .586 ± .167 | .966 ± .034 | .666 ± .096 | .740 |
| | InfoGain | .755 ± .179 | .595 ± .170 | .963 ± .026 | .682 ± .085 | .749 |
| | KW | .755 ± .187 | .574 ± .163 | .858 ± .128 | .606 ± .072 | .698 |
| | Relief | .785 ± .145 | | .966 ± .027 | .677 ± .082 | .772 |
| | mRMR | .748 ± .182 | .651 ± .219 | .948 ± .067 | .695 ± .093 | .761 |
| | SASMIF | .663 ± .206 | .563 ± .130 | .943 ± .043 | .636 ± .004 | .701 |
| | QSFS | .695 ± .208 | .608 ± .054 | .961 ± .031 | | .745 |
| | ST-BIP | | .600 ± .124 | | .710 ± .110 | |
| m = 100 | PC | .753 ± .176 | .607 ± .122 | .963 ± .025 | .708 ± .062 | .758 |
| | ChiSquare | .745 ± .184 | .631 ± .164 | .966 ± .024 | .688 ± .063 | .758 |
| | GINI | .748 ± .186 | .594 ± .202 | .965 ± .026 | .698 ± .079 | .751 |
| | InfoGain | .750 ± .180 | .631 ± .164 | .967 ± .022 | .690 ± .062 | .760 |
| | KW | .727 ± .188 | .570 ± .206 | .879 ± .113 | .624 ± .071 | .700 |
| | Relief | .773 ± .177 | .631 ± .176 | .958 ± .042 | .708 ± .066 | .768 |
| | mRMR | .758 ± .169 | .608 ± .169 | .966 ± .035 | .690 ± .075 | .756 |
| | SASMIF | .785 ± .131 | .611 ± .213 | .950 ± .035 | .647 ± .072 | .748 |
| | QSFS | .777 ± .173 | | .965 ± .025 | .710 ± .073 | .772 |
| | ST-BIP | | .627 ± .180 | | | |
| m = 200 | PC | .760 ± .164 | .632 ± .120 | .973 ± .018 | .704 ± .059 | .767 |
| | ChiSquare | .750 ± .165 | .611 ± .198 | .973 ± .030 | .673 ± .072 | .752 |
| | GINI | .753 ± .165 | .617 ± .199 | .974 ± .019 | .690 ± .064 | .759 |
| | InfoGain | .755 ± .165 | .611 ± .198 | .977 ± .017 | .673 ± .072 | .754 |
| | KW | .735 ± .219 | .571 ± .199 | .878 ± .145 | .637 ± .036 | .705 |
| | Relief | .758 ± .162 | .621 ± .157 | .979 ± .025 | | .770 |
| | mRMR | .755 ± .155 | .585 ± .169 | .974 ± .027 | .668 ± .068 | .746 |
| | SASMIF | .820 ± .011 | .590 ± .124 | .954 ± .221 | .644 ± .045 | .752 |
| | QSFS | .765 ± .171 | | .974 ± .025 | .687 ± .052 | .773 |
| | ST-BIP | | .634 ± .156 | | .706 ± .106 | |
| m = 1000 | PC | .740 ± .172 | .633 ± .193 | .979 ± .018 | .700 ± .049 | .763 |
| | ChiSquare | .743 ± .174 | .606 ± .121 | .974 ± .028 | .676 ± .060 | .750 |
| | GINI | .735 ± .176 | | .974 ± .027 | .679 ± .056 | .758 |
| | InfoGain | .743 ± .174 | .606 ± .121 | .974 ± .028 | .676 ± .060 | .750 |
| | KW | .722 ± .198 | .568 ± .184 | .941 ± .051 | .652 ± .037 | .721 |
| | Relief | .728 ± .173 | .623 ± .150 | .980 ± .019 | .698 ± .051 | .757 |
| | mRMR | .743 ± .174 | .606 ± .121 | .976 ± .025 | .677 ± .060 | .751 |
| | SASMIF | .763 ± .149 | .587 ± .176 | .952 ± .038 | .669 ± .054 | .743 |
| | QSFS | .745 ± .175 | .624 ± .163 | .980 ± .017 | .690 ± .047 | .760 |
| | ST-BIP | | .625 ± .192 | | | |
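The Average column in the table above appears to be the unweighted mean of the four per-dataset AUC values. The short check below, offered as a sanity test and not as part of the original study, recomputes that mean for the m = 20 rows in which all four AUCs and the reported average are present; the variable and function names are illustrative.

```python
# m = 20 rows with all four AUCs present:
# method -> ((Colon, Lung, DLBCL, Myeloma), reported Average)
rows_m20 = {
    "PC":        ((0.775, 0.657, 0.945, 0.689), 0.767),
    "ChiSquare": ((0.763, 0.573, 0.945, 0.639), 0.730),
    "GINI":      ((0.760, 0.590, 0.948, 0.653), 0.738),
    "InfoGain":  ((0.758, 0.546, 0.948, 0.639), 0.723),
    "KW":        ((0.735, 0.548, 0.858, 0.582), 0.681),
    "mRMR":      ((0.785, 0.556, 0.938, 0.649), 0.732),
    "SASMIF":    ((0.710, 0.560, 0.931, 0.612), 0.703),
}


def max_average_gap(table):
    """Largest |mean of per-dataset AUCs - reported Average| over all rows."""
    return max(abs(sum(aucs) / len(aucs) - reported)
               for aucs, reported in table.values())


if __name__ == "__main__":
    # A gap no larger than rounding error (0.001) supports the unweighted-mean reading.
    print(max_average_gap(rows_m20) <= 1e-3)
```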
Multi-Task Microarray Datasets (cancer:normal cases)
| Bladder | Lung | Prostate | Breast |
|---|---|---|---|
Figure 1. Average AUC score of different feature selection algorithms across different training set sizes.
Average AUC of 11 different feature selection algorithms on 8 different microarray datasets
| Method | Bladder | Breast | Colon | Lung | | | Renal | Uterus | Average |
|---|---|---|---|---|---|---|---|---|---|
| PC | .991 | .696 | .816 | .703 | .78 | .603 | .916 | .883 | .799 |
| ChiSquare | .969 | .625 | .749 | .669 | .789 | .636 | .741 | .908 | .761 |
| GINI | .969 | .625 | .749 | .669 | .789 | .636 | .743 | .917 | .762 |
| InfoGain | .969 | .625 | .749 | .669 | .789 | .636 | .741 | .908 | .761 |
| KW | .903 | .621 | .907 | .750 | .876 | .626 | .870 | .913 | .808 |
| Relief | .991 | .729 | .795 | .721 | .796 | .594 | .929 | .888 | .805 |
| mRMR | .969 | .650 | .765 | .682 | .830 | .682 | .786 | .875 | .780 |
| SASMIF | .978 | .739 | .704 | .671 | .823 | .650 | .768 | .854 | .773 |
| QSFS | .991 | .693 | .817 | .700 | .788 | .600 | .916 | .883 | .799 |
| ST-BIP | .991 | .679 | .612 | .910 | .843 | ||||
| MT-BIP | .882 | .754 | .846 | .895 |
Top 10 enriched GO terms based on 100 MT-BIP selected genes
| Enriched GO Term | Hits | p-value | Disease Association |
|---|---|---|---|
| GO:0005856 cytoskeleton | 21 | 7.49e-6 | #BCLPPRU |
| GO:0043232 intracellular non- | 29 | 1.79e-5 | ######## |
| GO:0043228 non-membrane- | 29 | 1.79e-5 | ######## |
| GO:0003779 actin binding | 10 | 5.35e-5 | #BCLPP## |
| GO:0008092 cytoskeletal | 12 | 6.41e-5 | ######## |
| GO:0030054 cell junction | 11 | 2.31e-4 | BBCLPP## |
| GO:0044459 plasma membrane part | 24 | 2.53e-4 | ##C#P### |
| GO:0005886 plasma membrane | 32 | 1.09e-3 | #BCLPPRU |
| GO:0015629 actin cytoskeleton | 7 | 2.22e-3 | BBCLPP## |
| GO:0032403 protein complex | 6 | 3.83e-3 | #BCLPPRU |