Qiang Su, Yina Wang, Xiaobing Jiang, Fuxue Chen, Wen-Cong Lu.
Abstract
BACKGROUND: To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS). The algorithm first uses the K-S test to select distinguished genes and then applies CFS to choose a final subset from the genes that passed the K-S test.
Year: 2017 PMID: 28567418 PMCID: PMC5439177 DOI: 10.1155/2017/1645619
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
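The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. Several choices are assumptions: the K-S p-value uses the standard asymptotic approximation, CFS is scored with absolute Pearson correlation and a greedy forward search, and the gene names and toy values are hypothetical. In practice a library routine such as `scipy.stats.ks_2samp` would normally replace the hand-written test.

```python
import math
from statistics import mean, pstdev


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value (an approximation, rough for very small samples)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.1:  # the series converges slowly here; the p-value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2 * total))


def ks_filter(genes, labels, alpha=0.05):
    """Stage 1: keep genes whose two class distributions differ at level alpha."""
    kept = []
    for name, values in genes.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        if ks_pvalue(ks_statistic(a, b), len(a), len(b)) < alpha:
            kept.append(name)
    return kept


def pearson(x, y):
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
    return cov / (sx * sy)


def cfs_merit(subset, genes, labels):
    """CFS merit: high gene-class correlation, low gene-gene correlation."""
    k = len(subset)
    r_cf = mean([abs(pearson(genes[g], labels)) for g in subset])
    pairs = [(g, h) for i, g in enumerate(subset) for h in subset[i + 1:]]
    r_ff = mean([abs(pearson(genes[g], genes[h])) for g, h in pairs]) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(candidates, genes, labels):
    """Stage 2: add genes one at a time while the merit keeps improving."""
    selected, best, remaining = [], 0.0, list(candidates)
    while remaining:
        score, gene = max((cfs_merit(selected + [g], genes, labels), g)
                          for g in remaining)
        if score <= best:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = score
    return selected


# Hypothetical toy data: 6 samples of class 0 followed by 6 of class 1.
labels = [0] * 6 + [1] * 6
genes = {
    "g_signal":    [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 3.0, 3.2, 2.9, 3.1, 3.3, 2.8],
    "g_redundant": [1.5, 0.7, 1.9, 1.0, 2.1, 1.2, 2.6, 3.4, 2.3, 3.0, 3.6, 2.5],
    "g_noise":     [2.0, 1.0, 3.0, 2.5, 1.5, 2.2, 2.1, 1.1, 2.9, 2.4, 1.6, 2.3],
}
filtered = ks_filter(genes, labels, alpha=0.05)
selected = greedy_cfs(filtered, genes, labels)
print(filtered, selected)
```

On this toy data the K-S stage discards the gene whose distribution is the same in both classes, and the CFS stage then drops the gene that is informative but strongly correlated with one already chosen.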
Dataset.
| Dataset | Samples | Genes |
|---|---|---|
| Breast cancer | 97 | 24481 |
| Lung cancer | 181 | 12533 |
| Colon tumor | 62 | 2000 |
| Ovarian cancer | 253 | 15154 |
| Leukemia | 72 | 7129 |
Number of genes selected by the K-S test, Wilcoxon test, and t-test in the five datasets at different significance levels (alpha).
| Dataset | Algorithm | Alpha = 1 | Alpha = 0.05 | Alpha = 0.01 | Alpha = 0.005 | Alpha = 0.001 |
|---|---|---|---|---|---|---|
| Breast cancer | K-S | 24481 | 3502 | 1397 | 940 | 349 |
| Breast cancer | Wilcoxon | 24481 | 3829 | 1529 | 1029 | 381 |
| Breast cancer | t-test | 24481 | 3251 | 1161 | 726 | 273 |
| Lung cancer | K-S | 12533 | 2886 | 1982 | 1588 | 1300 |
| Lung cancer | Wilcoxon | 12533 | 3225 | 2658 | 1986 | 1528 |
| Lung cancer | t-test | 12533 | 3190 | 2580 | 1996 | 1625 |
| Colon tumor | K-S | 2000 | 324 | 146 | 105 | 44 |
| Colon tumor | Wilcoxon | 2000 | 387 | 188 | 140 | 59 |
| Colon tumor | t-test | 2000 | 389 | 171 | 113 | 53 |
| Ovarian cancer | K-S | 15154 | 7268 | 3408 | 1386 | 268 |
| Ovarian cancer | Wilcoxon | 15154 | 7652 | 3927 | 1876 | 329 |
| Ovarian cancer | t-test | 15154 | 7900 | 3848 | 1938 | 318 |
| Leukemia | K-S | 7129 | 1716 | 1036 | 843 | 524 |
| Leukemia | Wilcoxon | 7129 | 1860 | 1169 | 962 | 644 |
| Leukemia | t-test | 7129 | 1811 | 1115 | 931 | 583 |
Average classification accuracy (%) under 10-fold cross-validation for the gene subsets selected by the K-S test, Wilcoxon test, and t-test in the five datasets at different significance levels (alpha).
| Dataset | Algorithm | Alpha = 1 | Alpha = 0.05 | Alpha = 0.01 | Alpha = 0.005 | Alpha = 0.001 |
|---|---|---|---|---|---|---|
| Breast cancer | K-S | 68.6 | 83.6 | 86.3 | 86.3 | 83.2 |
| Breast cancer | Wilcoxon | 67.8 | 83.5 | 84.8 | 84.8 | 84.8 |
| Breast cancer | t-test | 66.7 | 80.2 | 83.5 | 80.5 | 80.5 |
| Lung cancer | K-S | 85.8 | 89.6 | 90.4 | 91.6 | 91.6 |
| Lung cancer | Wilcoxon | 85.8 | 86.9 | 88.5 | 89.5 | 89.5 |
| Lung cancer | t-test | 83.6 | 86.9 | 88.5 | 89.5 | 89.5 |
| Colon tumor | K-S | 73.4 | 81.4 | 85.9 | 85.9 | 83.2 |
| Colon tumor | Wilcoxon | 73.4 | 79.2 | 80.2 | 81.5 | 83.2 |
| Colon tumor | t-test | 73.4 | 79.2 | 80.2 | 81.5 | 81.5 |
| Ovarian cancer | K-S | 95.3 | 98.6 | 100 | 100 | 96.5 |
| Ovarian cancer | Wilcoxon | 95.3 | 97.3 | 98.6 | 100 | 94.6 |
| Ovarian cancer | t-test | 95.3 | 97.3 | 98.6 | 100 | 94.6 |
| Leukemia | K-S | 71.6 | 75.3 | 81.4 | 82.6 | 85.6 |
| Leukemia | Wilcoxon | 71.6 | 75.3 | 81.4 | 81.4 | 83.5 |
| Leukemia | t-test | 71.6 | 75.3 | 81.4 | 81.4 | 82.2 |
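The accuracies above are averages over 10-fold cross-validation. A minimal sketch of that evaluation loop is given below. The classifier is not named in this record, so a nearest-centroid rule stands in as an assumption, and the function names and toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds (assumes n_samples >= k)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_predict(train_X, train_y, x):
    """Assumed stand-in classifier: predict the class with the closest mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def cv_accuracy(X, y, k=10, seed=0):
    """Average per-fold accuracy (%) over k folds."""
    fold_scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
                      for i in test_idx)
        fold_scores.append(100.0 * correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores)


# Hypothetical toy data: two well-separated classes, 10 samples each.
X = [[0.1 * i, 1.0] for i in range(10)] + [[5.0 + 0.1 * i, 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10
print(cv_accuracy(X, y))
```

Each sample is held out exactly once, so the reported figure is the mean of ten per-fold accuracies, not a single train/test split.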
Comparison of the CFS, mRMR, and ReliefF algorithms.
| Dataset | CFS: number of genes | CFS: accuracy (%) | mRMR: number of genes | mRMR: accuracy (%) | ReliefF: number of genes | ReliefF: accuracy (%) |
|---|---|---|---|---|---|---|
| Breast cancer | 11.7 | 87.4 | 10.4 | 85.6 | 15.9 | 59.5 |
| Lung cancer | 23.2 | 91.6 | 25.7 | 88.4 | 26.7 | 87.6 |
| Colon tumor | 10.7 | 90.1 | 12.6 | 86.4 | 15.3 | 84.8 |
| Ovarian cancer | 33.2 | 98.5 | 31.5 | 95.6 | 37.4 | 93.2 |
| Leukemia | 25.2 | 99.6 | 2.5 | 99.6 | 16.4 | 77.6 |
Comparison of the K-S test-CFS, K-S test, CFS, mRMR, and ReliefF algorithms.
| Dataset | K-S test-CFS: number of genes | K-S test-CFS: accuracy (%) | CFS: number of genes | CFS: accuracy (%) | K-S test: number of genes | K-S test: accuracy (%) | mRMR: number of genes | mRMR: accuracy (%) | ReliefF: number of genes | ReliefF: accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast cancer | 11.7 | 87.4 | 19.6 | 80.5 | 22.5 | 78.8 | 21.8 | 82.4 | 15.9 | 59.4 |
| Lung cancer | 23 | 91.6 | 27.3 | 88.9 | 33.4 | 80.6 | 289 | 89.8 | 33.6 | 84.7 |
| Colon tumor | 10.7 | 90.1 | 6.8 | 89.7 | 19.4 | 84.5 | 5.9 | 89.7 | 15 | 74.9 |
| Ovarian cancer | 33.2 | 98.5 | 31.6 | 95.3 | 46 | 78.9 | 32.7 | 95.2 | 39.6 | 90.6 |
| Leukemia | 25.2 | 79.6 | 33.3 | 78.9 | 38.7 | 72.7 | 28.6 | 75.7 | 36.4 | 77.6 |