Liying Yang, Zhimin Liu, Xiguo Yuan, Jianhua Wei, Junying Zhang.
Abstract
Background. Accurate cancer prediction is crucial for cancer treatment. Gene expression profiles make it possible to analyze patterns between genes and cancers on a genome-wide scale. Gene expression data analysis, however, faces major challenges arising from the characteristics of the data: high dimensionality, small sample size, and low signal-to-noise ratio.

Results. This paper proposes a method, termed RS_SVM, that classifies gene expression profiles by aggregating SVMs trained on random subspaces. After choosing gene features through statistical analysis, RS_SVM randomly selects feature subsets to yield random subspaces, trains an SVM classifier on each subspace, and then aggregates the SVM classifiers to capture the advantage of ensemble learning. Experiments on eight real gene expression datasets were performed to validate the RS_SVM method. The results show that RS_SVM achieved better classification accuracy and generalization performance than a single SVM, K-nearest neighbor, decision tree, Bagging, AdaBoost, and the state-of-the-art methods. The experiments also explored the effect of subspace size on prediction performance.

Conclusions. The proposed RS_SVM method yielded superior performance in analyzing gene expression profiles, which demonstrates that RS_SVM is a good approach for such biological data.
Year: 2016 PMID: 27999797 PMCID: PMC5143691 DOI: 10.1155/2016/4596326
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Dataset.
| Data | Features | Samples | Class |
|---|---|---|---|
| Breast Cancer | 24481 | 97 | Relapse |
| Leukemia | 7129 | 72 | All |
| Lung Cancer | 12533 | 181 | Mesothelioma |
| Prostate | 12600 | 136 | Tumor |
| Colon Tumor | 2000 | 62 | Positive |
| CNS | 7129 | 60 | Class 1 |
| Ovarian | 15154 | 253 | Cancer |
| DLBCL | 4026 | 47 | Germinal |
Algorithm 1
Figure 1. RS_SVM method.
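The abstract describes RS_SVM as drawing random feature subsets, training one classifier per subset, and aggregating the classifiers. The following is a minimal sketch of that random-subspace pattern, not the authors' implementation. To stay dependency-free it swaps in a nearest-centroid classifier for the SVM, and it assumes majority voting as the aggregation rule, which this record does not specify. All function names (`fit_centroids`, `train_random_subspace`, `predict_ensemble`) are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Stand-in base learner (NOT an SVM): per-class mean over the chosen features."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroid(centroids, row, features):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))
    return min(sorted(centroids), key=dist)


def train_random_subspace(X, y, subspace_size, n_models, seed=0):
    """Train one base learner per randomly drawn feature subset (a random subspace)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Aggregate the base predictions by majority vote (assumed aggregation rule)."""
    votes = Counter(predict_centroid(c, row, f) for f, c in ensemble)
    return votes.most_common(1)[0][0]
```

Here `X` is a list of samples (each a list of expression values) and `y` a list of class labels. `subspace_size` plays the role of the subspace size reported per dataset in the tables, and replacing `fit_centroids` and `predict_centroid` with an SVM would bring the sketch closer to the described method.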
Number of selected features and optimal size of subspace.
| Data | Number of selected features by t-test | Optimal size of subspace |
|---|---|---|
| Breast Cancer | 1810 | 800 |
| Leukemia | 1697 | 400 |
| Lung Cancer | 3134 | 170 |
| Prostate | 5707 | 100 |
| Colon Tumor | 394 | 150 |
| CNS | 378 | 180 |
| Ovarian | 7949 | 1300 |
| DLBCL | 972 | 150 |
Figure 3. Variation of training error and test error with subspace size.
Function and package used in R.
| Function | Package | Parameters |
|---|---|---|
| t.test() | stats | Confidence level of the interval is 0.95. Assume two variances are equal |
| svm() | e1071 | Choose “radial” kernel; gamma is 1/dimension; epsilon is 0.1 |
| knn() | class | Choose |
| rpart() | rpart | Choose method = “class” |
| ada() | ada | Use decision trees as base classifiers; iteration is 50; under exponential loss, type of boosting algorithm to perform is “discrete” |
| ipredbagg() | ipred | Use decision trees as base classifiers; number of bootstrap replications is 25 |
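The table lists a two-sample t-test with equal variances at a 0.95 confidence level as the gene selection step. A minimal sketch of that statistic in plain Python follows. It assumes, as an illustration only, that a gene is kept when the absolute value of its t statistic exceeds a caller-supplied critical value; the exact selection rule is not stated in this record, and the names `pooled_t_statistic` and `select_genes` are hypothetical.

```python
import math


def pooled_t_statistic(a, b):
    """Two-sample t statistic with a pooled (equal-variance) estimate.

    Requires len(a) + len(b) >= 3 so the pooled degrees of freedom are positive.
    """
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    if pooled_var == 0.0:
        # Degenerate case: no within-group spread at all.
        if mean_a == mean_b:
            return 0.0
        return math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))


def select_genes(X, y, positive_label, t_critical):
    """Return indices of genes whose |t| exceeds t_critical (assumed selection rule)."""
    selected = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == positive_label]
        b = [row[g] for row, label in zip(X, y) if label != positive_label]
        if abs(pooled_t_statistic(a, b)) > t_critical:
            selected.append(g)
    return selected
```

The critical value depends on the degrees of freedom (number of samples minus two). For a two-sided test at the 0.05 level with three samples per class it is about 2.776, and it approaches 1.96 as the sample count grows.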
Testing error comparison of RS_SVM and peer methods (%).
| Data | RS_SVM | Single SVM | KNN | CART | AdaBoost | Bagging |
|---|---|---|---|---|---|---|
| Breast Cancer | | 15.79 | 47.37 | 31.58 | 10.53 | 31.58 |
| Leukemia | 5.89 | 26.47 | | 8.82 | 41.18 | 8.82 |
| Lung Cancer | 1.34 | 9.40 | 2.68 | 9.40 | 51.01 | 9.40 |
| Prostate | | 73.53 | 73.53 | 73.53 | 73.53 | 14.71 |
| Colon Tumor | 14.52 | 14.52 | 16.13 | 22.58 | 19.35 | |
| CNS | 33.33 | | 35.00 | 36.67 | 41.67 | 45.00 |
| Ovarian | 1.19 | 1.58 | 4.35 | 3.16 | 6.72 | 1.98 |
| DLBCL | 4.26 | 10.64 | 14.89 | 29.79 | 19.15 | 23.40 |
Testing error comparison of RS_SVM and the state-of-the-art methods (%).
| Method | Breast Cancer | Leukemia | Lung Cancer | Prostate | Colon Tumor | CNS | Ovarian | DLBCL |
|---|---|---|---|---|---|---|---|---|
| RS_SVM | | 5.89 | 1.34 | | 14.52 | 33.33 | 1.19 | 4.26 |
| Nanni et al. | 11.43 | | | 3.85 | 26.67 | 33.33 | | 1.43 |
| Ye et al. | — | 2.50 | — | 7.5 | 15.00 | — | — | — |
| Liu et al. | — | | | 3.00 | 8.10 | — | 0.80 | 2 |
| Tan and Gilbert | — | 8.90 | 6.80 | 26.50 | 4.90 | 11.7 | — | — |
| Ding and Peng | — | | 2.70 | — | 6.50 | — | — | — |
| Bonilla Huerta et al. | — | | 0.70 | 4.00 | 8.1 | 13.40 | | |
| Cheng | — | | 0.67 | 5.88 | — | — | — | — |
| Paliwal and Sharma | 26.3 | | 2.70 | 23.5 | — | — | — | — |
| Bolón-Canedo et al. | 36.22 | 11.96 | 2.75 | 11.81 | 13.10 | 36.67 | 1.20 | 20.50 |
| | 46.56 | 4.11 | | 41.87 | 16.19 | 30.00 | 0.8 | 6.50 |
| | 28.11 | 5.54 | 1.11 | 12.53 | 19.05 | 36.67 | | 4.00 |
| Porto-Díaz et al. | 21.05 | | 0.67 | 20.59 | 10.00 | 25.00 | | |
| Hu et al. | — | — | 12.50 | 19.30 | 9.70 | — | — | — |
| | — | — | 11.60 | 18.20 | 9.70 | — | — | — |
| Nagi and Bhattacharyya | 26.51 | 7.55 | 18.12 | 47.06 | 5.60 | | 1.11 | |
| Pati and Das | — | 7.89 | 6.25 | — | — | — | — | — |
| Dash et al. | — | | 11.55 | — | 10.95 | — | — | — |
| | — | 0.45 | | — | | — | — | — |
| | — | 28.22 | 16 | — | 23.33 | — | — | — |
| | — | 0.41 | 0.95 | — | 0.31 | — | — | — |
| Ghorai et al. | 18.79 | 5.48 | 3.62 | 9.84 | 17.23 | — | — | — |
| Luo et al. | — | 2.07 | — | — | 18.60 | — | — | 6.00 |
| | — | 2.45 | — | — | 19.12 | — | — | 7.19 |
The state-of-the-art methods are identified by the first author of the corresponding publication. “—” means that the given publication reports no corresponding result.
Figure 2. Scatter plots of the Colon Tumor and CNS data by Principal Component Analysis.
Figure 4. Scatter of the training set and test set on Prostate based on the top two principal components.
Figure 5. Variation of sensitivity and specificity with subspace size.
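The tables report testing error as a percentage, and Figure 5 tracks sensitivity and specificity. As a small illustration of how these three quantities relate to a confusion matrix, here is a sketch using the standard definitions. The function name `classification_metrics` and the choice of which class counts as positive are assumptions, not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive_label):
    """Return test error (%), sensitivity and specificity for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive_label:
            if pred == positive_label:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive_label:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        # Share of misclassified test samples, in percent.
        "error_percent": 100.0 * (fp + fn) / total,
        # True positive rate; NaN if the test set has no positives.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # True negative rate; NaN if the test set has no negatives.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

Calling this once per candidate subspace size on the same test set would produce the kind of curves the figure caption describes.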
Effect of gene selection based on t-test (%).
| Setting | Breast Cancer | Leukemia | Lung Cancer | Prostate | Colon Tumor | CNS | Ovarian | DLBCL |
|---|---|---|---|---|---|---|---|---|
| With selection | | | | | | | | |
| Without selection | 63.16 | 41.18 | 3.36 | 26.47 | 35.48 | 35.00 | 3.20 | 44.68 |