Junshan Yang¹, Jiarui Zhou², Zexuan Zhu³, Xiaoliang Ma¹, Zhen Ji¹
Abstract
BACKGROUND: Microarray technology allows biologists to monitor the expression levels of thousands of genes across various tumor tissues. Identifying the genes relevant to classifying samples by tumor type is beneficial to clinical studies. One of the most widely used strategies for multiclass data is the One-Versus-All (OVA) schema, which divides the original problem into multiple binary classification problems, each separating one class from all the others. Nevertheless, multiclass microarray data tend to suffer from imbalanced class distributions between majority and minority classes, which inevitably degrades the performance of OVA classification.
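The OVA decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes labels are a plain list and that each one-vs-rest classifier returns a confidence score. The function names `ova_decompose` and `ova_predict` and the toy labels are hypothetical. It also shows why OVA worsens imbalance: each binary task pits one class against the union of all the others.

```python
def ova_decompose(labels):
    """One-Versus-All: build one binary label vector per class.

    For class c, samples labelled c become positives (1) and all other
    samples become negatives (0).
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def ova_predict(scores):
    """Combine the binary classifiers: return the class whose
    one-vs-rest classifier gives the sample the highest score."""
    return max(scores, key=scores.get)


# Hypothetical toy labels (not from the paper); "C" is a minority class.
labels = ["A"] * 6 + ["B"] * 3 + ["C"]
tasks = ova_decompose(labels)
for c, y in tasks.items():
    pos = sum(y)
    print(f"{c}: {pos} positive / {len(y) - pos} negative")
# A: 6 positive / 4 negative
# B: 3 positive / 7 negative
# C: 1 positive / 9 negative

print(ova_predict({"A": 0.2, "B": 0.7, "C": 0.1}))  # B
```

Even with k equally sized classes, every OVA task has a (k-1):1 negative-to-positive ratio, so a class that is already a minority becomes even more outnumbered in its own binary task.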
Year: 2016 PMID: 27437198 PMCID: PMC4943507 DOI: 10.1186/s40709-016-0045-8
Source DB: PubMed Journal: J Biol Res (Thessalon) ISSN: 1790-045X Impact factor: 1.889
Summary of microarray data sets
| Data set | Features | Samples | Samples per class | Classes | Source |
|---|---|---|---|---|---|
| GCM | 14,122 | 190 | 11, 10, 11, 11, 22, 11, 10, 10, 30, 11, 11, 11, 11, 20 | 14 | [ |
| Lung | 12,600 | 203 | 139, 17, 6, 21, 20 | 5 | [ |
| ALL | 12,558 | 327 | 15, 27, 64, 20, 43, 79, 79 | 7 | [ |
| ALL-AML-4 | 7,129 | 72 | 38, 9, 21, 4 | 4 | [ |
| ALL-AML-3 | 7,129 | 72 | 38, 9, 25 | 3 | [ |
| Thyroid | 2,000 | 168 | 58, 28, 40, 42 | 4 | [ |
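The per-class counts in the table make the imbalance concrete. The sketch below is plain arithmetic on the table values, not an analysis reported in the paper: for each data set, the hypothetical helper `worst_ova_ratio` computes the most skewed OVA subproblem, which is always the smallest class against all remaining samples.

```python
# Per-class sample counts copied from the summary table.
CLASS_COUNTS = {
    "GCM": [11, 10, 11, 11, 22, 11, 10, 10, 30, 11, 11, 11, 11, 20],
    "Lung": [139, 17, 6, 21, 20],
    "ALL": [15, 27, 64, 20, 43, 79, 79],
    "ALL-AML-4": [38, 9, 21, 4],
    "ALL-AML-3": [38, 9, 25],
    "Thyroid": [58, 28, 40, 42],
}


def worst_ova_ratio(counts):
    """Largest negative:positive ratio over the OVA subproblems of one data set.

    The subproblem for class i has counts[i] positives and
    sum(counts) - counts[i] negatives, so the smallest class is the most skewed.
    """
    total = sum(counts)
    smallest = min(counts)
    return (total - smallest) / smallest


for name, counts in CLASS_COUNTS.items():
    ratio = worst_ova_ratio(counts)
    print(f"{name}: {sum(counts)} samples, worst OVA ratio {ratio:.1f}:1")
```

For example, the Lung set's 6-sample class yields a subproblem with 197 negatives against 6 positives (about 32.8:1), and even GCM, whose classes are mostly of similar size, has an 18:1 subproblem for its 10-sample classes.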
Fig. 1 Comparison of classification accuracy using KNN and SVM. The y-axis shows classification accuracy (%); the x-axis shows the number of selected gene signatures. In the legend, “US” denotes undersampling and “OS” denotes oversampling. The first and third columns show experiments using KNN; the second and fourth columns show experiments using SVM.
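The captions compare undersampling (US) and oversampling (OS) without defining them. As an assumption, the sketch below shows the simplest random variants applied to a single binary OVA task; the paper may use different resampling schemes (for example, synthetic oversampling in the style of SMOTE). All function names are hypothetical.

```python
import random


def _minority_majority(samples, labels):
    """Split a binary task and return (minority samples, majority samples, minority label)."""
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if len(pos) <= len(neg):
        return pos, neg, 1
    return neg, pos, 0


def random_undersample(samples, labels, seed=0):
    """Keep every minority sample plus an equally sized random subset
    of the majority class, drawn without replacement."""
    rng = random.Random(seed)
    minority, majority, m = _minority_majority(samples, labels)
    kept = rng.sample(majority, len(minority))
    return minority + kept, [m] * len(minority) + [1 - m] * len(kept)


def random_oversample(samples, labels, seed=0):
    """Keep every sample and append minority copies, drawn with
    replacement, until both classes match the majority size."""
    rng = random.Random(seed)
    minority, majority, m = _minority_majority(samples, labels)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    new_samples = majority + minority + extra
    new_labels = [1 - m] * len(majority) + [m] * (len(minority) + len(extra))
    return new_samples, new_labels


# Hypothetical OVA task: sample ids 0-9, only id 9 belongs to the target class.
X = list(range(10))
y = [0] * 9 + [1]
Xu, yu = random_undersample(X, y)
Xo, yo = random_oversample(X, y)
print(yu.count(1), yu.count(0))  # 1 1
print(yo.count(1), yo.count(0))  # 9 9
```

The trade-off is the usual one: undersampling discards majority samples, which shrinks already small microarray training sets, while oversampling duplicates minority samples, which can encourage a classifier to overfit those copies.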
Fig. 2 Comparison of AUC using KNN and SVM. The y-axis shows AUC (%); the x-axis shows the number of selected gene signatures. In the legend, “US” denotes undersampling and “OS” denotes oversampling. The first and third columns show experiments using KNN; the second and fourth columns show experiments using SVM.
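Fig. 2 reports AUC, but this excerpt does not state how a multiclass AUC is obtained. The sketch below assumes one common choice: compute a binary AUC for each OVA task with the rank-based (Mann-Whitney) formulation, then average the per-class values (macro-average). The function names and the toy scores are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half), i.e. Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ova_auc(score_table, labels):
    """Average the binary AUC of each OVA task.

    score_table[c][i] is the score the class-c classifier gives sample i.
    """
    aucs = [
        binary_auc(score_table[c], [1 if y == c else 0 for y in labels])
        for c in score_table
    ]
    return sum(aucs) / len(aucs)


labels = ["A", "A", "B", "B"]
score_table = {
    "A": [0.9, 0.6, 0.4, 0.7],  # hypothetical A-vs-rest scores
    "B": [0.1, 0.4, 0.6, 0.3],  # hypothetical B-vs-rest scores
}
print(macro_ova_auc(score_table, labels))  # 0.75
```

A classifier that gives every sample the same score gets an AUC of 0.5 regardless of class sizes, whereas its accuracy can look high whenever a majority class dominates a task. This is one reason to report AUC alongside accuracy on imbalanced OVA subproblems.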
Fig. 3 The iterative ensemble feature selection framework