Taeho Hwang, Choong-Hyun Sun, Taegyun Yun, Gwan-Su Yi.
Abstract
BACKGROUND: Selecting genes that discriminate disease classes from microarray data is widely used to identify diagnostic biomarkers. Although various gene selection methods are available and some have shown excellent performance, no single method retains the best performance across all types of microarray datasets. It is therefore desirable to use a comparative approach that finds the best gene selection result for a given microarray dataset after rigorously testing different methodological strategies.
Year: 2010 PMID: 20100357 PMCID: PMC3098082 DOI: 10.1186/1471-2105-11-50
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Various gene selection procedures in FiGS. As many as 60 different gene selection procedures can be built by combining the feature selection methods, classification algorithms and various optional techniques. The feature vector addition technique is applied only when the feature selection step is restricted to up-regulated or down-regulated genes. Users can also set the range for the number of genes, though this is not shown here.
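The count of 60 can be reproduced with a short enumeration. The sketch below is only one decomposition that is consistent with the caption and with the results table: it assumes three feature selection methods (t-test, Wilcoxon rank sum test, information gain), three gene expression patterns, two classifiers (SVM, RF), optional feature discretization, and feature vector addition allowed only for the up-regulated or down-regulated patterns. The names `METHODS`, `PATTERNS`, `CLASSIFIERS` and `enumerate_procedures` are illustrative and are not FiGS identifiers.

```python
from itertools import product

# Assumed option sets; labels are illustrative, not FiGS identifiers.
METHODS = ["t-test", "Wilcoxon rank sum test", "Information gain method"]
PATTERNS = ["Total", "Up-regulated", "Down-regulated"]
CLASSIFIERS = ["SVM", "RF"]
ON_OFF = [False, True]


def enumerate_procedures():
    """List every allowed combination of method, pattern, options and classifier."""
    procedures = []
    for method, pattern, discretize, add_vector, clf in product(
        METHODS, PATTERNS, ON_OFF, ON_OFF, CLASSIFIERS
    ):
        # Feature vector addition is only meaningful when selection is
        # restricted to up-regulated or down-regulated genes.
        if add_vector and pattern == "Total":
            continue
        procedures.append(
            {
                "method": method,
                "pattern": pattern,
                "discretization": discretize,
                "vector_addition": add_vector,
                "classifier": clf,
            }
        )
    return procedures


procedures = enumerate_procedures()
print(len(procedures))  # 60
```

Under these assumptions the full grid has 72 combinations, and the restriction on feature vector addition removes 12 of them.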
The best-performing gene selection procedures identified by FiGS, with their .632+ bootstrap errors, for each of the six microarray datasets.
| Dataset | Feature selection method | k | Gene expression pattern | Feature discretization | Feature vector addition | Classifier | Error |
|---|---|---|---|---|---|---|---|
| Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not applied | Not applied | SVM | 0.02 |
| Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not applied | Not applied | RF | 0.02 |
| Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not applied | Applied | SVM | 0.02 |
| Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not applied | Applied | RF | 0.02 |
| Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Applied | Not applied | SVM | 0.02 |
| Leukemia | Information gain method | 10 | Down-regulated | Not applied | Not applied | SVM | 0.02 |
| Leukemia | Information gain method | 10 | Down-regulated | Not applied | Not applied | RF | 0.02 |
| Leukemia | Information gain method | 10 | Down-regulated | Not applied | Applied | SVM | 0.02 |
| Leukemia | Information gain method | 10 | Down-regulated | Not applied | Applied | RF | 0.02 |
| Leukemia | Information gain method | 10 | Down-regulated | Applied | Not applied | SVM | 0.02 |
| Leukemia | Information gain method | 10 | Down-regulated | Applied | Not applied | RF | 0.02 |
| Colon | Information gain method | 30 | Up-regulated | Not applied | Not applied | RF | 0.11 |
| Prostate | Information gain method | 25 | Total | Not applied | Not applied | RF | 0.05 |
| Adenocarcinoma | Wilcoxon rank sum test | 10 | Up-regulated | Not applied | Not applied | RF | 0.10 |
| Breast | Wilcoxon rank sum test | 15 | Down-regulated | Not applied | Applied | SVM | 0.31 |
| Breast | Information gain method | 15 | Down-regulated | Not applied | Applied | SVM | 0.31 |
| DLBCL | Wilcoxon rank sum test | 20 | Total | Not applied | Not applied | RF | 0.08 |
k is the number of selected genes, and Error is the .632+ bootstrap error achieved by the best-performing gene selection procedure, tested on 100 bootstrap samples. For the leukemia and breast datasets, multiple gene selection procedures tie for the best performance.
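The record does not spell out how the .632+ bootstrap error is computed. As a minimal sketch, assuming the standard Efron and Tibshirani definition, the estimate combines the resubstitution (training) error, the leave-one-out bootstrap error averaged over the bootstrap samples, and the no-information error rate. The function and argument names below are illustrative and do not come from FiGS.

```python
from collections import Counter


def no_information_rate(y_true, y_pred):
    """Error rate expected if labels and predictions were independent."""
    n = len(y_true)
    p = Counter(y_true)  # observed class frequencies
    q = Counter(y_pred)  # predicted class frequencies
    return sum((p[c] / n) * (1 - q[c] / n) for c in p)


def err632plus(resub_err, loo_boot_err, gamma):
    """Combine the three error components into the .632+ estimate.

    resub_err    -- error of the model on the data it was trained on
    loo_boot_err -- error on samples left out of each bootstrap sample,
                    averaged over all bootstrap samples
    gamma        -- no-information error rate
    """
    # Cap the out-of-sample error at the no-information rate.
    err1 = min(loo_boot_err, gamma)
    # Relative overfitting rate, kept in [0, 1].
    if err1 > resub_err and gamma > resub_err:
        r = (err1 - resub_err) / (gamma - resub_err)
    else:
        r = 0.0
    # The weight moves from 0.632 (no overfitting) to 1 (full overfitting).
    w = 0.632 / (1 - 0.368 * r)
    return (1 - w) * resub_err + w * err1
```

When the training error equals the out-of-sample error, the estimate reduces to that common value; when the model fits the training data perfectly but performs at the no-information rate on left-out samples, the estimate equals the no-information rate.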
Figure 2. The feature vector addition technique. The symbol Cij denotes the jth sample in the ith class. Darker gray represents higher expression values.
Figure 3. Box plots of the .632+ bootstrap errors obtained by different gene selection procedures for each of the six cancer microarray datasets.
Figure 4. Comparison of the best-performing gene selection procedures identified by FiGS with other gene selection approaches in terms of classification accuracy. The names of the compared gene selection procedures are abbreviated as follows: ttest, t-test; Wilcoxon, Wilcoxon rank sum test; and InfoGain, information gain method. 200 is the number of genes to select. varSelRF_SE0 and varSelRF_SE1 are two versions of varSelRF with the standard error (SE) term set to 0 and 1, respectively. FiGS_best is the best gene selection procedure identified by FiGS; it produces the best classification accuracy with the smallest number of genes. The classification accuracy shown on the y-axis is 1 minus the .632+ bootstrap error.