Bing Liu, Qinghua Cui, Tianzi Jiang, Songde Ma.
Abstract
BACKGROUND: Microarray experiments are becoming a powerful tool for clinical diagnosis because they can reveal gene expression patterns that are characteristic of a particular disease. To date, this problem has received the most attention in cancer research, especially in tumor classification, where a variety of feature selection methods and classifier design strategies have been used and compared. Most published articles on tumor classification apply one technique to one dataset, although several researchers have recently compared these techniques on a number of public datasets. It has been shown, however, that differently selected features reflect different aspects of a dataset, and that some selected features yield better solutions on particular problems. At the same time, given a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics of the data using traditional methods. In this paper, we introduce a combinational feature selection method, used in conjunction with ensemble neural networks, to improve the accuracy and robustness of sample classification.
Year: 2004 PMID: 15450124 PMCID: PMC522806 DOI: 10.1186/1471-2105-5-136
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The whole flow chart
Figure 2. Three cooperative and competitive neural networks
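The abstract pairs several feature selection criteria with an ensemble of classifiers. The general idea can be sketched as follows, under loud assumptions: the three scoring functions, the nearest-centroid members (a simple stand-in for the paper's neural networks), the majority vote, the toy data, and every function name are illustrative choices, not the authors' implementation.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical gene-scoring criteria for a two-class problem (labels 1 and 0).
def signal_to_noise(pos, neg):
    # Difference of class means over the sum of class standard deviations.
    return abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + 1e-9)

def mean_difference(pos, neg):
    return abs(mean(pos) - mean(neg))

def t_like(pos, neg):
    # t-like statistic built from population variances.
    var = pstdev(pos) ** 2 / len(pos) + pstdev(neg) ** 2 / len(neg)
    return abs(mean(pos) - mean(neg)) / (var ** 0.5 + 1e-9)

def top_genes(X, y, score, k):
    """Return the column indices of the k best-scoring genes."""
    scored = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scored.append((score(pos, neg), j))
    return [j for _, j in sorted(scored, reverse=True)[:k]]

def fit_centroids(X, y, genes):
    """Stand-in classifier: one mean expression vector per class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, l in zip(X, y) if l == label]
        centroids[label] = [mean(row[j] for row in rows) for j in genes]
    return centroids

def predict_centroid(centroids, genes, sample):
    def squared_distance(centroid):
        return sum((sample[j] - c) ** 2 for j, c in zip(genes, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def ensemble_predict(X, y, sample, k=1):
    """Train one member per selection criterion and take a majority vote."""
    votes = []
    for score in (signal_to_noise, mean_difference, t_like):
        genes = top_genes(X, y, score, k)
        model = fit_centroids(X, y, genes)
        votes.append(predict_centroid(model, genes, sample))
    return Counter(votes).most_common(1)[0][0]

# Toy data (made up): 4 samples x 3 genes; only gene 0 separates the classes.
X_train = [[5.0, 1.0, 0.2], [5.2, 0.9, 0.8], [1.0, 1.1, 0.5], [0.8, 1.0, 0.1]]
y_train = [1, 1, 0, 0]
prediction = ensemble_predict(X_train, y_train, [4.9, 1.0, 0.4])
```

Using an odd number of members on a two-class problem means the vote cannot tie; a multi-class dataset such as MLL_Leukemia would need both multi-class scoring and a tie-breaking rule, which this sketch omits.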
Gene expression datasets used in this paper
| Dataset | Number of genes | Training samples (per class) | Testing samples (per class) | References |
| ALL-AML Leukemia | 7129 | 38 (27:11) | 34 (20:14) | Golub et al (1999) |
| Lung Cancer | 12533 | 32 (16:16) | 149 (15:134) | Gordon et al (2002) |
| Prostate Cancer | 12600 | 102 (52:50) | 34 (25:9) | Singh et al (2002) |
| DLBCL | 4026 | 47 (24:23) | 0 | Alizadeh et al (2000) |
| Ovarian Cancer | 15154 | 253 (91:162) | 0 | Petricoin et al (2002) |
| Colon Tumor | 2000 | 62 (40:22) | 0 | Alon et al (1999) |
| MLL_Leukemia | 12582 | 57 (20:17:20) | 15 (4:3:8) | Armstrong et al (2002) |
All these datasets are downloaded from
The predictive accuracy of testing samples
| Bagged decision trees | |||
| The best methods | |||
| Our methods | |||
| LOOCV on training samples |
* Note that the row for the best methods refers to a different method for each dataset.
Figure 3. Comparing predictive accuracy on the 3 separate sets of testing samples with other methods
The predictive accuracy by LOOCV and 10-fold CV
| DLBCL | Ovarian cancer | Colon tumor | ||
| LOOCV | Other methods | — | — | |
| Our method | ||||
| 10-fold CV | Other methods | — | — | |
| Our method |
* Note that the two rows for other methods refer to the best method for each dataset, which differs between datasets.
Figure 4. Comparing predictive accuracy on the 3 datasets without testing samples with other methods
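For the datasets without separate testing samples, accuracy is estimated by leave-one-out cross-validation (LOOCV) and 10-fold cross-validation. The sketch below shows how the two relate: LOOCV is k-fold cross-validation with k equal to the number of samples. The interleaved, unshuffled, unstratified fold assignment, the nearest-mean classifier, the toy data, and all names are assumptions for illustration, not the protocol used in the paper.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; k == n_samples gives LOOCV."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx

def cross_validated_accuracy(X, y, k, fit, predict):
    """Fraction of samples classified correctly while held out."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k):
        # fit sees only this fold's training samples.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Hypothetical stand-in classifier: nearest class mean.
def fit_nearest_mean(X, y):
    return {
        label: [mean(col) for col in zip(*[r for r, l in zip(X, y) if l == label])]
        for label in set(y)
    }

def predict_nearest_mean(model, sample):
    def squared_distance(center):
        return sum((s - c) ** 2 for s, c in zip(sample, center))
    return min(model, key=lambda label: squared_distance(model[label]))

# Toy data (made up): one expression value per sample.
X = [[0.0], [0.2], [0.1], [1.0], [1.2], [1.1]]
y = [0, 0, 0, 1, 1, 1]
loocv_acc = cross_validated_accuracy(X, y, len(X), fit_nearest_mean, predict_nearest_mean)
three_fold_acc = cross_validated_accuracy(X, y, 3, fit_nearest_mean, predict_nearest_mean)

# Fold sizes for a 62-sample dataset such as Colon Tumor under 10-fold CV.
colon_fold_sizes = sorted(len(test) for _, test in k_fold_indices(62, 10))
```

Because `fit` is called inside every fold, any gene selection placed in `fit` uses only that fold's training samples, which keeps the held-out samples from influencing which genes are chosen.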