Dechang Chen, Zhenqiu Liu, Xiaobin Ma, Dong Hua.
Abstract
Gene selection is an important issue in analyzing multiclass microarray data. Among the many proposed selection methods, the traditional ANOVA F test statistic has been employed to identify informative genes for both class prediction (classification) and class discovery problems. However, the F test statistic assumes equal variances across classes, an assumption that may not be realistic for gene expression data. This paper explores alternative test statistics that can handle heterogeneity of the variances. We study five such test statistics, including the Brown-Forsythe test statistic and the Welch test statistic. Their performance is evaluated and compared with that of the F statistic over different classification methods applied to publicly available microarray datasets.
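To make the contrast in the abstract concrete, the following is a minimal sketch of how the F, Brown-Forsythe, and Welch statistics could be computed for a single gene and then used to rank genes. It uses the standard textbook forms of these statistics, which are assumed (not confirmed by this record) to match the definitions in the paper. The function names `variance_aware_stats` and `top_genes` are hypothetical, and the other statistics compared in the paper are not covered.

```python
import numpy as np

def variance_aware_stats(groups):
    """Return (F, B, W) for one gene.

    groups: list of 1-D arrays, one per class, holding that gene's
    expression values. Textbook formulas; assumed, not taken from the paper.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                  # number of classes
    n = np.array([g.size for g in groups])           # class sizes
    N = n.sum()                                      # total samples
    means = np.array([g.mean() for g in groups])     # class means
    var = np.array([g.var(ddof=1) for g in groups])  # class sample variances
    grand = sum(g.sum() for g in groups) / N         # overall mean
    between = np.sum(n * (means - grand) ** 2)       # between-class sum of squares

    # ANOVA F: pools the class variances, i.e. assumes they are equal.
    F = (between / (k - 1)) / (np.sum((n - 1) * var) / (N - k))

    # Brown-Forsythe: denominator weights each class variance separately.
    B = between / np.sum((1 - n / N) * var)

    # Welch: classes are weighted by n_i / s_i^2.
    w = n / var
    w_sum = w.sum()
    w_mean = np.sum(w * means) / w_sum
    lam = np.sum((1 - w / w_sum) ** 2 / (n - 1))
    W = (np.sum(w * (means - w_mean) ** 2) / (k - 1)) / (
        1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    )
    return F, B, W

def top_genes(X, y, n_top, stat_index=1):
    """Indices of the n_top genes with the largest statistic.

    X: samples x genes matrix, y: class labels.
    stat_index selects F (0), B (1), or W (2).
    """
    classes = np.unique(y)
    scores = [
        variance_aware_stats([X[y == c, j] for c in classes])[stat_index]
        for j in range(X.shape[1])
    ]
    return np.argsort(scores)[::-1][:n_top]
```

When class sizes and class variances are all equal, F and B coincide, so the statistics differ only when the data are unbalanced or heteroscedastic. The sketch assumes every class has at least two samples and nonzero variance for each gene.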
Year: 2005 PMID: 16046818 PMCID: PMC1184045 DOI: 10.1155/JBB.2005.132
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Multiclass gene expression datasets.
| Dataset | Leukemia72 | Ovarian | NCI | Lung cancer | Lymphoma |
| No of genes | 6817 | 7129 | 9703 | 918 | 4026 |
| No of samples | 72 | 39 | 60 | 73 | 96 |
| No of classes | 3 | 3 | 9 | 7 | 9 |
Performances of the test statistics with 50 informative genes.
| Dataset | F | B | W | W* | C | H |
| Leukemia | 3.4 | 2.4 | 2.8 | 2.8 | 3.2 | 3.0 |
| | 3 | 2 | 3 | 3 | 3 | 3 |
| Ovarian | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| | 0 | 0 | 0 | 0 | 0 | 0 |
| NCI | 36.0 | 32.0 | 27.4 | 26.0 | 27.0 | 35.4 |
| | 35 | 29 | 27 | 27 | 27 | 35 |
| Lung cancer | 17.6 | 17.0 | 17.6 | 17.6 | 18.0 | 18.0 |
| | 17 | 17 | 18 | 18 | 18 | 18 |
| Lymphoma | 23.8 | 19.8 | 14.0 | 14.0 | 12.8 | 22.0 |
| | 23 | 19 | 12 | 12 | 13 | 20 |
Performances of the test statistics with 100 informative genes.
| Dataset | F | B | W | W* | C | H |
| Leukemia | 3.4 | 3.0 | 3.0 | 3.0 | 3.2 | 3.0 |
| | 3 | 3 | 4 | 3 | 3 | 3 |
| Ovarian | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| | 0 | 0 | 0 | 0 | 0 | 0 |
| NCI | 33.0 | 22.6 | 23.8 | 25.2 | 25.2 | 31.6 |
| | 33 | 22 | 25 | 26 | 26 | 31 |
| Lung cancer | 12.2 | 12.2 | 11.4 | 12.2 | 12.2 | 15.8 |
| | 12 | 12 | 11 | 11 | 11 | 14 |
| Lymphoma | 21.8 | 19.2 | 13.0 | 13.8 | 14.4 | 18.2 |
| | 17 | 16 | 12 | 12 | 12 | 18 |
Performances of the test statistics with 200 informative genes.
| Dataset | F | B | W | W* | C | H |
| Leukemia | 3.0 | 3.0 | 2.4 | 2.8 | 1.8 | 2.4 |
| | 3 | 3 | 2 | 3 | 1 | 2 |
| Ovarian | 0.4 | 0.2 | 0.2 | 0.2 | 0.2 | 0.4 |
| | 0 | 0 | 0 | 0 | 0 | 0 |
| NCI | 25.6 | 22.6 | 22.6 | 22.8 | 22.2 | 25.6 |
| | 26 | 22 | 24 | 25 | 24 | 25 |
| Lung cancer | 15.2 | 12.6 | 14.2 | 13.2 | 12.8 | 13.2 |
| | 13 | 11 | 12 | 12 | 12 | 11 |
| Lymphoma | 21.2 | 18.8 | 12.0 | 12.6 | 12.8 | 16.2 |
| | 15 | 14 | 8 | 9 | 8 | 14 |
Figure 1. Relative performances of test statistics based on the average errors.
Figure 2. Relative performances of test statistics based on the median errors.
Mapping from genes selected by the Brown-Forsythe test statistic for the leukemia data to clusters of genes of interest provided by Getz et al [26].
| Gene description | Access number | Cluster by Getz et al [26] | B |
| GB DEF = T-cell antigen receptor gene T3-delta | X03934 | LG5 | 70.808014 |
| Protein tyrosine kinase related mRNA sequence | L05148 | LG5 | 43.676056 |
| CD33 CD33 antigen (differentiation antigen) | M23197 | LG1 | 42.435883 |
| GB DEF = T-lymphocyte specific protein tyrosine kinase p56lck (lck) abberant mRNA | U23852 s | LG5 | 35.120228 |
| T-cell surface glycoprotein CD3 epsilon chain precursor | M23323 s | LG5 | 35.028965 |
| CTSD (cathepsin D) (lysosomal aspartyl protease) | M63138 | LG1 | 34.865067 |
| HLA class II histocompatibility antigen, DR alpha chain precursor | X00274 | LG6 | 31.882597 |
| HLA class I histocompatibility antigen, F alpha chain precursor | X17093 | LG6 | 31.83585 |
| Leukotriene C4 synthase (LTC4S) gene | U50136 rna1 | LG1 | 31.183104 |
| RNS2 (ribonuclease 2) (eosinophil-derived neurotoxin (EDN)) | X16546 | LG1 | 29.52516 |
| TIMP2 (tissue inhibitor of metalloproteinase 2) | M32304 s | LG1 | 28.233025 |
| LMP2 gene extracted from | X66401 cds1 | LG6 | 27.11849 |