Qihua Tan, Mads Thomassen, Kirsten M Jochumsen, Ole Mogensen, Kaare Christensen, Torben A Kruse.
Abstract
Unlike significance analysis of gene expression, which looks for genes that are differentially regulated, feature selection in microarray-based prognostic gene expression analysis aims to find a subset of marker genes that are not only differentially expressed but also informative for prediction. Unfortunately, feature selection in the microarray literature is dominated by the simple heuristic univariate gene filter paradigm, which selects differentially expressed genes according to their statistical significance. We introduce a combinatory feature selection strategy that integrates differential gene expression analysis with the Gram-Schmidt process to identify prognostic genes that are both statistically significant and highly informative for predicting tumour survival outcomes. Empirical application to leukemia and ovarian cancer survival data, through within-study and cross-study validations, shows that the feature space can be substantially reduced while achieving improved testing performance.
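The two-stage idea in the abstract (a univariate significance filter followed by Gram-Schmidt ranking) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the function names (`t_statistic_filter`, `gram_schmidt_rank`, `select_genes`) are hypothetical, the outcome is assumed to be dichotomised into two classes, a Welch t-statistic stands in for whatever significance test the paper uses, and the Gram-Schmidt step is written as an unsupervised ranking by residual variance.

```python
import numpy as np

def t_statistic_filter(X, y, n_keep):
    """Keep the n_keep genes with the largest absolute Welch t-statistic.

    X is an (n_samples, n_genes) expression matrix and y holds binary
    class labels (0/1). The choice of test is an assumption of this sketch.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    # Genes with zero standard error get t = 0 instead of a division error.
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.where(se > 0, se, np.inf)
    return np.argsort(-np.abs(t))[:n_keep]

def gram_schmidt_rank(X, n_select, tol=1e-12):
    """Rank columns of X by residual variance using Gram-Schmidt steps.

    At each step the column with the largest remaining squared norm is
    picked, then its direction is projected out of every column, so
    genes that merely duplicate earlier picks fall to the bottom.
    """
    R = X - X.mean(axis=0)                    # centred working copy
    picked = np.zeros(R.shape[1], dtype=bool)
    order = []
    for _ in range(min(n_select, R.shape[1])):
        norms = np.einsum("ij,ij->j", R, R)   # squared column norms
        norms[picked] = -1.0                  # never pick a column twice
        j = int(np.argmax(norms))
        if norms[j] <= tol:                   # nothing informative is left
            break
        picked[j] = True
        order.append(j)
        q = R[:, j] / np.sqrt(norms[j])       # unit vector of the picked gene
        R = R - np.outer(q, q @ R)            # remove that direction everywhere
    return order

def select_genes(X, y, n_filter, n_select):
    """Filter by significance, then rank the survivors by Gram-Schmidt."""
    kept = t_statistic_filter(X, y, n_filter)
    order = gram_schmidt_rank(X[:, kept], n_select)
    return kept[order]                        # indices into the original X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 200))            # synthetic data, not from the paper
    y = np.repeat([0, 1], 20)
    X[y == 1, :5] += 2.0                      # five artificially shifted genes
    print(select_genes(X, y, n_filter=20, n_select=5))
```

The point of the second stage is that a gene which is highly significant but nearly collinear with an already selected gene contributes little residual variance, so it is ranked low even though a purely univariate filter would keep it.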
Year: 2009 PMID: 19956419 PMCID: PMC2777003 DOI: 10.1155/2009/480486
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Flow diagram of the combinatory procedure. Combining gene filtering by supervised analysis with gene ranking by unsupervised analysis assists the optimization step in identifying a subset of prognostic genes for predicting the outcomes of an independent testing set.
Figure 2. Prediction results for within-study cross-validation analysis of three cancer data sets. Panels (a), (c) and (e) show the SVM probability for each testing sample, with censored observations as empty circles and uncensored observations as solid circles. Panels (b), (d) and (f) show the Kaplan-Meier survival curves for the predicted favourable (solid) and unfavourable (dashed) groups. Panels (a) and (b) are for the adult acute myeloid leukemia data; (c) and (d) for the published ovarian cancer data; (e) and (f) for the in-house ovarian cancer data.
Figure 3. Prediction results for cross-study validation analysis of the two ovarian cancer data sets. Panel (a) shows the LOO SVM probability for each sample in the in-house data, with censored observations as empty circles and uncensored observations as solid circles. Panel (b) shows the Kaplan-Meier survival curves for the predicted favourable (solid) and unfavourable (dashed) groups. Results from analysis using genes ranked by their statistical significance are shown in (c) and (d).