Qihua Tan, Mads Thomassen, Torben A Kruse.
Abstract
Among the major issues in gene expression profile classification, feature selection is a necessary step in building good classification rules, given the high dimensionality of microarray data. Although different feature selection methods have been reported, no method has been proposed specifically for paired microarray experiments. In this paper, we introduce a simple procedure based on a modified t-statistic for feature selection in microarray experiments that use the popular matched case-control design, and apply it to our recent study on tumor metastasis in a low-malignant group of breast cancer patients to select the genes that best predict metastasis. Gene (feature) selection is optimized by thresholding in a leave-one-pair-out cross-validation. Model comparison through empirical application shows that our method achieves improved efficiency with high sensitivity and specificity.
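The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the t-statistic is modified, so the stabilising constant `s0` is an assumption (with `s0 = 0` the score is the ordinary paired t-statistic), and the names `paired_score`, `select_genes`, `leave_one_pair_out`, and `select_per_fold` are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_score(case, control, s0=0.0):
    """Paired t-like score for one gene.

    case[i] and control[i] are the expression values of matched pair i.
    s0 is an ASSUMED stabilising constant added to the denominator;
    s0 = 0 gives the ordinary paired t-statistic. If all differences are
    identical and s0 = 0, the denominator is zero and this raises
    ZeroDivisionError.
    """
    diffs = [c - k for c, k in zip(case, control)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / (se + s0)


def select_genes(case_rows, control_rows, threshold, s0=0.0):
    """Return indices of genes whose absolute score reaches the threshold."""
    return [
        g
        for g, (case, control) in enumerate(zip(case_rows, control_rows))
        if abs(paired_score(case, control, s0)) >= threshold
    ]


def leave_one_pair_out(n_pairs):
    """Yield (training pair indices, held-out pair index) for each fold."""
    for held_out in range(n_pairs):
        train = [i for i in range(n_pairs) if i != held_out]
        yield train, held_out


def select_per_fold(case_rows, control_rows, threshold, s0=0.0):
    """Redo gene selection on the training pairs of every fold.

    Rows are genes, columns are matched pairs. The held-out pair never
    influences which genes are selected for its own fold.
    """
    n_pairs = len(case_rows[0])
    folds = []
    for train, held_out in leave_one_pair_out(n_pairs):
        case_train = [[row[i] for i in train] for row in case_rows]
        control_train = [[row[i] for i in train] for row in control_rows]
        selected = select_genes(case_train, control_train, threshold, s0)
        folds.append((held_out, selected))
    return folds
```

To tune the threshold, one would repeat `select_per_fold` over a grid of candidate thresholds, train a classifier on each fold's selected genes, predict the held-out pair, and keep the threshold with the best cross-validated accuracy. Holding out both members of a pair together keeps the matched design intact.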
Keywords: feature selection; gene expression microarray; metastasis; prediction
Year: 2007 PMID: 19455244 PMCID: PMC2675839
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Probability of metastasis calculated by SVM using leave-one-pair-out cross-validation, based on the 32-gene signature by PAM (1a), the 5-gene signature by our new method (1b), and the 43-gene signature by the paired t-test (1c), for the 13 pairs of low-malignant T1 (asterisk) and 17 pairs of low-malignant T2 (triangle) patients. The best performance is achieved by our 5-gene signature, with improved prediction accuracy and better separation.
Figure 2. ROC analysis for model comparison, with dotted curves for the new method in black, PAM in red, and the paired t-test in green. Because the black curve runs above the others in the upper-left triangle of the figure, our new method exhibits higher efficiency. Its AUC (0.86) exceeds those of PAM (AUC = 0.83) and the paired t-test (AUC = 0.80).
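For readers unfamiliar with the AUC values quoted in the Figure 2 caption, the following sketch shows one standard way to compute an AUC from predicted probabilities. It uses the rank-based definition: the probability that a randomly chosen metastasis case is scored above a randomly chosen non-metastasis case. The function name `roc_auc` and the toy scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted probabilities of metastasis.
    labels: 1 for metastasis, 0 for no metastasis.
    Each (positive, negative) pair counts 1 if the positive is scored
    higher, 0.5 if tied, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): three of the four positive/negative
# pairs are ordered correctly, so the AUC is 0.75.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the three reported values should be read.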
Information on the 5 selected genes.
| Gene | Accession | Description | Function |
|---|---|---|---|
| FLJ20354 | NM_017779 | Hypothetical protein FLJ20354, mRNA | Intracellular signaling cascade |
| IMAGE:4081483 | BC005998 | Clone IMAGE:4081483, mRNA | Unknown |
| UBE2R2 | NM_017811 | Ubiquitin-conjugating enzyme E2R 2, mRNA | Ligase activity; ubiquitin conjugating enzyme activity; ubiquitin cycle; ubiquitin-ligase activity |
| ZNF533 | NM_152520 | Zinc finger protein 533 | Unknown |
| DTL | NM_016448 | Denticleless homolog | Unknown |