Mohammad Reza Sehhati, Alireza Mehri Dehnavi, Hossein Rabbani, Shaghayegh Haghjoo Javanmard.
Abstract
Numerous studies have used microarray gene expression data to extract metastasis-driving gene signatures for predicting breast cancer relapse. However, the accuracy and generality of previously introduced biomarkers are not sufficient for reliable use on independent datasets. This inadequacy is attributed to simple feature selection methods ignoring gene interactions, because modeling those interactions carries a high computational burden. In this study, an integrated approach with low computational cost was proposed for identifying a more predictive gene signature for breast cancer recurrence. First, a small set of genes was selected as a primary signature by an appropriate filter feature selection (FFS) method. Then, a binary subclass of the protein-protein interaction (PPI) network was used to expand the primary set by adding the proteins adjacent to each signature gene in the network. Next, the support vector machine-based recursive feature elimination (SVMRFE) method was applied to the expression levels of all genes in the expanded set. Finally, the genes with the highest SVMRFE scores were selected as the new biomarkers. The final biomarkers were evaluated by classifying the patients of four breast cancer datasets, comprising 800 cases, into poor- and good-prognosis cohorts. Five-fold cross-validation with a support vector machine classifier showed an improvement of more than 13% in average accuracy after the primary signatures were modified. Moreover, the method had a lower computational cost than other PPI-based methods. The proposed method thus yielded more robust and accurate biomarkers from the PPI network at low computational cost, and it could be used as a supplementary procedure in microarray studies after applying various gene selection methods.
Keywords: Breast cancer; feature selection method; protein–protein interaction; recurrence prediction; support vector machine
Year: 2013 PMID: 24098862 PMCID: PMC3788198
Source DB: PubMed Journal: J Med Signals Sens ISSN: 2228-7477
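The four-step pipeline summarized in the abstract (filter selection, PPI expansion, SVMRFE, final selection) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter is assumed to be the Wilcoxon rank-sum |z| score (Wilcoxon is one of the FFS methods named in the figure captions), the "binary" PPI network is assumed to be an unweighted edge list of gene symbols, and a small subgradient-descent linear SVM stands in for a library solver. All function names (`rank_sum_scores`, `expand_with_ppi`, `svm_rfe`, `select_signature`) are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def rank_sum_scores(X, y):
    """|z| of the Wilcoxon rank-sum statistic per gene (X: samples x genes, y: 0/1)."""
    n1 = int(np.sum(y == 1))
    n0 = len(y) - n1
    mean_w = n1 * (n0 + n1 + 1) / 2.0
    sd_w = np.sqrt(n0 * n1 * (n0 + n1 + 1) / 12.0)  # no tie correction applied
    return np.array([abs(average_ranks(X[:, j])[y == 1].sum() - mean_w) / sd_w
                     for j in range(X.shape[1])])


def expand_with_ppi(seed_genes, ppi_edges):
    """Add every direct neighbour of a seed gene in an unweighted PPI edge list."""
    seeds = set(seed_genes)
    expanded = set(seeds)
    for a, b in ppi_edges:
        if a in seeds:
            expanded.add(b)
        if b in seeds:
            expanded.add(a)
    return expanded


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Soft-margin linear SVM: minimise 0.5*||w||^2 + C*sum(hinge loss)
    by full-batch subgradient descent (the bias b is not regularised)."""
    t = np.where(y == 1, 1.0, -1.0)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = t * (X @ w + b) < 1  # samples violating the margin
        w = w - lr * (w - C * (t[active][:, None] * X[active]).sum(axis=0))
        b = b + lr * C * t[active].sum()
    return w, b


def svm_rfe(X, y, n_keep, **svm_kwargs):
    """SVM-RFE: retrain, then drop the column with the smallest squared weight."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w, _ = train_linear_svm(X[:, remaining], y, **svm_kwargs)
        remaining.pop(int(np.argmin(w ** 2)))
    return remaining  # column indices of X that survived elimination


def select_signature(X, y, genes, ppi_edges, n_primary, n_final):
    """Filter -> PPI expansion -> SVM-RFE, returning the final gene names."""
    scores = rank_sum_scores(X, y)
    primary = {genes[j] for j in np.argsort(-scores)[:n_primary]}
    expanded = expand_with_ppi(primary, ppi_edges)
    # PPI neighbours with no probe on the array cannot be scored, so drop them.
    cols = [j for j, g in enumerate(genes) if g in expanded]
    kept = svm_rfe(X[:, cols], y, n_keep=min(n_final, len(cols)))
    return [genes[cols[k]] for k in kept]
```

With real data, `X` would be a samples-by-genes expression matrix, `y` would code one prognosis cohort as 1 (an assumption here), and `n_primary` and `n_final` would be tuned by cross-validation. Removing one gene per SVM refit is the simplest RFE schedule; removing a fraction per step is a common speed-up on genome-scale inputs.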
Figure 1. A schematic view of our approach
Summary of breast cancer microarray datasets
Figure 2. Tuning of SVM parameters to obtain the best accuracy
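The entry does not say how the SVM parameters in Figure 2 were searched. One common approach, assumed here, is an exhaustive grid search scored by five-fold cross-validation accuracy. The sketch below keeps the classifier pluggable: `fit_predict(X_train, y_train, X_test, **params)` is a hypothetical callback, and for an RBF-kernel SVM the grid would typically cover `C` and `gamma` (an assumption, not a detail from the paper).

```python
import itertools
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_accuracy(X, y, fit_predict, params, k=5, seed=0):
    """Pooled k-fold CV accuracy: total correct test predictions / n samples."""
    correct = 0
    for test_idx in kfold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx], **params)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def grid_search(X, y, fit_predict, grid, k=5, seed=0):
    """Try every combination in grid (dict of name -> candidate values);
    return the first combination with the highest CV accuracy."""
    best_params, best_acc = None, -1.0
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        acc = cv_accuracy(X, y, fit_predict, params, k, seed)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```

A hypothetical call would look like `grid_search(X, y, svm_fit_predict, {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]})`. Using the same fold seed for every combination keeps the comparison between parameter settings paired.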
Supplemental Figure 1. CV accuracy versus gene signature size. The maximum accuracy on the Wang dataset was reached at a signature size of 50.
Figure 3. Classification performance (five-fold CV) on the Wang dataset before and after applying our approach to five FFS methods
Supplemental Figure 2. Improvement in classification performance, measured by AUC, after applying our approach to different FFS methods
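AUC, the measure used in Supplemental Figure 2, equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half; this is the Mann-Whitney U statistic divided by the number of positive-negative pairs. A minimal pairwise sketch follows, assuming (hypothetically) that label 1 marks the poor-prognosis cohort and that `scores` are SVM decision values.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (normalised Mann-Whitney U)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(pos) * len(neg))
```

The double loop is O(n_pos * n_neg); for large cohorts the same value can be obtained from average ranks in O(n log n).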
Robustness of gene signature performance. Each cell gives the five-fold CV accuracy of a signature that was first extracted from the source dataset by the Wilcoxon method and then expanded and tested on the destination dataset
Figure 4. Comparison of robustness across datasets for the primary signatures and the final result of our approach. The horizontal axis indicates the source datasets used to extract primary signatures with the Wilcoxon method. Bar height indicates the mean CV accuracy obtained with the primary signatures over the other three datasets
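The bar heights in Figure 4 summarize the source-by-destination accuracy table above: for each source dataset, the CV accuracies on every other dataset are averaged. The sketch below assumes the table is stored as a nested dict `acc[src][dst]` (a hypothetical layout) and also computes the per-source change between primary and final signatures; it contains no values from the paper.

```python
def mean_transfer_accuracy(acc):
    """acc[src][dst] = CV accuracy on dataset dst of a signature extracted from src.
    Returns {src: mean accuracy over every dst != src}, i.e. one bar per source."""
    means = {}
    for src, row in acc.items():
        others = [v for dst, v in row.items() if dst != src]
        means[src] = sum(others) / len(others)
    return means


def robustness_gain(primary_acc, final_acc):
    """Per-source change in mean transfer accuracy (final minus primary signatures)."""
    p = mean_transfer_accuracy(primary_acc)
    f = mean_transfer_accuracy(final_acc)
    return {src: f[src] - p[src] for src in p}
```

Excluding the diagonal (source equals destination) keeps the bars a measure of generalization to independent datasets rather than of in-sample fit.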