Alireza Mehri Dehnavi, Mohammad Reza Sehhati, Hossein Rabbani.
Abstract
Primary tumor gene expression has been shown to be capable of identifying metastasis-driving gene markers for predicting breast cancer recurrence (BCR). However, difficulties in the analysis of microarray data have led to poor predictive power and inconsistency among previously introduced gene signatures. In this study, a hybrid method is proposed for identifying more predictive gene signatures from microarray datasets. First, the parameters of a rough-set (RS) theory based feature selection method were tuned to construct a customized gene extraction algorithm. Next, the RS gene selection method was used to select the most informative genes from six independent breast cancer datasets. The combined set of these six signatures, containing 114 genes, was then evaluated for prediction of BCR. Finally, a meta-signature containing 18 genes was selected from the combination of datasets, and its prediction accuracy was compared with that of the combined signature. The results of a 10-fold cross-validation test showed an acceptable misclassification error rate (MCR) over 1338 breast cancer cases. Compared with a recent similar work, our approach achieved a reduction of more than 5% in MCR while using fewer genes for prediction. The results also showed a 7% improvement in average accuracy across the six datasets when using the combined set of 114 genes rather than the 18-gene meta-signature. In summary, a more informative gene signature for prediction of BCR was selected using an RS-based gene extraction algorithm, and combining different signatures gave more stable prediction across independent datasets.
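The misclassification error rate under 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the nearest-centroid classifier is only a stand-in (it is not the classifier used in the study), the two-feature synthetic data are not real expression values, and pooling errors over all folds is one common convention that the abstract does not specify.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_validated_mcr(X, y, k=10, seed=0):
    """MCR = misclassified held-out samples / all samples, pooled over folds."""
    errors = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit(train_X, train_y)
        errors += sum(nearest_centroid_predict(model, X[i]) != y[i]
                      for i in test_idx)
    return errors / len(X)

# Synthetic two-class data (hypothetical, not from any of the six datasets).
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50
mcr = cross_validated_mcr(X, y)
print(f"10-fold CV MCR: {mcr:.3f}")
```

Each sample is held out exactly once, so the pooled error count divided by the number of samples gives a single MCR for the dataset. A per-fold average would be an equally reasonable variant.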
Keywords: Breast cancer recurrence prediction; gene expression signature; meta-signature; rough-set theory
Year: 2013 PMID: 24098861 PMCID: PMC3788197
Source DB: PubMed Journal: J Med Signals Sens ISSN: 2228-7477
Summary of breast cancer microarray datasets
Supplementary Figure 1. The adaptive neuro-fuzzy inference system model structure. In this structure there are 16 Gaussian membership functions for input nodes and two Gaussian MFs for middle and output nodes. Settings for fuzzy inference system are: And = “prod”; Or = “probor”; Defuzzifier = “wtaver”; Implication = “prod”; Aggregation = “sum”
Supplementary Figure 2. Performance evaluation in 100 epochs of training
Misclassification error rate of 10-fold independent cross-validation in six breast cancer studies
Misclassification error rate of 10-fold independent cross-validation in six breast cancer studies with combined set and meta-selected genes
Misclassification error rate of 10-fold cross-validation in Wang dataset (GSE2034) with Li gene sets (NRCx)
Figure 1. Kaplan-Meier relapse-free survival curves. (a) Classified samples on Wang dataset with the 114-gene combined signature; (b) Classified samples on combined dataset of 1338 samples with the 114-gene combined signature
Supplementary Figure 3. KM plot for classified samples of dataset GSE3494 using 23 genes extracted from this dataset (P = 3e-05)
Supplementary Figure 8. KM plot for classified samples of dataset GSE6532 using 114 genes from six datasets (P = 0.09)
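The Kaplan-Meier (KM) curves named in the captions above are built from the product-limit estimator. The sketch below shows that estimator under stated assumptions: the follow-up times and relapse indicators are hypothetical values chosen for illustration, and the function name `kaplan_meier` is not taken from the paper.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of relapse-free survival.

    times  : follow-up time of each patient
    events : 1 if relapse was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct relapse time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses > 0:
            # Multiply by the conditional probability of staying relapse-free.
            survival *= 1.0 - relapses / n_at_risk
            curve.append((t, survival))
        # Relapsed and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve

# Hypothetical follow-up data (e.g. years), not from any study dataset.
times = [2, 3, 3, 5, 8, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

In a plot such as those in the captions, one curve of this kind would be drawn for each predicted group (for example, predicted relapse versus predicted relapse-free). The P values quoted in the captions are presumably from a test comparing the two curves, which this sketch does not cover.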