Shib Sankar Bhowmick¹, Debotosh Bhattacharjee², Luis Rato³
¹ Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata 700107, India. shibsankar.bhowmick@heritageit.edu
² Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India.
³ Department of Informatics, University of Evora, 7004-516 Evora, Portugal.
Abstract
BACKGROUND: Recent advances in bioinformatics make it possible to identify informative genes from high-dimensional gene expression data, and selecting such genes from these large datasets has become a major concern among researchers. OBJECTIVE: Analysis of gene expression data can shed light on gene functionality and regulatory mechanisms. Here, we present a computational method to identify informative genes for breast cancer subtypes such as Basal, human epidermal growth factor receptor 2 (Her2), luminal A (LumA), and luminal B (LumB). METHODS: The proposed In Silico Markers method is a wrapper feature selection method based on the Least Absolute Shrinkage and Selection Operator (LASSO) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), with a Support Vector Machine (SVM) as the classifier. A composite measure consisting of the relevance, redundancy, and rank score of frequently appearing genes is then used to select the informative genes. RESULTS: The informative genes are validated using statistical and biologically relevant criteria. For a comparative evaluation of the proposed approach, a biological similarity score based on the semantic similarity of GO terms is investigated. Further, the proposed technique is compared with 7 existing gene selection techniques on two-class annotated breast cancer subtype datasets. CONCLUSION: This method can facilitate the discovery of informative genes. Furthermore, in a multiple-criteria decision-making setting, the informative genes selected by In Silico Markers are found to be superior to those selected by the compared methods.
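The abstract does not give formulas for the composite measure, so the following Python sketch is only a rough illustration under hypothetical definitions, not the In Silico Markers implementation. It assumes that relevance is the absolute Pearson correlation of a gene with a binary subtype label, redundancy is the mean absolute correlation with the other candidate genes, and rank score is the fraction of repeated wrapper runs in which the gene was selected. The function names `pearson` and `composite_scores`, the weights `w`, and the `min_freq` threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def composite_scores(expr, labels, runs, w=(1.0, 1.0, 1.0), min_freq=1):
    """Rank candidate genes by a hypothetical composite of relevance, redundancy and rank.

    expr:   dict mapping gene -> expression values across samples
    labels: binary class labels per sample (e.g. one subtype vs. the rest)
    runs:   gene subsets returned by repeated wrapper runs (assumed input)
    """
    # Count how many runs selected each gene; keep only frequently appearing genes.
    freq = Counter(g for run in runs for g in set(run))
    candidates = sorted(g for g, c in freq.items() if c >= min_freq)
    scores = {}
    for g in candidates:
        relevance = abs(pearson(expr[g], labels))
        others = [h for h in candidates if h != g]
        redundancy = (
            sum(abs(pearson(expr[g], expr[h])) for h in others) / len(others)
            if others
            else 0.0
        )
        rank = freq[g] / len(runs)
        # Reward relevance and selection frequency, penalise redundancy.
        scores[g] = w[0] * relevance - w[1] * redundancy + w[2] * rank
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [0, 0, 0, 1, 1, 1],  # perfectly tracks the label
    "g2": [1, 0, 1, 0, 1, 0],  # weakly related
    "g3": [0, 0, 1, 1, 1, 1],  # partly redundant with g1
}
runs = [["g1", "g2"], ["g1", "g3"], ["g1"]]
ranking = composite_scores(expr, labels, runs)
print([g for g, _ in ranking])
```

With equal weights, the gene that both tracks the label and recurs in every run ranks first; in practice the weights and the redundancy definition would need to match whatever the authors actually used.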
Keywords:
Biological analysis; Breast cancer subtype; Gene selection; Messenger RNA; Statistical analysis