
An Ensemble Framework Coping with Instability in the Gene Selection Process.

José A Castellanos-Garzón, Juan Ramos, Daniel López-Sánchez, Juan F de Paz, Juan M Corchado.

Abstract

This paper proposes an ensemble framework for gene selection that addresses the instability problems arising in the gene filtering task. Selecting genes from gene expression data is a complex process, and different filter methods yield different informative gene subsets, which makes it difficult for experts to identify significant genes. Instability in the results can stem from the filter methods, the gene classifier methods, different datasets of the same disease, and the existence of multiple valid groups of biomarkers. Although a large number of proposals exist, the complexity of this problem remains a challenge. This work proposes a framework involving five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. The framework performs stable feature selection that addresses the problems above and thus provides a more suitable and reliable solution for clinical and research purposes. Our proposal is a multistage gene filtering process that incorporates several ensemble strategies for gene selection, so that different classifiers simultaneously assess gene subsets to counter instability. First, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability with respect to filter methods). Next, we apply an ensemble of known classifiers to retain the genes that are relevant to all classifiers at once (stability with respect to classification methods). The results were evaluated on two different datasets of the same disease (pancreatic ductal adenocarcinoma) in search of stability with respect to the disease, and promising results were achieved.
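
The two ensemble steps named in the abstract (an ensemble of filters for diversity, then an ensemble of classifiers that must all find a gene relevant) can be illustrated with a minimal sketch. The abstract does not specify the five stages, the filter methods, or the classifiers, so everything here is an assumption made for illustration: the two scoring functions, the two single-gene classifiers, the parameters `k` and `min_acc`, the toy expression values, and the use of the Jaccard index to compare subsets found on two datasets. It is not the authors' implementation.

```python
from statistics import mean, pstdev

def _split(values, labels):
    """Split one gene's expression values by binary class label (0 or 1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return a, b

def mean_diff_score(values, labels):
    """Hypothetical filter 1: absolute difference between class means."""
    a, b = _split(values, labels)
    return abs(mean(a) - mean(b))

def signal_to_noise_score(values, labels):
    """Hypothetical filter 2: mean difference scaled by within-class spread."""
    a, b = _split(values, labels)
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)

def top_k(expr, labels, score, k):
    """Return the k genes ranked highest by one filter."""
    ranked = sorted(expr, key=lambda g: score(expr[g], labels), reverse=True)
    return set(ranked[:k])

def centroid_accuracy(values, labels):
    """Hypothetical classifier 1: nearest class centroid on a single gene."""
    a, b = _split(values, labels)
    ca, cb = mean(a), mean(b)
    preds = [0 if abs(v - ca) <= abs(v - cb) else 1 for v in values]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def loo_1nn_accuracy(values, labels):
    """Hypothetical classifier 2: leave-one-out 1-nearest-neighbour on a single gene."""
    hits = 0
    for i, v in enumerate(values):
        others = [j for j in range(len(values)) if j != i]
        nearest = min(others, key=lambda j: abs(values[j] - v))
        hits += labels[nearest] == labels[i]
    return hits / len(values)

def ensemble_select(expr, labels, k=2, min_acc=0.8):
    """Union of top-k genes over all filters, kept only if every classifier agrees."""
    filters = [mean_diff_score, signal_to_noise_score]
    classifiers = [centroid_accuracy, loo_1nn_accuracy]
    candidates = set().union(*(top_k(expr, labels, f, k) for f in filters))
    return {g for g in candidates
            if all(clf(expr[g], labels) >= min_acc for clf in classifiers)}

def jaccard(a, b):
    """Overlap of two gene subsets, e.g. subsets selected on two datasets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Toy data (invented): gene -> expression per sample; first three samples are class 0.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "G1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],    # well separated
    "G2": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],    # uninformative
    "G3": [0.0, 4.0, 8.0, 6.0, 10.0, 14.0],  # large mean gap, overlapping classes
    "G4": [2.0, 2.0, 2.1, 2.5, 2.6, 2.5],    # small gap, very tight classes
}
selected = ensemble_select(expr, labels)
print(sorted(selected))
```

In this toy example the two filters rank the genes differently, so their union is more diverse than either list alone, and the classifier stage then discards the candidate whose classes overlap. Requiring agreement from every classifier is a deliberately strict choice; a majority vote would be a reasonable alternative under the same assumptions.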


Keywords:  Data mining; Ensemble method; Filter method; Gene expression data; Gene selection; Machine learning; Wrapper method


Year:  2018        PMID: 29313209     DOI: 10.1007/s12539-017-0274-z

Source DB:  PubMed          Journal:  Interdiscip Sci        ISSN: 1867-1462            Impact factor:   2.233


  2 in total

1.  Machine learning selected smoking-associated DNA methylation signatures that predict HIV prognosis and mortality.

Authors:  Xinyu Zhang; Ying Hu; Bradley E Aouizerat; Gang Peng; Vincent C Marconi; Michael J Corley; Todd Hulgan; Kendall J Bryant; Hongyu Zhao; John H Krystal; Amy C Justice; Ke Xu
Journal:  Clin Epigenetics       Date:  2018-12-13       Impact factor: 6.551

2.  Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches.

Authors:  Cindy Perscheid; Bastien Grasnick; Matthias Uflacker
Journal:  J Integr Bioinform       Date:  2018-12-22
