
Feature selection for splice site prediction: a new method using EDA-based feature ranking.

Yvan Saeys, Sven Degroeve, Dirk Aeyels, Pierre Rouzé, Yves Van de Peer.

Abstract

BACKGROUND: Identifying the relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Feature selection also allows a classification system to attain equally good or even better solutions using a restricted subset of features, and it speeds up classification. Robust methods for fast feature selection are therefore of key importance in extracting knowledge from complex biological data.
RESULTS: In this paper we present a novel method for feature subset selection based on estimation of distribution algorithms (EDAs), a more general framework that includes genetic algorithms. A feature ranking is derived from the distribution estimated by the algorithm, and this ranking is then used to iteratively discard features. We apply the technique to the problem of splice site prediction and show how it can be used to gain insight into the underlying biological process of splicing.
CONCLUSION: We show that this technique is more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as EDAs normally do), the method provides a dynamic view of the feature selection process, like the traditional sequential wrapper methods. However, it is faster than these traditional techniques and scales better to datasets described by a large number of features.
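
The procedure outlined in the abstract (estimate a distribution over feature subsets, read a feature ranking from it, then iteratively discard features) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the univariate (UMDA-style) probability model, the Laplace smoothing, the nearest-centroid fitness function, the synthetic data, and all function names and parameter values.

```python
import random

def make_data(n_samples=120, n_features=10, informative=(0, 1, 2), seed=0):
    """Synthetic two-class data (illustrative): only `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Fitness: training accuracy of a nearest-centroid classifier on the masked features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X, y):
        d0 = sum((x[j] - m) ** 2 for j, m in zip(idx, cents[0]))
        d1 = sum((x[j] - m) ** 2 for j, m in zip(idx, cents[1]))
        correct += int((0 if d0 <= d1 else 1) == t)
    return correct / len(X)

def estimate_marginals(X, y, active, rng, pop_size=20, n_select=8, iterations=4):
    """Univariate EDA: returns, per active feature, the estimated probability of selection."""
    n_features = len(X[0])
    p = {j: 0.5 for j in active}
    for _ in range(iterations):
        population = []
        for _ in range(pop_size):
            mask = [0] * n_features
            for j in active:
                mask[j] = 1 if rng.random() < p[j] else 0
            population.append((centroid_accuracy(X, y, mask), mask))
        population.sort(key=lambda pair: pair[0], reverse=True)
        best = [mask for _, mask in population[:n_select]]
        for j in active:
            # Laplace smoothing (an assumption) keeps probabilities away from 0 and 1.
            p[j] = (sum(m[j] for m in best) + 1) / (n_select + 2)
    return p

def eda_ranking_elimination(X, y, n_keep=3, seed=1):
    """Repeatedly rank features by their marginal and discard the lowest-ranked one."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    discarded = []  # features in the order they were removed
    while len(active) > n_keep:
        p = estimate_marginals(X, y, active, rng)
        worst = min(active, key=lambda j: p[j])
        active.remove(worst)
        discarded.append(worst)
    return active, discarded

if __name__ == "__main__":
    X, y = make_data()
    kept, discarded = eda_ranking_elimination(X, y)
    print("kept:", kept)
    print("discard order:", discarded)
```

The discard order is what gives the "dynamic view" mentioned in the conclusion: rather than a single best subset, the sketch records when each feature drops out. Removing one feature per round is a simplification chosen here; a different number could be removed at each step.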

Year:  2004        PMID: 15154966      PMCID: PMC421631          DOI: 10.1186/1471-2105-5-64

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  9 in total

1.  Pre-mRNA splicing in higher plants. (Review)

Authors:  Z J Lorković; D A Wieczorek Kirk; M H Lambermon; W Filipowicz
Journal:  Trends Plant Sci       Date:  2000-04       Impact factor: 18.313

2.  The equation for response to selection and its use for prediction.

Authors:  H Mühlenbein
Journal:  Evol Comput       Date:  1997       Impact factor: 3.277

3.  Allosteric cascade of spliceosome activation. (Review)

Authors:  David A Brow
Journal:  Annu Rev Genet       Date:  2002-06-11       Impact factor: 16.830

4.  Feature subset selection for splice site prediction.

Authors:  Sven Degroeve; Bernard De Baets; Yves Van de Peer; Pierre Rouzé
Journal:  Bioinformatics       Date:  2002       Impact factor: 6.937

5.  Current methods of gene prediction, their strengths and weaknesses. (Review)

Authors:  Catherine Mathé; Marie-France Sagot; Thomas Schiex; Pierre Rouzé
Journal:  Nucleic Acids Res       Date:  2002-10-01       Impact factor: 16.971

6.  Computational prediction of eukaryotic protein-coding genes. (Review)

Authors:  Michael Q Zhang
Journal:  Nat Rev Genet       Date:  2002-09       Impact factor: 53.242

7.  Fast feature selection using a simple estimation of distribution algorithm: a case study on splice site prediction.

Authors:  Yvan Saeys; Sven Degroeve; Dirk Aeyels; Yves Van De Peer; Pierre Rouzé
Journal:  Bioinformatics       Date:  2003-10       Impact factor: 6.937

8.  Sequence information for the splicing of human pre-mRNA identified by support vector machine classification.

Authors:  Xiang H-F Zhang; Katherine A Heller; Ilana Hefter; Christina S Leslie; Lawrence A Chasin
Journal:  Genome Res       Date:  2003-12       Impact factor: 9.043

9.  Scanning and competition between AGs are involved in 3' splice site selection in mammalian introns.

Authors:  C W Smith; T T Chu; B Nadal-Ginard
Journal:  Mol Cell Biol       Date:  1993-08       Impact factor: 4.272

  8 in total

1.  Aberrant 3' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization.

Authors:  Igor Vorechovský
Journal:  Nucleic Acids Res       Date:  2006-09-08       Impact factor: 16.971

2.  Enhancing protein-vitamin binding residues prediction by multiple heterogeneous subspace SVMs ensemble.

Authors:  Dong-Jun Yu; Jun Hu; Hui Yan; Xi-Bei Yang; Jing-Yu Yang; Hong-Bin Shen
Journal:  BMC Bioinformatics       Date:  2014-09-05       Impact factor: 3.169

3.  A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data.

Authors:  Prabina Kumar Meher; Tanmaya Kumar Sahu; Atmakuri Ramakrishna Rao; Sant Dass Wahi
Journal:  BMC Bioinformatics       Date:  2014-11-25       Impact factor: 3.169

4.  Feature Subset Selection for Cancer Classification Using Weight Local Modularity.

Authors:  Guodong Zhao; Yan Wu
Journal:  Sci Rep       Date:  2016-10-05       Impact factor: 4.379

5.  A robust hybrid approach based on estimation of distribution algorithm and support vector machine for hunting candidate disease genes.

Authors:  Li Li; Hongmei Chen; Chang Liu; Fang Wang; Fangfang Zhang; Lihua Bai; Yihan Chen; Luying Peng
Journal:  ScientificWorldJournal       Date:  2013-02-07

6.  A review of estimation of distribution algorithms in bioinformatics.

Authors:  Rubén Armañanzas; Iñaki Inza; Roberto Santana; Yvan Saeys; Jose Luis Flores; Jose Antonio Lozano; Yves Van de Peer; Rosa Blanco; Víctor Robles; Concha Bielza; Pedro Larrañaga
Journal:  BioData Min       Date:  2008-09-11       Impact factor: 2.522

7.  A survey on evolutionary algorithm based hybrid intelligence in bioinformatics. (Review)

Authors:  Shan Li; Liying Kang; Xing-Ming Zhao
Journal:  Biomed Res Int       Date:  2014-03-06       Impact factor: 3.411

8.  Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm.

Authors:  Kun-Huang Chen; Kung-Jeng Wang; Min-Lung Tsai; Kung-Min Wang; Angelia Melani Adrian; Wei-Chung Cheng; Tzu-Sen Yang; Nai-Chia Teng; Kuo-Pin Tan; Ku-Shang Chang
Journal:  BMC Bioinformatics       Date:  2014-02-20       Impact factor: 3.169

