
Feature subset selection for splice site prediction.

Sven Degroeve, Bernard De Baets, Yves Van de Peer, Pierre Rouzé.

Abstract

MOTIVATION: The large number of available annotated Arabidopsis thaliana sequences makes it possible to induce splice site prediction models with supervised learning algorithms (see Haussler (1998) for a review and references). These algorithms need information sources, or features, from which the models can be computed. For splice site prediction, the features we consider in this study are the presence or absence of certain nucleotides in close proximity to the splice site. Since it is not known how many nucleotides, or which ones, are relevant for splice site prediction, the feature set is chosen large enough that all relevant information sources are very likely to be included. Using only the features that are relevant for constructing a splice site prediction system might improve the system and might also provide useful biological knowledge. Using fewer features will, of course, also improve the prediction speed of the system.
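To make the "presence or absence of certain nucleotides" features concrete, here is a minimal sketch, assuming each candidate splice site is represented by a fixed-length sequence window and each feature means "nucleotide X occurs at position i". The function names `encode_window` and `feature_names` and the position-labelling convention are hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch: binary positional nucleotide features for a window
# of sequence around a candidate splice site (not the authors' implementation).
NUCLEOTIDES = "ACGT"


def encode_window(window: str) -> list[int]:
    """Return a 4 * len(window) binary vector.

    Feature 4*i + j is 1 if window[i] == NUCLEOTIDES[j], else 0.
    Ambiguous bases such as 'N' yield four zeros at that position.
    """
    features = []
    for base in window.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def feature_names(window_length: int, first_position: int) -> list[str]:
    """Human-readable names in the same order as encode_window's output.

    first_position is an assumed offset relative to the splice site,
    e.g. -1 for the nucleotide just upstream of it.
    """
    return [
        f"{pos}:{n}"
        for pos in range(first_position, first_position + window_length)
        for n in NUCLEOTIDES
    ]


if __name__ == "__main__":
    print(encode_window("AGGT"))
    print(feature_names(2, -1))
```

With a window of L nucleotides this yields 4L candidate features, which is why the candidate set grows quickly and a feature subset selection step becomes attractive.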
RESULTS: A wrapper-based feature subset selection algorithm, using either a support vector machine or a naive Bayes prediction method, was evaluated against the traditional method for selecting features relevant for splice site prediction. Our results show that the features selected by this wrapper approach improve performance compared with using all features and compared with using the features selected by the traditional method.

AVAILABILITY: The data and additional interactive graphs on the selected feature subsets are available at http://www.psb.rug.ac.be/gps
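A wrapper method scores each candidate feature subset by training and evaluating the prediction method itself on that subset. The sketch below illustrates this with a Bernoulli naive Bayes classifier and greedy sequential backward elimination scored by validation accuracy. The search strategy, the Laplace smoothing, the accuracy criterion and all function names are assumptions chosen for illustration; the abstract does not specify these details, so this is not presented as the authors' algorithm.

```python
# Hypothetical sketch of a wrapper feature subset selection loop:
# Bernoulli naive Bayes + greedy backward elimination (assumed strategy).
import math


def train_bernoulli_nb(X, y, features, alpha=1.0):
    """Fit naive Bayes on binary features restricted to `features`.

    Returns {class: (log_prior, {f: (log P(x_f=1|c), log P(x_f=0|c))})}.
    """
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        probs = {}
        for f in features:
            ones = sum(r[f] for r in rows)
            p = (ones + alpha) / (n + 2 * alpha)  # Laplace-smoothed estimate
            probs[f] = (math.log(p), math.log(1 - p))
        model[c] = (math.log(n / len(X)), probs)
    return model


def predict_nb(model, x, features):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(probs[f][0] if x[f] else probs[f][1] for f in features)
    return max(model, key=score)


def accuracy(features, train, valid):
    """Wrapper criterion: train on `train`, measure accuracy on `valid`."""
    (X_tr, y_tr), (X_va, y_va) = train, valid
    model = train_bernoulli_nb(X_tr, y_tr, features)
    correct = sum(predict_nb(model, x, features) == t for x, t in zip(X_va, y_va))
    return correct / len(y_va)


def backward_elimination(n_features, train, valid, min_features=1):
    """Repeatedly drop the feature whose removal gives the best validation
    accuracy; return the best (subset, accuracy) seen, preferring smaller
    subsets on ties."""
    current = list(range(n_features))
    best_subset, best_acc = list(current), accuracy(current, train, valid)
    while len(current) > min_features:
        acc, drop = max(
            (accuracy([f for f in current if f != g], train, valid), g)
            for g in current
        )
        current.remove(drop)
        if acc >= best_acc:
            best_subset, best_acc = list(current), acc
    return best_subset, best_acc


if __name__ == "__main__":
    # Toy data: feature 0 determines the label, features 1 and 2 are noise.
    X = [[1, 0, 0], [1, 1, 1], [1, 0, 1], [1, 1, 0],
         [0, 0, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    print(backward_elimination(3, (X, y), (X, y)))
```

On the toy data the loop discards the two noise features and keeps only feature 0. In practice the validation set should be separate from the training data, and a support vector machine could be plugged into `accuracy` in place of naive Bayes; each evaluation retrains the model, which is what makes wrapper methods costly on large feature sets.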


Year: 2002    PMID: 12385987    DOI: 10.1093/bioinformatics/18.suppl_2.s75

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  12 in total

1.  Toward more efficient protein expression: keep the message simple.

Authors:  Stephan Kalwy; James Rance; Robert Young
Journal:  Mol Biotechnol       Date:  2006-10       Impact factor: 2.695

2.  Machine learning applications in genetics and genomics. (Review)

Authors:  Maxwell W Libbrecht; William Stafford Noble
Journal:  Nat Rev Genet       Date:  2015-05-07       Impact factor: 53.242

3.  An in silico approach for screening flavonoids as P-glycoprotein inhibitors based on a Bayesian-regularized neural network.

Authors:  Yong-Hua Wang; Yan Li; Sheng-Li Yang; Ling Yang
Journal:  J Comput Aided Mol Des       Date:  2005-03       Impact factor: 3.686

4.  Feature selection for splice site prediction: a new method using EDA-based feature ranking.

Authors:  Yvan Saeys; Sven Degroeve; Dirk Aeyels; Pierre Rouzé; Yves Van de Peer
Journal:  BMC Bioinformatics       Date:  2004-05-21       Impact factor: 3.169

5.  Three-class classification models of logS and logP derived by using GA-CG-SVM approach.

Authors:  Hui Zhang; Ming-Li Xiang; Chang-Ying Ma; Qi Huang; Wei Li; Yang Xie; Yu-Quan Wei; Sheng-Yong Yang
Journal:  Mol Divers       Date:  2009-01-31       Impact factor: 3.364

6.  Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites.

Authors:  Peter Meinicke; Maike Tech; Burkhard Morgenstern; Rainer Merkl
Journal:  BMC Bioinformatics       Date:  2004-10-28       Impact factor: 3.169

7.  Protein encoding genes in an ancient plant: analysis of codon usage, retained genes and splice sites in a moss, Physcomitrella patens.

Authors:  Stefan A Rensing; Dana Fritzowsky; Daniel Lang; Ralf Reski
Journal:  BMC Genomics       Date:  2005-03-22       Impact factor: 3.969

8.  A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data.

Authors:  Prabina Kumar Meher; Tanmaya Kumar Sahu; Atmakuri Ramakrishna Rao; Sant Dass Wahi
Journal:  BMC Bioinformatics       Date:  2014-11-25       Impact factor: 3.169

9.  Automatic detection of exonic splicing enhancers (ESEs) using SVMs.

Authors:  Britta Mersch; Alexander Gepperth; Sándor Suhai; Agnes Hotz-Wagenblatt
Journal:  BMC Bioinformatics       Date:  2008-09-10       Impact factor: 3.169

10.  Features generated for computational splice-site prediction correspond to functional elements.

Authors:  Rezarta Islamaj Dogan; Lise Getoor; W John Wilbur; Stephen M Mount
Journal:  BMC Bioinformatics       Date:  2007-10-24       Impact factor: 3.169


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.