
Variable-length positional modeling for biological sequence classification.

Andigoni Malousi, Ioanna Chouvarda, Vassilis Koutkias, Sofia Kouidou, Nicos Maglaveras.

Abstract

Selecting the most informative features is a decisive pre-processing step in supervised biological classification problems, for two main reasons: (1) to reduce the dimensionality of the feature space, and (2) to ascribe biological meaning to the underlying feature interactions. This paper presents a filter-based feature selection method suited to positional modeling of biological sequences. The main motivation is that a positional model of fixed length may describe the sequences of a specific classification problem sub-optimally. The core filtering criterion is the F-score, and the source features are the positional probabilities describing variable-length interactions among residues. The proposed method was evaluated on human splice site classification using a linear SVM classifier. It yields higher classification accuracy than the individual positional models while maintaining their space complexity, and it does so in a time-efficient way and independently of the classifier.
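The filtering step described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. It assumes aligned DNA windows of equal length and uses position-specific k-mer log-odds (k = 1 to 3, with a Laplace pseudocount) as a simple stand-in for the paper's positional probabilities. It ranks the pooled features of all orders with the common two-class F-score: the squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances. The function names, the pseudocount, and the `n_keep` default (the length of a single mononucleotide positional model, mirroring the claim that the space complexity of an individual model is preserved) are illustrative assumptions.

```python
import math
from statistics import mean, variance

PSEUDO = 1.0       # Laplace pseudocount (assumption, not taken from the paper)
ALPHABET_SIZE = 4  # DNA: A, C, G, T


def kmer_tables(seqs, k):
    """Count the k-mer starting at each position i of equal-length, aligned sequences."""
    tables = []
    for i in range(len(seqs[0]) - k + 1):
        counts = {}
        for s in seqs:
            kmer = s[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        total = len(seqs) + PSEUDO * ALPHABET_SIZE ** k
        tables.append((counts, total))
    return tables


def smoothed_prob(table, kmer):
    """Pseudocount-smoothed probability of a k-mer at one position."""
    counts, total = table
    return (counts.get(kmer, 0) + PSEUDO) / total


def fit_models(pos, neg, orders):
    """One hypothetical positional model per k-mer length k, for each class."""
    return {k: (kmer_tables(pos, k), kmer_tables(neg, k)) for k in orders}


def featurize(seq, models):
    """Feature (k, i): log-odds of the k-mer at position i under the two class models."""
    feats = {}
    for k, (pos_tabs, neg_tabs) in models.items():
        for i in range(len(pos_tabs)):
            kmer = seq[i:i + k]
            p = smoothed_prob(pos_tabs[i], kmer)
            q = smoothed_prob(neg_tabs[i], kmer)
            feats[(k, i)] = math.log(p / q)
    return feats


def f_score(pos_vals, neg_vals):
    """Two-class F-score of one feature (each class needs at least two samples)."""
    overall = mean(pos_vals + neg_vals)
    num = (mean(pos_vals) - overall) ** 2 + (mean(neg_vals) - overall) ** 2
    den = variance(pos_vals) + variance(neg_vals)
    if den == 0:
        # Constant within each class: perfectly separating if the means differ, else useless.
        return math.inf if num > 0 else 0.0
    return num / den


def select_features(pos, neg, orders=(1, 2, 3), n_keep=None):
    """Pool features from all orders, rank them by F-score, and keep the top n_keep."""
    models = fit_models(pos, neg, orders)
    fp = [featurize(s, models) for s in pos]
    fn = [featurize(s, models) for s in neg]
    keys = list(fp[0])
    scores = {key: f_score([f[key] for f in fp], [f[key] for f in fn]) for key in keys}
    if n_keep is None:
        # Same dimensionality as a single mononucleotide (k = 1) positional model.
        n_keep = len(pos[0])
    ranked = sorted(keys, key=lambda key: scores[key], reverse=True)
    return ranked[:n_keep], scores


if __name__ == "__main__":
    # Toy aligned windows: positives share "GT" at positions 1-2.
    pos = ["AGTC", "CGTA", "TGTG", "GGTT"]
    neg = ["ACAC", "CTGA", "TAAG", "GCCT"]
    selected, scores = select_features(pos, neg)
    print(selected)
```

On a toy set this small, several higher-order features score as infinitely separable, which illustrates why the scores should be computed on adequately sized training data, ideally with the positional models fitted on a separate split to avoid optimistic estimates. The selected feature columns would then be passed to a linear SVM (for example scikit-learn's `LinearSVC`), a step this sketch does not include.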

Year:  2008        PMID: 18999162      PMCID: PMC2656059     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076
