Probabilistic grammatical model for helix-helix contact site classification.

Witold Dyrka, Jean-Christophe Nebel, Malgorzata Kotulska.

Abstract

BACKGROUND: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While they excel at their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires the expressive power of at least context-free grammars. However, the application of more powerful grammar formalisms to protein analysis has been surprisingly limited.
RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The best-performing classifier reached an AUC ROC of 0.70. Analysis of the grammar parse trees showed that they can represent structural features of helix-helix contact sites.
CONCLUSIONS: We demonstrated that our probabilistic context-free framework for the analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily modeling long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus, they could provide biologically meaningful information for molecular biologists.
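
Illustration: the abstract describes scoring sequences with a probabilistic context-free grammar and evaluating classifiers by AUC ROC. The sketch below shows, under stated assumptions, how such a score could be computed with the standard inside (probabilistic CYK) algorithm and how AUC ROC could be estimated from positive and negative scores. The toy grammar, the two-letter alphabet (L, G), and all names (BINARY, LEXICAL, inside_probability, auc_roc) are hypothetical and are not taken from the paper; the authors' grammar was inferred by a genetic algorithm and is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (NOT the paper's grammar).
# It generates even-length palindromes over {L, G}, so the first and last
# residues of every derived string are coupled: a nested dependency that a
# regular model cannot express in general.
BINARY = {
    "S": [("Lh", "R1", 0.3), ("Gs", "R2", 0.3), ("Lh", "Lh", 0.2), ("Gs", "Gs", 0.2)],
    "R1": [("S", "Lh", 1.0)],
    "R2": [("S", "Gs", 1.0)],
}
LEXICAL = {"Lh": {"L": 1.0}, "Gs": {"G": 1.0}}


def inside_probability(seq, start="S"):
    """Return the probability that `start` derives `seq` (inside algorithm)."""
    n = len(seq)
    if n == 0:
        return 0.0
    # chart[(i, j)][A] = probability that nonterminal A derives seq[i:j]
    chart = defaultdict(lambda: defaultdict(float))
    for i, symbol in enumerate(seq):
        for nonterminal, emissions in LEXICAL.items():
            p = emissions.get(symbol, 0.0)
            if p > 0.0:
                chart[(i, i + 1)][nonterminal] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                left, right = chart[(i, k)], chart[(k, j)]
                if not left or not right:
                    continue
                for parent, rules in BINARY.items():
                    for b, c, p in rules:
                        chart[(i, j)][parent] += p * left.get(b, 0.0) * right.get(c, 0.0)
    return chart[(0, n)].get(start, 0.0)


def auc_roc(pos_scores, neg_scores):
    """Estimate AUC ROC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    positives = ["GLLG", "LL", "LGGL"]   # made-up example strings
    negatives = ["GLGL", "LG"]           # made-up example strings
    pos = [inside_probability(s) for s in positives]
    neg = [inside_probability(s) for s in negatives]
    print(pos, neg, auc_roc(pos, neg))
```

In a setting like the one the abstract describes, one such grammar per contact-site class would presumably assign a score to each sequence, and those scores would then be compared against held-out labels; that pipeline is an assumption here, not a description of the authors' implementation.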

Year: 2013   PMID: 24350601   PMCID: PMC3892132   DOI: 10.1186/1748-7188-8-31

Source DB: PubMed   Journal: Algorithms Mol Biol   ISSN: 1748-7188   Impact factor: 1.405


