Bayesian data mining of protein domains gives an efficient predictive algorithm and new insight.

Rajani R Joshi, Vivekanand V Samant.

Abstract

Identification of structural domains in uncharacterized protein sequences is important for predicting protein tertiary folds and functional sites, and hence for designing biologically active molecules. We present a new predictive computational method that uses Bayesian data mining to classify a protein as having a single domain, two continuous domains, or two discontinuous domains. The algorithm requires only the primary sequence and computer-predicted secondary structure. It incorporates correlation patterns between certain three-dimensional motifs and some local helical folds that are found, with high statistical confidence, to be conserved in the vicinity of protein domains. This computationally simple and fast method predicts domain class with good accuracy: average accuracies are 83.3% for single-domain, 60% for two-continuous-domain, and 65.7% for two-discontinuous-domain proteins. Experiments on a large validation sample show that it performs significantly better than DGS and DomSSEA. The computed Bayesian probabilities reveal correlations between certain conserved patterns of secondary folds and tertiary motifs, and give new insight. Applications to improving the accuracy of domain-boundary prediction, relevant to protein structural and functional modeling, are also highlighted.
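
The abstract does not spell out the model, so the following is only a minimal sketch of how a Bayesian classifier could assign one of the three domain classes from the presence or absence of sequence-derived patterns. It assumes a naive (conditionally independent) feature model, which may differ from the authors' method. The feature names `helix_pattern` and `motif_pattern`, the function `posterior`, and all probabilities are hypothetical values invented for illustration, not figures from the paper.

```python
from math import exp, log

# The three domain classes named in the abstract.
CLASSES = ("single", "two_continuous", "two_discontinuous")


def posterior(features, priors, likelihoods):
    """Return P(class | features) under a naive Bayes assumption.

    features:    dict feature_name -> bool (pattern observed or not)
    priors:      dict class -> P(class)
    likelihoods: dict class -> dict feature_name -> P(pattern present | class)
    """
    log_scores = {}
    for c in CLASSES:
        score = log(priors[c])
        for name, present in features.items():
            p = likelihoods[c][name]
            score += log(p if present else 1.0 - p)
        log_scores[c] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    total = sum(exp(v - top) for v in log_scores.values())
    return {c: exp(v - top) / total for c, v in log_scores.items()}


# Toy numbers, invented purely for illustration (not taken from the paper).
priors = {"single": 0.5, "two_continuous": 0.3, "two_discontinuous": 0.2}
likelihoods = {
    "single": {"helix_pattern": 0.2, "motif_pattern": 0.1},
    "two_continuous": {"helix_pattern": 0.6, "motif_pattern": 0.5},
    "two_discontinuous": {"helix_pattern": 0.7, "motif_pattern": 0.8},
}

post = posterior({"helix_pattern": True, "motif_pattern": True}, priors, likelihoods)
predicted = max(post, key=post.get)  # class with the highest posterior
```

In practice the priors and likelihoods would be estimated from a training set of proteins with known domain assignments, and the boolean features would be derived from the primary sequence and predicted secondary structure.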

Year:  2006        PMID: 17028865     DOI: 10.1007/s00894-006-0141-z

Source DB:  PubMed          Journal:  J Mol Model        ISSN: 0948-5023            Impact factor:   1.810


