Bayesian data mining of protein domains gives an efficient predictive algorithm and new insight.

Abstract

Identification of structural domains in uncharacterized protein sequences is important in the prediction of protein tertiary folds and functional sites, and hence in designing biologically active molecules. We present a new predictive computational method for classifying a protein as having a single domain, two continuous domains, or two discontinuous domains using Bayesian data mining. The algorithm requires only the primary sequence and computer-predicted secondary structure. It incorporates correlation patterns between certain three-dimensional motifs and some local helical folds that are found, with high statistical confidence, to be conserved in the vicinity of protein domains. This computationally simple and fast method predicts domain class with good accuracy: average accuracies are 83.3% for single-domain, 60% for two-continuous-domain, and 65.7% for two-discontinuous-domain proteins. Experiments on a large validation sample show its performance to be significantly better than that of DGS and DomSSEA. The computed Bayesian probabilities reveal important correlations between certain conserved patterns of secondary folds and tertiary motifs, giving new insight. Applications to improving the accuracy of domain boundary prediction, relevant to protein structural and functional modeling, are also highlighted.
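The abstract does not give the feature set or the probability model, so the following is only a minimal sketch of how a Bayesian three-class domain classifier of this general kind could be organized. It assumes a naive Bayes model over binary "pattern present / absent" features with Laplace smoothing, which may differ from the authors' method. The feature names (`helix_near_terminus`, `linker_helix`, `beta_hairpin`, `psi_loop`) and the toy training samples are hypothetical placeholders, not data from the paper.

```python
import math
from collections import defaultdict

# The three domain classes named in the abstract.
CLASSES = ("single", "two_continuous", "two_discontinuous")

# Hypothetical toy data: (set of observed pattern features, class label).
# These are illustrative placeholders, NOT data from the paper.
TOY_SAMPLES = [
    ({"helix_near_terminus"}, "single"),
    ({"helix_near_terminus"}, "single"),
    (set(), "single"),
    ({"beta_hairpin", "linker_helix"}, "two_continuous"),
    ({"linker_helix"}, "two_continuous"),
    ({"beta_hairpin", "psi_loop"}, "two_discontinuous"),
    ({"psi_loop"}, "two_discontinuous"),
]


def train(samples, alpha=1.0):
    """Estimate class priors and per-feature Bernoulli likelihoods.

    alpha is the Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = defaultdict(int)
    feature_counts = {c: defaultdict(int) for c in CLASSES}
    vocabulary = set()
    for features, label in samples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocabulary.add(f)
    total = sum(class_counts.values())
    model = {"prior": {}, "likelihood": {}, "vocabulary": sorted(vocabulary)}
    for c in CLASSES:
        model["prior"][c] = (class_counts[c] + alpha) / (total + alpha * len(CLASSES))
        model["likelihood"][c] = {
            f: (feature_counts[c][f] + alpha) / (class_counts[c] + 2 * alpha)
            for f in vocabulary
        }
    return model


def posterior(model, features):
    """Return P(class | features) for every class, computed in log space."""
    log_scores = {}
    for c in CLASSES:
        score = math.log(model["prior"][c])
        for f in model["vocabulary"]:
            p = model["likelihood"][c][f]
            score += math.log(p if f in features else 1.0 - p)
        log_scores[c] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


def classify(model, features):
    """Pick the class with the highest posterior probability."""
    post = posterior(model, features)
    return max(post, key=post.get)
```

A call such as `classify(train(TOY_SAMPLES), {"psi_loop"})` returns the class whose posterior is largest for that feature set. In a real pipeline the features would presumably be derived from the primary sequence and predicted secondary structure, as the abstract describes.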

Year: 2006 PMID: 17028865 DOI: 10.1007/s00894-006-0141-z

Source DB: PubMed Journal: J Mol Model ISSN: 0948-5023 Impact factor: 1.810
