Redundancy-weighting for better inference of protein structural features.

Chen Yanover, Natalia Vanetik, Michael Levitt, Rachel Kolodny, Chen Keasar.

Abstract

MOTIVATION: Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or even not at all. The standard solution to this problem is to limit the studies to non-redundant subsets of the PDB. While alleviating biases, this solution hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, as most newly solved structures fall within existing families.
RESULTS: In this study we explore the concept of redundancy-weighted datasets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted datasets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted datasets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant datasets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular biology.
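The core idea of redundancy-weighting fits in a few lines. The sketch below rests on several assumptions. Homology is taken to be already reduced to precomputed cluster labels (the hypothetical `famA`/`famB` IDs). Each structure gets the weight one over its cluster size, so every family contributes a total weight of one. The resulting residue-level secondary-structure distribution is then compared with one built from a naive non-redundant subset, which keeps only the first member of each cluster. This illustrates the general principle only. It is not one of the three weighting schemes evaluated in the paper, and the function names and toy data are invented for the example.

```python
import math
from collections import Counter, defaultdict


def redundancy_weights(cluster_ids):
    """Weight each structure by 1 / (number of structures in its homolog cluster).

    Assumes homology is given as precomputed cluster labels (hypothetical input).
    """
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]


def feature_distribution(structures, weights):
    """Normalized weighted histogram of per-residue features over all structures."""
    totals = defaultdict(float)
    for (_, features), w in zip(structures, weights):
        for f in features:
            totals[f] += w  # every residue inherits its structure's weight
    z = sum(totals.values())
    return {f: t / z for f, t in totals.items()}


def nonredundant_subset(structures):
    """Naive culling: keep only the first structure seen in each cluster."""
    seen = set()
    subset = []
    for cid, features in structures:
        if cid not in seen:
            seen.add(cid)
            subset.append((cid, features))
    return subset


def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Toy data: (cluster label, per-residue secondary-structure string).
# famA is over-represented by three homologous entries; famB has one.
structures = [
    ("famA", "HHHE"),
    ("famA", "HHHE"),
    ("famA", "HHHC"),
    ("famB", "EECC"),
]

w = redundancy_weights([cid for cid, _ in structures])
weighted = feature_distribution(structures, w)

nr = nonredundant_subset(structures)
nonred = feature_distribution(nr, [1.0] * len(nr))

print(round(entropy_bits(nonred), 3), round(entropy_bits(weighted), 3))
# prints 1.561 1.577
```

On this toy input the weighted distribution has slightly higher entropy than the non-redundant one. The redundant family's minority C state, which culling discards, is retained with a fractional weight. All structures are used, yet famA as a whole still counts no more than famB.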

Substances:
Amino Acids
Proteins

Year: 2014 PMID: 24771517 PMCID: PMC4192046 DOI: 10.1093/bioinformatics/btu242

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

4 in total

1. Exploring the potential of a structural alphabet-based tool for mining multiple target conformations and target flexibility insight.

Authors: Leslie Regad; Jean-Baptiste Chéron; Dhoha Triki; Caroline Senac; Delphine Flatters; Anne-Claude Camproux
Journal: PLoS One Date: 2017-08-17 Impact factor: 3.240