
Redundancy-weighting for better inference of protein structural features.

Chen Yanover, Natalia Vanetik, Michael Levitt, Rachel Kolodny, Chen Keasar.

Abstract

MOTIVATION: Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or even not at all. The standard solution to this problem is to limit the studies to non-redundant subsets of the PDB. While alleviating biases, this solution hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, as most newly solved structures fall within existing families.
RESULTS: In this study we explore the concept of redundancy-weighted datasets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted datasets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted datasets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant datasets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular biology.
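The weighting idea described above can be illustrated with a minimal sketch. It assumes the simplest possible scheme, in which each entry is weighted by one over the size of its homolog cluster; the abstract does not specify the three schemes the authors actually tested, so this is not a reproduction of their method. The function names, cluster labels, and toy feature values below are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def redundancy_weights(cluster_of):
    """Weight each entry by 1 / (number of entries in its homolog cluster).

    Assumption: homologs are given as a precomputed entry -> cluster mapping.
    """
    sizes = Counter(cluster_of.values())
    return {entry: 1.0 / sizes[cluster] for entry, cluster in cluster_of.items()}


def weighted_distribution(feature_of, weights):
    """Accumulate weights per feature value and normalize to probabilities."""
    totals = defaultdict(float)
    for entry, feature in feature_of.items():
        totals[feature] += weights[entry]
    norm = sum(totals.values())
    return {feature: total / norm for feature, total in totals.items()}


def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


# Toy data (hypothetical): family A has three entries, family B has one.
cluster_of = {"a1": "famA", "a2": "famA", "a3": "famA", "b1": "famB"}
# Hypothetical structural feature per entry (e.g. a secondary-structure class).
feature_of = {"a1": "H", "a2": "H", "a3": "E", "b1": "C"}

weights = redundancy_weights(cluster_of)
weighted = weighted_distribution(feature_of, weights)

# Non-redundant baseline: keep one representative per family, weight 1 each.
representatives = {"a1": "H", "b1": "C"}
non_redundant = weighted_distribution(representatives, {"a1": 1.0, "b1": 1.0})

print(weighted, entropy_bits(weighted))
print(non_redundant, entropy_bits(non_redundant))
```

In this toy case each family contributes a total weight of 1, as in the non-redundant subset, but the weighted distribution still records the alternative feature value observed within family A, which the single representative hides. The toy numbers are illustrative only and say nothing about the results reported in the study.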
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year: 2014 | PMID: 24771517 | PMCID: PMC4192046 | DOI: 10.1093/bioinformatics/btu242

Source DB: PubMed | Journal: Bioinformatics | ISSN: 1367-4803 | Impact factor: 6.937


