How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

Pengfei Tian, Robert B Best.

Abstract

Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures, using a statistical model based on residue-residue co-evolution to capture the variation among sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding Avogadro's number (about 6 × 10^23). In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein and, more closely, with the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., the absolute capacity normalized by the total number of possible sequences, is extremely small and strongly anti-correlated with protein length. Thus, although larger proteins may have more foldable sequences, those sequences are much harder to find. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
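The distinction drawn above between absolute and relative sequence capacity can be illustrated with a toy calculation. This is a minimal sketch, not the authors' method, and it rests on several assumptions: a hypothetical Potts-type model (`random_potts`, with random fields `h` and couplings `J` instead of parameters inferred from a family's sequence alignment), a reduced alphabet of q = 3 letters instead of the 20 amino acids, and the exponential of the model's sequence entropy, exp(S), taken as an effective count of sequences. Because it enumerates all q^L sequences exactly, it is only feasible for very short chains; real proteins would require a sampling-based estimate of S.

```python
import itertools
import math

import numpy as np


def random_potts(length, q, seed=0, coupling_scale=0.5):
    """Hypothetical toy Potts model with random fields h[i, a] and couplings
    J[i, j, a, b]; only entries with i < j are used by energy()."""
    rng = np.random.default_rng(seed)
    h = rng.normal(0.0, 1.0, size=(length, q))
    J = rng.normal(0.0, coupling_scale, size=(length, length, q, q))
    return h, J


def energy(seq, h, J):
    """Potts energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    e = -sum(h[i, a] for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            e -= J[i, j, seq[i], seq[j]]
    return e


def sequence_entropy(h, J):
    """Exact entropy S = -sum_s p(s) ln p(s) of p(s) proportional to exp(-E(s)),
    computed by enumerating all q^L sequences (toy sizes only)."""
    length, q = h.shape
    energies = np.array(
        [energy(s, h, J) for s in itertools.product(range(q), repeat=length)]
    )
    log_weights = -energies
    log_z = np.logaddexp.reduce(log_weights)  # log partition function
    log_p = log_weights - log_z
    return float(-np.sum(np.exp(log_p) * log_p))


q = 3
for length in (4, 6, 8):
    h, J = random_potts(length, q, seed=length)
    S = sequence_entropy(h, J)
    log10_abs = S / math.log(10)                    # log10 of exp(S), the "absolute" count
    log10_rel = log10_abs - length * math.log10(q)  # normalized by all q^L sequences
    print(f"L={length}: log10 capacity={log10_abs:.2f}, log10 relative={log10_rel:.2f}")
```

Since S can never exceed L ln q (the entropy of the uniform distribution), the relative capacity exp(S)/q^L is at most 1. Working in log10 is deliberate: for realistic lengths q^L is too large to handle directly (20^300 already exceeds the range of a double-precision float).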

Year: 2017; PMID: 29045866; PMCID: PMC5647607; DOI: 10.1016/j.bpj.2017.08.039

Source DB: PubMed; Journal: Biophys J; ISSN: 0006-3495; Impact factor: 4.033


Cited by (10 in total):

1. Co-Evolutionary Fitness Landscapes for Sequence Design.

Authors: Pengfei Tian; John M Louis; James L Baber; Annie Aniana; Robert B Best
Journal: Angew Chem Int Ed Engl; Date: 2018-03-25; Impact factor: 15.336

2. Size and structure of the sequence space of repeat proteins.

Authors: Jacopo Marchi; Ezequiel A Galpern; Rocio Espada; Diego U Ferreiro; Aleksandra M Walczak; Thierry Mora
Journal: PLoS Comput Biol; Date: 2019-08-15; Impact factor: 4.475

3. Rosetta design with co-evolutionary information retains protein function.

Authors: Samuel Schmitz; Moritz Ertelt; Rainer Merkl; Jens Meiler
Journal: PLoS Comput Biol; Date: 2021-01-19; Impact factor: 4.475

4. Biological factors in the synthetic construction of overlapping genes.

Authors: Stefan Wichmann; Siegfried Scherer; Zachary Ardern
Journal: BMC Genomics; Date: 2021-12-11; Impact factor: 3.969

5. Insertions and deletions in the RNA sequence-structure map.

Authors: Nora S Martin; Sebastian E Ahnert
Journal: J R Soc Interface; Date: 2021-10-06; Impact factor: 4.118

6. Allosteric Inter-Domain Contacts in Bacterial Hsp70 Are Located in Regions That Avoid Insertion and Deletion Events.

Authors: Michal Gala; Peter Pristaš; Gabriel Žoldák
Journal: Int J Mol Sci; Date: 2022-03-03; Impact factor: 5.923

7. Identification of novel functional mini-receptors by combinatorial screening of split-WW domains.

Authors: Hermann Neitz; Niels Benjamin Paul; Florian R Häge; Christina Lindner; Roman Graebner; Michael Kovermann; Franziska Thomas
Journal: Chem Sci; Date: 2022-07-14; Impact factor: 9.969

8. Design of metalloproteins and novel protein folds using variational autoencoders.

Authors: Joe G Greener; Lewis Moffat; David T Jones
Journal: Sci Rep; Date: 2018-11-01; Impact factor: 4.379

9. Exploring the sequence fitness landscape of a bridge between protein folds.

Authors: Pengfei Tian; Robert B Best
Journal: PLoS Comput Biol; Date: 2020-10-13; Impact factor: 4.475

10. Efficient generative modeling of protein sequences using simple autoregressive models.

Authors: Jeanne Trinquier; Guido Uguzzoni; Andrea Pagnani; Francesco Zamponi; Martin Weigt
Journal: Nat Commun; Date: 2021-10-04; Impact factor: 14.919
