Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability.

Miguel Arenas, Agustin Sánchez-Cobos, Ugo Bastolla.

Abstract

Despite intense work, incorporating constraints on protein native structures into mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter, which we fix by maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models that take misfolded conformations into account yield larger likelihoods than models that consider only the native state, because their average hydrophobicity is more realistic, and on average they produce stable sequences for most proteins. We evaluated the mean-field model against empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles showed smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates by combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family.
We found that the mean-field model performs better than other structurally constrained models of similar or higher complexity. Compared with the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented in the computer program Prot_Evol, which is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
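
The profile comparison mentioned in the abstract, the Kullback-Leibler divergence between an observed site-specific profile and a model distribution, can be illustrated with a short example. This is a minimal sketch, not the Prot_Evol implementation: the function names, the pseudocount, the toy alignment column, and the hydrophobic-biased "site model" are all assumptions introduced for illustration and do not come from the paper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_profile(column, pseudocount=1.0):
    """Observed amino acid frequencies at one alignment column.

    Gaps and non-standard symbols are ignored. The pseudocount is an
    assumption of this sketch; it keeps every frequency positive.
    """
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for residue in column:
        if residue in counts:
            counts[residue] += 1.0
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q) = sum_a p(a) * ln(p(a) / q(a)), in nats."""
    return sum(
        p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0.0
    )


# Toy column from a hypothetical alignment (a buried, hydrophobic site).
column = "LLLIVLLMLL"
observed = site_profile(column)

# Site-independent reference: the same distribution at every site.
uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

# Hypothetical site-specific distribution that favors hydrophobic residues.
weights = {aa: (5.0 if aa in "AILMFVW" else 1.0) for aa in AMINO_ACIDS}
norm = sum(weights.values())
site_model = {aa: w / norm for aa, w in weights.items()}

kl_uniform = kl_divergence(observed, uniform)
kl_site = kl_divergence(observed, site_model)
print(f"KL(observed || uniform)    = {kl_uniform:.4f}")
print(f"KL(observed || site model) = {kl_site:.4f}")
```

A smaller divergence means the model distribution is closer to the observed profile at that site. In this toy case the hydrophobic-biased distribution is closer to the column than the uniform one.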
© The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Keywords:  folding stability; maximum-likelihood estimate; misfolded state; structurally constrained substitution models

Year:  2015        PMID: 25837579      PMCID: PMC4833071          DOI: 10.1093/molbev/msv085

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  16 in total

1.  Site-Specific Amino Acid Distributions Follow a Universal Shape.

Authors:  Mackenzie M Johnson; Claus O Wilke
Journal:  J Mol Evol       Date:  2020-11-24       Impact factor: 2.395

2.  Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation.

Authors:  Qian Jiang; Ashley I Teufel; Eleisha L Jackson; Claus O Wilke
Journal:  Genetics       Date:  2018-01-30       Impact factor: 4.562

Review 3.  Methodologies for Microbial Ancestral Sequence Reconstruction.

Authors:  Miguel Arenas
Journal:  Methods Mol Biol       Date:  2022

Review 4.  Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence.

Authors:  Julian Echave; Claus O Wilke
Journal:  Annu Rev Biophys       Date:  2017-03-15       Impact factor: 12.981

5.  Molecular and Functional Bases of Selection against a Mutation Bias in an RNA Virus.

Authors:  Ignacio de la Higuera; Cristina Ferrer-Orta; Ana I de Ávila; Celia Perales; Macarena Sierra; Kamalendra Singh; Stefan G Sarafianos; Yves Dehouck; Ugo Bastolla; Nuria Verdaguer; Esteban Domingo
Journal:  Genome Biol Evol       Date:  2017-05-01       Impact factor: 3.416

6.  Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models.

Authors:  Jesse D Bloom
Journal:  Biol Direct       Date:  2017-01-17       Impact factor: 4.540

7.  Influence of mutation bias and hydrophobicity on the substitution rates and sequence entropies of protein evolution.

Authors:  María José Jiménez-Santos; Miguel Arenas; Ugo Bastolla
Journal:  PeerJ       Date:  2018-10-05       Impact factor: 2.984

8.  Trends in substitution models of molecular evolution.

Authors:  Miguel Arenas
Journal:  Front Genet       Date:  2015-10-26       Impact factor: 4.599

Review 9.  Using the Mutation-Selection Framework to Characterize Selection on Protein Sequences.

Authors:  Ashley I Teufel; Andrew M Ritchie; Claus O Wilke; David A Liberles
Journal:  Genes (Basel)       Date:  2018-08-13       Impact factor: 4.096

10.  Relative Efficiencies of Simple and Complex Substitution Models in Estimating Divergence Times in Phylogenomics.

Authors:  Qiqing Tao; Jose Barba-Montoya; Louise A Huuki; Mary Kathleen Durnan; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2020-06-01       Impact factor: 16.240
