A penalized-likelihood method to estimate the distribution of selection coefficients from phylogenetic data.

Asif U Tamuri, Nick Goldman, Mario dos Reis.

Abstract

We develop a maximum penalized-likelihood (MPL) method to estimate the fitnesses of amino acids and the distribution of selection coefficients (S = 2Ns) in protein-coding genes from phylogenetic data. This improves on a previous maximum-likelihood method. Various penalty functions are used to penalize extreme estimates of the fitnesses, thus correcting overfitting by the previous method. Using a combination of computer simulation and real data analysis, we evaluate the effect of the various penalties on the estimation of the fitnesses and the distribution of S. We show the new method regularizes the estimates of the fitnesses for small, relatively uninformative data sets, but it can still recover the large proportion of deleterious mutations when present in simulated data. Computer simulations indicate that as the number of taxa in the phylogeny or the level of sequence divergence increases, the distribution of S can be more accurately estimated. Furthermore, the strength of the penalty can be varied to study how informative a particular data set is about the distribution of S. We analyze three protein-coding genes (the chloroplast rubisco protein, mammal mitochondrial proteins, and an influenza virus polymerase) and show the new method recovers a large proportion of deleterious mutations in these data, even under strong penalties, confirming the distribution of S is bimodal in these real data. We recommend the use of the new MPL approach for the estimation of the distribution of S in species phylogenies of protein-coding genes.
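As a rough illustration of how a penalty regularizes extreme fitness estimates, the sketch below fits a toy single-site model in Python. It is not the authors' phylogenetic implementation: the multinomial site model, the quadratic penalty, the strength parameter `lam`, the example counts, and all function names are assumptions made for this example only. The toy ignores the tree and the mutation model entirely, and reads the scaled selection coefficient for a change from residue i to residue j as the fitness difference F_j - F_i, which is a common convention in mutation-selection models but is not stated in the abstract.

```python
import math


def softmax(fitnesses):
    """Map fitnesses F_i to stationary frequencies proportional to exp(F_i)."""
    m = max(fitnesses)
    exps = [math.exp(f - m) for f in fitnesses]
    total = sum(exps)
    return [e / total for e in exps]


def penalized_loglik(fitnesses, counts, lam):
    """Toy multinomial log-likelihood minus a quadratic penalty (assumed form)."""
    probs = softmax(fitnesses)
    loglik = sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)
    penalty = 0.5 * lam * sum(f * f for f in fitnesses)
    return loglik - penalty


def fit(counts, lam, iters=20000):
    """Maximize the penalized log-likelihood by plain gradient ascent."""
    total = sum(counts)
    step = 1.0 / (total + lam)  # conservative step size for this concave objective
    f = [0.0] * len(counts)
    for _ in range(iters):
        probs = softmax(f)
        # Gradient of the toy objective with respect to each F_i.
        grad = [n - total * p - lam * fi for n, p, fi in zip(counts, probs, f)]
        f = [fi + step * g for fi, g in zip(f, grad)]
    return f


def spread(fitnesses):
    """Largest fitness difference, i.e. the largest |S| under the toy reading."""
    return max(fitnesses) - min(fitnesses)


# Hypothetical counts of four residues observed at one alignment column.
counts = [18, 2, 0, 0]
weak = fit(counts, lam=0.1)    # weak penalty: unobserved residues pushed far down
strong = fit(counts, lam=5.0)  # strong penalty: estimates shrink toward zero

if __name__ == "__main__":
    print("weak penalty spread:  ", round(spread(weak), 3))
    print("strong penalty spread:", round(spread(strong), 3))
```

Rerunning `fit` over a range of `lam` values and watching how quickly the estimates collapse toward zero mirrors, in a very simplified way, the idea of varying the penalty strength to gauge how informative a data set is.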

Keywords:  chloroplast; fitness effects; influenza; mitochondria; penalized likelihood; selection coefficient

Year:  2014        PMID: 24532780      PMCID: PMC4012484          DOI: 10.1534/genetics.114.162263

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


  26 in total

1.  How to calculate the non-synonymous to synonymous rate ratio of protein-coding genes under the Fisher-Wright mutation-selection framework.

Authors:  Mario Dos Reis
Journal:  Biol Lett       Date:  2015-04       Impact factor: 3.703

2.  The relationship between dN/dS and scaled selection coefficients.

Authors:  Stephanie J Spielman; Claus O Wilke
Journal:  Mol Biol Evol       Date:  2015-01-08       Impact factor: 16.240

3.  A new parameter-rich structure-aware mechanistic model for amino acid substitution during evolution.

Authors:  Peter B Chi; Dohyup Kim; Jason K Lai; Nadia Bykova; Claudia C Weber; Jan Kubelka; David A Liberles
Journal:  Proteins       Date:  2017-12-12

4.  Extensively Parameterized Mutation-Selection Models Reliably Capture Site-Specific Selective Constraint.

Authors:  Stephanie J Spielman; Claus O Wilke
Journal:  Mol Biol Evol       Date:  2016-08-10       Impact factor: 16.240

5.  Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations.

Authors:  Aaron P Ragsdale; Alec J Coffman; PingHsun Hsieh; Travis J Struck; Ryan N Gutenkunst
Journal:  Genetics       Date:  2016-03-30       Impact factor: 4.562

6.  Site-Specific Amino Acid Distributions Follow a Universal Shape.

Authors:  Mackenzie M Johnson; Claus O Wilke
Journal:  J Mol Evol       Date:  2020-11-24       Impact factor: 2.395

7.  A phylogenetic approach for weighting genetic sequences.

Authors:  Nicola De Maio; Alexander V Alekseyenko; William J Coleman-Smith; Fabio Pardi; Marc A Suchard; Asif U Tamuri; Jakub Truszkowski; Nick Goldman
Journal:  BMC Bioinformatics       Date:  2021-05-28       Impact factor: 3.169

8.  Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies.

Authors:  Stephanie J Spielman; Claus O Wilke
Journal:  PLoS One       Date:  2015-09-23       Impact factor: 3.240

9.  An experimentally determined evolutionary model dramatically improves phylogenetic fit.

Authors:  Jesse D Bloom
Journal:  Mol Biol Evol       Date:  2014-05-24       Impact factor: 16.240

10.  Sequence entropy of folding and the absolute rate of amino acid substitutions.

Authors:  Richard A Goldstein; David D Pollock
Journal:  Nat Ecol Evol       Date:  2017-10-23       Impact factor: 15.460
