Phylogenetic mixture models for proteins.

Si Quang Le, Nicolas Lartillot, Olivier Gascuel.

Abstract

Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code, solvent exposure, secondary and tertiary structure, protein function, etc. These factors affect the substitution pattern and, in most cases, a single replacement matrix is not enough to represent the full complexity of the evolutionary process. This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP, and test their performance using independent alignments from TREEBASE. We compare unsupervised learning approaches, where the site categories are unknown, with supervised ones, where the estimation uses the known category of each site, based on its solvent exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve on mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and is highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site on the TREEBASE test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307-1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 of the 57 alignments, and significantly worse for only 1.
Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures impacts not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from http://atgc.lirmm.fr/mixtures.
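The abstract describes mixtures of replacement matrices combined with gamma-distributed rates, and reports results as AIC gains per site. The Python sketch below illustrates how such quantities could be computed. It assumes that the log-likelihood of each site under each matrix k and each gamma rate category g has already been obtained (for example with Felsenstein's pruning algorithm on a fixed tree); in a program such as PhyML those values would come from the tree likelihood routine. All function names are hypothetical. The site likelihood is assumed to be a weighted sum over matrices of an equal-weight average over gamma categories. Normalizing the AIC difference by the number of sites is likewise an assumption about how "gain per site" is defined, not a definition taken from the paper.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_site_loglik(site_logliks, weights):
    """Log-likelihood of one site under a mixture of K matrices x G gamma rates.

    site_logliks[k][g]: precomputed log-likelihood of the site under matrix k
    and gamma rate category g (hypothetical input). Gamma categories are
    assumed equiprobable (1/G); weights[k] > 0 is the mixture weight of matrix k.
    """
    terms = []
    for w_k, per_rate in zip(weights, site_logliks):
        n_rates = len(per_rate)
        for ll in per_rate:
            terms.append(math.log(w_k) - math.log(n_rates) + ll)
    return log_sum_exp(terms)


def posterior_matrix_probs(site_logliks, weights):
    """Posterior probability that the site evolved under each matrix.

    Uses Bayes' rule: w_k * L_k / sum_j w_j * L_j, where L_k is the site
    likelihood under matrix k averaged over the gamma categories.
    """
    comp = [
        math.log(w_k) + log_sum_exp(per_rate) - math.log(len(per_rate))
        for w_k, per_rate in zip(weights, site_logliks)
    ]
    total = log_sum_exp(comp)
    return [math.exp(c - total) for c in comp]


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik


def aic_gain_per_site(loglik_single, k_single, loglik_mix, k_mix, n_sites):
    """AIC(single matrix) - AIC(mixture), divided by the number of sites.

    Positive values favour the mixture. The per-site normalization is an
    assumption made for this illustration.
    """
    return (aic(loglik_single, k_single) - aic(loglik_mix, k_mix)) / n_sites


if __name__ == "__main__":
    # Made-up log-likelihoods for one site: 2 matrices x 2 gamma categories.
    site = [[-10.0, -12.0], [-11.0, -9.0]]
    weights = [0.5, 0.5]
    print(mixture_site_loglik(site, weights))
    print(posterior_matrix_probs(site, weights))
    print(aic_gain_per_site(-1000.0, 10, -990.0, 12, 100))
```

The posterior probabilities are the quantities an EM-style unsupervised fit (one possible approach, assumed here) would use to re-estimate the mixture weights and matrices. In the supervised setting, by contrast, a site's known category would select its matrix directly during estimation. Iterative fits of this kind can converge to different local optima, which fits with the sensitivity to starting values noted in the abstract.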

Year: 2008    PMID: 18852096    PMCID: PMC2607422    DOI: 10.1098/rstb.2008.0180

Source DB: PubMed    Journal: Philos Trans R Soc Lond B Biol Sci    ISSN: 0962-8436    Impact factor: 6.237