phangorn: phylogenetic analysis in R.

Klaus Peter Schliep

Abstract

SUMMARY: phangorn is a package for phylogenetic reconstruction and analysis in the R language. Previously it was only possible to estimate phylogenetic trees with distance methods in R. phangorn now offers the possibility of reconstructing phylogenies with distance-based methods, maximum parsimony or maximum likelihood (ML), and of performing Hadamard conjugation. Extending the general ML framework, the package makes it possible to estimate mixture and partition models. Furthermore, phangorn offers several functions for comparing trees, phylogenetic models or splits, simulating character data and performing congruence analyses. AVAILABILITY: phangorn can be obtained through the CRAN homepage http://cran.r-project.org/web/packages/phangorn/index.html. phangorn is licensed under GPL 2.


Journal: Bioinformatics (ISSN 1367-4803). Year: 2010. PMID: 21169378. PMCID: PMC3035803. DOI: 10.1093/bioinformatics/btq706


1 INTRODUCTION

With more than 20 packages devoted to phylogenetics, the R software (R Development Core Team, 2009) has become a standard tool in phylogenetic analysis (see http://cran.r-project.org/web/views/Phylogenetics.html for an overview). However, until now it has only been possible to estimate phylogenetic trees with distance methods in R. The phangorn package makes it possible to estimate maximum likelihood (ML) and maximum parsimony (MP) trees. Besides reconstructing phylogenies, the package also focuses on assessing the congruence of different trees.

2 METHODS

The phangorn package interacts with several other R packages, especially with the ape package (Paradis et al., 2004). From ape, phangorn inherits the tree format (class phylo, which has become a standard), which allows the use of the excellent plotting facilities within ape. phangorn defines its own data format to store character sequences, but offers functions to convert between this format and those of other packages (ape and seqinr) or common R data structures (data.frame and matrix). The data format is kept very general, allowing nucleotides (DNA, RNA), amino acids and general character states defined by the user. For example, it is easy to define a format for binary data, or for nucleotide data in which gaps are coded as a fifth state. All the ML and MP functions described below can handle these general character states.

MP is an optimality criterion under which the preferred tree is the one that requires the fewest changes to explain the data. In phangorn, the Fitch and Sankoff algorithms are available to compute the parsimony score. For heuristic tree searches, the parsimony ratchet (Nixon, 1999) is implemented. Parsimony-based indices such as the consistency and retention indices, as well as the inference of ancestral sequences, are also provided.

The ML function pml returns an object of class pml containing all the information about the model, the tree and the data. The function optim.pml optimizes the tree topology, the edge lengths and all model parameters (e.g. rate matrices or base frequencies). The speed and accuracy of phylogenetic reconstruction by ML are comparable to those of PhyML (Guindon and Gascuel, 2003) using nearest neighbor interchange (NNI) rearrangements (see Supplementary Materials). As the results are stored in memory, these objects can be further investigated, plotted or summarized. The following lines compute and display (Fig. 1) a phylogenetic tree based on the data of Rokas et al. (2003) using a GTR + Γ(4) + I model (Kelchner and Thomas, 2007):
```r
library(phangorn)
data(yeast)                    # alignment of Rokas et al. (2003)
# Starting tree: neighbor joining on LogDet distances
tree <- NJ(dist.logDet(yeast))
# Initial ML fit with 4 gamma rate categories and a proportion of invariable sites
fit <- pml(tree, yeast, k = 4, inv = 0.2)
# Optimize topology (NNI), gamma shape, invariable sites and GTR model parameters
fit <- optim.pml(fit, optNni = TRUE, optGamma = TRUE, optInv = TRUE, model = "GTR")
# Bootstrap with NNI rearrangements in each replicate, then plot support values
BS <- bootstrap.pml(fit, optNni = TRUE)
plotBS(fit$tree, BS, type = "phylogram")
```

Fig. 1. Phylogenetic tree with bootstrap support on the edges for the data of Rokas et al. (2003).

For nucleotide data, all models implemented in ModelTest (Posada, 2008) are available (e.g. "JC" or "GTR"). Moreover, any reversible model can be specified by the user for different character states. For amino acids, the most common rate matrices are provided, e.g. WAG (Whelan and Goldman, 2001) or LG (Le and Gascuel, 2008). Rate matrices can also be estimated; for instance, Mathews et al. (2010) used the function optim.pml to infer a phytochrome amino acid transition matrix. Several methods are implemented to compare ML models, for example likelihood ratio tests, AIC or BIC as in ModelTest, or the SH-test (Shimodaira and Hasegawa, 1999).

As phangorn is implemented in the high-level language R, it is easy to extend the general ML framework. phangorn also contains mixture models (Pagel and Meade, 2004) and partition models. The function pmlPart allows estimation of partitioned ML models and has a flexible yet simple formula interface. For example, the command pmlPart(edge + Q ~ rate + bf, fit) specifies which parameters are optimized for all partitions together (left-hand side: the edge weights of the tree and the rate matrix Q) and which are optimized in each partition individually (right-hand side: the rate parameter and the base frequencies).

phangorn also eases the analysis of splits. For instance, Hadamard conjugation (Hendy, 2005) is a helpful tool for analyzing the relation between observed sequence patterns (spectra) and edge weights. Edge-weight spectra can be constructed from DNA or binary data or from a distance matrix. These spectra can be visualized in a Lento plot (Lento et al., 1995), which presents the supporting and conflicting signals for the splits of a dataset (Fig. 2). Splits can easily be exported to SpectroNet (Huber et al., 2002) or SplitsTree (Huson and Bryant, 2006) and visualized as a network.
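To make the idea behind Hadamard conjugation concrete, here is a minimal base-R sketch for the simplest case, the two-state (Cavender-Farris) model; it is written independently of phangorn's own functions. For n taxa, splits are indexed by subsets of the first n - 1 taxa, encoded as bitmasks, and H is the Sylvester Hadamard matrix of order 2^(n-1), whose inverse is H / 2^(n-1). The expected pattern spectrum is s = H^(-1) exp(H q), where q holds the edge weights (expected numbers of changes) and its entry for the empty split is set to minus their sum; the inverse transform q = H^(-1) log(H s) recovers the edge weights. The function names, the bitmask encoding and the example tree and edge lengths are illustrative assumptions.

```r
# Sylvester Hadamard matrix of order 2^k. With 0-based indices a and b read as
# bitmasks over taxa 1..k, entry [a + 1, b + 1] equals (-1)^(number of shared bits).
sylvester <- function(k) {
  H1 <- matrix(c(1, 1, 1, -1), nrow = 2)
  H <- matrix(1, nrow = 1, ncol = 1)
  for (i in seq_len(k)) H <- kronecker(H, H1)
  H
}

# Forward conjugation (two-state model): s = H^-1 exp(H q), with H^-1 = H / 2^k.
# q holds edge weights indexed by split; q[1] (empty split) is reset to -sum(q[-1]).
spectrum_from_edges <- function(q) {
  k <- round(log2(length(q)))
  H <- sylvester(k)
  q[1] <- -sum(q[-1])
  drop(H %*% exp(H %*% q)) / 2^k
}

# Inverse conjugation: q = H^-1 log(H s), recovering edge weights from a spectrum s
# of site-pattern frequencies (entry A: taxa in A differ from the reference taxon).
edges_from_spectrum <- function(s) {
  k <- round(log2(length(s)))
  H <- sylvester(k)
  drop(H %*% log(H %*% s)) / 2^k
}

# Hypothetical 4-taxon tree ((1,2),(3,4)); taxon 4 is the reference taxon.
# Bitmask bits: 1 = taxon 1, 2 = taxon 2, 4 = taxon 3 (0-based split index).
q <- numeric(8)
q[1 + 1] <- 0.10  # pendant edge of taxon 1
q[2 + 1] <- 0.20  # pendant edge of taxon 2
q[4 + 1] <- 0.15  # pendant edge of taxon 3
q[7 + 1] <- 0.05  # pendant edge of taxon 4, i.e. split {1,2,3}|{4}
q[3 + 1] <- 0.30  # internal edge {1,2}|{3,4}
s <- spectrum_from_edges(q)      # expected site-pattern frequencies
q_hat <- edges_from_spectrum(s)  # recovers q; splits absent from the tree get ~0
```

The round trip returns the original edge weights, and entries of q_hat for splits that are not in the tree (for example {1,3}|{2,4}) come out as zero up to rounding. With observed rather than expected frequencies those entries are generally non-zero, which is the conflicting signal a Lento plot displays.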
Fig. 2. Lento plot of the edge weights from the sequence spectrum for the data of Rokas et al. (2003). On the x-axis, the splits (edges) are represented by the dots overlying the graph. The bars above the axis indicate the edge weights, i.e. the support for a split; the bars below represent the conflict with this split, i.e. the sum of the weights of all edges incompatible with it.

phangorn is distributed with two tutorials. The first explains how to perform a phylogenetic analysis (in R, type vignette("Trees")), and the second, vignette("phangorn-specials"), shows how to define data with general character states and how to estimate rate matrices for those states. phangorn depends only on other R packages that are also available from the CRAN repository, and it runs on different operating systems. Since phangorn is written in R, results can easily be extended and further processed using the graphical and statistical capabilities of R.
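As one example of such further processing, the model-selection criteria mentioned in the METHODS section (likelihood ratio test, AIC and BIC) reduce to simple arithmetic on maximized log-likelihoods. The base-R sketch below computes them for two nested models; the function name, its arguments and the log-likelihood values in the usage lines are hypothetical, and in practice the log-likelihoods and parameter counts (including estimated edge lengths) would come from fitted ML models. The chi-squared reference distribution for the LRT assumes nested models whose extra parameters are not on a boundary of the parameter space.

```r
# Likelihood-ratio test, AIC and BIC for two nested models fitted to the same data.
# logL0/k0: maximized log-likelihood and number of free parameters of the simpler
# model; logL1/k1: the same for the richer model; n: number of alignment sites.
compare_models <- function(logL0, k0, logL1, k1, n) {
  logL <- c(logL0, logL1)
  k <- c(k0, k1)
  lrt <- 2 * (logL1 - logL0)  # LRT statistic, ~ chi-squared with k1 - k0 df under H0
  data.frame(
    model   = c("simpler", "richer"),
    logLik  = logL,
    df      = k,
    AIC     = -2 * logL + 2 * k,
    BIC     = -2 * logL + log(n) * k,
    LRT     = c(NA, lrt),
    p.value = c(NA, pchisq(lrt, df = k1 - k0, lower.tail = FALSE))
  )
}

# Usage with hypothetical values (not results from any real data set):
res <- compare_models(logL0 = -1000, k0 = 5, logL1 = -990, k1 = 13, n = 500)
print(res)
```

With these made-up values AIC favours the richer model (2006 versus 2010), whereas BIC, which penalises each parameter by log(n) instead of 2, favours the simpler one; the criteria can disagree, which is why it helps to report more than one.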

3 CONCLUSION

phangorn offers a wide range of methods to reconstruct phylogenies, to compare phylogenetic trees, to test different phylogenetic models and to perform split analyses that evaluate conflicting phylogenetic signal. Moreover, the phangorn package provides a flexible framework for prototyping new phylogenetic methods.
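As a small illustration of prototyping in plain R, the sketch below implements the Fitch algorithm mentioned in the METHODS section for computing the parsimony score of a fixed tree. It is not phangorn's implementation: trees are assumed to be rooted binary nested lists of taxon names, the alignment is assumed to be a character matrix with taxa as row names, ambiguity codes and gaps are not handled, and all names and the toy data are hypothetical. Because the Fitch score does not depend on where the tree is rooted, a rooted representation is enough to compare unrooted topologies.

```r
# Fitch small-parsimony count for one site on a rooted binary tree.
# tree: nested list whose leaves are taxon names; site: named list of states.
fitch_site <- function(tree, site) {
  if (is.character(tree)) return(list(set = site[[tree]], cost = 0))
  left <- fitch_site(tree[[1]], site)
  right <- fitch_site(tree[[2]], site)
  common <- intersect(left$set, right$set)
  if (length(common) > 0) {
    list(set = common, cost = left$cost + right$cost)  # no extra change needed
  } else {
    list(set = union(left$set, right$set), cost = left$cost + right$cost + 1)
  }
}

# Parsimony score of an alignment: sum of Fitch counts over all sites (columns).
fitch_score <- function(tree, alignment) {
  sum(vapply(seq_len(ncol(alignment)),
             function(j) fitch_site(tree, as.list(alignment[, j]))$cost,
             numeric(1)))
}

# Hypothetical toy alignment: rows are taxa, columns are sites.
aln <- matrix(c("A", "A", "G", "G",
                "C", "C", "T", "T",
                "A", "C", "A", "C"),
              nrow = 4, dimnames = list(c("a", "b", "c", "d"), NULL))
tree1 <- list(list("a", "b"), list("c", "d"))  # ((a,b),(c,d))
tree2 <- list(list("a", "c"), list("b", "d"))  # ((a,c),(b,d))
fitch_score(tree1, aln)
fitch_score(tree2, aln)
```

On this toy alignment ((a,b),(c,d)) requires 4 changes and ((a,c),(b,d)) requires 5, so MP prefers the first topology.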
REFERENCES

1. Whelan S, Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol, 2001.

2. Rokas A, Williams BL, King N, Carroll SB. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature, 2003.

3. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol, 2003.

4. Huber KT, Langton M, Penny D, Moulton V, Hendy M. Spectronet: a package for computing spectra and median networks. Appl Bioinformatics, 2002.

5. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol, 2006.

6. Kelchner SA, Thomas MA. Model use in phylogenetics: nine key questions. Trends Ecol Evol, 2007.

7. Le SQ, Gascuel O. An improved general amino acid replacement matrix. Mol Biol Evol, 2008.

8. Lento GM, Hickson RE, Chambers GK, Penny D. Use of spectral analysis to test hypotheses on the origin of pinnipeds. Mol Biol Evol, 1995.

9. Paradis E, Claude J, Strimmer K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics, 2004.

10. Mathews S, Clements MD, Beilstein MA. A duplicate gene rooting of seed plants and the phylogenetic position of flowering plants. Philos Trans R Soc Lond B Biol Sci, 2010.