RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

Alexandros Stamatakis

Abstract

MOTIVATION: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate steadily growing input datasets and to serve the needs of the user community.
RESULTS: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available.

Year:  2014        PMID: 24451623      PMCID: PMC3998144          DOI: 10.1093/bioinformatics/btu033

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analysis of large datasets under maximum likelihood. Its major strength is a fast maximum likelihood tree search algorithm that returns trees with good likelihood scores. Since the last RAxML paper (Stamatakis, 2006), it has been continuously maintained and extended to accommodate steadily growing input datasets and to serve the needs of the user community. In the following, I present some of the most notable new features and extensions of RAxML.

2 NEW FEATURES

2.1 Bootstrapping and support values

RAxML offers four different ways to obtain bootstrap support. It implements the standard non-parametric bootstrap and also the so-called rapid bootstrap (Stamatakis et al., 2008), which is a standard bootstrap search that relies on algorithmic shortcuts and approximations to speed up the search process. It also offers an option to calculate so-called SH-like support values (Guindon et al., 2010). I recently implemented a method for computing RELL (Resampling Estimated Log Likelihoods) bootstrap support as described by Minh et al. (2013). Apart from this, RAxML offers a so-called bootstopping option (Pattengale et al., 2010). When this option is used, RAxML automatically determines how many bootstrap replicates are required to obtain stable support values.
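The resampling step behind the standard non-parametric bootstrap can be illustrated with a minimal Python sketch. It only shows how one replicate alignment is drawn by sampling columns with replacement; it is not RAxML's implementation, and the toy alignment, the function name `bootstrap_replicate` and the seed are illustrative assumptions. In a real analysis a tree would be inferred from every replicate, and the support of a branch would be the fraction of replicate trees that contain it.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Draw one non-parametric bootstrap replicate.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Columns are sampled with replacement until the original length is reached.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


# Hypothetical toy alignment (4 taxa, 6 sites), for illustration only.
alignment = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "ACCTAC", "t4": "GCGTAC"}
rng = random.Random(12345)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
```

Every replicate keeps the taxon set and the alignment length; only the multiset of columns changes.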

2.2 Models and data types

Apart from DNA and protein data, RAxML now also supports binary, multi-state morphological and RNA secondary structure data. It can correct for ascertainment bias (Lewis, 2001) for all of the above data types. This might be useful not only for morphological data matrices that only contain variable sites but also for alignments of SNPs. The number of available protein substitution models has been significantly extended and comprises a general time reversible (GTR) model, as well as the computationally more complex LG4M and LG4X models (Le et al., 2012). RAxML can also automatically determine the best-scoring protein substitution model. Finally, a new option for conducting a maximum likelihood estimate of the base frequencies has become available.
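The idea behind the ascertainment bias correction of Lewis (2001) is to condition the likelihood on the fact that only variable sites were sampled: each site likelihood is divided by the probability that a site is not constant. The following Python sketch applies that conditioning to precomputed per-site log-likelihoods. The function name, its inputs and the toy numbers are assumptions made for illustration; how RAxML computes and applies the correction internally is not described here.

```python
import math


def ascertainment_corrected_lnl(site_lnls, constant_pattern_lnls):
    """Conditional log-likelihood given that only variable sites are observed.

    site_lnls: log-likelihoods of the observed (variable) sites.
    constant_pattern_lnls: log-likelihoods of the unobserved constant
        patterns, one per character state (e.g. all-0 and all-1 for binary data).
    """
    p_constant = sum(math.exp(x) for x in constant_pattern_lnls)
    if not 0.0 <= p_constant < 1.0:
        raise ValueError("probability of constant patterns must lie in [0, 1)")
    # Each site likelihood is divided by (1 - p_constant), i.e. we subtract
    # log(1 - p_constant) once per observed site.
    return sum(site_lnls) - len(site_lnls) * math.log1p(-p_constant)


# Toy numbers (assumed): two variable sites, two constant patterns
# whose probabilities sum to 0.5.
corrected = ascertainment_corrected_lnl(
    site_lnls=[-2.0, -3.0],
    constant_pattern_lnls=[math.log(0.3), math.log(0.2)],
)
```

With these numbers the uncorrected log-likelihood of -5.0 increases by 2 ln 2, because each of the two sites is rescaled by 1/(1 - 0.5).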

2.3 Parallel versions

RAxML offers a fine-grain parallelization of the likelihood function for multi-core systems via the PThreads-based version and a coarse-grain parallelization of independent tree searches via MPI (Message Passing Interface). It also supports coarse-grain/fine-grain parallelism via the hybrid MPI/PThreads version (Pfeiffer and Stamatakis, 2010). Note that, for extremely large analyses on supercomputers, using the dedicated sister program ExaML [Exascale Maximum Likelihood (Stamatakis and Aberer, 2013)] is recommended.

2.4 Post-analysis of trees

RAxML offers a plethora of post-analysis functions for sets of trees. Apart from standard statistical significance tests, it offers efficient (and partially parallelized) operations for computing Robinson–Foulds (RF) distances, as well as extended majority rule, majority rule and strict consensus trees (Aberer et al.). Beyond this, it implements a method for identifying so-called rogue taxa (Pattengale et al., 2011), and I recently implemented options for calculating the TC (Tree Certainty) and IC (Internode Certainty) measures as introduced by Salichos and Rokas (2013). Finally, there is the new plausibility checker option (Dao et al.) that allows computing the RF distances between a huge phylogeny with tens of thousands of taxa and several smaller, more accurate reference phylogenies that contain a strict subset of the taxa in the huge tree. This option can be used to automatically assess the quality of huge trees that cannot be inspected by eye.
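As a small illustration of the Robinson–Foulds distance mentioned above: for two unrooted trees on the same taxon set, it is the number of non-trivial bipartitions that occur in exactly one of the two trees. The Python sketch below assumes the bipartitions have already been extracted from the trees and are given as sets of taxon names; the helper names and the five-taxon example are hypothetical, and the efficient algorithms used in RAxML are not reproduced.

```python
def canonical_split(side, taxa):
    """Return the side of a bipartition that excludes a fixed reference taxon,
    so that 'AB|CDE' and 'CDE|AB' compare as equal."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def rf_distance(splits_a, splits_b, taxa):
    """Robinson-Foulds distance: size of the symmetric difference of the
    two sets of non-trivial bipartitions."""
    a = {canonical_split(s, taxa) for s in splits_a}
    b = {canonical_split(s, taxa) for s in splits_b}
    return len(a ^ b)


taxa = {"A", "B", "C", "D", "E"}
tree1 = [{"A", "B"}, {"A", "B", "C"}]  # ((A,B),C,(D,E)): AB|CDE and ABC|DE
tree2 = [{"A", "C"}, {"D", "E"}]       # ((A,C),B,(D,E)): AC|BDE and DE|ABC
distance = rf_distance(tree1, tree2, taxa)
```

Both trees share the bipartition DE|ABC and differ in the other one, so two bipartitions are unique to one tree. Comparing a huge tree against a smaller reference tree, as in the plausibility checker, would additionally require restricting the huge tree to the taxa of the reference tree first; that step is omitted here.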

2.5 Analyzing next-generation sequencing data

RAxML offers two algorithms for preparing and analyzing next-generation sequencing data. A sliding-window approach (unpublished) is available to assess which regions of a gene (e.g. 16S) exhibit strong and stable phylogenetic signal to support decisions about which regions to amplify. Apart from that, RAxML also implements parsimony and maximum likelihood flavors of the evolutionary placement algorithm [EPA (Berger et al.)] that places short reads into a given reference phylogeny obtained from full-length sequences to determine the evolutionary origin of the reads. It also offers placement support statistics for those reads by calculating likelihood weights. This option can also be used to place fossils into a given phylogeny (Berger and Stamatakis, 2010) or to insert different outgroups into the tree a posteriori, that is, after the inference of the ingroup phylogeny.
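Likelihood weights for a read can be understood as the likelihood of each candidate placement divided by the sum of the likelihoods over all candidate placements. The Python sketch below computes such weights from log-likelihoods in a numerically stable way. The branch labels and log-likelihood values are invented for illustration, and the function is a sketch of the general idea rather than the EPA code.

```python
import math


def likelihood_weights(placement_lnls):
    """Normalize per-branch placement log-likelihoods into weights that sum to 1.

    The best log-likelihood is subtracted before exponentiating so that
    values such as -10234.1 do not underflow to zero.
    """
    best = max(placement_lnls.values())
    scaled = {branch: math.exp(lnl - best) for branch, lnl in placement_lnls.items()}
    total = sum(scaled.values())
    return {branch: value / total for branch, value in scaled.items()}


# Hypothetical log-likelihoods of one read placed on three candidate branches.
lnls = {"branch_3": -10234.1, "branch_7": -10235.2, "branch_12": -10240.9}
weights = likelihood_weights(lnls)
```

A weight close to 1 for a single branch indicates a confident placement, whereas weights spread over several branches indicate that the read cannot be placed unambiguously.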

2.6 Vector intrinsics

RAxML uses manually inserted and optimized x86 vector intrinsics to accelerate the parsimony and likelihood calculations. It supports SSE3, AVX and AVX2 (using fused multiply-add instructions) intrinsics. For a small single-gene DNA alignment using the Γ model of rate heterogeneity, the unvectorized version of RAxML requires 111.5 s, the SSE3 version 84.4 s and the AVX version 66.22 s to complete a simple tree search on an Intel i7-2620M core running at 2.70 GHz under Ubuntu Linux. The differences between AVX and AVX2 are less pronounced and are typically below 5% run time improvement.
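The run times quoted above translate into speedups as follows. This short Python calculation uses only the three reported timings; the variable names are arbitrary.

```python
# Reported run times in seconds for the same tree search.
runtimes_s = {"unvectorized": 111.5, "SSE3": 84.4, "AVX": 66.22}

baseline = runtimes_s["unvectorized"]
# Speedup = baseline time / vectorized time.
speedups = {name: baseline / t for name, t in runtimes_s.items()}
# Relative run time reduction with respect to the unvectorized version.
reductions = {name: 1.0 - t / baseline for name, t in runtimes_s.items()}

for name in ("SSE3", "AVX"):
    print(f"{name}: {speedups[name]:.2f}x faster, {reductions[name]:.1%} less run time")
```

That is, SSE3 is about 1.32 times and AVX about 1.68 times as fast as the unvectorized code on this dataset, corresponding to run time reductions of roughly 24% and 41%.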

2.7 Saving memory

Because memory shortage is becoming an issue due to the growing dataset sizes, RAxML implements an option for reducing memory footprints and potentially run times on large phylogenomic datasets with missing data. The memory savings are proportional to the amount of missing data in the alignment (Izquierdo-Carrasco et al., 2011).

2.8 Miscellaneous new options

RAxML offers options to conduct fast and more superficial tree searches on datasets with tens of thousands of taxa. It can also compute marginal ancestral states and offers an algorithm for rooting trees. Furthermore, it implements a sequential, PThreads-parallelized and MPI-parallelized algorithm for computing all quartets or a subset of quartets for a given alignment.
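To give a sense of the size of the quartet computation: an alignment with n taxa contains n(n-1)(n-2)(n-3)/24 distinct quartets, which is why evaluating only a subset can be necessary. The Python sketch below enumerates all quartets or draws a random subset of distinct quartets. The function names and taxon labels are illustrative assumptions; evaluating the three possible topologies of each quartet under maximum likelihood is not shown.

```python
import itertools
import math
import random


def all_quartets(taxa):
    """Lazily yield every 4-taxon subset in a fixed order."""
    return itertools.combinations(sorted(taxa), 4)


def random_quartets(taxa, k, rng):
    """Draw k distinct quartets uniformly at random (all of them if k is too large)."""
    taxa = sorted(taxa)
    k = min(k, math.comb(len(taxa), 4))
    chosen = set()
    while len(chosen) < k:
        chosen.add(tuple(sorted(rng.sample(taxa, 4))))
    return sorted(chosen)


taxa = ["t1", "t2", "t3", "t4", "t5", "t6"]
n_all = sum(1 for _ in all_quartets(taxa))            # number of quartets for 6 taxa
subset = random_quartets(taxa, 5, random.Random(42))  # 5 distinct random quartets
```

Rejection sampling as used here is adequate when the requested subset is small relative to the total number of quartets, which is the typical situation for large alignments.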

3 USER SUPPORT AND FUTURE WORK

User support is provided via the RAxML Google group at https://groups.google.com/forum/?hl=en#!forum/raxml. The RAxML source code contains a comprehensive manual, and a step-by-step tutorial with some basic commands is available at http://www.exelixis-lab.org/web/software/raxml/hands_on.html. Further resources are available via the RAxML software page at http://www.exelixis-lab.org/web/software/raxml/. Future work includes the continued maintenance of RAxML, the adaptation to novel computer architectures and the implementation of novel models and datatypes, in particular codon models.

REFERENCES

1.  A likelihood approach to estimating phylogeny from discrete morphological character data.

Authors:  P O Lewis
Journal:  Syst Biol       Date:  2001 Nov-Dec       Impact factor: 15.683

2.  Modeling protein evolution with several amino acid replacement matrices depending on site rates.

Authors:  Si Quang Le; Cuong Cao Dang; Olivier Gascuel
Journal:  Mol Biol Evol       Date:  2012-04-06       Impact factor: 16.240

3.  How many bootstrap replicates are necessary?

Authors:  Nicholas D Pattengale; Masoud Alipour; Olaf R P Bininda-Emonds; Bernard M E Moret; Alexandros Stamatakis
Journal:  J Comput Biol       Date:  2010-03       Impact factor: 1.479

4.  New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

Authors:  Stéphane Guindon; Jean-François Dufayard; Vincent Lefort; Maria Anisimova; Wim Hordijk; Olivier Gascuel
Journal:  Syst Biol       Date:  2010-03-29       Impact factor: 15.683

5.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2006-08-23       Impact factor: 6.937

6.  A rapid bootstrap algorithm for the RAxML Web servers.

Authors:  Alexandros Stamatakis; Paul Hoover; Jacques Rougemont
Journal:  Syst Biol       Date:  2008-10       Impact factor: 15.683

7.  Uncovering hidden phylogenetic consensus in large data sets.

Authors:  Nicholas D Pattengale; Andre J Aberer; Krister M Swenson; Alexandros Stamatakis; Bernard M E Moret
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2011 Jul-Aug       Impact factor: 3.710

8.  Inferring ancient divergences requires genes with strong phylogenetic signals.

Authors:  Leonidas Salichos; Antonis Rokas
Journal:  Nature       Date:  2013-05-08       Impact factor: 49.962

9.  Ultrafast approximation for phylogenetic bootstrap.

Authors:  Bui Quang Minh; Minh Anh Thi Nguyen; Arndt von Haeseler
Journal:  Mol Biol Evol       Date:  2013-02-15       Impact factor: 16.240

10.  Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees.

Authors:  Fernando Izquierdo-Carrasco; Stephen A Smith; Alexandros Stamatakis
Journal:  BMC Bioinformatics       Date:  2011-12-13       Impact factor: 3.169
