
RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference.

Alexey M Kozlov, Diego Darriba, Tomáš Flouri, Benoit Morel, Alexandros Stamatakis

Abstract

MOTIVATION: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets.
RESULTS: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric.
AVAILABILITY AND IMPLEMENTATION: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. The RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.


Year:  2019        PMID: 31070718      PMCID: PMC6821337          DOI: 10.1093/bioinformatics/btz305

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

RAxML (Stamatakis, 2014) is a popular maximum likelihood (ML) tree inference tool that our group has developed and supported for the last 15 years. More recently, we also released ExaML (Kozlov et al.), a dedicated code for analyzing genome-scale datasets on supercomputers. ExaML implements the core tree search functionality of RAxML and scales to thousands of CPU cores. Other widely used ML inference tools include IQ-Tree (Nguyen et al.), PhyML (Guindon et al.) and FastTree (Price et al.). Here, we introduce our new code, RAxML-NG (RAxML Next Generation). It combines the strengths and concepts of RAxML and ExaML, and offers several additional improvements, which we describe in the next section.

2 New features and optimizations

2.1 Evolutionary model extensions

While RAxML/ExaML only fully supported the General Time Reversible (GTR) model of DNA substitution, RAxML-NG supports all 22 ‘classical’ GTR-derived models. All model parameters (including branch lengths) can be either optimized or fixed to user-specified values. RAxML-NG also offers edge-proportional branch length estimation for multi-gene alignments, the FreeRate model of rate heterogeneity (Yang, 1995), and per-rate scalers in the Γ model of rate heterogeneity, which prevent numerical underflow on large trees.
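
The ‘GTR-derived’ models differ from GTR only in which of the six exchangeability rates are forced to be equal and in whether base frequencies are equal. The minimal Python sketch below assumes the common ModelTest-style six-digit rate-symmetry codes (e.g. 010010 for K80/HKY) as the encoding and builds the normalized rate matrix Q for any such constraint pattern. The names `PATTERNS` and `rate_matrix` are hypothetical; this is an illustration, not RAxML-NG's internal model representation.

```python
from typing import Dict, List, Sequence

STATES = "ACGT"
# Order of the six exchangeabilities (the usual convention for DNA models).
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]

# Illustrative subset of rate-symmetry codes: pair k uses free parameter int(code[k]).
# The same code with equal vs. unequal base frequencies yields two models.
PATTERNS: Dict[str, str] = {
    "JC/F81": "000000",     # one exchangeability for all pairs
    "K80/HKY": "010010",    # transitions (AG, CT) vs. transversions
    "TrNef/TrN": "010020",  # separate AG and CT rates, one transversion rate
    "SYM/GTR": "012345",    # six free exchangeabilities
}


def rate_matrix(pattern: str, params: Sequence[float], freqs: Sequence[float]) -> List[List[float]]:
    """Build a normalized, time-reversible 4x4 rate matrix Q from a constraint pattern."""
    index = {s: i for i, s in enumerate(STATES)}
    q = [[0.0] * 4 for _ in range(4)]
    for (x, y), code in zip(PAIRS, pattern):
        i, j = index[x], index[y]
        r = params[int(code)]
        q[i][j] = r * freqs[j]  # rate of change i -> j
        q[j][i] = r * freqs[i]  # rate of change j -> i (same exchangeability)
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)  # rows sum to zero
    # Rescale so the expected substitution rate at equilibrium is 1,
    # i.e. branch lengths are measured in expected substitutions per site.
    mu = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[v / mu for v in row] for row in q]
```

Passing equal frequencies turns HKY into K80 and GTR into SYM; fixing a parameter instead of optimizing it simply means keeping its entry in `params` constant.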

2.2 Search algorithm modifications

The subtree enumeration method used in RAxML/ExaML occasionally skipped promising topological moves; this has been fixed in RAxML-NG (see Supplementary Material for details). Further, RAxML-NG employs a two-step L-BFGS-B method (Fletcher, 1987) to optimize the parameters of the LG4X model (Le et al.). This approach, first introduced in IQ-Tree, is usually faster and more stable than the sequential optimization with Brent’s method used in RAxML/ExaML.
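
To illustrate what sequential optimization means, the sketch below optimizes one parameter at a time while holding the others fixed, cycling until the score stops improving. It uses golden-section search as a simpler stand-in for Brent’s method and a toy quadratic objective with two correlated parameters; both are assumptions, not RAxML/ExaML code. With correlated parameters, each one-dimensional step only partially corrects for the other parameter, so many rounds are needed. A quasi-Newton method such as L-BFGS-B (available in SciPy as `scipy.optimize.minimize(..., method="L-BFGS-B")`) instead updates all parameters jointly from gradient information, within box constraints.

```python
import math
from typing import Callable, List, Sequence, Tuple

INV_PHI = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618


def golden_section_min(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-9) -> float:
    """Minimize a unimodal 1-D function on [lo, hi] by golden-section search."""
    a, b = lo, hi
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]; old c becomes new d
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]; old d becomes new c
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def sequential_optimize(f: Callable[[List[float]], float], x0: Sequence[float],
                        bounds: Sequence[Tuple[float, float]],
                        max_rounds: int = 200, tol: float = 1e-12) -> Tuple[List[float], int]:
    """Coordinate-wise minimization; returns the point and the number of rounds used."""
    x = list(x0)
    prev = f(x)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(t: float, i: int = i) -> float:
                y = list(x)  # all other parameters stay at their current values
                y[i] = t
                return f(y)
            x[i] = golden_section_min(along_axis, lo, hi)
        cur = f(x)
        if prev - cur < tol:  # no meaningful improvement in this round
            break
        prev = cur
    return x, rounds


def correlated_toy(x: List[float]) -> float:
    """Toy objective with minimum at (1, 2) and strongly coupled parameters."""
    return (x[0] - 1) ** 2 + (x[1] - 2) ** 2 + 1.5 * (x[0] - 1) * (x[1] - 2)
```

Dropping the cross term from `correlated_toy` lets the same loop finish after a round or two, which is the intuition for why joint optimization pays off when model parameters interact.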

2.3 Transfer bootstrap

RAxML-NG can compute a novel branch support metric, the transfer bootstrap expectation (TBE), recently proposed by Lemoine et al. Compared with the classic Felsenstein bootstrap, TBE is less sensitive to individual misplaced taxa in replicate trees, and is thus better suited to reveal well-supported deep splits in large trees with thousands of taxa.
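
TBE replaces the all-or-nothing presence test of the Felsenstein bootstrap with the transfer distance: the minimum number of taxa that must be moved to turn a reference bipartition into a bipartition of a replicate tree. Following the definition by Lemoine et al., TBE = 1 - (mean over replicates of the smallest transfer distance) / (p - 1), where p is the number of taxa on the smaller side of the reference branch. The Python sketch below represents each bipartition by one of its sides as a frozenset and each replicate tree by its internal bipartitions; the function names and the six-taxon example are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Split = FrozenSet[str]  # one side of a bipartition; the other side is implied


def transfer_distance(a: Split, b: Split, taxa: FrozenSet[str]) -> int:
    """Minimum number of taxa to move so that bipartition a becomes bipartition b."""
    d = len(a ^ b)  # mismatches when side a is matched with side b
    return min(d, len(taxa) - d)  # ... or with the complement of b


def transfer_support(ref: Split, replicates: Sequence[List[Split]], taxa: FrozenSet[str]) -> float:
    """TBE of one internal reference branch (requires p >= 2)."""
    p = min(len(ref), len(taxa) - len(ref))  # size of the lighter side
    # Pendant branches are bipartitions too; they cap each distance at p - 1.
    pendant = [frozenset([t]) for t in taxa]
    total = 0
    for splits in replicates:
        total += min(transfer_distance(ref, s, taxa) for s in list(splits) + pendant)
    return 1.0 - total / (len(replicates) * (p - 1))


def felsenstein_support(ref: Split, replicates: Sequence[List[Split]], taxa: FrozenSet[str]) -> float:
    """Classic bootstrap: fraction of replicates containing the exact bipartition."""
    other = taxa - ref
    hits = sum(any(s == ref or s == other for s in splits) for splits in replicates)
    return hits / len(replicates)


if __name__ == "__main__":
    taxa = frozenset("ABCDEF")
    ref = frozenset("ABC")  # reference branch ABC|DEF
    replicates = [
        [frozenset("AB"), frozenset("ABC"), frozenset("DE")],  # contains ABC|DEF
        [frozenset("AB"), frozenset("ABD"), frozenset("CE")],  # AB|CDEF is one move away
    ]
    print(felsenstein_support(ref, replicates, taxa))  # 0.5
    print(transfer_support(ref, replicates, taxa))     # 0.75
```

The second replicate lacks the reference branch but contains a bipartition only one taxon away from it, so it still contributes partial support under TBE while counting as a plain miss for the Felsenstein bootstrap.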

2.4 Phylogenetic terraces

Certain patterns of missing data in multi-gene alignments can yield multiple tree topologies with identical likelihood scores, a phenomenon known as terraces in tree space (Sanderson et al.). RAxML-NG employs the recently released terraphast library (Biczok et al.) to assess whether the inferred best-scoring ML tree resides on a terrace, and to report the size of that terrace.
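
The property underlying terraces can be checked directly under one assumption, as in the classic terrace setting: each gene’s likelihood depends only on the subtree induced by the taxa sampled for that gene. Two trees then obtain identical scores if, for every gene, their induced subtrees coincide. The Python sketch below compares induced subtrees via their non-trivial bipartitions. It only tests a given pair of trees, whereas terraphast reports the size of the whole terrace; the function names and the five-taxon example are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Split = FrozenSet[str]  # one side of a bipartition of the full taxon set


def induced_splits(splits: Iterable[Split], subset: FrozenSet[str]) -> Set[Split]:
    """Non-trivial bipartitions of the subtree induced by `subset`."""
    anchor = min(subset)  # orient each split by its side that excludes this taxon
    out: Set[Split] = set()
    for s in splits:
        inside, outside = s & subset, subset - s
        if len(inside) >= 2 and len(outside) >= 2:  # drop splits that become trivial
            out.add(outside if anchor in inside else inside)
    return out


def same_terrace(t1: List[Split], t2: List[Split], partitions: List[FrozenSet[str]]) -> bool:
    """True if both trees induce the same subtree on every partition's taxon set."""
    return all(induced_splits(t1, part) == induced_splits(t2, part) for part in partitions)


if __name__ == "__main__":
    genes = [frozenset("ABCD"), frozenset("ABCE")]  # gene 1 lacks E, gene 2 lacks D
    t1 = [frozenset("AB"), frozenset("DE")]  # ((A,B),C,(D,E))
    t2 = [frozenset("AB"), frozenset("CD")]  # ((A,B),(C,D),E)
    t3 = [frozenset("AC"), frozenset("DE")]  # ((A,C),B,(D,E))
    print(same_terrace(t1, t2, genes))  # True
    print(same_terrace(t1, t3, genes))  # False
```

Because no gene samples D and E together, the data cannot distinguish t1 from t2, whereas t3 changes the subtree induced on gene 1.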

2.5 Performance and scalability

In RAxML-NG, we further optimized the vectorized likelihood computation kernels and eliminated known sequential bottlenecks of RAxML. We also integrated site repeats (Kobert et al.), an optimization technique for likelihood calculations that yields runtime improvements of 10–60%. Finally, RAxML-NG implements several features for enhancing parallel efficiency that were previously only available in ExaML: efficient fine-grained parallelization with MPI or MPI+pthreads, a binary input file format (compressed alignment), restart from a checkpoint, and improved load balancing for multi-gene alignments (Kobert et al.).
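
The idea behind site repeats is that, at an inner node, two sites whose characters agree for all taxa below that node have identical conditional likelihood vector (CLV) entries, so only one of them needs to be computed. The Python sketch below assigns repeat-class identifiers bottom-up on a rooted binary tree and counts how many inner-node CLV entries would be computed with and without this reuse. It is a simplified illustration with a hypothetical function name (no memory layout, no unrooted-tree handling), not the libpll implementation.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf = taxon name, inner node = (left, right)


def clv_entry_counts(tree: Tree, alignment: Dict[str, str]) -> Tuple[int, int]:
    """Return (entries without repeats, entries with repeats) over all inner nodes."""
    n_sites = len(next(iter(alignment.values())))
    naive = 0
    with_repeats = 0

    def visit(node: Tree) -> List:
        nonlocal naive, with_repeats
        if isinstance(node, str):
            return list(alignment[node])  # at a leaf, the repeat class is the character
        left, right = visit(node[0]), visit(node[1])
        # Sites with the same (left class, right class) pair share one CLV entry.
        class_ids: Dict[tuple, int] = {}
        ids = [class_ids.setdefault(pair, len(class_ids)) for pair in zip(left, right)]
        naive += n_sites
        with_repeats += len(class_ids)
        return ids

    visit(tree)
    return naive, with_repeats


if __name__ == "__main__":
    aln = {"A": "ACGTA", "B": "ACGTA", "C": "AAAAC", "D": "AAAAC"}
    print(clv_entry_counts((("A", "B"), ("C", "D")), aln))  # (15, 11)
```

All five alignment columns in the example are distinct, so ordinary pattern compression would not help, yet 4 of the 15 inner-node CLV entries can be reused.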

2.6 Usability

Several RAxML-NG features aim to improve usability and avoid common pitfalls: auto-detection of the CPU instruction set and number of cores, a recommendation for the optimal number of threads, automatic restart from the last checkpoint after program interruption, and reporting of search progress in the log file, among others.
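
Automatic restart after an interruption rests on a simple pattern: periodically write the search state to disk atomically, and on start-up resume from that state if it exists. The Python sketch below shows this pattern with a JSON file and a placeholder search loop. The file format, the state fields and `run_search` are hypothetical and do not reflect RAxML-NG’s actual checkpoint format.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temporary file, then atomically rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk first
    os.replace(tmp_path, path)  # atomic rename on POSIX file systems


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def run_search(path: str, total_rounds: int, stop_after: Optional[int] = None) -> dict:
    """Hypothetical search loop; `stop_after` simulates an interruption."""
    state = load_checkpoint(path) or {"round": 0, "best_score": float("-inf")}
    while state["round"] < total_rounds:
        if stop_after is not None and state["round"] >= stop_after:
            break  # simulated crash or kill
        state["round"] += 1
        # Placeholder for one round of tree search producing a log-likelihood.
        state["best_score"] = max(state["best_score"], -1000.0 + state["round"])
        save_checkpoint(path, state)
    return state
```

Writing to a temporary file and renaming it means that an interruption during the write leaves the previous checkpoint intact rather than a truncated one.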

2.7 Modularization

RAxML and ExaML are large monolithic codes, which hindered maintenance, extension and code reuse. In RAxML-NG, we encapsulated the phylogenetic likelihood kernels and the numerical optimization routines in two libraries: libpll (https://github.com/xflouris/libpll-2) and pll-modules (https://github.com/ddarriba/pll-modules), respectively. Both libraries include unit tests and are also used by other software tools developed in our lab, such as ModelTest-NG and EPA-NG (Barbera et al.). This makes our likelihood computation code less error-prone than that of RAxML/ExaML.

3 Evaluation

A recent evaluation of fast ML-based methods (Zhou et al.) showed that IQ-Tree yields the best tree inference accuracy, closely followed by RAxML/ExaML. We therefore benchmarked RAxML-NG against these three programs on the collection of empirical datasets used by Zhou et al. RAxML-NG found the best-scoring tree for the highest number of datasets (19/21) among all programs tested, while being 1.3× to 4.5× faster. Furthermore, it scales to large numbers of cores with a parallel efficiency of up to 125% (see Supplementary Material for details). In summary, RAxML-NG is clearly superior to RAxML/ExaML, and we therefore recommend that users of these codes upgrade as soon as possible. The comparison with IQ-Tree yielded mixed results: although RAxML-NG is generally faster and returns higher-scoring trees on taxon-rich alignments, IQ-Tree results show much lower variance. Hence, on alignments with strong phylogenetic signal, IQ-Tree may require fewer replicate searches than RAxML-NG to find the best-scoring tree.
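
Two back-of-the-envelope calculations help interpret these results. Parallel efficiency is usually defined as E = T_1 / (p · T_p) for runtime T_1 on one core and T_p on p cores, so an efficiency above 100% (superlinear speedup) means that p cores finish more than p times faster. For replicate searches, if each independent search finds the best-scoring tree with probability q, then n searches succeed at least once with probability 1 - (1 - q)^n; a lower-variance tool corresponds to a higher q and hence a smaller n. The Python sketch below encodes both formulas; the independence assumption and all example numbers are hypothetical, not measurements from this study.

```python
import math


def parallel_efficiency(t_one_core: float, t_parallel: float, n_cores: int) -> float:
    """E = T_1 / (p * T_p); values above 1.0 indicate superlinear speedup."""
    return t_one_core / (n_cores * t_parallel)


def searches_needed(q: float, target: float = 0.95) -> int:
    """Smallest n with 1 - (1 - q)**n >= target, assuming independent searches."""
    if q >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - q))
```

For instance, 1000 s on one core versus 100 s on 8 cores gives E = 1.25, and reaching 95% confidence takes 5 searches when q = 0.5 but 14 when q = 0.2.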

4 Availability and user support

The RAxML-NG source code as well as pre-compiled binaries for Linux and MacOS are available at https://github.com/amkozlov/raxml-ng. RAxML-NG is also available as a web service (maintained by the Vital-IT unit of the Swiss Institute of Bioinformatics) at https://raxml-ng.vital-it.ch/. An up-to-date user manual is available at https://github.com/amkozlov/raxml-ng/wiki. User support is provided via the RAxML Google group at: https://groups.google.com/forum/#!forum/raxml.

5 Future work

In future versions of RAxML-NG, we plan to add site heterogeneity models such as RAxML-CAT (Stamatakis, 2006) and PhyloBayes-CAT (Le et al.), as well as non-reversible context-dependent models of evolution (Baele et al.). Furthermore, we plan to explore orthogonal parallelization schemes (across tree nodes and/or topological moves) to leverage the capabilities of modern parallel hardware and to analyze datasets with thousands of taxa more efficiently.
References (10 of 15 shown)

1.  Modeling protein evolution with several amino acid replacement matrices depending on site rates.

Authors:  Si Quang Le; Cuong Cao Dang; Olivier Gascuel
Journal:  Mol Biol Evol       Date:  2012-04-06       Impact factor: 16.240

2.  Using non-reversible context-dependent evolutionary models to study substitution patterns in primate non-coding sequences.

Authors:  Guy Baele; Yves Van de Peer; Stijn Vansteelandt
Journal:  J Mol Evol       Date:  2010-07-11       Impact factor: 2.395

3.  Empirical profile mixture models for phylogenetic reconstruction.

Authors:  Le Si Quang; Olivier Gascuel; Nicolas Lartillot
Journal:  Bioinformatics       Date:  2008-08-21       Impact factor: 6.937

4.  Terraces in phylogenetic tree space.

Authors:  Michael J Sanderson; Michelle M McMahon; Mike Steel
Journal:  Science       Date:  2011-06-16       Impact factor: 47.728

5.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

6.  A space-time process model for the evolution of DNA sequences.

Authors:  Z Yang
Journal:  Genetics       Date:  1995-02       Impact factor: 4.562

7.  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

8.  IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.

Authors:  Lam-Tung Nguyen; Heiko A Schmidt; Arndt von Haeseler; Bui Quang Minh
Journal:  Mol Biol Evol       Date:  2014-11-03       Impact factor: 16.240

9.  Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets.

Authors:  Xiaofan Zhou; Xing-Xing Shen; Chris Todd Hittinger; Antonis Rokas
Journal:  Mol Biol Evol       Date:  2018-02-01       Impact factor: 16.240

10.  Efficient Detection of Repeating Sites to Accelerate Phylogenetic Likelihood Calculations.

Authors:  K Kobert; A Stamatakis; T Flouri
Journal:  Syst Biol       Date:  2017-03-01       Impact factor: 9.160
