
FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program.

Vincent Lefort, Richard Desper, Olivier Gascuel.

Abstract

FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of Neighbor Joining (NJ). FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange. The new 2.0 version also includes Subtree Pruning and Regrafting, while remaining as fast as NJ and providing a number of facilities: Distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations. FastME is available using several interfaces: Command-line (to be integrated in pipelines), PHYLIP-like, and a Web server (http://www.atgc-montpellier.fr/fastme/).
© The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  (balanced) minimum evolution; NNI and SPR topological moves; distance-based; fast algorithms; phylogeny inference

Year:  2015        PMID: 26130081      PMCID: PMC4576710          DOI: 10.1093/molbev/msv150

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Distance algorithms infer phylogenies from matrices of pairwise distances among taxa. These algorithms are fast and have been shown to be fairly accurate on both real and simulated data (e.g., Kuhner and Felsenstein 1994). Moreover, they account for probabilistic modeling of substitutions when estimating evolutionary distances. Although they are not as accurate as likelihood-based methods, these algorithms are still widely used due to their speed and simplicity, as attested by the high number of citations for Neighbor Joining (NJ; Saitou and Nei 1987; see also Studier and Keppler 1988): Approximately 2,000 in 2014 (Web of Science). NJ is a greedy algorithm that builds trees by iterative agglomeration of taxa. Gascuel and Steel (2006) showed that the criterion being minimized by NJ is the balanced version of minimum evolution (BME), which estimates the tree length using Pauplin's formula (2000). We proposed fast, BME-based algorithms (Desper and Gascuel 2002, 2004) to 1) construct an initial tree using greedy taxon insertion and 2) perform topological moves, namely Nearest Neighbor Interchanges (NNIs), to improve an initial (e.g., NJ) tree. These algorithms were implemented in FastME 1.0 and were shown to improve accuracy substantially compared with NJ (e.g., Vinh and von Haeseler 2005), at a similar computational cost. A related NNI-based approach, using profiles of ancestral sequences instead of a distance matrix, was proposed by Price et al. (2009) and implemented in FastTree1.

FastME has been developed over the past several years, and Subtree Pruning and Regrafting (SPR) topological moves are now available in FastME 2.0. An SPR consists of removing a subtree from the initial tree and reinserting it by dividing any of the remaining branches of the initial tree. We thus have O(n²) alternative trees to improve the initial tree, where n is the number of taxa. The best SPR is selected, and the procedure is iterated until no improving SPR is found.
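As a rough illustration of the BME criterion, the Python sketch below evaluates Pauplin's formula, L = Σ_{i<j} 2^(1−τ_ij) d_ij, where τ_ij is the number of edges on the path between taxa i and j. It assumes an unrooted binary tree stored as a hypothetical adjacency dictionary and recomputes all path lengths by breadth-first search, so each evaluation costs O(n²); it is not the constant-time update used during FastME's SPR search.

```python
from collections import deque


def edge_counts_from(adj, source):
    """Breadth-first search: number of edges from `source` to every node."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


def bme_length(adj, leaves, d):
    """Pauplin's balanced tree length: sum over leaf pairs of 2**(1 - tau_ij) * d_ij.

    adj:    hypothetical adjacency dict {node: [neighbors]} of an unrooted tree
    leaves: list of taxon names (the leaf nodes of `adj`)
    d:      dict-of-dicts of pairwise distances, with d[i][j] == d[j][i]
    """
    total = 0.0
    for a, i in enumerate(leaves):
        tau = edge_counts_from(adj, i)  # topological distances from leaf i
        for j in leaves[a + 1:]:
            total += 2.0 ** (1 - tau[j]) * d[i][j]
    return total


# Additive distances from the quartet AB|CD with leaf edges 1, 2, 3, 4 and internal edge 5.
d = {
    "A": {"B": 3, "C": 9, "D": 10},
    "B": {"A": 3, "C": 10, "D": 11},
    "C": {"A": 9, "B": 10, "D": 7},
    "D": {"A": 10, "B": 11, "C": 7},
}
ab_cd = {"A": ["u"], "B": ["u"], "u": ["A", "B", "v"],
         "v": ["u", "C", "D"], "C": ["v"], "D": ["v"]}
ac_bd = {"A": ["u"], "C": ["u"], "u": ["A", "C", "v"],
         "v": ["u", "B", "D"], "B": ["v"], "D": ["v"]}
print(bme_length(ab_cd, list("ABCD"), d))  # 15.0, the true total edge length
print(bme_length(ac_bd, list("ABCD"), d))  # 17.5, a longer (worse) topology
```

With additive distances, the formula recovers the exact tree length (1 + 2 + 3 + 4 + 5 = 15) on the correct topology, and a wrong topology receives a larger value, which is what a minimum evolution search exploits.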
SPRs are more powerful than NNIs (which explore only O(n) alternative trees) and have been shown to be useful in a number of contexts and studies (e.g., in maximum-likelihood [ML]-based tree building; Guindon et al. 2010). Our algorithm first precomputes the average distance between every pair of subtrees of the initial topology, which can be achieved in O(n²) time. Then, the criterion value for any new tree obtained by SPR is computed in constant time, so the total cost of the SPR-based tree search is O(kn²), where k is the number of iterations. As k is usually smaller than n, the computational cost is similar to that of NJ, that is, O(n³). Experiments with real data (both DNA and proteins) show that a substantial gain is obtained compared with NJ and NJ+NNIs; the best alternative is FastTree1, which (quickly) infers trees that fit the minimum evolution criterion less well than NJ+SPR trees, but have similar likelihood values with DNA sequences. Details on our SPR algorithm and these experiments are provided in the Supplementary Material online. A number of tree-building algorithms have also been added, either to infer an initial tree or to improve that tree (or any input tree) with topological moves. These algorithms seek to optimize BME, but also the Ordinary Least Squares version of minimum evolution (OLSME; Rzhetsky and Nei 1993), which may be relevant with nonsequence data. These algorithms and their properties are summarized in table 1.
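The precomputation step can be illustrated with balanced (BME-style) averages between disjoint subtrees. The Python sketch below assumes a rooted binary tree written as nested tuples, which is a simplification of the unrooted topology described above, and assumes the balanced averaging associated with BME: the average between two subtrees is derived from those of their children, Δ(X, Y) = (Δ(X1, Y) + Δ(X2, Y)) / 2, so each pair costs O(1) once the smaller pairs are known. The function name is hypothetical, and the pair enumeration uses set tests for readability, so this sketch does not itself achieve the O(n²) bound.

```python
from functools import lru_cache
from itertools import combinations


def balanced_subtree_averages(tree, d):
    """Balanced average distance between every pair of disjoint subtrees.

    tree: rooted binary tree as nested 2-tuples of leaf names, e.g. ((("A", "B"), "C"), "D")
    d:    dict-of-dicts of pairwise leaf distances
    Returns {(x, y): delta} for each unordered pair of disjoint subtrees x, y.
    """
    clades = []  # (subtree, frozenset of its leaves), children listed before parents

    def collect(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = collect(node[0]) | collect(node[1])
        clades.append((node, leaves))
        return leaves

    collect(tree)

    @lru_cache(maxsize=None)
    def delta(x, y):
        # Two leaves: the average is simply their distance.
        if isinstance(x, str) and isinstance(y, str):
            return d[x][y]
        # Otherwise split an internal subtree and give each child weight 1/2.
        if not isinstance(x, str):
            return (delta(x[0], y) + delta(x[1], y)) / 2
        return (delta(x, y[0]) + delta(x, y[1])) / 2

    return {
        (x, y): delta(x, y)
        for (x, lx), (y, ly) in combinations(clades, 2)
        if lx.isdisjoint(ly)
    }


d = {
    "A": {"B": 3, "C": 9, "D": 10},
    "B": {"A": 3, "C": 10, "D": 11},
    "C": {"A": 9, "B": 10, "D": 7},
    "D": {"A": 10, "B": 11, "C": 7},
}
averages = balanced_subtree_averages(((("A", "B"), "C"), "D"), d)
```

Because each child receives weight 1/2 regardless of its size, leaves deep inside a subtree contribute less than leaves near its root; for example, Δ(((A,B),C), D) = ((d_AD + d_BD)/2 + d_CD)/2.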
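Table 1.

Substitution Models and Algorithms Available in FastME 2.0.

Models
Data | Model | Target | Method
DNA | p-distance | General | Analytical formula
DNA | RY symmetric | General | Analytical formula
DNA | RY | General | Analytical formula
DNA | JC69 (Jukes, Mam. Prot. Metab., 1969) | General | Analytical formula
DNA | K2P (Kimura, J. Mol. Evol., 1980) | General | Analytical formula
DNA | F81 (Felsenstein, J. Mol. Evol., 1981) | General | Analytical formula
DNA | F84 (Felsenstein, Evolution, 1984) | General | Analytical formula
DNA | TN93 (Tamura, MBE, 1993) | General | Analytical formula
DNA | LogDet (Lockhart, MBE, 1994) | General | Analytical formula
Protein | p-distance | General | Analytical formula
Protein | F81-like | General | Analytical formula
Protein | LG (Le, MBE, 2008) | General | ML estimation
Protein | WAG (Whelan, MBE, 2001) | General | ML estimation
Protein | JTT (Jones, CABIOS, 1992) | General | ML estimation
Protein | Dayhoff (Dayhoff, A. Prot. Seq. Struct., 1978) | General | ML estimation
Protein | DCMut (Kosiol, MBE, 2004) | General | ML estimation
Protein | CpRev (Adachi, J. Mol. Evol., 2000) | Chloroplast | ML estimation
Protein | MtREV (Adachi, J. Mol. Evol., 1996) | Mitochondria | ML estimation
Protein | RtREV (Dimmic, J. Mol. Evol., 2002) | Retrovirus | ML estimation
Protein | HIVb/w (Nickle, PLoS One, 2007) | HIV | ML estimation
Protein | FLU (Dang et al., BMC Evol. Biol., 2010) | Flu | ML estimation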

Note.—All models (except p-distance and LogDet) can be used with a continuous gamma distribution of rates across sites with user-defined parameter (typically 1.0). We distinguish models where a fast analytical formula is available to estimate evolutionary distances, from those (slower) requiring maximization of the likelihood function. For algorithms, we distinguish 1) the criterion being optimized (BME or OLSME) and 2) the construction of a first tree (using iterative taxon addition, or the agglomerative [NJ] scheme) versus the improvement of this initial tree using topological moves (NNIs or SPRs). We display worst case time complexities (as usual); n is the number of taxa and k the number of iterations. With NNIs, k is usually similar to n. With SPRs, k is usually much smaller than n.
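For the DNA models in table 1 with analytical formulas, distances follow directly from the observed proportions of differences. The Python sketch below is a minimal illustration, assuming equal-length, gap-free, uppercase nucleotide sequences; it implements the p-distance, JC69 (optionally with the standard gamma-corrected formula for shape parameter α), and K2P. It returns infinity when a formula is undefined because of saturation; how FastME itself handles such cases is not described here.

```python
import math

PURINES = frozenset("AG")


def p_distance(s1, s2):
    """Proportion of differing sites between two equal-length sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69(s1, s2, alpha=None):
    """Jukes-Cantor (1969) distance; optional gamma correction with shape `alpha`."""
    x = 1.0 - 4.0 * p_distance(s1, s2) / 3.0
    if x <= 0.0:
        return math.inf  # saturated: the formula is undefined
    if alpha is None:
        return -0.75 * math.log(x)
    # Gamma-distributed rates across sites: d = (3/4) * alpha * (x**(-1/alpha) - 1)
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)


def k2p(s1, s2):
    """Kimura (1980) two-parameter distance from transition (P) and transversion (Q) proportions."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    P = transitions / len(s1)
    Q = transversions / len(s1)
    w1, w2 = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if w1 <= 0.0 or w2 <= 0.0:
        return math.inf  # saturated: the formula is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

The gamma-corrected JC69 distance is always at least as large as the uncorrected one and converges to it as α grows, which matches the intuition that rate heterogeneity hides multiple substitutions at fast-evolving sites.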

Evolutionary distance matrices can also be calculated from DNA and protein sequences. For DNA, most models having an analytical solution (e.g., TN93) have been implemented. For protein sequences, we use standard ML-based estimation combined with a number of rate matrices (e.g., JTT [Jones, Taylor, and Thornton]) to accommodate various data sets (mitochondria, viruses, etc.). In both cases, distances can be estimated assuming a continuous gamma distribution of rates across sites with a user-defined parameter. Models and options are summarized in table 1.

Bootstrapping and analysis of multiple data sets can be performed within a single run. FastME 2.0 implements Felsenstein's bootstrap, where pseudo-replicate trees are built from resampled alignments and compared with the original tree obtained from the input alignment. Users can also submit a single file containing multiple alignments (e.g., corresponding to different genes in phylogenomics studies) and launch tree construction for all of them using the same program options.

Bootstrapping is a highly parallelizable task, and the same holds for distance estimation. FastME 2.0 provides parallel computing for these two tasks using the OpenMP API. When compiling FastME, users can choose to obtain a single-threaded or a parallel binary; they may then set the number of cores to be used on the command line.

FastME 2.0 includes a menu-driven PHYLIP-like interface and a command-line interface, the latter typically to be integrated in phylogenomics pipelines. A Web server is also available for occasional users. FastME is an open-source C program, with binaries available for the three main operating systems.
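Felsenstein's bootstrap, as described above, resamples alignment columns with replacement. The Python sketch below generates such pseudo-alignments and computes, for each split (bipartition) of the original tree, the fraction of replicate trees that contain it. Building a tree for each replicate is left out, and splits are assumed to be given as frozensets of taxon names in a canonical orientation (for example, the side not containing a fixed reference taxon); the function names and these representations are illustrative assumptions, not FastME internals.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Build pseudo-alignments by sampling columns with replacement.

    alignment: dict {taxon: aligned sequence}, all sequences of equal length
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    replicates = []
    for _ in range(n_replicates):
        # Draw `length` column indices uniformly, with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in columns) for name in names}
        )
    return replicates


def split_support(original_splits, replicate_splits):
    """Fraction of replicate trees containing each split of the original tree.

    original_splits:  iterable of frozensets (canonically oriented splits)
    replicate_splits: list holding one set of splits per replicate tree
    """
    n = len(replicate_splits)
    return {s: sum(s in rep for rep in replicate_splits) / n for s in original_splits}


aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTAC", "D": "TGGTAC"}
reps = bootstrap_replicates(aln, n_replicates=3, seed=42)
# Building a tree per replicate (e.g., distance estimation + BME search) would go here.
```

Each replicate is independent of the others, which is why this step parallelizes well.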
FastME 2.0 is thus a comprehensive program, including all required tools (numerous algorithms, distance estimation with various models, bootstrapping) to infer phylogenies using a distance approach. Source code, binaries, Web server, user guide, examples, benchmark data sets, etc., are available from http://www.atgc-montpellier.fr/fastme/ (last accessed July 14, 2015).

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
References

1.  Direct calculation of a tree length using a distance matrix.

Authors:  Y Pauplin
Journal:  J Mol Evol       Date:  2000-07       Impact factor: 2.395

2.  Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle.

Authors:  Richard Desper; Olivier Gascuel
Journal:  J Comput Biol       Date:  2002       Impact factor: 1.479

3.  Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting.

Authors:  Richard Desper; Olivier Gascuel
Journal:  Mol Biol Evol       Date:  2003-12-23       Impact factor: 16.240

4.  New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

Authors:  Stéphane Guindon; Jean-François Dufayard; Vincent Lefort; Maria Anisimova; Wim Hordijk; Olivier Gascuel
Journal:  Syst Biol       Date:  2010-03-29       Impact factor: 15.683

5.  Neighbor-joining revealed.

Authors:  Olivier Gascuel; Mike Steel
Journal:  Mol Biol Evol       Date:  2006-07-28       Impact factor: 16.240

6.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

7.  A note on the neighbor-joining algorithm of Saitou and Nei.

Authors:  J A Studier; K J Keppler
Journal:  Mol Biol Evol       Date:  1988-11       Impact factor: 16.240

8.  A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates.

Authors:  M K Kuhner; J Felsenstein
Journal:  Mol Biol Evol       Date:  1994-05       Impact factor: 16.240

9.  Shortest triplet clustering: reconstructing large phylogenies using representative sets.

Authors:  Le Sy Vinh; Arndt von Haeseler
Journal:  BMC Bioinformatics       Date:  2005-04-08       Impact factor: 3.169

10.  FastTree: computing large minimum evolution trees with profiles instead of a distance matrix.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  Mol Biol Evol       Date:  2009-04-17       Impact factor: 16.240
