FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program.

Vincent Lefort¹, Richard Desper¹, Olivier Gascuel².

Abstract

FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of Neighbor Joining (NJ). FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange. The new 2.0 version also includes Subtree Pruning and Regrafting, while remaining as fast as NJ and providing a number of facilities: Distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations. FastME is available using several interfaces: Command-line (to be integrated in pipelines), PHYLIP-like, and a Web server (http://www.atgc-montpellier.fr/fastme/).

Keywords: (balanced) minimum evolution; NNI and SPR topological moves; distance-based; fast algorithms; phylogeny inference

Year: 2015 PMID: 26130081 PMCID: PMC4576710 DOI: 10.1093/molbev/msv150

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Distance algorithms infer phylogenies from matrices of pairwise distances among taxa. These algorithms are fast and have been shown to be fairly accurate using both real and simulated data (e.g., Kuhner and Felsenstein 1994). Moreover, they account for probabilistic modeling of substitutions while estimating evolutionary distances. Even if they are not as accurate as likelihood-based methods, these algorithms are still widely used due to their speed and simplicity, as assessed by the high number of citations for Neighbor Joining (NJ, Saitou and Nei 1987; see also Studier and Keppler 1988): Approximately 2,000 in 2014 (Web of Science).

NJ is a greedy algorithm that builds trees by iterative agglomeration of taxa. Gascuel and Steel (2006) showed that the criterion being minimized by NJ is the balanced version of minimum evolution (BME), which estimates the tree length using Pauplin’s formula (2000). We proposed fast, BME-based algorithms (Desper and Gascuel 2002, 2004) to 1) construct an initial tree using greedy taxon insertion and 2) perform topological moves, namely Nearest Neighbor Interchanges (NNIs), to improve an initial (e.g., NJ) tree. These algorithms were implemented in FastME 1.0 and were shown to be substantially more accurate than NJ (e.g., Vinh and von Haeseler 2005), while having a similar computational cost. A related NNI-based approach, using profiles of ancestral sequences instead of a distance matrix, was proposed by Price et al. (2009) and implemented in FastTree1.

FastME has been developed over the past several years: Subtree Pruning and Regrafting (SPR) topological moves are available in FastME 2.0. SPR consists of removing a subtree from the initial tree and reinserting this subtree by dividing any of the remaining branches in the initial tree. We thus have $O(n^2)$ alternative trees to improve the initial tree, where n is the number of taxa. The best SPR is selected and the procedure is iterated until no more improving SPR is found.
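The balanced tree length that BME minimizes can be made concrete with a small sketch of Pauplin's formula, which weights each pairwise distance by $2^{1-e_{ij}}$, where $e_{ij}$ is the number of edges on the path between taxa i and j. The code below is an illustration only, not FastME's implementation: the adjacency-dictionary tree representation, the function names, and the `quartet` helper are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def path_edge_counts(adj, leaves):
    """Count the edges on the path between every ordered pair of leaves (BFS per leaf)."""
    counts = {}
    for src in leaves:
        depth = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if dst != src:
                counts[(src, dst)] = depth[dst]
    return counts


def balanced_tree_length(adj, dist):
    """Pauplin's formula: sum over leaf pairs of 2^(1 - edges(i, j)) * d(i, j)."""
    leaves = sorted(node for node, nbrs in adj.items() if len(nbrs) == 1)
    edges = path_edge_counts(adj, leaves)
    return sum(2.0 ** (1 - edges[(i, j)]) * dist[frozenset((i, j))]
               for i, j in combinations(leaves, 2))


def quartet(pair1, pair2):
    """Hypothetical helper: unrooted four-taxon tree with internal nodes 'u' and 'v'."""
    (a, b), (c, d) = pair1, pair2
    return {a: ["u"], b: ["u"], c: ["v"], d: ["v"],
            "u": [a, b, "v"], "v": [c, d, "u"]}


# Additive distances generated by the tree ((A,B),(C,D)) with external branch
# lengths 1, 2, 3, 4 and an internal branch of length 5 (total length 15).
dist = {frozenset(pair): value for pair, value in {
    ("A", "B"): 3, ("C", "D"): 7, ("A", "C"): 9,
    ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 11}.items()}

for split in [(("A", "B"), ("C", "D")),
              (("A", "C"), ("B", "D")),
              (("A", "D"), ("B", "C"))]:
    print(split, balanced_tree_length(quartet(*split), dist))
```

With these additive distances the generating topology gets length 15.0, its true total branch length, while the two alternative quartets get 17.5, so minimizing the balanced length selects the correct tree.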
SPRs are more powerful than NNIs (with $O(n)$ alternative trees) and have been shown to be useful in a number of contexts and studies (e.g., with maximum-likelihood [ML]-based tree building; Guindon et al. 2010). Our algorithm first precomputes the average distance between every pair of subtrees of the initial topology; this can be achieved in $O(n^2)$ time. Then, the criterion value for any new tree obtained by SPR is computed in constant time, meaning that the total cost of the SPR-based tree search is $O(kn^2)$, where k is the number of iterations. As k is usually smaller than n, the computational cost is similar to that of NJ, that is, $O(n^3)$. Experiments with real data (both DNA and proteins) show that a substantial gain is obtained, compared with NJ and NJ+NNIs; the best alternative is FastTree1, which (quickly) infers trees that are less fitted than NJ+SPR’s regarding minimum evolution, but have similar likelihood value with DNA sequences. Details on our SPR algorithm and these experiments are provided in Supplementary Material online.

A number of tree-building algorithms have been added, to infer an initial tree or to improve that tree (or any input tree) with topological moves. These algorithms seek to optimize BME, but also the Ordinary Least Squares version of minimum evolution (OLSME; Rzhetsky and Nei 1993), which may be relevant with nonsequence data. These algorithms and their properties are summarized in table 1.
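The difference in search-space size between NNI and SPR can be illustrated with the standard closed-form neighborhood sizes for an unrooted binary tree on n taxa: 2(n - 3) NNI neighbors and 2(n - 3)(2n - 7) SPR neighbors. These counts come from the general combinatorics literature on tree rearrangements rather than from this paper, and the function names are chosen for this sketch only.

```python
def nni_neighbours(n):
    """Number of distinct trees one NNI away from an unrooted binary tree on n taxa."""
    if n < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return 2 * (n - 3)


def spr_neighbours(n):
    """Number of distinct trees one SPR away from an unrooted binary tree on n taxa."""
    if n < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return 2 * (n - 3) * (2 * n - 7)


# Linear growth for NNI versus quadratic growth for SPR.
for n in (10, 100, 1000):
    print(n, nni_neighbours(n), spr_neighbours(n))
```

The linear versus quadratic growth is why each SPR iteration examines far more candidate trees than an NNI iteration, and why scoring each candidate in constant time matters for keeping the overall search affordable.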

Table 1.

Substitution Models and Algorithms Available in FastME 2.0.

Models
Data type	Model	Target	Method
DNA	p-distance	General	Analytical formula
	RY symmetric
	RY
	JC69 (Jukes, Mam. Prot. Metab., 1969)
	K2P (Kimura, J. Mol. Evol., 1980)
	F81 (Felsenstein, J. Mol. Evol., 1981)
	F84 (Felsenstein, Evolution, 1984)
	TN93 (Tamura, MBE, 1993)
	LogDet (Lockhart, MBE, 1994)

Protein	p-distance	General	Analytical formula
	F81-like	General	Analytical formula
	LG (Le, MBE, 2008)	General	ML estimation
	WAG (Whelan, MBE, 2001)	General	ML estimation
	JTT (Jones, CABIOS, 1992)	General	ML estimation
	Dayhoff (Dayhoff, A. Prot. Seq. Struct., 1978)	General	ML estimation
	DCMut (Kosiol, MBE, 2004)	General	ML estimation
	CpRev (Adachi, J. Mol. Evol., 2000)	Chloroplast	ML estimation
	MtREV (Adachi, J. Mol. Evol., 1996)	Mitochondria	ML estimation
	RtREV (Dimmic, J. Mol. Evol., 2002)	Retrovirus	ML estimation
	HIVb/w (Nickle, PLoS One, 2007)	HIV	ML estimation
	FLU (Dang et al., BMC Evol. Biol., 2010)	Flu	ML estimation

Note.—All models (except p-distance and LogDet) can be used with a continuous gamma distribution of rates across sites with user-defined parameter (typically 1.0). We distinguish models where a fast analytical formula is available to estimate evolutionary distances, from those (slower) requiring maximization of the likelihood function. For algorithms, we distinguish 1) the criterion being optimized (BME or OLSME) and 2) the construction of a first tree (using iterative taxon addition, or the agglomerative [NJ] scheme) versus the improvement of this initial tree using topological moves (NNIs or SPRs). We display worst case time complexities (as usual); n is the number of taxa and k the number of iterations. With NNIs, k is usually similar to n. With SPRs, k is usually much smaller than n.
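To illustrate what an "analytical formula" distance looks like, the sketch below computes the p-distance, the JC69 distance (optionally with gamma-distributed rates across sites), and the K2P distance using their textbook closed forms. It is a minimal illustration, not FastME's code: the function names are invented here, and skipping any site that is not A, C, G, or T in both sequences is an assumption of this example.

```python
import math

PURINES = frozenset("AG")


def valid_pairs(seq1, seq2):
    """Aligned site pairs where both sequences carry an unambiguous base."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return pairs


def p_distance(seq1, seq2):
    """Observed proportion of differing sites."""
    pairs = valid_pairs(seq1, seq2)
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p, alpha=None):
    """JC69 distance from a p-distance; alpha is the gamma shape parameter, if any."""
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        return math.inf  # saturated: the formula is undefined
    if alpha is None:
        return -0.75 * math.log(x)
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)


def k2p_distance(seq1, seq2):
    """K2P distance from the proportions of transitions (P) and transversions (Q)."""
    pairs = valid_pairs(seq1, seq2)
    transversions = sum((a in PURINES) != (b in PURINES) for a, b in pairs)
    transitions = sum(a != b for a, b in pairs) - transversions
    P = transitions / len(pairs)
    Q = transversions / len(pairs)
    x, y = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if x <= 0.0 or y <= 0.0:
        return math.inf
    return -0.5 * math.log(x) - 0.25 * math.log(y)


s1, s2 = "ACGTACGTAC", "ACGTACGTAT"   # one transition (C/T) in ten sites
p = p_distance(s1, s2)
print(p, jc69_distance(p), jc69_distance(p, alpha=1.0), k2p_distance(s1, s2))
```

Each distance is a direct function of a few site counts, which is why these models are much cheaper than the protein models that require numerical maximization of the likelihood. For the same p-distance, the gamma-corrected value is larger than the uncorrected one, since rate heterogeneity hides more multiple substitutions.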

The calculation of evolutionary distance matrices from DNA and protein sequences is also available. For DNA, most models having an analytical solution (e.g., TN93) have been implemented. For protein sequences, we use standard ML-based estimations, combined with a number of rate matrices (e.g., JTT [Jones, Taylor, and Thornton]) to accommodate various data sets (mitochondria, virus, etc.). In both cases, distances can be estimated assuming a continuous gamma distribution of rates across sites with user-defined parameter. Models and options are summarized in table 1.

Bootstrapping and analysis of multiple data sets can be performed within a single run. FastME 2.0 implements Felsenstein’s bootstrap, where pseudo trees are built from resampled alignments and compared with the original tree obtained from the input alignment. Users can also submit a single file containing multiple alignments (e.g., corresponding to different genes in phylogenomics studies) and launch tree construction for all of them using the same program options.

Bootstrapping is a highly parallelizable task. The same holds for distance estimations. FastME 2.0 provides parallel computing for these two tasks using the OpenMP API. When compiling FastME, users can choose to obtain a single-threaded or a parallel binary. They may then set, on the command line, the number of cores to be used.

FastME 2.0 includes a menu-driven PHYLIP-like interface, and a command-line interface, to be typically integrated in phylogenomics pipelines. A Web server is also available for occasional users. FastME is an open-source C program, with binaries available for the three main operating systems.

FastME 2.0 is thus a comprehensive program, including all required tools (numerous algorithms, distance estimation with various models, bootstrapping) to infer phylogenies using a distance approach. Source code, binaries, Web server, user guide, examples, benchmark data sets, etc., are available from http://www.atgc-montpellier.fr/fastme/ (last accessed July 14, 2015).
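Felsenstein's bootstrap, as described above, can be sketched in a few lines: alignment columns are resampled with replacement, a tree is rebuilt from each pseudo-alignment, and the result is compared with the tree from the original alignment. The example below is a toy under stated assumptions, not FastME's procedure: it handles exactly four taxa, uses p-distances, and stands in for the tree builder with a hypothetical `quartet_split` function that picks the split minimizing d(a,b) + d(c,d). With a single internal branch, support is simply the fraction of replicates recovering the original split.

```python
import random


def resample_columns(alignment, rng):
    """One bootstrap pseudo-alignment: draw columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def p_distance(s, t):
    """Observed proportion of differing sites."""
    return sum(a != b for a, b in zip(s, t)) / len(s)


def quartet_split(alignment):
    """Toy tree builder for four taxa: the split with the smallest within-pair distance sum."""
    a, b, c, d = sorted(alignment)
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def score(split):
        (w, x), (y, z) = split
        return p_distance(alignment[w], alignment[x]) + p_distance(alignment[y], alignment[z])

    return min(candidates, key=score)


def bootstrap_support(alignment, build, replicates=100, seed=0):
    """Return the reference tree and the fraction of replicates that reproduce it."""
    rng = random.Random(seed)
    reference = build(alignment)
    hits = sum(build(resample_columns(alignment, rng)) == reference
               for _ in range(replicates))
    return reference, hits / replicates


alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGAACGTACGT",
    "C": "ACTTACCTACGTTCGTACGA",
    "D": "ACTTACCTACGTTCGAACGA",
}
print(bootstrap_support(alignment, quartet_split))
```

Because every replicate is independent of the others, the loop in `bootstrap_support` is the part that parallelizes naturally, which is the property the text refers to.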

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

References

1. Direct calculation of a tree length using a distance matrix.

Authors: Y Pauplin
Journal: J Mol Evol Date: 2000-07 Impact factor: 2.395

2. Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle.

Authors: Richard Desper; Olivier Gascuel
Journal: J Comput Biol Date: 2002 Impact factor: 1.479

3. Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting.

Authors: Richard Desper; Olivier Gascuel
Journal: Mol Biol Evol Date: 2003-12-23 Impact factor: 16.240

4. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

Authors: Stéphane Guindon; Jean-François Dufayard; Vincent Lefort; Maria Anisimova; Wim Hordijk; Olivier Gascuel
Journal: Syst Biol Date: 2010-03-29 Impact factor: 15.683

5. Neighbor-joining revealed.

Authors: Olivier Gascuel; Mike Steel
Journal: Mol Biol Evol Date: 2006-07-28 Impact factor: 16.240

6. The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors: N Saitou; M Nei
Journal: Mol Biol Evol Date: 1987-07 Impact factor: 16.240

7. A note on the neighbor-joining algorithm of Saitou and Nei.

Authors: J A Studier; K J Keppler
Journal: Mol Biol Evol Date: 1988-11 Impact factor: 16.240

8. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates.

Authors: M K Kuhner; J Felsenstein
Journal: Mol Biol Evol Date: 1994-05 Impact factor: 16.240

9. Shortest triplet clustering: reconstructing large phylogenies using representative sets.

Authors: Le Sy Vinh; Arndt von Haeseler
Journal: BMC Bioinformatics Date: 2005-04-08 Impact factor: 3.169

10. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix.

Authors: Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal: Mol Biol Evol Date: 2009-04-17 Impact factor: 16.240
