ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data.

Jaime Huerta-Cepas, François Serra, Peer Bork.

Abstract

The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org.
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords: NCBI taxonomy; hypothesis testing; phylogenetics; phylogenomics; tree comparison; tree visualization

Year:  2016        PMID: 26921390      PMCID: PMC4868116          DOI: 10.1093/molbev/msw046

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


The Environment for Tree Exploration (ETE) is a toolkit developed to facilitate the computation, analysis, and visualization of phylogenetic data. ETE provides a comprehensive Python programming library (API) that allows researchers to automate common tasks in comparative genomics. Since its first release (Huerta-Cepas et al. 2010), ETE has been widely used as a computational framework to perform numerous phylogenomic analyses, including characterizing newly sequenced genomes (Richards et al. 2010; Wang et al. 2014), extracting information from large sets of phylogenetic trees (Derelle and Lang 2012; Chiapello et al. 2015; Marcet-Houben and Gabaldón 2015), and developing third-party tools and databases (Zhang et al. 2013; Huerta-Cepas et al. 2014; Szitenberg et al. 2015). Here, we describe the latest version of the software (ETE v3), featuring a significantly improved API library and a novel collection of standalone tools. While the API continues to offer full programmatic control over data analysis and visualization, the new standalone tools facilitate the use of common phylogenetic methods at the genomic scale. The most notable additions are described in the following sections.

Tree Building

The ete-build tool provides a unified interface for running reproducible phylogenetic workflows, covering the reconstruction of both gene trees and supermatrix-based species trees. To do so, ETE relies on a versioned collection of external tools that are transparently installed and executed upon request. A single command configures and launches a complete phylogenetic pipeline, covering sequence alignment, trimming, substitution-model testing, tree inference, and image rendering (fig. 1). In addition, the supermatrix-based reconstruction mode builds and concatenates multiple sequence alignments, simplifying the inference of species trees based on multiple genes. Advanced options allow users to switch automatically from amino-acid to nucleotide alignments based on sequence identity, to resume the execution of workflows, or even to test multiple strategies in parallel. For example, a single command line can test several alignment methods or phylogenetic inference programs simultaneously, which makes the tool particularly suitable for running phylogenomic pipelines. Notably, ete-build was recently used to compute over one million phylogenetic trees for the eggNOG v4.5 database (Huerta-Cepas et al. 2016).
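
The concatenation step behind a supermatrix can be illustrated with a small standalone sketch. This is not ete-build's implementation: the function name `concatenate_alignments`, the dictionary layout, and the toy sequences are assumptions made for illustration only. It joins per-gene alignments species by species and pads with gaps where a species lacks a gene.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Concatenate per-gene alignments (species -> aligned sequence) into one supermatrix.

    Hypothetical helper for illustration; species missing from a gene are padded with gaps.
    """
    # Collect every species seen in any gene alignment.
    species = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in species}
    for aln in alignments:
        # All sequences of one alignment must share the same number of columns.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = lengths.pop()
        for name in species:
            parts[name].append(aln.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy data (made up): the second gene is missing in mouse.
gene_a = {"human": "ATGC", "mouse": "ATGA", "chicken": "ATGG"}
gene_b = {"human": "CCT", "chicken": "CGT"}
matrix = concatenate_alignments([gene_a, gene_b])
for name, seq in matrix.items():
    print(name, seq)
```

Every row of the resulting matrix has the same length, which is the property a tree inference program needs from a concatenated alignment.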

Fig. 1. Several phylogenetic tree images generated using the ETE toolkit. (A) Gene tree reconstructed using ete-build. The figure shows the relationships between several P53 genes together with their aligned sequences visualized in condensed format. (B) Tree image generated by ete-evol for three models fitted to a classical example (Bielawski and Yang 2004). (i) The line chart on top of the alignment indicates the omega estimates for sites as calculated by the SLR software. (ii) The bar chart at the bottom shows the dN/dS ratio for each site under the M2 site model from CodeML. Line colors in both charts indicate the significance of assigning a site to a given class of positive selection (i.e., red for P-value <0.01 and orange for P-value <0.05). (iii) The color and size of tree nodes represent the dN/dS ratio estimated for tree branches using the free-ratio model from CodeML. Small blue circles indicate a ratio between 0.2 and 1, medium yellow nodes indicate a ratio >1, and big red nodes indicate infinite values. Note that the right-side panel allows users to select the models to be displayed, and even to start new runs using predefined models. (C) Portion of a recently published bird species tree (Jarvis et al. 2014) annotated with gene-tree support values (blue spheres), custom node labeling (first aligned column), and taxonomic information (next aligned columns). (D) Example of a phylogenetic tree visualized with a sequence alignment and domain composition as used in the eggNOG database (Huerta-Cepas et al. 2016).


Testing Evolutionary Hypotheses

Measuring selective pressures on molecular sequences is a common task in evolutionary biology. Software such as CodeML (Yang 2007) or SLR (Massingham and Goldman 2005) provides the statistical and computational framework to perform these analyses. However, using such tools at the genomic scale requires substantial work on data preparation, experimental design, and interpretation of results. To aid in these tasks, the ete-evol tool automates CodeML/SLR-based analyses by using pre-configured evolutionary models and directly producing a graphical representation of the results. These pre-configured models include site (Yang et al. 2000; Massingham and Goldman 2005), branch (Yang and Nielsen 2002), branch-site (Zhang et al. 2005), and clade (Yang and Nielsen 2002; Bielawski and Yang 2004) models. For instance, ete-evol can test, in parallel and with a single call, the differential selective pressures along each branch in a given phylogeny. Importantly, fitted models are compared using a built-in likelihood ratio test. Evolutionary measures from the best-fitting models are then plotted (or interactively visualized) by mapping the predicted selective pressures acting on sites and branches onto the tested topology, as well as onto the multiple sequence alignment (fig. 1). For convenience, raw output files produced by CodeML and SLR can also be visualized using ete-evol.
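
The likelihood ratio test used to compare nested models can be sketched in a few lines. This is a generic illustration, not the code used by ete-evol: the function name `lrt_pvalue` is hypothetical, the log-likelihood values are made up, and the choice of 2 degrees of freedom for an M1 versus M2 comparison is an assumption stated here rather than taken from the text. Closed forms of the chi-square survival function are used for 1 and 2 degrees of freedom so that only the standard library is needed.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """P-value of a likelihood ratio test between two nested models.

    The statistic 2 * (lnL_alt - lnL_null) is compared against a chi-square
    distribution with `df` degrees of freedom (only df = 1 or 2 supported here).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0  # the alternative model does not fit better
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # chi-square survival, 1 df
    if df == 2:
        return math.exp(-stat / 2.0)  # chi-square survival, 2 df
    raise ValueError("this sketch only supports df = 1 or df = 2")


# Hypothetical log-likelihoods for a null (e.g., M1) and an alternative (e.g., M2) model.
lnl_m1, lnl_m2 = -1000.0, -997.0
p = lrt_pvalue(lnl_m1, lnl_m2, df=2)
print(f"LRT statistic = {2 * (lnl_m2 - lnl_m1):.2f}, p = {p:.4f}")
```

A small P-value favors the more complex model; otherwise the simpler model is retained.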

Comparing Trees

ETE v3 provides three measures to compute distances between trees, namely the Robinson–Foulds distance (Robinson and Foulds 1981), a branch congruence measure (%), and the TreeKO speciation distance (Marcet-Houben and Gabaldón 2011). In contrast to existing software (Felsenstein 2005; Soria-Carrasco et al. 2007), ete-compare calculates all three distances at the same time; it accepts trees varying in size and containing duplication events; it allows filtering branches with low support; and it is optimized for comparing large datasets. In addition, ete-compare can provide a detailed list of the differences and coincidences among the compared trees for further analysis. Conveniently, the TreeKO method for splitting gene trees into duplication-free subtrees has been optimized and integrated into ETE’s API library, thereby enabling its use for other tests. For instance, ETE allows summarizing the phylogenetic signal (i.e., gene tree support) from a heterogeneous sample of gene trees using a species tree topology as reference (fig. 1).
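
As an illustration of the first of these measures, the sketch below computes an unrooted Robinson–Foulds distance between two trees written as nested tuples of leaf names. It is a minimal standalone version, not ETE's implementation: the tuple representation and the function names are assumptions, and restricting both trees to their shared leaves is one simple way to compare trees of different size. Trees with repeated leaf names (duplications) are not handled here.

```python
def leaf_sets(tree):
    """Return (all leaves, list of leaf sets under each internal node)."""
    clades = []

    def visit(node):
        if isinstance(node, str):  # a leaf is a plain name
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        clades.append(leaves)
        return leaves

    return visit(tree), clades


def splits(tree, common):
    """Non-trivial bipartitions of `common` induced by the internal nodes of `tree`."""
    _, clades = leaf_sets(tree)
    result = set()
    for clade in clades:
        side = clade & common  # ignore leaves absent from the other tree
        other = common - side
        if len(side) >= 2 and len(other) >= 2:
            result.add(frozenset([side, other]))
    return result


def robinson_foulds(t1, t2):
    """Number of bipartitions found in exactly one of the two trees."""
    common = leaf_sets(t1)[0] & leaf_sets(t2)[0]
    return len(splits(t1, common) ^ splits(t2, common))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("A", "B"), "C"), (("D", "F"), "E"))  # one extra leaf, F

print(robinson_foulds(t1, t2))  # → 2
print(robinson_foulds(t1, t3))  # → 0
```

The second comparison returns 0 because the extra leaf is ignored and the remaining topology is identical.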

Taxonomy Databases

Efficient queries to the NCBI taxonomy database (Benson et al. 2014) are now available through the ete-ncbiquery tool or the relevant methods in the API. Extracting pruned subtrees, converting NCBI taxids into their corresponding scientific names, obtaining full lineage tracks, and annotating user trees with taxonomic data are common tasks that can be easily performed with the ete-ncbiquery tool. Importantly, all queries are carried out locally, avoiding unnecessary lags and permitting the integration of the tool into genomic and metagenomic pipelines. Finally, other ETE tools and methods are available that aid in routine tasks such as format conversion, topology manipulation, and custom visualization of trees linked to multiple sequence alignments (fig. 1).
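
The kind of local lookup described above (taxid to name, and taxid to lineage) can be sketched with a small in-memory table. This is only an illustration of the idea, not ete-ncbiquery or ETE's API: the `PARENT` and `NAMES` tables are hand-made and heavily abridged, so each entry points to a selected ancestor rather than to the direct parent in the real NCBI taxonomy, and the function names are hypothetical.

```python
ROOT = 1

# Abridged, hand-made table: taxid -> selected ancestor (intermediate ranks omitted).
PARENT = {
    9606: 9605,    # Homo sapiens -> Homo
    9605: 9604,    # Homo -> Hominidae
    9604: 9443,    # Hominidae -> Primates
    9443: 40674,   # Primates -> Mammalia
    10090: 9989,   # Mus musculus -> Rodentia
    9989: 40674,   # Rodentia -> Mammalia
    40674: ROOT,   # Mammalia -> root
}

NAMES = {
    1: "root", 40674: "Mammalia", 9443: "Primates", 9604: "Hominidae",
    9605: "Homo", 9606: "Homo sapiens", 9989: "Rodentia", 10090: "Mus musculus",
}


def get_lineage(taxid):
    """Return the list of taxids from the root down to `taxid`."""
    lineage = [taxid]
    while lineage[-1] != ROOT:
        lineage.append(PARENT[lineage[-1]])
    return lineage[::-1]


def translate(taxids):
    """Map each taxid to its scientific name."""
    return {taxid: NAMES[taxid] for taxid in taxids}


lineage = get_lineage(9606)
print(" > ".join(translate(lineage)[t] for t in lineage))
```

Because the table lives in memory, each lookup is a dictionary access with no network round trip, which is the property that makes local queries practical inside large pipelines.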

Conclusions

Although several software packages are available for the standalone exploration of trees (Letunic and Bork 2007; Huson and Scornavacca 2012; Asnicar et al. 2015) and the programmatic manipulation of data (Paradis et al. 2004; Knight et al. 2007; Sukumaran and Holder 2010; Vos et al. 2011; Talevich et al. 2012), ETE offers a unified framework to compute and analyze genome-wide collections of evolutionary data while providing unique visualization capabilities. Moreover, with the recent addition of the command line tools, ETE has significantly broadened its scope, simplifying many common tasks in phylogenomics for both expert and casual users.
References (10 of 27 shown)

1.  Codon-substitution models for heterogeneous selection pressure at amino acid sites.

Authors:  Z Yang; R Nielsen; N Goldman; A M Pedersen
Journal:  Genetics       Date:  2000-05       Impact factor: 4.562

2.  Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages.

Authors:  Ziheng Yang; Rasmus Nielsen
Journal:  Mol Biol Evol       Date:  2002-06       Impact factor: 16.240

3.  Detecting amino acid sites under positive selection and purifying selection.

Authors:  Tim Massingham; Nick Goldman
Journal:  Genetics       Date:  2005-01-16       Impact factor: 4.562

4.  Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level.

Authors:  Jianzhi Zhang; Rasmus Nielsen; Ziheng Yang
Journal:  Mol Biol Evol       Date:  2005-08-17       Impact factor: 16.240

5.  Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation.

Authors:  Ivica Letunic; Peer Bork
Journal:  Bioinformatics       Date:  2006-10-18       Impact factor: 6.937

6.  The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees.

Authors:  Víctor Soria-Carrasco; Gerard Talavera; Javier Igea; Jose Castresana
Journal:  Bioinformatics       Date:  2007-09-22       Impact factor: 6.937

7.  PAML 4: phylogenetic analysis by maximum likelihood.

Authors:  Ziheng Yang
Journal:  Mol Biol Evol       Date:  2007-05-04       Impact factor: 16.240

8.  A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution.

Authors:  Joseph P Bielawski; Ziheng Yang
Journal:  J Mol Evol       Date:  2004-07       Impact factor: 2.395

9.  APE: Analyses of Phylogenetics and Evolution in R language.

Authors:  Emmanuel Paradis; Julien Claude; Korbinian Strimmer
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

10.  PyCogent: a toolkit for making sense from sequence.

Authors:  Rob Knight; Peter Maxwell; Amanda Birmingham; Jason Carnes; J Gregory Caporaso; Brett C Easton; Michael Eaton; Micah Hamady; Helen Lindsay; Zongzhi Liu; Catherine Lozupone; Daniel McDonald; Michael Robeson; Raymond Sammut; Sandra Smit; Matthew J Wakefield; Jeremy Widmann; Shandy Wikman; Stephanie Wilson; Hua Ying; Gavin A Huttley
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

