
STRAW: Species TRee Analysis Web server.

Timothy I Shaw, Zheng Ruan, Travis C Glenn, Liang Liu.

Abstract

Coalescent methods for species tree reconstruction are increasingly popular because they can accommodate coalescence and multilocus data sets. Herein, we present STRAW, a web server that offers workflows for reconstructing species phylogenies using three species tree methods: MP-EST, STAR and NJst. The input data are a collection of rooted gene trees (for the STAR and MP-EST methods) or unrooted gene trees (for NJst). The output includes the estimated species tree, modified Robinson-Foulds distances between the gene trees and the estimated species tree, and visualizations that compare the gene trees with the estimated species tree. The web server is available at http://bioinformatics.publichealth.uga.edu/SpeciesTreeAnalysis/.

Year:  2013        PMID: 23661681      PMCID: PMC3692081          DOI: 10.1093/nar/gkt377

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Understanding phylogenetic relationships among taxa and genes is critical to the correct interpretation of many issues in biology, ranging from systematics to infectious diseases. As phylogenomic data become increasingly available, it has been hoped that the tree of life would be resolved using genome-scale data (1). One of the challenges facing phylogenomic analysis is the tremendous amount of variation observed in gene trees estimated from multilocus sequence data (2). This observation stimulated research on the estimation of species-level phylogenies (i.e. species trees) by taking into account variation at the level of individual genes (3–6). The past few years have witnessed a rapid expansion of species tree reconstruction methods. The phylogenetic programs MP-EST (7), STAR (8) and NJst (9), developed under the coalescent model (3), have been widely used for estimating species-level phylogenies (10). A major strength of these three methods is that they are computationally tractable, even for large data sets (10), and are thus amenable to being offered as an open resource for the research community with only modest hardware requirements. Many additional phylogenetic programs have been developed for species tree reconstruction, such as *BEAST (11), BEST (4) and STEM (5), but these methods are computationally intensive and thus are not amenable to an open resource built on modest hardware.

MP-EST, STAR and NJst use gene trees estimated from DNA sequence data to infer species trees. Uncertainty in the estimated gene trees is incorporated into the estimation of species trees using bootstrap techniques. In the MP-EST method, species trees are estimated from a collection of rooted gene trees by maximizing a pseudo-likelihood function of triplets in the species tree. The STAR method uses average ranks of gene coalescence times to build species trees from a set of rooted gene trees.
The STAR method is implemented by building a Neighbor Joining (NJ) tree (12) from a distance matrix in which the entries are twice the average ranks across gene trees. In contrast to MP-EST and STAR, the NJst method is able to use unrooted gene trees to infer the phylogenies of species. All three methods can quickly estimate species trees even for large-scale phylogenomic data and they are statistically consistent under the coalescent model (13). The three methods are fairly robust to a limited amount of horizontal transfer as well as deviations from a molecular clock because some small values of coalescence times due to horizontal transfer or rate variation in particular genes do not have major effects on the average ranks and the frequencies of gene tree triplets when the number of genes is moderate or large (10). A comparison of the three methods is given in Table 1.
Table 1.

Comparison of three coalescent-based species tree reconstruction methods available to users of STRAW

                               MP-EST                     STAR                       NJst
Input                          Rooted binary gene trees   Rooted binary gene trees   Unrooted binary gene trees
Can estimate topology?         Yes                        Yes                        Yes
Can estimate branch lengths?   Yes                        No                         No
Branch units                   Coalescence units          NA                         NA
Runtime (50 taxa, 100 genes)   1656 s                     9 s                        46 s
Programming language           C                          R                          R
Reference number               8                          9                          10
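The STAR distance matrix described above can be sketched in a few lines of Python. This is an illustration only, not the Phybase implementation. It assumes one allele per species, complete gene trees, and a rank convention in which the root receives a rank equal to the number of tips and ranks decrease by one at each step toward the tips. The function names (parse_newick, record_pair_ranks, star_distances) are hypothetical, and the neighbor-joining step that would turn the matrix into a tree is omitted.

```python
import re
from collections import defaultdict
from itertools import combinations


def parse_newick(newick):
    """Parse a Newick string into nested tuples of tip names (branch lengths dropped)."""
    s = re.sub(r":[0-9.eE+-]+", "", newick.strip().rstrip(";"))
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            while pos < len(s) and s[pos] not in "(),":
                pos += 1                  # skip an optional internal label
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos]

    return node()


def count_tips(tree):
    return 1 if isinstance(tree, str) else sum(count_tips(c) for c in tree)


def record_pair_ranks(tree, rank, ranks):
    """Return the tips below `tree`; store `rank` for each pair whose common ancestor is this node."""
    if isinstance(tree, str):
        return [tree]
    groups = [record_pair_ranks(child, rank - 1, ranks) for child in tree]
    for left, right in combinations(groups, 2):
        for x in left:
            for y in right:
                ranks[frozenset((x, y))].append(rank)
    return [tip for group in groups for tip in group]


def star_distances(gene_trees):
    """Map each pair of tips to twice its average ancestor rank across the gene trees."""
    ranks = defaultdict(list)
    for newick in gene_trees:
        tree = parse_newick(newick)
        # Assumed convention: the root rank equals the number of tips in the tree.
        record_pair_ranks(tree, count_tips(tree), ranks)
    return {pair: 2 * sum(r) / len(r) for pair, r in ranks.items()}


if __name__ == "__main__":
    trees = [
        "(((((A1:0.1,A2:0.7):0.1,A3:0.5):0.1,A4:0.2):0.9,A6:0.4):0.1,A5:0.8);",
        "(((((A1:0.2,A2:0.2):0.1,A4:0.3):0.1,A5:0.7):0.2,A3:0.1):0.1,A6:0.7);",
        "(((((A2:0.4,A1:0.1):0.1,A6:0.7):0.1,A3:0.8):0.1,A5:0.1):0.1,A4:0.6);",
    ]
    for pair, d in sorted(star_distances(trees).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(pair), round(d, 3))
```

Only the topology enters the calculation, so branch lengths are discarded during parsing. Pairs that coalesce close to the tips in most gene trees receive small distances and are therefore joined early by a distance-based method such as NJ.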

WEB SERVER

The Species TRee Analysis Web server (STRAW) provides a user-friendly web interface specifically for MP-EST, STAR and NJst analyses. STRAW combines species tree algorithms with tools for input data processing, analysis and visualization, including (i) rooting gene trees with outgroup species, (ii) building STAR, MP-EST and NJst trees, (iii) comparing gene trees with the estimated species tree and (iv) bootstrap analyses. The MP-EST algorithm is written in the C programming language and is available as a standalone binary at http://code.google.com/p/mp-est/, whereas STAR and NJst are implemented in an R package (Phybase) available at http://code.google.com/p/phybase/. The STRAW web server is implemented through a combination of PHP, Perl and Java programs. The front end of the server is written in standard HTML with JavaScript and the jQuery library. The server runs on a dedicated Linux machine with eight 2.8 GHz Intel i7 processor cores and 8 GB of RAM.

Server input and workflow

For the MP-EST and STAR methods, the input gene trees must be bifurcating rooted trees in Newick format, for example, ML trees generated by PHYML (14), RAXML (15) or PHYLIP (16) and rooted with the outgroup species. The input gene trees for NJst may be either rooted or unrooted. The MP-EST and STAR methods can handle missing taxa in gene trees, so it is acceptable for some genes to be missing for some of the species in the input data.

The user must provide a species–allele table to indicate the relationship between alleles and species (i.e. which alleles belong to which species). For example, the following gene trees have taxa A1–A6:

(((((A1:0.1,A2:0.7):0.1,A3:0.5):0.1,A4:0.2):0.9,A6:0.4):0.1,A5:0.8);
(((((A1:0.2,A2:0.2):0.1,A4:0.3):0.1,A5:0.7):0.2,A3:0.1):0.1,A6:0.7);
(((((A2:0.4,A1:0.1):0.1,A6:0.7):0.1,A3:0.8):0.1,A5:0.1):0.1,A4:0.6);

Suppose A1 and A2 were sampled from Human, A3 and A4 from Ape, A5 from Gorilla and A6 from Chimpanzee. Then the species–allele table should contain one row per species (row order is arbitrary): Human with two alleles (A1 and A2), Ape with two alleles (A3 and A4), Gorilla with one allele (A5) and Chimpanzee with one allele (A6). Each line specifies ‘the species name’, ‘number of alleles’ and ‘the names of the alleles’. To assist users in constructing species–allele tables, the program SpeciesAlleleTableCreator can generate an example input file that assumes a one-to-one correspondence between species and alleles. SpeciesAlleleTableCreator is designed so that the user can edit the allele information before passing it to the species tree algorithms (Figure 1). If no species–allele table is provided, the species tree algorithms treat each allele as a separate species (a one-to-one correspondence between species and alleles).

The MP-EST and STAR methods require rooted trees as input. Thus, we provide functionality for rooting trees via the program RerootTreeInput (Figure 1). The user needs to indicate the outgroup for rooting the tree.
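The species–allele table described above can be illustrated with a short Python sketch. It is not the SpeciesAlleleTableCreator program. The function names (tip_names, default_table, format_table, parse_table) are hypothetical, and the whitespace-separated column layout is an assumption drawn from the description of each line as species name, number of alleles and allele names. The sketch builds the default one-to-one table from the tips of the gene trees and reads an edited table back.

```python
import re


def tip_names(newick):
    """Extract tip labels, i.e. names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def default_table(gene_trees):
    """One row per allele, treating each allele as its own species."""
    names = sorted({name for tree in gene_trees for name in tip_names(tree)})
    return [(name, [name]) for name in names]


def format_table(rows):
    """Render rows as 'species count allele1 allele2 ...' (assumed layout)."""
    return "\n".join(
        f"{species} {len(alleles)} {' '.join(alleles)}" for species, alleles in rows
    )


def parse_table(text):
    """Read a table back into {species: [alleles]}, checking the stated counts."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()
        species, count, alleles = fields[0], int(fields[1]), fields[2:]
        if count != len(alleles):
            raise ValueError(f"{species}: expected {count} alleles, found {len(alleles)}")
        table[species] = alleles
    return table


if __name__ == "__main__":
    trees = [
        "(((((A1:0.1,A2:0.7):0.1,A3:0.5):0.1,A4:0.2):0.9,A6:0.4):0.1,A5:0.8);",
        "(((((A1:0.2,A2:0.2):0.1,A4:0.3):0.1,A5:0.7):0.2,A3:0.1):0.1,A6:0.7);",
    ]
    print(format_table(default_table(trees)))      # starting point for manual editing
    edited = "Human 2 A1 A2\nApe 2 A3 A4\nGorilla 1 A5\nChimpanzee 1 A6"
    print(parse_table(edited))
```

Checking the stated allele count against the names actually listed catches the most likely editing mistake, a row whose count was not updated after alleles were added or removed.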
Bootstraps of gene trees can be uploaded to the server as a zip archive. Each file in the archive contains the bootstrapped gene trees for a single gene. We implement a multilocus bootstrap method based on Seo (17).
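A two-stage multilocus bootstrap over pre-computed bootstrap gene trees might look like the following Python sketch. The server's exact procedure is not described here, so this is an assumed reading of the scheme: genes are first resampled with replacement, and then one bootstrap gene tree is drawn at random for each resampled gene. The function name multilocus_bootstrap and the dictionary layout (gene name mapped to a list of Newick strings, mirroring one file per gene) are hypothetical.

```python
import random


def multilocus_bootstrap(boot_trees_by_gene, n_replicates, seed=None):
    """Build bootstrap replicates, each a list with one gene tree per resampled gene.

    boot_trees_by_gene: dict mapping a gene name to its list of bootstrap gene trees.
    """
    rng = random.Random(seed)
    genes = list(boot_trees_by_gene)
    replicates = []
    for _ in range(n_replicates):
        # Stage 1: resample genes with replacement.
        sampled_genes = rng.choices(genes, k=len(genes))
        # Stage 2: draw one bootstrap tree for each sampled gene.
        replicates.append([rng.choice(boot_trees_by_gene[g]) for g in sampled_genes])
    return replicates


if __name__ == "__main__":
    boot = {
        "gene1": ["((A,B),C);", "((A,C),B);"],
        "gene2": ["((A,B),C);", "((B,C),A);"],
        "gene3": ["((A,B),C);"],
    }
    for replicate in multilocus_bootstrap(boot, n_replicates=3, seed=1):
        print(replicate)
```

Each replicate would then be passed to a species tree method, and support for a clade would be taken as the fraction of replicate species trees that contain it.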
Figure 1.

Workflow for Species Tree Construction. To run the species tree algorithms, Newick gene trees and species–allele information need to be provided. We provide the user with the capability to create a species–allele table. For MP-EST and STAR, gene trees need to be rerooted on a particular outgroup before running.

Server output

The output of the STAR, MP-EST and NJst analyses includes the estimated species trees in Newick format, which are also presented to the user on a web page as a circular phylogenetic tree generated by jsPhyloSVG (18). The SVG phylogenetic tree can be downloaded for use in publications. Figure 2A is an example showing the NJst-generated species tree from the data of Shaw et al. (19), which include 2378 gene loci for bat, cow, dog, horse, human and mouse. As part of the output, we generate a report that compares each gene tree against the estimated species tree (Figure 2B). Within the report, we compute the Robinson and Foulds (RF) topological distance (20) between each gene tree and the estimated species tree. The RF distance measures topological dissimilarity: the lower the number, the greater the similarity between the gene tree and the estimated species tree. We modified the RF distance to allow missing taxa: first, the taxa common to both trees are identified; then both trees are pruned to contain only the common taxa; finally, the RF distance is calculated for the two pruned trees, which now share the same set of taxa. We also include a gene tree–species tree comparison plot, which uses the function cophyloplot from the R package APE (21) to plot the two trees face to face with links between the tips (Figure 2C). For MP-EST, we also calculate the triple distance between gene trees and the estimated species tree. The server provides an additional function for comparing the gene tree and species tree using ‘compareInter2tips’ and Biopython (22). Gene tree branches that conflict with the species tree are colored blue, and branches that agree are colored gray (Figure 2D).
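The three steps of the modified RF distance (find the common taxa, prune both trees, compare) can be sketched in plain Python. This is an illustration under stated assumptions, not the server's code. Trees are treated as unrooted, the distance is the number of non-trivial bipartitions found in exactly one of the two pruned trees, and the function names (parse_newick, leaves, prune, splits, modified_rf) are hypothetical.

```python
import re


def parse_newick(newick):
    """Parse a Newick string into nested tuples of tip names (branch lengths dropped)."""
    s = re.sub(r":[0-9.eE+-]+", "", newick.strip().rstrip(";"))
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            while pos < len(s) and s[pos] not in "(),":
                pos += 1                  # skip an optional internal label
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos]

    return node()


def leaves(tree):
    return [tree] if isinstance(tree, str) else [l for c in tree for l in leaves(c)]


def prune(tree, keep):
    """Drop tips not in `keep` and collapse nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def splits(tree):
    """Non-trivial bipartitions, each stored as the side without a reference tip."""
    all_tips = frozenset(leaves(tree))
    ref = min(all_tips)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        side = all_tips - below if ref in below else below
        if 2 <= len(side) <= len(all_tips) - 2:
            found.add(side)
        return below

    walk(tree)
    return found


def modified_rf(newick_a, newick_b):
    """RF distance after pruning both trees to their shared taxa."""
    a, b = parse_newick(newick_a), parse_newick(newick_b)
    common = set(leaves(a)) & set(leaves(b))
    if len(common) < 4:                   # fewer than four tips: no informative splits
        return 0
    return len(splits(prune(a, common)) ^ splits(prune(b, common)))


if __name__ == "__main__":
    gene = "(((((A1:0.1,A2:0.7):0.1,A3:0.5):0.1,A4:0.2):0.9,A6:0.4):0.1,A5:0.8);"
    other = "(((((A1:0.2,A2:0.2):0.1,A4:0.3):0.1,A5:0.7):0.2,A3:0.1):0.1,A6:0.7);"
    print(modified_rf(gene, other))
    print(modified_rf("((A,B),(C,D));", "(((A,B),C),(D,E));"))  # E is pruned first
```

Storing each bipartition as the side that excludes a fixed reference tip gives both trees the same canonical form, so the two children of a root, which describe the same split, are counted once.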
Figure 2.

Species tree and gene tree for the Jamaican Fruit Bat compared with human, mouse, cow, horse and dog. (A) An NJst tree from 2378 gene loci placing bats sister to Perissodactyla, Cetartiodactyla and Carnivora. (B) A table listing the RF distance, triple distance and number of missing taxa. (C) The gene tree and species tree are placed side by side, with matching tips linked to each other. (D) For each gene tree, branches that conflict with the species tree are colored blue, and branches that agree are colored gray.

CONCLUSION

STRAW is a useful web application for estimating species trees. The server provides a user-friendly web interface for three coalescent programs (MP-EST, STAR and NJst), along with phylogenetic tools for visualizing trees, calculating tree distances and rooting gene trees. The web server tools are most useful for groups of species with disagreeing gene trees, and they can contribute to resolving the systematic problem of heterogeneity among gene trees in topology or branch length. The different outputs of the web server can help users develop hypotheses for distinguishing deep coalescence from branch length heterogeneity in gene trees and species trees alike. The server does not require registration and provides open access to the research community.

FUNDING

The web server is graciously maintained on hardware of the University of Georgia’s College of Public Health. Funding for open access charge: Start-up funds from University of Georgia for early career promotion (to L.L.); National Science Foundation (DEB-1242241 and DEB-1136626 to T.C.G.; DMS-1222745 to L.L.). Conflict of interest statement. None declared.
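Species–allele table for the example gene trees in ‘Server input and workflow’ (columns: species name, number of alleles, allele names):

Human       2  A1 A2
Ape         2  A3 A4
Gorilla     1  A5
Chimpanzee  1  A6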
  20 in total

1.  Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci.

Authors:  Bruce Rannala; Ziheng Yang
Journal:  Genetics       Date:  2003-08       Impact factor: 4.562

2.  RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees.

Authors:  A Stamatakis; T Ludwig; H Meier
Journal:  Bioinformatics       Date:  2004-12-17       Impact factor: 6.937

3.  Calculating bootstrap probabilities of phylogeny using multilocus sequence data.

Authors:  Tae-Kun Seo
Journal:  Mol Biol Evol       Date:  2008-02-14       Impact factor: 16.240

4.  Broad phylogenomic sampling improves resolution of the animal tree of life.

Authors:  Casey W Dunn; Andreas Hejnol; David Q Matus; Kevin Pang; William E Browne; Stephen A Smith; Elaine Seaver; Greg W Rouse; Matthias Obst; Gregory D Edgecombe; Martin V Sørensen; Steven H D Haddock; Andreas Schmidt-Rhaesa; Akiko Okusu; Reinhardt Møbjerg Kristensen; Ward C Wheeler; Mark Q Martindale; Gonzalo Giribet
Journal:  Nature       Date:  2008-03-05       Impact factor: 49.962

5.  Is a new and general theory of molecular systematics emerging?

Authors:  Scott V Edwards
Journal:  Evolution       Date:  2009-01       Impact factor: 3.694

6.  BEST: Bayesian estimation of species trees under the coalescent model.

Authors:  Liang Liu
Journal:  Bioinformatics       Date:  2008-09-17       Impact factor: 6.937

7.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

8.  STEM: species tree estimation using maximum likelihood for gene trees under coalescence.

Authors:  Laura S Kubatko; Bryan C Carstens; L Lacey Knowles
Journal:  Bioinformatics       Date:  2009-02-10       Impact factor: 6.937

9.  APE: Analyses of Phylogenetics and Evolution in R language.

Authors:  Emmanuel Paradis; Julien Claude; Korbinian Strimmer
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

10.  Transcriptome sequencing and annotation for the Jamaican fruit bat (Artibeus jamaicensis).

Authors:  Timothy I Shaw; Anuj Srivastava; Wen-Chi Chou; Liang Liu; Ann Hawkinson; Travis C Glenn; Rick Adams; Tony Schountz
Journal:  PLoS One       Date:  2012-11-15       Impact factor: 3.240
