Todd H Oakley, Markos A Alexandrou, Roger Ngo, M Sabrina Pankey, Celia K C Churchill, William Chen, Karl B Lopker.
Abstract
BACKGROUND: Phylogenetic tools and 'tree-thinking' approaches increasingly permeate all biological research. At the same time, phylogenetic data sets are expanding at breakneck pace, facilitated by increasingly economical sequencing technologies. Therefore, there is an urgent need for accessible, modular, and sharable tools for phylogenetic analysis.
Year: 2014 PMID: 24990571 PMCID: PMC4227113 DOI: 10.1186/1471-2105-15-230
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
New tool wrappers developed thus far for phylogenetic analyses in Galaxy
| Category | Tool | Source | Description |
|---|---|---|---|
| Get data | Get GB | * | Grabs GenBank data from a text list of accession numbers |
| | Get GB sp | * | Grabs all GenBank data from a text list of species |
| | PhyLoTA with TaxID | * | Pulls all genetic data from PhyLoTA using a GenBank Taxonomy ID |
| | Generate from PhyLoTA | * | Pulls phylogenies and genetic data from PhyLoTA using a species list |
| | GenBank strip | * | Extracts sequences from GenBank files by gene name |
| | GB gene summary | * | Summarizes gene names in a GenBank flatfile |
| | Get Sequences | * | Creates a file of selected sequences |
| Orthologs | EvolMap | [ | Uses a species tree and gene distances to determine orthologs and paralogs |
| | HaMStR | [ | Pulls orthologous genes from an input file based on HMM gene models |
| | HMMbuild | [ | Constructs hidden Markov models (HMMs) from aligned sequences |
| | HMMsearch | [ | Searches for similar genes using HMMs |
| Alignment | Phytab-MUSCLE | [ | Implements MUSCLE multiple sequence alignment for multiple gene families in parallel |
| | Phytab-PRANK | [ | Implements PRANK phylogeny-aware multiple sequence alignment |
| | Mview | [ | Converts an aligned sequence file in FASTA format to HTML for visualization |
| | Phytab-MAFFT | [ | Implements MAFFT multiple sequence alignment for multiple gene families in parallel |
| | Alicut and Aliscore | [ | Implements Alicut and Aliscore to prune ambiguously aligned regions for multiple gene families in parallel |
| | Gblocks | [ | Implements Gblocks to prune ambiguously aligned regions |
| | Phytab-Similar Sequence Remover | * | Removes a percentage of similar sequences from Phytab input |
| | Sequence Gap Remover | * | Removes gaps from columns of an aligned PHYLIP file |
| | Trimming Sites | [ | Allows the user to delete sites from an alignment based on a percentage threshold |
| | Phylocatenator | * | Concatenates phytab datasets based on user-specified criteria and writes phylipE format; also produces a partition file for RAxML |
| | Fasconcat | [ | Concatenates input sequence files in PHYLIP, Clustal, or FASTA format |
| Phyloconversion | tnt2table | * | Converts TNT files from MorphoBank into phytab format |
| | fasta2phylipE | * | Converts FASTA format to phylipE format |
| | Beautifyfasta | * | Converts interleaved FASTA format to sequential |
| | Addstring2fashead | * | Converts a FASTA file with sequences from the same species and gene family to phytab format |
| | Length Outliers | * | Identifies sequences shorter than average in a FASTA file |
| | Vert_tree_format | [ | Converts between phylogenetic tree file formats |
| | Prune Phytab using list | * | Filters a Phytab dataset based on a user-provided list |
| | Removes Phytab dupes | * | Finds duplicates in a Phytab file |
| Phylogenies | RAxML | [ | Implements a maximum likelihood (ML) search for the optimal phylogeny |
| | Phytab-RAxML-Parsimony | [ | Searches for the maximum parsimony (MP) phylogeny of multiple data partitions simultaneously |
| | Phytab-RAxML | [ | Searches for the ML phylogeny of multiple data partitions simultaneously |
| | Phytab-RAxML using starting trees | [ | Optimizes branch lengths on a starting tree for multiple partitions simultaneously |
| | BEAST | [ | Executes an XML input file for Bayesian phylogenetic analysis |
| | RAxML-Place Fossil | [ | Finds the position of a fossil on a tree using morphological data and an input phylogeny |
| | NJst | [ | Produces a species tree from multiple input gene trees |
| | RAxML Place reads | [ | Uses RAxML to place sequence reads onto an existing phylogeny |
| | RAxML Parsimony | [ | Uses RAxML to calculate a parsimony tree |
| | Phytab clearcut | [ | Generates a Neighbor-Joining phylogeny; input can be FASTA or Phytab format |
| | ProtTest | [ | Selects best-fit models of protein evolution |
| | jModelTest | [ | Selects best-fit models of nucleotide evolution |
| | tab2trees | * | Produces phylogeny graphics, one tree per page, from multiple data partitions or data sets |
| Phylographics | PDpairs | * | Calculates phylogenetic distances for pairs of species on a phylogeny |
| Phylostatistics | Phytab LB pruner | | Identifies genes on very long branches |
| | Long Branch Finder | * | Identifies terminal branches on multiple gene trees that exceed a threshold |
| | Phylomatic | [ | Implements the Phylomatic program |
| | Tree Support | [ | Calculates bootstrap support for the nodes of a single tree using a file of multiple trees |
| | Branch Attachment Frequency | [ | Identifies lineage movement in a set of trees |
| | Leaf Stability | [ | Reports leaf stability indices for taxa in one or more trees |
| | TreeAnnotator | [ | Calculates summary statistics from a posterior distribution of Bayesian trees |
| | Prune Taxa | [ | Removes taxa from a tree or multiple trees |
| | Thinning Trees | [ | Subsamples trees from a posterior distribution |
| | SHtest | [ | Uses RAxML to compute a Shimodaira-Hasegawa (SH) test to compare trees |
*Tools developed for Osiris.
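The Trimming Sites and Sequence Gap Remover rows in the table describe filtering alignment columns. The table does not say what the percentage threshold measures, so this minimal sketch assumes it is the fraction of gap characters (`-` or `?`) per column. `trim_gappy_sites` is a hypothetical helper, not the Osiris implementation; with a threshold of 0 it drops every column that contains any gap.

```python
"""Hypothetical sketch of percentage-threshold site trimming (not Osiris code)."""

# Assumption: '-' and '?' are the only characters treated as gaps.
GAP_CHARS = frozenset("-?")


def trim_gappy_sites(alignment, max_gap_fraction):
    """Drop columns whose fraction of gap characters exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns a new dict containing only the retained columns.
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)

    # Indices of columns whose gap fraction is within the threshold.
    keep = [
        col
        for col in range(lengths.pop())
        if sum(seq[col] in GAP_CHARS for seq in alignment.values()) / n_seqs
        <= max_gap_fraction
    ]
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"taxonA": "AC-GT", "taxonB": "A--GT", "taxonC": "ACTG-"}
    # The third column is 2/3 gaps, so it is the only one dropped here.
    print(trim_gappy_sites(aln, 0.5))  # {'taxonA': 'ACGT', 'taxonB': 'A-GT', 'taxonC': 'ACG-'}
    # Threshold 0: only gap-free columns survive.
    print(trim_gappy_sites(aln, 0.0))  # {'taxonA': 'AG', 'taxonB': 'AG', 'taxonC': 'AG'}
```

Filtering whole columns keeps every sequence the same length, so the output is still a valid alignment for downstream tools.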
Figure 1. Phylocatenator matrix. Part of the output from Phylocatenator is shown as an HTML table representing gene coverage across species. The table contains the gene name, the model assigned to each partition (if the user provides that information before or while running Phylocatenator), and the presence (black) or absence (white) of the gene for each taxon.
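The Figure 1 coverage matrix and the RAxML partition file that Phylocatenator writes both follow from one concatenation pass. The sketch below is a minimal illustration under stated assumptions, not the Osiris code: `concatenate` and `coverage_rows` are hypothetical helpers, genes absent from a species are padded with `-`, genes are written in the order given, and each partition is emitted as a RAxML-style `MODEL, name = start-end` line with `DNA` as the assumed default label.

```python
"""Hypothetical Phylocatenator-style concatenation sketch (not the Osiris implementation)."""


def concatenate(genes, models=None, missing="-"):
    """Concatenate per-gene alignments into a supermatrix.

    genes: dict gene name -> dict species -> aligned sequence.
    models: optional dict gene name -> label written at the start of each
            RAxML partition line (defaults to "DNA").
    Returns (supermatrix, partition_lines, coverage), where coverage maps
    species -> {gene: True if present, False if absent}.
    """
    models = models or {}
    species = sorted({sp for aln in genes.values() for sp in aln})
    pieces = {sp: [] for sp in species}
    coverage = {sp: {} for sp in species}
    partitions = []
    start = 1  # partition coordinates are 1-based and inclusive

    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene!r} must contain equal-length aligned sequences")
        length = lengths.pop()
        for sp in species:
            present = sp in aln
            coverage[sp][gene] = present
            # Absent genes are padded with missing-data characters.
            pieces[sp].append(aln[sp] if present else missing * length)
        end = start + length - 1
        partitions.append(f"{models.get(gene, 'DNA')}, {gene} = {start}-{end}")
        start = end + 1

    supermatrix = {sp: "".join(parts) for sp, parts in pieces.items()}
    return supermatrix, partitions, coverage


def coverage_rows(coverage, genes):
    """Text stand-in for the Figure 1 table: 'X' = present, '.' = absent."""
    return [sp + " " + "".join("X" if coverage[sp][g] else "." for g in genes)
            for sp in coverage]


if __name__ == "__main__":
    genes = {
        "COI": {"sp1": "ACGT", "sp2": "ACGA"},
        "H3": {"sp1": "GGC", "sp3": "GGT"},
    }
    matrix, parts, cov = concatenate(genes)
    print(matrix)  # {'sp1': 'ACGTGGC', 'sp2': 'ACGA---', 'sp3': '----GGT'}
    print("\n".join(parts))  # two lines: "DNA, COI = 1-4" and "DNA, H3 = 5-7"
    print("\n".join(coverage_rows(cov, list(genes))))  # sp1 XX / sp2 X. / sp3 .X
```

Computing coverage in the same loop as the padding keeps the presence/absence table and the supermatrix consistent by construction.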
Figure 2. Workflow. Here we show a workflow constructed in Galaxy’s workflow editor. The analysis starts with the input dataset (1), which in this instance would be an unaligned, tab-delimited, four-column phytab file. The raw phytab file is then aligned (2) using phytab-MUSCLE, which performs a separate multiple sequence alignment for each gene. After alignment, we apply a masking step (3) using phytab-aliscorecut, which removes ambiguous regions from each gene separately. The data are now ready to be concatenated using phylocatenator (4A) and used to reconstruct a phylogeny with RAxML (5A). Alternatively or simultaneously, the data from step 3 can be used to estimate a separate phylogeny for each gene using phytab-RAxML (4B), which stores all gene trees in tabular format. Subsequently, the gene trees can be used to estimate a species tree using NJst (5B), and/or all gene trees can be plotted individually for visual inspection using tab2trees (5C). Finally, any resulting tree file in Newick format can be plotted using TreeVector (6).
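The Figure 2 workflow starts from a four-column, tab-delimited phytab file and aligns each gene separately. The caption does not name the columns, so this sketch assumes the order species, gene, sequence identifier, sequence. `read_phytab` and `to_fasta` are hypothetical helpers that group rows by gene and emit one FASTA record set per gene, the kind of per-gene input an aligner such as MUSCLE would then receive (the aligner call itself is not shown).

```python
"""Hypothetical sketch: group phytab rows by gene (assumed column order below)."""
import csv
import io
from collections import defaultdict

# Assumption: columns are species, gene, sequence ID, sequence.


def read_phytab(handle):
    """Return dict gene -> list of (species, seq_id, sequence) tuples."""
    groups = defaultdict(list)
    # QUOTE_NONE: treat quote characters as ordinary text and split on tabs only.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if not any(field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(row)}")
        species, gene, seq_id, seq = (field.strip() for field in row)
        groups[gene].append((species, seq_id, seq))
    return dict(groups)


def to_fasta(records):
    """Format one gene's records as FASTA, naming each record species_seqid."""
    return "".join(f">{species}_{seq_id}\n{seq}\n" for species, seq_id, seq in records)


if __name__ == "__main__":
    phytab = (
        "Homo_sapiens\tCOI\t1\tACGTAC\n"
        "Mus_musculus\tCOI\t2\tACGAAC\n"
        "Homo_sapiens\tH3\t3\tGGCATT\n"
    )
    for gene, records in read_phytab(io.StringIO(phytab)).items():
        print(f"# {gene}")
        print(to_fasta(records), end="")
```

Keeping every gene in one tabular file and splitting it only at the alignment step is what lets a single workflow step process many gene families in parallel.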
Figure 3. All tools. Here we show the various analyses that Osiris in Galaxy can perform. The tools depicted in this figure can be combined to create complex phylogenetic workflows starting from a wide range of input files. Tools for which we created wrappers as part of this publication are in italics, while existing tools already available in Galaxy are denoted with an asterisk. Each box depicts an analysis category or a different stage in a potential workflow. Lines connecting the boxes show the ways these tools can be combined based on their input and output formats.
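The connecting lines in Figure 3 encode which tools can follow one another given their input and output formats. As an illustration only, the sketch below assigns simplified, assumed format labels to the tools named in Figure 2 (the labels are hypothetical, not Galaxy datatype names) and uses a breadth-first search to enumerate every chain that turns an unaligned phytab file into a Newick tree.

```python
"""Hypothetical sketch: chain tools whose output format matches the next input."""
from collections import deque

# Assumed, simplified (input format, output format) labels for Figure 2 tools.
TOOLS = {
    "phytab-MUSCLE": ("phytab", "phytab_aligned"),
    "phytab-aliscorecut": ("phytab_aligned", "phytab_aligned"),
    "phylocatenator": ("phytab_aligned", "phylip"),
    "RAxML": ("phylip", "newick"),
    "phytab-RAxML": ("phytab_aligned", "tabular_trees"),
    "NJst": ("tabular_trees", "newick"),
    "tab2trees": ("tabular_trees", "graphics"),
}


def workflows(start, goal, tools=TOOLS, max_steps=5):
    """Breadth-first enumeration of tool chains converting `start` into `goal`.

    Chains are returned shortest first; a tool is used at most once per chain.
    """
    found = []
    queue = deque([(start, [])])
    while queue:
        fmt, chain = queue.popleft()
        if chain and fmt == goal:
            found.append(chain)
            continue  # stop extending a chain once it reaches the goal
        if len(chain) >= max_steps:
            continue
        for name, (needs, makes) in tools.items():
            # Skipping tools already in the chain stops aliscorecut looping on itself.
            if needs == fmt and name not in chain:
                queue.append((makes, chain + [name]))
    return found


if __name__ == "__main__":
    for chain in workflows("phytab", "newick"):
        print(" -> ".join(chain))
```

Under these assumed labels the Figure 2 route (phytab-MUSCLE, phytab-aliscorecut, phylocatenator, RAxML) is one of four chains found. In Galaxy itself, which connections are allowed is governed by the datatypes declared on each tool wrapper's inputs and outputs rather than by a hand-written table like this one.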