Taxonomic colouring of phylogenetic trees of protein sequences.

Gareth Palidwor, Emmanuel G Reynaud, Miguel A Andrade-Navarro.

Abstract

BACKGROUND: Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated with those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, a situation that is becoming common given the accelerating growth of the protein sequence databases.
RESULTS: We have developed PhyloView, a web-based tool for colouring phylogenetic trees according to arbitrary taxonomic properties of the species represented in a protein sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours with the leaves of the tree according to any number of taxonomic partitions. The colours are then propagated to the branches of the tree.
CONCLUSION: PhyloView can be used at http://www.ogic.ca/projects/phyloview/. A tutorial, the software with documentation, and GPL-licensed source code can be accessed at the same web address.

Year:  2006        PMID: 16503967      PMCID: PMC1386715          DOI: 10.1186/1471-2105-7-79

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Phylogenetic trees based upon multiple sequence alignments of proteins from many species are commonly used to determine the evolutionary relationships between homologous sequences, which can give insights into the evolution of a protein family and the functional specificity of its members [1]. A phylogenetic tree is expected to reflect the duplication and speciation events of the proteins. Speciation generates related proteins in different organisms, and these proteins reflect the taxonomic relations of those organisms. Cases where phylogenetic associations are at odds with known taxonomy can therefore be interesting anomalies worthy of investigation, perhaps indicating problems in the generation of the multiple sequence alignment or events of lateral gene transfer [2]. Gene duplication events (which may define subfamilies with subtle variations of the common functional theme of the family) can likewise be observed as the repetition of a taxonomic structure at multiple places in the tree. Unfortunately, available software for rendering phylogenetic trees does not provide a simple means of automatically retrieving taxonomic information for the sequences represented in the tree, or of graphically representing arbitrary taxonomic properties of trees. Such features would allow the study of how a phylogenetic tree relates to the taxonomic relations between the species represented in it. PhyloView was developed to address this limitation in a simple and generic way.

Implementation

PhyloView was written as a web application in Perl 5.6.1 using the Perl modules BioPerl [3], Parse::RecDescent, SVG and CGI. Full source code for the program is available at the PhyloView web site [4], has been deposited at SourceForge.net [5] and is licensed under the GPL [6].

Results and discussion

Upon initial loading, the script provides a form where the user may upload a phylogenetic tree in Newick (New Hampshire) format, which is used by most phylogenetic packages (e.g., [7]). The tree should contain SwissProt, SpTrembl [8], or GenBank GI protein identifiers [9] in the leaf node names. The associated records are then retrieved dynamically from the appropriate online database and the associated taxonomic information is extracted.

Processing times for the initial upload of a tree vary with the number of sequences in the tree, the load on the server and the response speed of the public databases queried. For example, a tree with 600 sequences (which in our experience is large for a phylogenetic tree) takes about one minute to load. Subsequent processing times for the same tree tend to be much shorter, because the taxonomic information associated with the protein sequence identifiers is cached and no further queries of public databases are required.

To allow users to show their own identifiers in the tree, we use the following internal format for names: "DBID*YOURID", where DBID is the database identifier used by PhyloView to extract the taxonomic information, YOURID is the identifier to be displayed in the tree, and * is a user-defined separator (the default symbol is "/"). Optionally, the DBID can be removed from the rendered tree, leaving only the user identifiers.

The provided Newick tree is parsed with the Perl Parse::RecDescent module. The protein IDs of the Newick tree leaves are extracted and the associated taxonomic identities are obtained from the SwissProt, SpTrembl, and GenBank databases. A taxonomic tree containing the aggregate taxonomic information of the tree leaves is then generated and represented on the resulting web page as a clickable tree menu with a JavaScript colour picker at every node (see Figure 1). This mechanism allows the user to associate colours with arbitrary taxonomic groups, with the initial defaults being Eukarya:blue, Bacteria:green, Archaea:red, and unknown:black.
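The leaf-name convention described above can be illustrated with a minimal Python sketch. It is not PhyloView's own code (the tool is written in Perl and uses a Parse::RecDescent grammar); the function names `leaf_labels` and `split_label`, the regular expression, and the example identifiers are assumptions made for illustration, and only unquoted Newick labels are handled.

```python
import re


def leaf_labels(newick: str) -> list[str]:
    """Return the leaf labels of a Newick string (unquoted labels only)."""
    # Assumption: a leaf label directly follows "(" or ",", whereas
    # internal-node labels follow ")". Branch lengths follow ":".
    pattern = r"[(,]\s*([^(),:;\s][^(),:;]*)"
    return [label.strip() for label in re.findall(pattern, newick)]


def split_label(label: str, sep: str = "/") -> tuple[str, str]:
    """Split "DBID<sep>YOURID" into (database_id, display_id)."""
    db_id, found, display_id = label.partition(sep)
    # Without a separator, the database identifier is also the display name.
    return (db_id, display_id if found else db_id)


# Hypothetical tree: two leaves carry a user identifier, one does not.
tree = "((P12345/humanA:0.1,Q67890/yeastB:0.2):0.05,O11111:0.3);"
pairs = [split_label(label) for label in leaf_labels(tree)]
print(pairs)
```

The database identifier in each pair would then be the key used to look up (and cache) the taxonomic record, while the second element is what is drawn on the tree.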
Figure 1

PhyloView web interface. The phylogenetic tree is input on the top left window. Bottom-left: summary of the taxonomic levels present in the tree (with number of sequences in each within brackets) that can be expanded and contracted at will. A colour picker allows the association of a colour with any taxonomic level. Right: interpretation of a tree. In this example, a multiple sequence alignment of putative transcription initiation factor 2, gamma subunit, and related sequences, is used to illustrate PhyloView (the example is available at the web site). Colouring chosen is: Archaea:red; Bacteria:pink; Cyanobacteria:light pink; Eukarya:blue; Viridiplantae:green; Mammals:light blue. Repeating phylogenetic structures make obvious the existence of two subfamilies (IF2G, and a hypothetical IF2P), and the presence of three outliers (top: three GTPases of unknown function, wrongly included in the alignment). The plant sequence that groups with the Cyanobacteria (IF2C_ARATH) is a chloroplast IF2G. The eukaryotic members that group with bacteria (IF2M) are mitochondrial IF2Gs. Recent duplications of mammalian IF2Gs are also apparent.

Once the colours have been chosen, resubmitting the form will render a new tree where the various nodes and branches are coloured based upon the above choice. The taxonomic colouring algorithm is such that every branch of the tree receives the colour assigned to the taxonomic group with most members under that branch. In case of a tie between assignments, the more specific one is given precedence (for example, Viridiplantae over Eukarya). Colouring of a given branch only happens if more than 50% of the sequences under that branch belong to a single taxonomic group with an assigned colour. Mouse-over of the phylogenetic tree leaf nodes in SVG mode creates floating tool-tip type output with full taxonomic information for the sequence. The preferred form of output for the tree is an SVG image. SVG is an XML-based standard for vector graphics. Though it is not natively supported by most browsers, a number of plug-ins are freely available, for example the Adobe SVG viewer [10].

We plan to extend PhyloView as a visualization framework for enhancing sequence phylogenetic tree images with associated data. We welcome feedback and proposals for additional features from users.
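The branch-colouring rule described above (most-represented coloured group, ties resolved in favour of the more specific group, and a strict 50% threshold) can be sketched as follows. This is one reading of the rule in Python, not the PhyloView implementation; the function name `branch_colour`, the representation of each leaf as a root-to-tip lineage list, and the example lineages are assumptions.

```python
from typing import Optional


def branch_colour(lineages: list[list[str]],
                  colours: dict[str, str]) -> Optional[str]:
    """Pick a colour for a branch from the lineages of the leaves under it.

    Each lineage is ordered from the most general to the most specific taxon.
    Returns None when no coloured group covers more than 50% of the leaves.
    """
    counts: dict[str, int] = {}
    depth: dict[str, int] = {}
    for lineage in lineages:
        for level, taxon in enumerate(lineage):
            if taxon in colours:
                counts[taxon] = counts.get(taxon, 0) + 1
                # A deeper position in the lineage means a more specific taxon.
                depth[taxon] = max(depth.get(taxon, 0), level)
    if not counts:
        return None
    # Most members first; on a tie, the more specific taxon wins.
    best = max(counts, key=lambda t: (counts[t], depth[t]))
    # Strict majority: exactly 50% is not enough.
    if counts[best] * 2 <= len(lineages):
        return None
    return colours[best]


colours = {"Eukarya": "blue", "Viridiplantae": "green", "Bacteria": "pink"}
plant = ["Eukarya", "Viridiplantae"]
animal = ["Eukarya", "Metazoa"]
bacterium = ["Bacteria"]

print(branch_colour([plant, plant], colours))          # tie: green
print(branch_colour([plant, plant, animal], colours))  # blue
print(branch_colour([plant, bacterium], colours))      # None
```

In the first call both Eukarya and Viridiplantae cover every leaf, so the more specific group supplies the colour; in the last call no group exceeds half of the leaves, so the branch is left uncoloured.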

Conclusion

PhyloView is the first web server dedicated to colouring phylogenetic trees according to taxonomy. Other software can be used to attain similar results, but with considerably more effort. For example, Mesquite [11] (an open source modular software system for evolutionary analysis written in Java) and MacClade [12] (a commercial computer program for phylogenetic analysis that runs only on MacOS) allow the manual colouring of the branches of a phylogenetic tree, but both are complex general-purpose programs in which doing so is laborious. PhyloView is intended to streamline and simplify this task, allowing the user to rapidly explore different combinations of colours and taxonomic partitions for the best visual result.

Availability and requirements

PhyloView requires Internet Explorer with the Adobe SVG viewer plug-in and can be used at [4]. Source code is available from that location as well.

Authors' contributions

MA and ER conceived the tool. GP implemented the tool. GP and MA drafted the manuscript. All authors tested the tool during its development, and read and approved the final manuscript. MA was previously known as Miguel A. Andrade.
References

1.  PhyloDraw: a phylogenetic tree drawing system.

Authors:  J H Choi; H Y Jung; H S Kim; H G Cho
Journal:  Bioinformatics       Date:  2000-11       Impact factor: 6.937

2.  The Bioperl toolkit: Perl modules for the life sciences.

Authors:  Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

3.  Interactive analysis of phylogeny and character evolution using the computer program MacClade.

Authors:  W P Maddison; D R Maddison
Journal:  Folia Primatol (Basel)       Date:  1989       Impact factor: 1.246

4.  Molecules as documents of evolutionary history.

Authors:  E Zuckerkandl; L Pauling
Journal:  J Theor Biol       Date:  1965-03       Impact factor: 2.691

5.  The Universal Protein Resource (UniProt).

Authors:  Amos Bairoch; Rolf Apweiler; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

6.  Database resources of the National Center for Biotechnology Information.

Authors:  David L Wheeler; Tanya Barrett; Dennis A Benson; Stephen H Bryant; Kathi Canese; Deanna M Church; Michael DiCuccio; Ron Edgar; Scott Federhen; Wolfgang Helmberg; David L Kenton; Oleg Khovayko; David J Lipman; Thomas L Madden; Donna R Maglott; James Ostell; Joan U Pontius; Kim D Pruitt; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Steven T Sherry; Karl Sirotkin; Grigory Starchenko; Tugba O Suzek; Roman Tatusov; Tatiana A Tatusova; Lukas Wagner; Eugene Yaschenko
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

7.  Do orthologous gene phylogenies really support tree-thinking?

Authors:  E Bapteste; E Susko; J Leigh; D MacLeod; R L Charlebois; W F Doolittle
Journal:  BMC Evol Biol       Date:  2005-05-24       Impact factor: 3.260
