Roland F Schwarz, Asif U Tamuri, Marek Kultys, James King, James Godwin, Ana M Florescu, Jörg Schultz, Nick Goldman
Abstract
Sequence Logos and their variants are the most commonly used methods for visualizing multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. Here we present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results as publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis), and its Sequence Bundles visualization module is also available as a web application (http://science-practice.com/projects/sequence-bundles).
Year: 2016 PMID: 26819408 PMCID: PMC4856975 DOI: 10.1093/nar/gkw022
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Sequence Bundles (top) and pLogo (bottom) representations of an alignment of 1000 sequences with 500 instances of AAAAA and 500 instances of TTTTT. (B) The same visualizations rendered on a 1000-sequence alignment with 500 instances of AATTT and 500 instances of TTAAA. The pLogo representations in (A) and (B) are identical and reflect only the sitewise nucleotide frequencies, which are the same in the two examples. (The different ordering of letters is a result of the chosen genomic background [human whole-genome].) Sequence Bundles clearly show the two sequence motifs in each case, because the nucleotides remain connected in the visualization.
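The point of Figure 1 can be checked numerically. The following is a minimal sketch, not part of Alvis or pLogo: it rebuilds the two toy alignments from the caption and compares their per-column residue counts. The function name `sitewise_frequencies` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter

def sitewise_frequencies(alignment):
    """Return one Counter of residue counts per alignment column.

    Hypothetical helper for illustration; not an Alvis function.
    """
    return [Counter(column) for column in zip(*alignment)]

# The two toy alignments described in the caption (500 + 500 sequences each).
alignment_a = ["AAAAA"] * 500 + ["TTTTT"] * 500
alignment_b = ["AATTT"] * 500 + ["TTAAA"] * 500

freqs_a = sitewise_frequencies(alignment_a)
freqs_b = sitewise_frequencies(alignment_b)

# Column-wise counts are the same in both alignments, so any summary
# built from sitewise frequencies alone cannot tell them apart.
same_columns = freqs_a == freqs_b

# The whole sequences differ, which is the information a bundle retains
# by keeping the residues of each sequence connected.
motifs_a = set(alignment_a)
motifs_b = set(alignment_b)
```

Here `same_columns` is true while `motifs_a` and `motifs_b` differ, which is the distinction the caption draws between a frequency-based logo and a per-sequence bundle.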
Figure 2. (A) Alvis's Sequence Bundles visualization of the haloacid dehalogenase family. The bundle shows three sequence groups in different colours. Horizontal dependencies are immediately visible. For example, all Ciona sequences (selected in red) have a Met at position 241 and also exclusively have a Glu residue at position 246 and a His at position 248. This information is not available from the standard sequence logo (below). Above the bundle, green shaded markers indicate which sites are most likely responsible for the grouping. In agreement with the original paper (23), site 241 (marked with an orange triangle) is identified as the most significant. (B) Alvis's rendering of the associated phylogenetic tree. The group colours match those in the bundle. (C) CA scatterplot computed by Alvis on the same MSA. Sequences are displayed as green points, sites as blue crosses. Selecting sites and sequences in this plot (red) highlights the corresponding sites and sequences in the alignment and bundle views. Residue Met-241 is identified as significantly associated with Ciona.
Figure 3. Alvis visualizes an alignment of 1224 mammalian nucleotidyl cyclases. Sequences containing Glu at position 146 are selected (red). None of these sequences contain Asp at the functionally correlated site 305. Further differentially conserved sites, such as 307 and 308, also become apparent.
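The selection described in Figure 3 amounts to conditioning on a residue at one site and tallying the residues at another. The sketch below illustrates that idea on a tiny made-up alignment. The sequences, the positions 2 and 4, and the helper `residues_given` are all assumptions for illustration and are not taken from the nucleotidyl cyclase data or from Alvis.

```python
from collections import Counter

def residues_given(alignment, fixed_pos, fixed_residue, query_pos):
    """Count residues at query_pos among sequences that carry
    fixed_residue at fixed_pos.

    Positions are 1-based alignment columns, as in the figure captions.
    Hypothetical helper for illustration; not an Alvis function.
    """
    selected = [seq for seq in alignment if seq[fixed_pos - 1] == fixed_residue]
    return Counter(seq[query_pos - 1] for seq in selected)

# Hypothetical toy alignment (NOT data from the paper).
toy = ["AEGKL", "AEGRL", "AQGDL", "AQGDL", "AEGKL"]

# Among sequences with Glu (E) at column 2, which residues occur at column 4?
counts = residues_given(toy, fixed_pos=2, fixed_residue="E", query_pos=4)
```

In this toy case `counts["D"]` is zero, mirroring the pattern in the caption where none of the selected Glu-146 sequences carry Asp at site 305.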
Figure 4. (A) The visualization of horizontal dependencies is a powerful tool to quickly investigate an alignment for co-dependent sites. Asp-10-containing sequences of calmodulin-dependent protein kinase II show no co-clustering at any of the other non-fully conserved positions. However, a strong preference for Asp-10 sequences not to have an Arg at position 5 is visible and statistically significant (binomial test, P-value 0.0053). (B) Representation of the same sequences with the pLogo software. In this version Asp-10 is ‘fixed’ (or conditioned on), also showing that there is no correlation between Asp-10 and position 5. However, the preference for avoiding Asp-5 remains hidden. The unfixed version (not shown) fails to capture the sequence motif altogether.
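The caption reports a binomial test but gives neither the underlying counts nor whether the test was one- or two-sided. As a rough sketch of how such a depletion test could be set up, the code below computes a one-sided lower-tail binomial probability. All counts and the background frequency are hypothetical placeholders, so the result is not expected to reproduce the reported P-value.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical placeholder values (NOT the counts behind the paper's result).
n_asp10 = 20            # sequences with Asp at position 10
n_arg5_in_asp10 = 0     # of those, sequences with Arg at position 5
background_arg5 = 0.25  # assumed fraction of all sequences with Arg at position 5

# Probability of seeing this few Arg-5 sequences among the Asp-10 group
# if Arg-5 occurred at the background rate, independently of position 10.
p_value = binomial_lower_tail(n_arg5_in_asp10, n_asp10, background_arg5)
```

A small `p_value` under these assumptions would indicate that Arg at position 5 is rarer among Asp-10 sequences than the background rate predicts.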