CodonLogo: a sequence logo-based viewer for codon patterns.

Virag Sharma, David P Murphy, Gregory Provan, Pavel V Baranov.

Abstract

MOTIVATION: Conserved patterns across a multiple sequence alignment can be visualized by generating sequence logos. Sequence logos show each column in the alignment as stacks of symbol(s) where the height of a stack is proportional to its informational content, whereas the height of each symbol within the stack is proportional to its frequency in the column. Sequence logos use symbols of either nucleotide or amino acid alphabets. However, certain regulatory signals in messenger RNA (mRNA) act as combinations of codons. Yet no tool is available for visualization of conserved codon patterns.
RESULTS: We present the first application which allows visualization of conserved regions in a multiple sequence alignment in the context of codons. CodonLogo is based on WebLogo3 and uses the same heuristics but treats codons as inseparable units of a 64-letter alphabet. CodonLogo can discriminate patterns of codon conservation from patterns of nucleotide conservation that appear indistinguishable in standard sequence logos.
AVAILABILITY: The CodonLogo source code and its implementation (in a local version of the Galaxy Browser) are available at http://recode.ucc.ie/CodonLogo and through the Galaxy Tool Shed at http://toolshed.g2.bx.psu.edu/.

Year:  2012        PMID: 22595210      PMCID: PMC3389775          DOI: 10.1093/bioinformatics/bts295

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

‘Sequence logos’ are simple graphical representations of conserved elements in multiple sequence alignments. Sequence logos were first introduced by Tom Schneider and colleagues (Schneider and Stephens, 1990). However, the popularity of sequence logos was greatly boosted by the advent of WebLogo (Crooks et al., 2004), which provides a web-based interface for sequence logo generation. WebLogo processes multiple sequence alignments and generates a logo where each column of the alignment is represented by a stack of letters. The height of the entire stack is proportional to its information content (at most 2 bits for nucleotides and 4.32 bits for amino acids), whereas the height of each symbol is proportional to its frequency. Sequence logos inspired the development of several other tools that use principles of Shannon's information theory (Shannon, 1948) for graphical visualization of conserved biological elements. For example, RNALogo (Chang et al., 2008) allows visualization of nucleotide conservation in the context of RNA secondary structure diagrams. CorreLogo (Bindewald et al., 2006) generates 3D images that represent not only local conservation of nucleotides but also mutual information, thus allowing for visualization of double-stranded regions in RNA structures, the characteristic signature of which is compensatory mutations (Dixon and Hillis, 1993). BLogo (Li et al.) allows one to visualize both overrepresented and underrepresented symbols in multiple alignments. Logopaint improves visualization of patterns within alignments of coding regions by removing distortion caused by unequal evolutionary rates for synonymous and non-synonymous substitutions (Schreiber and Brown, 2002). We have been able to identify 13 different tools (data not shown), freely available through the Web, that are closely related to the idea behind sequence logos. Despite the impressive fertility of sequence logos, we have not been able to find a single tool that enables visualization of codon patterns.
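
The maxima quoted above are the base-2 logarithms of the alphabet sizes (4 nucleotides, 20 amino acids). The following is a minimal sketch of how a stack height and the symbol heights within it could be computed for a single column. It assumes the uncorrected Shannon measure (no small-sample or background adjustment), and the function names are illustrative rather than taken from WebLogo.

```python
import math
from collections import Counter


def max_information_bits(alphabet_size):
    """Upper bound on the information content of one logo stack, in bits."""
    return math.log2(alphabet_size)


def column_information_bits(symbols, alphabet_size):
    """Information content of one column: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(symbols)
    total = len(symbols)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def symbol_heights(symbols, alphabet_size):
    """Height of each symbol: its relative frequency times the stack height."""
    stack_height = column_information_bits(symbols, alphabet_size)
    total = len(symbols)
    return {s: stack_height * n / total for s, n in Counter(symbols).items()}


print(max_information_bits(4))                # 2.0 bits for nucleotides
print(round(max_information_bits(20), 2))     # 4.32 bits for amino acids
print(column_information_bits("AAAA", 4))     # fully conserved column: 2.0
print(column_information_bits("ACGT", 4))     # uniform column: 0.0
```

A fully conserved column reaches the maximum, while a column in which every symbol is equally frequent carries no information and produces an empty stack.
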
Codons have specific biological meaning during translation. Codons are the units interacting with transfer RNAs (tRNAs) during protein sequence decoding, and on numerous occasions the meaning of synonymous codons is not the same. Synonymous codon substitutions can have drastic effects on phenomena such as programmed ribosomal frameshifting (Baranov et al., 2002; Namy et al., 2004), and they can also affect the speed (Tuller et al.) and accuracy of translation (Drummond and Wilke, 2008). Moreover, altered combinations of codons can greatly affect the overall efficiency of translation (Coleman et al., 2008). Therefore, it is clear that patterns of codons have biological significance. However, as we show, standard sequence logos are unable to discriminate between conserved patterns of codons and conserved patterns of nucleotides if the nucleotide composition of multiple alignment columns is the same. To overcome this limitation of sequence logos, we have developed a new tool that we have named CodonLogo.

2 ALGORITHM AND IMPLEMENTATION

CodonLogo is based on WebLogo3. The source code for WebLogo3 (http://WebLogo.threeplusone.com/) has been modified so that the information content is determined across three consecutive columns instead of a single column, treating each codon as a member of a 64-symbol alphabet. The information content (IC) of a particular codon column in a multiple sequence alignment is determined according to

$$IC = \log_2 64 + \sum_{i=1}^{64} p_i \log_2 p_i$$

where $p_i$ is the relative frequency of the ith codon in the particular column of the alignment. IC can be adjusted for background compositional bias and small sample correction. Background models can be provided as frequencies of codons, and three (for Homo sapiens, Saccharomyces cerevisiae and Escherichia coli) are distributed with CodonLogo.

The source code of CodonLogo is freely available at http://recode.ucc.ie/CodonLogo. CodonLogo can also be used without local installation through the Galaxy browser interface (Blankenberg et al.; Giardine et al., 2005; Goecks et al.) that is available through the above URL and the Galaxy Tool Shed repository at http://toolshed.g2.bx.psu.edu/. CodonLogo requires the CoreBio and NumPy libraries to run locally. As input, CodonLogo accepts multiple alignments in a variety of formats (nbrf, fasta, clustal, phylip, genbank, stockholm, msf, nexus and table format). CodonLogo generates output images in png, eps, pdf or jpeg format. Users can specify the reading frame used to split nucleotide sequences into codons. It is also possible to limit generation of CodonLogo images to a particular subsection of a multiple alignment.
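
The per-column calculation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the CodonLogo source: the function names are hypothetical, codons containing gaps or ambiguous bases are simply skipped, the small-sample correction is omitted, and the background adjustment is assumed to take the form of a relative entropy against the supplied codon frequencies (which reduces to log2(64) minus the Shannon entropy when the background is uniform).

```python
import math
from collections import Counter
from itertools import product

# The 64-symbol codon alphabet.
ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
VALID_CODONS = set(ALL_CODONS)


def split_codons(seq, frame=0):
    """Split a nucleotide sequence into complete codons, starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def codon_columns(alignment, frame=0):
    """Turn aligned, equal-length nucleotide sequences into columns of codons."""
    rows = [split_codons(seq, frame) for seq in alignment]
    return [list(col) for col in zip(*rows)]


def codon_information_bits(column, background=None):
    """Information content of one codon column, in bits (maximum 6 for a uniform background).

    `background` maps codons to non-zero frequencies; None means uniform (1/64 each).
    Assumption: triplets with gaps or ambiguous bases are ignored.
    """
    codons = [c for c in column if c in VALID_CODONS]
    if not codons:
        return 0.0
    if background is None:
        background = {c: 1 / 64 for c in ALL_CODONS}
    total = len(codons)
    return sum(
        (n / total) * math.log2((n / total) / background[c])
        for c, n in Counter(codons).items()
    )


alignment = ["ATGAAAGGC", "ATGAAGGGC", "ATGAAAGGC", "ATGAAGGGC"]
for position, column in enumerate(codon_columns(alignment), start=1):
    print(position, column, codon_information_bits(column))
```

In this toy alignment the first and third codon columns are fully conserved, while the second is split evenly between AAA and AAG. Passing `frame=1` or `frame=2` to `codon_columns` shifts the codon boundaries, mirroring the reading-frame option described above.
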

3 PERFORMANCE

We illustrate the advantages of CodonLogo in comparison to sequence logos in Figure 1. Figure 1A shows an artificial situation, where two alignments are compared. Both alignments have the same nucleotide composition and the same frequency of nucleotides per column. However, in one alignment, codons are conserved (the same codons occur in the same column), while in the other alignment each codon appears only once in a column. Standard sequence logos are identical for both alignments; however, CodonLogo is able to discriminate between the two situations. Figure 1B illustrates a real example, where the use of CodonLogo is beneficial. In this example, the CodonLogo output was generated for a subsection (857 sequences) of a multiple alignment of insertion sequences from the IS407 family containing a site of programmed ribosomal frameshifting that differs among individual IS elements (Sharma et al., 2011). As can be seen in Figure 1B, CodonLogo successfully captures conservation of the patterns. We found this program to be useful for visualizing patterns responsible for recoding, such as those identified in a recent study (Sharma et al., 2011).
Fig. 1. Performance of CodonLogo. (A) In this example, two different multiple alignments are shown with the same nucleotide composition per column. CodonLogo is capable of distinguishing between two different situations that appear indistinguishable with WebLogo. (B) CodonLogo output for an alignment of 857 insertion sequences from the IS407 family requiring programmed ribosomal frameshifting for their expression (see text). CodonLogo output was produced in three different frames as indicated. The site of programmed ribosomal frameshifting is highlighted.
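
The situation in Figure 1A can be reproduced with a toy example. The two alignments below are invented for illustration (they are not the ones shown in the figure), and the information content is the uncorrected Shannon measure: every nucleotide column has the same composition in both alignments, yet the codon-level information content differs.

```python
import math
from collections import Counter


def information_bits(symbols, alphabet_size):
    """log2(alphabet size) minus the Shannon entropy of the observed symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Each row is one sequence covering a single codon position.
conserved = ["AAA", "AAA", "CCC", "CCC"]   # the same codons recur
scrambled = ["AAC", "ACA", "CAA", "CCC"]   # every codon occurs only once


def nucleotide_profile(alignment):
    """Sorted nucleotide counts for each of the three nucleotide columns."""
    return [sorted(Counter(col).items()) for col in zip(*alignment)]


def nucleotide_bits(alignment):
    """Per-column information content over the 4-letter nucleotide alphabet."""
    return [information_bits(col, 4) for col in zip(*alignment)]


def codon_bits(alignment):
    """Information content of the codon column over the 64-letter codon alphabet."""
    return information_bits(alignment, 64)


print(nucleotide_profile(conserved) == nucleotide_profile(scrambled))  # True
print(nucleotide_bits(conserved), nucleotide_bits(scrambled))  # both [1.0, 1.0, 1.0]
print(codon_bits(conserved))   # 5.0
print(codon_bits(scrambled))   # 4.0
```

A nucleotide-level logo would draw identical stacks for both alignments, whereas the codon-level measure separates them by one bit.
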

REFERENCES (10 of 16 shown)

1.  Recoding: translational bifurcations in gene expression. (Review)

Authors:  Pavel V Baranov; Raymond F Gesteland; John F Atkins
Journal:  Gene       Date:  2002-03-20       Impact factor: 3.688

2.  WebLogo: a sequence logo generator.

Authors:  Gavin E Crooks; Gary Hon; John-Marc Chandonia; Steven E Brenner
Journal:  Genome Res       Date:  2004-06       Impact factor: 9.043

3.  Galaxy: a platform for interactive large-scale genome analysis.

Authors:  Belinda Giardine; Cathy Riemer; Ross C Hardison; Richard Burhans; Laura Elnitski; Prachi Shah; Yi Zhang; Daniel Blankenberg; Istvan Albert; James Taylor; Webb Miller; W James Kent; Anton Nekrutenko
Journal:  Genome Res       Date:  2005-09-16       Impact factor: 9.043

4.  Virus attenuation by genome-scale changes in codon pair bias.

Authors:  J Robert Coleman; Dimitris Papamichail; Steven Skiena; Bruce Futcher; Eckard Wimmer; Steffen Mueller
Journal:  Science       Date:  2008-06-27       Impact factor: 47.728

5.  A pilot study of bacterial genes with disrupted ORFs reveals a surprising profusion of protein sequence recoding mediated by ribosomal frameshifting and transcriptional realignment.

Authors:  Virag Sharma; Andrew E Firth; Ivan Antonov; Olivier Fayet; John F Atkins; Mark Borodovsky; Pavel V Baranov
Journal:  Mol Biol Evol       Date:  2011-06-14       Impact factor: 16.240

6.  Sequence logos: a new way to display consensus sequences.

Authors:  T D Schneider; R M Stephens
Journal:  Nucleic Acids Res       Date:  1990-10-25       Impact factor: 16.971

7.  Ribosomal RNA secondary structure: compensatory mutations and implications for phylogenetic analysis.

Authors:  M T Dixon; D M Hillis
Journal:  Mol Biol Evol       Date:  1993-01       Impact factor: 16.240

8.  Reprogrammed genetic decoding in cellular gene expression. (Review)

Authors:  Olivier Namy; Jean-Pierre Rousset; Sawsan Napthine; Ian Brierley
Journal:  Mol Cell       Date:  2004-01-30       Impact factor: 17.970

9.  CorreLogo: an online server for 3D sequence logos of RNA and DNA alignments.

Authors:  Eckart Bindewald; Thomas D Schneider; Bruce A Shapiro
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

10.  RNALogo: a new approach to display structural RNA alignment.

Authors:  Tzu-Hao Chang; Jorng-Tzong Horng; Hsien-Da Huang
Journal:  Nucleic Acids Res       Date:  2008-05-21       Impact factor: 16.971
