The iceLogo web server and SOAP service for determining protein consensus sequences.

Davy Maddelein¹, Niklaas Colaert¹, Iain Buchanan¹, Niels Hulstaert¹, Kris Gevaert¹, Lennart Martens².

Abstract

The iceLogo web server and SOAP service implement the previously published iceLogo algorithm. iceLogo builds on probability theory to visualize protein consensus sequences in a format resembling sequence logos. Peptide sequences are compared against a reference sequence set that can be tailored to the studied system and the protocol used. As such, not only over- but also underrepresented residues can be visualized in a statistically sound manner, which further allows the user to easily analyse and interpret conserved sequence patterns in proteins. The web application and SOAP service are freely available to all users, without the need for a login, at http://iomics.ugent.be/icelogoserver/main.html.

Year: 2015 PMID: 25897125 PMCID: PMC4489316 DOI: 10.1093/nar/gkv385

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The development of high-throughput methods for analysing oligonucleotides and proteins has generated large amounts of sequence-based information. These data can contain conserved sequence patterns that may explain the specificities of the studied processes. In 1990, Schneider and Stephens described a method to visualize and analyse such conserved patterns (1). These so-called sequence logos are histogram-like representations in which every bar is a stack of letters (amino acids or nucleotides); they are created from an input set of sequences of equal length. The height of a stack is calculated using Shannon's information theory, which takes into account the maximum number of possible different residues (4 nucleotides or 20 amino acids) and the observed frequencies of these residues at that position in the experimental multiple sequence alignment. The size of each residue within a stack thus reflects the frequency of that residue at the given position. A web-based application, WebLogo, implements this sequence logo algorithm (2). Despite its overall usefulness and wide adoption by the scientific community, this method has two major shortcomings. First, the experimental set is not compared with a reference set. The reference is therefore implicitly assumed to be a fixed and equal contribution (25% for a nucleotide and 5% for an amino acid) for every residue, which clearly does not reflect reality. Slogos (oligonucleotides) and Plogos (proteins) attempt to address this issue by allowing the user to set a fixed frequency for every residue, and thus create a corrected sequence logo that still relies on Shannon's information theory (3). Second, while overrepresented residues in a consensus sequence are clearly visible in a sequence logo, the equally important underrepresented residues are not visualized at all and are therefore readily overlooked.
When two experimental sets are available, the differences in both over- and underrepresented residues can be visualized by the TwoSampleLogo web application (4). Several other methods also adapt sequence logos to better visualize specific aspects of oligonucleotide or protein sequence patterns (5–10). The iceLogo algorithm, however, not only resolves these problems but also creates additional, complementary visualizations that ease the analysis of protein consensus sequences (11).
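To make the stack-height calculation concrete, the following minimal Python sketch computes the classic sequence logo height for one alignment column (log2 of the alphabet size minus the Shannon entropy, omitting the small-sample correction) and, for comparison, a relative-entropy height against a non-uniform background. The relative-entropy variant is one common way to account for background composition; it is shown as an illustration, not as the exact computation performed by WebLogo, Slogos or Plogos, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(sequences, position):
    """Observed residue frequencies at one aligned position."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS if counts[aa] > 0}


def shannon_stack_height(freqs, alphabet_size=20):
    """Classic logo stack height: log2(alphabet size) minus the Shannon entropy.
    The small-sample correction of Schneider and Stephens is omitted here."""
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    return math.log2(alphabet_size) - entropy


def relative_entropy_height(freqs, background):
    """Stack height against an arbitrary background (Kullback-Leibler divergence).
    With a uniform background of 1/20 this equals shannon_stack_height."""
    return sum(f * math.log2(f / background[aa]) for aa, f in freqs.items())


def letter_heights(freqs, stack_height):
    """Each letter's height is its frequency times the stack height."""
    return {aa: f * stack_height for aa, f in freqs.items()}


peptides = ["AKDG", "AKDS", "VKDG", "AKEG"]
freqs = column_frequencies(peptides, 2)  # residues D and E at the third position
height = shannon_stack_height(freqs)
print(freqs, round(height, 3), letter_heights(freqs, height))
```

The uniform 5% background implicitly assumed by a plain sequence logo is just the special case `background = {aa: 0.05 ...}`; supplying real proteome frequencies instead is what makes the height sensitive to compositional bias.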

IceLogoServer

Here we present the implementation of the iceLogo algorithm in a web server and a SOAP service. The web application is designed to make the creation of rich and precise iceLogo visualizations very easy for users, while the SOAP service is aimed at developers who want to transparently integrate the iceLogo algorithm into their own software.

ALGORITHM

Below we describe the functionality of the iceLogo server; a full description of the iceLogo algorithm can be found in reference (11) and in the online manual. Two ways of creating a reference set are implemented: a static and a dynamic reference set. While the static method is available in both the web application and the SOAP service, the more complete but also more complex dynamic method can only be used via the SOAP service. The static reference set is either built from a user-supplied list of reference sequences, which provides different residue frequencies for different positions, or selected from pre-calculated, species-specific proteome frequencies, which yield identical residue frequencies for every position in the alignment. The dynamic method, on the other hand, uses a Monte Carlo sampling strategy to create a reference set on-the-fly from a species-specific FASTA protein database. The amino acids or peptides can be extracted from the FASTA database at random, or sampled following more complex schemes, including sampling peptides at a specific position. This allows, for instance, sampling only terminal peptides (i.e. peptides within a certain distance from the amino or carboxy terminus of the protein), which are known to have a different composition than internal peptides (12). The algorithm then calculates significances (Z-scores) for the amino acids in the experimental set from the amino acid frequencies in the reference set and the sample size. The results can then be visualized in different ways, as detailed below.
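The static reference sets and the Z-score step can be sketched as follows. This is a minimal illustration, assuming the binomial normal approximation Z = (x - Np) / sqrt(Np(1 - p)), where x is the observed count of a residue at a position, N the number of experimental sequences and p the reference frequency of that residue; reference (11) should be consulted for the exact iceLogo formulation. The two-sided threshold derived from a P-value, the handling of p = 0 and all function names are assumptions of this sketch.

```python
import math
from collections import Counter
from statistics import NormalDist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_frequencies(reference_sequences):
    """Static reference from a list of aligned reference sequences:
    one frequency table per position (non-standard residues are ignored)."""
    frequencies = []
    for pos in range(len(reference_sequences[0])):
        counts = Counter(seq[pos] for seq in reference_sequences)
        total = sum(counts[aa] for aa in AMINO_ACIDS)
        frequencies.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    return frequencies


def proteome_frequencies(proteome_freqs, length):
    """Static reference from proteome-wide frequencies: the same table at every position."""
    return [dict(proteome_freqs) for _ in range(length)]


def z_scores(experimental, reference):
    """Z = (x - N*p) / sqrt(N*p*(1 - p)) for every residue at every position."""
    n = len(experimental)
    scores = []
    for pos, ref in enumerate(reference):
        counts = Counter(seq[pos] for seq in experimental)
        column = {}
        for aa in AMINO_ACIDS:
            p = ref.get(aa, 0.0)
            if 0.0 < p < 1.0:  # Z is undefined when p is 0 or 1, so skip those residues
                expected = n * p
                column[aa] = (counts[aa] - expected) / math.sqrt(expected * (1.0 - p))
        scores.append(column)
    return scores


def classify(z, p_value=0.05):
    """Two-sided test: 'over', 'under' or None (not significant)."""
    threshold = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    if z >= threshold:
        return "over"
    if z <= -threshold:
        return "under"
    return None
```

Calling `z_scores(experimental, positional_frequencies(reference))` gives position-specific scores, whereas `proteome_frequencies(...)` reproduces the single-table case. Skipping residues with p = 0 rather than reporting an infinite score is a design choice of this sketch only.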

VISUALIZATIONS

Six different visualizations provide comprehensive and complementary views of the available information. The iceLogo plot visualizes a consensus sequence in a rich and precise manner similar to sequence logos, but with two changes. First, the use of a reference set allows iceLogo to rely on probability theory to find and visualize only residues whose frequencies in the experimental set differ significantly. Second, iceLogo also visualizes significantly underrepresented residues, which indicate residues that are not or less tolerated in the consensus sequence; these residues are plotted below the abscissa. The second visualization is a corrected sequence logo, similar to the output of Plogos (3). Because iceLogo can extract position-specific frequencies from the reference set, the sequence logo height at every position can be corrected for the actual sequence bias at that location. The third visualization is a variant of the normal sequence logo in which the entire graph space is used to represent the amino acids at each position, each amino acid being drawn according to its percentage abundance at that position. This reduces the visual impact of a strongly over- or underrepresented amino acid at a given position in favour of the relative contribution of every amino acid at each location. The fourth visualization is a heat map that shows all amino acid occurrences and significances for all positions in a single image. The heat map is drawn as a two-dimensional matrix in which every row represents a residue and every column a position. Each cell is coloured according to the representation of the residue at that position: black if the residue is not significantly over- or underrepresented, or a shade of green or red for significantly over- or underrepresented residues, respectively.
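The heat map colouring rule and the percentages behind the filled sequence logo can be sketched as below. This is a minimal illustration, assuming per-position Z-scores are already available as one dictionary per position; the significance threshold of 1.96, the brightness scale capped at |Z| = 10 and the function names are hypothetical choices, not the server's actual colour scheme.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def heat_map_colour(z, threshold=1.96, z_max=10.0):
    """RGB colour of one heat map cell.

    Black if |z| is below the significance threshold; otherwise green for
    over- and red for underrepresentation, brighter as |z| grows (capped at z_max).
    """
    if abs(z) < threshold:
        return (0, 0, 0)
    intensity = min(abs(z), z_max) / z_max
    level = int(round(55 + 200 * intensity))
    return (0, level, 0) if z > 0 else (level, 0, 0)


def heat_map(z_columns, threshold=1.96):
    """Matrix of colours: one row per residue, one column per position."""
    return [
        [heat_map_colour(column.get(aa, 0.0), threshold) for column in z_columns]
        for aa in AMINO_ACIDS
    ]


def filled_logo_percentages(sequences, position):
    """Percentage abundance of each residue at one position (filled sequence logo)."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}
```

For example, `heat_map([{"D": 12.0, "A": -3.0}])` yields a 20 by 1 matrix with a bright green cell in the D row, a dim red cell in the A row and black everywhere else.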
The fifth visualization, an amino acid parameter graph, displays specific amino acid properties such as charge and hydrophobicity (or any other of the 544 physicochemical or biochemical parameters in the AAindex1 database (13)). This graph thus places the value of the chosen parameter at each position in the context of the reference set. Finally, the sixth visualization is analogous to the fifth and shows the correlation between a substitution matrix and the amino acids at each position. An example of each of the six visualizations is given in Figure 1, where 123 mouse granzyme C cleavage sites are compared with the mouse proteome as a static reference set (14).
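The core quantity behind an amino acid parameter graph can be sketched as a per-position mean of a residue scale, compared with the frequency-weighted mean of the same scale over the reference set. The sketch below uses the Kyte-Doolittle hydropathy scale, one of the AAindex1 entries; how iceLogo actually summarizes and plots the parameter (for example whether it adds confidence intervals) is not shown here, and the function names are hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values (one of the AAindex1 scales).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def positional_parameter_means(sequences, scale=KYTE_DOOLITTLE):
    """Mean parameter value at each aligned position of the experimental set
    (residues missing from the scale are skipped)."""
    return [
        mean(scale[seq[pos]] for seq in sequences if seq[pos] in scale)
        for pos in range(len(sequences[0]))
    ]


def reference_parameter_mean(reference_freqs, scale=KYTE_DOOLITTLE):
    """Frequency-weighted mean of the same parameter over a static reference."""
    present = [aa for aa in scale if aa in reference_freqs]
    total = sum(reference_freqs[aa] for aa in present)
    return sum(reference_freqs[aa] * scale[aa] for aa in present) / total
```

Plotting `positional_parameter_means(peptides)` against the horizontal line `reference_parameter_mean(proteome_freqs)` shows at a glance which positions are more hydrophobic or more hydrophilic than the background.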

Figure 1.

123 substrates of the mouse granzyme B protease (which cleaves at the carboxyl terminus of an aspartate) are used to display the versatility of the visualization methods supported by the web server and the SOAP service. The processing site is shown as an iceLogo (upper left corner), as a corrected sequence logo (upper right corner), as a filled sequence logo (middle left), as a heat map (middle right), as an amino acid parameter graph displaying the hydropathy of the residues (lower left corner) and as a correlation graph showing the consensus hydrophobicity index. The human subset of the UniProtKB/Swiss-Prot database was used to calculate amino acid frequencies for the reference set. These different visualization methods clearly provide more detailed information about the processing site than a sequence logo alone.

AVAILABILITY

The iceLogo web application can be found at http://iomics.ugent.be/icelogoserver/main.html. The intuitive design of the web page should enable users to quickly become acquainted with its interface. The only obligatory input is a list of sequences that are expected to share residue-related features. The reference set can be created by specifying a list of reference sequences or by selecting a species-specific proteome constructed from the UniProtKB/Swiss-Prot protein database (15). Other parameters, such as the visualization type, the colour of the residues and the P-value, can be set before generating the visualization. The online manual provides various samples covering the different visualization methods and adjustable parameters, which can serve as a guide to users. The created visualizations can be viewed and downloaded in various image file formats (JPEG, TIFF, PNG, PDF and SVG). The SOAP service can be accessed programmatically via the SOAP protocol at http://iomics.ugent.be/icelogoserver/services/icelogo. A WS-I compliant document/literal-wrapped WSDL file describing the various methods of the SOAP service is available at http://iomics.ugent.be/icelogoserver/IceLogo.wsdl. Additionally, the available methods and their parameters are explained on the iceLogo website. The SOAP service returns the iceLogo results as lightweight, XML-based SVG images. Both the iceLogo algorithm and the IceLogoServer are published under the permissive Apache 2 open source licence (http://www.apache.org/licenses/LICENSE-2.0.html), and their source code is available via git from https://github.com/compomics/icelogo and https://github.com/compomics/icelogoserver, respectively. A preassembled web archive (WAR) file can also be found on the latter website, which makes it very easy to set up a local, customized iceLogo web application or SOAP service if desired. For most users, however, we recommend using the well-maintained and fully tested web server described here.
A sample Java SOAP client is briefly described on the website, and more examples (including a client that converts SVG to JPEG, TIFF, PNG or PDF images) can be found in the IceLogoServer code.
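For readers not using Java, the shape of a document/literal-wrapped SOAP request can be sketched with the Python standard library alone. In this style the SOAP body contains a single element named after the operation, whose children carry the parameters. The operation name `hypotheticalOperation`, the namespace `urn:example:icelogo`, the parameter names, the namespace-qualified child elements and the SOAP 1.1 headers are all assumptions for illustration; the real names and schema must be taken from the published WSDL file, and the availability of the endpoint has not been verified.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ENDPOINT = "http://iomics.ugent.be/icelogoserver/services/icelogo"


def build_wrapped_request(operation, namespace, params):
    """Document/literal-wrapped request: one wrapper element named after the
    operation, with one child element per (name, value) parameter pair."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    wrapper = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params:  # a list of pairs allows repeated parameters
        child = ET.SubElement(wrapper, f"{{{namespace}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def call_service(payload, soap_action=""):
    """POST the envelope using SOAP 1.1 conventions (assumed) and return the raw reply."""
    request = urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": f'"{soap_action}"'},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


# Hypothetical operation and parameter names; replace them with those in the WSDL.
payload = build_wrapped_request(
    "hypotheticalOperation",
    "urn:example:icelogo",
    [("sequence", "AKDG"), ("sequence", "AKDS"), ("pValue", 0.05)],
)
print(payload.decode("utf-8"))
```

Since the service returns SVG, the reply body would contain XML that can be saved directly as an `.svg` file once the response element has been located.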

USAGE STATISTICS

The iceLogo web server has been continuously available online since 2010. The recent usage statistics in Table 1 highlight the popularity of the web service. JPEG is clearly the most popular export format, with PNG and PDF in second and third place. TIFF and SVG are less popular, despite the usefulness of these formats for inclusion in publications.

Table 1.

iceLogo website usage statistics in average number of iceLogos created per month, split by generated image type

	JPEG	PNG	SVG	TIFF	PDF	Total
Average number of generated iceLogos per month	558	72	23	40	52	746

Statistics are calculated over the 15-month period from October 2013 up to and including December 2014.

CONCLUSION

The web application and SOAP service presented here provide a stable and popular online implementation of the iceLogo algorithm. The web application allows the user to create comprehensive protein consensus sequence visualizations easily in an intuitive web environment, thus avoiding the hassle of downloading and installing a local program. Owing to the platform and language independence of the SOAP architecture, bioinformaticians can use the iceLogo visualizations in their own software without needing the Java library that contains the iceLogo algorithm.

15 in total

1. Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments.

Authors: Vladimir Vacic; Lilia M Iakoucheva; Predrag Radivojac
Journal: Bioinformatics Date: 2006-04-21 Impact factor: 6.937

2. BLogo: a tool for visualization of bias in biological sequences.

Authors: Wencheng Li; Bo Yang; Shaoguang Liang; Yonghua Wang; Chris Whiteley; Yicheng Cao; Xiaoning Wang
Journal: Bioinformatics Date: 2008-08-04 Impact factor: 6.937

3. Structure of granzyme C reveals an unusual mechanism of protease autoinhibition.

Authors: Dion Kaiserman; Ashley M Buckle; Petra Van Damme; James A Irving; Ruby H P Law; Antony Y Matthews; Tanya Bashtannyk-Puhalovich; Chris Langendorf; Philip Thompson; Joël Vandekerckhove; Kris Gevaert; James C Whisstock; Phillip I Bird
Journal: Proc Natl Acad Sci U S A Date: 2009-03-19 Impact factor: 11.205

4. Improved visualization of protein consensus sequences by iceLogo.

Authors: Niklaas Colaert; Kenny Helsens; Lennart Martens; Joël Vandekerckhove; Kris Gevaert
Journal: Nat Methods Date: 2009-11 Impact factor: 28.547

5. Displaying the information contents of structural RNA alignments: the structure logos.

Authors: J Gorodkin; L J Heyer; S Brunak; G D Stormo
Journal: Comput Appl Biosci Date: 1997-12

6. Proteomics analyses reveal the evolutionary conservation and divergence of N-terminal acetyltransferases from yeast and humans.

Authors: Thomas Arnesen; Petra Van Damme; Bogdan Polevoda; Kenny Helsens; Rune Evjenth; Niklaas Colaert; Jan Erik Varhaug; Joël Vandekerckhove; Johan R Lillehaug; Fred Sherman; Kris Gevaert
Journal: Proc Natl Acad Sci U S A Date: 2009-05-06 Impact factor: 11.205

7. enoLOGOS: a versatile web tool for energy normalized sequence logos.

Authors: Christopher T Workman; Yutong Yin; David L Corcoran; Trey Ideker; Gary D Stormo; Panayiotis V Benos
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

8. AAindex: amino acid index database, progress report 2008.

Authors: Shuichi Kawashima; Piotr Pokarowski; Maria Pokarowska; Andrzej Kolinski; Toshiaki Katayama; Minoru Kanehisa
Journal: Nucleic Acids Res Date: 2007-11-12 Impact factor: 16.971

9. 3dLOGO: a web server for the identification, analysis and use of conserved protein substructures.

Authors: Allegra Via; Daniele Peluso; Pier Federico Gherardini; Emanuele de Rinaldis; Teresa Colombo; Gabriele Ausiello; Manuela Helmer-Citterich
Journal: Nucleic Acids Res Date: 2007-05-08 Impact factor: 16.971

10. RNALogo: a new approach to display structural RNA alignment.

Authors: Tzu-Hao Chang; Jorng-Tzong Horng; Hsien-Da Huang
Journal: Nucleic Acids Res Date: 2008-05-21 Impact factor: 16.971
