PONGO: a web server for multiple predictions of all-alpha transmembrane proteins.

Mauro Amico, Michele Finelli, Ivan Rossi, Andrea Zauli, Arne Elofsson, Håkan Viklund, Gunnar von Heijne, David Jones, Anders Krogh, Piero Fariselli, Pier Luigi Martelli, Rita Casadio.

Abstract

The annotation efforts of the BIOSAPIENS European Network of Excellence have generated several distributed annotation systems (DAS) with the aim of integrating bioinformatics resources and annotating metazoan genomes (http://www.biosapiens.info). In this context, the PONGO DAS server (http://pongo.biocomp.unibo.it) provides prediction-based annotation of the all-alpha membrane proteins in the human genome, not only through DAS queries but also directly through a simple web interface. To produce a more comprehensive analysis of the sequence at hand, this annotation is carried out with four selected, high-scoring predictors: TMHMM2.0, MEMSAT, PRODIV and ENSEMBLE1.0. The stored, pre-computed predictions for the human proteins can be searched and displayed in a graphical view. In addition, the web service allows the topology of any putative membrane protein to be predicted, regardless of the organism and, more importantly, with the same sequence profile for a given sequence when required. Here we present a new web server that incorporates state-of-the-art topology predictors in a single framework, so that users can interactively compare and evaluate four predictions simultaneously for a given sequence. Together with the predicted topology, the server also displays a signal peptide prediction determined with SPEP. The PONGO web server is available at http://pongo.biocomp.unibo.it/pongo.

Substances：
Membrane Proteins

Year: 2006 PMID： 16844984 PMCID： PMC1538841 DOI： 10.1093/nar/gkl208

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

All-alpha membrane proteins constitute an important part of the cell proteome. Such proteins perform many basic functions including cell signalling, transcription regulation, energy conservation and transformation, and ion exchange. Membrane proteins are difficult to study, since they are inserted into lipid bilayers and expose portions of different sizes to the polar outer and inner environments. It is therefore difficult to purify them in their native, functional form and even more difficult to crystallize them. For such technical reasons, only a small fraction of the Protein Data Bank structures are membrane proteins (<1% of the total number of structures and far less than their estimated abundance in cells) (1). A number of computational methods are available to predict the topology of membrane proteins, which consists of two basic features: (i) the location of transmembrane domains along the protein chain and (ii) the location of the N- and C-termini with respect to the lipid membrane. Topological models are sufficient in many instances to design simple experiments that test the location of the N- and C-termini, that of the inner and outer loops with respect to the membrane plane, and concomitantly the number of transmembrane segments in the chain. However, the best-performing methods are offered at different servers and are endowed with different graphical interfaces. This hampers the direct comparison of predictions, especially for experimentalists interested in comparing their results with computational methods. Currently, two other web servers of which we are aware (2,3) are available and separately comprise two of the predictors that we implement, TMHMM2.0 (4) and MEMSAT (5). The novelty of our web server is to include in the same framework ENSEMBLE (6) and PRODIV (7), two powerful methods that became available only recently.
Furthermore, the available web servers render only the consensus prediction, without allowing a critical discrimination among the individual predictions. Such discrimination may be useful, considering that different predictors may highlight different properties, depending on their implementation. Automatic topology annotation for membrane proteins has been included among the efforts of the BIOSAPIENS European Network of Excellence (http://www.biosapiens.info) with the specific aim of taking into consideration different predictors for annotating membrane proteins in the human genome. The common platform for these efforts is the BIOSAPIENS Distributed Annotation Servers (DAS). In this context, the PONGO-DAS server (http://pongo.biocomp.unibo.it) provides topology annotation for the all-alpha membrane proteins of the human genome. This is done using the DAS protocol to answer DAS queries, whose results can be viewed with specific visualizers such as DASTY (8) or ENSEMBL (9). To allow users to browse the pre-computed transmembrane annotations directly, PONGO-DAS provides a simple graphical web interface. The annotation is carried out using four selected predictors, namely TMHMM2.0 (4), MEMSAT (5), ENSEMBLE1.0 (6) and PRODIV (7), in order to allow a direct comparison of the topology predictions for the sequence at hand. The server has also been set up to predict the topology of any putative membrane protein of interest, regardless of provenance. The topological models computed by the different predictors can be obtained simply by pasting the sequence of interest into a box and looking at the results. Recently developed web technologies (e.g. AJAX) are used to improve the user interface.
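To make the idea of a direct comparison concrete, the following is a minimal sketch of how two topology predictions for the same sequence could be summarised side by side. It assumes that each predictor's output has been converted to a per-residue label string with the hypothetical alphabet 'i' (inside loop), 'o' (outside loop) and 'M' (membrane); the predictor names and label strings are invented for illustration and are not PONGO output.

```python
import re
from typing import NamedTuple


class Topology(NamedTuple):
    n_terminus: str                  # "in", "out" or "unknown"
    helices: list[tuple[int, int]]   # 1-based, inclusive residue ranges


def parse_labels(labels: str) -> Topology:
    """Summarise a per-residue label string (assumed alphabet: i, o, M)."""
    # Each maximal run of 'M' is taken as one transmembrane segment.
    helices = [(m.start() + 1, m.end()) for m in re.finditer(r"M+", labels)]
    # The side of the N-terminus is read from the first residue label.
    n_terminus = {"i": "in", "o": "out"}.get(labels[:1], "unknown")
    return Topology(n_terminus, helices)


def compare(predictions: dict[str, str]) -> dict[str, Topology]:
    """Parse every predictor's label string so the models can be compared."""
    return {name: parse_labels(labels) for name, labels in predictions.items()}


# Hypothetical label strings for one 20-residue toy sequence.
toy_predictions = {
    "predictor_A": "iiiMMMMMooooMMMMMiii",
    "predictor_B": "ooooMMMMMiiiiiiiiiii",
}

if __name__ == "__main__":
    for name, topo in compare(toy_predictions).items():
        print(name, topo.n_terminus, len(topo.helices), topo.helices)
```

In this toy case the two models disagree on both features of the topology (number of segments and side of the N-terminus), which is exactly the kind of difference a consensus-only display would hide.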

MATERIALS AND METHODS

The predictors

The website implements the following predictors.

MEMSAT is a new version of the MEMSAT predictor of transmembrane helices in proteins (4). This new version takes advantage of the evolutionary information derived from multiple sequence alignments. The method is based on a dynamic programming approach and statistical parametrization.

TMHMM2.0 is a predictor of transmembrane helices in proteins based on hidden Markov models (5). It has been shown to perform quite well, considering that it uses only single-sequence information. For the same reason it is also very fast.

ENSEMBLE1.0 is an ensemble of two hidden Markov models and one neural network (6). ENSEMBLE also takes advantage of the evolutionary information derived from multiple sequence alignments, both for the neural network and for the hidden Markov model systems.

PRODIV_TMHMM_0.91 is a recent predictor of transmembrane helices in proteins (7) that uses a hidden Markov model similar to TMHMM, but exploits the evolutionary information derived from multiple sequence alignments.

SPEP is a signal peptide predictor based on a combination of neural networks (10). Its performance is similar to that of the most widely used predictor, SignalP (11), and it has been included because signal peptides are quite commonly mispredicted as transmembrane helices.

Pre-computed transmembrane annotations for the human proteome

In the context of the BIOSAPIENS project, we downloaded the UNIPROT dataset (September 22, 2004), which consists of 33 135 human protein sequences, and, for future use, also the IPI dataset (August 5, 2004), which consists of 46 782 human proteins. The union of the two datasets comprises 50 600 sequences from the human genome. It is worth noting that these 50 600 sequences do not include the known splice variants, since those variants are not presently included as unique UNIPROT codes. The choice of annotating UNIPROT sequences is justified by the fact that ENSEMBL gene products (in contrast to the UNIPROT entries) are not stable; for instance, only 60% of the sequences are common (and conserved) between ENSEMBL releases 34 and 35. Finally, UNIPROT provides extensive functional annotation. We then use four state-of-the-art predictors already described in the literature to identify the most probable integral membrane proteins. In this way the union of the predictions obtained with the four methods gives a set of likely membrane proteins, while the intersection contains those chains that all the predictors agree in annotating as membrane proteins. The sequences are filtered with SPEP before being processed by the transmembrane predictors. This is done because signal peptides are quite commonly mispredicted as transmembrane helices by all the predictors implemented in our web server. In the case of a positive SPEP answer, the system cuts the corresponding predicted segment and processes the remaining part of the sequence with the four transmembrane predictors.
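The filtering and set logic described above can be sketched as follows. This is only an illustration under stated assumptions: the signal peptide predictor and the transmembrane predictors are replaced by hypothetical toy functions (a fixed leader motif and a hydrophobic-run counter), not by SPEP, TMHMM2.0, MEMSAT, ENSEMBLE or PRODIV, and the sequences are invented.

```python
import re
from typing import Callable, Optional

# Hypothetical stand-in for a signal peptide: a fixed 11-residue leader.
TOY_SIGNAL = "MK" + "L" * 8 + "A"


def toy_signal_predictor(seq: str) -> Optional[int]:
    """Return the cleavage position (number of residues to cut) or None."""
    return len(TOY_SIGNAL) if seq.startswith(TOY_SIGNAL) else None


def toy_tm_predictor(min_run: int) -> Callable[[str], int]:
    """Toy predictor: counts hydrophobic runs of at least `min_run` residues."""
    pattern = re.compile(r"[AILMFVW]{%d,}" % min_run)
    return lambda seq: len(pattern.findall(seq))


def strip_signal_peptide(seq: str, cleavage_site: Optional[int]) -> str:
    """Cut the predicted signal peptide; leave the sequence intact otherwise."""
    return seq[cleavage_site:] if cleavage_site else seq


def annotate(proteome, signal_predictor, tm_predictors):
    """Return (union, intersection) of the protein ids called as membrane proteins."""
    calls = {name: set() for name in tm_predictors}
    for pid, seq in proteome.items():
        mature = strip_signal_peptide(seq, signal_predictor(seq))
        for name, predict in tm_predictors.items():
            if predict(mature) > 0:
                calls[name].add(pid)
    union = set().union(*calls.values())
    intersection = set.intersection(*calls.values()) if calls else set()
    return union, intersection


toy_proteome = {
    "P1": TOY_SIGNAL + "DDDD" + "L" * 12 + "DDD",  # leader plus one long helix
    "P2": TOY_SIGNAL + "DDDDKKKK",                  # leader only, no helix
    "P3": "DDD" + "V" * 9 + "KK",                   # one short hydrophobic run
}
toy_predictors = {"lenient": toy_tm_predictor(8), "strict": toy_tm_predictor(10)}

if __name__ == "__main__":
    union, intersection = annotate(toy_proteome, toy_signal_predictor, toy_predictors)
    print(sorted(union), sorted(intersection))
```

Without the cut, the lenient toy predictor would count the hydrophobic stretch of the leader in P2 as a transmembrane segment; after the cut, P2 is called by neither predictor.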

RESULTS

PONGO has two different usage options: PONGO-DAS, which accesses the pre-computed annotations using keywords or DAS queries, and PONGO-PRED, which is an enhanced and modern version of a standard ‘through-the-web’ server application. In the case of PONGO-DAS we implemented two types of result visualization. In the first, a web client such as ENSEMBL can obtain, through the DAS protocol, the list of annotations for each human protein using its UNIPROT code, as required for DAS queries; to check this URL the user needs a DAS client, such as the one that will be embedded in ENSEMBL pages at EBI, and a UNIPROT code for the sequence. An associated URL is sent to the client for visualization. In the second, a user can query the database directly, without using the DAS infrastructure, through the web interface (Figure 1). The user can then use a CRC64 (the sequence hash code), or the UNIPROT or IPI code of the sequence, to get the predictions in a graphical view. An example of a sequence filtered with the different predictors is presented together with the detailed sequence annotation (Figure 2). A colour code is used to quickly identify the transmembrane segments.
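As an illustration of looking up a sequence by its hash code, here is a minimal sketch of a CRC64 computation. The source does not state which CRC64 variant is used; the sketch assumes the one commonly reported for protein sequence entries (ISO 3309 polynomial x^64 + x^4 + x^3 + x + 1, zero initial value, no final XOR). Upper-casing the input is a choice made here, not something taken from the source.

```python
# Reflected form of x^64 + x^4 + x^3 + x + 1 (assumed variant, see above).
POLY = 0xD800000000000000


def _build_table() -> list[int]:
    """Precompute the 256-entry lookup table for byte-wise processing."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _build_table()


def crc64(sequence: str) -> str:
    """Return the CRC64 of a sequence as 16 upper-case hex digits."""
    crc = 0  # assumed initial value
    for byte in sequence.upper().encode("ascii"):
        crc = (crc >> 8) ^ _TABLE[(crc ^ byte) & 0xFF]
    return f"{crc:016X}"


if __name__ == "__main__":
    # Hypothetical peptide; the printed value is only meaningful
    # if the assumed variant matches the one used by the server.
    print(crc64("MKTAYIAKQR"))
```

Because the checksum depends only on the residues, two database entries with identical sequences map to the same key, which is what makes it usable as a sequence-level lookup.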

Figure 1

The PONGO homepage. On the left-side it is possible to follow the status of the different queries and the starting of a new action.

Figure 2

An example of PONGO results for a protein chain endowed with a signal peptide.

by means of a DAS, which is a client–server system in which a single client integrates information from multiple servers. It allows a single machine to gather up genome annotation information from multiple distant web sites, collate the information, and display it to the user in a single view. Little coordination is needed among the various information providers, and a user-friendly interface that allows the search for a specific prediction. Conversely, PONGO-PRED provides the user with a unique framework for membrane protein topology and signal peptide predictions. Another interesting feature of PONGO-PRED is the JavaScript-enabled portlet that refreshes the queue status in real time (without reloading the submission page, using very lightweight XMLHttpRequest calls). In particular, on the left side of the page the user can see whether her/his submission is a new call, whether it is running (in this case the starting date and an absolute link are provided for bookmarking), or whether it has been processed.
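The client side of such a system can be sketched as follows: one client builds a features request for each server, parses the XML responses and merges them into a single view. The URL layout (prefix ending in /das, then source name, then the features command with a segment argument) and the DASGFF element names follow the DAS 1.x conventions as far as we are aware; the server prefix, source names and the two canned responses are hypothetical, and no network access is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def features_url(prefix: str, source: str, segment: str) -> str:
    """Build a DAS 'features' request; `prefix` is assumed to end in /das."""
    return f"{prefix.rstrip('/')}/{quote(source)}/features?segment={quote(segment)}"


def parse_features(xml_text: str, origin: str) -> list[dict]:
    """Extract segment id, type, start and end from a DASGFF document."""
    rows = []
    for seg in ET.fromstring(xml_text).iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            rows.append({
                "segment": seg.get("id"),
                "origin": origin,
                "type": feat.findtext("TYPE", default="").strip(),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
            })
    return rows


def collate(responses: dict[str, str]) -> list[dict]:
    """Merge the features of several servers, ordered along the sequence."""
    merged = []
    for origin, xml_text in responses.items():
        merged.extend(parse_features(xml_text, origin))
    return sorted(merged, key=lambda r: (r["start"], r["origin"]))


# Canned, hypothetical responses standing in for two annotation servers.
SAMPLE_TM = """<DASGFF><GFF version="1.0"><SEGMENT id="P12345" start="1" stop="300">
<FEATURE id="tm1"><TYPE id="TM">transmembrane</TYPE><START>40</START><END>60</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""
SAMPLE_SP = """<DASGFF><GFF version="1.0"><SEGMENT id="P12345" start="1" stop="300">
<FEATURE id="sp1"><TYPE id="SP">signal_peptide</TYPE><START>1</START><END>22</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""

if __name__ == "__main__":
    print(features_url("http://example.org/das", "tm_demo", "P12345"))
    for row in collate({"tm_server": SAMPLE_TM, "sp_server": SAMPLE_SP}):
        print(row)
```

Each provider only has to answer its own request in the shared format, which is why little coordination among providers is required.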

11 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. SPEPlip: the detection of signal peptide and lipoprotein cleavage sites.

Authors: Piero Fariselli; Giacomo Finocchiaro; Rita Casadio
Journal: Bioinformatics Date: 2003-12-12 Impact factor: 6.937

3. In silico prediction of the structure of membrane proteins: is it feasible?

Authors: Rita Casadio; Piero Fariselli; Pier Luigi Martelli
Journal: Brief Bioinform Date: 2003-12 Impact factor: 11.622

4. BPROMPT: A consensus server for membrane protein prediction.

Authors: Paul D Taylor; Teresa K Attwood; Darren R Flower
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

5. An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins.

Authors: Pier Luigi Martelli; Piero Fariselli; Rita Casadio
Journal: Bioinformatics Date: 2003 Impact factor: 6.937

6. Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information.

Authors: Håkan Viklund; Arne Elofsson
Journal: Protein Sci Date: 2004-07 Impact factor: 6.725

7. Improved prediction of signal peptides: SignalP 3.0.

Authors: Jannick Dyrløv Bendtsen; Henrik Nielsen; Gunnar von Heijne; Søren Brunak
Journal: J Mol Biol Date: 2004-07-16 Impact factor: 5.469

8. Dasty and UniProt DAS: a perfect pair for protein feature visualization.

Authors: Philip Jones; Nisha Vinod; Thomas Down; Andre Hackmann; Andreas Kahari; Ernst Kretschmann; Antony Quinn; Daniela Wieser; Henning Hermjakob; Rolf Apweiler
Journal: Bioinformatics Date: 2005-05-19 Impact factor: 6.937

9. A model recognition approach to the prediction of all-helical membrane protein structure and topology.

Authors: D T Jones; W R Taylor; J M Thornton
Journal: Biochemistry Date: 1994-03-15 Impact factor: 3.162

10. Ensembl 2006.

Authors: E Birney; D Andrews; M Caccamo; Y Chen; L Clarke; G Coates; T Cox; F Cunningham; V Curwen; T Cutts; T Down; R Durbin; X M Fernandez-Suarez; P Flicek; S Gräf; M Hammond; J Herrero; K Howe; V Iyer; K Jekosch; A Kähäri; A Kasprzyk; D Keefe; F Kokocinski; E Kulesha; D London; I Longden; C Melsopp; P Meidl; B Overduin; A Parker; G Proctor; A Prlic; M Rae; D Rios; S Redmond; M Schuster; I Sealy; S Searle; J Severin; G Slater; D Smedley; J Smith; A Stabenau; J Stalker; S Trevanion; A Ureta-Vidal; J Vogel; S White; C Woodwark; T J P Hubbard
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971
