TOPCONS: consensus prediction of membrane protein topology.

Andreas Bernsel, Håkan Viklund, Aron Hennerdal, Arne Elofsson.

Abstract

TOPCONS (http://topcons.net/) is a web server for consensus prediction of membrane protein topology. The underlying algorithm combines an arbitrary number of topology predictions into one consensus prediction and quantifies the reliability of the prediction based on the level of agreement between the underlying methods, both on the protein level and on the level of individual TM regions. Benchmarking the method shows that overall performance levels match the best available topology prediction methods, and for sequences with high reliability scores, performance is increased by approximately 10 percentage points. The web interface allows for constraining parts of the sequence to a known inside/outside location, and detailed results are displayed both graphically and in text format.

Substances: Membrane Proteins

Year: 2009 PMID: 19429891 PMCID: PMC2703981 DOI: 10.1093/nar/gkp363

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Genome-wide estimations indicate that alpha-helical transmembrane (TM) proteins comprise roughly 20–30% of the genes in a typical organism (1,2). These proteins are essential for vital biological functions such as cell communication and signaling, active and passive transport of molecules across the membrane, energy transduction and cell–cell adhesion. Prediction of membrane protein topology (i.e. the positions and in/out orientation of the membrane-spanning regions) serves to quickly obtain fundamental structural knowledge of TM proteins in silico. For TM proteins, computational methods are particularly important since structural knowledge is difficult to attain experimentally. A correctly predicted topology therefore provides an excellent template for further studies in the laboratory and might facilitate and improve functional and structural classification of protein sequences on a genomic level. A number of different methods have been developed over the last decades that predict topology with high accuracy. Many of these methods are freely available as web servers, both individually (2,3) and combining the results from several methods (4). Because the prediction algorithms are based on different principles, it is not surprising that for a fair number of proteins, different prediction methods disagree about the final result, causing uncertainty about the correct topology. Earlier studies have stated, for example, that topology predictions are more likely to be correct when individual methods agree in their prediction than when they do not (5), but so far only a few attempts have been made at combining individual topology predictions into one consensus prediction (6–8).
Here we present TOPCONS, a fundamental algorithm that combines an arbitrary number of topology predictions into one consensus prediction and quantifies the reliability of the prediction based on the level of agreement between the underlying methods, both on the protein level and on the level of individual TM regions. We also present an implementation of TOPCONS as a web server based on the individual topology prediction methods OCTOPUS (9), PRO-TMHMM and PRODIV-TMHMM (10), SCAMPI-single and SCAMPI-multi (11). During development, a large set of combinations of different topology predictors was tested as input to TOPCONS. However, no combination performed significantly better than the one used here, and we therefore decided to use only methods developed in-house in the current version of the TOPCONS web server.

METHODS

TOPCONS algorithm

An overview of the different steps of the TOPCONS algorithm is presented in Figure 1. As input, TOPCONS uses a set of topology predictions, which are combined into a topology profile in which each residue is represented by three values: the fractions of methods that predict that residue to be situated in the membrane (M), on the inside of the membrane (i) or on the outside of the membrane (o). This topology profile is used as input to a dynamic programming algorithm similar to a hidden Markov model that has an alphabet consisting of the characters ‘M’, ‘i’ and ‘o’. The final topology corresponds to the highest scoring state path through this model, found using a Viterbi-like algorithm. In each state, the emission score for the structural category modeled by that state (i, o or M) is equal to 1.0 and for all other structural categories it equals 0.0. All transition probabilities are equal to 1.0. Thus, the final prediction equals the state path with the highest geometric mean score with respect to the topology profile and the grammar of the model.
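The profile construction and the Viterbi-like search can be illustrated with a minimal Python sketch. It is not the TOPCONS implementation: the function names, the four-state grammar (inside loop, outward helix, outside loop, inward helix), the additive path score and the absence of any helix-length constraints are all assumptions made for illustration.

```python
from typing import Dict, List

LABELS = "iMo"
Profile = List[Dict[str, float]]

def topology_profile(predictions: List[str]) -> Profile:
    """Per-residue fraction of methods predicting 'i', 'M' or 'o'."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    n = len(predictions)
    return [
        {lab: sum(p[pos] == lab for p in predictions) / n for lab in LABELS}
        for pos in range(length)
    ]

# Hypothetical grammar: state -> (emitted label, allowed next states).
# Two membrane states force loops to alternate between inside and outside.
STATES = {
    "in": ("i", ("in", "M_out")),
    "M_out": ("M", ("M_out", "out")),  # helix running inside -> outside
    "out": ("o", ("out", "M_in")),
    "M_in": ("M", ("M_in", "in")),     # helix running outside -> inside
}

def consensus_topology(profile: Profile) -> str:
    """Highest-scoring grammar-consistent labelling (assumed additive score)."""
    names = list(STATES)
    score = {s: profile[0][STATES[s][0]] for s in names}
    back = []
    for column in profile[1:]:
        new_score, pointers = {}, {}
        for prev in names:
            for nxt in STATES[prev][1]:
                cand = score[prev] + column[STATES[nxt][0]]
                if nxt not in new_score or cand > new_score[nxt]:
                    new_score[nxt], pointers[nxt] = cand, prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(names, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(STATES[s][0] for s in reversed(path))

# Toy input: three made-up predictions for an 8-residue sequence.
toy = ["iiMMMooo", "iiiMMooo", "iiMMMMoo"]
print(consensus_topology(topology_profile(toy)))
```

Because the grammar is enforced during the search, the result is always a valid topology even when a per-position majority vote would not be (for example, a helix flanked by two inside loops).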

Figure 1.

TOPCONS workflow: four of the topology predictors make use of multiple sequence information and require a sequence profile as input, created using BLAST (18), whereas the fifth method (SCAMPI-single) only requires the protein sequence. The topology predictions are used to construct a topology profile, which is fed into the TOPCONS hidden Markov model.

Reliability score

A reliability value is calculated for each residue in a sequence by averaging, over a 21-position window, the topology profile value of the consensus prediction (i, o or M) at each position in the window. A reliability score on the protein level is calculated by taking the minimum of these per-residue values over the sequence.
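A minimal Python sketch of this calculation follows. The function names are hypothetical, and truncating the window at the sequence ends is an assumption, since the text does not say how the termini are handled. Scores are returned as fractions between 0 and 1.

```python
from typing import Dict, List

def residue_reliability(profile: List[Dict[str, float]],
                        consensus: str,
                        window: int = 21) -> List[float]:
    """Windowed mean of the profile support for the consensus label."""
    # Fraction of methods agreeing with the consensus at each position.
    support = [column[label] for column, label in zip(profile, consensus)]
    half = window // 2
    scores = []
    for pos in range(len(support)):
        # Assumption: the window is truncated at the sequence ends.
        lo, hi = max(0, pos - half), min(len(support), pos + half + 1)
        scores.append(sum(support[lo:hi]) / (hi - lo))
    return scores

def protein_reliability(profile: List[Dict[str, float]],
                        consensus: str,
                        window: int = 21) -> float:
    """Protein-level score: the minimum per-residue reliability."""
    return min(residue_reliability(profile, consensus, window))
```

Taking the minimum means that a single poorly supported region lowers the score of the whole protein, which fits a protein-level notion of correctness where one wrong TM region makes the topology wrong.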

Dataset

A dataset of 163 sequences with known topology, compiled by combining the datasets from (10) and (3) and homology reducing at 30% sequence identity using cd-hit (12), was used to evaluate the performance of TOPCONS.

BENCHMARK RESULTS

Using a dataset of 163 sequences with known topology, the performance of TOPCONS was benchmarked against eight other topology prediction algorithms (Table 1). ConPredII (7) is an earlier consensus transmembrane topology prediction approach, and the other methods [OCTOPUS (9), MEMSAT3 (13), SCAMPI (11), HMMTOP (14) and PRO- and PRODIV-TMHMM (10)] are frequently used single topology prediction methods. TOPCONS outperforms the only other consensus prediction method, ConPredII, partly because the underlying topology prediction methods are more recent, and achieves accuracy similar to that of the best available individual methods. However, the performance of TOPCONS is not significantly better than that of the best single method included in the predictions, possibly because the limit of prediction accuracy achievable with the data available today is close to having been reached.

Table 1.

Prediction accuracy for a benchmark set of 163 membrane proteins with known topology

Method	Accuracy (%)
TOPCONS	83
OCTOPUS	82
PRODIV-TMHMM	78
MEMSAT3	76
SCAMPI-multi	75
ConPredII	74
PRO-TMHMM	72
HMMTOP	70
SCAMPI-single	68

The accuracy is measured as the fraction of correct topologies at the protein level. A correct topology should have the correct number of TM regions at approximately correct locations and the correct location of the N- and C-termini.
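One way to turn this criterion into a check is sketched below in Python. The text does not define "approximately correct", so the `min_overlap` threshold of 5 residues is purely an assumption, as are the function names and the pairing of TM regions in sequence order.

```python
import re
from typing import List, Tuple

def tm_segments(topology: str) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans of each run of 'M'."""
    return [(m.start(), m.end()) for m in re.finditer(r"M+", topology)]

def terminal_sides(topology: str) -> Tuple[str, str]:
    """Side ('i' or 'o') of the first and last non-membrane residue."""
    loops = [c for c in topology if c != "M"]
    return (loops[0], loops[-1]) if loops else ("", "")

def is_correct(predicted: str, observed: str, min_overlap: int = 5) -> bool:
    """Same TM count, same terminal sides, and overlapping TM regions."""
    pred, obs = tm_segments(predicted), tm_segments(observed)
    if len(pred) != len(obs):
        return False
    if terminal_sides(predicted) != terminal_sides(observed):
        return False
    # Assumed tolerance: each pair of regions shares >= min_overlap residues.
    return all(
        min(p_end, o_end) - max(p_start, o_start) >= min_overlap
        for (p_start, p_end), (o_start, o_end) in zip(pred, obs)
    )

def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Percentage of (predicted, observed) pairs judged correct."""
    return 100.0 * sum(is_correct(p, o) for p, o in pairs) / len(pairs)
```

Under this sketch, a helix shifted by a couple of residues still counts as correct, while an inverted orientation or a missing helix does not.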

Studying the reliability scores (Figure 2), it is evident that higher reliability scores (i.e. in principle the number of methods that agree for the sequence position with least agreement across the protein) correspond to a higher probability that a prediction is correct. In our benchmark set, 71% of the sequences achieved reliability scores above 70, and among those sequences, the accuracy of TOPCONS is 93%, i.e. 10 percentage points higher than the overall accuracy for the complete dataset.

Figure 2.

Estimated probability of correct topology prediction as a function of reliability score. Of the sequences in the benchmark set, 71% have a reliability score >70, and among those, 93% of the predictions are correct. The probability of a correct topology prediction (y-axis) was estimated as the prediction accuracy among proteins with reliability score ± 10 from the value given by the x-axis.

Compared to an earlier derived reliability score for the individual prediction method TMHMM (15), the reliability score of TOPCONS is similar (although the overall prediction accuracy is higher). A z-value for each reliability score was calculated using the Wilcoxon rank sum test to obtain a quantitative comparison between the two reliability scores that is independent of the overall prediction accuracy. The z-value for TMHMM is −6.1 and for TOPCONS −4.8.
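The windowed estimate described in the Figure 2 caption, and the coverage and accuracy above a score threshold, can be sketched in Python as follows. The function names are hypothetical, the scores are assumed to lie on the 0 to 100 scale used in the text, and the values in the example are made up for illustration, not benchmark data.

```python
from typing import List, Optional, Tuple

Result = Tuple[float, bool]  # (reliability score, prediction correct?)

def windowed_accuracy(results: List[Result], center: float,
                      half_width: float = 10.0) -> Optional[float]:
    """Accuracy (%) among proteins scoring within center +/- half_width."""
    selected = [ok for score, ok in results if abs(score - center) <= half_width]
    if not selected:
        return None  # no proteins fall inside this window
    return 100.0 * sum(selected) / len(selected)

def accuracy_above(results: List[Result],
                   threshold: float) -> Tuple[float, Optional[float]]:
    """Coverage (%) and accuracy (%) for proteins scoring above threshold."""
    selected = [ok for score, ok in results if score > threshold]
    coverage = 100.0 * len(selected) / len(results)
    acc = 100.0 * sum(selected) / len(selected) if selected else None
    return coverage, acc

# Made-up toy results, for illustration only.
toy = [(95, True), (90, True), (72, True), (65, False), (40, False), (55, True)]
print(windowed_accuracy(toy, 60))
print(accuracy_above(toy, 70))
```

Evaluating `windowed_accuracy` over a grid of x-values gives a curve of the kind shown in Figure 2. Windows containing few proteins give noisy estimates, so the window half-width is a trade-off between resolution and stability.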

THE TOPCONS AND SCAMPI WEBSERVERS

TOPCONS

To make TOPCONS available to a broad audience, a web server implementing the algorithm has been developed and can be freely accessed at http://topcons.net/. Given the amino-acid sequence of a putative membrane protein, the server outputs the topology predicted by each of the individual methods, as well as the consensus prediction (TOPCONS) (Figure 3). In addition, ZPRED (16) is used to predict the Z-coordinate (i.e. the distance to the membrane center) of each amino acid, and a scale describing the free-energy contributions of translocon-mediated membrane insertion (17) is used to predict a ΔG value for a window of 21 amino acids centered around each position in the sequence. Optionally, parts of the sequence can be constrained to a known Inside/Outside/Membrane location by using the Restrainment options, which allow any combination of restraints on one or more parts of the sequence. All results are displayed graphically and are available for download in text format. The BLAST output, which is used as input to the methods, and high-resolution versions of the images are also available for download.

Figure 3.

Example output from the TOPCONS webserver, based on the Bacteriorhodopsin sequence from Halobacterium species (SwissProt-ID: BACR_HALS4). (A) Topologies predicted by the individual methods, predicted Z-coordinates, and predicted ΔG-values across the sequence. (B) The consensus prediction, which is based on the individual methods, and reliability score across the sequence.

SCAMPI

Due to the computational limitations arising from the need to run BLAST (18) (Figure 1), only one sequence per query is allowed using TOPCONS, and the prediction typically takes 10–30 s. For large benchmark sets and full proteome scans, the SCAMPI server, implementing only the single-sequence version of SCAMPI (11), may be used instead, which is able to process around 20 000 protein sequences per minute (http://scampi.cbr.su.se/). Here, prediction results are made available in easily parsable text files.

FUNDING

Swedish Research Council, the Foundation for Strategic Research, EU 6th Framework Program [Biosapiens, Contract LSHG-CT-2004-512092] and 7th Framework program [EDICT Contract No FP7-HEALTH-F4-2007-201924]. Funding for open access charge: EDICT program.

Conflict of interest statement. None declared.

18 in total

1. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

2. ZPRED: predicting the distance to the membrane center for residues in alpha-helical membrane proteins.

Authors: Erik Granseth; Håkan Viklund; Arne Elofsson
Journal: Bioinformatics Date: 2006-07-15 Impact factor: 6.937

3. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

4. MemO: a consensus approach to the annotation of a protein's membrane organization.

Authors: Melissa J Davis; Fasheng Zhang; Zheng Yuan; Rohan D Teasdale
Journal: In Silico Biol Date: 2006

5. Improving the accuracy of transmembrane protein topology prediction using evolutionary information.

Authors: David T Jones
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

6. Molecular code for transmembrane-helix recognition by the Sec61 translocon.

Authors: Tara Hessa; Nadja M Meindl-Beinker; Andreas Bernsel; Hyun Kim; Yoko Sato; Mirjam Lerch-Bader; IngMarie Nilsson; Stephen H White; Gunnar von Heijne
Journal: Nature Date: 2007-12-13 Impact factor: 49.962

7. Prediction of membrane-protein topology from first principles.

Authors: Andreas Bernsel; Håkan Viklund; Jenny Falk; Erik Lindahl; Gunnar von Heijne; Arne Elofsson
Journal: Proc Natl Acad Sci U S A Date: 2008-05-13 Impact factor: 11.205

8. Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms.

Authors: E Wallin; G von Heijne
Journal: Protein Sci Date: 1998-04 Impact factor: 6.725

9. Consensus predictions of membrane protein topology.

Authors: J Nilsson; B Persson; G von Heijne
Journal: FEBS Lett Date: 2000-12-15 Impact factor: 4.124

10. PONGO: a web server for multiple predictions of all-alpha transmembrane proteins.

Authors: Mauro Amico; Michele Finelli; Ivan Rossi; Andrea Zauli; Arne Elofsson; Håkan Viklund; Gunnar von Heijne; David Jones; Anders Krogh; Piero Fariselli; Pier Luigi Martelli; Rita Casadio
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971
