PROFtmb: a web server for predicting bacterial transmembrane beta barrel proteins.

Henry Bigelow, Burkhard Rost.

Abstract

PROFtmb predicts transmembrane beta-barrel (TMB) proteins in Gram-negative bacteria. For each query protein, PROFtmb provides both a Z-value indicating how likely the protein is to contain a membrane barrel, and a four-state per-residue labeling of upward- and downward-facing strands, periplasmic hairpins and extracellular loops. While most users submit individual proteins known to contain TMBs, some groups submit entire proteomes to screen for potential TMBs. Response time is about 4 min for a 500-residue protein. PROFtmb is a profile-based Hidden Markov Model (HMM) with an architecture mirroring the structure of TMBs. The per-residue accuracy on the 8-fold cross-validated testing set is 86%, while the whole-protein discrimination accuracy is 70% at 60% coverage. The PROFtmb web server includes all source code, training data and whole-proteome predictions from 78 Gram-negative bacterial genomes, and is available freely and without registration at http://rostlab.org/services/proftmb.

Year: 2006   PMID: 16844988   PMCID: PMC1538807   DOI: 10.1093/nar/gkl262

Source DB: PubMed   Journal: Nucleic Acids Res   ISSN: 0305-1048   Impact factor: 16.971


INTRODUCTION

Transmembrane beta-barrel (TMB) proteins form a beta-barrel from a single beta-sheet joined at its edges. The sheet is ‘all-next-neighbor’ (1), meaning that all paired strands are adjacent in sequence. The N- and C-termini of TMBs always reside in the periplasm. Defining ‘up’ as towards the extracellular side, the architecture can be described as a repeating pattern: N-term, [up-strand, outer loop, down-strand, periplasmic hairpin], C-term, where the bracketed unit repeats. PROFtmb, originally published in (2), predicts which residues are in each of these four states (see the example in Figure 1). It exploits statistical features of TMBs, including the enrichment of beta- and gamma-hairpins in the periplasm, the lengths of outer loops, the ‘aromatic cuffs’ and the ‘hydrophobic belt’, and follows several design ideas from other Hidden Markov Model (HMM)-based TMB predictors (3,4). PROFtmb predicts TMBs from Gram-negative bacteria only. It does not predict TMBs from mitochondria, chloroplasts or the outer membranes of ‘atypical’ Gram-positive bacteria called mycolata, which have thicker outer membranes containing mycolic acid.
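The repeating topology can be written as a regular pattern over per-residue state labels. The Python sketch below is illustrative only and is not part of PROFtmb: the one-letter codes (P periplasmic, U up-strand, O outer loop, D down-strand) and the function name `strand_count` are hypothetical. It checks that a label string follows the N-term, [up-strand, outer loop, down-strand, periplasmic hairpin], C-term pattern and counts the strands.

```python
import re

# Hypothetical one-letter state codes (not PROFtmb's own output format):
#   P = periplasmic (termini and hairpins), U = up-strand,
#   O = outer (extracellular) loop, D = down-strand.
# Termini tails are written as P* so that trimmed sequences are still accepted.
TMB_TOPOLOGY = re.compile(r"P*(?:U+O+D+P+)*U+O+D+P*")


def strand_count(labels: str) -> int:
    """Return the number of membrane strands in a TMB label string.

    Raises ValueError if the labels do not follow the
    N-term, [up-strand, outer loop, down-strand, periplasmic hairpin], C-term pattern.
    """
    if not TMB_TOPOLOGY.fullmatch(labels):
        raise ValueError(f"labels do not follow the TMB topology: {labels!r}")
    # Each maximal run of U or D residues is one strand.
    return len(re.findall(r"U+|D+", labels))


if __name__ == "__main__":
    print(strand_count("PPUUUOODDDPPUUUOOODDDPP"))  # 4
```

Because both termini are periplasmic, every up-strand is followed by a down-strand, so any label string accepted by this pattern has an even strand count, as for the eight-stranded OMPA in Figure 1.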
Figure 1

True positive output example. PROFtmb prediction for OMPA from E. coli [PDB: 1g90 (5) chain A], a true TMB. Note that predicted strands have high contrast between state probabilities over most of their length.

PROCEDURE AND EXAMPLE OUTPUT

Users submit one or more FASTA-formatted protein sequences. For each sequence, PROFtmb builds a PSI-BLAST profile and runs the prediction, finding the best fit of the protein to its TMB-based architecture; the quality of this fit is reported as a Z-value. Results are always returned on a webpage and take ∼4 min per 500-residue protein. When more than one sequence is submitted, the URL of the results is also sent by email. If the query protein receives a Z-value ≥ 4.0, PROFtmb provides a four-state (upward strand, downward strand, outer loop, periplasmic loop) per-residue prediction.

Graphical output consists of color-coded four-state posterior probability plots and the amino acid sequence (Figure 1). The color of each amino acid indicates the final prediction. This usually corresponds to the state with maximum posterior probability, but residues whose state was ‘corrected’ based on context are shown in a lighter-weight font [described in the ‘Decoding’ section of the Supplementary Data of (2)]. While we did not quantify confidence levels for per-residue prediction, proteins with higher Z-values tend to have fewer corrected residues and greater contrast in state posterior probabilities.

In the first example (Figure 1), OMPA from Escherichia coli [PDB: 1g90 (5) chain A] is predicted correctly, at high confidence, as an eight-stranded TMB. This result is expected, given that PROFtmb was trained on very similar sequences. In most predictions on real TMBs, corrected residues occur only at the boundaries between strands and loops, and for most strand and loop residues the best state has a posterior probability close to 1.

In the second example (Figure 2), heme acquisition system protein A from Serratia marcescens, a member of the Gammaproteobacteria (Gram-negative), illustrates a false positive prediction. It receives a low but above-threshold Z-value of 4.8. In fact, its structure [PDB: 1B2V (6)] consists of a seven-stranded beta-sheet against four α-helices. PROFtmb does correctly predict the locations of five of the strands. Notice, however, that predicted strands four, five and six have poor contrast in posterior probability, indicating a poor fit to the PROFtmb model.
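The basic read-out of the posterior plots can be sketched as follows. This is a simplified illustration, not PROFtmb's decoder: it takes the state with maximum posterior probability for each residue and omits the context-based corrections described in the Supplementary Data of (2). The ‘contrast’ measure (best minus second-best posterior), the 0.5 cut-off for flagging residues and the function names are assumptions chosen for illustration.

```python
STATES = ("up-strand", "down-strand", "outer loop", "periplasmic loop")


def max_posterior_calls(posteriors: list[list[float]]) -> list[tuple[str, float]]:
    """For each residue, return (state with maximum posterior, contrast).

    Contrast is the best posterior minus the second-best one (a hypothetical measure).
    """
    calls = []
    for probs in posteriors:
        ranked = sorted(zip(probs, STATES), reverse=True)
        (p_best, best_state), (p_second, _) = ranked[0], ranked[1]
        calls.append((best_state, p_best - p_second))
    return calls


def low_contrast_residues(calls: list[tuple[str, float]], cutoff: float = 0.5) -> list[int]:
    """Indices of residues whose contrast falls below a (hypothetical) cut-off."""
    return [i for i, (_, contrast) in enumerate(calls) if contrast < cutoff]


if __name__ == "__main__":
    # Made-up posteriors in the order of STATES.
    posteriors = [
        [0.96, 0.01, 0.01, 0.02],  # sharply peaked: good fit to the model
        [0.40, 0.35, 0.15, 0.10],  # poor contrast: weak support for the call
    ]
    calls = max_posterior_calls(posteriors)
    print(low_contrast_residues(calls))  # [1]
```

A run of low-contrast residues inside a predicted strand is the kind of signal visible in strands four to six of the HasA example.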
Figure 2

False positive output example. Heme acquisition system protein A (HasA) from Serratia marcescens [PDB: 1B2V (6)], a secreted hemophore with architecture beta-alpha-beta(6)-alpha(2) according to SCOP (7). Predicted strands four, five and six show poor contrast in state probabilities, indicating a poor fit to the model.

Finally, proteins shorter than 140 or longer than 1392 residues receive a Z-value of −10 000 (data not shown). The lower bound of 140 residues is a conservative estimate of the length of the smallest possible TMB, while the upper bound reflects the limit of the test set we used for Z-value calibration. Occasionally, PROFtmb assigns a Z-value below 4.0 to a known TMB. In such cases, knowing that the protein is a TMB does not help produce a reliable per-residue prediction, because PROFtmb derives its prediction from sequence alone. This occurred in about 15% of the cases in our test set (see ‘Performance Evaluation’ in the ‘Methods’ tab on the website).
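The whole-protein decision can be summarized in a few lines. In this sketch the threshold (4.0), the length limits (140 and 1392 residues) and the out-of-range value (−10 000) come from the text; the function names and the `raw_z` input are hypothetical. The definitions of accuracy (fraction of proteins with Z ≥ 4.0 that are true TMBs) and coverage (fraction of true TMBs with Z ≥ 4.0) are our assumption of how the accuracy and coverage figures are computed. Under these definitions, a known TMB scored below the threshold lowers coverage but not accuracy.

```python
Z_THRESHOLD = 4.0             # per-residue prediction is reported at or above this
MIN_LEN, MAX_LEN = 140, 1392  # residues; outside this range the Z-value is fixed
OUT_OF_RANGE_Z = -10_000.0


def reported_z(raw_z: float, length: int) -> float:
    """Apply the length limits to a model Z-value (raw_z is a hypothetical input)."""
    if length < MIN_LEN or length > MAX_LEN:
        return OUT_OF_RANGE_Z
    return raw_z


def accuracy_coverage(z_values: list[float], is_tmb: list[bool],
                      threshold: float = Z_THRESHOLD) -> tuple[float, float]:
    """Accuracy = correct TMB calls / all TMB calls; coverage = correct TMB calls / all true TMBs."""
    called = [z >= threshold for z in z_values]
    correct = sum(c and t for c, t in zip(called, is_tmb))
    n_called, n_true = sum(called), sum(is_tmb)
    accuracy = correct / n_called if n_called else 0.0
    coverage = correct / n_true if n_true else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    # Made-up (raw Z-value, length) pairs and true labels.
    z = [reported_z(6.1, 350), reported_z(4.8, 200), reported_z(3.2, 420), reported_z(9.0, 90)]
    print(z)
    print(accuracy_coverage(z, [True, False, True, True]))
```

Sweeping `threshold` over a range of values and plotting the resulting pairs would give an accuracy-versus-coverage curve of the kind described in the Discussion.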

DISCUSSION

In our original paper (2) we used PSI-BLAST profiles run with options -h 1 (E-value cutoff for inclusion in the next pass) and -j 2 (number of iterations), and did not explore the effect of different profiles on PROFtmb accuracy, either for whole-protein or for per-residue prediction. Since then, we have run 8-fold jackknife tests (training on seven proteins, testing on the one left out) on the original SWISS-PROT sequence versions of eight PDB structures (SetTMBfull: 1a0s_P, 1af6_A, 1bt9_A, 1fep_A, 1prn, 1qd5_A, 1qj9_A, 1qjp_A). We built sets of PSI-BLAST profiles with 30 different combinations of the settings -h {1, 0.1, 0.01, 0.001, 0.0001, 0.00001} and -j {2, 3, 4, 5, 6}, and used each set in a separate jackknife test. The original Q2 accuracy, with settings -h 1 -j 2, was 86.0%, while the best settings, -h 0.0001 -j 2, achieved 87.3%. As a result, we changed the defaults to -h 0.0001 -j 2, and users can now also select these parameters themselves. We have not yet estimated the effect of PSI-BLAST settings on whole-protein prediction. Currently, the Z-value and the resulting estimates of accuracy and coverage are calibrated on our original sequence-unique set, SetROC, which contains a representative set of proteins from SWISS-PROT. As sequence databases are updated, we will periodically re-calibrate Z-values. A cluster plot and the resulting accuracy-versus-coverage curve can be found in the ‘Methods’ section of the website.
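The evaluation protocol can be sketched as follows. The settings grid and the SetTMBfull identifiers come from the text; everything else is an assumption for illustration. In particular, we assume that Q2 is the two-state per-residue accuracy (strand versus non-strand, with up- and down-strands merged), and the one-letter labels and function names are hypothetical. Building the profiles and training the model are outside this sketch.

```python
from itertools import product

H_VALUES = [1, 0.1, 0.01, 0.001, 0.0001, 0.00001]  # -h: E-value cutoff for inclusion in the next pass
J_VALUES = [2, 3, 4, 5, 6]                          # -j: number of iterations
SET_TMB_FULL = ["1a0s_P", "1af6_A", "1bt9_A", "1fep_A",
                "1prn", "1qd5_A", "1qj9_A", "1qjp_A"]


def settings_grid() -> list[tuple[float, int]]:
    """All (-h, -j) combinations; each one defines a separate set of profiles."""
    return list(product(H_VALUES, J_VALUES))


def jackknife_folds(items: list[str]):
    """Yield (training set, held-out protein): for eight proteins, seven in and one out."""
    for i, held_out in enumerate(items):
        yield items[:i] + items[i + 1:], held_out


def is_strand(state: str) -> bool:
    # Hypothetical labels: U = up-strand, D = down-strand; anything else is a loop.
    return state in ("U", "D")


def q2(predicted: str, observed: str) -> float:
    """Percentage of residues whose strand/non-strand assignment matches the observation."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed labels must have the same length")
    hits = sum(is_strand(p) == is_strand(o) for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)


if __name__ == "__main__":
    print(len(settings_grid()))                           # 30
    print(sum(1 for _ in jackknife_folds(SET_TMB_FULL)))  # 8
    print(q2("PPUUOODDPP", "PPUUUODDPP"))                 # 90.0
```

In this scheme each of the 30 profile sets would be run through all eight folds and the setting with the highest Q2 on the held-out proteins kept; how residues are pooled across folds is not specified in the text.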

DOWNLOADS

Predictions on 78 Gram-negative proteomes are available in the Download section; they have been updated since the original publication as follows. First, the length-adjusted bits score was replaced by the Z-value, which gives slightly improved discrimination on our test set (unpublished data). Second, per-residue predictions were re-run using updated PSI-BLAST profiles, with option -h 0.0001 rather than -h 1. Both changes are expected to improve predictions but have not been rigorously tested. Third, the model architecture now explicitly includes BEGIN and END states, representing the beginning and end of the amino acid sequence; the current version of the software requires these states. The PROFtmb software is a general profile-HMM that allows users to specify the model architecture, encoding and decoding. The training data, consisting of eight TMB sequences with hand-annotated per-residue labels based on their 3D structures, are also available. Interested users may download and compile the C++ source code and use PROFtmb with the original training data or modify it. We make the software available in the spirit of reproducibility and encourage interested readers to contact the authors for more detailed advice.
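To illustrate what explicit BEGIN and END states do, here is a toy Viterbi decoder over the four states. It is not PROFtmb's model or code: the transition probabilities are made up, per-residue emission probabilities are passed in directly instead of being derived from a PSI-BLAST profile, and all names are hypothetical. Because BEGIN connects only to the periplasmic state and END is reachable only from it, decoding can only return paths whose N- and C-termini are periplasmic.

```python
import math

STATES = ("P", "U", "O", "D")  # periplasmic, up-strand, outer loop, down-strand

# Made-up transition probabilities for illustration only.
# BEGIN leads only to P, and END is reachable only from P,
# so every decoded path starts and ends in the periplasm.
TRANSITIONS = {
    "BEGIN": {"P": 1.0},
    "P": {"P": 0.6, "U": 0.3, "END": 0.1},
    "U": {"U": 0.6, "O": 0.4},
    "O": {"O": 0.6, "D": 0.4},
    "D": {"D": 0.6, "P": 0.4},
}


def log_p(p: float) -> float:
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emissions: list[dict[str, float]]) -> str:
    """Most probable BEGIN -> ... -> END state path for per-residue emission probabilities."""
    if not emissions:
        return ""
    n = len(emissions)
    neg_inf = float("-inf")
    # score[t][s]: best log-probability of a path that is in state s at residue t
    score = [{s: neg_inf for s in STATES} for _ in range(n)]
    back = [{s: None for s in STATES} for _ in range(n)]
    for s in STATES:
        score[0][s] = log_p(TRANSITIONS["BEGIN"].get(s, 0.0)) + log_p(emissions[0][s])
    for t in range(1, n):
        for s in STATES:
            for r in STATES:
                cand = score[t - 1][r] + log_p(TRANSITIONS[r].get(s, 0.0))
                if cand > score[t][s]:
                    score[t][s], back[t][s] = cand, r
            score[t][s] += log_p(emissions[t][s])
    # Close every path with the explicit transition into END.
    last = max(STATES, key=lambda s: score[n - 1][s] + log_p(TRANSITIONS[s].get("END", 0.0)))
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path))


if __name__ == "__main__":
    # Made-up emissions strongly favouring a two-strand labeling.
    target = "PPUUOODDPP"
    emissions = [{s: (0.7 if s == c else 0.1) for s in STATES} for c in target]
    print(viterbi(emissions))  # PPUUOODDPP
```

Even if the emissions for the last residue favoured a strand state, this decoder would still end the path in P, because no other state has a transition into END.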
REFERENCES

1.  Transmembrane beta-barrel proteins.

Authors: Georg E Schulz
Journal: Adv Protein Chem   Date: 2003

2.  Predicting transmembrane beta-barrels in proteomes.

Authors: Henry R Bigelow; Donald S Petrey; Jinfeng Liu; Dariusz Przybylski; Burkhard Rost
Journal: Nucleic Acids Res   Date: 2004-05-11

3.  A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins.

Authors: Pier Luigi Martelli; Piero Fariselli; Anders Krogh; Rita Casadio
Journal: Bioinformatics   Date: 2002

4.  A Hidden Markov Model method, capable of predicting and discriminating beta-barrel outer membrane proteins.

Authors: Pantelis G Bagos; Theodore D Liakopoulos; Ioannis C Spyropoulos; Stavros J Hamodrakas
Journal: BMC Bioinformatics   Date: 2004-03-15

5.  Structure of outer membrane protein A transmembrane domain by NMR spectroscopy.

Authors: A Arora; F Abildgaard; J H Bushweller; L K Tamm
Journal: Nat Struct Biol   Date: 2001-04

6.  The crystal structure of HasA, a hemophore secreted by Serratia marcescens.

Authors: P Arnoux; R Haser; N Izadi; A Lecroisey; M Delepierre; C Wandersman; M Czjzek
Journal: Nat Struct Biol   Date: 1999-06

7.  SCOP database in 2002: refinements accommodate structural genomics.

Authors: Loredana Lo Conte; Steven E Brenner; Tim J P Hubbard; Cyrus Chothia; Alexey G Murzin
Journal: Nucleic Acids Res   Date: 2002-01-01