RHYTHM--a server to predict the orientation of transmembrane helices in channels and membrane-coils.

Alexander Rose, Stephan Lorenzen, Andrean Goede, Björn Gruening, Peter W Hildebrand.

Abstract

RHYTHM is a web server that predicts buried versus exposed residues of helical membrane proteins. Starting from a given protein sequence, secondary and tertiary structure information is calculated by RHYTHM within only a few seconds. The prediction applies structural information from a growing database of precalculated packing files and evolutionary information from sequence patterns conserved in a representative dataset of membrane proteins ('Pfam-domains'). The program uses two types of position-specific matrices to account for the different geometries of packing in channels and transporters ('channels') or other membrane proteins ('membrane-coils'). The output provides information on the secondary structure and topology of the protein and specifically on the contact type of each residue and its conservation. This information can be downloaded as a graphical file for illustration, a text file for analysis and statistics and a PyMOL file for modeling purposes. The server can be freely accessed at http://proteinformatics.de/rhythm.

Year: 2009 PMID: 19465378 PMCID: PMC2703963 DOI: 10.1093/nar/gkp418

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

About one third of the presently mapped gene sequences encode membrane proteins, which are also major targets for pharmaceutical products (1,2). In contrast, only a minor fraction (February 2009, 1.8%) of the protein structures deposited in the protein data bank (PDB) belongs to this structural class (3,4). Due to difficulties in overexpression and crystallization, their tertiary structure is often evaluated using computational methods (5–8). Homology modeling may be applied when an appropriate template structure is available (9). In other cases, ab initio or knowledge-based tertiary structure modeling comes into play. There is a high level of predictability regarding secondary structure elements (10–12). New approaches deal with the prediction of the exact lengths of the transmembrane helices (13). Finally, transmembrane topology prediction was optimized by applying consensus predictions that also identify signal peptides (14). However, tools that perform or assist in low resolution tertiary structure modeling of helical membrane proteins are still rare (15–19).

The growing data on high-resolution structures of helical membrane proteins provide an appropriate base for structural analysis, statistics and the development of knowledge-based prediction methods (12,15–32). The type of packing of α-helices is fundamental for the stabilization and function of all helical membrane proteins (33–36). Residues involved in helix–helix interactions are therefore regularly more conserved than others and are often arranged in specific sequence motifs that reflect the type of packing (23,25,35,37). Right-handed parallel and anti-parallel interactions are typically found in channels (membrane proteins with a functional pore). These interactions are mainly accomplished by weakly polar amino acids (G > S > T > F) that preferably create contacts every fourth residue (23,37,38). Left-handed anti-parallel interactions are predominantly found in membrane-coils. There, large and polar residues (D > S > M > Q) create characteristic contacts every 3.5 residues (23,37,39).

The higher conservation of residues involved in helix–helix contacts was applied in methods predicting tertiary structure contacts (40,41). Such applications can be further improved by combining conservation criteria with amino acid propensity scales (18,24,32,42). The combination of statistical potentials with fragment-based modeling and energy minimizations was applied for de novo modeling approaches (28,43–45).

There are some tools available to predict buried versus exposed regions of transmembrane helices. ProperTM (18), LIPS (16), RANTS (15) and TMX (19) depend on multiple sequence alignments to produce predictions about transmembrane helix orientations or solvent accessibility. However, the quality of prediction by these tools largely depends on the quality of the multiple sequence alignment provided by the user. Due to the small size of several transmembrane protein families, such alignments are not always at hand. Moreover, the output is not always presented in a user-friendly format and thus cannot be directly used for modeling purposes.

RHYTHM is the first server that predicts the exposure or burial of transmembrane residues incorporating the structural specificities of channels. The quality of prediction (expressed by AUC-values) of helix–helix contacts rises by 16% to an average value of 76% when the sequence motifs typical for channels are applied, compared to the same approach when a non-specific matrix is used (23). For our web service, the position-specific matrices were updated using an enlarged data set of input structures. To optimize the sensitivity of helix–helix contact predictions at high specificity thresholds, the matrix prediction method is now combined with a prediction directly applying evolutionary information from ‘Pfam-domains’ (46,47). RHYTHM also integrates the secondary structure prediction tool HMMTOP (48). Thus, after the upload of a single sequence file and the specification of the position-specific matrix type (‘channel’ or ‘membrane-coil’), the prediction of tertiary structure contacts is started.

METHODS

Matrix prediction method

The prediction of buried versus exposed residues is based on two different sets of propensity matrices derived from representative and non-redundant datasets of 21 channels and 14 membrane-coils containing 310 and 179 transmembrane helices, respectively (see website for details). The data were analyzed as described in detail in earlier analyses (23,49). Briefly, helical sections were defined by the Kabsch and Sander algorithm (50). Only residues whose Cα-atoms lie between the two membrane planes were assigned to transmembrane helices. The membrane planes were calculated applying the output of the TMDET algorithm (51). The type of contact of a specified residue was determined by counting the atomic contacts to residues of another helix, to the virtual membrane or to virtual water (23). Structures with helix pairs too far apart were removed after visual inspection. The matrices (which will be regularly updated due to the growing data set of high resolution membrane protein structures) store the propensities of residues to contact another helix or the membrane. To account for sequence motifs, propensities of all neighboring amino acids are stored in the same matrix [see website for details or ref. (23)]. Scores are calculated by summation of the residue propensities at positions 0 to ± 4 (channels) or 0 to ± 7 (membrane-coils). These windows account for the different rhythm of contacts in channels and membrane-coils (23,25,38). An amino acid is predicted to be buried (step 1, see Figure 1) or exposed (step 2, see Figure 1) when the corresponding score is above a threshold specified by the user. Because each score sums over a window of positions, the prediction is much less affected by variations of single amino acid propensities. However, amino acids at the helix termini are not recorded by our method.
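The window summation described above can be illustrated with a minimal Python sketch. It is not the server's implementation: the matrix layout (a dictionary keyed by offset and amino acid), the function name `window_scores` and the toy propensity values are assumptions made for illustration. The sketch also assumes that terminal residues are skipped because a full window does not fit, which is one possible reading of the statement that helix termini are not recorded.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (offset from scored residue, amino acid) -> propensity.
Matrix = Dict[Tuple[int, str], float]

# Window half-widths taken from the text: ±4 for channels, ±7 for membrane-coils.
HALF_WIDTH = {"channel": 4, "membrane-coil": 7}


def window_scores(helix: str, matrix: Matrix, kind: str) -> List[Optional[float]]:
    """Sum propensities over the window around each residue of one helix.

    Residues without a complete window (helix termini) get None.
    """
    half = HALF_WIDTH[kind]
    scores: List[Optional[float]] = []
    for i in range(len(helix)):
        if i - half < 0 or i + half >= len(helix):
            scores.append(None)  # assumption: termini are left unscored
            continue
        total = 0.0
        for offset in range(-half, half + 1):
            # Missing matrix entries contribute nothing in this toy example.
            total += matrix.get((offset, helix[i + offset]), 0.0)
        scores.append(total)
    return scores


# Toy matrix rewarding glycine at the scored position and four residues away.
toy = {(0, "G"): 1.0, (4, "G"): 0.5, (-4, "G"): 0.5}
print(window_scores("AAAAGAAAGAAAA", toy, "channel"))
```

In this toy case the two glycines spaced four residues apart reinforce each other's scores, which mirrors the idea that the matrix captures sequence motifs rather than single-residue propensities.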

Figure 1.

Workflow of RHYTHM: the prediction is performed in three steps including (1) matrix prediction of helix–helix contacts; (2) matrix prediction of helix–membrane contacts and (3) prediction of helix–helix contacts by conservation criteria (Pfam domains) (52).

Conservation criteria

The Pfam database is an extensive set of protein domains and families currently covering 72% of known protein sequences (46). The families consist of multiple alignments of functionally or evolutionarily related protein sequences (47). These alignments also reproduce evolutionary relationships that would otherwise not be detected (9). To search the Pfam database, HMMER (version 2.3.2) is applied (52). HMMER allows for sensitive searching in a database of the consensus sequences of various protein families using Hidden Markov Models. To speed up the search, the Pfam database was restricted to the 691 membrane protein families provided in February 2009. A bonus is added to the helix–helix score of fully conserved residues, according to the finding that conserved residues are often involved in helix–helix contacts (37,53). The value of the bonus depends on the selected specificity and is optimized for highest accuracy.
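The conservation bonus can be sketched as follows. This is an illustrative sketch only: it works on a small in-memory alignment rather than on HMMER output, and the function names, the treatment of gap columns as not conserved and the bonus value are all assumptions, since the optimized bonus values are not given here.

```python
from typing import List, Optional


def fully_conserved(alignment: List[str]) -> List[bool]:
    """Flag alignment columns in which every sequence has the same residue.

    Columns consisting of gaps are treated as not conserved (assumption).
    """
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*alignment)]


def add_bonus(
    scores: List[Optional[float]], conserved: List[bool], bonus: float
) -> List[Optional[float]]:
    """Add the bonus to the helix-helix score of fully conserved residues.

    Residues without a matrix score (None) stay unscored.
    """
    return [
        None if s is None else s + (bonus if c else 0.0)
        for s, c in zip(scores, conserved)
    ]


flags = fully_conserved(["GAT", "GCT", "GAT"])
print(flags)
print(add_bonus([1.0, None, 0.5], flags, bonus=0.25))  # bonus value is hypothetical
```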

Three step prediction

A three-step approach was applied to predict buried versus exposed residues (Figure 1):

(1) Matrix prediction of helix–helix contacts: Amino acids predicted by HMMTOP (48) or specified by the user to be part of a transmembrane helix are scored. The prediction matrix has to be chosen by the user, who must therefore know whether the protein of the uploaded sequence has a functional pore (channel) or not (membrane-coil). Residues scoring above the selected specificity threshold for helix–helix contacts (medium, high, very high or highest) are predicted as helix–helix contacts. The specificities for a single contact type range from about 75% for the medium threshold to 90% for the very high threshold.

(2) Matrix prediction of helix–membrane contacts: The remaining residues are predicted analogously as helix–membrane contacts, using the threshold specified by the user at the beginning of the procedure. The specificities for this prediction also range from about 75% to 90%. Taken together, at most 70% of the residues (at medium specificity) are currently recorded by the matrix prediction method.

(3) Pfam prediction: To optimize the sensitivity of helix–helix contact predictions at high specificity thresholds, residues not recorded by the matrix prediction method may be verified using conservation criteria. A bonus, optimized for the positive predictive value at the selected specificity threshold, is added to the helix–helix score from matrix prediction. A residue is predicted as buried when the combined score is above the defined threshold. As a result, an additional 10–20% of the residues are assigned to helix–helix contacts. The specificity of prediction is not significantly affected by the Pfam prediction.
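The decision order of the three steps can be summarized in a short Python sketch for a single residue. The function name, the argument names and all numeric values are hypothetical; the server's actual thresholds and bonus values are not stated here, and the sketch assumes that step 3 is only tried for residues left unassigned by steps 1 and 2.

```python
from typing import Optional


def classify(
    hh_score: float,      # helix-helix matrix score of the residue
    hm_score: float,      # helix-membrane matrix score of the residue
    conserved: bool,      # fully conserved in the matching Pfam family?
    hh_threshold: float,  # user-selected helix-helix threshold
    hm_threshold: float,  # user-selected helix-membrane threshold
    bonus: float,         # conservation bonus (hypothetical value)
) -> Optional[str]:
    """Return the predicted contact type, or None if the residue stays unassigned."""
    if hh_score > hh_threshold:
        return "helix-helix"      # step 1: matrix prediction, buried
    if hm_score > hm_threshold:
        return "helix-membrane"   # step 2: matrix prediction, exposed
    if conserved and hh_score + bonus > hh_threshold:
        return "helix-helix"      # step 3: conservation bonus lifts the score
    return None


print(classify(0.8, 0.2, True, 1.0, 1.0, 0.5))   # assigned in step 3
print(classify(0.8, 0.2, False, 1.0, 1.0, 0.5))  # stays unassigned
```

Returning None for unassigned residues reflects the statement that only part of the residues is recorded at a given specificity threshold.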

RESULTS AND DISCUSSION

Performance of the combined prediction

The prediction quality of RHYTHM improved compared to our previous analysis (23). This is due to the enlarged data set of helical membrane proteins and the combination of the matrix prediction method with the prediction from evolutionary conservation. The average AUC-values (from a leave-one-out cross validation) for the prediction of helix–helix contacts are 0.72 for channels [as in our previous analysis (23)] and 0.68 for membrane-coils. The corresponding values for the prediction of helix–membrane contacts are 0.75 and 0.73. Best predictions were obtained for helix–helix contacts of the translocon channel (PDB-entry: 1rh5, AUC-value: 0.78) and for helix–membrane contacts of the ABC-transporter protein (PDB-entry: 2qi9, AUC-value: 0.86). To obtain high quality predictions with RHYTHM, we suggest selecting the default specificity threshold ‘very high’. This threshold may then be reduced if too few contacts are predicted. Besides tertiary structure contact types, the output assigns the secondary structure and topology of the protein (51). This information is provided as a graphical file for illustration (Figure 2), a text file for analysis and statistics and a PyMOL file for modeling purposes (Figure 3).
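For readers unfamiliar with the measure, an AUC-value can be computed from residue scores and observed contact labels as the fraction of (contact, non-contact) pairs in which the contact residue scores higher, with ties counted as half. The Python sketch below shows this rank-based definition on made-up numbers; it is not the evaluation code behind the values reported above, and it omits the leave-one-out step, in which the matrices would be rebuilt without the protein under test.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels[i] is True when residue i is an observed contact.
    """
    pos = [s for s, is_contact in zip(scores, labels) if is_contact]
    neg = [s for s, is_contact in zip(scores, labels) if not is_contact]
    if not pos or not neg:
        raise ValueError("need at least one contact and one non-contact residue")
    # Each correctly ordered pair counts 1, each tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for four residues, two of which are true contacts.
print(auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False]))
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation of contact and non-contact residues.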

Figure 2.

Example graphical output of RHYTHM: topology of the ammonium transporter predicted with HMMTOP (51). Tertiary structure contacts predicted as helix–helix contacts (red) or helix–membrane contacts (green). Highly conserved residues are denoted with blue dots.

Figure 3.

Two high-resolution crystal structures of (A) rhodopsin, PDB-entry: 1u19 and (B) the ammonium transporter, PDB-entry: 1xqf, were colored according to the predicted contact types (green = helix–membrane, red = helix–helix) using the downloadable PyMOL script from RHYTHM. Helical sections that are predicted by HMMTOP to protrude from the lipid bilayer are colored yellow. The two structures represent two different architectures (23). Rhodopsin belongs to ‘membrane-coils’, where helix pairs are regularly arranged in small left-handed packing angles. The ammonium transporter belongs to ‘channels’, which are composed of helix pairs packed at large right-handed angles. Different matrices are applied for the prediction of contact types of these two distinct packing modes.
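A coloring of the kind described for Figure 3 could be produced with a short PyMOL command script. The Python sketch below generates such a script from a mapping of residue numbers to predicted contact types. The format of the file actually offered for download by RHYTHM is not described here, so the function name and the script layout are assumptions; only the standard PyMOL `color` command and `resi` selection syntax are used.

```python
from typing import Dict

# Color scheme taken from the Figure 3 caption.
COLORS = {"helix-helix": "red", "helix-membrane": "green"}


def pymol_script(contacts: Dict[int, str]) -> str:
    """Build PyMOL commands that color residues by predicted contact type.

    contacts maps a residue number to "helix-helix" or "helix-membrane".
    """
    lines = []
    for kind, color in COLORS.items():
        residues = sorted(r for r, k in contacts.items() if k == kind)
        if residues:
            selection = "+".join(str(r) for r in residues)
            lines.append(f"color {color}, resi {selection}")
    return "\n".join(lines)


# Hypothetical predictions for three residues.
print(pymol_script({12: "helix-helix", 5: "helix-membrane", 16: "helix-helix"}))
```

The resulting text can be saved to a file and run in PyMOL after loading the corresponding structure.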

Complexity of tertiary contact predictions

The quality of prediction will further improve as the data set of non-homologous high resolution membrane protein structures grows. At the moment the prediction is limited for several reasons: A significant number of buried residues is close to internal cavities (37,54). Such residues are not judged in our analysis to be part of a helix–helix contact due to insufficient contacts to other residues and are thus often evaluated as false positives in our prediction. Large packing defects regularly account for structural flexibilities (36,55–57). The separate prediction of residues involved in packing defects could therefore enhance the prediction of tertiary structure contacts. Moreover, about one quarter of the residues is in contact with both another helix and the membrane. These residues are frequently not recorded at high specificity thresholds but will be predicted as buried or exposed at lower thresholds. This ambiguity clearly complicates the prediction, as well as the fact that many channels are highly flexible. Residues that are buried in one functional state may become exposed in another (45,58). Finally, residues that appear to be exposed to lipid may become (and may also be predicted to be) buried in quaternary complexes (59). With more structural data of channels a prediction of residues that are exposed or buried depending on their functional state will be possible.

Technical details

All computations are done on our server including optional prediction of membrane helix sections and searches for Pfam domains. Modern web technologies (AJAX, JavaScript, PHP, CSS) were used to create a fast and intuitively usable web application.

FUNDING

European Union (ProFIT) and the Deutsche Forschungsgemeinschaft (SFB449, SFB740). Funding for open access charge: SFB449. Conflict of interest statement. None declared.

59 in total

1. Helical packing patterns in membrane and soluble proteins.

Authors: Marina Gimpelev; Lucy R Forrest; Diana Murray; Barry Honig
Journal: Biophys J Date: 2004-10-01 Impact factor: 4.033

Review 2. Solving the membrane protein folding problem.

Authors: James U Bowie
Journal: Nature Date: 2005-12-01 Impact factor: 49.962

3. Multipass membrane protein structure prediction using Rosetta.

Authors: Vladimir Yarov-Yarovoy; Jack Schonbrun; David Baker
Journal: Proteins Date: 2006-03-01

Review 4. Computational analysis of membrane proteins: genomic occurrence, structure prediction and helix interactions.

Authors: Ursula Lehnert; Yu Xia; Thomas E Royce; Chem-Sing Goh; Yang Liu; Alessandro Senes; Haiyuan Yu; Zhao Lei Zhang; Donald M Engelman; Mark Gerstein
Journal: Q Rev Biophys Date: 2004-05 Impact factor: 5.318

Review 5. Transmembrane protein structures without X-rays.

Authors: Sarel J Fleishman; Vinzenz M Unger; Nir Ben-Tal
Journal: Trends Biochem Sci Date: 2006-01-10 Impact factor: 13.807

6. Prediction of buried helices in multispan alpha helical membrane proteins.

Authors: Larisa Adamian; Jie Liang
Journal: Proteins Date: 2006-04-01

7. Dictionary of interfaces in proteins (DIP). Data bank of complementary molecular surface patches.

Authors: R Preissner; A Goede; C Frömmel
Journal: J Mol Biol Date: 1998-07-17 Impact factor: 5.469

8. A potential smoothing algorithm accurately predicts transmembrane helix packing.

Authors: R V Pappu; G R Marshall; J W Ponder
Journal: Nat Struct Biol Date: 1999-01

9. Computational analysis of alpha-helical membrane protein structure: implications for the prediction of 3D structural models.

Authors: Tina A Eyre; Linda Partridge; Janet M Thornton
Journal: Protein Eng Des Sel Date: 2004-09-23 Impact factor: 1.650

10. PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.

Authors: Gábor E Tusnády; Zsuzsanna Dosztányi; István Simon
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971
