Timothy Nugent, David T Jones.
Abstract
Alpha-helical transmembrane proteins constitute roughly 30% of a typical genome and are involved in a wide variety of important biological processes, including cell signalling, transport of membrane-impermeable molecules and cell recognition. Despite significant efforts to predict transmembrane protein topology, comparatively little attention has been directed toward developing a method to pack the helices together. Here, we present a novel approach to predict lipid exposure, residue contacts, helix-helix interactions and, finally, the optimal helical packing arrangement of transmembrane proteins. Using molecular dynamics data, we have trained and cross-validated a support vector machine (SVM) classifier to predict per-residue lipid exposure with 69% accuracy. This information is combined with additional features to train a second SVM to predict residue contacts, which are then used to determine helix-helix interactions with up to 65% accuracy under stringent cross-validation on a non-redundant test set. Our method is also able to discriminate native from decoy helical packing arrangements with up to 70% accuracy. Finally, we employ a force-directed algorithm to construct the optimal helical packing arrangement, which demonstrates success for proteins containing up to 13 transmembrane helices. This software is freely available as source code from http://bioinf.cs.ucl.ac.uk/memsat/mempack/.
Year: 2010 PMID: 20333233 PMCID: PMC2841610 DOI: 10.1371/journal.pcbi.1000714
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
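The abstract describes using a force-directed algorithm to build the packing arrangement from predicted helix-helix interactions. The sketch below illustrates the general idea with a Fruchterman-Reingold-style spring layout: helices are graph nodes, predicted interactions are edges, every pair of helices repels, and interacting helices also attract. It is not MEMPACK's implementation. The function name, the 10 Å preferred spacing (`ideal`), the iteration count and the linear cooling schedule are all illustrative assumptions.

```python
import math
import random


def force_directed_layout(n_helices, interactions, ideal=10.0,
                          iterations=500, seed=0):
    """Fruchterman-Reingold-style 2D layout of TM helices (hypothetical sketch).

    n_helices:    number of helices (graph nodes)
    interactions: (i, j) pairs of helices predicted to interact (graph edges)
    ideal:        assumed preferred inter-helix distance in angstroms
    """
    rng = random.Random(seed)  # fixed seed so layouts are reproducible
    pos = [[rng.uniform(-ideal, ideal), rng.uniform(-ideal, ideal)]
           for _ in range(n_helices)]
    edges = {tuple(sorted(e)) for e in interactions if e[0] != e[1]}

    for step in range(iterations):
        temp = ideal * (1 - step / iterations)  # linear cooling schedule
        disp = [[0.0, 0.0] for _ in range(n_helices)]

        # Every pair of helices repels with force ideal^2 / d.
        for i in range(n_helices):
            for j in range(i + 1, n_helices):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                f = ideal ** 2 / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f

        # Interacting helices also attract with force d^2 / ideal,
        # which exactly balances the pairwise repulsion at d == ideal.
        for i, j in edges:
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d ** 2 / ideal
            disp[i][0] -= dx / d * f
            disp[i][1] -= dy / d * f
            disp[j][0] += dx / d * f
            disp[j][1] += dy / d * f

        # Move each helix along its net force, capped by the temperature.
        for i in range(n_helices):
            length = math.hypot(*disp[i]) or 1e-9
            move = min(length, temp)
            pos[i][0] += disp[i][0] / length * move
            pos[i][1] += disp[i][1] / length * move

    return [tuple(p) for p in pos]


# Usage: three mutually interacting helices.
layout = force_directed_layout(3, [(0, 1), (1, 2), (0, 2)])
for a, b in [(0, 1), (1, 2), (0, 2)]:
    print(a, b, round(math.dist(layout[a], layout[b]), 2))
```

For three mutually interacting helices the layout settles near an equilateral triangle with sides close to `ideal`. Because a layout like this only sees the interaction graph, two helices that share exactly the same interactions cannot be ordered by it alone; the figure captions below describe same-side loop crossovers as the tie-break in such cases.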
Per-residue lipid exposure prediction performance using a data set of 77 sequences.
| Method | Lipid exposure definition | Precision | Recall | FPR | FNR | MCC | Accuracy |
| MEMPACK | CGDB | 0.69 | 0.56 | 0.36 | 0.26 | 0.38 | 69.3% |
| MEMPACK | 1.9 Å probe | 0.71 | 0.61 | 0.39 | 0.33 | 0.27 | 64.3% |
| LIPS | CGDB | 0.61 | 0.59 | 0.48 | 0.29 | 0.23 | 61.7% |
| LIPS | 1.9 Å probe | 0.65 | 0.65 | 0.50 | 0.32 | 0.18 | 60.3% |
Lipid exposure definition = test set labelled according to the CGDB definition or using a 1.9 Å probe. FPR = false positive rate. FNR = false negative rate. MCC = Matthews Correlation Coefficient. Accuracy = (TP + TN)/(TP + TN + FP + FN).
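The performance columns in these tables are derived from confusion-matrix counts. The sketch below uses the textbook definitions (FPR = FP/(FP + TN), FNR = FN/(FN + TP)). Note that the tabulated FNR values are not equal to 1 - recall, so the authors may use a different convention for the two error rates; this sketch does not attempt to reproduce it. The function name is hypothetical.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0   # false positive rate
    fnr = fn / (fn + tp) if fn + tp else 0.0   # false negative rate
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "fpr": fpr,
            "fnr": fnr, "mcc": mcc, "accuracy": accuracy}


m = confusion_metrics(tp=50, fp=10, tn=30, fn=10)
print(round(m["mcc"], 4))  # → 0.5833
```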
Per-residue-pair contact prediction performance using a data set of 74 sequences.
| Method | Contact Definition | Precision | Recall | FPR | FNR | MCC |
| MEMPACK | 1 | 0.69 | 0.0023 | 0.0010 | 0.88 | 0.28 |
| SVMcon | 1 | 0.06 | 0.00050 | 0.0083 | 0.97 | 0.03 |
| SVMcon L5 | 1 | 0.09 | 0.00 | 0.0003 | 1.00 | 0.01 |
| PROFcon | 1 | 0.03 | 0.021 | 0.4600 | 0.41 | 0.04 |
| PROFcon L5 | 1 | 0.06 | 0.00010 | 0.0018 | 0.99 | 0.01 |
| MEMPACK | 2 | 0.69 | 0.0015 | 0.0007 | 0.88 | 0.28 |
| TMhit L5 | 2 | 0.57 | 0.0015 | 0.0012 | 0.88 | 0.26 |
| MEMPACK | 3 | 0.70 | 0.0022 | 0.0010 | 0.89 | 0.27 |
| TMHcon L5 | 3 | 0.09 | 0.00020 | 0.0021 | 0.99 | 0.02 |
Contact definition 1 = the C-beta atoms of the two residues (C-alpha for glycine) are at most 8 Å apart. 2 = the distance between any two atoms of the residue pair is less than the sum of their van der Waals radii plus a threshold of 0.6 Å. 3 = the minimal distance between side-chain or backbone heavy atoms of the residue pair is less than 5.5 Å. Results for contact definition 3 used the 58 sequences with more than 2 TM helices, because TMHcon cannot make predictions for sequences with only 2 TM helices.
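The three contact definitions can be written as predicates on atomic coordinates. In this sketch each residue is a hypothetical dict with a residue `name` and a list of `(atom_name, element, (x, y, z))` tuples. The Bondi van der Waals radii used for definition 2 are an assumption, since the radii set is not stated here, and hydrogens are ignored throughout.

```python
import math
from itertools import product

# Bondi van der Waals radii in angstroms (assumed; radii set not given in the source).
VDW = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def _representative(res):
    """C-beta coordinates, or C-alpha for glycine."""
    wanted = "CA" if res["name"] == "GLY" else "CB"
    for atom_name, _, xyz in res["atoms"]:
        if atom_name == wanted:
            return xyz
    raise KeyError(f"{res['name']} has no {wanted} atom")


def contact_def1(a, b, cutoff=8.0):
    """Definition 1: C-beta/C-beta (C-alpha for Gly) distance at most 8 Å."""
    return math.dist(_representative(a), _representative(b)) <= cutoff


def contact_def2(a, b, tol=0.6):
    """Definition 2: any atom pair closer than the sum of vdW radii plus 0.6 Å."""
    for (_, ea, xa), (_, eb, xb) in product(a["atoms"], b["atoms"]):
        if ea in VDW and eb in VDW and math.dist(xa, xb) < VDW[ea] + VDW[eb] + tol:
            return True
    return False


def contact_def3(a, b, cutoff=5.5):
    """Definition 3: minimal heavy-atom distance below 5.5 Å."""
    heavy_a = [xyz for _, el, xyz in a["atoms"] if el != "H"]
    heavy_b = [xyz for _, el, xyz in b["atoms"] if el != "H"]
    return min(math.dist(p, q) for p in heavy_a for q in heavy_b) < cutoff


gly = {"name": "GLY", "atoms": [("CA", "C", (0.0, 0.0, 0.0))]}
ala = {"name": "ALA", "atoms": [("CA", "C", (20.0, 0.0, 0.0)),
                                ("CB", "C", (5.0, 0.0, 0.0))]}
print(contact_def1(gly, ala), contact_def2(gly, ala), contact_def3(gly, ala))
```

The toy pair shows how the definitions disagree: a 5 Å heavy-atom separation is a contact under definitions 1 and 3 but not under definition 2, whose carbon-carbon threshold is 1.70 + 1.70 + 0.6 = 4.0 Å.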
Helix-helix interaction prediction performance using a data set of 74 sequences.
| Method | Contact Definition | Precision | Recall | FPR | FNR | MCC | Accuracy |
| MEMPACK | 1 | 0.93 | 0.10 | 0.0087 | 0.84 | 0.29 | 64.7% |
| SVMcon | 1 | 0.57 | 0.11 | 0.090 | 0.84 | 0.11 | 59.3% |
| SVMcon L5 | 1 | 0.82 | 0.034 | 0.0074 | 0.95 | 0.13 | 59.5% |
| PROFcon | 1 | 0.43 | 0.16 | 0.83 | 0.16 | 0.02 | 45.4% |
| PROFcon L5 | 1 | 0.72 | 0.11 | 0.043 | 0.84 | 0.19 | 62.0% |
| MEMPACK | 2 | 0.95 | 0.11 | 0.0062 | 0.84 | 0.29 | 63.6% |
| TMhit L5 | 2 | 0.77 | 0.31 | 0.12 | 0.47 | 0.45 | 73.2% |
| MEMPACK | 3 | 0.94 | 0.11 | 0.008 | 0.85 | 0.27 | 60.6% |
| TMHcon L5 | 3 | 0.49 | 0.32 | 0.37 | 0.63 | 0.02 | 52.3% |
Two helices are counted as interacting when at least one residue pair, with one residue from each helix, is in contact. Results for contact definition 3 used the 58 sequences with more than 2 TM helices, because TMHcon cannot make predictions for sequences with only 2 TM helices.
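Under this rule, residue contacts collapse into helix-helix interactions by mapping each residue to the helix that contains it. The sketch assumes helix boundaries are given as inclusive `(start, end)` residue indices (for example, from a topology prediction) and discards contacts involving loop residues or two residues on the same helix; both are assumptions, and the function names are hypothetical.

```python
def helix_of(residue, helices):
    """Index of the helix containing `residue`, or None for loop residues."""
    for k, (start, end) in enumerate(helices):
        if start <= residue <= end:
            return k
    return None


def helix_interactions(contacts, helices):
    """Helix pairs (h1 < h2) linked by at least one residue-residue contact."""
    pairs = set()
    for i, j in contacts:
        hi, hj = helix_of(i, helices), helix_of(j, helices)
        if hi is not None and hj is not None and hi != hj:
            pairs.add((min(hi, hj), max(hi, hj)))
    return pairs


helices = [(1, 20), (30, 50), (60, 80)]
contacts = {(5, 35), (10, 40), (45, 70), (25, 65)}  # residue 25 is in a loop
print(sorted(helix_interactions(contacts, helices)))
```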
Helical packing arrangement decoy discrimination using a data set of 71 sequences with 2 or more TM helices (n = 71) and a data set of 57 sequences with 3 or more helices (n = 57).
| Method | Contact Definition | Accuracy (n = 57) | Accuracy (n = 71) |
| MEMPACK | 1 | 68.4% | 69.0% |
| SVMcon L5 | 1 | 52.6% | 56.3% |
| PROFcon L5 | 1 | 45.6% | 52.1% |
| MEMPACK | 2 | 66.6% | 67.6% |
| TMhit L5 | 2 | 59.6% | 66.2% |
| MEMPACK | 3 | 70.2% | 70.4% |
| TMHcon L5 | 3 | 40.4% | - |
Accuracy is the fraction of sequences for which the native (or native-model) helical packing arrangement achieved the highest score among all arrangements in its decoy set.
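Given one score for the native arrangement and one for each decoy, this accuracy is a simple count. The sketch below leaves the scoring function abstract and assumes that a tie with a decoy counts as a failure, which the caption does not specify; the function name is hypothetical.

```python
def decoy_discrimination_accuracy(results):
    """Fraction of sequences whose native arrangement outscores every decoy.

    results: list of (native_score, [decoy_scores]) tuples, one per sequence.
    Ties are counted as failures (an assumption).
    """
    if not results:
        return 0.0
    wins = sum(1 for native, decoys in results
               if all(native > d for d in decoys))
    return wins / len(results)


results = [(5.0, [3.0, 4.0]),   # native wins
           (2.0, [2.5, 1.0]),   # a decoy scores higher
           (7.0, [7.0])]        # tie, counted as a failure
print(decoy_discrimination_accuracy(results))
```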
Figure 1. Predicted helical packing arrangement and crystal structure of Halorhodopsin (1E12:A).
In this example, the two left-most helices share the same interactions. The correct arrangement is identified because it has no same-side loop crossovers, compared with one for the incorrect arrangement. Predicted residue-residue contacts are annotated on the packing arrangement, while observed helix-helix interactions are annotated on the crystal structure.
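Same-side loop crossovers can be counted geometrically once an arrangement is drawn as a 2D map. This sketch rests on several assumptions: each helix is a point, each inter-helix loop is a straight segment joining consecutive helices in sequence order, and consecutive loops alternate between the two sides of the membrane, so loops k and k + 2 lie on the same side. The paper's exact crossover criterion may differ, and the function names are hypothetical.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _segments_cross(a, b, c, d):
    """True if segments ab and cd properly intersect (touching is ignored)."""
    return (_orient(c, d, a) * _orient(c, d, b) < 0
            and _orient(a, b, c) * _orient(a, b, d) < 0)


def same_side_loop_crossovers(positions):
    """Count crossings between loops on the same membrane side.

    positions: 2D helix centres in sequence order. Loop k joins helix k to
    helix k + 1; loops of equal parity are assumed to share a side.
    """
    loops = [(positions[k], positions[k + 1]) for k in range(len(positions) - 1)]
    count = 0
    for i in range(len(loops)):
        for j in range(i + 2, len(loops), 2):  # same parity -> same side
            if _segments_cross(*loops[i], *loops[j]):
                count += 1
    return count


crossed = [(0, 0), (1, 1), (0, 1), (1, 0)]   # loops 0 and 2 cross
square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # loops 0 and 2 are parallel
print(same_side_loop_crossovers(crossed), same_side_loop_crossovers(square))
```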
Figure 2. Predicted helical packing arrangement and crystal structure of Photosystem I chain D (1JB0:L).
Application of a genetic algorithm to rotate helices about their Z-axes results in the correct positioning of residues Val64, Ala135 and Phe137.
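A genetic algorithm over per-helix rotation angles can be sketched as follows. Only the search space (one rotation angle about each helix's z-axis) comes from the caption. Everything else is an assumption: tournament selection, uniform crossover, Gaussian mutation, elitism, the parameter values, and the toy fitness in the usage example, which simply rewards each helix for facing a hypothetical target angle. A realistic objective would presumably score predicted contacts and lipid-exposed residues against the helices' orientations.

```python
import math
import random


def genetic_rotation_search(n_helices, fitness, pop_size=40, generations=200,
                            mutation_rate=0.2, mutation_sd=0.3, seed=0):
    """Search per-helix z-axis rotation angles (radians) maximising `fitness`.

    fitness: caller-supplied function mapping a list of angles to a score.
    Returns (best_angles, best_score).
    """
    rng = random.Random(seed)
    tau = 2 * math.pi
    pop = [[rng.uniform(0, tau) for _ in range(n_helices)]
           for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each angle is taken from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Gaussian mutation, wrapped back into [0, 2*pi).
            child = [(g + rng.gauss(0, mutation_sd)) % tau
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


targets = [0.5, 2.0, 4.0]  # hypothetical preferred orientation per helix


def toy_fitness(angles):
    return sum(math.cos(a - t) for a, t in zip(angles, targets))


angles, score = genetic_rotation_search(len(targets), toy_fitness)
print([round(a, 2) for a in angles], round(score, 3))
```

Elitism guarantees that the best score never decreases between generations, so the search only refines orientations it has already found.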
Assessment of predicted helical packing arrangements for the 17 sequences where all interactions were successfully predicted.
| Helical packing arrangement prediction | Count |
| Resembles two-dimensional slice from crystal structure | 9 |
| No observed helix-helix interactions | 3 |
| Incorrect due to linear configuration | 3 |
| Incorrect helix placement | 2 |
Arrangements were compared with a two-dimensional slice taken from the respective crystal structures and assessed on how well the helices in the predicted arrangement aligned with those in the slice. In 9 cases, all helices overlapped (2F95:A, 1E12:A, 1XIO:A, 2D57:A, 1FFT:C, 1JB0:L, 1C17:A, 1R3J:C, 2AHY:A). In 3 cases, there were no observed helix-helix interactions, so no arrangement could be predicted (1VCR:A, 1YQ3:D, 1ZOY:C). In 3 cases, the predicted arrangement was circular whereas the correct arrangement was approximately linear (1DXR:M, 2AXT:D, 2AXT:A).
Figure 3. Helical packing arrangement and crystal structure of cytochrome C oxidase (1XME:A), generated using observed rather than predicted helix-helix interactions.
Observed residue-residue contacts are annotated on the packing arrangement, while observed helix-helix interactions are annotated on the crystal structure. In this example, the two helices at the bottom left of the arrangement are incorrectly placed: they share the same helix-helix interactions, but the correct arrangement has one same-side loop crossover whereas the incorrect arrangement has none. The correct arrangement, in which the placement of these two helices is reversed, is returned as the second-highest-scoring arrangement.