Sindy Neumann, Angelika Fuchs, Barbara Hummel, Dmitrij Frishman.
Abstract
Despite significant methodological advances in protein structure determination, high-resolution structures of membrane proteins are still rare, leaving sequence-based predictions as the only option for exploring the structural variability of membrane proteins at large scale. Here, a new structural classification approach for α-helical membrane proteins is introduced, based on the similarity of predicted helix interaction patterns. Its application to proteins with known 3D structure showed that it reliably detects structurally similar proteins even in the absence of any sequence similarity, reproducing the SCOP and CATH classifications with a sensitivity of 65% at a specificity of 90%. We applied the new approach to enhance our comprehensive structural classification of α-helical membrane proteins (CAMPS), which is primarily based on sequence and topology similarity, in order to find protein clusters that describe the same fold in the absence of sequence similarity. A total of 151 helix architectures were delineated for proteins with more than four transmembrane segments. Interestingly, we observed that proteins with 8 or more transmembrane helices correspond to fewer different architectures than proteins with up to 7 helices, suggesting that in large membrane proteins the evolutionary tendency to re-use already available folds is more pronounced.
Year: 2013 PMID: 24204844 PMCID: PMC3808409 DOI: 10.1371/journal.pone.0077491
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Overview of the methodology.
A: Classification of α-helical membrane protein structures using predicted helix architectures. B: Parameter optimization for generating consensus helix architectures. C: Classification of α-helical membrane proteins using consensus helix architectures.
Classification of proteins in SCOP and CATH using predicted helix interactions.
| HISS | Avg(HISS)same | Avg(HISS)diff | AUC | Score | Sensitivity [%] | Specificity [%] |
| uw | 0.890 | 0.749 | 0.846 | 0.88 | 72.6 | 78.4 |
| | | | | 0.90 | 65.3 | 92.2 |
| w | 0.864 | 0.665 | 0.848 | 0.82 | 76.8 | 81.0 |
| | | | | 0.88 | 53.7 | 91.4 |
Helix interactions were predicted using the threshold combination C = 9 (network NN4) and C = 15 (network NN4-D); see Materials and Methods.
HISS scores were calculated both without weighting helix interactions (uw) and with interactions involving >15 residue contacts up-weighted by a factor of 1.5 (w).
Avg(HISS)same: average HISS score for proteins classified to the same fold in SCOP and CATH.
Avg(HISS)diff: average HISS score for proteins classified to different folds in SCOP and CATH.
AUC: area under the curve describing how well proteins with the same fold can be differentiated from proteins with different folds (AUC = 0.5 would correspond to a random prediction).
Score: HISS score threshold used to identify proteins with the same helix architecture. For both weighted and unweighted HISS scores, two thresholds were chosen such that the specificity of the obtained classifications most closely approached either 80% or 90%.
Sensitivity: Fraction of all protein pairs with the same SCOP/CATH fold annotation having a HISS score above the specified threshold.
Specificity: Fraction of all protein pairs with different SCOP/CATH fold annotation having a HISS score below the specified threshold.
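The threshold-based definitions above can be illustrated with a minimal Python sketch. The pair scores and fold labels are invented toy values, and the function names `sensitivity_specificity` and `pick_threshold` are hypothetical helpers, not part of the published method. Treating a score equal to the threshold as "above" is an assumption.

```python
def sensitivity_specificity(pairs, threshold):
    """pairs: list of (hiss_score, same_fold) tuples, one per protein pair.

    Sensitivity: fraction of same-fold pairs scoring above the threshold.
    Specificity: fraction of different-fold pairs scoring below it.
    Assumption: a score exactly equal to the threshold counts as 'above'.
    """
    same = [s for s, same_fold in pairs if same_fold]
    diff = [s for s, same_fold in pairs if not same_fold]
    sens = sum(s >= threshold for s in same) / len(same)
    spec = sum(s < threshold for s in diff) / len(diff)
    return sens, spec


def pick_threshold(pairs, target_specificity):
    """Return the observed score whose specificity is closest to the target."""
    candidates = sorted({s for s, _ in pairs})
    return min(
        candidates,
        key=lambda t: abs(sensitivity_specificity(pairs, t)[1] - target_specificity),
    )


# Toy data (invented): (HISS score, same fold in the reference classification?)
toy_pairs = [
    (0.95, True), (0.91, True), (0.85, True), (0.70, True),
    (0.92, False), (0.80, False), (0.75, False), (0.60, False), (0.55, False),
]
sens, spec = sensitivity_specificity(toy_pairs, 0.88)
best = pick_threshold(toy_pairs, 0.80)
```

Scanning only the observed scores as candidate thresholds is a simplification chosen for brevity; the source does not say how the candidate thresholds were enumerated.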
Classification of all proteins with solved 3D structure using predicted helix architectures in comparison to the HISSdb database.
| HISS | Avg(HISS)same | Avg(HISS)diff | AUC | Score | Sensitivity [%] | Specificity [%] |
| uw | 0.812 | 0.717 | 0.704 | 0.89 | 77.6 | 74.5 |
| | | | | 0.90 | 37.9 | 85.3 |
| w | 0.775 | 0.622 | 0.752 | 0.85 | 66.0 | 81.0 |
| | | | | 0.88 | 48.7 | 90.5 |
Helix interactions were predicted using the threshold combination C = 9 (network NN4) and C = 15 (network NN4-D); see Materials and Methods. Final classifications were obtained by clustering all proteins satisfying the specified HISS score thresholds using the MCL algorithm.
HISS scores were calculated both without weighting helix interactions (uw) and with interactions involving >15 residue contacts up-weighted by a factor of 1.5 (w).
Avg(HISS)same: average HISS score for proteins classified to the same helix architecture type in HISSdb.
Avg(HISS)diff: average HISS score for proteins classified to different helix architecture types in HISSdb.
AUC: area under the curve describing how well proteins with the same fold can be differentiated from proteins with different folds (AUC = 0.5 would correspond to a random prediction).
Score: HISS score threshold used for clustering proteins with the MCL algorithm. For both weighted and unweighted HISS scores, two thresholds were selected such that the specificity of the obtained classifications most closely approached either 80% or 90%.
Sensitivity: Fraction of all protein pairs with the same HISSdb architecture annotation assigned to the same MCL cluster.
Specificity: Fraction of all protein pairs with different HISSdb architecture annotation assigned to different MCL clusters.
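For the cluster-based definitions above, sensitivity and specificity are counted over protein pairs rather than over score thresholds. The following is a minimal sketch under stated assumptions: the protein identifiers, cluster ids, and architecture labels are invented, and `pairwise_cluster_agreement` is a hypothetical helper that takes an existing cluster assignment as input (it does not run MCL itself).

```python
from itertools import combinations


def pairwise_cluster_agreement(cluster_of, annotation_of):
    """Pair-level sensitivity and specificity of a clustering.

    cluster_of: dict protein -> cluster id (e.g. parsed from clustering output)
    annotation_of: dict protein -> reference architecture label
    """
    same_total = same_hit = diff_total = diff_hit = 0
    for a, b in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[a] == cluster_of[b]
        if annotation_of[a] == annotation_of[b]:
            same_total += 1
            same_hit += same_cluster        # same label, same cluster
        else:
            diff_total += 1
            diff_hit += not same_cluster    # different label, different cluster
    return same_hit / same_total, diff_hit / diff_total


# Toy example (invented identifiers and labels)
clusters = {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 1}
reference = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}
sens, spec = pairwise_cluster_agreement(clusters, reference)
```

In this toy case two of the four same-label pairs share a cluster and four of the six different-label pairs are separated.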
Figure 2. Sequence similarity distribution.
Sequence identity among all protein pairs classified to the same architecture within the HISSdb database and the two structural classifications obtained using predicted helix interactions with either 80% specificity or 90% specificity.
Parameter optimization for generation of consensus helix architectures.
| Graph type | Contact threshold (NN4/NN4-D) | Consensus threshold | Accuracy [%] | Sensitivity [%] | Specificity [%] |
| Consensus | C11/C11 | 0.3 | 71.8 | 61.6 | 80.1 |
| | C12/C14 | 0.6 | 69.9 | 45.4 | 89.7 |
| PDB | C5/C12 | − | 70.9 | 59.9 | 79.8 |
| | C12/C18 | − | 69.8 | 45.4 | 89.6 |
| Average | C6/C18 | − | 70.1 | 59.4 | 79.5 |
| | C15/C27 | − | 66.0 | 41.3 | 89.5 |
Contact threshold (NN4/NN4-D): number of required helix-helix contacts to predict a helix as interacting. NN4 and NN4-D are two versions of the TMHcon software for the prediction of helix-helix contacts.
Consensus threshold: fraction of individual helix architectures required to contain a helix interaction to transfer it to the consensus architecture.
Sensitivity: fraction of known interacting helices that can also be found in the predicted architectures.
Specificity: fraction of known non-interacting helices that are also absent in the predicted architectures.
PDB: helix architectures derived from known PDB structures were compared with those that were predicted for these PDB proteins.
Average: helix architectures were predicted for all proteins involved in the consensus architecture and compared with the helix architectures derived from the known PDB structures.
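The contact and consensus thresholds defined above can be sketched as two small set operations. This is an illustrative sketch only: the contact counts are invented, `predict_architecture` and `consensus_architecture` are hypothetical names, a single contact threshold is used (how the NN4 and NN4-D predictions are combined is not described here and is not modeled), and using "at least" for both thresholds is an assumption.

```python
from collections import Counter


def predict_architecture(contact_counts, contact_threshold):
    """Turn predicted contact counts into a set of helix-interaction edges.

    contact_counts: dict (helix_i, helix_j) -> number of predicted contacts.
    Assumption: a pair interacts if it has at least `contact_threshold` contacts.
    """
    return {
        frozenset(pair)
        for pair, n in contact_counts.items()
        if n >= contact_threshold
    }


def consensus_architecture(architectures, consensus_threshold):
    """Keep an edge if it occurs in at least the given fraction of architectures."""
    counts = Counter(edge for arch in architectures for edge in arch)
    n = len(architectures)
    return {edge for edge, c in counts.items() if c / n >= consensus_threshold}


# Toy contact counts for three proteins of one cluster (invented values)
members = [
    {(1, 2): 12, (2, 3): 11, (1, 3): 4},
    {(1, 2): 15, (2, 3): 3, (1, 3): 2},
    {(1, 2): 11, (2, 3): 20, (1, 3): 13},
]
architectures = [predict_architecture(m, contact_threshold=11) for m in members]
consensus = consensus_architecture(architectures, consensus_threshold=0.6)
```

With these toy values the edge between helices 1 and 2 appears in all three architectures and the edge between helices 2 and 3 in two of three, so both pass a consensus threshold of 0.6, while the edge between helices 1 and 3 (one of three) is dropped.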
Parameter optimization for clustering of consensus helix architectures.
| SC-cluster dataset | HISS score threshold | Inflation value | Sensitivity [%] | Specificity [%] |
| All | 0.70 | 1.1 | 67.1 | 85.3 |
| | 0.86 | 2 | 51.8 | 89.5 |
| ≤7 TMHs | 0.84 | 5 | 66.9 | 78.8 |
| | 0.95 | 1.1 | 54.9 | 91.2 |
| >7 TMHs | 0.70 | 1.1 | 54.2 | 84.3 |
| | 0.75 | 1.1 | 49.9 | 89.5 |
All: All SC-clusters from the classification dataset; ≤7 TMHs: SC-clusters with members having up to seven TMHs; >7 TMHs: SC-clusters with members having more than seven TMHs.
Sensitivity: Fraction of all protein pairs having the same Pfam annotation that were assigned to the same HIS cluster using the respective HISS score threshold and inflation value.
Specificity: Fraction of all protein pairs having different Pfam annotations that were assigned to different HIS clusters using the respective HISS score threshold and inflation value.
TMH distribution among SC-clusters and HIS clusters.
| Number of TMHs | Number of SC-clusters | HIS clusters: singleton | HIS clusters: non-singleton | HIS clusters: total | Reduction factor |
| 5 | 97 | 25 | 18 | 43 | 2.3 |
| 6 | 121 | 26 | 8 | 34 | 3.6 |
| 7 | 68 | 28 | 3 | 31 | 2.2 |
| 8 | 24 | 1 | 1 | 2 | 12.0 |
| 9 | 12 | 3 | 1 | 4 | 3.0 |
| 10 | 37 | 16 | 4 | 20 | 1.9 |
| 11 | 27 | 0 | 1 | 1 | 27.0 |
| 12 | 30 | 6 | 1 | 7 | 4.3 |
| 13 | 5 | 0 | 1 | 1 | 5.0 |
| 14 | 8 | 4 | 2 | 6 | 1.3 |
| 15 | 2 | 2 | 0 | 2 | 1.0 |
| Total | 431 | 111 | 40 | 151 | 2.9 |
Singleton: HIS cluster containing only one SC-cluster.
Non-singleton: HIS cluster containing two or more SC-clusters.
Reduction factor: number of SC-clusters divided by total number of HIS clusters.
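The reduction factor can be recomputed directly from the table rows. The sketch below copies the counts from the table; the pooled comparison of proteins with up to 7 versus 8 or more TMHs is an illustrative aggregation added here, not a figure reported in the table itself.

```python
# (number of TMHs, SC-clusters, singleton HIS clusters, non-singleton HIS clusters)
rows = [
    (5, 97, 25, 18), (6, 121, 26, 8), (7, 68, 28, 3), (8, 24, 1, 1),
    (9, 12, 3, 1), (10, 37, 16, 4), (11, 27, 0, 1), (12, 30, 6, 1),
    (13, 5, 0, 1), (14, 8, 4, 2), (15, 2, 2, 0),
]


def reduction_factor(selected):
    """Number of SC-clusters divided by total number of HIS clusters."""
    sc = sum(r[1] for r in selected)
    his = sum(r[2] + r[3] for r in selected)
    return sc / his


overall = reduction_factor(rows)
up_to_7 = reduction_factor([r for r in rows if r[0] <= 7])
from_8 = reduction_factor([r for r in rows if r[0] >= 8])
per_class = {r[0]: round(reduction_factor([r]), 1) for r in rows}
```

Pooled this way, the classes with up to 7 TMHs give a factor of about 2.6 (286 SC-clusters in 108 HIS clusters) and the classes with 8 or more TMHs about 3.4 (145 SC-clusters in 43 HIS clusters), against an overall value of about 2.9.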
Figure 3. Consensus helix architectures of selected SC-clusters.
A: All SC-clusters belong to Pfam clan CL0192 (‘Family A G protein-coupled receptor-like superfamily’) and were joined into the same HIS cluster (CMHIS0006). B: All SC-clusters belong to Pfam clan CL0062 (‘APC superfamily’) and were joined into the same HIS cluster (CMHIS0005). Nodes correspond to transmembrane helices, edges represent helix interactions.
Figure 4. Example of two SC-clusters that were joined together.
Both SC-clusters contain structures with a very similar transmembrane helix packing. (A) Left panel: Representative structure (PDB code: 2f2b, chain A) of SC-cluster CMSC0058. Right panel: Consensus helix architecture for SC-cluster CMSC0058. (B) Left panel: Representative structure (PDB code: 3kly, chain A) of SC-cluster CMSC0180. Right panel: Consensus helix architecture for SC-cluster CMSC0180. Both structures contain six transmembrane helices (M1–M6) colored differently. The fifth helix of 3kly_A is interrupted (M5a, M5b). Nodes correspond to transmembrane helices, edges represent helix interactions. Transmembrane helix coordinates were extracted from PDBTM [16].
Figure 5. Occurrence of TMH classes among individual proteins, SC-clusters and HIS clusters.
A: Percentage of proteins with a certain number of TMHs. Percentage of SC-clusters (B) and HIS clusters (C) with a certain representative TMH number.