Abstract
The growing number of protein sequences produced by genome projects calls for theoretical methods to predict transmembrane helical segments (TMHs). Several prediction methods have been reported, but they have shortcomings in prediction accuracy and adaptability. In this paper, a method based on the discrete wavelet transform (DWT) is developed to predict the number and locations of TMHs in membrane proteins. The protein with PDB code 1KQG is used as an example to describe the prediction process. Eighty proteins with known 3D structures (325 TMHs in total) are chosen at random from the MPtopo database as the data set, and the 80 sequences are divided into 13 groups according to their function and type. TMH prediction is carried out for each group of membrane protein sequences, with satisfactory results. To verify the feasibility of the method, the 80 membrane protein sequences are used as test sets: 308 of the 325 TMHs are correctly predicted, and the prediction accuracy is 96.3%. Compared with the main prediction results of seven popular prediction methods, the results indicate that the proposed method achieves higher TMH prediction accuracy.
Keywords: Discrete wavelet transform; Hydrophobicity; Membrane protein; Transmembrane helical segments
Year: 2012 PMID: 23289014 PMCID: PMC3535531 DOI: 10.7150/ijbs.5371
Source DB: PubMed Journal: Int J Biol Sci ISSN: 1449-2288 Impact factor: 6.580
Figure 1. Schematic drawing of a G protein-coupled receptor structure.
Membrane protein families used in our predictions.
| Family name | PDB codes |
|---|---|
| ABC transporters | 1jsq, 1l7vA, 1pf4 |
| Bacteriorhodopsin | 1ap9 |
| Channel proteins | 1fqyA, 1fx8A, 1msl, 1mxm, 1oedA, 1oedB, 1oedC, 1oedE, 1p7b, 1rc2A, 1rhzA, 1rhzB |
| Cytochrome bc1 complexes | 1bgyE, 1bgyJ, 1bgyK |
| Cytochrome b6f complexes | 1um3A, 1um3B, 1um3D, 1um3F, 1um3G, 1um3H |
| Cytochrome c oxidases | 1ehkA, 1ehkB, 1ehkC, 1occA, 1occB, 1occC, 1occD, 1occG, 1occI, 1occJ, 1occK, 1occL, 1occM, 1qleA, 1qleB, 1qleC, 1qleD |
| Glycophorin | 1afoA |
| Light-harvesting complexes | 1kzuA, 1lghA |
| Photosynthetic reaction centers | 1eysH, 1eysL, 1eysM, 1prcH, 1prcL, 1prcM, 2rcrL, 2rcrM |
| Photosystems | 1jboA, 1jboB, 1jboF, 1jboI, 1jboJ, 1jboK, 1jboL, 1jboM |
| Respiratory proteins | 1a91C, 1fftA, 1fftB, 1fftC, 1fumC, 1kqgB, 1kqgC, 1lovD, 1nekC, 1nekD, 1okcA, 1q16C, 1qlaC |
| Rhodopsins | 1f88, 1h2sB, 1h68A |
| Translocation proteins | 1pw4A, 1s7b, 2cpb |
Five hydrophobicity scales (value per amino acid).
| Amino acid | FP (a) | KD (b) | PP (c) | EI (d) | JTT (e) |
|---|---|---|---|---|---|
| A | 0.62 | 1.80 | 0.324 | 0.62 | 0.595 |
| C | 0.29 | 2.50 | 0.184 | 0.29 | 0.205 |
| D | -1.05 | -3.50 | -1.877 | -0.90 | -1.276 |
| E | -0.87 | -3.50 | -2.033 | -0.74 | -1.291 |
| F | 1.19 | 2.80 | 0.804 | 1.19 | 1.467 |
| G | 0.48 | -0.40 | 0.147 | 0.48 | 0.065 |
| H | -0.40 | -3.20 | -0.930 | 0.30 | -0.387 |
| I | 1.38 | 4.50 | 0.734 | 1.38 | 1.888 |
| K | -1.35 | -3.90 | -2.230 | -1.50 | -1.245 |
| L | 1.06 | 3.80 | 0.612 | 1.06 | 1.234 |
| M | 0.64 | 1.90 | 0.407 | 0.64 | 0.626 |
| N | -0.85 | -3.50 | -0.944 | -0.78 | -0.870 |
| P | 0.12 | -1.60 | -0.516 | 0.12 | -0.746 |
| Q | -0.78 | -3.50 | -1.300 | -0.85 | -0.995 |
| R | -1.37 | -4.50 | -2.085 | -2.53 | -1.073 |
| S | -0.18 | -0.80 | -0.216 | -0.18 | -0.247 |
| T | -0.05 | -0.70 | -0.129 | -0.05 | -0.154 |
| V | 1.08 | 4.20 | 0.563 | 1.08 | 1.280 |
| W | 0.81 | -0.90 | 0.582 | 0.81 | 0.891 |
| Y | 0.26 | -1.30 | 0.073 | 0.26 | 0.034 |
(a) Fauchere and Pliska [42] (1983). (b) Kyte and Doolittle [14] (1982). (c) Pasquier et al. [17] (1999). (d) Eisenberg et al. [43] (1984). (e) Boyd et al. [44] (1998).
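Each scale above turns a protein sequence into a numeric hydrophobicity signal, one value per residue, which is what the wavelet analysis operates on. A minimal Python sketch using the KD column; the function name `hydrophobicity_signal` and the choice to reject nonstandard residues are illustrative assumptions, not details taken from the paper.

```python
# Kyte-Doolittle (KD) values copied from the table above.
KD_SCALE = {
    "A": 1.80, "C": 2.50, "D": -3.50, "E": -3.50, "F": 2.80,
    "G": -0.40, "H": -3.20, "I": 4.50, "K": -3.90, "L": 3.80,
    "M": 1.90, "N": -3.50, "P": -1.60, "Q": -3.50, "R": -4.50,
    "S": -0.80, "T": -0.70, "V": 4.20, "W": -0.90, "Y": -1.30,
}


def hydrophobicity_signal(sequence: str, scale: dict[str, float] = KD_SCALE) -> list[float]:
    """Map each residue of a one-letter sequence to its hydrophobicity value."""
    signal = []
    for position, residue in enumerate(sequence.upper(), start=1):
        if residue not in scale:
            # Assumption: nonstandard residues (X, U, B, ...) are rejected
            # instead of being silently imputed.
            raise ValueError(f"unknown residue {residue!r} at position {position}")
        signal.append(scale[residue])
    return signal


if __name__ == "__main__":
    print(hydrophobicity_signal("MKLVIA"))
```

Swapping in another column (FP, PP, EI or JTT) only requires passing a different dictionary as `scale`.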
Prediction accuracy for each training/testing split with the FP hydrophobicity scale.
| Set number | Qp % (training set) | Qp % (testing set) | FAAcor % |
|---|---|---|---|
| 1 | 96.7 (0.422) | 94.7 | 81.1 |
| 2 | 96.1 (0.422) | 93.8 | 81.6 |
| 3 | 96.3 (0.422) | 97.1 | 79.7 |
| 4 | 95.6 (0.433) | 95.4 | 82.9 |
| 5 | 95.1 (0.422) | 97.3 | 75.6 |
| 6 | 96.2 (0.422) | 95.1 | 77.6 |
Prediction accuracy for each training/testing split with the KD hydrophobicity scale.
| Set number | Qp % (training set) | Qp % (testing set) | FAAcor % |
|---|---|---|---|
| 1 | 95.4 (0.888) | 93.0 | 85.3 |
| 2 | 95.6 (0.773) | 94.1 | 86.5 |
| 3 | 95.5 (0.836) | 95.8 | 81.5 |
| 4 | 95.9 (0.888) | 94.9 | 84.6 |
| 5 | 94.6 (0.836) | 96.7 | 82.3 |
| 6 | 95.5 (0.836) | 94.9 | 85.6 |
Prediction accuracy for each training/testing split with the PP hydrophobicity scale.
| Set number | Qp % (training set) | Qp % (testing set) | FAAcor % |
|---|---|---|---|
| 1 | 96.2 (0.022) | 90.2 | 77.8 |
| 2 | 94.7 (-0.074) | 89.5 | 67.3 |
| 3 | 95.8 (0.017) | 90.8 | 81.0 |
| 4 | 94.5 (0.050) | 92.0 | 81.8 |
| 5 | 93.9 (0.081) | 93.2 | 73.9 |
| 6 | 93.4 (0.050) | 94.5 | 79.8 |
Prediction accuracy for each training/testing split with the EI hydrophobicity scale.
| Set number | Qp % (training set) | Qp % (testing set) | FAAcor % |
|---|---|---|---|
| 1 | 94.0 (0.414) | 92.0 | 78.5 |
| 2 | 94.9 (0.384) | 92.8 | 76.2 |
| 3 | 95.9 (0.384) | 92.8 | 74.0 |
| 4 | 93.6 (0.436) | 93.2 | 81.3 |
| 5 | 92.8 (0.384) | 93.4 | 71.4 |
| 6 | 93.4 (0.413) | 91.9 | 76.6 |
Prediction accuracy for each training/testing split with the JTT hydrophobicity scale.
| Set number | Qp % (training set) | Qp % (testing set) | FAAcor % |
|---|---|---|---|
| 1 | 95.4 (0.446) | 92.2 | 86.6 |
| 2 | 95.8 (0.411) | 94.0 | 83.5 |
| 3 | 95.8 (0.411) | 95.4 | 80.5 |
| 4 | 94.8 (0.409) | 94.6 | 82.8 |
| 5 | 94.6 (0.412) | 96.7 | 80.0 |
| 6 | 94.6 (0.409) | 93.0 | 81.1 |
Figure 2. Prediction accuracy on the test sets for the five hydrophobicity scales.
Figure 3. Three-dimensional structure of the protein 1KQG.
Figure 4. Linear sequence of the 1KQG protein; bold-face segments denote the observed TMHs.
Figure 5. Hydrophobicity signal of the 1KQG protein and its low-frequency components at five scale levels: (a) j=0 (original signal); (b) j=1; (c) j=2; (d) j=3; (e) j=4; (f) j=5.
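Figure 5 pairs the raw hydrophobicity signal (j=0) with its low-frequency components at scale levels j=1 to 5. The excerpt does not name the wavelet, so the sketch below assumes the Haar wavelet: with orthonormal Haar filters, zeroing every detail coefficient down to level j and inverting the transform replaces each block of 2^j consecutive residues by its mean, which the function computes directly. The edge-padding rule and the name `haar_lowpass` are also assumptions.

```python
def haar_lowpass(signal: list[float], level: int) -> list[float]:
    """Approximation-only Haar reconstruction at scale level `level`.

    Assumption: Haar wavelet. Dropping all detail coefficients and inverting
    the orthonormal transform equals averaging blocks of 2**level samples.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    if level == 0:
        return list(signal)  # j = 0 is the original signal
    n = len(signal)
    if n == 0:
        return []
    block = 2 ** level
    # Assumption: pad with the last value so the length is a multiple of 2**level.
    padded = list(signal) + [signal[-1]] * (-n % block)
    smoothed: list[float] = []
    for start in range(0, len(padded), block):
        chunk = padded[start:start + block]
        mean = sum(chunk) / block
        smoothed.extend([mean] * block)
    return smoothed[:n]  # drop the padding so the output aligns with residues


if __name__ == "__main__":
    toy = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
    for j in range(4):
        print(j, haar_lowpass(toy, j))
```

For example, `haar_lowpass([1, 3, 5, 7], 1)` gives `[2.0, 2.0, 6.0, 6.0]`. Each extra level doubles the averaging window (32 residues at j=5), so higher levels keep only broader hydrophobic trends.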
Locations of TMHs in 1KQG: observed (top row), WavePrd prediction, and predictions from other currently used methods.
| Method | TM1 | TM2 | TM3 | TM4 | Other predicted segment |
|---|---|---|---|---|---|
| Observed | 15-37 | 51-74 | 112-134 | 146-175 | |
| WavePrd | 17-36 | 53-70 | 116-134 | 149-176 | |
| DAS | 18-39 | 57-75 | 118-136 | 149-175 | 90-92 |
| HMMTOP2.0 | 20-38 | 55-73 | 116-135 | 152-176 | |
| PHDhtm | 18-45 | 55-76 | 117-180 (spans TM3 and TM4) | | |
| PRED-TMR2 | 19-37 | 55-73 | 115-135 | 156-176 | |
| SOSUI | 18-40 | 55-77 | 115-137 | 150-172 | |
| TMAP | 14-42 | 51-78 | 112-134 | 148-172 | |
| TMHMM2.0 | 21-40 | 55-77 | 117-139 | 154-176 | |
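Scoring a table like the one above requires pairing each observed TMH with at most one predicted segment. The excerpt does not give the paper's matching rule, so the sketch below assumes a greedy one-to-one matching in which a prediction counts as correct if it overlaps a not-yet-matched observed helix by at least `min_overlap` residues; the cutoff of 9 and the function names are hypothetical.

```python
# Residue ranges copied from the 1KQG table above (1-based, inclusive).
OBSERVED = [(15, 37), (51, 74), (112, 134), (146, 175)]
WAVEPRD = [(17, 36), (53, 70), (116, 134), (149, 176)]
DAS = [(18, 39), (57, 75), (90, 92), (118, 136), (149, 175)]
PHDHTM = [(18, 45), (55, 76), (117, 180)]


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def count_correct(
    observed: list[tuple[int, int]],
    predicted: list[tuple[int, int]],
    min_overlap: int = 9,  # hypothetical cutoff, not taken from the paper
) -> int:
    """Greedy one-to-one matching, largest overlaps first."""
    candidates = sorted(
        ((overlap(o, p), i, j)
         for i, o in enumerate(observed)
         for j, p in enumerate(predicted)),
        reverse=True,
    )
    matched_obs: set[int] = set()
    matched_prd: set[int] = set()
    for shared, i, j in candidates:
        if shared < min_overlap:
            break  # every remaining pair overlaps even less
        if i not in matched_obs and j not in matched_prd:
            matched_obs.add(i)
            matched_prd.add(j)
    return len(matched_obs)


if __name__ == "__main__":
    for name, pred in [("WavePrd", WAVEPRD), ("DAS", DAS), ("PHDhtm", PHDHTM)]:
        n_cor = count_correct(OBSERVED, pred)
        print(f"{name}: {n_cor} correct, {len(pred) - n_cor} false positives")
```

Under these assumptions WavePrd and DAS both match all four helices (DAS's short 90-92 segment matches nothing and counts as a false positive), while PHDhtm scores 3 because its 117-180 segment can be paired with only one of TM3 and TM4.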
Prediction accuracy of TMHs in each protein family based on different thresholds.
| Family name | Qp % (a) | FAAcor % (a) | Qp % (b) | FAAcor % (b) |
|---|---|---|---|---|
| ABC transporters | 95.3 (0.836) | 74.8 | 100 (0.566) | 75.6 |
| Bacteriorhodopsin | 100 (0.836) | 70.7 | 100 (0.885) | 71.3 |
| Channel proteins | 91.4 (0.836) | 81.1 | 91.4 (0.847) | 81.3 |
| Cytochrome bc1 complexes | 86.6 (0.836) | 66.7 | 86.6 (0.765) | 68.9 |
| Cytochrome b6f complexes | 95.7 (0.836) | 82.5 | 95.7 (0.891) | 82.6 |
| Cytochrome c oxidases | 99.2 (0.836) | 93.7 | 99.2 (0.836) | 93.7 |
| Glycophorin | 100 (0.836) | 91.3 | 100 (0.668) | 92.0 |
| Light-harvesting complexes | 100 (0.836) | 93.9 | 100 (0.915) | 97.9 |
| Photosynthetic reaction centers | 98.4 (0.836) | 90.6 | 98.4 (0.866) | 91.2 |
| Photosystems | 97.0 (0.836) | 82.6 | 97.0 (0.836) | 82.6 |
| Respiratory proteins | 93.7 (0.836) | 91.6 | 93.7 (0.836) | 91.6 |
| Translocation proteins | 97.0 (0.836) | 88.1 | 97.0 (0.868) | 88.5 |
| Average | 96.5 | 83.6 | 96.8 | 84.4 |
(a) Prediction accuracy for each family of membrane proteins with a common threshold of 0.836. (b) Prediction accuracy with a separate threshold for each family (values in parentheses); family-specific thresholds raise the average accuracy.
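The two footnotes differ only in the threshold used to decide which residues belong to a TMH. The excerpt does not say exactly how the threshold is applied, so the sketch below assumes the simplest rule: a residue is called transmembrane when its smoothed hydrophobicity exceeds the threshold, and contiguous runs shorter than a hypothetical `min_length` are discarded. The default of 0.836 comes from footnote (a); the length filter and the function name are assumptions. Positions are reported 1-based and inclusive to match the residue ranges in the 1KQG table.

```python
def threshold_segments(
    smoothed: list[float],
    threshold: float = 0.836,
    min_length: int = 15,  # hypothetical minimum helix length
) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where smoothed > threshold."""
    segments: list[tuple[int, int]] = []
    start = None  # 0-based index where the current above-threshold run began
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            # The run covers 0-based indices start..i-1, i.e. i - start residues.
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    # Close a run that reaches the C-terminus.
    if start is not None and len(smoothed) - start >= min_length:
        segments.append((start + 1, len(smoothed)))
    return segments


if __name__ == "__main__":
    toy = [0.2] * 5 + [1.5] * 18 + [0.1] * 4 + [1.2] * 6
    print(threshold_segments(toy))
```

Raising or lowering `threshold` per family, as in footnote (b), shifts segment boundaries and can merge or split runs, which is one way family-specific thresholds could change Qp and FAAcor.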
Main results of eight prediction methods.
| Method | Nobs | Nprd | Ncor | Qp % | M | C | FP | FN | FAAcor % |
|---|---|---|---|---|---|---|---|---|---|
| WavePrd | 325 | 315 | 308 | 96.3 | 94.8 | 97.8 | 7 | 17 | 83.5 |
| DAS | 325 | 357 | 308 | 90.4 | 94.8 | 86.3 | 49 | 17 | 77.6 |
| HMMTOP2 | 325 | 321 | 308 | 95.4 | 94.8 | 96.0 | 13 | 17 | 84.3 |
| PHDhtm | 325 | 286 | 269 | 88.3 | 82.8 | 94.1 | 17 | 56 | 72.5 |
| PRED-TMR2 | 325 | 285 | 279 | 91.7 | 85.9 | 97.9 | 6 | 46 | 76.8 |
| SOSUI | 325 | 297 | 288 | 92.7 | 88.6 | 97.0 | 9 | 37 | 78.9 |
| TMAP | 325 | 299 | 291 | 93.4 | 89.5 | 97.3 | 8 | 34 | 81.7 |
| TMHMM2.0 | 325 | 307 | 299 | 94.7 | 92.0 | 97.4 | 8 | 26 | 84.6 |
Nobs, Nprd and Ncor are the numbers of observed, predicted and correctly predicted TMHs, respectively. Qp is the TMH-level prediction accuracy, and M and C are the sensitivity and specificity indices (all in %). FP is the number of falsely predicted TMHs and FN the number of observed TMHs that were not predicted. FAAcor is the residue-level prediction accuracy.
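The footnote names the indices but not their formulas. Every row of the table is consistent, within rounding, with M = Ncor/Nobs, C = Ncor/Nprd, FP = Nprd - Ncor, FN = Nobs - Ncor, and Qp as the geometric mean of M and C; for WavePrd, 308/325 ≈ 94.8%, 308/315 ≈ 97.8% and sqrt(94.8 × 97.8) ≈ 96.3%. These relations are inferred from the numbers rather than stated in the excerpt, so treat the sketch below as a reconstruction; `tmh_scores` and `TmhScores` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class TmhScores:
    m: float   # sensitivity: % of observed TMHs that were correctly predicted
    c: float   # the paper's "specificity": % of predicted TMHs that are correct
    qp: float  # inferred: geometric mean of M and C
    fp: int    # predicted segments without an observed partner
    fn: int    # observed TMHs that were missed


def tmh_scores(n_obs: int, n_prd: int, n_cor: int) -> TmhScores:
    """Per-helix scores reconstructed (not quoted) from the eight-method table."""
    m = 100.0 * n_cor / n_obs
    c = 100.0 * n_cor / n_prd
    return TmhScores(m=m, c=c, qp=math.sqrt(m * c), fp=n_prd - n_cor, fn=n_obs - n_cor)


if __name__ == "__main__":
    s = tmh_scores(n_obs=325, n_prd=315, n_cor=308)  # WavePrd row
    print(f"M={s.m:.1f} C={s.c:.1f} Qp={s.qp:.1f} FP={s.fp} FN={s.fn}")
```

Running it on the WavePrd row reproduces M = 94.8, C = 97.8, Qp = 96.3, FP = 7 and FN = 17; the DAS row (325, 357, 308) likewise gives Qp ≈ 90.4. Using the geometric mean means a method cannot score well on Qp by overpredicting (high M, low C) or underpredicting (high C, low M).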