Yungki Park, Sikander Hayat, Volkhard Helms.
Abstract
BACKGROUND: Helical membrane proteins (HMPs) play a crucial role in diverse cellular processes, yet their structures remain extremely difficult to determine by experimental techniques. It is therefore highly desirable to develop sequence-based computational methods for predicting structural characteristics of HMPs.
Year: 2007 PMID: 17708758 PMCID: PMC2000914 DOI: 10.1186/1471-2105-8-302
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Prediction accuracies of different methods examined in the study
| Prediction method | Prediction accuracy [%]¹ |
|---|---|
| The BW method | 68.67 |
| TMX | 78.71 |
| The YU method | 71.06² |
¹ Defined as the fraction of TM residues in the data set whose burial status was correctly predicted.
² Best of the 16 prediction accuracies shown in Table 6.
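Footnote 1 defines accuracy as a plain per-residue fraction. A minimal Python sketch of that definition, assuming each residue carries a string label `"buried"` or `"exposed"` (the label encoding and the function name `burial_accuracy` are illustrative, not taken from the paper):

```python
from typing import Sequence


def burial_accuracy(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of TM residues whose burial status was predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("no residues to score")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical labels for four TM residues.
observed = ["buried", "exposed", "exposed", "buried"]
predicted = ["buried", "exposed", "buried", "buried"]
print(burial_accuracy(observed, predicted))  # 75.0
```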
Prediction accuracies obtained by linear regression with different window sizes
| Window size | Prediction accuracy [%] |
|---|---|
| 1 | 74.51 |
| 3 | 73.55 |
| 5 | 73.96 |
| 7 | 74.82 |
| 9 | 75.69 |
| 11 | 75.37 |
| 13 | 75.81 |
| 15 | 75.97 |
| 17 | 75.46 |
| 19 | 75.75 |
| 21 | 75.59 |
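All window sizes tested are odd, so the target residue sits in the middle with (w - 1)/2 neighbours on each side. The sketch below extracts such a window from a per-residue sequence of features; the function name `window` and the choice to pad positions beyond the chain ends with `None` are assumptions, since this excerpt does not say how the ends were handled.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def window(values: Sequence[T], center: int, size: int) -> List[Optional[T]]:
    """Return the `size` entries centred on `center`.

    Positions that fall outside the chain are padded with None (an assumption).
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = size // 2
    return [
        values[i] if 0 <= i < len(values) else None
        for i in range(center - half, center + half + 1)
    ]


segment = "LIVAGF"  # hypothetical TM segment
print(window(segment, 0, 5))  # [None, None, 'L', 'I', 'V']
print(window(segment, 3, 3))  # ['V', 'A', 'G']
```

In practice each entry would be a feature vector for that residue rather than a single letter; the windowing logic is the same.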
Prediction accuracies obtained by increasing fractions of the 441 elements of a window of size 21
| Fraction of elements used | Linear regression | SVR, C = 10¹ | SVR, C = 1 | SVR, C = 0.1 | SVR, C = 0.01 |
|---|---|---|---|---|---|
| 0.05² | 75.65 | 70.36 | 73.45 | 75.21 | 74.89 |
| 0.1 | 76.61 | 71.06 | 75.88 | 75.97 | 74.57 |
| 0.15 | 76.90 | 70.78 | 75.11 | 76.20 | 74.09 |
| 0.2 | 77.21 | 70.65 | 75.14 | 75.78 | 73.58 |
| 0.25 | 76.45 | 70.01 | 75.59 | 75.78 | 73.04 |
| 0.3 | 76.04 | 71.13 | 75.11 | 75.24 | 72.56 |
| 0.35 | 75.72 | 71.54 | 74.79 | 75.27 | 71.54 |
| 0.4 | 75.91 | 72.69 | 74.76 | 75.11 | 71.80 |
| 0.45 | 75.91 | 72.72 | 75.11 | 74.95 | 71.86 |
| 0.5 | 76.13 | 72.82 | 75.33 | 75.11 | 71.67 |
| 0.55 | 76.39 | 72.63 | 75.43 | 75.43 | 72.08 |
| 0.6 | 76.04 | 72.69 | 75.43 | 75.14 | 71.61 |
| 0.65 | 75.33 | 72.94 | 75.24 | 75.30 | 70.40 |
| 0.7 | 74.86 | 73.01 | 74.98 | 74.92 | 70.24 |
| 0.75 | 75.33 | 73.20 | 75.43 | 74.86 | 69.31 |
| 0.8 | 75.75 | 72.75 | 75.75 | 74.44 | 68.48 |
| 0.85 | 75.62 | 72.59 | 75.75 | 74.22 | 67.97 |
| 0.9 | 75.24 | 72.34 | 75.14 | 74.00 | 67.65 |
| 0.95 | 75.21 | 72.79 | 75.46 | 74.22 | 67.53 |
| 1.0 | 75.59 | 73.26 | 75.97 | 74.16 | 67.85 |
¹ The regularization constant C was set to 10.
² That is, the top 5% of the 441 elements, ranked by Fisher's index, were used as input for the prediction.
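With a window of size 21, the 441 elements correspond to 21 positions × 21 features per position; judging from the Type column of the table that follows, these are presumably 20 amino-acid profile values plus one conservation index per position. Using a given fraction then amounts to keeping the k top-ranked elements. A hedged sketch: `top_fraction` and the element names are hypothetical, and rounding k to the nearest integer is an assumption (for 0.05 it yields 22 of 441).

```python
from typing import Dict, List


def top_fraction(scores: Dict[str, float], fraction: float) -> List[str]:
    """Names of the highest-scoring elements, keeping `fraction` of all of them."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    # Rounding to the nearest integer (keeping at least one element) is an assumption.
    k = max(1, round(fraction * len(scores)))
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical element names; scores are the first four Fisher's index values listed.
scores = {"T:cons": 0.987, "C4:cons": 0.534, "N4:cons": 0.469, "T:L": 0.243}
print(top_fraction(scores, 0.5))  # ['T:cons', 'C4:cons']
print(round(0.05 * 441))          # 22
```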
Top 20 of the 441 elements of a window of size 21, ranked by Fisher's index
| Rank | Position¹ | Type | Fisher's index |
|---|---|---|---|
| 1 | T | conservation index | 0.987 |
| 2 | C4 | conservation index | 0.534 |
| 3 | N4 | conservation index | 0.469 |
| 4 | N3 | conservation index | 0.307 |
| 5 | N7 | conservation index | 0.306 |
| 6 | C3 | conservation index | 0.248 |
| 7 | T | L² | 0.243 |
| 8 | C7 | conservation index | 0.240 |
| 9 | T | G | 0.203 |
| 10 | T | I | 0.143 |
| 11 | C8 | conservation index | 0.132 |
| 12 | C1 | conservation index | 0.092 |
| 13 | T | V | 0.059 |
| 14 | N1 | conservation index | 0.057 |
| 15 | N4 | I | 0.057 |
| 16 | N4 | G | 0.056 |
| 17 | T | F | 0.053 |
| 18 | T | S | 0.052 |
| 19 | N8 | conservation index | 0.045 |
| 20 | N4 | L | 0.041 |
¹ T: the target residue; C4: the 4th residue C-terminal to the target residue; N4: the 4th residue N-terminal to the target residue (the other positions follow the same scheme). Thus, the conservation index of the target residue is the most indicative of its burial status, and the conservation index of the 4th residue C-terminal to the target residue is the second most indicative.
² L: leucine, G: glycine, I: isoleucine, V: valine, F: phenylalanine, S: serine.
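This excerpt does not give the formula behind Fisher's index. One common two-class form of the Fisher criterion for a single feature is F = (μ_b - μ_e)² / (σ_b² + σ_e²): the squared difference of the class means divided by the sum of the class variances. The sketch below assumes that form with population variances; the paper may normalize differently, and `fisher_index` is a hypothetical name.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_index(buried: Sequence[float], exposed: Sequence[float]) -> float:
    """Assumed Fisher criterion for one feature: (mu_b - mu_e)^2 / (var_b + var_e)."""
    denominator = pvariance(buried) + pvariance(exposed)
    if denominator == 0:
        raise ValueError("both classes have zero variance for this feature")
    return (mean(buried) - mean(exposed)) ** 2 / denominator


# Hypothetical values of one feature (e.g. a conservation index) in each class.
print(fisher_index([1.0, 3.0], [5.0, 7.0]))  # 8.0
```

Computing such a score for each of the 441 elements and sorting in descending order would give a ranking of the kind shown in the table above.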
Best prediction accuracies for each combination of an SVC kernel and a regularization constant C
| Kernel | C = 1 | C = 0.5 | C = 0.1 | C = 0.05 | C = 0.01 |
|---|---|---|---|---|---|
| Linear | 78.62 | 78.71 | 78.65 | 78.62 | 78.01 |
| Radial | 78.55 | 78.71 | 78.23 | 78.30 | 77.25 |
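Assuming "Radial" denotes the usual radial basis function (RBF) kernel, the two SVC kernels compared above can be written as below. The kernel width `gamma` is not reported in this excerpt, so its default of 1.0 is only a placeholder. The regularization constant C does not enter the kernel at all; it weights the training-error penalty in the SVC objective.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """K(x, y) = x . y"""
    _check(x, y)
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma = 1.0 is a placeholder value."""
    _check(x, y)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


print(linear_kernel([1.0, 2.0], [3.0, 4.0]))  # 11.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))     # 1.0
```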
Prediction accuracies obtained by SVR with different input vectors
| Window size | Profile | Conservation index | PSSM (the YU method) |
|---|---|---|---|
| C = 1¹ | | | |
| 11 | 70.87 | 73.68 | 70.08 |
| 13 | 71.16 | 73.65 | 69.47 |
| 15 | 71.51 | 74.16 | 71.06 |
| 17 | 71.19 | 74.16 | 68.23 |
| 19 | 70.91 | 74.41 | 61.85 |
| 21 | 70.84 | 74.19 | 59.46 |
| C = 2 | | | |
| 11 | 70.55 | 73.17 | 69.85 |
| 13 | 70.94 | 73.26 | 69.31 |
| 15 | 71.16 | 74.31 | 70.52 |
| 17 | 71.64 | 73.61 | 67.11 |
| 19 | 70.68 | 73.90 | 61.54 |
| 21 | 70.49 | 74.09 | 59.31 |
| C = 5 | | | |
| 11 | 70.36 | 72.37 | 69.79 |
| 13 | 71.19 | 72.53 | 69.12 |
| 15 | 70.91 | 73.36 | 70.43 |
| 17 | 71.32 | 72.72 | 66.92 |
| 19 | 70.46 | 73.07 | 61.60 |
| 21 | 70.59 | 72.85 | 59.18 |
| C = 7 | | | |
| 11 | 70.81 | 72.05 | 70.17 |
| 13 | 71.03 | 72.15 | 69.15 |
| 15 | 71.13 | 72.94 | 70.43 |
| 17 | 71.19 | 72.37 | 66.89 |
| 19 | 70.43 | 72.50 | 61.63 |
| 21 | 70.52 | 72.08 | 59.18 |
¹ The regularization constant C was set to 1.
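This excerpt does not define the conservation index used as an input vector. Purely as an illustrative stand-in, and explicitly not the paper's definition, a normalized Shannon-entropy score can be computed for each alignment column as 1 - H/log2(20), which is 1 for a fully conserved column and 0 for a uniform distribution over the 20 amino acids. Ignoring gap characters is an assumption, and `entropy_conservation` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Iterable


def entropy_conservation(column: Iterable[str]) -> float:
    """Normalized Shannon-entropy conservation of one alignment column.

    Returns 1 - H / log2(20): 1.0 for a fully conserved column,
    0.0 for a uniform distribution over the 20 amino acids.
    Gap characters ('-') are ignored (an assumption).
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log2(20)


print(entropy_conservation("LLLL"))            # 1.0
print(round(entropy_conservation("LLII"), 3))  # 0.769
```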
Prediction accuracies for each amino acid
| Amino acid | Number of occurrences | Prediction accuracy [%] | Fraction of exposed residues in the data set [%] |
|---|---|---|---|
| A | 381 | 80.31 | 45.93 |
| C | 50 | 74.00 | 46.00 |
| D | 19 | 89.47 | 0.00 |
| E | 30 | 56.67 | 40.00 |
| F | 294 | 80.95 | 73.13 |
| G | 316 | 80.38 | 27.22 |
| H | 42 | 90.48 | 16.67 |
| I | 328 | 80.49 | 72.26 |
| K | 20 | 85.00 | 55.00 |
| L | 521 | 80.42 | 73.70 |
| M | 128 | 75.78 | 57.03 |
| N | 41 | 75.61 | 17.07 |
| P | 89 | 61.80 | 48.31 |
| Q | 26 | 76.92 | 26.92 |
| R | 15 | 100.00 | 6.67 |
| S | 153 | 71.24 | 39.22 |
| T | 169 | 73.37 | 48.52 |
| V | 365 | 79.45 | 65.48 |
| W | 74 | 81.08 | 70.27 |
| Y | 77 | 72.73 | 50.65 |
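Both per-amino-acid columns (prediction accuracy and fraction of exposed residues) are group-wise counts over the same residues. A minimal sketch of how they can be tabulated, assuming per-residue records of (amino acid, observed label, predicted label); the record layout and `per_residue_stats` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_residue_stats(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[int, float, float]]:
    """Map amino acid -> (occurrences, prediction accuracy %, exposed fraction %).

    Each record is (amino_acid, observed, predicted) with labels 'buried'/'exposed'.
    """
    total: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    exposed: Dict[str, int] = defaultdict(int)
    for aa, observed, predicted in records:
        total[aa] += 1
        correct[aa] += observed == predicted
        exposed[aa] += observed == "exposed"
    return {
        aa: (n, 100.0 * correct[aa] / n, 100.0 * exposed[aa] / n)
        for aa, n in sorted(total.items())
    }


records = [
    ("R", "buried", "buried"),
    ("L", "exposed", "exposed"),
    ("L", "exposed", "buried"),
    ("L", "buried", "buried"),
]
print(per_residue_stats(records)["R"])  # (1, 100.0, 0.0)
```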
Specificity and sensitivity of TMX
| Predicted (rows) vs. observed (columns) | Buried | Exposed |
|---|---|---|
| Buried | 978 (70.61%) | 267 (15.23%) |
| Exposed | 407 (29.39%) | 1486 (84.77%) |
| Sum | 1385 | 1753 |
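Taking "buried" as the positive class (an assumption; the table does not state it), the column percentages above are the sensitivity for buried residues and the specificity, i.e. the fraction of exposed residues predicted as exposed. The sketch recomputes both from the four counts in the table; `sensitivity_specificity` is a hypothetical name.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) in percent, with 'buried' as the positive class."""
    sensitivity = 100.0 * tp / (tp + fn)  # observed buried, predicted buried
    specificity = 100.0 * tn / (tn + fp)  # observed exposed, predicted exposed
    return sensitivity, specificity


# Counts from the TMX table (rows = predicted, columns = observed).
sens, spec = sensitivity_specificity(tp=978, fn=407, fp=267, tn=1486)
print(f"{sens:.2f} {spec:.2f}")  # 70.61 84.77
```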
Figure 1. Prediction accuracy and coverage as a function of the confidence score. When all predictions are considered (i.e., predictions with a confidence score ≥ 0.00), the prediction accuracy is 78.71% and the coverage is 100%. When only the 1440 predictions with a confidence score ≥ 1.20 are considered, the prediction accuracy rises to 90.21% and the coverage falls to 45.89%.
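The caption describes a trade-off: keeping only predictions whose confidence score reaches a threshold raises accuracy but lowers coverage, the fraction of residues that still receive a prediction. How TMX computes its confidence score is not stated here, so the sketch takes the scores as given; `accuracy_coverage` and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def accuracy_coverage(
    scores: Sequence[float], correct: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Accuracy and coverage (both in %) of the predictions with score >= threshold."""
    if len(scores) != len(correct) or not scores:
        raise ValueError("need equally long, non-empty inputs")
    kept = [c for s, c in zip(scores, correct) if s >= threshold]
    coverage = 100.0 * len(kept) / len(scores)
    accuracy = 100.0 * sum(kept) / len(kept) if kept else math.nan
    return accuracy, coverage


# Toy data: four predictions, three of them correct.
scores = [0.3, 1.5, 2.0, 0.8]
correct = [False, True, True, True]
print(accuracy_coverage(scores, correct, 0.0))  # (75.0, 100.0)
print(accuracy_coverage(scores, correct, 1.2))  # (100.0, 50.0)

# Coverage quoted in the caption: 1440 of the 1385 + 1753 = 3138 TM residues.
print(round(100 * 1440 / 3138, 2))  # 45.89
```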
The 43 protein chains used in the study
| No. | PDB ID | Protein | Chains |
|---|---|---|---|
| 1 | 1M0L | Bacteriorhodopsin | A |
| 2 | 1GZM | Rhodopsin | A |
| 3 | 1R3J | KcsA potassium channel | C |
| 4 | 1J4N | Aquaporin | A |
| 5 | 1LDF | Glycerol facilitator channel | A |
| 6 | 1XQF | Ammonia channel | A |
| 7 | 1OTS | H+/Cl- exchanger | A |
| 8 | 2A65 | Leucine transporter | A |
| 9 | 2CFQ | Lactose permease | A |
| 10 | 1YEW | Methane monooxygenase | B, C |
| 11 | 1SU4 | Calcium ATPase | A |
| 12 | 2BL2 | Rotor of V-type Na+-ATPase | A |
| 13 | 1DXR | Photosynthetic reaction center | L, M, H |
| 14 | 1KF6 | Fumarate reductase ( | C, D |
| 15 | 1QLA | Fumarate reductase ( | C |
| 16 | 1KQF | Formate dehydrogenase N | B, C |
| 17 | 1Q16 | Nitrate reductase A | C |
| 18 | 1NEK | Succinate dehydrogenase | C, D |
| 19 | 1ZOY | Complex II | C, D |
| 20 | 1OKC | Mitochondrial ADP/ATP carrier | A |
| 21 | 1V55 | Cytochrome c oxidase (aa3 type) | B, D, G, I, J, L, M |
| 22 | 1EHK | Cytochrome c oxidase (ba3 type) | A, B |
| 23 | 1PP9 | Cytochrome bc1 complex | D, E, G, J |
| 24 | 2GIF | AcrB multidrug efflux transporter | A |
| 25 | 2IC8 | GlpG rhomboid-family intramembrane protease | A |
| 26 | 2NQ2 | Putative metal-chelate-type ABC transporter | A |