Nazar Zaki, Salah Bouktif, Sanja Lazarova-Molnar.
Abstract
Transmembrane helix (TMH) topology prediction is becoming a focal problem in bioinformatics because the structure of TM proteins is difficult to determine using experimental methods. Therefore, methods that can computationally predict the topology of helical membrane proteins are highly desirable. In this paper we introduce TMHindex, a method for detecting TMH segments using only amino acid sequence information. Each amino acid in a protein sequence is represented by a Compositional Index, which is deduced from a combination of the difference in amino acid occurrences between TMH and non-TMH segments in training protein sequences and the amino acid composition information. Furthermore, a genetic algorithm was employed to find the optimal threshold value for separating TMH segments from non-TMH segments. The method successfully predicted 376 of the 378 TMH segments in a dataset of 70 test protein sequences. The sensitivity and specificity for classifying each amino acid in every protein sequence in the dataset were 0.901 and 0.865, respectively. To assess the generality of TMHindex, we also tested the approach on another standard 73-protein 3D helix dataset, where TMHindex correctly predicted 91.8% of proteins based on TM segments. The accuracy achieved by TMHindex, compared with other recent approaches for predicting the topology of TM proteins, is a strong argument in favor of our proposed method. AVAILABILITY: The datasets and software, together with supplementary materials, are available at: http://faculty.uaeu.ac.ae/nzaki/TMHindex.htm.
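To make the per-residue evaluation concrete, the following Python sketch labels each residue by comparing a score with a threshold and then computes sensitivity and specificity. It assumes that scores above the threshold indicate TMH residues and that the standard definitions TP/(TP+FN) and TN/(TN+FP) apply; the abstract does not spell out either. The function names, toy scores, and threshold are hypothetical, and the threshold is fixed by hand here rather than searched by a genetic algorithm.

```python
def threshold_labels(scores, threshold):
    """Mark a residue as TMH (True) when its score exceeds the threshold.

    Assumption: higher scores indicate TMH residues.
    """
    return [score > threshold for score in scores]


def sensitivity_specificity(predicted, actual):
    """Per-residue sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    # Guard against sequences with no TMH or no non-TMH residues.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: six residues with made-up scores and labels.
toy_scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
toy_actual = [False, True, True, True, False, False]

toy_predicted = threshold_labels(toy_scores, threshold=0.5)
toy_sens, toy_spec = sensitivity_specificity(toy_predicted, toy_actual)
print(toy_predicted, toy_sens, toy_spec)
```

In a threshold search, a function like `sensitivity_specificity` could serve as part of the fitness evaluation for each candidate threshold, although the fitness function actually used by TMHindex is not described here.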
Year: 2011 PMID: 21814556 PMCID: PMC3144211 DOI: 10.1371/journal.pone.0021821
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. TMHindex overview.
Illustration of the calculation of the averaged compositional index values.
| Position | Amino Acid | AAC | Index | Averaged compositional index |
| --- | --- | --- | --- | --- |
| 1 | A | 15.556 | −0.30841 | (15.556*(−0.30841)+8.889*(1.472438)+2.222*(1.473881))/3 = 4.160103797 |
| 2 | E | 8.889 | 1.472438 | (15.556*(−0.30841)+8.889*(1.472438)+2.222*(1.473881)+6.667*(0.137164))/4 = 3.120077848 |
| 3 | R | 2.222 | 1.473881 | (15.556*(−0.30841)+8.889*(1.472438)+2.222*(1.473881)+6.667*(0.137164)+8.889*(−0.53791))/5 = 1.53976588 |
| 4 | S | 6.667 | 0.137164 | (8.889*(1.472438)+2.222*(1.473881)+6.667*(0.137164)+8.889*(−0.53791)+6.667*(0.137164))/5 = 2.68218555 |
| 5 | L | 8.889 | −0.53791 | (2.222*(1.473881)+6.667*(0.137164)+8.889*(−0.53791)+6.667*(0.137164)+2.222*(−0.07568))/5 = 0.030853082 |
| : | : | : | : | : |
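The expressions in the last column suggest a sliding-window average: each position's value is the mean of AAC multiplied by the fourth-column value over a five-residue window centred on that position, truncated at the start of the sequence. The Python sketch below reproduces that arithmetic under this assumption. The function name `averaged_compositional_index`, the `half_window` parameter, and the edge handling are illustrative choices, not taken from the TMHindex software; the values for positions 6 and 7 are read off the expressions in rows 4 and 5.

```python
def averaged_compositional_index(aac, index, half_window=2):
    """Mean of aac[j] * index[j] over a window centred on each position.

    Assumption: windows are truncated at the sequence ends and the sum is
    divided by the number of residues actually inside the window.
    """
    products = [a * v for a, v in zip(aac, index)]
    averaged = []
    for i in range(len(products)):
        lo = max(0, i - half_window)
        hi = min(len(products), i + half_window + 1)
        window = products[lo:hi]
        averaged.append(sum(window) / len(window))
    return averaged


# First seven positions; entries 6 and 7 come from the expressions
# printed in rows 4 and 5 of the table.
aac = [15.556, 8.889, 2.222, 6.667, 8.889, 6.667, 2.222]
index = [-0.30841, 1.472438, 1.473881, 0.137164, -0.53791, 0.137164, -0.07568]

values = averaged_compositional_index(aac, index)
# Only the first five positions are meaningful here, because the real
# sequence continues beyond position 7.
for position, value in enumerate(values[:5], start=1):
    print(position, round(value, 6))
```

Under these assumptions the sketch matches the listed results for rows 2 to 5. For row 1, evaluating the expression as printed gives about 3.8553, whereas the listed 4.160103797 equals the four-term sum of row 2 divided by 3.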
Figure 2. Encoding protein sequence as a chromosome.
Figure 3. The N and C scores.
Figure 4. Sample protein 1OCC.
Figure 5. TMH segment detection in protein 1OCC using the index.
Figure 6. TMH segment detection in protein 1OCC using the compositional index.
Figure 7. TMH segment detection in protein 1OCC using GA.
Performance comparison of various TMH predictors.
| Predictor | N-Score | C-Score | Correct TMHs |
| --- | --- | --- | --- |
| THUMBU | 85.5 | 47.1 | 316 |
| SOSUI | 89.1 | 57.1 | 334 |
| DAS-TMfilter | 90.7 | 64.3 | 341 |
| TOP-PRED | 92.6 | 60 | 352 |
| TMHMM | 91 | 65.7 | 343 |
| Phobius | 91.8 | 71.4 | 345 |
| MemBrain | 97.9 | 87.1 | 371 |
| TMHindex | 99.46 | 91.1 | 376 |
Figure 8. Length distribution of the 378 known TMHs in the testing dataset compared to predicted TMHs using (a) TMHindex, (b) MemBrain, (c) THUMBU, (d) DAS-TMfilter and (e) Phobius methods.