Pantelis G Bagos, Theodore D Liakopoulos, Stavros J Hamodrakas.
Abstract
BACKGROUND: Hidden Markov Models (HMMs) have been used extensively in computational molecular biology for modelling protein and nucleic acid sequences. In many applications, such as transmembrane protein topology prediction, incorporating a limited amount of topological information from biochemical experiments has proved a very useful strategy that remarkably increases the performance of even the top-scoring methods. However, no clear and formal explanation of algorithms that retain the probabilistic interpretation of the models has been presented so far in the literature.
Year: 2006 PMID: 16597327 PMCID: PMC1523218 DOI: 10.1186/1471-2105-7-189
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Results obtained from the various predictors on a dataset of 72 transmembrane proteins [38]. The methods were not trained and tested on the same dataset; however, some of the proteins in this dataset were present in the datasets used to train the other methods. The results of HMM-TM were obtained through a nine-fold cross-validation procedure. The methods that allow the incorporation of experimental information are listed separately. The results of UMDHMMTMHP could not be obtained by cross-validation (since it was trained on the same dataset) and are therefore listed separately in the text.
| TMHMM | 0.902 | 0.762 | 0.931 | 58/72 (80.6%) | 49/72 (68.1%) |
| HMMTOP | 0.890 | 0.735 | 0.932 | 58/72 (80.6%) | 49/72 (68.1%) |
| Phobius † | 0.911 | 0.785 | 0.954 | 65/72 (90.3%) | 52/72 (72.2%) |
| MEMSAT | 0.905 | 0.767 | 0.954 | 63/72 (87.5%) | 48/72 (66.7%) |
| S-TMHMM † | 0.897 | 0.747 | 0.925 | 59/72 (81.9%) | 52/72 (72.2%) |
| PRO-TMHMM* † | 0.910 | 0.779 | 0.945 | 65/72 (90.3%) | 63/72 (87.5%) |
| PRODIV-TMHMM* † | 0.914 | 0.794 | 0.970 | 67/72 (93.1%) | 64/72 (87.5%) |
* The methods using evolutionary information are denoted with an asterisk.
† These predictors were trained on sets containing sequences similar to the ones included in the training set we used here.
Results of the independent test on a dataset of 26 transmembrane proteins with known three-dimensional structures. The proteins were chosen not to have significant sequence identity (<30%) with the proteins used to train the methods HMM-TM, UMDHMMTMHP, TMHMM and HMMTOP. The methods that allow the incorporation of experimental information are listed separately.
| TMHMM | 0.899 | 0.782 | 0.956 | 19/26 (73.08%) | 17/26 (65.38%) |
| HMMTOP | 0.881 | 0.744 | 0.925 | 19/26 (73.08%) | 18/26 (69.23%) |
| Phobius † | 0.894 | 0.773 | 0.907 | 15/26 (57.69%) | 13/26 (50%) |
| MEMSAT | 0.890 | 0.762 | 0.928 | 16/26 (61.54%) | 13/26 (50%) |
| UMDHMMTMHP | 0.896 | 0.777 | 0.947 | 23/26 (88.46%) | 22/26 (84.61%) |
| S-TMHMM † | 0.899 | 0.781 | 0.957 | 21/26 (80.77%) | 20/26 (76.92%) |
| PRO-TMHMM*† | 0.870 | 0.718 | 0.916 | 16/26 (61.54%) | 15/26 (57.69%) |
| PRODIV-TMHMM*† | 0.897 | 0.778 | 0.946 | 19/26 (73.08%) | 19/26 (73.08%) |
* The methods using evolutionary information are denoted with an asterisk.
† These predictors were trained on sets containing sequences similar to the ones included in the test set.
Figure 1. Posterior probability plots and predicted transmembrane segments for a protein (YDGG_ECOLI) for which HMM-TM missed the localisation of the C-terminus. The upper graph shows the unconstrained prediction. The lower graph shows the conditional prediction, after incorporating the experimentally verified localisation of the C-terminus. The red bars indicate the predicted transmembrane segments; these also change, bringing the prediction into agreement with the other predictors.
Figure 2. Posterior probability plots and predicted transmembrane segments for the multidrug efflux transporter AcrB, a protein with known three-dimensional structure (PDB code: 1IWG). The upper graph shows the unconstrained prediction. The red bars indicate the predicted transmembrane segments, whereas the black bars indicate the observed segments. There are two missed transmembrane helices and one falsely predicted helix. The lower graph shows the constrained prediction, after incorporating the experimental information derived from cysteine-scanning mutagenesis experiments [46]. Green arrows indicate the experimentally verified localisation of a residue in the cytoplasm, whereas blue arrows indicate the experimentally verified localisation in the extracellular (periplasmic) space. The constrained prediction agrees remarkably well with the known structure.
Figure 3. A representation of the matrix produced by the forward algorithm modified to incorporate prior information. The (hypothetical) model consists of 12 states with 3 labels, I, M and O, corresponding respectively to states modelling the intracellular, transmembrane and extracellular parts of the sequence. The likelihood of sequence x (8 residues) is calculated incorporating the prior information that residues 3 and 4 are transmembrane, residue 1 is extracellular and residue 8 is intracellular.
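The idea behind such a constrained forward matrix can be illustrated with a minimal sketch. It assumes that prior information is applied by setting the forward variable to zero for every state whose label disagrees with the known label at that position; this is one common formulation and is not claimed to reproduce the paper's exact algorithm. The toy model below has only 3 states (one per label) rather than the 12 states of the figure, and all probabilities, the two-letter alphabet (`h` for hydrophobic, `p` for polar) and the function name `forward` are hypothetical choices made for illustration.

```python
def forward(obs, states, labels, start, trans, emit, known=None):
    """Likelihood of obs under the HMM.

    `known` maps a 0-based position to the label required there.
    States whose label disagrees get a forward value of zero.
    """
    known = known or {}

    def allowed(t, s):
        # No information at t, or the state's label matches the known label.
        return t not in known or labels[s] == known[t]

    # Initialisation (position 0).
    f = {s: (start[s] * emit[s][obs[0]] if allowed(0, s) else 0.0)
         for s in states}
    # Recursion over the remaining positions.
    for t in range(1, len(obs)):
        f = {s: (emit[s][obs[t]] * sum(f[r] * trans[r][s] for r in states)
                 if allowed(t, s) else 0.0)
             for s in states}
    # Termination: no explicit end state in this toy model.
    return sum(f.values())


# Hypothetical toy model: one state per label.
states = ["i", "m", "o"]
labels = {"i": "I", "m": "M", "o": "O"}
start = {"i": 0.5, "m": 0.0, "o": 0.5}
trans = {
    "i": {"i": 0.8, "m": 0.2, "o": 0.0},
    "m": {"i": 0.1, "m": 0.8, "o": 0.1},
    "o": {"i": 0.0, "m": 0.2, "o": 0.8},
}
emit = {
    "i": {"h": 0.3, "p": 0.7},
    "m": {"h": 0.9, "p": 0.1},
    "o": {"h": 0.3, "p": 0.7},
}

obs = "pphhhhpp"  # 8 residues, as in the figure
# Residue 1 extracellular, residues 3 and 4 transmembrane, residue 8 intracellular.
known = {0: "O", 2: "M", 3: "M", 7: "I"}

unconstrained = forward(obs, states, labels, start, trans, emit)
constrained = forward(obs, states, labels, start, trans, emit, known)
# Ratio = probability of the known labels given the sequence.
print(constrained / unconstrained)
```

Because the constrained value is a joint probability of the sequence and the known labels, dividing it by the unconstrained likelihood gives a proper conditional probability, which is the sense in which the probabilistic interpretation of the model is retained.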
Figure 4. A schematic representation of the model's architecture. The model consists of three sub-models denoted by the labels Cytoplasmic loop, Transmembrane Helix and Extracellular loop. Within each sub-model, states with the same shape, size and colour share the same emission probabilities (parameter tying). Allowed transitions are indicated with arrows.
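Parameter tying can be sketched as pooling the expected emission counts of all states in a tied group before normalising, so that every state in the group ends up with the same emission distribution. The following is a minimal illustration under that assumption; the state names, group names, counts and the function `tied_emissions` are hypothetical and do not come from the model in the figure.

```python
from collections import defaultdict


def tied_emissions(expected_counts, tie_group):
    """Re-estimate emission probabilities, pooling counts across tied states.

    Assumes every tied group has a positive total count.
    """
    pooled = defaultdict(lambda: defaultdict(float))
    for state, counts in expected_counts.items():
        for symbol, c in counts.items():
            pooled[tie_group[state]][symbol] += c

    group_probs = {}
    for group, counts in pooled.items():
        total = sum(counts.values())
        group_probs[group] = {sym: c / total for sym, c in counts.items()}

    # Every state receives the distribution of its group.
    return {state: group_probs[tie_group[state]] for state in expected_counts}


# Hypothetical expected counts (h = hydrophobic, p = polar).
expected_counts = {
    "helix_core_1": {"h": 6.0, "p": 2.0},
    "helix_core_2": {"h": 3.0, "p": 1.0},
    "cyt_loop_1": {"h": 1.0, "p": 3.0},
}
tie_group = {
    "helix_core_1": "helix_core",
    "helix_core_2": "helix_core",
    "cyt_loop_1": "cyt_loop",
}

probs = tied_emissions(expected_counts, tie_group)
print(probs["helix_core_1"], probs["cyt_loop_1"])
```

Tying reduces the number of free parameters, which matters when the training set is small relative to the number of states.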
The independent test set of 26 transmembrane proteins with known three-dimensional structures. We list the PDB code, the name of the protein and the number of transmembrane segments.
| AQUAPORIN Z | 6 | |
| AQUAPORIN 1 | 6 | |
| MULTIDRUG EFFLUX TRANSPORTER ACRB | 12 | |
| SUCCINATE DEHYDROGENASE CYTOCHROME B-556 SUBUNIT | 3 | |
| SUCCINATE DEHYDROGENASE HYDROPHOBIC MEMBRANE ANCHOR PROTEIN | 3 | |
| RESPIRATORY NITRATE REDUCTASE 1 GAMMA CHAIN | 5 | |
| PREPROTEIN TRANSLOCASE SECE SUBUNIT | 1 | |
| PREPROTEIN TRANSLOCASE SECBETA SUBUNIT | 1 | |
| PHOTOSYSTEM II SUBUNIT PSBA | 5 | |
| PHOTOSYSTEM II SUBUNIT PSBC | 5 | |
| ROTOR OF F-TYPE NA+-ATPASE | 2 | |
| ROTOR OF V-TYPE NA+-ATPASE | 4 | |
| PROBABLE AMMONIUM TRANSPORTER | 11 | |
| H+/CL- EXCHANGE TRANSPORTER | 14 | |
| PHOTOSYSTEM II CORE LIGHT HARVESTING PROTEIN | 6 | |
| PHOTOSYSTEM II CP43 PROTEIN | 6 | |
| PHOTOSYSTEM II REACTION CENTER D2 PROTEIN | 5 | |
| CYTOCHROME B559 ALPHA SUBUNIT | 1 | |
| GLPF GLYCEROL FACILITATOR CHANNEL | 6 | |
| MTHK POTASSIUM CHANNEL, CA-GATED | 2 | |
| MECHANOSENSITIVE CHANNEL PROTEIN | 3 | |
| CYTOCHROME B6F COMPLEX SUBUNIT PETM | 1 | |
| CYTOCHROME F | 1 | |
| POTASSIUM CHANNEL KCSA | 2 | |
| NA(+)/H(+) ANTIPORTER 1 | 12 | |
| POTASSIUM VOLTAGE-GATED CHANNEL SUBFAMILY A MEMBER 2 | 6 |