Lukas Käll, Anders Krogh, Erik L L Sonnhammer.
Abstract
When using conventional transmembrane topology and signal peptide predictors, such as TMHMM and SignalP, there is a substantial overlap between these two types of predictions. Applying these methods to five complete proteomes, we found that 30-65% of all predicted signal peptides and 25-35% of all predicted transmembrane topologies overlap. This impairs predictions of 5-10% of the proteome, hence this is an important issue in protein annotation. To address this problem, we previously designed a hidden Markov model, Phobius, that combines transmembrane topology and signal peptide predictions. The method makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions. We here present a web interface (http://phobius.cgb.ki.se and http://phobius.binf.ku.dk) to access Phobius.
Year: 2007 PMID: 17483518 PMCID: PMC1933244 DOI: 10.1093/nar/gkm256
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The Phobius model. The model comprises submodels for signal peptides, transmembrane helices, cytoplasmic loops and two different submodels for non-cytoplasmic loops.
Overlapping signal peptide and transmembrane segments in whole proteome predictions by conventional predictors
| Proteome | Overlap/TM predictions | Overlap/SP predictions | Overlap/All sequences |
|---|---|---|---|
| Human (CCDS) | 30% (1113/3720) | 32% (1113/3485) | 7.6% (1113/14663) |
| Caenorhabditis elegans | 34% (2587/7694) | 41% (2587/6328) | 9.6% (2587/26032) |
| Saccharomyces cerevisiae | 26% (377/1468) | 48% (377/787) | 5.6% (377/6680) |
| Escherichia coli K12 | 26% (271/1039) | 39% (271/698) | 6.4% (271/4243) |
| Bacillus subtilis | 32% (358/1133) | 63% (358/565) | 8.7% (358/4105) |
The proteomes of five different species were annotated with SignalP-NN 3.0 and TMHMM 2.0. Predictions were counted as overlapping if any part of a signal peptide predicted by SignalP was also predicted as a transmembrane helix by TMHMM. In such cases, at least one of the prediction methods is wrong. The overlapping predictions are expressed as fractions of all predicted transmembrane proteins, of all signal peptide predictions and of the number of sequences in the proteome. The SignalP-NN predictions were carried out using the optional 70-residue truncation and the correct organism group, and were counted as predicted signal peptides if the D-score was above the threshold. The proteomes of Caenorhabditis elegans and Saccharomyces cerevisiae and the consensus coding sequences of the human proteome were downloaded from Ensembl, and the proteomes of Escherichia coli K12 and Bacillus subtilis were downloaded from NCBI's web site.
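The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the `Prediction` record, its field names and both function names are hypothetical. It assumes that a predicted signal peptide occupies residues 1 through `sp_end`, and that transmembrane helices are given as 1-based inclusive `(start, end)` ranges already parsed from the predictors' output.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # 1-based inclusive (start, end) residue positions


@dataclass
class Prediction:
    """Hypothetical per-protein record combining both predictors' results."""
    sp_end: Optional[int] = None                 # last signal peptide residue, None if no SP predicted
    tm_helices: List[Segment] = field(default_factory=list)  # predicted TM helices


def is_overlapping(pred: Prediction) -> bool:
    """True if any TM helix shares at least one residue with the signal peptide."""
    if pred.sp_end is None:
        return False
    # The signal peptide spans residues 1..sp_end (assumption), so a helix
    # overlaps it exactly when the helix starts at or before sp_end.
    return any(start <= pred.sp_end for start, _end in pred.tm_helices)


def overlap_fractions(preds: List[Prediction]) -> Tuple[float, float, float]:
    """Return overlap as a fraction of TM proteins, SP proteins and all sequences."""
    n_overlap = sum(is_overlapping(p) for p in preds)
    n_tm = sum(bool(p.tm_helices) for p in preds)      # proteins with >= 1 TM helix
    n_sp = sum(p.sp_end is not None for p in preds)    # proteins with a predicted SP
    n_all = len(preds)

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return ratio(n_overlap, n_tm), ratio(n_overlap, n_sp), ratio(n_overlap, n_all)


# Toy proteome of five sequences (made-up coordinates, for illustration only).
toy = [
    Prediction(sp_end=22, tm_helices=[(5, 25)]),     # SP and TM helix overlap
    Prediction(sp_end=20, tm_helices=[(40, 60)]),    # SP and TM helix, no overlap
    Prediction(sp_end=None, tm_helices=[(10, 30)]),  # TM helix only
    Prediction(sp_end=18),                           # SP only
    Prediction(),                                    # neither
]
per_tm, per_sp, per_all = overlap_fractions(toy)
```

The three returned values correspond to the three columns of the table: the same overlap count divided by the number of predicted transmembrane proteins, by the number of predicted signal peptides, and by the total number of sequences.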
Figure 2. Output from the Phobius web server. An optional posterior probability plot is included in the prediction result.