Konstantinos D Tsirigos, Christoph Peters, Nanjiang Shu, Lukas Käll, Arne Elofsson.
Abstract
TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We present a major update to the server, with several substantial improvements: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server is now slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional options of submitting batch files and accessing the server programmatically through standard interfaces, which makes it well suited to proteome-wide analyses. As an indication, a user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented increases overall performance by 4% compared to the best-scoring methods currently available, and TOPCONS is the only method that can identify signal peptides and still maintain state-of-the-art performance in topology prediction.
Year: 2015 PMID: 25969446 PMCID: PMC4489233 DOI: 10.1093/nar/gkv485
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The TOPCONS workflow: four of the topology predictors (OCTOPUS, PolyPhobius, SPOCTOPUS and SCAMPI) use an MSA-derived sequence profile as input, whereas the fifth method (Philius) only requires the protein sequence. The topology predictions are used to construct a topology profile, which is fed into the TOPCONS Hidden Markov Model to create the final consensus topology.
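The profile-building step of this workflow can be illustrated with a minimal sketch. It assumes each predictor emits a per-residue topology string over the alphabet `i` (inside), `o` (outside), `M` (membrane) and `S` (signal peptide); the function names and the toy predictions are hypothetical. The per-residue majority vote is only a naive stand-in for the consensus step and is not the TOPCONS Hidden Markov Model.

```python
from collections import Counter

# Assumed state alphabet: inside, outside, membrane, signal peptide.
STATES = "ioMS"


def topology_profile(predictions):
    """Return, per residue, the fraction of predictors voting for each state."""
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same sequence length")
    n_methods = len(predictions)
    profile = []
    for column in zip(*predictions.values()):
        counts = Counter(column)
        profile.append({s: counts[s] / n_methods for s in STATES})
    return profile


def majority_vote(profile):
    """Naive stand-in for the consensus step (NOT the TOPCONS HMM):
    pick the most frequent state at every residue."""
    return "".join(max(STATES, key=lambda s: col[s]) for col in profile)


# Toy, made-up predictions for a 12-residue sequence (not real output).
toy = {
    "OCTOPUS":     "iiiMMMMMoooo",
    "PolyPhobius": "iiiiMMMMoooo",
    "SPOCTOPUS":   "iiiMMMMMoooo",
    "SCAMPI":      "iiMMMMMMMooo",
    "Philius":     "iiiMMMMMoooo",
}
profile = topology_profile(toy)
consensus = majority_vote(profile)
print(consensus)
```

Unlike this per-residue vote, a Hidden Markov Model can enforce a valid topology grammar (for example, that membrane segments alternate between inside and outside loops), which is why the profile is passed to a model rather than thresholded directly.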
Figure 2. Distribution of the time (in seconds) required to process the proteins in all data sets used in the benchmark. The increase in speed is substantial: almost 80% of all proteins took less than 30 seconds.
Figure 3. Comparison of the topology prediction accuracy of the current TOPCONS implementation versus other topology prediction methods. Note that performance drops for all predictors that predict both signal peptides and TM regions, as opposed to methods specifically designed to predict the topology of membrane proteins.
Performance of several topology prediction methods, appropriate for whole-proteome scanning, along with the current TOPCONS implementation
| Method | MSA | TM | SP+TM | Globular | Globular+SP | Overall |
|---|---|---|---|---|---|---|
| TOPCONS | + | 80% | 80% | 97% | 91% | 87% |
| MEMSAT-SVM | + | 67% | 52% | 88% | 0.0% | 52% |
| Philius | − | 70% | 75% | 94% | 94% | 83% |
| Phobius | − | 55% | 83% | 95% | 94% | 82% |
| PolyPhobius | + | 68% | 64% | 95% | 85% | 78% |
| SPOCTOPUS | + | 71% | 78% | 78% | 79% | 76% |
For the TM set, a correct topology must have the correct number of TM regions at approximately correct locations (overlap of at least five residues) and the correct location of the N- and C-termini; for the SP+TM set, we also require the prediction of a signal peptide at the N-terminus of the protein sequence; for the Globular set, a prediction is considered correct only if no membrane regions and no signal peptides are predicted; finally, for the Globular+SP set, the predictor should predict only the presence of a signal peptide in the sequence.
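These criteria can be expressed as a short check on a pair of topology strings. The sketch below is an illustration under stated assumptions, not the benchmark code: it assumes the `i`/`o`/`M`/`S` per-residue alphabet, pairs TM regions in sequence order, and takes the location of a terminus to be the first `i` or `o` state found from that end. All function names are hypothetical.

```python
import re


def tm_segments(topology):
    """Return half-open (start, end) spans of consecutive 'M' residues."""
    return [m.span() for m in re.finditer(r"M+", topology)]


def overlap(a, b):
    """Number of residues shared by two half-open spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def has_signal_peptide(topology):
    """A signal peptide is assumed to be a run of 'S' at the N-terminus."""
    return topology.startswith("S")


def terminus_side(topology, from_c_terminus=False):
    """First 'i' or 'o' state seen from the chosen end (assumption)."""
    residues = reversed(topology) if from_c_terminus else topology
    return next((c for c in residues if c in "io"), None)


def is_correct(predicted, reference, min_overlap=5):
    # Signal peptide must be predicted if and only if it is annotated.
    if has_signal_peptide(predicted) != has_signal_peptide(reference):
        return False
    pred, ref = tm_segments(predicted), tm_segments(reference)
    if len(pred) != len(ref):
        return False
    if not ref:
        # Globular sets: only the absence of TM regions matters here.
        return True
    if any(overlap(p, r) < min_overlap for p, r in zip(pred, ref)):
        return False
    return (
        terminus_side(predicted) == terminus_side(reference)
        and terminus_side(predicted, True) == terminus_side(reference, True)
    )


# Toy 30-residue examples (made up for illustration).
reference = "i" * 5 + "M" * 20 + "o" * 5
shifted = "i" * 8 + "M" * 20 + "o" * 2    # helix shifted by 3 residues
inverted = "o" * 5 + "M" * 20 + "i" * 5   # same helix, flipped orientation
far_off = "i" * 22 + "M" * 6 + "o" * 2    # overlaps by only 3 residues

print(is_correct(shifted, reference))
print(is_correct(inverted, reference))
print(is_correct(far_off, reference))
```

The shifted helix still counts as correct because it overlaps the annotated region by more than five residues, whereas the inverted and poorly overlapping predictions are rejected.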
Confusion matrix for all types of errors that TOPCONS makes
| Data set | Correct prediction | Wrong topology | TM → SP or SP → TM | TM → non-TM or non-TM → TM | non-TM → SP or SP → non-TM |
|---|---|---|---|---|---|
| TM | 80% | 16% | 2.6% | 0.9% | -- |
| SP+TM | 80% | 7.0% | 13% | -- | 0.0% |
| Globular+SP | 91% | -- | 7.2% | -- | 1.8% |
| Globular | 97% | -- | -- | 1.5% | 1.5% |
Correct prediction: requires that both the classification and the topology of the given protein are correct; Wrong topology: the classification is correct but the overall topology is not (e.g. extra predicted TM helices in non-membrane regions); TM → SP or SP → TM: the N-terminal TM helix is wrongly assigned as a signal peptide or vice versa; TM → non-TM: a TM protein is classified as a non-TM protein or vice versa; SP → non-TM: a protein with a signal peptide, or with a signal peptide and transmembrane region(s), is classified as a non-TM protein or vice versa.
Confusion matrix for classification of proteins in each of the data sets using the TOPCONS algorithm
| Data set | TM | SP+TM | Globular+SP | Globular |
|---|---|---|---|---|
| TM | 95% | 3.0% | 1.0% | 1.0% |
| SP+TM | 12% | 86% | 2.0% | 0.0% |
| Globular+SP | 1.0% | 6.0% | 91% | 2.0% |
| Globular | 1.0% | 0.0% | 2.0% | 97% |
Each row shows the percentage of proteins in one class that are assigned to each of the four classes (transmembrane, signal peptide and transmembrane, signal peptide only, and globular). The vast majority of wrong classifications are confusions between transmembrane regions and signal peptides.
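A row-normalized matrix of this kind can be computed from pairs of annotated and predicted topologies. The following is a minimal sketch, assuming the `i`/`o`/`M`/`S` per-residue alphabet; the helper names and the toy data are hypothetical and do not reproduce the values in the table.

```python
from collections import Counter

CLASSES = ["TM", "SP+TM", "Globular+SP", "Globular"]


def protein_class(topology):
    """Map a topology string to one of the four classes."""
    has_sp = "S" in topology
    has_tm = "M" in topology
    if has_tm:
        return "SP+TM" if has_sp else "TM"
    return "Globular+SP" if has_sp else "Globular"


def row_percentages(true_labels, predicted_labels):
    """Percentage of each true class (row) assigned to each predicted class."""
    counts = Counter(zip(true_labels, predicted_labels))
    table = {}
    for t in CLASSES:
        total = sum(counts[(t, p)] for p in CLASSES)
        table[t] = {
            p: (100.0 * counts[(t, p)] / total if total else 0.0)
            for p in CLASSES
        }
    return table


# Toy annotated and predicted topologies (made up for illustration).
annotated = ["iiMMMMoo", "iiMMMMoo", "iiMMMMoo", "iiMMMMoo",
             "SSooMMMMii", "SSooMMMMii"]
predicted = ["iiMMMMoo", "iMMMMMoo", "iiMMMMMo", "SSoMMMii",
             "SSooMMMMii", "MMooMMMMii"]

true_labels = [protein_class(t) for t in annotated]
pred_labels = [protein_class(t) for t in predicted]
matrix = row_percentages(true_labels, pred_labels)

for name in CLASSES:
    print(name, matrix[name])
```

In this toy data the only misclassifications are a TM protein predicted with a signal peptide and an SP+TM protein whose signal peptide is predicted as a TM helix, which is the same kind of confusion that dominates the table above.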
Figure 4. Example output from the TOPCONS web server, based on the Bacteriorhodopsin sequence from Halobacterium sp. (UniProt-ID: BACR_HALS4). The output shows the topology predicted by TOPCONS and by the individual methods, together with the predicted ΔG values across the sequence.