Nils Anders Leversen, Gustavo A de Souza, Hiwa Målen, Swati Prasad, Inge Jonassen, Harald G Wiker.
Abstract
Secreted proteins play an important part in the pathogenicity of Mycobacterium tuberculosis, and are the primary source of vaccine and diagnostic candidates. A majority of these proteins are exported via the signal peptidase I-dependent pathway, and have a signal peptide that is cleaved off during the secretion process. Sequence similarities within signal peptides have spurred the development of several algorithms for predicting their presence as well as the respective cleavage sites. For proteins exported via this pathway, algorithms exist for eukaryotes, and for Gram-negative and Gram-positive bacteria. However, the unique structure of the mycobacterial membrane raises the question of whether the existing algorithms are suitable for predicting signal peptides within mycobacterial proteins. In this work, we have evaluated the performance of nine signal peptide prediction algorithms on a positive validation set, consisting of 57 proteins with a verified signal peptide and cleavage site, and a negative set, consisting of 61 proteins that have an N-terminal sequence that confirms the annotated translational start site. We found the hidden Markov model of SignalP v3.0 to be the best-performing algorithm for predicting the presence of a signal peptide in mycobacterial proteins. It predicted no false positives or false negatives, and predicted a correct cleavage site for 45 of the 57 proteins in the positive set. Based on these results, we used the hidden Markov model of SignalP v3.0 to analyse the 10 available annotated proteomes of mycobacterial species, including annotations of M. tuberculosis H37Rv from the Wellcome Trust Sanger Institute and the J. Craig Venter Institute (JCVI). When excluding proteins with transmembrane regions among the proteins predicted to harbour a signal peptide, we found between 7.8 and 10.5% of the proteins in the proteomes to be putative secreted proteins. 
Interestingly, we observed a consistent difference in the percentage of predicted proteins between the Sanger Institute and JCVI. We have determined the most valuable algorithm for predicting signal peptidase I-processed proteins of M. tuberculosis, and used this algorithm to estimate the number of mycobacterial proteins with the potential to be exported via this pathway.
Year: 2009 PMID: 19389770 PMCID: PMC2885676 DOI: 10.1099/mic.0.025270-0
Source DB: PubMed Journal: Microbiology (Reading) ISSN: 1350-0872 Impact factor: 2.777
Signal peptide prediction by various algorithms on a positive and negative validation set

| Algorithm | Signal peptide predicted, positive set (n = 57) | No signal peptide predicted, negative set (n = 61) | Correct cleavage site, positive set (n = 57) |
|---|---|---|---|
| SignalP v3.0 hidden Markov model | 57 (100.0 %) | 61 (100.0 %) | 45 (78.9 %) |
| SignalP v2.0 hidden Markov model | 57 (100.0 %) | 61 (100.0 %) | 39 (68.4 %) |
| Signal-3L | 57 (100.0 %) | 52 (85.2 %) | 39 (68.4 %) |
| Signal-CF | 57 (100.0 %) | 52 (85.2 %) | 36 (63.2 %) |
| PrediSi | 54 (94.7 %) | 59 (96.7 %) | 32 (56.1 %) |
| SignalP v2.0 neural network | 57 (100.0 %) | 61 (100.0 %) | 32 (56.1 %) |
| SignalP v3.0 neural network | 53 (93.0 %) | 61 (100.0 %) | 29 (50.9 %) |
| SPEPLip | 56 (98.2 %) | 61 (100.0 %) | 21 (36.8 %) |
| SIGCLEAVE | 55 (96.5 %) | 14 (23.0 %) | 21 (36.8 %) |
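Each percentage in the table is a simple ratio: the second column is sensitivity (signal peptides detected among the 57 positive proteins), the third is specificity (correct negative calls among the 61 negative proteins), and the fourth is the cleavage-site hit rate over the 57 positive proteins. The minimal Python sketch below recomputes these figures, together with the implied false-negative and false-positive counts, from counts copied out of the table. The `Result` class and the choice of three example algorithms are illustrative assumptions, not part of the original study.

```python
from dataclasses import dataclass

POSITIVES = 57  # proteins with a verified signal peptide and cleavage site
NEGATIVES = 61  # proteins whose N-terminal sequence confirms the start site


@dataclass
class Result:
    # Hypothetical container for one row of the validation table.
    name: str
    true_pos: int     # positive-set proteins predicted to have a signal peptide
    true_neg: int     # negative-set proteins predicted to lack one
    cleavage_ok: int  # positive-set proteins with the correct cleavage site

    @property
    def sensitivity(self) -> float:
        return self.true_pos / POSITIVES

    @property
    def specificity(self) -> float:
        return self.true_neg / NEGATIVES

    @property
    def cleavage_accuracy(self) -> float:
        return self.cleavage_ok / POSITIVES

    @property
    def errors(self) -> tuple[int, int]:
        # (false negatives, false positives)
        return POSITIVES - self.true_pos, NEGATIVES - self.true_neg


results = [
    Result("SignalP v3.0 HMM", 57, 61, 45),
    Result("Signal-3L", 57, 52, 39),
    Result("SIGCLEAVE", 55, 14, 21),
]

for r in results:
    fn, fp = r.errors
    print(f"{r.name}: sensitivity={r.sensitivity:.1%} "
          f"specificity={r.specificity:.1%} "
          f"cleavage={r.cleavage_accuracy:.1%} FN={fn} FP={fp}")
```

Separating false negatives from false positives makes the trade-off visible: SIGCLEAVE misses only 2 positives but calls a signal peptide in 47 of the 61 negative proteins.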
Fig. 1. Scatter diagrams of signal peptide probability scores: the negative set is displayed on the left-hand half of the x axis and the positive set on the right-hand half. Within each set, the values are sorted in increasing order of probability score. Ideally, the scores of the two sets should fall in separate ranges without overlap. The asterisk indicates that SPEPLip does not output scores for proteins with a negative signal peptide prediction; these proteins were therefore assigned the minimum score (0) so that they could be plotted.
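The layout of Fig. 1 and its no-overlap criterion can be expressed in a few lines. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the scores in the usage lines are invented, missing scores are replaced by 0 as described for SPEPLip, and the midpoint cut-off is just one simple way to pick a separating threshold, not a procedure taken from the paper.

```python
from typing import Optional, Sequence

Score = Optional[float]  # None = no score reported (as with SPEPLip negatives)


def _filled(scores: Sequence[Score]) -> list[float]:
    # Replace missing scores with the minimum score 0, as done for the plot.
    return [0.0 if s is None else s for s in scores]


def plot_order(negative: Sequence[Score], positive: Sequence[Score]) -> list[float]:
    """x-axis order: negative set on the left, positive set on the right,
    each sorted by increasing score."""
    return sorted(_filled(negative)) + sorted(_filled(positive))


def separating_threshold(negative: Sequence[Score],
                         positive: Sequence[Score]) -> Optional[float]:
    """Midpoint between the two score ranges, or None if the ranges overlap."""
    neg_max = max(_filled(negative))
    pos_min = min(_filled(positive))
    if neg_max < pos_min:
        return (neg_max + pos_min) / 2
    return None


# Invented scores, for illustration only.
negative = [0.02, None, 0.10, 0.05]
positive = [0.97, 0.88, 0.99]
print(plot_order(negative, positive))  # [0.0, 0.02, 0.05, 0.1, 0.88, 0.97, 0.99]
print(separating_threshold(negative, positive))
print(separating_threshold([0.6], [0.5, 0.9]))  # None: the ranges overlap
```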
Signal peptide prediction by SignalP v3.0 hidden Markov model for various mycobacterial proteome annotations

| Proteins in annotation | Predicted signal peptide, n (% of all proteins) | Predicted signal peptide and no transmembrane region, n (% of all proteins) |
|---|---|---|
| 3991 | 519 (13.0 %) | 384 (9.6 %) |
| 4219 | 464 (11.0 %) | 333 (7.9 %) |
| 3991 | 526 (13.2 %) | 386 (9.7 %) |
| 4189 | 497 (11.9 %) | 363 (8.7 %) |
| 3920 | 515 (13.1 %) | 379 (9.7 %) |
| 3891 | 509 (13.1 %) | 367 (9.4 %) |
| 5245 | 581 (11.1 %) | 413 (7.9 %) |
| 4350 | 532 (12.2 %) | 378 (8.7 %) |
| 5462 | 729 (13.3 %) | 542 (9.9 %) |
| 1605 | 174 (10.8 %) | 168 (10.5 %) |
| 6880 | 807 (11.7 %) | 535 (7.8 %) |
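The putative-secreted count in the last column is the subset of signal-peptide-positive proteins with no predicted transmembrane region, expressed as a percentage of all proteins in the annotation. The Python sketch below assumes hypothetical per-protein records (`Protein`, `summarize`) whose flags would come from the SignalP v3.0 hidden Markov model and from a transmembrane predictor that is not named here; it is an illustration of the filtering logic, not the authors' pipeline. It then recomputes the 7.8 to 10.5% range quoted in the abstract from the table rows.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    # Hypothetical record. signal_peptide would come from SignalP v3.0 HMM;
    # tm_regions is assumed to come from a separate transmembrane predictor
    # and to count only regions outside the signal peptide.
    locus: str
    signal_peptide: bool
    tm_regions: int


def summarize(proteome: list[Protein]) -> tuple[int, int, int]:
    """Return (total proteins, predicted signal peptide, putative secreted),
    where putative secreted = signal peptide and no transmembrane region."""
    with_sp = [p for p in proteome if p.signal_peptide]
    secreted = [p for p in with_sp if p.tm_regions == 0]
    return len(proteome), len(with_sp), len(secreted)


toy = [
    Protein("p1", True, 0),   # putative secreted
    Protein("p2", True, 2),   # signal peptide, but has TM regions
    Protein("p3", False, 0),
    Protein("p4", False, 1),
]
print(summarize(toy))  # (4, 2, 1)

# (total, signal peptide, no TM region) copied from the table rows above.
rows = [
    (3991, 519, 384), (4219, 464, 333), (3991, 526, 386), (4189, 497, 363),
    (3920, 515, 379), (3891, 509, 367), (5245, 581, 413), (4350, 532, 378),
    (5462, 729, 542), (1605, 174, 168), (6880, 807, 535),
]
secreted_share = [secreted / total for total, _, secreted in rows]
print(f"{min(secreted_share):.1%} to {max(secreted_share):.1%}")  # 7.8% to 10.5%
```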