Ram Samudrala, Fred Heffron, Jason E McDermott.
Abstract
The type III secretion system is an essential component for virulence in many Gram-negative bacteria. Though components of the secretion system apparatus are conserved, its substrates, the effector proteins, are not. We have used a novel computational approach to confidently identify new secreted effectors by integrating protein sequence-based features, including evolutionary measures such as the pattern of homologs in a range of other organisms, G+C content, amino acid composition, and the N-terminal 30 residues of the protein sequence. The method was trained on known effectors from the plant pathogen Pseudomonas syringae and validated on a set of effectors from the animal pathogen Salmonella enterica serovar Typhimurium (S. Typhimurium) after eliminating effectors with detectable sequence similarity. We show that this approach can predict known secreted effectors with high specificity and sensitivity. Furthermore, by considering a large set of effectors from multiple organisms, we computationally identify a common putative secretion signal in the N-terminal 20 residues of secreted effectors. This signal can be used to discriminate 46 out of 68 total known effectors from both organisms, suggesting that it is a real, shared signal applicable to many type III secreted effectors. We use the method to make novel predictions of secreted effectors in S. Typhimurium, some of which have been experimentally validated. We also apply the method to predict secreted effectors in the genetically intractable human pathogen Chlamydia trachomatis, identifying the majority of known secreted proteins in addition to providing a number of novel predictions. This approach provides a new way to identify secreted effectors in a broad range of pathogenic bacteria for further experimental characterization and provides insight into the nature of the type III secretion signal.
Year: 2009 PMID: 19390620 PMCID: PMC2668754 DOI: 10.1371/journal.ppat.1000375
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
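The abstract names several sequence-derived features: G+C content, amino acid composition, and the N-terminal 30 residues. As a rough illustration only, the sketch below computes simple versions of these from raw sequences. The function names, the 20-letter alphabet constant, and the (position, residue) encoding are assumptions made for this example; the record does not specify how SIEVE actually encodes its features.

```python
from collections import Counter

# Standard 20 amino acids (one-letter codes); an assumption for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gc_content(dna):
    """Fraction of G and C bases in a nucleotide sequence."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def aa_composition(protein, n_terminal=None):
    """Relative frequency of each amino acid, optionally restricted to
    the first n_terminal residues."""
    seq = protein.upper()
    if n_terminal is not None:
        seq = seq[:n_terminal]
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}


def n_terminal_positions(protein, n=30):
    """One (position, residue) pair per N-terminal residue, 1-based."""
    return [(i + 1, aa) for i, aa in enumerate(protein.upper()[:n])]
```

Feature vectors built this way could be passed to any standard classifier; which classifier and which weighting SIEVE uses is not stated in this record.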
Known secreted effectors used for training SIEVE and their scores using the STM to STM and PSY to STM SIEVE models.
| ID | Name | Description | System | STMtoSTM | Rank | PSYtoSTM | Rank |
| STM1055 | gtgE | Gifsy-2 encoded effector | ??? | 4.06 | | 2.20 | |
| STM0972 | sopD-2 | homologous to secreted protein sopD | SPI-2 | 3.82 | | 1.80 | |
| STM1051 | sseI/srfH | Secretion system effector | SPI-2 | 3.46 | | 1.81 | |
| STM1583 | steA | putative cytoplasmic protein | both | 3.29 | | 1.94 | |
| STM1088 | pipB-1 | Pathogenicity island encoded protein: SPI5 | SPI-2 | 3.23 | | 2.05 | |
| STM1631 | sseJ | Salmonella translocated effector | SPI-2 | 3.11 | | 1.79 | |
| STM2945 | sopD-1 | secreted protein in the Sop family | SPI-1 | 3.09 | | 1.27 | |
| STM1602 | sifB | Salmonella translocated effector | SPI-2 | 3.06 | | 1.97 | |
| STM4157 | sseK-1 | putative cytoplasmic protein | SPI-2 | 3.05 | | 1.53 | |
| STM2137 | sseK-2 | putative cytoplasmic protein | SPI-2 | 3.01 | | 2.00 | |
| STM1224 | sifA | replication in macrophages | SPI-2 | 2.97 | | 1.90 | |
| STM2584 | gogB | Gifsy-1 prophage: leucine-rich repeat | both | 2.86 | | 1.61 | |
| STM2614 | gogA | Gifsy 1 encoded effector | ??? | 2.68 | | 1.29 | |
| STM2865 | avrA | putative inner membrane protein | SPI-1 | 2.66 | | 1.89 | |
| STM2884 | sipC | cell invasion protein | SPI-1 | 2.62 | | 2.20 | |
| STM1855 | sopE-2 | TypeIII-secreted protein effector | SPI-1 | 2.49 | | 1.87 | |
| STM2878 | sptP | protein tyrosine phosphate | both | 2.42 | | 1.73 | |
| STM1398 | sseB | Secretion system effector | SPI-2 | 2.38 | | 2.06 | |
| STM1026 | gtgA | Gifsy 2 encoded effector | ??? | 2.34 | | 1.60 | |
| STM1393 | ssaB/spiC | Secretion system apparatus | SPI-2 | 2.22 | | 2.14 | |
| STM1402 | sseE | Secretion system effector | SPI-2 | 1.91 | | 1.20 | |
| STM1401 | sseD | Secretion system effector | SPI-2 | 1.88 | | 1.98 | |
| STM2780 | pipB-2 | homologue of pipB | SPI-2 | 1.52 | | 0.45 | |
| STM2882 | sipA | cell invasion protein | SPI-1 | 1.40 | | 2.12 | |
| STM0800 | slrP | leucine-rich repeat protein | both | 1.37 | | 1.90 | |
| STM2885 | sipB | cell invasion protein | SPI-1 | 1.25 | | 2.32 | |
| STM2241 | sspH-2 | Leucine-rich repeat protein | SPI-2 | 1.11 | | 1.16 | |
The top 10 highest scores from the PSY to STM model are shown in bold.
Figure 1. Accurate identification of type III secreted effectors using sequence data.
The sensitivity (TP/(TP+FN); solid lines) and specificity (TN/(FP+TN); dashed lines) of SIEVE were calculated as a function of the SIEVE score threshold (X axis) for S. Typhimurium effectors (PSY to STM model; red) and P. syringae effectors (STM to PSY model; blue). The results show that both models perform well, jointly achieving a sensitivity and specificity of about 90%. For example, 33 of 36 known S. Typhimurium effectors are in the top 10% of predictions.
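The two quantities plotted in Figure 1 can be computed directly from their definitions. The following is a minimal sketch, assuming that a protein is called a predicted effector when its score is at or above the threshold; the function name and the list-based inputs are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(scores, is_effector, threshold):
    """Return (sensitivity, specificity) at a given score threshold.

    scores      -- classifier scores, one per protein
    is_effector -- True for known effectors, False for negative examples
    threshold   -- a protein is predicted secreted if score >= threshold
                   (the >= convention is an assumption of this sketch)
    """
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, is_effector):
        predicted = score >= threshold
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    # Sensitivity = TP/(TP+FN); specificity = TN/(FP+TN).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, specificity
```

Evaluating this function over a range of thresholds yields one sensitivity curve and one specificity curve per model, which is the form of plot the caption describes.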
Figure 2. Delineating the length of the type III secretion signal.
A. The performance of SIEVE on S. Typhimurium (PSY to STM model; red) and P. syringae (STM to PSY model; blue) was evaluated using the ROC area under the curve metric described in the text (Y axes). Models were trained using the indicated number of residues from the N-termini of the examples (X axis) and tested on the complete testing set (i.e., the entire set of positive and negative examples from the other organism). Maximum performance of both models was at approximately 30 residues (asterisks), suggesting that this might be the maximum length of a secretion signal. B. From the analysis in panel A, we calculated for each sequence length (X axis) the difference between its ROC value and the maximum ROC value (at 29 residues for the PSY to STM model and 32 for the STM to PSY model), and divided this difference by the standard error for that sequence length (difference from maximum, Y axis). This shows the significance of each sequence length, with values below 2.0 (grey area) indicating differences that are not significant (as judged using standard error). For S. Typhimurium effectors (PSY to STM model), the longest sequence length that is significantly different from the maximum value is 21 residues, and for the P. syringae effectors (STM to PSY model) it is 16 residues. These lengths agree generally with previous estimates of secretion signal length.
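The calculation in panel B can be sketched as follows. The caption does not say how the standard error of the ROC area was obtained, so this example substitutes the Hanley and McNeil (1982) approximation as an assumption; the function names and the dictionary input are likewise hypothetical.

```python
import math


def roc_auc(pos_scores, neg_scores):
    """ROC area as the fraction of (positive, negative) pairs in which
    the positive example scores higher; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an ROC area (Hanley and McNeil).
    Using this estimator here is an assumption of the sketch."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(max(var, 0.0))


def difference_from_maximum(auc_by_length, n_pos, n_neg):
    """For each sequence length, (max AUC - AUC) / SE at that length."""
    best = max(auc_by_length.values())
    result = {}
    for length, auc in auc_by_length.items():
        diff = best - auc
        se = hanley_mcneil_se(auc, n_pos, n_neg)
        if se == 0.0:
            # Degenerate case (AUC of exactly 0 or 1): avoid dividing by zero.
            result[length] = 0.0 if diff == 0.0 else math.inf
        else:
            result[length] = diff / se
    return result
```

Lengths whose value falls below 2.0 would then be treated as not significantly different from the best-performing length, matching the grey area described in the caption.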
Figure 3. Identification of a shared sequence motif in type III secreted effectors.
We identified the features (sequence locations and residue types) with the greatest ability to classify S. Typhimurium and P. syringae secreted effectors (see text and Figure S4). The residue type with the highest positive weight is shown in bold for each position, followed by the other residue types that were also found to be significant. Amino acids with a negative weight are also shown. Positions with an “x” have no representation in the minimal set. Grey background indicates sequence positions where both models agree (for at least one amino acid type). It is important to note that this does not represent a consensus sequence, since there is very little similarity between individual effector signals (see Table S4). Rather it shows those sequence positions and amino acid types that SIEVE found particularly helpful in discriminating between the secreted effectors and negative examples.
High confidence secreted effector predictions in S. Typhimurium.
| ID | Name | Description | Score | Confidence | Reference |
| STM1417 | ssaP | Secretion system apparatus | 2.97 | 100% | |
| STM2897 | invE | invasion protein | 2.60 | 85% | |
| PSLT073 | traM | conjugative transfer: mating signal | 2.38 | 70% | |
| PSLT075 | traJ | conjugative transfer: regulation | 2.43 | 75% | |
| PSLT102 | traS | conjugative transfer: surface exclusion | 2.21 | 60% | |
| STM2087 | rfbV | LPS side chain defect: abequosyltransferase | 2.60 | 85% | |
| STM2088 | rfbX | LPS side chain defect: putative O-antigen transferase | 2.38 | 70% | |
| STM1332 | rfc | O-antigen polymerase | 2.70 | 90% | |
| STM2112 | wcaD | putative colanic acid polymerase | 2.21 | 60% | |
| | | | | 50% | |
| STM1867 | pagK | PhoPQ-activated gene | 2.47 | 75% | |
| STM1240 | envF | putative envelope lipoprotein | 2.41 | 75% | |
| STM2866 | sprB | transcriptional regulator | 2.16 | 55% | |
| STM1087 | pipA | Pathogenicity island encoded protein: SPI3 | 2.30 | 70% | |
| STM1381 | orf245 | putative cytoplasmic protein | 2.32 | 70% | |
| STM1896 | | putative cytoplasmic protein | 2.19 | 55% | |
| STM2761 | | putative inner membrane protein | 3.00 | 100% | |
| STM4302 | | putative cytoplasmic protein | 2.41 | 75% | |
| STM4316 | | putative cytoplasmic protein | 2.34 | 70% | |
| STM1228 | | putative periplasmic protein | 2.33 | 70% | |
| STM3026 | | putative outer membrane protein | 2.14 | 55% | |
| STM2138 | | putative cytoplasmic protein | 2.71 | 90% | |
| STM2585A | | Gifsy-1 prophage: similar to transpose | 2.58 | 85% | |
| STM4257 | | putative inner membrane or exported | 2.49 | 80% | |
| STM0284 | | putative shiga-like toxin A subunit | 2.44 | 75% | |
| STM2225 | | putative inner membrane protein | 2.26 | 65% | |
| STM1868A | | putative protein | 2.19 | 55% | |
| STM1554 | | putative coiled-coil protein | 2.59 | 85% | |
| STM0100 | | putative cytoplasmic protein | 2.54 | 80% | |
| STM4155 | | putative inner membrane protein | 2.39 | 70% | |
| STM2208 | | putative inner membrane protein | 2.35 | 75% | |
| STM3052 | | putative outer membrane protein | 2.28 | 65% | |
confidence based on the “generous” estimate in Figure S3.
references for secretion or involvement in virulence.
proteins experimentally determined to be secreted.
L. Crosa and F.H., unpublished results.
not secreted by a type III secretion system.
Predicted secreted effectors in C. trachomatis.
| ID | Name | Description | Score | Confidence | Reference |
| CT006 | - | hypothetical protein | 1.32 | 20% | |
| CT007 | - | hypothetical protein | 1.41 | 25% | |
| CT011 | - | hypothetical protein | 1.63 | 40% | |
| CT060 | flhA | Flagellar Secretion Protein | 1.59 | 30% | |
| CT080 | ltuB | hypothetical protein | 1.90 | 50% | |
| CT082 | - | hypothetical protein | 2.57 | 80% | |
| CT087 | malQ | 4-alpha glucanotransferase | 2.10 | 50% | |
| CT088 | sycE | Secretion Chaperone | 1.93 | 50% | |
| CT101 | - | hypothetical protein | 1.40 | 25% | |
| CT105 | - | hypothetical protein | 3.43 | 100% | |
| CT142 | - | hypothetical protein | 1.36 | 20% | |
| CT148 | mhpA | Monooxygenase | 1.41 | 25% | |
| CT164 | - | hypothetical protein | 2.18 | 60% | |
| CT165 | - | hypothetical protein | 1.67 | 40% | |
| CT166 | - | hypothetical protein | 1.91 | 50% | |
| CT174 | - | hypothetical protein | 1.30 | 20% | |
| CT181 | - | hypothetical protein | 2.40 | 70% | |
| CT196 | - | hypothetical protein | 1.55 | 30% | |
| CT198 | oppA_3 | Oligopeptide Binding Protein | 1.33 | 20% | |
| CT205 | pfkA_1 | Fructose-6-P Phosphotransferase | 1.44 | 25% | |
| CT214 | - | hypothetical protein | 1.44 | 25% | |
| CT262 | - | hypothetical protein | 1.37 | 20% | |
| CT273 | - | hypothetical protein | 1.33 | 20% | |
| CT288 | - | hypothetical protein | 1.38 | 20% | |
| CT309 | - | hypothetical protein | 1.35 | 20% | |
| CT311 | - | hypothetical protein | 1.63 | 40% | |
| CT326 | - | hypothetical protein | 2.46 | 75% | |
| CT344 | lon | Lon ATP-dependent protease | 1.72 | 40% | |
| CT345 | - | hypothetical protein | 1.81 | 40% | |
| CT365 | - | hypothetical protein | 2.44 | 75% | |
| CT384 | - | hypothetical protein | 1.78 | 40% | |
| CT391 | - | hypothetical protein | 1.31 | 20% | |
| CT392 | yprS | hypothetical protein | 2.36 | 70% | |
| CT412 | pmpA | Putative outer memb. protein A | 1.97 | 50% | |
| CT449 | - | hypothetical protein | 1.78 | 40% | |
| CT461 | yaeI | Phosphohydrolase | 1.69 | 40% | |
| CT483 | - | hypothetical protein | 2.12 | 55% | |
| CT552 | - | hypothetical protein | 1.40 | 25% | |
| CT559 | yscJ | Yop proteins translocation | 1.35 | 20% | |
| CT583 | gp6D | CHLTR Plasmid Paralog | 1.81 | 40% | |
| CT616 | - | hypothetical protein | 2.19 | 55% | |
| CT620 | - | hypothetical protein | 1.97 | 50% | |
| CT622 | - | CHLPN 76 kDa Homolog | 1.92 | 50% | |
| CT623 | - | CHLPN 76 kDa Homolog | 1.69 | 40% | |
| CT664 | - | adenylate cyclase-like protein | 1.35 | 20% | |
| CT668 | - | hypothetical protein | 1.37 | 20% | |
| CT672 | fliN | Flagellar Motor Switch | 1.74 | 40% | |
| CT694 | - | hypothetical protein | 3.52 | 100% | |
| CT695 | - | hypothetical protein | 2.28 | 65% | |
| CT696 | - | hypothetical protein | 1.38 | 20% | |
| CT711 | - | hypothetical protein | 1.98 | 50% | |
| CT728 | - | hypothetical protein | 1.36 | 20% | |
| CT736 | ybcL | hypothetical protein | 2.75 | 90% | |
| CT794.1 | - | hypothetical protein | 1.50 | 30% | |
| CT795 | - | hypothetical protein | 1.96 | 50% | |
| CT809 | - | hypothetical protein | 1.79 | 40% | |
| CT849 | - | hypothetical protein | 1.93 | 50% | |
| CT853 | - | hypothetical protein | 1.32 | 20% | |
| CT867 | - | Membrane Thiol Protease | 3.02 | 100% | |
| CT868 | - | Membrane Thiol Protease | 4.17 | 100% | |
| CT870 | pmpF | Putative Outer Membrane Protein | 1.65 | 40% | |
| CT872 | pmpH | Putative Outer Membrane Protein | 1.33 | 20% | |
confidence based on the “generous” estimate in Figure S3.
Bold type indicates that the protein has been shown to be secreted in one of several experimental systems.