Yejun Wang, Xiaowei Wei, Hongxia Bao, Shu-Lin Liu.
Abstract
BACKGROUND: Many bacteria can deliver pathogenic proteins (effectors) through type IV secretion systems (T4SSs) into the eukaryotic cytoplasm, causing host diseases. Inherent properties of these effectors, such as sequence diversity and scattering throughout the whole genome, make it challenging to identify the full set of T4SS effectors. An effective inter-species T4SS effector prediction tool is therefore urgently needed to help discover new effectors in a variety of bacterial species, especially those with few known effectors, e.g., Helicobacter pylori.
Year: 2014 PMID: 24447430 PMCID: PMC3915618 DOI: 10.1186/1471-2164-15-50
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Sequence-based Aac difference between T4S and control proteins for the C-terminal 100-aa positions. (A) Single-residue composition difference. The amino acids are listed along the horizontal axis, and the length of each bar represents the frequency of the corresponding amino acid. T4S and non-T4S proteins are shown in black and gray, respectively. Amino acids with significantly different compositions between effectors and non-effectors are indicated with a star above the bar (Bonferroni-corrected Student’s t test and binomial test, p < 0.05). The logarithm of the amino acid frequency ratio is also shown, with red representing preference and black representing depletion in effectors. (B) Continuous and spanned bi-residues with statistically significant composition differences between effectors and non-effectors (Bonferroni-corrected Student’s t test and binomial test, p < 0.05). ‘Px’ represents ‘Position x’. ‘X’ represents any type of amino acid. The amino acid at the last position is in red if the corresponding bi-residue is preferred and in black if it is depleted in T4S sequences. (C) Distribution of motifs in T4S and non-T4S proteins.
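The single-residue comparison described in panel (A) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the base-2 logarithm, and the pseudocount are assumptions, and the significance tests named in the caption are omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def c_terminal_aac(seq, window=100):
    """Frequency of each standard amino acid in the last `window` residues."""
    tail = seq[-window:].upper()
    counts = Counter(ch for ch in tail if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def mean_aac(seqs, window=100):
    """Average the per-sequence compositions over a set of sequences."""
    profiles = [c_terminal_aac(s, window) for s in seqs]
    return {aa: sum(p[aa] for p in profiles) / len(profiles) for aa in AMINO_ACIDS}

def log_ratio(effector_seqs, control_seqs, window=100, pseudo=1e-6):
    """log2(effector frequency / control frequency); the base and the
    pseudocount are assumptions made for this sketch."""
    eff = mean_aac(effector_seqs, window)
    ctl = mean_aac(control_seqs, window)
    return {aa: math.log2((eff[aa] + pseudo) / (ctl[aa] + pseudo)) for aa in AMINO_ACIDS}
```

A positive value marks a residue that is more frequent in the effector set, a negative value one that is depleted.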
Figure 2. Position-specific Aac profiles of T4S and control proteins for the C-terminal 50 positions. The horizontal axis indicates the C-terminal position number. (A) and (B) represent T4S proteins and control proteins, respectively.
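A position-specific Aac profile of this kind can be computed roughly as below. The sketch assumes positions are counted backwards from the last residue and that non-standard residues are skipped; neither convention is stated in this excerpt, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_specific_profile(seqs, n_positions=50):
    """profile[i][aa] is the frequency of `aa` at the (i+1)-th residue
    counted from the C-terminus (index 0 = last residue; an assumed convention)."""
    profile = []
    for i in range(1, n_positions + 1):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        total = 0
        for s in seqs:
            if len(s) >= i:  # sequences shorter than i contribute nothing here
                aa = s[-i].upper()
                if aa in counts:
                    counts[aa] += 1
                    total += 1
        profile.append({aa: (counts[aa] / total if total else 0.0)
                        for aa in AMINO_ACIDS})
    return profile
```

Building one profile from the T4S set and one from the control set gives the two panels of the figure.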
Figure 3. Distribution of amino acids with significantly different position-specific composition. (A) and (B) show the distribution of amino acids that are significantly preferred or unfavorable in T4S proteins, respectively. (C) and (D) show the distribution of amino acids with significantly different composition between T4S and control proteins. (A) and (C) compare the number of significantly different amino acids at each position. (B) and (D) show the number of times each type of amino acid exhibited a significant difference.
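Performance of different models classifying T4S effectors and non-effectors
| Features | Algorithm | Sn vs. Sp (%) | Accuracy (%) | AUC | MCC |
| --- | --- | --- | --- | --- | --- |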
| Seq_Aac | SVM | 50.57 vs. 93.86 | 79.43 | 0.8212 | 0.5146 |
| Seq_bAac | SVM | 44.57 vs. 96.29 | 79.05 | 0.8311 | 0.5088 |
| Seq_Aac, bAac | SVM | 46.00 vs. 96.14 | 79.43 | 0.8343 | 0.5182 |
| Seq_Sig | SVM | 50.57 vs. 93.86 | 79.43 | 0.8500 | 0.5146 |
| Motif | - | 50.43 vs. 88.18 | 75.60 | - | 0.4222 |
| Seq_Aac, Sse, Acc | SVM | 69.71 vs. 91.14 | 84.00 | 0.8742 | 0.6313 |
| Pos_Aac_SPB | SVM | 61.71 vs. 92.14 | 82.00 | 0.8538 | 0.5802 |
| Pos_Aac_SPB + Seq_Aac | SVM | 78.86 vs. 93.29 | 88.48 | 0.9362 | 0.7369 |
| Pos_Aac_BPB | BPB-SVM | 79.14 vs. 94.43 | 89.33 | 0.9559 | 0.7561 |
| Pos_Aac, Sse, Acc | BPB-SVM | 89.14 vs. 97.14 | 94.57 | 0.9883 | 0.8770 |
Note: The RBF kernel function was used for all the models except ‘Motif’. The performance was evaluated according to 5-fold cross validation results.
Figure 4. Performance ROCs of different T4S effector prediction models. (A) Comparison of the ‘Pos_Aac_SPB’, ‘Seq_Aac’, and ‘Pos_Aac_SPB + Seq_Aac’ models. ‘Pos_Aac_SPB’ extracted features from the positive dataset only. ‘Seq_Aac’ learned only sequence-based single-residue composition features. ‘Pos_Aac_SPB + Seq_Aac’ combined the features of ‘Pos_Aac_SPB’ and ‘Seq_Aac’. (B) Comparison of the ‘Pos_Aac_SPB’, ‘Pos_Aac_BPB’, ‘Pos_Aac_SPB + Seq_Aac’ and ‘Pos_Aac,Sse,Acc’ models. The ‘Pos_Aac_BPB’ model extracted the Aac features of both the positive and negative datasets, while ‘Pos_Aac,Sse,Acc’ learned the joint position-specific Aac, Sse and Acc features. All comparisons were performed with a 5-fold cross-validation strategy.
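The difference between the two position-specific encodings can be illustrated with a small sketch. It reads ‘SPB’ and ‘BPB’ as single-profile and bi-profile encodings, which is an assumption because the excerpt does not expand the abbreviations; all function names are hypothetical, and the real feature construction may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(seqs, n_positions):
    """Per-position residue frequencies, counted from the C-terminus."""
    profile = []
    for i in range(1, n_positions + 1):
        residues = [s[-i].upper() for s in seqs if len(s) >= i]
        total = len(residues)
        profile.append({aa: (residues.count(aa) / total if total else 0.0)
                        for aa in AMINO_ACIDS})
    return profile

def encode(seq, profile):
    """Replace each C-terminal residue by its frequency in the given profile."""
    feats = []
    for i, column in enumerate(profile, start=1):
        aa = seq[-i].upper() if len(seq) >= i else None
        feats.append(column.get(aa, 0.0))  # unseen or missing residue -> 0.0
    return feats

def spb_features(seq, pos_profile):
    """Single-profile encoding: positive (effector) profile only."""
    return encode(seq, pos_profile)

def bpb_features(seq, pos_profile, neg_profile):
    """Bi-profile encoding: positive features followed by negative features."""
    return encode(seq, pos_profile) + encode(seq, neg_profile)
```

Under this reading, the bi-profile vector is twice as long as the single-profile one, and either vector could then be passed to an SVM.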
Figure 5. Inter-species/group prediction of T4S effectors by three computational models with a leave-one-genus-out strategy. (A) Recall of known effectors in each species or group. Agr, Ana, Bar, Bor, Bru, Cox, Ehr, Hel, Leg and Och represent Agrobacterium, Anaplasma, Bartonella, Bordetella, Brucella, Coxiella, Ehrlichia, Helicobacter, Legionella, and Ochrobactrum, respectively. Type A and B represent the two types of T4SSs. (B) Prediction specificity of the different models in each species or group.
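A leave-one-genus-out evaluation of the kind named in this caption can be organised as in the sketch below. The record layout `(genus, sequence, label)` and the helper names are assumptions for illustration; model training is left out.

```python
def leave_one_genus_out(records):
    """Yield (held_out_genus, train, test) splits.

    `records` is a list of (genus, sequence, label) tuples, where label is
    1 for a known effector and 0 for a control protein (assumed layout).
    """
    genera = sorted({genus for genus, _, _ in records})
    for held_out in genera:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test

def recall(predictions, labels):
    """Fraction of true effectors (label 1) that were predicted as effectors."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    positives = sum(1 for y in labels if y == 1)
    return tp / positives if positives else 0.0
```

For each split, a model would be trained on `train` and its recall and specificity measured on `test`, so that no protein from the held-out genus is seen during training.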
T4S effectors predicted from
| Accession | Annotation | Joint | bpbAac | psAac |
| --- | --- | --- | --- | --- |
| | | √ | √ | √ |
| | | √ | √ | √ |
| | | √ | √ | √ |
| | | √ | √ | |
| | | √ | √ | √ |
| gi|15645728|ref|NP_207905.1| | Excinuclease ABC subunit B | √ | √ | |
| | | √ | √ | |
| | | √ | √ | √ |
| gi|15644995|ref|NP_207165.1| | Hypothetical protein HP0367 | √ | √ | |
| | | √ | √ | |
| gi|15645618|ref|NP_207794.1| | Hypothetical protein HP1003 | √ | √ | √ |
| gi|15645567|ref|NP_207743.1| | Putative recombination protein RecO | √ | | √ |
| | | √ | √ | |
| | | √ | √ | |
| | | √ | √ | √ |
| gi|15645647|ref|NP_207823.1| | Hypothetical protein HP1033 | √ | √ | |
| gi|15644672|ref|NP_206842.1| | Hypothetical protein HP0041 | √ | √ | √ |
| | | √ | √ | √ |
| gi|15646203|ref|NP_208145.1| | Hypothetical protein HP1353 | √ | √ | |
| gi|15645609|ref|NP_207785.1| | Hypothetical protein HP0994 | √ | √ | √ |
| gi|15644860|ref|NP_207030.1| | Hypothetical protein HP0232 | √ | √ | √ |
| | | √ | √ | |
| gi|15645351|ref|NP_207525.1| | Hypothetical protein HP0731 | √ | √ | |
| gi|15645518|ref|NP_207693.1| | Hydrogenase expression/formation protein (hypB) | √ | √ | √ |
| | | √ | √ | |
Note: ‘Joint’, ‘bpbAac’ and ‘psAac’ represent the ‘T4SEpre_Joint’, ‘T4SEpre_bpbAac’ and ‘T4SEpre_psAac’ models, respectively. Genes with one or more of the three motifs identified in this study are shown in italics.