Min-Gang Su1, Chien-Hsun Huang2, Tzong-Yi Lee1, Yu-Ju Chen3, Hsin-Yi Wu3.
Abstract
Beyond their role in pathogenesis, bacterial toxins have also been used for medical purposes, for example as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has a great impact on cell biology research and therapy development. However, experimental methods for identifying bacterial toxins are time-consuming and labor-intensive, which creates an urgent need for computational prediction. We were therefore motivated to develop a method for the computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins, comprising 77 exotoxins and 90 endotoxins, is used to train predictive models with support vector machines (SVMs). Cross-validation shows that the SVM models trained with amino acid composition and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid composition and dipeptide composition achieve accuracies of 95.71% and 92.86%, respectively. Incorporating functional domain information further improves predictive performance. On an independent dataset, the proposed method is shown to identify and classify bacterial toxins more effectively than the other two features, which may aid bacterial biomedical development.
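The two sequence-based features named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. It assumes the conventional definitions (the fraction of each of the 20 standard residues, and the fraction of each of the 400 ordered residue pairs); the function names and the handling of nonstandard residues are illustrative choices, not details taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def _standard_residues(seq):
    # Assumption: nonstandard symbols (X, B, Z, gaps) are simply dropped.
    return "".join(c for c in seq.upper() if c in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue fractions."""
    residues = _standard_residues(seq)
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair fractions."""
    residues = _standard_residues(seq)
    total = len(residues) - 1  # number of overlapping adjacent pairs
    if total < 1:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        counts[residues[i:i + 2]] += 1
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence, used only to show the vector sizes.
toy = "MKKLAAC"
print(len(amino_acid_composition(toy)), len(dipeptide_composition(toy)))
```

Each vector sums to 1 for any sequence with at least two standard residues, so proteins of different lengths map to feature vectors on a common scale, which is what allows them to be fed to a classifier such as an SVM.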
Year: 2014 PMID: 25110714 PMCID: PMC4109367 DOI: 10.1155/2014/972692
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Systematic workflow. It consists of the following steps: data collection and preprocessing, feature extraction, model learning and cross-validation, and independent testing.
Figure 2: Percent composition of the 20 amino acids in positive data (bacterial toxin proteins) and negative data (nontoxin proteins).
Five-fold cross-validation performance of basic features in prediction of bacterial toxins.
| Features | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| Amino acid composition | 92.81% | 99.56% | 97.75% |
| Dipeptide composition | 87.42% | 96.71% | 94.06% |
| Functional domain | 100.0% | 94.73% | 96.15% |
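The three columns reported in these tables can be computed from confusion-matrix counts. The sketch below assumes the standard definitions of sensitivity, specificity, and accuracy; the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn count positives predicted correctly/incorrectly,
    tn/fp count negatives predicted correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    sensitivity = tp / positives if positives else 0.0
    specificity = tn / negatives if negatives else 0.0
    total = positives + negatives
    accuracy = (tp + tn) / total if total else 0.0
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
sens, spec, acc = classification_metrics(tp=45, fn=5, tn=90, fp=10)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Because accuracy pools both classes, it can stay high even when sensitivity is low if negatives dominate the data, which is worth keeping in mind when comparing the rows of a table.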
Figure 3: Percent composition of the 20 amino acids in endotoxins and exotoxins.
Five-fold cross-validation performance of basic features in discrimination of endotoxins and exotoxins.
| Features | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| Amino acid composition | 93% | 93.93% | 94.02% |
| Dipeptide composition | 92.22% | 85.71% | 89.22% |
| Functional domain | 100.0% | 82.2% | 90.42% |
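Five-fold cross-validation partitions the data into five folds and tests on each fold in turn while training on the other four. The sketch below shows one way to build such folds for the 90 endotoxins and 77 exotoxins mentioned in the abstract. Stratifying by class and the fixed random seed are assumptions made for illustration; the source does not state how its folds were drawn.

```python
import random


def stratified_k_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_k_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Class sizes from the abstract: 90 endotoxins and 77 exotoxins.
labels = ["endotoxin"] * 90 + ["exotoxin"] * 77
for train, test in train_test_splits(labels):
    print(len(train), len(test))
```

The metrics from each test fold are then typically averaged or pooled to give a single cross-validation figure per feature set.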
Figure 4: Probability difference of 20 × 20 amino acid pairs between bacterial toxin proteins and nontoxin proteins. An amino acid pair in a red box is overrepresented in bacterial toxin proteins; a pair in a green box is underrepresented.
Figure 5: Probability difference of 20 × 20 amino acid pairs between endotoxins and exotoxins. An amino acid pair in a red box is overrepresented in endotoxins; a pair in a green box is overrepresented in exotoxins.
Statistics of InterPro functional annotations in 167 bacterial toxin proteins. InterPro classifies sequences at the superfamily, family, and subfamily levels and annotates the occurrence of functional domains, repeats, and important sites. Annotations that occur in at least five bacterial toxins are listed with their InterPro ID, description, and number of bacterial toxin proteins.
| InterPro ID | Description | Number of bacterial toxin proteins |
|---|---|---|
| IPR005639 | Endotoxin_N | 84 |
| IPR008979 | Galactose-bd-like | 83 |
| IPR005638 | Endotoxin_C | 82 |
| IPR001178 | Endotoxin_cen_dom | 75 |
| IPR015790 | Endotoxin_cen_dom_subgr1 | 68 |
| IPR006123 | Toxin_b-grasp_Staph/Strep | 15 |
| IPR006126 | Staph/Strept_toxin_CS | 15 |
| IPR006173 | Staph_tox_OB | 15 |
| IPR008992 | Enterotoxin_bac | 15 |
| IPR016091 | SuperAg_toxin_C | 15 |
| IPR006177 | Toxin_bac | 14 |
| IPR013307 | Superantigen_bac | 14 |
| IPR003995 | RTX_toxin_determinant-A | 13 |
| IPR011049 | Serralysin-like_metalloprot_C | 13 |
| IPR018504 | RTX_N | 13 |
| IPR001343 | Hemolysn_Ca-bd | 12 |
| IPR001340 | Leukocidin/haemolysin_toxin | 11 |
| IPR001489 | Heat-stable_enterotox_STa | 11 |
| IPR008985 | ConA-like_lec_gl_sf | 11 |
| IPR013320 | ConA-like_subgrp | 11 |
| IPR018511 | Hemolysin-typ_Ca-bd_CS | 11 |
| IPR019806 | Heat-stable_enterotox_CS | 11 |
| IPR000395 | Neurotox_Zn_protease | 9 |
| IPR001869 | Thiol_cytolysin | 9 |
| IPR011065 | Kunitz_inhibitor_ST1-like | 9 |
| IPR012500 | Toxin_trans | 9 |
| IPR012928 | Toxin_rcpt-bd_N | 9 |
| IPR013104 | Toxin_rcpt-bd_C | 9 |
| IPR013550 | RTX_C | 9 |
| IPR015214 | Endotoxin_cen_dom_subgr2 | 8 |
| IPR001615 | Endotoxin_CytB | 7 |
| IPR003963 | Bi-component_toxin_staph | 7 |
| IPR003842 | Vacuolating_cytotoxin | 6 |
| IPR005015 | Thermostable_hemolysn_vibrio | 6 |
| IPR005546 | Auto_transptbeta | 6 |
| IPR008947 | PLipase_C/P1_nuclease | 6 |
| IPR001024 | PLAT/LH2_dom | 5 |
| IPR001144 | Enterotoxin_A | 5 |
| IPR001531 | PLipaseC_domain | 5 |
| IPR008976 | Lipase_LipOase | 5 |
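One common way to turn annotations like these into classifier input is a binary presence/absence vector over a fixed vocabulary of InterPro IDs. The sketch below illustrates that idea only; the paper's exact encoding is not given in this excerpt, and the protein names are hypothetical. The InterPro IDs are taken from the table above.

```python
def build_domain_vocabulary(annotations):
    """Collect every InterPro ID seen in the data, in sorted order."""
    return sorted({ipr for domains in annotations.values() for ipr in domains})


def encode_domains(domains, vocabulary):
    """Return a 0/1 vector marking which vocabulary entries a protein carries."""
    return [1 if ipr in domains else 0 for ipr in vocabulary]


# Hypothetical proteins annotated with InterPro IDs from the table above.
annotations = {
    "toxin_a": {"IPR005639", "IPR005638", "IPR001178"},  # endotoxin-type domains
    "toxin_b": {"IPR006123", "IPR006173"},               # Staph/Strep toxin domains
}
vocabulary = build_domain_vocabulary(annotations)
vectors = {name: encode_domains(doms, vocabulary) for name, doms in annotations.items()}
print(vocabulary)
print(vectors)
```

A protein with no annotated domain maps to the all-zero vector under this encoding.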
Predictive performance of basic features for distinguishing bacterial toxins from nontoxins on an independent test dataset.
| Features | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| Amino acid composition | 30.99% | 93.95% | 78.17% |
| Dipeptide composition | 27.31% | 88.89% | 73.45% |
| Functional domain | 99.63% | 93.95% | 95.37% |
Predictive performance of basic features for distinguishing endotoxins from exotoxins on an independent test dataset.
| Features | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| Amino acid composition | 0% | 94.49% | 38.01% |
| Dipeptide composition | 11.72% | 89% | 43.17% |
| Functional domain | 94.44% | 73.39% | 85.98% |
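A sensitivity of 0% alongside an accuracy of 38.01% can look contradictory. Accuracy is a weighted average of sensitivity and specificity, with the weights given by the class proportions in the test set. The sketch below, which assumes the standard metric definitions, rearranges that identity to recover the positive-class fraction implied by each row of the table above. It is a reader's consistency check and not an analysis from the paper.

```python
def implied_positive_fraction(sensitivity, specificity, accuracy):
    """Solve accuracy = f * sensitivity + (1 - f) * specificity for f."""
    if sensitivity == specificity:
        raise ValueError("fraction is undetermined when sensitivity equals specificity")
    return (accuracy - specificity) / (sensitivity - specificity)


# (sensitivity, specificity, accuracy) in percent, copied from the table above.
rows = {
    "Amino acid composition": (0.0, 94.49, 38.01),
    "Dipeptide composition": (11.72, 89.0, 43.17),
    "Functional domain": (94.44, 73.39, 85.98),
}
for feature, (sens, spec, acc) in rows.items():
    print(feature, round(implied_positive_fraction(sens, spec, acc), 3))
```

All three rows give a fraction of roughly 0.59 to 0.60. The reported figures are therefore mutually consistent with a test set in which about three fifths of the proteins belong to the class counted as positive, so a model that misses every positive can still score about 38% accuracy.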