Salman Sadullah Usmani¹,², Sherry Bhalla¹, Gajendra P. S. Raghava¹,².
Abstract
Tuberculosis is one of the leading causes of death worldwide, particularly because of the evolution of drug-resistant strains. Antitubercular peptides may provide an alternative approach to combating antibiotic tolerance. Sequence analysis reveals that certain residues (e.g., lysine, arginine, leucine, and tryptophan) are more prevalent in antitubercular peptides. This study describes models developed to predict antitubercular peptides from their sequence features. We developed support vector machine (SVM) based models using different sequence features, such as amino acid composition (AAC), a binary profile of terminal residues, and dipeptide composition. Our ensemble classifier, which combines models based on AAC and the N5C5 binary pattern, achieves an accuracy (Acc) of 73.20% with an area under the receiver operating characteristic curve (AUROC) of 0.80 on our main dataset. Similarly, the ensemble classifier achieves an Acc of 75.62% with an AUROC of 0.83 on the secondary dataset. In addition, the hybrid model achieves Acc of 75.87% and 78.54% with AUROC of 0.83 and 0.86 on the main and secondary datasets, respectively. To help the scientific community design antitubercular peptides, we implemented the above models in a user-friendly webserver (http://webs.iiitd.edu.in/raghava/antitbpred/).
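A minimal Python sketch of two of the composition features named in the abstract, assuming AAC and dipeptide composition are expressed as percentages over the 20 standard residues and the 400 overlapping dipeptides; the exact normalization used by the webserver is not stated here. The function names `aac` and `dpc` and the toy peptide are illustrative only.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); fixes the feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: percentage of each standard residue in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: percentage of each of the 400 overlapping dipeptides."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]  # overlapping windows
    return [100.0 * pairs.count(d) / len(pairs) for d in DIPEPTIDES]


if __name__ == "__main__":
    peptide = "GLKKLLKW"  # hypothetical toy peptide, not from the dataset
    comp = dict(zip(AMINO_ACIDS, aac(peptide)))
    print(comp["K"], comp["L"])  # 37.5 37.5
```

AAC gives a fixed 20-dimensional vector regardless of peptide length, which is what makes it usable as SVM input; dipeptide composition adds local ordering at the cost of a sparser 400-dimensional vector.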
Keywords: Mycobacterium; antimycobacterial therapy; antitubercular peptides; drug discovery; ensemble classifier; machine learning; tuberculosis
Year: 2018 PMID: 30210341 PMCID: PMC6121089 DOI: 10.3389/fphar.2018.00954
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Figure 1. Construction of the positive and negative datasets used to develop machine-learning models for predicting antitubercular peptides.
Figure 2. Comparison of the percent amino acid composition of antitubercular, antibacterial, and non-antibacterial peptides.
Figure 3. Comparison of residue preference at the N-terminus of (A) antitubercular, (B) antibacterial, and (C) non-antibacterial peptides.
Figure 4. Comparison of residue preference at the C-terminus of (A) antitubercular, (B) antibacterial, and (C) non-antibacterial peptides.
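Performance of models built with different machine-learning techniques on the AntiTb_RD dataset using the amino acid composition (AAC) of peptides (SVM, support vector machine; RF, random forest; SMO, sequential minimal optimization; NB, naive Bayes; J48, C4.5 decision tree).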
| Method | Data | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|
| SVM | Train | 78.39 | 84.42 | 81.41 | 0.63 | 0.85 |
| SVM | Valid | 65.96 | 93.62 | 79.79 | 0.62 | 0.88 |
| RF | Train | 74.87 | 74.87 | 74.87 | 0.50 | 0.85 |
| RF | Valid | 87.23 | 91.49 | 89.36 | 0.79 | 0.94 |
| SMO | Train | 75.88 | 80.40 | 78.14 | 0.56 | 0.78 |
| SMO | Valid | 80.85 | 82.98 | 81.91 | 0.64 | 0.82 |
| NB | Train | 67.84 | 90.45 | 79.15 | 0.60 | 0.84 |
| NB | Valid | 63.83 | 93.62 | 78.72 | 0.60 | 0.87 |
| J48 | Train | 67.84 | 75.38 | 71.61 | 0.43 | 0.75 |
| J48 | Valid | 82.98 | 80.85 | 81.91 | 0.64 | 0.88 |
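The threshold-dependent columns in these tables follow the usual confusion-matrix definitions. A minimal sketch that computes them from counts; the counts in the usage lines (156 TP, 43 FN, 168 TN, 31 FP, i.e. 199 positives and 199 negatives) are an assumption, chosen because they reproduce the SVM training row of the AntiTb_RD AAC table. The section itself does not state the set sizes, and `threshold_metrics` is a hypothetical name.

```python
import math


def threshold_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity and accuracy (in %) plus Matthews correlation coefficient."""
    sens = 100.0 * tp / (tp + fn)  # fraction of antitubercular peptides recovered
    spec = 100.0 * tn / (tn + fp)  # fraction of negatives correctly rejected
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when a margin is empty
    return {"Sens": sens, "Spec": spec, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Assumed counts: 199 positives and 199 negatives (hypothetical, see lead-in).
    m = threshold_metrics(tp=156, fn=43, tn=168, fp=31)
    print({k: round(v, 2) for k, v in m.items()})
    # {'Sens': 78.39, 'Spec': 84.42, 'Acc': 81.41, 'MCC': 0.63}
```

With equally sized positive and negative sets, accuracy is simply the mean of sensitivity and specificity, which is why most Acc values in these tables sit halfway between the Sens and Spec columns.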
Performance of models built with different machine-learning techniques on the AntiTb_RD dataset using binary patterns of peptide segments taken from the N- and C-termini.
| Method | Data | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|
| SVM | Train | 72.86 | 81.91 | 77.39 | 0.55 | 0.82 |
| SVM | Valid | 70.21 | 89.36 | 79.79 | 0.61 | 0.88 |
| RF | Train | 73.87 | 78.39 | 76.13 | 0.52 | 0.82 |
| RF | Valid | 72.34 | 89.36 | 80.85 | 0.63 | 0.89 |
| SMO | Train | 70.85 | 80.40 | 75.63 | 0.51 | 0.76 |
| SMO | Valid | 74.47 | 91.49 | 82.98 | 0.67 | 0.83 |
| NB | Train | 62.81 | 89.45 | 76.13 | 0.54 | 0.82 |
| NB | Valid | 68.09 | 97.87 | 82.98 | 0.69 | 0.91 |
| J48 | Train | 72.36 | 66.33 | 69.35 | 0.39 | 0.68 |
| J48 | Valid | 70.21 | 63.83 | 67.02 | 0.34 | 0.68 |
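A minimal sketch of an N5C5 binary profile, assuming each of the first five and last five residues is one-hot encoded over the 20 standard amino acids (20 × 10 = 200 bits) and the segments are concatenated N-terminus first; the exact encoding, and how the original work handles peptides shorter than ten residues, may differ. `terminal_binary_profile` is a hypothetical name.

```python
# The 20 standard amino acids (one-letter codes); fixes the one-hot column order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> list[int]:
    """20-bit vector with a single 1 at the residue's position (all zeros if non-standard)."""
    return [1 if residue == a else 0 for a in AMINO_ACIDS]


def terminal_binary_profile(seq: str, n: int = 5, c: int = 5) -> list[int]:
    """Binary profile of the first n and last c residues (N5C5 by default): 20*(n+c) bits."""
    seq = seq.upper()
    if n < 1 or c < 1 or len(seq) < max(n, c):
        raise ValueError("sequence shorter than the requested terminal window")
    segment = seq[:n] + seq[len(seq) - c:]  # N-terminal window, then C-terminal window
    profile: list[int] = []
    for residue in segment:
        profile.extend(one_hot(residue))
    return profile


if __name__ == "__main__":
    bits = terminal_binary_profile("GLKKLLKWAR")  # hypothetical 10-residue peptide
    print(len(bits), sum(bits))  # 200 10
```

Unlike AAC, this encoding keeps residue position, so it can capture terminal preferences such as those compared in Figures 3 and 4; for peptides shorter than ten residues the two windows overlap.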
Performance of the SVM-based ensemble of AAC and N5C5 binary-pattern models on the AntiTb_RD dataset for five different training and validation datasets, with average results.
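| Run | Train Sens (%) | Train Spec (%) | Train Acc (%) | Train MCC | Train AUROC | Valid Sens (%) | Valid Spec (%) | Valid Acc (%) | Valid MCC | Valid AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| Run 1 | 69.19 | 88.38 | 78.79 | 0.59 | 0.86 | 62.50 | 79.17 | 70.83 | 0.42 | 0.78 |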
| Run 2 | 69.70 | 86.87 | 78.28 | 0.57 | 0.87 | 70.83 | 79.17 | 75.00 | 0.50 | 0.85 |
| Run 3 | 69.19 | 87.37 | 78.28 | 0.58 | 0.86 | 72.92 | 83.33 | 78.12 | 0.57 | 0.81 |
| Run 4 | 64.65 | 80.30 | 72.47 | 0.46 | 0.82 | 62.50 | 83.33 | 72.92 | 0.47 | 0.82 |
| Run 5 | 71.21 | 87.88 | 79.55 | 0.60 | 0.86 | 77.08 | 85.42 | 81.25 | 0.63 | 0.89 |
| Average | 68.79 | 86.16 | 77.47 | 0.56 | 0.85 | 69.17 | 82.08 | 75.62 | 0.52 | 0.83 |
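The section does not spell out how the AAC and N5C5 models are merged into the ensemble. A minimal sketch, assuming each SVM outputs a score in [0, 1] for every peptide and the ensemble takes a weighted mean of the two scores before applying a decision threshold; the equal weight, the 0.5 threshold, and the function names are hypothetical.

```python
from typing import Sequence


def ensemble_scores(aac_scores: Sequence[float], n5c5_scores: Sequence[float],
                    weight: float = 0.5) -> list[float]:
    """Weighted mean of the AAC-model and N5C5-model scores for each peptide."""
    if len(aac_scores) != len(n5c5_scores):
        raise ValueError("both models must score the same peptides")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(aac_scores, n5c5_scores)]


def classify(scores: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Label a peptide antitubercular (1) when its combined score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    # Hypothetical scores for three peptides from the two base models.
    combined = ensemble_scores([0.9, 0.2, 0.6], [0.5, 0.4, 0.3])
    print(classify(combined))  # [1, 0, 0]
```

Averaging scores lets a confident prediction from one model offset a borderline prediction from the other; a majority vote over only two models would instead need an explicit tie-breaking rule.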
Performance of the SVM model based on hybrid features (AAC combined with the N5C5 binary pattern) on the AntiTb_RD dataset for five different training and validation datasets, with average results.
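| Run | Train Sens (%) | Train Spec (%) | Train Acc (%) | Train MCC | Train AUROC | Valid Sens (%) | Valid Spec (%) | Valid Acc (%) | Valid MCC | Valid AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| Run 1 | 78.28 | 83.84 | 81.06 | 0.62 | 0.88 | 70.83 | 87.50 | 79.17 | 0.59 | 0.85 |
| Run 2 | 78.28 | 86.36 | 82.32 | 0.65 | 0.88 | 70.83 | 79.17 | 75.00 | 0.50 | 0.82 |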
| Run 3 | 80.81 | 83.84 | 82.32 | 0.65 | 0.87 | 77.08 | 81.25 | 79.17 | 0.58 | 0.86 |
| Run 4 | 74.24 | 82.32 | 78.28 | 0.57 | 0.85 | 70.83 | 81.25 | 76.04 | 0.52 | 0.84 |
| Run 5 | 81.82 | 86.87 | 84.34 | 0.69 | 0.88 | 77.08 | 89.58 | 83.33 | 0.67 | 0.92 |
| Average | 78.68 | 84.64 | 81.66 | 0.64 | 0.87 | 73.33 | 83.75 | 78.54 | 0.57 | 0.86 |
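Unlike the ensemble, the hybrid model trains a single SVM on one combined feature vector. A minimal sketch, assuming the hybrid vector is the 20 AAC percentages followed by the 200 N5C5 bits (220 features); the ordering and any feature scaling used in the original work are not stated here, and `hybrid_features` is a hypothetical name.

```python
# The 20 standard amino acids (one-letter codes); fixes the feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> list[float]:
    """Percentage of each standard residue in the sequence (20 values)."""
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def n5c5(seq: str, k: int = 5) -> list[int]:
    """One-hot encoding of the first k and last k residues (20 * 2k bits)."""
    segment = seq[:k] + seq[len(seq) - k:]
    return [1 if r == a else 0 for r in segment for a in AMINO_ACIDS]


def hybrid_features(seq: str) -> list[float]:
    """Concatenate AAC (20 values) and N5C5 (200 bits) into one 220-dimensional vector."""
    seq = seq.upper()
    if len(seq) < 5:
        raise ValueError("peptide must have at least 5 residues")
    return aac(seq) + [float(b) for b in n5c5(seq)]


if __name__ == "__main__":
    print(len(hybrid_features("GLKKLLKWAR")))  # 220 (hypothetical peptide)
```

Because AAC values are percentages while N5C5 entries are 0 or 1, rescaling the two blocks to comparable ranges before training an SVM is usually advisable.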
Performance of models built with different machine-learning techniques on the AntiTb_MD dataset using the amino acid composition (AAC) of peptides.
| Method | Data | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|
| SVM | Train | 78.39 | 70.35 | 74.37 | 0.49 | 0.78 |
| SVM | Valid | 83.33 | 77.08 | 80.21 | 0.61 | 0.86 |
| RF | Train | 75.88 | 77.39 | 76.63 | 0.53 | 0.84 |
| RF | Valid | 72.92 | 72.92 | 72.92 | 0.46 | 0.78 |
| SMO | Train | 74.37 | 74.37 | 74.37 | 0.49 | 0.74 |
| SMO | Valid | 83.33 | 87.50 | 85.42 | 0.71 | 0.85 |
| NB | Train | 58.79 | 77.39 | 68.09 | 0.37 | 0.74 |
| NB | Valid | 50.00 | 85.42 | 67.71 | 0.38 | 0.73 |
| J48 | Train | 74.37 | 73.37 | 73.87 | 0.48 | 0.76 |
| J48 | Valid | 70.83 | 70.83 | 70.83 | 0.42 | 0.74 |
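AUROC, reported in the last column of these tables, equals the probability that a randomly chosen antitubercular peptide receives a higher score than a randomly chosen negative peptide, with ties counting one half. A minimal sketch of that rank-based definition; the scores in the usage line are hypothetical. In practice, scikit-learn's `roc_auc_score` computes the same quantity.

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical SVM scores for three positives and two negatives.
    print(round(auroc([0.9, 0.8, 0.4], [0.3, 0.5]), 3))  # 0.833
```

The pairwise loop is quadratic, which is harmless at the scale of a few hundred peptides; unlike Sens, Spec, and Acc, the result does not depend on any decision threshold.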
Performance of models built with different machine-learning techniques on the AntiTb_MD dataset using binary patterns of peptide segments taken from the N- and C-termini.
| Method | Data | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|
| SVM | Train | 69.85 | 76.88 | 73.37 | 0.47 | 0.81 |
| SVM | Valid | 75.00 | 72.92 | 73.96 | 0.48 | 0.80 |
| RF | Train | 80.00 | 72.36 | 72.36 | 0.45 | 0.78 |
| RF | Valid | 77.08 | 66.67 | 71.88 | 0.44 | 0.75 |
| SMO | Train | 67.34 | 72.36 | 69.85 | 0.40 | 0.70 |
| SMO | Valid | 72.92 | 81.25 | 77.08 | 0.54 | 0.77 |
| NB | Train | 56.28 | 78.89 | 67.59 | 0.36 | 0.73 |
| NB | Valid | 53.27 | 84.42 | 68.84 | 0.40 | 0.73 |
| J48 | Train | 66.33 | 63.82 | 65.08 | 0.30 | 0.68 |
| J48 | Valid | 68.75 | 70.83 | 69.79 | 0.40 | 0.72 |
Performance of the SVM-based ensemble of AAC and N5C5 binary-pattern models on the AntiTb_MD dataset for five different training and validation datasets, with average results.
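| Run | Train Sens (%) | Train Spec (%) | Train Acc (%) | Train MCC | Train AUROC | Valid Sens (%) | Valid Spec (%) | Valid Acc (%) | Valid MCC | Valid AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| Run 1 | 82.83 | 76.14 | 79.49 | 0.59 | 0.85 | 72.92 | 67.35 | 70.10 | 0.40 | 0.78 |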
| Run 2 | 80.30 | 73.60 | 76.96 | 0.54 | 0.85 | 77.08 | 77.55 | 77.32 | 0.55 | 0.82 |
| Run 3 | 78.79 | 73.60 | 76.20 | 0.52 | 0.84 | 85.42 | 51.02 | 68.04 | 0.39 | 0.72 |
| Run 4 | 80.81 | 70.56 | 75.70 | 0.52 | 0.83 | 75.00 | 73.47 | 74.23 | 0.48 | 0.82 |
| Run 5 | 78.28 | 70.56 | 74.43 | 0.49 | 0.81 | 83.33 | 69.39 | 76.29 | 0.53 | 0.84 |
| Average | 80.20 | 72.89 | 76.56 | 0.53 | 0.83 | 78.75 | 67.76 | 73.20 | 0.47 | 0.80 |
Performance of the SVM model based on hybrid features (AAC combined with the N5C5 binary pattern) on the AntiTb_MD dataset for five different training and validation datasets, with average results.
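| Run | Train Sens (%) | Train Spec (%) | Train Acc (%) | Train MCC | Train AUROC | Valid Sens (%) | Valid Spec (%) | Valid Acc (%) | Valid MCC | Valid AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| Run 1 | 79.29 | 73.68 | 78.99 | 0.58 | 0.85 | 70.83 | 71.43 | 71.13 | 0.42 | 0.81 |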
| Run 2 | 77.78 | 79.70 | 78.17 | 0.57 | 0.82 | 60.42 | 91.84 | 76.29 | 0.55 | 0.82 |
| Run 3 | 75.76 | 78.17 | 76.96 | 0.54 | 0.83 | 85.42 | 61.22 | 73.20 | 0.48 | 0.80 |
| Run 4 | 75.76 | 77.16 | 76.46 | 0.53 | 0.81 | 72.92 | 83.67 | 78.35 | 0.57 | 0.85 |
| Run 5 | 74.24 | 77.66 | 75.95 | 0.52 | 0.79 | 85.42 | 75.51 | 80.41 | 0.61 | 0.88 |
| Average | 76.76 | 77.27 | 77.48 | 0.55 | 0.82 | 75.02 | 76.73 | 75.87 | 0.52 | 0.83 |
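Each of the run tables reports five training/validation partitions and their average. A minimal sketch, assuming each run is an independent stratified random 80/20 split of the positive and negative peptides (the section does not state the exact procedure or fraction) and that the Average row is the arithmetic mean of the five runs. For example, the mean of the five AntiTb_RD ensemble validation accuracies (70.83, 75.00, 78.12, 72.92, 81.25) rounds to the reported 75.62. `stratified_split` and the placeholder IDs are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def stratified_split(pos: Sequence[str], neg: Sequence[str],
                     valid_frac: float = 0.2, seed: int = 0):
    """Shuffle each class separately and hold out valid_frac of it for validation."""
    rng = random.Random(seed)  # per-run seed makes each split reproducible

    def split(items: Sequence[str]):
        items = list(items)
        rng.shuffle(items)
        n_valid = round(len(items) * valid_frac)
        return items[n_valid:], items[:n_valid]

    pos_train, pos_valid = split(pos)
    neg_train, neg_valid = split(neg)
    train = [(s, 1) for s in pos_train] + [(s, 0) for s in neg_train]
    valid = [(s, 1) for s in pos_valid] + [(s, 0) for s in neg_valid]
    return train, valid


if __name__ == "__main__":
    pos = [f"POS{i}" for i in range(10)]  # placeholder peptide IDs
    neg = [f"NEG{i}" for i in range(10)]
    runs = [stratified_split(pos, neg, seed=k) for k in range(5)]  # five runs
    print([len(valid) for _, valid in runs])  # [4, 4, 4, 4, 4]
    # Averaging a per-run metric, e.g. AntiTb_RD ensemble validation accuracy:
    print(round(mean([70.83, 75.00, 78.12, 72.92, 81.25]), 2))  # 75.62
```

Splitting each class separately keeps the positive-to-negative ratio the same in every training and validation set, so per-run metrics remain comparable across runs.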
p-values for differences in AUROC between methods, computed with the Wilcoxon rank-sum test.
| No. | Dataset | Method A | Method B | p-value |
|---|---|---|---|---|
| 1 | AntiTb_RD | Ensemble | SVM based on AAC | 0.73 |
| 2 | AntiTb_RD | Ensemble | SVM based on N5C5 binary patterns | 0.01 |
| 3 | AntiTb_RD | Ensemble | SVM based on hybrid features | 0.1 |
| 4 | AntiTb_MD | Ensemble | SVM based on AAC | 0.03 |
| 5 | AntiTb_MD | Ensemble | SVM based on N5C5 binary patterns | 0.01 |
| 6 | AntiTb_MD | Ensemble | SVM based on hybrid features | 0.52 |
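A minimal sketch of the two-sided Wilcoxon rank-sum test using the large-sample normal approximation (average ranks for ties, no continuity or tie correction), which matches what SciPy's `scipy.stats.ranksums` computes. This section does not state which AUROC samples were compared, so the sketch is not meant to reproduce the p-values above; with only five values per group, an exact test (for example `scipy.stats.mannwhitneyu` with `method="exact"`) may be more appropriate. The function names are hypothetical.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                             # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2                 # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # its standard deviation under H0
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))            # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    z, p = rank_sum_test([1, 2, 3], [4, 5, 6])  # toy samples, fully separated
    print(round(z, 3), round(p, 3))
```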