Gandharva Nagpal, Salman Sadullah Usmani, Sandeep Kumar Dhanda, Harpreet Kaur, Sandeep Singh, Meenu Sharma, Gajendra P S Raghava.
Abstract
In the past, numerous methods have been developed to predict MHC class II binders or T-helper epitopes for designing epitope-based vaccines against pathogens. In contrast, few attempts have been made to develop methods for predicting T-helper epitopes/peptides that can induce a specific type of cytokine. This paper describes a method for predicting peptides that induce interleukin-10 (IL-10), a cytokine responsible for suppressing the immune system. All models were trained and tested on 394 experimentally validated IL-10 inducing peptides and 848 non-inducing peptides. Certain types of residues and motifs were observed to be more frequent in IL-10 inducing peptides than in non-inducing peptides. Based on this analysis, we developed composition-based models using various machine-learning techniques. The Random Forest-based model developed using dipeptide composition achieved the maximum Matthews correlation coefficient (MCC) of 0.59, with an accuracy of 81.24%. To serve the community, we developed a web server, "IL-10pred", standalone packages, and a mobile app for designing IL-10 inducing peptides (http://crdd.osdd.net/raghava/IL-10pred/).
Year: 2017 PMID: 28211521 PMCID: PMC5314457 DOI: 10.1038/srep42851
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Role of different types of immune cells in production of interleukin-10.
Figure 2. A schematic diagram of immunosuppressive mechanism of Interleukin-10.
It mainly involves dendritic cells (DC), major histocompatibility complex (MHC), phosphatidylinositol 3-kinase (PI3-K) and immunoglobulin.
Figure 3. Visualization of residues conserved in IL-10 inducing and non-inducing peptides using two-sample logo.
Figure 4. Bar graph shows average amino acid composition of IL-10 inducing and non-inducing peptides.
Exclusive motifs found in IL-10 inducing and non-inducing peptides; motifs searched using MERCI program.
| IL-10 inducing peptides | | | | IL-10 non-inducing peptides | | | |
|---|---|---|---|---|---|---|---|
| Motif | # of sequences | Coverage of positive dataset | # of unique Sequences | Motif | # of sequences | Coverage of negative dataset | # of unique Sequences |
| R-D-H | 12 | 12 | 12 | A-T-A-A-T | 32 | 32 | 32 |
| L-A-E-Y | 11 | 23 | 11 | V-W-Q | 26 | 58 | 26 |
| I-F-L-V | 10 | 33 | 10 | PG-P-G | 25 | 83 | 25 |
| G-A-Q-G-K | 10 | 43 | 10 | K-P-G-D | 22 | 104 | 21 |
| H-F-T | 10 | 52 | 9 | KDV | 21 | 124 | 20 |
| E-V-C-G | 10 | 61 | 9 | A-G-A-T-A | 27 | 143 | 19 |
| R-L-K-V-A | 10 | 69 | 8 | V-GP | 25 | 163 | 20 |
| PLL | 9 | 78 | 9 | EA-A-T | 24 | 181 | 18 |
| I-K-R-K | 9 | 87 | 9 | A-VA-V | 23 | 199 | 18 |
| E-R-V-V | 9 | 95 | 8 | VP-K | 23 | 217 | 18 |
The performance of SVM based models developed using different peptide features.
| Features | Threshold | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|
| Whole peptide length | | | | | |
| AAC | −0.5 | 70.05 | 73.35 | 72.30 | 0.41 |
| DPC | −0.3 | 79.95 | 77.71 | 78.42 | 0.55 |
| split-AAC | −0.6 | 70.05 | 75.35 | 73.67 | 0.43 |
| split-DPC | −0.4 | 67.77 | 75.00 | 72.71 | 0.41 |
| NT8 | | | | | |
| AAC | 0.3 | 63.20 | 63.56 | 63.45 | 0.25 |
| DPC | −0.4 | 67.01 | 66.63 | 66.75 | 0.32 |
| Binary | −0.2 | 64.72 | 68.28 | 67.15 | 0.31 |
| CT8 | | | | | |
| AAC | −0.2 | 62.69 | 64.39 | 63.85 | 0.25 |
| DPC | −0.4 | 67.77 | 64.03 | 65.22 | 0.30 |
| Binary | −0.3 | 63.20 | 62.74 | 62.88 | 0.24 |
| NT8CT8 | | | | | |
| AAC | −0.5 | 70.05 | 69.46 | 69.65 | 0.37 |
| DPC | −0.3 | 77.92 | 78.42 | 78.26 | 0.54 |
| Binary | −0.4 | 68.27 | 64.03 | 65.38 | 0.30 |
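
The AAC and DPC features listed in the table above can be illustrated with a short sketch. It assumes the commonly used definitions (percentage of each of the 20 residues, and percentage of each of the 400 overlapping residue pairs), and it assumes that NT8 and CT8 denote the first and last eight residues of a peptide; neither definition is spelled out in this excerpt. The function names and the example peptide are hypothetical and are not taken from the IL-10pred code.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(peptide):
    """Amino acid composition: percentage of each of the 20 residues."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return [100.0 * peptide.count(a) / len(peptide) for a in AMINO_ACIDS]


def dpc(peptide):
    """Dipeptide composition: percentage of each overlapping residue pair."""
    if len(peptide) < 2:
        raise ValueError("peptide must have at least two residues")
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    return [100.0 * pairs.count(d) / len(pairs) for d in DIPEPTIDES]


def n_terminal(peptide, k=8):
    """First k residues (assumed meaning of NT8 for k=8)."""
    return peptide[:k]


def c_terminal(peptide, k=8):
    """Last k residues (assumed meaning of CT8 for k=8)."""
    return peptide[-k:]


if __name__ == "__main__":
    example = "MKTAYIAKQRQISFVK"  # hypothetical peptide, not from the dataset
    print(len(aac(example)), len(dpc(example)))
    # Assumed NT8CT8 construction: join both terminal windows, then featurize.
    joined = n_terminal(example) + c_terminal(example)
    print(joined, len(dpc(joined)))
```

Under these assumptions every peptide maps to a fixed-length vector (20 values for AAC, 400 for DPC) regardless of its length, which is what allows peptides of different lengths to be fed to the same classifier.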
The performance of models based on different classifiers developed using amino acid and dipeptide composition; classifiers implemented using WEKA.
| Classifier | Threshold | Sensitivity | Specificity | Accuracy | MCC | Parameters |
|---|---|---|---|---|---|---|
| Amino Acid Composition (AAC) | | | | | | |
| IBK | 0.3 | 71.83 | 74.29 | 73.51 | 0.44 | -K 6 |
| SMO | 0.5 | 44.42 | 88.33 | 74.40 | 0.37 | -C 5 -G 0.001 |
| J48 | 0.2 | 66.50 | 69.10 | 68.28 | 0.34 | -C 0.4 -M 9 |
| Random forest | 0.3 | 80.46 | 79.95 | 80.11 | 0.58 | -I 300 |
| Dipeptide Composition (DPC) | | | | | | |
| IBK | 0.2 | 76.40 | 76.18 | 76.25 | 0.50 | -K 3 |
| SMO | 0.5 | 56.35 | 89.03 | 78.66 | 0.49 | -C 5 -G 0.001 |
| J48 | 0.1 | 67.26 | 67.10 | 67.15 | 0.32 | -C 0.4 -M 2 |
| Random forest | 0.3 | 79.70 | 81.96 | 81.24 | 0.59 | -I 600 |
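
The sensitivity, specificity, accuracy, and MCC columns are all functions of the confusion matrix, so the Random Forest DPC row can be cross-checked against the dataset sizes given in the abstract (394 inducing, 848 non-inducing peptides). The sketch below uses the standard metric definitions. The confusion counts are not reported in this excerpt; they are back-calculated here by rounding the reported rates to whole peptides, which is an assumption.

```python
from math import sqrt

N_POS, N_NEG = 394, 848  # dataset sizes stated in the abstract


def metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy (all in %) and MCC."""
    sens = 100.0 * tp / (tp + fn)
    spec = 100.0 * tn / (tn + fp)
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, acc, mcc


# Assumption: counts inferred from the reported 79.70% sensitivity and
# 81.96% specificity of the Random Forest DPC model.
tp = round(0.7970 * N_POS)
tn = round(0.8196 * N_NEG)
sens, spec, acc, mcc = metrics(tp, N_POS - tp, tn, N_NEG - tn)

if __name__ == "__main__":
    print(f"TP={tp} FN={N_POS - tp} TN={tn} FP={N_NEG - tn}")
    print(f"Sens={sens:.2f} Spec={spec:.2f} Acc={acc:.2f} MCC={mcc:.2f}")
```

With the inferred counts, the values rounded to two decimals agree with the reported row (79.70, 81.96, 81.24, 0.59), which suggests the table entries are mutually consistent. Because the classes are imbalanced, MCC is the more informative summary: the SMO rows show high specificity and reasonable accuracy despite low sensitivity, and their MCC is correspondingly lower.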
The performance of models based on WEKA classifiers developed using selected features obtained from amino acid and dipeptide composition.
| Classifier | Threshold | Sensitivity | Specificity | Accuracy | MCC | Parameters |
|---|---|---|---|---|---|---|
| 16 selected features from amino acid composition | | | | | | |
| IBK | 0.3 | 70.30 | 70.99 | 70.77 | 0.39 | -K 6 |
| SMO | 0.5 | 37.60 | 88.92 | 72.46 | 0.31 | -C 5 -G 0.001 |
| J48 | 0.2 | 66.50 | 69.10 | 68.28 | 0.34 | -C 0.3 -M 9 |
| Random forest | 0.3 | 79.95 | 78.42 | 78.90 | 0.55 | -I 700 |
| 57 selected features from dipeptide composition | | | | | | |
| IBK | 0.2 | 72.84 | 71.58 | 71.98 | 0.42 | -K 1 |
| SMO | 0.5 | 46.70 | 87.38 | 74.48 | 0.37 | -C 5 -G 0.01 |
| J48 | 0.3 | 72.84 | 70.52 | 71.26 | 0.41 | -C 0.4 -M 2 |
| Random forest | 0.3 | 77.66 | 77.00 | 77.21 | 0.52 | -I 200 |
Figure 5. Flow chart showing the processing of data in the Android-based mobile app developed for predicting IL-10 inducing peptides.
Figure 6. ROC plot showing the performance of dipeptide composition-based models developed using different machine learning techniques; the Random Forest (RFor) based model achieves the maximum AUC of 0.88.
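
For readers unfamiliar with the AUC reported for the ROC plot, the following minimal sketch shows one standard way to compute it: as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The labels and scores are toy values invented for illustration; they are not data from the study, and this is not necessarily how the authors computed their AUC.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # A correctly ordered pair counts 1, a tied pair counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = inducing, 0 = non-inducing.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.35, 0.7, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print(f"AUC = {roc_auc(toy_labels, toy_scores):.3f}")
```

Unlike the sensitivity and specificity values in the tables, which depend on the chosen threshold, AUC summarizes ranking quality across all thresholds.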