Onkar Singh, Wen-Lian Hsu, Emily Chia-Yu Su.
Abstract
Interleukin (IL)-10 is a homodimeric cytokine that plays a crucial role in suppressing inflammatory responses and regulating the growth or differentiation of various immune cells. However, the molecular mechanism of IL-10 regulation is only partially understood because its regulation is environment- or cell type-specific. In this study, we developed a computational approach, ILeukin10Pred (interleukin-10 prediction), that uses amino acid sequence-based features to predict and identify potential immunosuppressive IL-10-inducing peptides. The dataset comprises 394 experimentally validated IL-10-inducing peptides and 848 non-inducing peptides. We split the dataset into a training set (80%) and a test set (20%), and trained and validated the model with stratified five-fold cross-validation. The final model was then evaluated on the holdout set. An extra tree classifier (ETC)-based model achieved an accuracy of 87.5% and a Matthews correlation coefficient (MCC) of 0.755 on the hybrid feature types. It outperformed an existing state-of-the-art method based on dipeptide compositions, which achieved an accuracy of 81.24% and an MCC of 0.59. Our experimental results show that combining various feature types achieved better predictive performance.
Keywords: anti-inflammatory; cytokines; extra tree classifier; immunosuppressive peptides; interleukin-10; machine learning
Year: 2021 PMID: 35053004 PMCID: PMC8773200 DOI: 10.3390/biology11010005
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
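The abstract describes an 80%/20% split followed by stratified five-fold cross-validation. A minimal sketch of that partitioning, using only the Python standard library, might look like the following. The helper names (`stratified_split`, `stratified_folds`), the rounding rule, and the seed are assumptions for illustration; this record does not state which tooling the authors used, and resampling steps such as SMOTE are not included.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Rounding rule is an assumption; other tools may round differently.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(indices, labels, n_folds=5, seed=0):
    """Deal the members of each class round-robin into n_folds folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % n_folds].append(idx)
    return folds


# Class sizes from the abstract: 394 inducing (1) and 848 non-inducing (0).
labels = [1] * 394 + [0] * 848
train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(train_idx, labels)

# Each fold serves once as the validation set; the other four form the
# training portion for that round.
for k, validation in enumerate(folds):
    training = [i for j, fold in enumerate(folds) if j != k for i in fold]
    print(f"fold {k}: train={len(training)} validate={len(validation)}")
```

With the class sizes from the abstract, this rule places 79 inducing and 170 non-inducing peptides in the holdout set and leaves 993 peptides for cross-validation; the authors' exact partition may differ.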
Figure 1. The systematic architecture of the proposed method, ILeukin10Pred, which includes dataset collection, feature generation, SMOTE, feature selection, machine learning algorithms, and an evaluation process.
Figure 2. Two-sample logo showing the preference of positively charged and hydrophobic residues in IL-10-inducing peptides and non-IL-10-inducing peptides at different positions. The first eight positions represent the N-terminus of peptides, and the last eight positions represent the C-terminus of peptides.
Figure 3. Average percentages of amino acid compositions (AACs) in IL-10-inducing peptides and non-IL-10-inducing peptides.
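Amino acid composition (AAC), the quantity summarized in Figure 3, is the percentage of each of the 20 standard residues in a peptide; the dipeptide composition (DPC) listed in the tables applies the same idea to adjacent residue pairs. The sketch below shows one common way to compute both. The function names, the example peptide, and the choice to normalize DPC by the number of overlapping pairs are assumptions for illustration, not details taken from this record.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(peptide):
    """Percentage of each standard residue in the peptide (20 values)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    # Non-standard letters still count toward the length but get no entry.
    return {aa: 100.0 * counts[aa] / len(peptide) for aa in AMINO_ACIDS}


def dpc(peptide):
    """Percentage of each ordered residue pair among overlapping pairs (400 values)."""
    peptide = peptide.upper()
    if len(peptide) < 2:
        raise ValueError("peptide must have at least two residues")
    pairs = Counter(peptide[i:i + 2] for i in range(len(peptide) - 1))
    total = len(peptide) - 1
    return {a + b: 100.0 * pairs[a + b] / total
            for a in AMINO_ACIDS for b in AMINO_ACIDS}


# Hypothetical four-residue peptide, used only to show the output shape.
example = "AAKL"
print(aac(example)["A"])   # share of alanine, in percent
print(dpc(example)["AK"])  # share of the A-K pair, in percent
```

Averaging the `aac` output over all peptides in each class would give per-class percentages of the kind plotted in Figure 3.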
Performances of machine learning models based on single-feature types for the benchmark training and test datasets. Values shown are the mean ± standard deviation for the training dataset.
| Feature (training set) | ETC Acc. (%) | ETC AUC | ETC MCC | CatBoost Acc. (%) | CatBoost AUC | CatBoost MCC | LGBM Acc. (%) | LGBM AUC | LGBM MCC |
|---|---|---|---|---|---|---|---|---|---|
| AAC | 82.3 ± 0.022 | 0.906 ± 0.013 | 0.647 ± 0.046 | 86.1 ± 0.022 | 0.920 ± 0.018 | 0.722 ± 0.045 | 85.3 ± 0.017 | 0.919 ± 0.014 | 0.707 ± 0.035 |
| DPC | 86.5 ± 0.008 | 0.942 ± 0.004 | 0.730 ± 0.017 | 84.2 ± 0.010 | 0.922 ± 0.011 | 0.685 ± 0.021 | 85.4 ± 0.015 | 0.920 ± 0.012 | 0.709 ± 0.029 |
| CTD | 84.6 ± 0.014 | 0.915 ± 0.005 | 0.693 ± 0.027 | 85.4 ± 0.013 | 0.912 ± 0.010 | 0.708 ± 0.025 | 85.6 ± 0.023 | 0.913 ± 0.012 | 0.704 ± 0.046 |
| AutoC | 82.9 ± 0.021 | 0.905 ± 0.012 | 0.664 ± 0.042 | 84.9 ± 0.014 | 0.903 ± 0.009 | 0.699 ± 0.029 | 84.8 ± 0.015 | 0.907 ± 0.014 | 0.696 ± 0.029 |
| QSO | 86.8 ± 0.013 | 0.925 ± 0.009 | 0.7369 ± 0.025 | 84.8 ± 0.029 | 0.912 ± 0.016 | 0.695 ± 0.055 | 84.3 ± 0.021 | 0.911 ± 0.019 | 0.687 ± 0.041 |
| SOC | 82.3 ± 0.004 | 0.887 ± 0.005 | 0.649 ± 0.008 | 80.7 ± 0.014 | 0.876 ± 0.008 | 0.619 ± 0.027 | 80.4 ± 0.016 | 0.869 ± 0.014 | 0.608 ± 0.033 |
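
| Feature (test set) | ETC Acc. (%) | ETC AUC | ETC MCC | CatBoost Acc. (%) | CatBoost AUC | CatBoost MCC | LGBM Acc. (%) | LGBM AUC | LGBM MCC |
|---|---|---|---|---|---|---|---|---|---|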
| AAC | 83.5 | 0.912 | 0.674 | 85.1 | 0.919 | 0.705 | 85.4 | 0.903 | 0.712 |
| DPC | 86.6 | 0.943 | 0.733 | 84.5 | 0.925 | 0.689 | 84.8 | 0.919 | 0.695 |
| CTD | 83.8 | 0.913 | 0.678 | 83.8 | 0.887 | 0.677 | 83.8 | 0.891 | 0.677 |
| AutoC | 84.8 | 0.922 | 0.698 | 85.9 | 0.909 | 0.719 | 84.8 | 0.916 | 0.695 |
| QSO | 86.3 | 0.924 | 0.726 | 82.9 | 0.906 | 0.658 | 86.3 | 0.910 | 0.725 |
| SOC | 87.8 | 0.952 | 0.757 | 89.9 | 0.936 | 0.801 | 86.9 | 0.932 | 0.737 |
Performances and comparison with state-of-the-art machine learning models based on hybrid features for the benchmark training and test datasets. The values shown are mean ± standard deviation for the training dataset.
| Model (training set) | Acc. (%) | AUC | Recall/Sen. (%) | Specificity (%) | Precision (%) | MCC |
|---|---|---|---|---|---|---|
| ETC | 86.5 ± 0.013 | 0.929 ± 0.015 | 82.2 ± 0.004 | 89.8 ± 0.025 | 88.3 ± 0.025 | 0.724 ± 0.027 |
| LGBM | 86.3 ± 0.015 | 0.918 ± 0.013 | 83.8 ± 0.016 | 88.6 ± 0.029 | 87.3 ± 0.025 | 0.726 ± 0.030 |
| CatBoost | 86.2 ± 0.019 | 0.916 ± 0.019 | 83.1 ± 0.009 | 88.9 ± 0.034 | 87.6 ± 0.033 | 0.724 ± 0.039 |

| Model (test set) | Acc. (%) | AUC | Recall/Sen. (%) | Specificity (%) | Precision (%) | MCC |
|---|---|---|---|---|---|---|
| IL-10Pred | 81.2 | 0.880 | 79.7 | 81.9 | N/A * | 0.590 |
| ETC | 87.5 | 0.931 | 80.4 | 94.7 | 92.7 | 0.755 |
| LGBM | 87.2 | 0.929 | 81.0 | 91.7 | 91.4 | 0.747 |
| CatBoost | 86.6 | 0.923 | 79.1 | 92.9 | 91.9 | 0.737 |
* N/A denotes “not available.” The precision score of IL-10Pred is not available in the manuscript (Nagpal et al., Scientific Reports, 2017).
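The accuracy, sensitivity, specificity, precision, and MCC columns in the tables are all functions of the four confusion-matrix counts. A minimal sketch of the standard definitions follows; the function name `binary_metrics` and the example counts are hypothetical and are not the counts behind the reported results.

```python
import math


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # recall on the positive class
        "specificity": _ratio(tn, tn + fp),   # recall on the negative class
        "precision": _ratio(tp, tp + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration only.
scores = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which makes it a useful summary for imbalanced data such as the 394 versus 848 peptides here.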
Figure 4. The area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUCPR) show the performance of the models developed using selected features on the holdout set.
Figure 5. A plot of the top 10 important features for the IL-10 training datasets.