Mst Shamima Khatun, Md Mehedi Hasan, Hiroyuki Kurata.
Abstract
The use of therapeutic peptides against numerous inflammatory diseases and autoimmune disorders has received substantial attention; however, identifying anti-inflammatory peptides (AIPs) through biological experiments is often time-consuming and expensive. Novel in silico predictors are therefore desired to classify potential AIPs prior to in vitro investigation. Herein, an accurate predictor called PreAIP (Predictor of Anti-Inflammatory Peptides) was developed by integrating multiple complementary features. We systematically investigated different types of features, including primary sequence, evolutionary, and structural information, through a random forest classifier. The final PreAIP model achieved an AUC value of 0.833 on the training dataset in a 10-fold cross-validation test, which was better than that of existing models. Moreover, PreAIP achieved an AUC value of 0.840 on a test dataset, outperforming the two existing methods. These results indicate that PreAIP is an accurate predictor for identifying AIPs and contributes to the development of AIP therapeutics and biomedical research. The curated datasets and PreAIP are freely available at http://kurata14.bio.kyutech.ac.jp/PreAIP/.
Keywords: anti-inflammatory peptides prediction; feature encoding; feature selection; inflammatory disease; random forest
Year: 2019 PMID: 30891059 PMCID: PMC6411759 DOI: 10.3389/fgene.2019.00129
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Computational framework of PreAIP.
Figure 2. Sequence logo representation of positive and negative AIPs. The upper portion (enriched) represents the positive AIPs, while the lower portion (depleted) represents the negative AIPs. Statistically significant residues within the N-terminal 15 residues of the AIPs are plotted (p < 0.05, Welch's t-test).
Figure 3. Comparison of evolutionary information between positive and negative AIPs. Blue lines represent the positive AIPs and orange lines the negative AIPs. “*” indicates that the APV differs significantly between the two classes (p < 0.05, KW test).
Figure 4. Comparison of eight high-quality amino acid indices between positive and negative AIPs. The eight high-quality amino acid indices, HI1 to HI8, are placed at the centers of eight amino acid index clusters, which indicate high residue propensities of AAindex. The row represents the N-terminal peptide; blue lines represent the positive AIPs and orange lines the negative AIPs. “*” indicates that the amino acid indices differ significantly between the two samples (p < 0.05, KW test).
Figure 5. Comparison of eight types of SFs computed by SPIDER2 between positive and negative AIPs. The row represents the N-terminal peptide; blue lines represent the positive AIPs and orange lines the negative AIPs. “*” indicates that the SFs differ significantly between the two samples (p < 0.05, KW test).
Figure 6. ROC curves of the various prediction models. (A) 10-fold CV test on the training dataset and (B) the test dataset. PreAIP combines the KSAAP, pKSAAP, and AAindex methods. Higher AUC values indicate more accurate performance.
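The AUC reported for these ROC curves can be read as the probability that a randomly chosen positive peptide receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based formulation gives the same value more efficiently on large datasets.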
AUC values for prediction performance of the training dataset by 10-fold CV test.
| Encoding method | Specificity | Sensitivity | Accuracy | MCC | AUC | p-value |
|---|---|---|---|---|---|---|
| pKSAAP | 0.798 | 0.647 | 0.738 | 0.450 | 0.789 | 0.017 |
| AAindex | 0.795 | 0.644 | 0.735 | 0.448 | 0.774 | 0.012 |
| SPIDER2 | 0.765 | 0.434 | 0.633 | 0.235 | 0.739 | 0.004 |
| PEP2D | 0.769 | 0.411 | 0.629 | 0.219 | 0.734 | 0.004 |
| KSAAP | 0.805 | 0.656 | 0.745 | 0.463 | 0.813 | 0.118 |
| PreAIP | 0.806 | 0.709 | 0.767 | 0.508 | 0.833 | |
PreAIP is the linear combination of the RF scores estimated with the SPIDER2, PEP2D, KSAAP, AAindex, and pKSAAP encoding schemes, with weight coefficients of 0.00, 0.00, 0.15, 0.25, and 0.6, respectively. The p-values were computed against the AUC of the final model using a Wilcoxon matched-pairs signed-rank test.
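The linear combination described above can be sketched as a weighted sum of per-encoding RF scores. This is an illustrative sketch only: the names `WEIGHTS` and `combined_score` and the toy scores are hypothetical. The weight assignment follows the order given in the sentence above; note that the footnote to the machine learning comparison lists AAindex before KSAAP for the same weight values, so which of the two receives 0.15 and which 0.25 is an assumption here.

```python
# Weights in the order stated in the sentence above (an assumption, see lead-in).
WEIGHTS = {
    "SPIDER2": 0.00,
    "PEP2D": 0.00,
    "KSAAP": 0.15,
    "AAindex": 0.25,
    "pKSAAP": 0.60,
}


def combined_score(rf_scores, weights=WEIGHTS):
    """Weighted sum of the RF scores produced from each encoding scheme."""
    missing = set(weights) - set(rf_scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * rf_scores[name] for name in weights)


# Hypothetical RF scores for a single peptide, not data from the paper.
toy_rf_scores = {
    "SPIDER2": 0.20,
    "PEP2D": 0.90,
    "KSAAP": 0.60,
    "AAindex": 0.40,
    "pKSAAP": 0.80,
}
toy_combined = combined_score(toy_rf_scores)
```

Because the SPIDER2 and PEP2D weights are zero, their scores do not influence the combined value in this setting.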
Figure 7. Top 20 amino acid pairs selected by IG from the features of the KSAAP method. (A) Radar diagram of the composition of each amino acid pair; the length is proportional to the composition of the KSAAP features. (B) Box plot of the average value of feature scores (AVFS) for the top 20 pairs ranked by IG. Red denotes the positive AIPs and gray the negative AIPs. The p-value is computed by a two-sample t-test.
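To make the notion of an amino acid pair composition concrete, the sketch below counts residue pairs separated by a gap of k positions and normalizes by the number of windows. It assumes the common k-spaced amino acid pair definition; the gap range, normalization, and any sequence truncation actually used by PreAIP are not stated in this record, and the function name `kspaced_pair_composition` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kspaced_pair_composition(seq, k):
    """Composition of residue pairs (seq[i], seq[i + k + 1]) for a gap of k.

    Returns a dict with 400 entries, one per ordered amino acid pair.
    Pairs containing non-standard residues are skipped.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    windows = len(seq) - k - 1
    if windows <= 0:
        return {pair: 0.0 for pair in pairs}
    for i in range(windows):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    return {pair: n / windows for pair, n in counts.items()}


# Hypothetical toy peptide, not a sequence from the dataset.
toy_k0 = kspaced_pair_composition("ACDAC", k=0)  # adjacent pairs: AC, CD, DA, AC
toy_k1 = kspaced_pair_composition("ACDAC", k=1)  # one-residue gap: AD, CA, DC
```

Concatenating the 400-dimensional vectors for several values of k gives one feature vector per peptide, which a feature selection step can then rank.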
Performance comparison with existing predictors using the test dataset.
| Method | Threshold | Specificity | Sensitivity | Accuracy | MCC | AUC | p-value |
|---|---|---|---|---|---|---|---|
| AntiInflam (LA) | −0.3 | 0.892 | 0.258 | 0.638 | 0.197 | 0.647 | <0.001 |
| AntiInflam (MA) | 0.5 | 0.417 | 0.786 | 0.565 | 0.210 | 0.706 | <0.001 |
| AIPpred | Server | 0.746 | 0.741 | 0.744 | 0.479 | 0.813 | 0.039 |
| PreAIP | High | 0.871 | 0.618 | 0.770 | 0.512 | 0.840 | |
| PreAIP | Moderate | 0.747 | 0.784 | 0.762 | 0.522 | 0.840 | |
| PreAIP | Low | 0.636 | 0.863 | 0.727 | 0.492 | 0.840 | |
The p-values were computed from the AUC values using a Wilcoxon matched-pairs signed-rank test; p < 0.05 indicates a statistically significant difference between the proposed PreAIP and each selected method. The performances of the AntiInflam LA and MA methods were computed using the default (server) threshold values of −0.3 and 0.5, respectively. The AIPpred threshold was the one given by its server.
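The AUC is independent of the decision threshold, whereas the other reported measures change with it. A minimal sketch of how threshold-dependent measures such as sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) follow from confusion counts is given below; the function name `binary_metrics` and the counts are hypothetical, and the sketch assumes both classes are non-empty.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of positives recovered
    specificity = tn / (tn + fp)   # fraction of negatives rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical confusion counts, not taken from the paper.
toy_metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
```

Raising the score threshold typically moves predictions from positive to negative, which increases specificity at the cost of sensitivity.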
Performance comparison of PreAIP with AIPpred using the training dataset.
| Method | Threshold | Specificity | Sensitivity | Accuracy | MCC | AUC | p-value |
|---|---|---|---|---|---|---|---|
| AIPpred | Default given in the server | 0.711 | 0.758 | 0.730 | 0.460 | 0.801 | 0.034 |
| PreAIP | High | 0.903 | 0.632 | 0.795 | 0.566 | 0.833 | |
| PreAIP | Moderate | 0.801 | 0.719 | 0.768 | 0.520 | 0.833 | |
| PreAIP | Low | 0.709 | 0.784 | 0.739 | 0.484 | 0.833 | |
The p-value was computed from the AUC values using a Wilcoxon matched-pairs signed-rank test; p < 0.05 indicates a statistically significant difference between the proposed PreAIP and AIPpred.
AUC values of AIP prediction by different machine learning algorithms based on a 10-fold CV test.
| Algorithm | SPIDER2 | PEP2D | AAindex | KSAAP | pKSAAP | Combined |
|---|---|---|---|---|---|---|
| RF | 0.739 | 0.734 | 0.774 | 0.813 | 0.789 | 0.833 |
| NB | 0.659 | 0.655 | 0.707 | 0.729 | 0.717 | 0.736 |
| SVM | 0.698 | 0.677 | 0.738 | 0.766 | 0.749 | 0.779 |
| ANN | 0.662 | 0.649 | 0.716 | 0.741 | 0.736 | 0.753 |
“Combined” indicates the performance of the optimized combined features. The combined score of RF is the weighted sum of the five scores from the SPIDER2, PEP2D, AAindex, KSAAP, and pKSAAP features, with weight values of 0.00, 0.00, 0.15, 0.25, and 0.6, respectively. Likewise, the weight values for NB, SVM, and ANN were (0.00, 0.00, 0.10, 0.35, 0.55), (0.00, 0.00, 0.22, 0.45, 0.33), and (0.00, 0.00, 0.18, 0.5, 0.32), respectively.