Piyush Agrawal, Sumit Kumar, Archana Singh, Gajendra P S Raghava, Indrakant K Singh.
Abstract
Insect neuropeptides and their receptors are potential targets for pest control. The present study describes in silico models, developed using natural and modified insect neuropeptides, for predicting and designing new neuropeptides. Amino acid composition analysis revealed a preference for residues C, D, E, F, G, N, S, and Y in insect neuropeptides. Positional residue preference analysis showed that, in natural neuropeptides, residues such as A, N, F, D, P, S, and I are preferred at the N-terminus, and residues such as L, R, P, F, N, and G are preferred at the C-terminus. Prediction models were developed with several machine learning techniques, using input features such as amino acid composition, dipeptide composition, and binary profiles. The dipeptide composition-based SVM model performed best among all models. On NeuroPIpred_DS1, this model achieved 86.50% accuracy and an MCC of 0.73 on the training dataset, and 83.71% accuracy and an MCC of 0.67 on the validation dataset. On NeuroPIpred_DS2, it achieved 97.47% accuracy and an MCC of 0.95 on the training dataset, and 97.93% accuracy and an MCC of 0.96 on the validation dataset. To assist researchers, we created NeuroPIpred as a standalone version and a user-friendly web server, available at https://webs.iiitd.edu.in/raghava/neuropipred.
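The amino acid composition feature mentioned in the abstract can be illustrated with a short Python sketch. It computes the percent composition of a peptide over the 20 standard residues; the function name, the residue ordering, and the choice to reject non-standard letters are illustrative assumptions, not NeuroPIpred's actual code.

```python
# Minimal sketch of amino acid composition (AAC) features.
# Assumptions (hypothetical, not from NeuroPIpred): residue order below,
# and non-standard one-letter codes are rejected rather than skipped.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Return the percent composition of each standard residue, in STANDARD_AA order."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(STANDARD_AA)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    # Each feature is 100 * (count of residue) / (peptide length).
    return [100.0 * seq.count(aa) / len(seq) for aa in STANDARD_AA]


if __name__ == "__main__":
    comp = amino_acid_composition("FGLRF")  # hypothetical peptide
    nonzero = {aa: v for aa, v in zip(STANDARD_AA, comp) if v}
    print(nonzero)  # {'F': 40.0, 'G': 20.0, 'L': 20.0, 'R': 20.0}
```

The resulting 20-element vector always sums to 100, so peptides of different lengths map onto a fixed-size feature space that a classifier such as an SVM can consume.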
Year: 2019 PMID: 30914676 PMCID: PMC6435694 DOI: 10.1038/s41598-019-41538-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic representation of insect neuropeptide biosynthesis and secretion.
Figure 2. Comparison of the average percent composition of residues present in (A) natural insect neuropeptides and random peptides, and (B) modified insect neuropeptides and modified bioactive peptides taken from SATPDB.
The performance of amino acid composition based models developed using different machine learning techniques on NeuroPIpred_DS1.
| Machine Learning Techniques (Parameters) | Main Sen | Main Spc | Main Acc | Main MCC | Main AUROC | Validation Sen | Validation Spc | Validation Acc | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM (g = 0.001, c = 2, j = 2) | 88.14 | 83.43 | 85.79 | 0.72 | 0.92 | 85.71 | 82.29 | 84.00 | 0.68 | 0.90 |
| Random Forest (Ntree = 20) | 86.29 | 85.71 | 86.00 | 0.72 | 0.93 | 83.43 | 84.57 | 84.00 | 0.68 | 0.91 |
| SMO (g = 0.001, c = 4) | 84.29 | 84.86 | 84.57 | 0.69 | 0.85 | 80.57 | 83.43 | 82.00 | 0.64 | 0.82 |
| J48 (c = 0.1, m = 10) | 81.86 | 80.43 | 81.14 | 0.62 | 0.84 | 80.00 | 81.71 | 80.86 | 0.62 | 0.86 |
| Naive Bayes (Default) | 82.29 | 80.57 | 81.43 | 0.63 | 0.87 | 76.00 | 79.43 | 77.71 | 0.55 | 0.83 |
*Sen: Sensitivity, Spc: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, AUROC: Area Under the Receiver Operating Characteristic curve.
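The AUROC column summarizes how well a model's scores rank neuropeptides above non-neuropeptides. A minimal Python sketch, assuming the standard Mann-Whitney formulation (AUROC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half); the function name and example scores are hypothetical.

```python
# Minimal sketch: AUROC via the Mann-Whitney (pairwise ranking) formulation.
# Assumption: labels are 1 for neuropeptides (positives) and 0 otherwise.


def auroc(scores: list[float], labels: list[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie contributes half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores, for illustration only.
    print(round(auroc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1]), 3))  # 0.667
```

The pairwise loop is quadratic in dataset size, which is acceptable for validation sets of a few hundred peptides; a rank-based computation scales better for larger sets.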
The performance of amino acid composition based models developed using different machine learning techniques on NeuroPIpred_DS2.
| Machine Learning Techniques (Parameters) | Main Sen | Main Spc | Main Acc | Main MCC | Main AUROC | Validation Sen | Validation Spc | Validation Acc | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM (g = 0.005, c = 2, j = 2) | 97.28 | 96.53 | 96.95 | 0.94 | 0.99 | 97.55 | 96.83 | 97.23 | 0.94 | 0.99 |
| Random Forest (Ntree = 60) | 97.52 | 95.26 | 96.53 | 0.93 | 0.99 | 97.06 | 95.56 | 96.40 | 0.93 | 0.98 |
| SMO (g = 0.001, c = 5) | 97.96 | 94.16 | 96.29 | 0.92 | 0.96 | 98.28 | 96.19 | 97.37 | 0.95 | 0.97 |
| J48 (c = 0.4, m = 3) | 91.96 | 90.84 | 91.47 | 0.83 | 0.93 | 93.87 | 93.97 | 93.91 | 0.88 | 0.94 |
| Naive Bayes (Default) | 90.10 | 89.42 | 89.80 | 0.79 | 0.94 | 89.22 | 91.11 | 90.04 | 0.80 | 0.95 |
*Sen: Sensitivity, Spc: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, AUROC: Area Under the Receiver Operating Characteristic curve.
The performance of dipeptide composition based models developed using different machine learning techniques on NeuroPIpred_DS1.
| Machine Learning Techniques (Parameters) | Main Sen | Main Spc | Main Acc | Main MCC | Main AUROC | Validation Sen | Validation Spc | Validation Acc | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM (g = 0.001, c = 1, j = 4) | 87.57 | 85.43 | 86.50 | 0.73 | 0.93 | 82.29 | 85.14 | 83.71 | 0.67 | 0.91 |
| Random Forest (Ntree = 70) | 90.29 | 82.00 | 86.14 | 0.73 | 0.94 | 86.86 | 69.14 | 78.00 | 0.57 | 0.89 |
| SMO (g = 0.0005, c = 5) | 84.57 | 86.86 | 85.71 | 0.71 | 0.86 | 79.43 | 88.00 | 83.71 | 0.68 | 0.84 |
| J48 (c = 0.3, m = 4) | 80.00 | 81.57 | 80.79 | 0.62 | 0.85 | 76.57 | 83.43 | 80.00 | 0.60 | 0.84 |
| Naive Bayes (Default) | 76.29 | 73.14 | 74.71 | 0.49 | 0.75 | 75.43 | 70.29 | 72.86 | 0.46 | 0.72 |
*Sen: Sensitivity, Spc: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, AUROC: Area Under the Receiver Operating Characteristic curve.
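The dipeptide composition features used by the best-performing SVM model can be sketched in Python. This assumes the common definition: the percentage of each of the 400 ordered residue pairs among the L-1 overlapping dipeptides of a length-L peptide. The constant and function names are hypothetical.

```python
# Minimal sketch of dipeptide composition (DPC) features.
# Assumptions (hypothetical): overlapping pairs, percentages, and rejection
# of non-standard residues; NeuroPIpred's exact implementation may differ.
from collections import Counter
from itertools import product

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(STANDARD_AA, repeat=2)]


def dipeptide_composition(seq: str) -> list[float]:
    """Percent frequency of each of the 400 dipeptides among the L-1 overlapping pairs."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    unknown = set(seq) - set(STANDARD_AA)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    # Overlapping pairs: residues (1,2), (2,3), ..., (L-1,L).
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]


if __name__ == "__main__":
    dpc = dipeptide_composition("FGLRF")  # hypothetical peptide
    nonzero = {dp: v for dp, v in zip(DIPEPTIDES, dpc) if v}
    print(nonzero)  # {'FG': 25.0, 'GL': 25.0, 'LR': 25.0, 'RF': 25.0}
```

Unlike amino acid composition, this 400-dimensional vector retains some local order information (which residue follows which), one plausible reason it can separate classes better.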
The performance of dipeptide composition based models developed using different machine learning techniques on NeuroPIpred_DS2.
| Machine Learning Techniques (Parameters) | Main Sen | Main Spc | Main Acc | Main MCC | Main AUROC | Validation Sen | Validation Spc | Validation Acc | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM (g = 0.001, c = 1, j = 3) | 97.96 | 96.84 | 97.47 | 0.95 | 0.99 | 98.28 | 97.46 | 97.93 | 0.96 | 0.99 |
| Random Forest (Ntree = 70) | 97.83 | 96.21 | 97.12 | 0.94 | 0.99 | 97.55 | 93.97 | 95.99 | 0.92 | 0.99 |
| SMO (g = 0.0005, c = 5) | 98.21 | 96.29 | 97.36 | 0.95 | 0.97 | 98.28 | 96.51 | 97.51 | 0.95 | 0.97 |
| J48 (c = 0.4, m = 3) | 93.44 | 90.06 | 91.95 | 0.84 | 0.94 | 94.61 | 88.57 | 91.98 | 0.84 | 0.93 |
| Naive Bayes (Default) | 93.38 | 86.03 | 90.15 | 0.80 | 0.90 | 94.12 | 86.03 | 90.59 | 0.81 | 0.90 |
*Sen: Sensitivity, Spc: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, AUROC: Area Under the Receiver Operating Characteristic curve.
The performance of SVM-based models developed on NeuroPIpred_DS1 using binary profiles of the terminal segments of each peptide.
| Features (Parameters) | Main Sen | Main Spc | Main Acc | Main MCC | Main AUROC | Validation Sen | Validation Spc | Validation Acc | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| N5 (g = 0.05, c = 3, j = 2) | 76.97 | 76.86 | 76.91 | 0.54 | 0.83 | 72.41 | 74.29 | 73.35 | 0.47 | 0.80 |
| N10 (g = 0.1, c = 1, j = 4) | 83.31 | 79.18 | 81.26 | 0.63 | 0.90 | 83.64 | 84.62 | 84.13 | 0.68 | 0.91 |
| N15 (g = 0.005, c = 2, j = 1) | 81.78 | 79.92 | 80.77 | 0.62 | 0.88 | 82.05 | 77.04 | 79.37 | 0.59 | 0.88 |
| C5 (g = 0.05, c = 8, j = 2) | 75.68 | 73.80 | 74.75 | 0.49 | 0.82 | 74.71 | 78.86 | 76.79 | 0.54 | 0.83 |
| C10 (g = 0.05, c = 4, j = 3) | 79.88 | 77.05 | 78.48 | 0.57 | 0.87 | 80.61 | 78.70 | 79.64 | 0.59 | 0.90 |
| C15 (g = 0.1, c = 2, j = 2) | 77.68 | 77.63 | 77.65 | 0.55 | 0.86 | 81.20 | 80.00 | 80.56 | 0.61 | 0.89 |
| N5C5 (g = 0.05, c = 3, j = 4) | 82.98 | 79.18 | 81.10 | 0.62 | 0.89 | 81.03 | 79.43 | 80.23 | 0.60 | 0.89 |
| N10C10 (g = 0.05, c = 1, j = 1) | 84.35 | 85.56 | 84.95 | 0.70 | 0.93 | 87.27 | 85.21 | 86.23 | 0.72 | 0.94 |
| N15C15 (g = 0.05, c = 1, j = 1) | 84.74 | 84.70 | 84.72 | 0.69 | 0.92 | 86.32 | 85.19 | 85.71 | 0.71 | 0.93 |
*Sen: Sensitivity, Spc: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, AUROC: Area Under the Receiver Operating Characteristic curve, N5/N10/N15: first 5/10/15 residues from the N-terminus, C5/C10/C15: last 5/10/15 residues, counted from the C-terminus, N5C5/N10C10/N15C15: the first and last 5/10/15 residues joined together.
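The N/C terminal binary profiles can be sketched in Python. This assumes the usual binary-profile encoding, in which each selected residue becomes a 20-element one-hot vector, so N10C10 yields 20 × 20 = 400 features. The function names are hypothetical, and rejecting peptides shorter than the requested window is an assumption of this sketch; how NeuroPIpred handles short peptides is not stated here.

```python
# Minimal sketch of terminal binary profiles (e.g. N5, C10, N10C10).
# Assumptions (hypothetical): one-hot encoding over STANDARD_AA, and peptides
# shorter than the requested window raise an error instead of being padded.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> list[int]:
    """20-element 0/1 vector with a single 1 at the residue's index."""
    vec = [0] * len(STANDARD_AA)
    vec[STANDARD_AA.index(residue)] = 1  # str.index raises ValueError if non-standard
    return vec


def binary_profile(seq: str, n: int = 10, c: int = 10) -> list[int]:
    """Concatenate one-hot vectors of the first n and last c residues."""
    seq = seq.upper()
    if len(seq) < max(n, c):
        raise ValueError(f"peptide shorter than window of {max(n, c)} residues")
    # Guard c > 0 because seq[-0:] would return the whole sequence.
    segment = seq[:n] + (seq[-c:] if c > 0 else "")
    profile: list[int] = []
    for residue in segment:
        profile.extend(one_hot(residue))
    return profile


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical 20-residue peptide
    print(len(binary_profile(peptide, n=10, c=10)))  # 400 features for N10C10
    print(len(binary_profile(peptide, n=5, c=0)))    # 100 features for N5
```

Because each position keeps its own block of 20 features, this encoding captures the positional residue preferences at the termini that composition-based features average away.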
The performance of SVM-based models developed on NeuroPIpred_DS2 using binary profiles of the terminal segments of each peptide.
| Features (Parameters) | Main Sen | Main Spc | Main Acc | Main MCC | Main AUROC | Validation Sen | Validation Spc | Validation Acc | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| N5 (g = 0.5, c = 2, j = 1) | 95.11 | 93.92 | 94.59 | 0.89 | 0.99 | 94.85 | 93.97 | 94.47 | 0.89 | 0.99 |
| N10 (g = 0.5, c = 2, j = 1) | 97.56 | 94.32 | 95.80 | 0.92 | 0.99 | 98.83 | 94.30 | 96.40 | 0.93 | 0.99 |
| N15 (g = 0.1, c = 2, j = 1) | 97.22 | 94.60 | 95.63 | 0.91 | 0.99 | 98.43 | 96.79 | 97.45 | 0.95 | 0.99 |
| C5 (g = 1, c = 1, j = 2) | 97.52 | 97.39 | 97.47 | 0.95 | 0.99 | 97.55 | 96.51 | 97.10 | 0.94 | 0.99 |
| C10 (g = 0.1, c = 2, j = 2) | 98.07 | 96.86 | 97.41 | 0.95 | 0.99 | 99.22 | 96.64 | 97.84 | 0.96 | 0.99 |
| C15 (g = 0.1, c = 2, j = 1) | 98.93 | 95.57 | 96.89 | 0.94 | 0.99 | 99.21 | 92.51 | 95.22 | 0.91 | 0.99 |
| N5C5 (g = 0.05, c = 3, j = 2) | 98.27 | 97.24 | 97.81 | 0.96 | 0.99 | 98.77 | 97.78 | 98.34 | 0.97 | 0.99 |
| N10C10 (g = 0.1, c = 2, j = 1) | 98.48 | 97.54 | 97.97 | 0.96 | 0.99 | 98.83 | 97.32 | 98.02 | 0.96 | 0.99 |
| N15C15 (g = 0.005, c = 1, j = 3) | 97.86 | 97.09 | 97.39 | 0.95 | 0.99 | 97.64 | 96.79 | 97.13 | 0.94 | 0.99 |
*Sen: Sensitivity, Spc: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, AUROC: Area Under the Receiver Operating Characteristic curve, N5/N10/N15: first 5/10/15 residues from the N-terminus, C5/C10/C15: last 5/10/15 residues, counted from the C-terminus, N5C5/N10C10/N15C15: the first and last 5/10/15 residues joined together.
The performance of SVM-based models developed using different features on the additional NeuroPIpred_Similar dataset.
| Features (Parameters) | Sen | Spc | Acc | MCC | AUROC |
|---|---|---|---|---|---|
| Amino acid composition (NeuroPIpred_DS1) (g = 0.1, c = 9, j = 1) | 85.71 | 70.29 | 78.00 | 0.57 | 0.85 |
| Amino acid composition (NeuroPIpred_DS2) (g = 0.1, c = 9, j = 1) | 97.55 | 97.06 | 97.30 | 0.95 | 0.99 |
| Dipeptide composition (NeuroPIpred_DS1) (g = 0.1, c = 9, j = 1) | 82.29 | 84.57 | 83.43 | 0.67 | 0.91 |
| Dipeptide composition (NeuroPIpred_DS2) (g = 0.1, c = 9, j = 1) | 98.28 | 96.32 | 97.30 | 0.95 | 0.99 |
| N10C10 Binary profile (NeuroPIpred_DS1) (g = 0.1, c = 9, j = 1) | 87.27 | 94.25 | 90.86 | 0.82 | 0.97 |
| N10C10 Binary profile (NeuroPIpred_DS2) (g = 0.1, c = 9, j = 1) | 98.83 | 98.19 | 98.45 | 0.97 | 0.99 |
*Sen: Sensitivity, Spc: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, AUROC: Area Under the Receiver Operating Characteristic curve, N10C10: the first 10 residues from the N-terminus joined with the last 10 residues from the C-terminus.
Comparison of NeuroPIpred with existing method NeuroPID on the NeuroPIpred_DS1 validation dataset.
| Method | TP | TN | FP | FN | Sen | Spc | Acc | MCC |
|---|---|---|---|---|---|---|---|---|
| NeuroPID | 175 | 9 | 166 | 0 | 100.00 | 5.16 | 52.57 | 0.16 |
| NeuroPIpred | 144 | 149 | 26 | 31 | 82.29 | 85.14 | 83.71 | 0.67 |
Figure 3. Schematic representation of the workflow used for developing NeuroPIpred.
Figure 4. Comparison of the average percent composition of residues present in insect neuropeptides and human neuropeptides.
Figure 5. Schematic representation of the generation of binary profiles. [Figure adapted from PLoS One 2011;6(9):e24039].