Haiyan Huo, Tao Li, Shiyuan Wang, Yingli Lv, Yongchun Zuo, Lei Yang.
Abstract
Presynaptic and postsynaptic neurotoxins are two groups of neurotoxins. Identifying which group a toxin belongs to is an important task for the many newly discovered toxins, but determining this experimentally is both costly and time-consuming. Computational prediction can complement experiments by providing useful information in a timely manner. In this study, we describe four algorithms for predicting presynaptic and postsynaptic neurotoxins from sequence-driven features: Increment of Diversity (ID), Multinomial Naive Bayes Classifier (MNBC), Random Forest (RF), and K-nearest Neighbours Classifier (IBK). Each protein sequence was encoded by pseudo amino acid (PseAA) compositions and three kinds of biological motif features: MEME, Prosite and InterPro motifs. The Maximum Relevance Minimum Redundancy (MRMR) feature selection method was used to rank the PseAA compositions, and the 50 top-ranked features were selected to improve prediction accuracy. The PseAA compositions and the three kinds of motif features were combined into 12 parameter sets, defined as P1-P12, which served as the inputs to ID, MNBC, RF, and IBK. The prediction results obtained in this study were significantly better than those of previously developed methods.
Year: 2017 PMID: 28724993 PMCID: PMC5517432 DOI: 10.1038/s41598-017-06195-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Phylogenetic trees for (A) presynaptic neurotoxins and (B) postsynaptic neurotoxins.
Figure 2. MEME motifs in logo format for (A) presynaptic neurotoxins motif 1, (B) presynaptic neurotoxins motif 2, (C) presynaptic neurotoxins motif 3, (D) postsynaptic neurotoxins motif 1, (E) postsynaptic neurotoxins motif 2, and (F) postsynaptic neurotoxins motif 3. The regular expression for each MEME motif is shown at the bottom of each panel.
Results obtained by ID, MNBC, RF and IBK in identifying presynaptic and postsynaptic neurotoxins with 12 parameters.
| Parameters | ID, presynaptic Sn (%) | ID, presynaptic Sp (%) | ID, postsynaptic Sn (%) | ID, postsynaptic Sp (%) | MNBC, presynaptic Sn (%) | MNBC, presynaptic Sp (%) | MNBC, postsynaptic Sn (%) | MNBC, postsynaptic Sp (%) | RF, presynaptic Sn (%) | RF, presynaptic Sp (%) | RF, postsynaptic Sn (%) | RF, postsynaptic Sp (%) | IBK, presynaptic Sn (%) | IBK, presynaptic Sp (%) | IBK, postsynaptic Sn (%) | IBK, postsynaptic Sp (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P1a | 88.46 | 92.00 | 91.30 | 87.50 | 91.03 | 92.21 | 91.30 | 90.00 | 96.15 | 82.61 | 86.21 | 95.00 | 88.46 | 82.61 | 85.19 | 86.36 |
| P2 | 92.31 | 92.31 | 91.30 | 91.30 | 92.31 | 92.31 | 91.30 | 91.30 | 98.72 | 84.06 | 87.50 | 98.31 | 92.31 | 85.51 | 87.80 | 90.77 |
| P3 | 91.03 | 92.21 | 91.30 | 90.00 | 93.59 | 92.41 | 91.30 | 92.65 | 94.87 | 86.96 | 89.16 | 93.75 | 91.03 | 89.86 | 91.03 | 89.86 |
| P4 | 93.59 | 92.41 | 91.30 | 92.65 | 94.87 | 92.50 | 91.30 | 94.03 | 96.15 | 88.41 | 90.36 | 95.31 | 93.59 | 88.41 | 90.12 | 92.42 |
| P5 | 93.59 | 92.41 | 91.30 | 92.65 | 91.03 | 92.21 | 91.30 | 90.00 | 97.44 | 85.51 | 88.37 | 96.72 | 92.31 | 88.41 | 90.00 | 91.04 |
| P6 | 94.87 | 92.50 | 91.30 | 94.03 | 93.59 | 92.41 | 91.30 | 92.65 | 97.44 | 85.51 | 88.37 | 96.72 | 94.87 | 88.41 | 90.24 | 93.85 |
| P7 | 97.44 | 91.57 | 89.86 | 96.88 | 98.72 | 91.67 | 89.86 | 98.41 | 96.15 | 88.41 | 90.36 | 95.31 | 84.62 | 88.41 | 89.19 | 83.56 |
| P8 | 100.0 | 90.70 | 88.41 | 100.0 | 100.0 | 91.76 | 89.86 | 100.0 | 100.00 | 89.86 | 91.76 | 100.00 | 87.18 | 88.41 | 89.47 | 85.92 |
| P9 | 98.72 | 92.77 | 91.30 | 98.44 | 98.72 | 91.67 | 89.86 | 98.41 | 97.44 | 91.30 | 92.68 | 96.92 | 88.46 | 88.41 | 89.61 | 87.14 |
| P10 | 100.0 | 91.76 | 89.86 | 100.0 | 100.0 | 90.70 | 88.41 | 100.0 | 100.00 | 89.86 | 91.76 | 100.00 | 92.31 | 94.20 | 94.74 | 91.55 |
| P11 | 98.72 | 91.67 | 89.86 | 98.41 | 97.44 | 92.68 | 91.30 | 96.92 | 97.44 | 91.43 | 92.68 | 96.97 | 89.74 | 92.75 | 93.33 | 88.89 |
| P12 | 98.72 | 92.77 | 91.30 | 98.44 | 100.0 | 92.86 | 91.30 | 100.0 | 100.00 | 91.30 | 92.86 | 100.00 | 92.31 | 94.20 | 94.74 | 91.55 |
a: Taken from [30], obtained using Increment of Diversity (ID).
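This record does not define Sn and Sp, so the following is only a minimal sketch of how per-class values of this kind can be computed. It assumes Sn = TP / (TP + FN) and Sp = TP / (TP + FP), which is one convention used in this literature and not confirmed here. The confusion counts (78 presynaptic and 69 postsynaptic sequences) are hypothetical values chosen for illustration because they are consistent with the ID figures in the P1 row; the function name `sn_sp` is likewise an illustrative choice.

```python
from typing import Tuple


def sn_sp(tp: int, fn: int, fp: int) -> Tuple[float, float]:
    """Return (Sn, Sp) in percent for one class.

    Assumed definitions (not stated in this record):
      Sn = TP / (TP + FN)   fraction of the class that is recovered
      Sp = TP / (TP + FP)   fraction of predictions for the class that are right
    """
    sn = 100.0 * tp / (tp + fn)
    sp = 100.0 * tp / (tp + fp)
    return sn, sp


# Hypothetical counts: 78 presynaptic sequences (69 classified correctly)
# and 69 postsynaptic sequences (63 classified correctly).
pre_correct, pre_missed = 69, 9
post_correct, post_missed = 63, 6

# A missed sequence of one class is a false positive for the other class.
pre_sn, pre_sp = sn_sp(tp=pre_correct, fn=pre_missed, fp=post_missed)
post_sn, post_sp = sn_sp(tp=post_correct, fn=post_missed, fp=pre_missed)

print(f"presynaptic  Sn={pre_sn:.2f}  Sp={pre_sp:.2f}")
print(f"postsynaptic Sn={post_sn:.2f}  Sp={post_sp:.2f}")
```

In a two-class problem each class's false positives are the other class's false negatives, which is why the four values in a row are not independent of one another.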
Overall predictive accuracy and CC values obtained by ID, MNBC, RF and IBK in identifying presynaptic and postsynaptic neurotoxins with 12 parameters.
| Parameters | ID Acc (%) | ID CC | MNBC Acc (%) | MNBC CC | RF Acc (%) | RF CC | IBK Acc (%) | IBK CC |
|---|---|---|---|---|---|---|---|---|
| P1a | 89.80 | 0.7963 | 91.16 | 0.8227 | 89.80 | 0.7998 | 85.71 | 0.7131 |
| P2 | 91.84 | 0.8361 | 91.84 | 0.8361 | 91.84 | 0.8428 | 89.12 | 0.7819 |
| P3 | 91.16 | 0.8227 | 92.52 | 0.8497 | 91.16 | 0.8237 | 90.48 | 0.8088 |
| P4 | 92.52 | 0.8497 | 93.20 | 0.8635 | 92.52 | 0.8511 | 91.16 | 0.8227 |
| P5 | 92.52 | 0.8497 | 91.16 | 0.8227 | 91.84 | 0.8401 | 90.48 | 0.8088 |
| P6 | 93.20 | 0.8635 | 92.52 | 0.8497 | 91.84 | 0.8401 | 91.84 | 0.8368 |
| P7 | 93.88 | 0.8786 | 94.56 | 0.8932 | 92.52 | 0.8511 | 86.39 | 0.7289 |
| P8 | 94.56 | 0.8954 | 95.24 | 0.9080 | 95.24 | 0.9080 | 87.76 | 0.7549 |
| P9 | 95.24 | 0.9061 | 94.56 | 0.8932 | 94.56 | 0.8917 | 88.44 | 0.7681 |
| P10 | 95.24 | 0.9080 | 94.56 | 0.8954 | 95.24 | 0.9080 | 93.20 | 0.8640 |
| P11 | 94.56 | 0.8932 | 94.56 | 0.8917 | 94.59 | 0.8990 | 91.16 | 0.8236 |
| P12 | 95.24 | 0.9061 | 95.92 | 0.9208 | 95.92 | 0.9208 | 93.20 | 0.8640 |
a: Taken from [30], obtained using Increment of Diversity (ID).
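As a rough illustration of the two summary measures reported here, the sketch below computes overall accuracy and a correlation coefficient from a two-class confusion matrix. It assumes that CC is the Matthews correlation coefficient, which this record does not state. The counts are hypothetical (78 presynaptic sequences with 69 correct, 69 postsynaptic sequences with 63 correct), picked because they are consistent with the ID values in the P1 row; `accuracy_and_cc` is an illustrative name.

```python
import math
from typing import Tuple


def accuracy_and_cc(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (Acc in percent, CC) for a two-class confusion matrix.

    CC is computed as the Matthews correlation coefficient, which is an
    assumption about what "CC" means in the table.
    """
    total = tp + tn + fp + fn
    acc = 100.0 * (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Guard against an empty row or column in the confusion matrix.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, cc


# Hypothetical counts, treating presynaptic as the positive class.
acc, cc = accuracy_and_cc(tp=69, tn=63, fp=6, fn=9)
print(f"Acc={acc:.2f}%  CC={cc:.4f}")
```

Unlike accuracy, this coefficient stays informative when the two classes differ in size, which may be why both measures are reported side by side.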
Figure 3. (A) Overall predictive accuracies and (B) CC values obtained by four different algorithms with 12 parameters.
Combination of dipeptide parameters and motif parameters.
| Parameters | Number | Description of parameters |
|---|---|---|
| P1 | 400 | 400 dipeptides |
| P2 | 406 | 400 dipeptides and 6 kinds of MEME motifs |
| P3 | 413 | 400 dipeptides and 13 kinds of Prosite motifs |
| P4 | 419 | 400 dipeptides, 6 kinds of MEME motifs and 13 kinds of Prosite motifs |
| P5 | 446 | 400 dipeptides and 46 kinds of InterPro motifs |
| P6 | 452 | 400 dipeptides, 6 kinds of MEME motifs and 46 kinds of InterPro motifs |
| P7 | 50 | 50 dipeptides selected by MRMR |
| P8 | 56 | 50 dipeptides and 6 kinds of MEME motifs |
| P9 | 63 | 50 dipeptides and 13 kinds of Prosite motifs |
| P10 | 69 | 50 dipeptides, 13 kinds of Prosite motifs and 6 kinds of MEME motifs |
| P11 | 96 | 50 dipeptides and 46 kinds of InterPro motifs |
| P12 | 102 | 50 dipeptides, 46 kinds of InterPro motifs and 6 kinds of MEME motifs |
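The parameter sets in the table are concatenations of dipeptide features and motif features. A minimal sketch of that assembly is given below under several assumptions: dipeptide features are taken as normalized frequencies over the 20 standard amino acids, each motif is encoded as a 0/1 presence flag found by regular-expression search, and the "selected" dipeptides are simply the first 50 indices as a stand-in for an MRMR ranking. None of these encoding details are given in this record, and the sequence and motif patterns are made up for illustration.

```python
import re
from itertools import product
from typing import List, Optional, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All ordered pairs of the 20 standard residues: 20 * 20 = 400 dipeptides.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> List[float]:
    """Normalized frequency of each of the 400 dipeptides in seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def motif_flags(seq: str, patterns: Sequence[str]) -> List[float]:
    """Assumed encoding: 1.0 if the motif pattern occurs in seq, else 0.0."""
    return [1.0 if re.search(p, seq) else 0.0 for p in patterns]


def build_features(seq: str, patterns: Sequence[str],
                   selected: Optional[Sequence[int]] = None) -> List[float]:
    """Concatenate (optionally sub-selected) dipeptide and motif features."""
    dipeptides = dipeptide_composition(seq)
    if selected is not None:
        dipeptides = [dipeptides[i] for i in selected]
    return dipeptides + motif_flags(seq, patterns)


# Hypothetical sequence and motif patterns, not taken from the paper.
toy_seq = "MKTLLLTLVVVTIVCLDLGYT"
toy_motifs = ["C.D", "GY[ST]", "WW"]

full = build_features(toy_seq, toy_motifs)  # all 400 dipeptides + motifs
reduced = build_features(toy_seq, toy_motifs, selected=range(50))  # placeholder selection
print(len(full), len(reduced))
```

With 6 MEME patterns instead of the 3 toy ones, the same two calls would give vectors of length 406 and 56, matching the sizes listed for P2 and P8.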