Ping Wang, Lele Hu, Guiyou Liu, Nan Jiang, Xiaoyun Chen, Jianyong Xu, Wen Zheng, Li Li, Ming Tan, Zugen Chen, Hui Song, Yu-Dong Cai, Kuo-Chen Chou.
Abstract
Antimicrobial peptides (AMPs) are a class of natural peptides that form part of the innate immune system, and these 'nature's antibiotics' are promising candidates for addressing the growing problem of antibiotic resistance. An effective computational method for accurately predicting novel AMPs is therefore highly desirable, because it can provide more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating a sequence alignment method with a feature selection method. On a newly constructed benchmark dataset, the overall jackknife success rate of the new predictor was over 80.23% and the Matthews correlation coefficient was 0.73, indicating good predictive performance. Moreover, an in-depth feature analysis showed that the results are consistent with previous knowledge that certain amino acids are preferentially found in AMPs and that these amino acids play an important role in antimicrobial activity. For experimental scientists who want to use the prediction method without following the mathematical details, a user-friendly web server is provided at http://amp.biosino.org/.
Year: 2011 PMID: 21533231 PMCID: PMC3076375 DOI: 10.1371/journal.pone.0018476
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The physicochemical and biochemical properties of the 20 amino acids.
| Amino Acid | Polarity | Secondary structure | Molecular volume | Codon diversity | Electrostatic charge |
| A | −0.591 | −1.302 | −0.733 | 1.57 | −0.146 |
| C | −1.343 | 0.465 | −0.862 | −1.02 | −0.255 |
| D | 1.05 | 0.302 | −3.656 | −0.259 | −3.242 |
| E | 1.357 | −1.453 | 1.477 | 0.113 | −0.837 |
| F | −1.006 | −0.59 | 1.891 | −0.397 | 0.412 |
| G | −0.384 | 1.652 | 1.33 | 1.045 | 2.064 |
| H | 0.336 | −0.417 | −1.673 | −1.474 | −0.078 |
| I | −1.239 | −0.547 | 2.131 | 0.393 | 0.816 |
| K | 1.831 | −0.561 | 0.533 | −0.277 | 1.648 |
| L | −1.019 | −0.987 | −1.505 | 1.266 | −0.912 |
| M | −0.663 | −1.524 | 2.219 | −1.005 | 1.212 |
| N | 0.945 | 0.828 | 1.299 | −0.169 | 0.933 |
| P | 0.189 | 2.081 | −1.628 | 0.421 | −1.392 |
| Q | 0.931 | −0.179 | −3.005 | −0.503 | −1.853 |
| R | 1.538 | −0.055 | 1.502 | 0.44 | 2.897 |
| S | −0.228 | 1.399 | −4.76 | 0.67 | −2.647 |
| T | −0.032 | 0.326 | 2.213 | 0.908 | 1.313 |
| V | −1.337 | −0.279 | −0.544 | 1.242 | −1.262 |
| W | −0.595 | 0.009 | 0.672 | −2.128 | −0.184 |
| Y | 0.26 | 0.83 | 3.097 | −0.838 | 1.512 |
The table above lists the scores of the physicochemical and biochemical properties of the 20 amino acids, so that each amino acid can be coded by a 5-dimensional vector.
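As a rough illustration of the 5-dimensional coding, the sketch below looks up each residue's scores from the table and averages them over a peptide to obtain one fixed-length vector. The averaging step and the names `PROPERTY_SCORES`, `encode_residues`, and `mean_property_vector` are assumptions made here for illustration; the record does not state how the original predictor combines per-residue scores.

```python
# Scores copied from the table above, in column order:
# (polarity, secondary structure, molecular volume, codon diversity, electrostatic charge)
PROPERTY_SCORES = {
    "A": (-0.591, -1.302, -0.733, 1.57, -0.146),
    "C": (-1.343, 0.465, -0.862, -1.02, -0.255),
    "D": (1.05, 0.302, -3.656, -0.259, -3.242),
    "E": (1.357, -1.453, 1.477, 0.113, -0.837),
    "F": (-1.006, -0.59, 1.891, -0.397, 0.412),
    "G": (-0.384, 1.652, 1.33, 1.045, 2.064),
    "H": (0.336, -0.417, -1.673, -1.474, -0.078),
    "I": (-1.239, -0.547, 2.131, 0.393, 0.816),
    "K": (1.831, -0.561, 0.533, -0.277, 1.648),
    "L": (-1.019, -0.987, -1.505, 1.266, -0.912),
    "M": (-0.663, -1.524, 2.219, -1.005, 1.212),
    "N": (0.945, 0.828, 1.299, -0.169, 0.933),
    "P": (0.189, 2.081, -1.628, 0.421, -1.392),
    "Q": (0.931, -0.179, -3.005, -0.503, -1.853),
    "R": (1.538, -0.055, 1.502, 0.44, 2.897),
    "S": (-0.228, 1.399, -4.76, 0.67, -2.647),
    "T": (-0.032, 0.326, 2.213, 0.908, 1.313),
    "V": (-1.337, -0.279, -0.544, 1.242, -1.262),
    "W": (-0.595, 0.009, 0.672, -2.128, -0.184),
    "Y": (0.26, 0.83, 3.097, -0.838, 1.512),
}


def encode_residues(peptide):
    """Return one 5-dimensional score tuple per residue of the peptide."""
    return [PROPERTY_SCORES[aa] for aa in peptide.upper()]


def mean_property_vector(peptide):
    """Average the per-residue tuples into a single 5-dimensional vector.

    Averaging is an illustrative assumption, not the documented method.
    """
    vectors = encode_residues(peptide)
    if not vectors:
        raise ValueError("peptide must contain at least one residue")
    count = len(vectors)
    return tuple(sum(column) / count for column in zip(*vectors))
```

For example, `mean_property_vector("AK")` averages the rows for A and K column by column; a residue outside the 20 listed letters raises a `KeyError`.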
Figure 1. An illustration to show (I) the TP (true positive) quadrant (green) for correct prediction of the positive dataset; (II) the FP (false positive) quadrant (red) for incorrect prediction of the negative dataset; (III) the TN (true negative) quadrant (blue) for correct prediction of the negative dataset; and (IV) the FN (false negative) quadrant (pink) for incorrect prediction of the positive dataset.
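The four quadrants feed directly into the measures reported in the tables of this record (Sn, Sp, AC, MCC). The following is a minimal sketch using the standard textbook definitions of these measures; the function name `classification_metrics` is hypothetical and is not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute Sn, Sp, AC (as fractions) and MCC from the four quadrant counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)   # fraction of positives correctly predicted
    specificity = tn / (tn + fp)   # fraction of negatives correctly predicted
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "AC": accuracy, "MCC": mcc}
```

Multiplying Sn, Sp, and AC by 100 gives the percentage form used in the tables.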
The predicted results of the three methods.
| Method | Number of Predicted Peptides | Sn (%) | Sp (%) | AC (%) | MCC |
| Sequence Alignment Method | 5855 | 91.22 | 95.55 | 95.12 | 0.7723 |
| Feature selection Method | 3876 | 56.83 | 93.19 | 90.58 | 0.6426 |
| Integrated Method | 9731 | 80.23 | 94.59 | 93.31 | 0.7312 |
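The peptide counts in the table add up (5855 + 3876 = 9731), which suggests that the integrated row pools the peptides handled by the two methods. Under that assumption, the integrated AC should equal the count-weighted average of the two AC values. The sketch below checks this arithmetic; `pooled_percentage` is a hypothetical helper name.

```python
# (number of predicted peptides, AC in %) copied from the table above
ALIGNMENT = (5855, 95.12)
FEATURE_SELECTION = (3876, 90.58)


def pooled_percentage(parts):
    """Count-weighted average of percentages over disjoint subsets."""
    total = sum(count for count, _ in parts)
    return sum(count * pct for count, pct in parts) / total


total_peptides = ALIGNMENT[0] + FEATURE_SELECTION[0]
pooled_ac = pooled_percentage([ALIGNMENT, FEATURE_SELECTION])
print(total_peptides, round(pooled_ac, 2))
```

The same weighting does not apply to Sn or Sp, because their denominators are the numbers of positive or negative peptides in each subset, which the table does not list.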
Figure 2. IFS curve.
The curve shows how the performance of the NNA (nearest neighbor algorithm) predictor varies with the feature subset. It peaks when the feature set comprises the first 25 features in the mRMR feature list.
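The incremental procedure behind such a curve can be sketched as follows: features are added one at a time in ranked order, and each growing subset is scored by a jackknife (leave-one-out) test with a nearest neighbor classifier. This is a minimal sketch under stated assumptions: the ranked list is taken as given (the mRMR ranking itself is not computed), squared Euclidean distance and plain accuracy are illustrative choices, and all function names are hypothetical.

```python
def jackknife_nn_accuracy(samples, labels, feature_indices):
    """Leave-one-out accuracy of a 1-nearest-neighbor rule on chosen features."""
    correct = 0
    for i, query in enumerate(samples):
        best_label, best_distance = None, float("inf")
        for j, other in enumerate(samples):
            if i == j:
                continue  # the query sample is left out of its own reference set
            distance = sum((query[k] - other[k]) ** 2 for k in feature_indices)
            if distance < best_distance:
                best_distance, best_label = distance, labels[j]
        correct += best_label == labels[i]
    return correct / len(samples)


def incremental_feature_selection(samples, labels, ranked_features):
    """Score the first 1, 2, ..., N ranked features and return (curve, best subset)."""
    curve = []
    for size in range(1, len(ranked_features) + 1):
        subset = ranked_features[:size]
        curve.append((size, jackknife_nn_accuracy(samples, labels, subset)))
    # max() keeps the first maximum, so ties favor the smaller subset.
    best_size, _ = max(curve, key=lambda point: point[1])
    return curve, ranked_features[:best_size]
```

Plotting the `curve` pairs (subset size against score) gives a curve of the kind the caption describes, with the selected subset at its peak.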
Comparison between CAMP and our method on the test set.
| Method | Algorithm | Predicted AMPs | Sn (%) |
| CAMP | Support Vector Machine | 866 | 76.23 |
| CAMP | Random Forest | 852 | 75.00 |
| CAMP | Discriminant Analysis | 881 | 77.55 |
| Our Method | BLASTP+Nearest Neighbor Algorithm | 965 | 84.95 |
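The "BLASTP+Nearest Neighbor Algorithm" entry indicates that the method combines two predictors. One plausible way to combine them, sketched below, is to use the alignment-based label when a similar sequence is found and to fall back to the nearest neighbor classifier otherwise. The fallback order is an assumption, and every name here (`predict_integrated`, `TOY_HITS`, `toy_alignment_label`, `toy_fallback_classifier`) is a hypothetical stand-in: no real BLASTP call or trained classifier is involved.

```python
def predict_integrated(peptide, alignment_label, fallback_classifier):
    """Return (label, source): alignment result if available, else the fallback.

    alignment_label: callable returning a label, or None when no hit is found.
    fallback_classifier: callable that always returns a label.
    """
    label = alignment_label(peptide)
    if label is not None:
        return label, "sequence alignment"
    return fallback_classifier(peptide), "nearest neighbor"


# Toy stand-ins so the sketch runs; they are NOT real predictors.
TOY_HITS = {"KKLLKK": "AMP"}  # pretend alignment database


def toy_alignment_label(peptide):
    return TOY_HITS.get(peptide)  # None means "no similar sequence found"


def toy_fallback_classifier(peptide):
    # Placeholder rule standing in for a trained nearest neighbor classifier.
    return "AMP" if peptide.count("K") + peptide.count("R") >= 2 else "non-AMP"


print(predict_integrated("KKLLKK", toy_alignment_label, toy_fallback_classifier))
print(predict_integrated("GGSS", toy_alignment_label, toy_fallback_classifier))
```

Passing the two predictors in as callables keeps the combination logic independent of how each predictor is implemented.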
Comparison between sequence alignment method and feature selection method.
| Dataset | Method | Number of Predicted Peptides | Number of Correctly Predicted Peptides | Sn (%) |
| Original Dataset with high sequence similarity | Sequence Alignment | 986 | 896 | 90.87 |
| Original Dataset with high sequence similarity | Feature Selection | 1136 | 791 | 69.63 |
| Dataset with <0.7 sequence similarity | Sequence Alignment | 869 | 679 | 78.14 |
| Dataset with <0.7 sequence similarity | Feature Selection | 1136 | 692 | 60.92 |
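In this table, the Sn column matches the ratio of correctly predicted peptides to predicted peptides, expressed as a percentage. The short check below reproduces the four values from the counts in the table; the row labels and the helper name `success_rate` are illustrative.

```python
# (predicted peptides, correctly predicted peptides) copied from the table above
ROWS = {
    "high similarity / sequence alignment": (986, 896),
    "high similarity / feature selection": (1136, 791),
    "<0.7 similarity / sequence alignment": (869, 679),
    "<0.7 similarity / feature selection": (1136, 692),
}


def success_rate(predicted, correct):
    """Percentage of predicted peptides that were predicted correctly."""
    return 100.0 * correct / predicted


rates = {name: round(success_rate(*counts), 2) for name, counts in ROWS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```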
Figure 3. The number of each kind of feature among the optimal features.
In the feature space, all the features can be classified into six kinds: amino acid composition, codon diversity, electrostatic charge, molecular volume, polarity and secondary structure.
Figure 4. Amino acid distribution in AMPs and non-AMPs.
An asterisk (*) indicates an amino acid in the optimal feature set.