Chia-Ru Chung1, Jhih-Hua Jhong2, Zhuo Wang2, Siyu Chen3, Yu Wan3, Jorng-Tzong Horng1,4, Tzong-Yi Lee2,3.
Abstract
Because of the rapid development of multidrug resistance, conventional antibiotics cannot kill pathogenic bacteria efficiently. New antibiotic treatments such as antimicrobial peptides (AMPs) offer a possible solution to the antibiotic-resistance crisis. However, identifying AMPs experimentally is expensive and time-consuming. Meanwhile, few studies have used amino acid compositions (AACs) and physicochemical properties, considered across different sequence lengths and source organisms, to predict AMPs. Therefore, the major purpose of this study is to identify AMPs from seven categories of organisms: amphibians, humans, fish, insects, plants, bacteria, and mammals. Features selected by one-rule attribute evaluation were used to construct predictive models based on the random forest algorithm. In a comparison with iAMP-2L (a web server for identifying AMPs and their functional types), ADAM (a database of AMPs), and MLAMP (a multi-label AMP classifier), the proposed method achieved accuracies higher than 92% in predicting AMPs in each category. Additionally, the sensitivities of the proposed models in predicting AMPs from the seven categories of organisms were higher than those of all other tools. Furthermore, several physicochemical properties of AMPs (charge, hydrophobicity, polarity, polarizability, secondary structure, normalized van der Waals volume, and solvent accessibility) were investigated according to sequence length. As a result, the proposed method is a practical means to complement existing tools in the characterization and identification of AMPs in different organisms.
Keywords: antimicrobial peptides; feature selection; machine learning; organisms; sequence analysis
Year: 2020 PMID: 32024233 PMCID: PMC7038045 DOI: 10.3390/ijms21030986
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Average AACs of (A) AMPs and non-AMPs, and (B) AMPs with respect to the seven categories of organisms.
Figure 2. Comparisons of physicochemical properties between AMPs and non-AMPs for (A) hydrophobicity, (B) polarity, and (C) charge.
Figure A1. Comparisons of charge distributions between AMPs and non-AMPs.
Figure A2. Comparisons of physicochemical properties between AMPs and non-AMPs for (A) polarizability, (B) normalized van der Waals volume, (C) secondary structure, and (D) solvent accessibility.
Figure 3. Comparisons between AMPs and non-AMPs of (A) charge and (B) hydrophobicity at different sequence positions.
Figure A3. Comparisons of physicochemical properties between AMPs and non-AMPs at different positions (quantiles of sequence length) for (A) polarity, (B) polarizability, (C) normalized van der Waals volume, (D) secondary structure, and (E) solvent accessibility.
Distribution of AMP sequence lengths among different organisms in the training datasets. Values are the number of peptides with length L.
| Organisms | L ≤ 20 | 20 < L ≤ 40 | 40 < L ≤ 60 | 60 < L ≤ 80 | 80 < L ≤ 100 | 100 < L | Total |
|---|---|---|---|---|---|---|---|
| Amphibia | 269 | 437 | 28 | 3 | 0 | 4 | 741 |
| Bacteria | 117 | 111 | 61 | 16 | 13 | 27 | 345 |
| Fish | 18 | 54 | 10 | 5 | 3 | 5 | 95 |
| Human | 11 | 53 | 13 | 26 | 7 | 76 | 186 |
| Insects | 67 | 94 | 32 | 12 | 7 | 8 | 220 |
| Mammals | 78 | 180 | 51 | 43 | 11 | 85 | 448 |
| Plants | 63 | 153 | 95 | 7 | 14 | 32 | 364 |
Figure 4. Comparisons of AMP hydrophobicity (A) in different categories of organisms and (B) at different sequence positions (percentiles of sequence length) in each category of organism.
Figure A4. Comparisons of AMP charges (A) for different categories of organisms and (B) at different sequence positions (percentiles of sequence length) in each category of organism.
Figure A5. Charge distribution of AMPs from different organisms.
Figure A6. Performance with different numbers of features using the forward selection method for (A) amphibians, (B) bacteria, (C) fish, (D) humans, (E) insects, (F) mammals, and (G) plants. The red point marks the number of features, and the corresponding accuracy, of the optimal model.
Figure A7. Top 100 features for (A) amphibians, (B) bacteria, (C) fish, (D) humans, (E) insects, (F) mammals, and (G) plants. A blue background in the rank column indicates that the feature was chosen by the feature-selection method. Features marked red in (A) relate to charge, the most common property among the top 100 features for amphibians. Features marked yellow in (B) relate to hydrophobicity, the most common property among the top 100 features for bacteria. Features marked orange in (D) relate to AAPC, the most common feature type among the top 100 features for humans.
Figure 5. Distribution of the top 100 features, showing the performance of AAC, amino acid pair composition (AAPC), and physicochemical composition in different organisms.
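The two sequence-derived feature families named in the Figure 5 caption, AAC and AAPC, can be illustrated with a minimal sketch. It assumes that AAC is the fraction of each of the 20 standard residues and that AAPC is the fraction of each of the 400 adjacent residue pairs, normalized by the number of pairs; the authors' exact definitions and normalization are not given here and may differ. The peptide string is a made-up toy input, not a sequence from the datasets.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in AMINO_ACIDS}


def aapc(sequence):
    """Amino acid pair composition: fraction of each adjacent residue pair.

    Assumption: pairs are overlapping neighbours (positions i and i + 1)
    and frequencies are normalized by the number of pairs, len - 1.
    """
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total = len(sequence) - 1
    return {a + b: pairs[a + b] / total
            for a, b in product(AMINO_ACIDS, repeat=2)}


toy_peptide = "GKKLLKKA"  # hypothetical sequence, for illustration only
features = {**aac(toy_peptide), **aapc(toy_peptide)}
print(len(features))                 # 20 AAC + 400 AAPC features
print(features["K"], features["KK"])
```

Non-standard residue codes (for example `X`) are ignored by this sketch, so the fractions would then no longer sum to one.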
Figure A8. AAPC heatmaps for (A) humans, (B) amphibians, (C) bacteria, (D) fish, (E) insects, (F) mammals, and (G) plants.
Performance on the training datasets for AMPs derived from different organisms. The optimal models, which have the best prediction performance, are marked with a blue background. Note that the optimal model was determined as the one with the minimum difference between sensitivity and specificity.
| Organisms | Classifier | Sensitivity | Specificity | Accuracy | Matthews Correlation Coefficient |
|---|---|---|---|---|---|
| Amphibia | RF | 99.19% | 99.18% | 99.19% | 0.981 |
| | DT | 97.84% | 98.81% | 98.50% | 0.965 |
| | KNN | 96.76% | 99.81% | 98.84% | 0.973 |
| | SVM | 98.92% | 98.93% | 98.93% | 0.975 |
| Bacteria | RF | 95.94% | 96.18% | 96.16% | 0.735 |
| | DT | 86.67% | 97.95% | 97.34% | 0.769 |
| | KNN | 73.62% | 99.44% | 98.04% | 0.7959 |
| | SVM | 95.94% | 95.94% | 95.94% | 0.725 |
| Fish | RF | 96.84% | 96.87% | 96.87% | 0.789 |
| | DT | 73.68% | 98.43% | 96.93% | 0.728 |
| | KNN | 68.42% | 99.52% | 97.63% | 0.774 |
| | SVM | 82.11% | 99.86% | 98.79% | 0.889 |
| Human | RF | 94.09% | 93.07% | 93.10% | 0.489 |
| | DT | 74.19% | 98.15% | 97.49% | 0.615 |
| | KNN | 68.28% | 98.94% | 98.10% | 0.654 |
| | SVM | 88.17% | 87.82% | 87.83% | 0.354 |
| Insects | RF | 96.36% | 96.33% | 96.34% | 0.838 |
| | DT | 91.36% | 97.56% | 96.88% | 0.849 |
| | KNN | 85.91% | 98.28% | 96.93% | 0.842 |
| | SVM | 95.00% | 95.11% | 95.10% | 0.793 |
| Mammals | RF | 94.42% | 95.24% | 95.19% | 0.708 |
| | DT | 83.71% | 92.60% | 92.06% | 0.560 |
| | KNN | 74.55% | 98.92% | 97.43% | 0.767 |
| | SVM | 93.97% | 93.97% | 93.97% | 0.662 |
| Plants | RF | 97.53% | 97.39% | 97.39% | 0.822 |
| | DT | 88.74% | 98.82% | 98.19% | 0.851 |
| | KNN | 80.49% | 99.45% | 98.26% | 0.845 |
| | SVM | 96.70% | 96.70% | 96.70% | 0.786 |
Note. RF = random forest; DT = decision tree; KNN = K-nearest neighbor; SVM = support vector machine.
Performance of the models using data from different types of organisms in the independent test.
| Organisms | Sensitivity | Specificity | Accuracy | Matthews Correlation Coefficient |
|---|---|---|---|---|
| Amphibia | 100.00% | 98.24% | 98.80% | 0.973 |
| Bacteria | 96.51% | 96.36% | 96.36% | 0.746 |
| Fish | 100.00% | 97.00% | 97.18% | 0.810 |
| Human | 97.83% | 92.17% | 92.33% | 0.482 |
| Insects | 100.00% | 97.56% | 97.82% | 0.900 |
| Mammals | 92.79% | 94.56% | 94.46% | 0.673 |
| Plants | 97.78% | 97.94% | 97.93% | 0.851 |
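As a worked check of how the four reported metrics relate, the sketch below computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) from confusion-matrix counts using their standard definitions. The counts for the Amphibia test set are an assumption: they are inferred from the reported percentages and the dataset sizes (185 positive and 398 negative test peptides), not taken from the authors' raw output.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Inferred counts (assumption): 185/185 positives and 391/398 negatives correct.
sens, spec, acc, mcc = binary_metrics(tp=185, fn=0, tn=391, fp=7)
print(f"{sens:.2%} {spec:.2%} {acc:.2%} {mcc:.3f}")
# prints: 100.00% 98.24% 98.80% 0.973
```

These values match the Amphibia row of the independent-test table, which suggests the inferred counts are consistent with the reported figures.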
Figure 6. Comparison of ROC curves between our method and other prediction tools in the identification of AMPs from (A) amphibians, (B) bacteria, (C) fish, (D) humans, (E) insects, (F) mammals, and (G) plants.
Comparisons of independent testing results between our method and other prediction tools in the identification of AMPs on different organisms.
| Organisms | Classifier | Sensitivity | Specificity | Accuracy | Matthews Correlation Coefficient |
|---|---|---|---|---|---|
| Amphibia | Our method | 100.00% | 98.24% | 98.80% | 0.973 |
| | iAMPpred | 98.92% | 1.51% | 32.42% | 0.017 |
| | iAMP-2L | 96.76% | 98.99% | 98.28% | 0.960 |
| | ADAM | 98.38% | 99.50% | 99.14% | 0.980 |
| | DBAASP | 90.22% | 76.92% | 89.34% | 0.477 |
| | MLAMP | 90.27% | 98.24% | 95.71% | 0.900 |
| | CAMPR3_RF | 98.92% | 1.01% | 32.08% | −0.004 |
| | CAMPR3_SVM | 97.30% | 1.01% | 31.56% | −0.064 |
| | CAMPR3_ANN | 92.97% | 54.77% | 66.90% | 0.454 |
| | CAMPR3_DA | 95.14% | 0.75% | 30.70% | −0.135 |
| Bacteria | Our method | 96.51% | 96.36% | 96.36% | 0.746 |
| | iAMPpred | 84.88% | 1.99% | 6.46% | −0.183 |
| | iAMP-2L | 83.72% | 99.54% | 98.68% | 0.867 |
| | ADAM | 90.70% | 98.87% | 98.43% | 0.855 |
| | DBAASP | 35.44% | 80.00% | 57.86% | 0.173 |
| | MLAMP | 65.12% | 99.47% | 97.62% | 0.743 |
| | CAMPR3_RF | 90.70% | 1.99% | 6.77% | −0.108 |
| | CAMPR3_SVM | 79.07% | 2.72% | 6.83% | −0.218 |
| | CAMPR3_ANN | 68.60% | 45.00% | 46.27% | 0.062 |
| | CAMPR3_DA | 76.74% | 2.78% | 6.77% | −0.239 |
| Fish | Our method | 100.00% | 97.00% | 97.18% | 0.810 |
| | iAMPpred | 91.30% | 1.63% | 6.92% | −0.117 |
| | iAMP-2L | 86.96% | 99.46% | 98.72% | 0.882 |
| | ADAM | 95.65% | 99.18% | 98.97% | 0.912 |
| | DBAASP | 82.61% | 80.00% | 81.58% | 0.620 |
| | MLAMP | 91.30% | 99.46% | 98.97% | 0.908 |
| | CAMPR3_RF | 91.30% | 1.36% | 6.67% | −0.130 |
| | CAMPR3_SVM | 95.65% | 2.18% | 7.69% | −0.034 |
| | CAMPR3_ANN | 82.61% | 50.68% | 52.56% | 0.157 |
| | CAMPR3_DA | 86.96% | 1.36% | 6.41% | −0.194 |
| Human | Our method | 97.83% | 92.17% | 92.33% | 0.482 |
| | iAMPpred | 91.30% | 22.88% | 24.73% | 0.055 |
| | iAMP-2L | 54.35% | 98.18% | 96.99% | 0.482 |
| | ADAM | 52.17% | 98.91% | 97.64% | 0.534 |
| | DBAASP | 40.54% | 86.84% | 64.00% | 0.310 |
| | MLAMP | 50.00% | 98.36% | 97.05% | 0.464 |
| | CAMPR3_RF | 93.48% | 0.85% | 3.36% | −0.092 |
| | CAMPR3_SVM | 82.61% | 1.09% | 3.31% | −0.215 |
| | CAMPR3_ANN | 69.57% | 48.67% | 49.23% | 0.059 |
| | CAMPR3_DA | 84.78% | 1.46% | 3.72% | −0.167 |
| Insects | Our method | 100.00% | 97.56% | 97.82% | 0.900 |
| | iAMPpred | 94.44% | 39.11% | 45.04% | 0.217 |
| | iAMP-2L | 94.44% | 96.67% | 96.43% | 0.835 |
| | ADAM | 100.00% | 96.67% | 97.02% | 0.870 |
| | DBAASP | 70.37% | 90.91% | 73.85% | 0.469 |
| | MLAMP | 72.22% | 98.00% | 95.24% | 0.740 |
| | CAMPR3_RF | 87.04% | 1.33% | 10.52% | −0.227 |
| | CAMPR3_SVM | 87.04% | 1.33% | 10.52% | −0.227 |
| | CAMPR3_ANN | 87.04% | 43.33% | 48.02% | 0.192 |
| | CAMPR3_DA | 79.63% | 1.56% | 9.92% | −0.314 |
| Mammals | Our method | 92.79% | 94.56% | 94.46% | 0.673 |
| | iAMPpred | 95.50% | 68.94% | 70.54% | 0.322 |
| | iAMP-2L | 68.47% | 98.73% | 96.90% | 0.712 |
| | ADAM | 65.77% | 99.48% | 97.45% | 0.753 |
| | DBAASP | 45.88% | 83.02% | 60.14% | 0.295 |
| | MLAMP | 51.35% | 98.44% | 95.60% | 0.568 |
| | CAMPR3_RF | 93.69% | 1.27% | 6.85% | −0.096 |
| | CAMPR3_SVM | 92.79% | 1.91% | 7.39% | −0.085 |
| | CAMPR3_ANN | 78.38% | 48.58% | 50.38% | 0.129 |
| | CAMPR3_DA | 88.29% | 2.14% | 7.34% | −0.140 |
| Plants | Our method | 97.78% | 97.94% | 97.93% | 0.851 |
| | iAMPpred | 90.00% | 0.81% | 6.35% | −0.190 |
| | iAMP-2L | 77.78% | 98.67% | 97.38% | 0.773 |
| | ADAM | 84.44% | 98.67% | 97.79% | 0.815 |
| | DBAASP | 34.94% | 88.46% | 47.71% | 0.219 |
| | MLAMP | 58.89% | 98.82% | 96.34% | 0.654 |
| | CAMPR3_RF | 86.67% | 0.59% | 5.94% | −0.264 |
| | CAMPR3_SVM | 83.33% | 0.88% | 6.01% | −0.282 |
| | CAMPR3_ANN | 74.44% | 47.57% | 49.24% | 0.107 |
| | CAMPR3_DA | 75.56% | 1.10% | 5.73% | −0.357 |
RF = random forest; DT = decision tree; KNN = K-nearest neighbor; SVM = support vector machine; ANN = artificial neural network; DA = discriminant analysis.
Figure 7. Conceptual framework. This study was divided into three parts: data collection and preprocessing, feature investigation, and model training and evaluation.
Number of peptides in training and testing datasets among different organisms.
| Organisms | Training Positive | Training Negative | Testing Positive | Testing Negative |
|---|---|---|---|---|
| Amphibia | 741 | 1595 | 185 | 398 |
| Bacteria | 345 | 6040 | 86 | 1509 |
| Fish | 95 | 1469 | 23 | 367 |
| Human | 186 | 6595 | 46 | 1648 |
| Insects | 220 | 1800 | 54 | 450 |
| Mammals | 448 | 6919 | 111 | 1729 |
| Plants | 364 | 5432 | 90 | 1358 |
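The dataset sizes above are strongly imbalanced for most organisms, which affects how the accuracy figures should be read. The sketch below, offered as an illustration rather than as part of the authors' analysis, computes the negative-to-positive ratio of each testing dataset and the accuracy a trivial predictor would obtain by labeling every peptide as non-AMP. The counts are copied from the table; the function name is hypothetical.

```python
# (positive, negative) counts of the testing datasets, copied from the table.
TEST_SETS = {
    "Amphibia": (185, 398),
    "Bacteria": (86, 1509),
    "Fish": (23, 367),
    "Human": (46, 1648),
    "Insects": (54, 450),
    "Mammals": (111, 1729),
    "Plants": (90, 1358),
}


def majority_baseline_accuracy(pos, neg):
    """Accuracy of always predicting the larger class."""
    return max(pos, neg) / (pos + neg)


for name, (pos, neg) in TEST_SETS.items():
    ratio = neg / pos
    baseline = majority_baseline_accuracy(pos, neg)
    print(f"{name:9s} neg:pos = {ratio:5.1f}  all-negative accuracy = {baseline:.2%}")
```

For the human test set, for example, an all-negative predictor would already reach about 97.28% accuracy with zero sensitivity, which is why sensitivity and MCC are more informative than accuracy alone on these data.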
Physicochemical properties and groupings of amino acids [13].
| Physicochemical Property | Class 1 | Class 2 | Class 3 |
|---|---|---|---|
| Charge | Positive | Neutral | Negative |
| Hydrophobicity | Polar | Neutral | Hydrophobic |
| Polarity | Polarity value 4.9~6.2 | Polarity value 8.0~9.2 | Polarity value 10.4~13 |
| Polarizability | Polarizability value 0~0.108 | Polarizability value | Polarizability value 0.219~0.409 |
| Secondary Structure | Helix | Strand | Coil |
| Normalized van der Waals volume | Volume range 0~2.78 | Volume range 2.95~4.0 | Volume range 4.03~8.08 |
| Solvent accessibility | Buried | Exposed | Intermediate |
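A three-class grouping like the one in the table can be turned into a composition feature by counting the fraction of residues that fall into each class. The sketch below does this for the charge property only. The table does not list which residues belong to each class, so the assignment used here (K and R positive, D and E negative, all others neutral) is an assumption based on the commonly used composition-transition-distribution grouping; the peptide is a made-up toy input.

```python
# Assumed residue-to-class assignment for the charge property.
# Any residue not listed is treated as neutral.
CHARGE_CLASS = {
    "K": "positive",
    "R": "positive",
    "D": "negative",
    "E": "negative",
}


def charge_composition(sequence):
    """Fraction of residues in each charge class (positive, neutral, negative)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for residue in sequence:
        counts[CHARGE_CLASS.get(residue, "neutral")] += 1
    return {cls: n / len(sequence) for cls, n in counts.items()}


toy_peptide = "GKKLLKKADE"  # hypothetical sequence, for illustration only
print(charge_composition(toy_peptide))
```

The same pattern applies to the other six properties once a residue-to-class mapping is supplied for each.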