Pratiti Bhadra1, Jielu Yan1, Jinyan Li1, Simon Fong1, Shirley W I Siu2.
Abstract
Antimicrobial peptides (Entities:
Year: 2018 PMID: 29374199 PMCID: PMC5785966 DOI: 10.1038/s41598-018-19752-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
A comparison of four RF classifiers using different feature sets, evaluated by 10-fold cross-validation at an AMP/non-AMP data ratio of 1:1. Values shown are the mean and standard deviation (in parentheses).
| Feature set {#} | | | | | | | |
|---|---|---|---|---|---|---|---|
| CTD {147} | 0.979 (0.002) | 0.944 (0.004) | 0.961 (0.002) | 0.924 (0.005) | 0.988 (0.001) | 0.698 (0.023) | 0.923 (0.005) |
| C {21} | 0.958 (0.002) | 0.943 (0.004) | 0.950 (0.002) | 0.901 (0.004) | 0.983 (0.001) | 0.747 (0.018) | 0.901 (0.004) |
| T {21} | 0.959 (0.002) | 0.943 (0.004) | 0.951 (0.002) | 0.901 (0.004) | 0.983 (0.001) | 0.745 (0.018) | 0.901 (0.005) |
| D {105} | 0.978 (0.002) | 0.945 (0.004) | 0.962 (0.002) | 0.924 (0.004) | 0.988 (0.001) | 0.698 (0.024) | 0.923 (0.005) |
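The evaluation protocol named in the caption, 10-fold cross-validation with the mean and standard deviation reported, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `evaluate` is a hypothetical callback that would train an RF classifier on the training indices and return one score for the held-out fold, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Call evaluate(train_idx, test_idx) once per fold; return (mean, stdev)."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one forms the training set
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores)
```

Each sample lands in exactly one test fold, so every sequence is scored once per cross-validation run.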
Figure 1. Pearson correlation coefficients (PCCs) between AMP and non-AMP distributions of the same descriptor in the Mmodel_train dataset.
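The coefficient named in the Figure 1 caption is the standard Pearson correlation between two equal-length vectors. A minimal sketch follows. How the AMP and non-AMP distributions are turned into vectors is not stated in the caption, so the input format is an assumption. The feature-set labels used in later tables (DF_PCC<0.7 and so on) suggest that descriptors are kept when their PCC falls below a cutoff; `select_low_pcc` is a hypothetical helper illustrating that reading.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def select_low_pcc(pcc_by_descriptor, threshold):
    """Keep descriptors whose AMP/non-AMP PCC is below the threshold."""
    return [name for name, r in pcc_by_descriptor.items() if r < threshold]
```

A low PCC means the descriptor is distributed differently in AMPs and non-AMPs, which is what makes it potentially useful for discrimination.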
Figure 2. Performance of RF classifiers during 10-fold cross-validation on datasets with different AMP/non-AMP ratios.
Performance comparison of classifiers trained with P:N ratios of 1:1 and 1:3 against classifiers trained with the SMOTE data rebalancing technique (k = 5) applied to initial data ratios of 1:3 and 1:10. Values shown were obtained from 10-fold cross-validation.
| Method | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| AmPEP (1:1) | | 0.945 | 0.962 | | 0.988 | 0.698 | | 0.588 |
| AmPEP (1:3) | 0.950 | 0.965 | 0.962 | 0.900 | 0.989 | 0.830 | 0.899 | |
| AmPEP with SMOTE (1:3) | 0.957 | 0.966 | 0.964 | 0.905 | | 0.817 | 0.905 | 0.663 |
| AmPEP with SMOTE (1:10) | 0.858 | | | 0.835 | | | 0.835 | 0.594 |
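SMOTE rebalances a dataset by creating synthetic minority samples on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below is a simplified variant for illustration only: it picks base samples at random and uses Euclidean distance, both of which are assumptions, and it is not the implementation used for the results above.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: dist2(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([p + gap * (q - p) for p, q in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, the new data stay inside the region already occupied by the minority class.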
Performance of four RF classifiers involving different feature subsets during 10-fold cross-validation at the AMP/non-AMP data ratio of 1:3. Values shown are the mean and standard deviation (in parentheses). Each experiment uses all AMP data and a set of non-AMP data randomly drawn without replacement from the non-AMPs of the dataset. The best-performing models on a particular measure are highlighted.
| Feature set {#} | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| DF {105} | ||||||||
| DF_PCC<0.7 {80} | 0.961 (0.002) | 0.819 (0.011) | ||||||
| DF_PCC<0.6 {43} | 0.961 (0.002) | 0.899 (0.005) | 0.793 (0.008) | 0.633 (0.008) | ||||
| DF_PCC<0.5 {23} | 0.949 (0.004) | 0.961 (0.002) | 0.898 (0.005) | 0.988 (0.001) | 0.779 (0.011) | 0.898 (0.005) | 0.620 (0.009) |
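The sampling scheme described in the caption of this table (all AMPs plus non-AMPs drawn at random without replacement) can be sketched as follows. The function name, the `ratio` parameter, and the seed handling are hypothetical choices made for illustration.

```python
import random


def draw_negatives(n_positive, negatives, ratio=3, seed=0):
    """Draw ratio * n_positive negatives at random, without replacement."""
    n_draw = ratio * n_positive
    if n_draw > len(negatives):
        raise ValueError("not enough negatives for the requested ratio")
    # random.sample never returns the same element twice
    return random.Random(seed).sample(negatives, n_draw)
```

With the Mmodel_train counts given in the dataset summary (3268 AMPs and 166791 non-AMPs), a 1:3 ratio draws 3 × 3268 = 9804 non-AMP sequences per experiment.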
A comparison of RF classifiers using different descriptors, evaluated by 10-fold cross-validation at an AMP/non-AMP data ratio of 1:3. Values shown are means and standard deviations (in parentheses) over 10 repetitions of 10-fold cross-validation.
| Feature set {#} | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| AmPEP {105} | 0.965 (0.002) | 0.830 (0.009) | 0.665 (0.006) | |||||
| AmPEP {23} | 0.965 (0.002) | 0.779 (0.011) | 0.620 (0.009) | |||||
| AAC {20} | 0.910 (0.002) | 0.956 (0.001) | 0.881 (0.002) | 0.862 (0.002) | 0.881 (0.002) | 0.662 (0.004) | ||
| PAAC {24} | 0.910 (0.002) | 0.970 (0.000) | 0.955 (0.000) | 0.881 (0.001) | 0.985 (0.000) | 0.891 (0.001) | 0.881 (0.001) | 0.681 (0.002) |
| K-mer {400} | 0.898 (0.002) | 0.953 (0.001) | 0.875 (0.002) | 0.985 (0.000) | 0.875 (0.002) | |||
| Auto Covariance (AC) {6} | 0.613 (0.003) | 0.942 (0.001) | 0.860 (0.001) | 0.604 (0.003) | 0.874 (0.001) | 0.742 (0.003) | 0.597 (0.003) | 0.234 (0.003) |
| Cross Covariance (CC) {12} | 0.661 (0.004) | 0.949 (0.001) | 0.877 (0.001) | 0.657 (0.004) | 0.905 (0.001) | 0.769 (0.002) | 0.651 (0.004) | 0.298 (0.004) |
| Auto-Cross Covariance (ACC) {18} | 0.710 (0.003) | 0.951 (0.001) | 0.891 (0.001) | 0.698 (0.003) | 0.922 (0.001) | 0.825 (0.002) | 0.695 (0.003) | 0.369 (0.004) |
| Parallel Correlation Pseudo Amino Acid Composition (PC-PseAAC) {22} | 0.908 (0.003) | 0.955 (0.001) | 0.881 (0.002) | 0.985 (0.000) | 0.884 (0.005) | 0.881 (0.002) | 0.676 (0.006) | |
| Series Correlation Pseudo Amino Acid Composition (SC-PseAAC) {26} | 0.907 (0.002) | 0.955 (0.001) | 0.880 (0.002) | 0.985 (0.000) | 0.882 (0.004) | 0.880 (0.002) | 0.673 (0.005) | |
| General Parallel Correlation Pseudo Amino Acid Composition (PC-PseAAC-General) {22} | 0.909 (0.002) | 0.970 (0.000) | 0.955 (0.001) | 0.880 (0.002) | 0.985 (0.000) | 0.880 (0.002) | ||
| General Series Correlation Pseudo Amino Acid Composition (SC-PseAAC-General) {26} | 0.908 (0.002) | 0.970 (0.001) | 0.955 (0.001) | 0.879 (0.002) | 0.985 (0.000) | 0.894 (0.003) | 0.879 (0.002) | 0.680 (0.003) |
The best two results for each performance measure are highlighted. AAC: Amino Acid Composition; PAAC: Pseudo Amino Acid Composition. AAC and PAAC were generated using the propy 1.0 package with default parameters. The other descriptors (K-mer, AC, CC, ACC, PC-PseAAC, SC-PseAAC, PC-PseAAC-General, and SC-PseAAC-General) were generated by Pse-in-One-1.0.4 using default parameters.
A comparison of our AMP prediction method with state-of-the-art methods on AUC-ROC, AUC-PR, MCC, and κ, using the Ctrain and Ctest datasets.
| Method | ML algorithm | Number of features | AUC-ROC | AUC-PR | MCC | κ |
|---|---|---|---|---|---|---|
| iAMPpred# | SVM | 66 | 0.98 | | 0.91 | — |
| iAMP-2L# | FKNN | 46 | 0.95 | — | 0.9 | — |
| AmPEP (DF) | RF | 105 | | 0.957 | | 0.962 |
| AmPEP (DF_PCC < 0.7) | RF | 80 | 0.994 | 0.950 | 0.914 | 0.913 |
| AmPEP (DF_PCC < 0.6) | RF | 43 | 0.994 | 0.934 | 0.919 | 0.918 |
| AmPEP (DF_PCC < 0.5) | RF | 23 | | 0.905 | | 0.923 |
#Results were taken from refs[5,6].
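MCC and κ (Cohen's kappa), two of the measures compared in the table, can both be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions. Returning 0.0 when a denominator vanishes is a convention chosen here, not something specified by the source.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and labels
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_expected == 1:
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)
```

For a balanced, symmetric example such as `tp=45, fp=5, tn=45, fn=5`, both functions give 0.8, which illustrates why the two measures can lie close together on a balanced test set.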
A summary of the positive and negative datasets. Values in braces are the numbers of sequences collected in that category.
| Dataset | Model Design: Training (Mmodel_train) | Comparative Study: Benchmark Training (Ctrain) | Comparative Study: Benchmark Testing (Ctest) |
|---|---|---|---|
| Positive | APD3, CAMPR3, LAMP {3268} | Xiao {770} | Xiao {920} |
| Negative | UniProt {166791} | Xiao {2405} | Xiao {920} |
Physicochemical properties and groupings of amino acids.
| Property | Class 1 (C1) | Class 2 (C2) | Class 3 (C3) |
|---|---|---|---|
| Hydrophobicity | Polar: R, K, E, D, Q, N | Neutral: G, A, S, T, P, H, Y | Hydrophobic: C, L, V, I, M, F, W |
| Normalized van der Waals volume | Volume range 0-2.78: G, A, S, T, P, D, C | Volume range 2.95-4.0: N, V, E, Q, I, L | Volume range 4.03-8.08: M, H, K, F, R, Y, W |
| Polarity | Polarity value 4.9-6.2: L, I, F, W, C, M, V, Y | Polarity value 8.0-9.2: P, A, T, G, S | Polarity value 10.4-13: H, Q, R, K, N, E, D |
| Polarizability | Polarizability value 0-0.108: G, A, S, D, T | Polarizability value 0.128-0.186: C, P, N, V, E, Q, I, L | Polarizability value 0.219-0.409: K, M, H, F, R, Y, W |
| Charge | Positive: K, R | Neutral: A, N, C, Q, G, H, I, L, M, F, P, S, T, W, Y, V | Negative: D, E |
| Secondary structure | Helix: E, A, L, M, Q, K, R, H | Strand: V, I, Y, C, W, F, T | Coil: G, N, P, S, D |
| Solvent accessibility | Buried: A, L, F, C, G, I, V, W | Exposed: R, K, Q, E, N, D | Intermediate: M, P, S, T, H, Y |
Figure 3. Illustration of the calculations of DF with a sample antibacterial peptide.
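A distribution-type descriptor of the kind illustrated in Figure 3 can be sketched for a single property as follows, assuming DF denotes the distribution part of the CTD scheme. For each class, it records where along the sequence the first, 25%, 50%, 75%, and 100% of that class's residues occur, as a percentage of the sequence length. The rounding rule for the intermediate ranks (floor, with a minimum of 1) is an assumption, since implementations differ, and the peptide in the usage note is a made-up example rather than the one in the figure.

```python
import math

# Charge grouping, copied from the property table above
CHARGE = {"C1": "KR", "C2": "ANCQGHILMFPSTWYV", "C3": "DE"}
FRACTIONS = (0.0, 0.25, 0.50, 0.75, 1.0)  # first, 25%, 50%, 75%, 100%


def distribution(sequence, grouping):
    """Return {class: [5 positions as % of sequence length]}."""
    result = {}
    for cls, residues in grouping.items():
        # 1-based positions of every residue that belongs to this class
        positions = [i for i, aa in enumerate(sequence, start=1) if aa in residues]
        if not positions:
            result[cls] = [0.0] * len(FRACTIONS)
            continue
        values = []
        for frac in FRACTIONS:
            # Rank of the occurrence to report; rank 1 is the first occurrence
            rank = max(1, math.floor(frac * len(positions)))
            values.append(100.0 * positions[rank - 1] / len(sequence))
        result[cls] = values
    return result
```

For the hypothetical peptide `"KKDAE"`, the positively charged class C1 occupies positions 1 and 2 of 5, so its five values are 20, 20, 20, 20, and 40. Applied to all seven properties, the scheme yields 7 × 3 × 5 = 105 values, which matches the feature count listed for the D and DF sets.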