Kyle Boone, Kyle Camarda, Paulette Spencer, Candan Tamerler.
Abstract
BACKGROUND: Antimicrobial peptides attract considerable interest as novel agents to combat infections. Their long-standing potency against bacteria, viruses and fungi as part of diverse innate immune systems offers a way to address rising concerns about antibiotic resistance. With the rapid increase in the number of antimicrobial peptides reported in databases, peptide selection has become a challenge. Building upon the physicochemical properties of antimicrobial peptides, we propose similarity analyses to describe key properties that distinguish active from non-active peptide sequences. We used an iterative supervised machine learning approach to separate active peptides from inactive peptides with low false discovery rates in a relatively short computational search time.
Keywords: Antibacterial peptides; Classification; Functional peptide search; Machine learning; Physicochemical properties; Rough set theory; Sequence similarity; Supervised learning
Year: 2018 PMID: 30522443 PMCID: PMC6282327 DOI: 10.1186/s12859-018-2514-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Schematic data table representing the training data set before feature correlation analysis. The three sections of the table are the sequences from the iAMP-2L training set [50], the features derived from the 544 amino acid properties in the AAindex1 [63], and the classification label of antibacterial activity from the positive or negative training data set. a_n denotes a sequence; for each AAindex1 property, b_n is the sum over the sequence, c_n is the mean, and d_n is the maximum sum of three adjacent residues in the sequence
| Sequence | Sum of A_1…A_544 | Mean of A_1…A_544 | Window of A_1…A_544 | Antibacterial Activity |
|---|---|---|---|---|
| a_1 | (b_1)_1…(b_1)_544 | (c_1)_1…(c_1)_544 | (d_1)_1…(d_1)_544 | Active |
| … | … | … | … | Active |
| a_1274 | (b_1274)_1…(b_1274)_544 | (c_1274)_1…(c_1274)_544 | (d_1274)_1…(d_1274)_544 | Active |
| a_1275 | (b_1275)_1…(b_1275)_544 | (c_1275)_1…(c_1275)_544 | (d_1275)_1…(d_1275)_544 | Inactive |
| … | … | … | … | Inactive |
| a_2714 | (b_2714)_1…(b_2714)_544 | (c_2714)_1…(c_2714)_544 | (d_2714)_1…(d_2714)_544 | Inactive |
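
The three feature types in the table above (sum, mean, and maximum sum over three adjacent residues) can be sketched for a single property as follows. This is a minimal illustration, not the authors' code: `TOY_SCALE` is a made-up per-residue scale rather than a real AAindex1 entry, `sequence_features` is a hypothetical helper name, and the sequence is assumed to be non-empty.

```python
# Hypothetical per-residue scale (NOT a real AAindex1 entry):
# 1.0 for lysine (K) and arginine (R), 0.0 for every other residue.
TOY_SCALE = {"K": 1.0, "R": 1.0}


def sequence_features(seq, scale, window=3):
    """Return the sum, mean, and maximum window sum of one property over a sequence."""
    values = [scale.get(residue, 0.0) for residue in seq]
    total = sum(values)
    mean = total / len(values)  # assumes a non-empty sequence
    # Sum of every run of `window` adjacent residues; for sequences shorter
    # than the window, fall back to the total.
    window_sums = [
        sum(values[i:i + window]) for i in range(len(values) - window + 1)
    ]
    max_window = max(window_sums) if window_sums else total
    return {"sum": total, "mean": mean, "window": max_window}


features = sequence_features("KFHEKHHSHRGY", TOY_SCALE)
print(features)
```

Repeating this for each of the 544 properties would give the 3 × 544 feature columns sketched in the table.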
Fig. 1 Rough Set Theory Rule Generation. A) Venn diagram of active and inactive peptides. A rule (R1) is the intersection of conditions (C1 and C2). Each rule must be selective for either active or inactive peptides. The minimum accuracy allowed for a rule is a user-defined parameter α. B) Venn diagram showing multiple rules as the intersection of conditions in 2-D space. The selection of conditions that lead to rules is a feature selection process that chooses the most relevant conditions to describe the physicochemical boundaries. A rule set is the collection of all rules describing the boundaries for either activity or inactivity
Fig. 2 Auto-Correlation and Selection of AAindex1 Properties. a Auto-correlation plot of 544 different AAindex1 properties. Magenta represents positive correlation, cyan represents negative correlation and white represents the lack of correlation between properties. b Remaining number of AAindex1 properties following filtering by cut-off value for the absolute value of correlation
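
Filtering properties by a cut-off on the absolute correlation, as in panel b, could look roughly like the sketch below. The exact filtering procedure is not given here, so the greedy keep-first strategy is an assumption, as are the names `pearson` and `filter_correlated` and the toy columns. Constant columns are not handled, since their correlation is undefined.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def filter_correlated(columns, cutoff):
    """Greedily keep a property only if |r| with every kept property is below cutoff."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < cutoff for k in kept):
            kept.append(name)
    return kept


# Toy feature columns (hypothetical values, one entry per sequence).
toy_columns = {
    "prop_a": [1.0, 2.0, 3.0, 4.0],
    "prop_b": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with prop_a
    "prop_c": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with prop_a
}
kept = filter_correlated(toy_columns, cutoff=0.9)
print(kept)
```

Lowering the cut-off removes more properties, which is the trade-off plotted in panel b.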
Rough set theory rules generated with maximum support from the large training dataset. The first rule describes antibacterial sequences. The accuracy of this rule is 97.8% (446/456) for the peptides that met the conditions from either the dataset from Xiao et al. [50] or the dataset from Fernandes et al. [64]
| Calculation | AAindex1 Property | Lower Value | Upper Value |
|---|---|---|---|
| Window 3 | NAKH900111 | 31.21 | 48.66 |
| Window 3 | FINA910104 | 3.45 | 5.10 |
| Window 3 | KUMS000101 | 23.6 | 28.20 |
| Sum | GEIM800102 | 12.68 | 39.90 |
| Window 3 | VASM830102 | 1.67 | 2.12 |
| Window 3 | QIAN880139 | 0.38 | 0.98 |
| Sum | FAUJ880112 | 0 | 3 |
| Sum | CHAM820102 | −0.61 | 19.51 |
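
A rule of this form is a conjunction of interval conditions, so checking a peptide against it reduces to testing every bound. The sketch below copies the bounds from the table above; treating the bounds as inclusive is an assumption, and `satisfies_rule` and the candidate feature values are hypothetical.

```python
# Conditions copied from the table above: (calculation, property, lower, upper).
RULE = [
    ("Window 3", "NAKH900111", 31.21, 48.66),
    ("Window 3", "FINA910104", 3.45, 5.10),
    ("Window 3", "KUMS000101", 23.6, 28.20),
    ("Sum", "GEIM800102", 12.68, 39.90),
    ("Window 3", "VASM830102", 1.67, 2.12),
    ("Window 3", "QIAN880139", 0.38, 0.98),
    ("Sum", "FAUJ880112", 0, 3),
    ("Sum", "CHAM820102", -0.61, 19.51),
]


def satisfies_rule(features, rule):
    """True if every feature lies within its interval (bounds assumed inclusive)."""
    return all(
        lower <= features[(calc, prop)] <= upper
        for calc, prop, lower, upper in rule
    )


# Hypothetical candidate: each feature sits at the midpoint of its interval.
candidate = {(calc, prop): (lo + hi) / 2 for calc, prop, lo, hi in RULE}

# The same candidate with one feature pushed outside its interval.
outside = dict(candidate)
outside[("Sum", "FAUJ880112")] = 4

print(satisfies_rule(candidate, RULE), satisfies_rule(outside, RULE))
```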
Performance of rough set theory rule induction compared to motif-search in 10-fold cross validation
| Method | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|
| 5-kmer SVM | 75.7 | 75.0 | 0.54 |
| 6-kmer SVM | 74.8 | 74.1 | 0.46 |
| 7-kmer SVM | 73.0 | 72.4 | 0.40 |
| 8-kmer SVM | 73.0 | 72.4 | 0.36 |
| EFC-FCBF | 87.1 | 87.2 | 0.76 |
| CLN-MLEM2 | 86.9 | 86.3 | 0.75 |
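
For reference, the three columns reported above follow from the confusion-matrix counts by their standard definitions. The counts in this sketch are invented for illustration and are not taken from the study; `classification_metrics` is a hypothetical helper name.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of active peptides recovered
    specificity = tn / (tn + fp)  # fraction of inactive peptides rejected
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when a marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Made-up counts, for illustration only.
sens, spec, mcc = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

Because MCC uses all four counts, two methods with similar sensitivity and specificity can still differ in MCC.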
Performance comparison among prediction servers for antimicrobial peptides, a motif-based classification method and rough set theory approach
| Method | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|
| CAMP SVM | 95.8 | 39.8 | 0.43 |
| CAMP RF | 97.1 | 33.5 | 0.40 |
| CAMP ANN | 89.1 | 70.9 | 0.61 |
| CAMP DA | 94.1 | 49.5 | 0.49 |
| iAMP-2L | 97.7 | 92.0 | 0.90 |
| EFC-FCBF | 92.0 | 90.0 | 0.73 |
| EFC + 307-FCBF | 92.4 | 96.1 | 0.86 |
| CLN-MLEM2 | 88.0 | 95.4 | 0.85 |
Fig. 3 False discovery rates of comparative antimicrobial peptide classification methods. CLN-MLEM2 achieves a low false discovery rate among currently available antimicrobial peptide classification methods
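
The false discovery rate is the fraction of predicted-active peptides that are actually inactive, FP / (FP + TP). The sketch below shows how it depends on sensitivity, specificity, and the class sizes. All numbers are illustrative assumptions rather than values from the study, and the function names are hypothetical.

```python
def false_discovery_rate(tp, fp):
    """FDR = FP / (FP + TP); returns 0.0 when nothing is predicted positive."""
    predicted_positive = tp + fp
    return fp / predicted_positive if predicted_positive else 0.0


def fdr_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Derive the FDR from sensitivity, specificity, and class sizes."""
    tp = sensitivity * n_pos
    fp = (1.0 - specificity) * n_neg
    return false_discovery_rate(tp, fp)


# Illustrative rates with a balanced and a negative-heavy candidate pool.
balanced = fdr_from_rates(0.9, 0.8, n_pos=100, n_neg=100)
skewed = fdr_from_rates(0.9, 0.8, n_pos=100, n_neg=1000)
print(f"balanced FDR={balanced:.3f}, skewed FDR={skewed:.3f}")
```

With the same sensitivity and specificity, the FDR rises sharply once inactive sequences outnumber active ones, which is why specificity matters when screening large candidate pools.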
Fig. 4 CLN-MLEM2 Method. Rule induction process of CLN-MLEM2, based on a rough set theory approach, to classify peptides with antibacterial activity
Data table consists of six selected sequences with two features
| Sequence | Sum of FAUJ880111 | Sum of FAUJ880112 | Antibacterial Activity |
|---|---|---|---|
| FFPVIGRILNGIL | 1 | 0 | Active |
| KFHEKHHSHRGY | 3 | 1 | Active |
| GNNRPVYIPQPRPPHPRL | 3 | 0 | Active |
| QDVDHVFLRF | 1 | 2 | Inactive |
| QQDYTGWMDF | 0 | 1 | Inactive |
| QLTFTSSWG | 0 | 0 | Inactive |
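
The six rows above are enough to illustrate how a candidate rule is scored by its support (how many sequences it covers) and its accuracy (the fraction of covered sequences with the target label), which is then compared against the minimum accuracy α. This is a simplified sketch and not the MLEM2 algorithm itself: the candidate intervals, the value of `ALPHA`, and `evaluate_rule` are all hypothetical.

```python
# Rows copied from the table above.
ROWS = [
    {"seq": "FFPVIGRILNGIL", "FAUJ880111": 1, "FAUJ880112": 0, "label": "Active"},
    {"seq": "KFHEKHHSHRGY", "FAUJ880111": 3, "FAUJ880112": 1, "label": "Active"},
    {"seq": "GNNRPVYIPQPRPPHPRL", "FAUJ880111": 3, "FAUJ880112": 0, "label": "Active"},
    {"seq": "QDVDHVFLRF", "FAUJ880111": 1, "FAUJ880112": 2, "label": "Inactive"},
    {"seq": "QQDYTGWMDF", "FAUJ880111": 0, "FAUJ880112": 1, "label": "Inactive"},
    {"seq": "QLTFTSSWG", "FAUJ880111": 0, "FAUJ880112": 0, "label": "Inactive"},
]

ALPHA = 0.9  # hypothetical minimum rule accuracy


def evaluate_rule(rows, conditions, label):
    """Return (support, accuracy) of a conjunction of inclusive interval conditions."""
    covered = [
        row for row in rows
        if all(lo <= row[feat] <= hi for feat, (lo, hi) in conditions.items())
    ]
    support = len(covered)
    correct = sum(1 for row in covered if row["label"] == label)
    accuracy = correct / support if support else 0.0
    return support, accuracy


# One condition alone also covers an inactive sequence...
single = evaluate_rule(ROWS, {"FAUJ880111": (1, 3)}, "Active")
# ...while intersecting it with a second condition isolates the active ones.
combined = evaluate_rule(
    ROWS, {"FAUJ880111": (1, 3), "FAUJ880112": (0, 1)}, "Active"
)
print(single, single[1] >= ALPHA)
print(combined, combined[1] >= ALPHA)
```

In this toy case the single condition falls below α, while the intersection of the two conditions keeps three sequences and meets it.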