Fereshteh Fallah Atanaki1, Saman Behrouzi1, Shohreh Ariaeenejad2, Amin Boroomand3, Kaveh Kavousi1.
Abstract
Biofilms are biological systems formed by communities of microorganisms in which microbial cells attach to a surface within a self-produced matrix of extracellular polymeric substances. Microorganisms can use biofilms to protect themselves against the host immune system and the surrounding environment, which increases their chances of surviving exposure to antimicrobial agents. Biofilms matter in medicine and industry because of the problems they cause. Discovering and validating agents that inhibit bacterial biofilm formation in the laboratory is costly and time-consuming. Computational tools for predicting biofilm inhibitory peptides are therefore needed. Here, we present a computational prediction tool that screens large numbers of peptide sequences and selects candidate peptides for further laboratory experiments and validation. In this learning model, different feature vectors extracted from the peptide primary structure are used to learn patterns from the sequences of biofilm inhibitory peptides. Several classification algorithms, including SVM, random forest, and k-nearest neighbor, were evaluated. Overall, our approach showed better prediction than other prediction methods. In this study, for the first time, we applied features extracted from NMR spectra of amino acids along with physicochemical features. Although each group of features showed good discrimination potential alone, we used a combination of features to enhance the performance of our method. Our prediction tool is freely available.
Year: 2020 PMID: 32280870 PMCID: PMC7144140 DOI: 10.1021/acsomega.9b04119
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Comparison of the average AAC between the two classes.
Figure 2. Comparison of the average dipeptide AAC between the two classes.
Presentation of Different Feature Vectors Used in This Study with the Number of Generated Features in Each Set
| feature sets | AAC | DPC | CTD | PCP | NMR | total |
|---|---|---|---|---|---|---|
| number of generated features | 20 | 400 | 504 | 15 | 40 | 979 |
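
The AAC and DPC counts in the table (20 and 400) follow directly from the 20 standard amino acids and their 20 × 20 ordered pairs. The sketch below shows one common way to compute these two feature sets. It is an illustration, not the authors' code: the function names `aac` and `dpc`, the alphabet ordering, and the example peptide are all assumptions.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); the ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of each residue type (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each ordered residue pair (400 features)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    # Count overlapping adjacent pairs, e.g. "ACD" -> "AC", "CD".
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(a + b, 0) / total for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical peptide, used only to show the vector sizes.
peptide = "ACDKKLLAC"
features = aac(peptide) + dpc(peptide)
print(len(aac(peptide)), len(dpc(peptide)), len(features))
```

Residues outside the 20-letter alphabet are ignored by this sketch, so the fractions sum to 1 only for sequences made of standard amino acids.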
Figure 3. Power of the various features in correctly predicting samples in the first independent data set.
Performance Evaluation Metrics with Different Classification Models Based on the Best 150 Features Selected
| methods | accuracy | sensitivity | specificity | AUC |
|---|---|---|---|---|
| RF | 0.87 | 0.87 | 0.87 | 0.90 |
| kNN | 0.88 | 0.88 | 0.88 | 0.90 |
| Naïve Bayes | 0.85 | 0.87 | 0.85 | 0.91 |
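
To make one of the compared classifiers concrete, here is a minimal k-nearest-neighbor sketch in plain Python. It assumes Euclidean distance and an unweighted majority vote. The source does not state the distance metric or the value of k the authors used, and the toy two-dimensional vectors stand in for the selected features purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    # Rank training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 1 = inhibitory peptide, 0 = non-inhibitory.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, (0.2, 0.2)))
```

In practice the feature columns would usually be scaled before computing distances, since composition fractions and physicochemical values live on different ranges.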
Figure 4. Ability of various binary classifier systems in terms of ROC curves.
Figure 5. Comparison of the evaluation parameters calculated in all runs of the five-fold cross-validation method. The red points illustrate the distribution of the different performance measures in the 150 runs.
Comparison of Different Evaluation Metrics in a 5-Fold Cross Validation Method
| evaluation parameters | accuracy | sensitivity | specificity | F1-score | MCC | AUC |
|---|---|---|---|---|---|---|
| five-fold cross validation | 0.95 | 0.97 | 0.96 | 0.97 | 0.89 | 0.96 |
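
The metrics in this table all derive from the four confusion-matrix counts, apart from AUC, which requires ranked scores. The sketch below uses the standard definitions. The function name `binary_metrics` and the example counts are hypothetical, chosen only to illustrate the arithmetic, and are not taken from the study.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on the positive class
    specificity = tn / (tn + fp)            # recall on the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts: 10 positives and 10 negatives, one error in each class.
m = binary_metrics(tp=9, fn=1, tn=9, fp=1)
print(m)
```

This sketch assumes both classes are present in the evaluated set; otherwise sensitivity or specificity is undefined and the division fails.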
Evaluation Parameter Values of Various BIP Predictive Tools on Our Independent Data Sets
| independent data set | predictive tools | sensitivity (%) | specificity (%) | accuracy (%) | MCC (%) |
|---|---|---|---|---|---|
| first independent data set | dPABBs | 70 | 100 | 85 | 33.33 |
| first independent data set | BioFIN | 0 | 100 | 10 | 0 |
| first independent data set | our model | 90 | 90 | 90 | 80 |
| second independent data set | dPABBs | 94.44 | 81.25 | 88.82 | 77.19 |
| second independent data set | BioFIN | 0 | 100 | 42.55 | 0 |
| second independent data set | our model | 98.14 | 85 | 92.55 | 85.03 |
Figure 6. BaAMP and APD3 data sets used for constructing our positive data set. As shown in the figure, these two data sets had 12 peptides in common.