Abu Sayed Chowdhury1, Sarah M Reehl2, Kylene Kehn-Hall3,4,5, Barney Bishop6, Bobbie-Jo M Webb-Robertson7.
Abstract
The emergence of viral epidemics throughout the world is of concern due to the scarcity of available effective antiviral therapeutics. The discovery of new antiviral therapies is imperative to address this challenge, and antiviral peptides (AVPs) represent a valuable resource for the development of novel therapies to combat viral infection. We present a new machine learning model to distinguish AVPs from non-AVPs using the most informative features derived from the physicochemical and structural properties of their amino acid sequences. To focus on those features that are most likely to contribute to antiviral performance, we filter potential features based on their importance for classification. These feature selection analyses suggest that secondary structure is the most important peptide sequence feature for predicting AVPs. Our Feature-Informed Reduced Machine Learning for Antiviral Peptide Prediction (FIRM-AVP) approach achieves a higher accuracy than either the model with all features or current state-of-the-art single classifiers. Understanding the features that are associated with AVP activity is a core need to identify and design new AVPs in novel systems. The FIRM-AVP code and standalone software package are available at https://github.com/pmartR/FIRM-AVP with an accompanying web application at https://msc-viz.emsl.pnnl.gov/AVPR.
Year: 2020 PMID: 33159146 PMCID: PMC7648056 DOI: 10.1038/s41598-020-76161-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
List of 649 peptide features.
| Peptide feature | Feature dimension |
|---|---|
| Amino acid composition | 20 |
| Dipeptide composition | 400 |
| Pseudo-amino acid composition | 25 |
| Amphiphilic pseudo-amino acid composition | 30 |
| Composition/transition/distribution | 168 |
| Secondary structure sequence | 6 |
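
The two composition rows in the table can be illustrated with a short sketch. It computes amino acid composition (20 values) and dipeptide composition (400 values) for a single peptide using only the Python standard library. The function names and the example sequence are hypothetical, and the normalization (fractions that sum to 1) is one common convention, not necessarily the one used in the FIRM-AVP code.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Non-standard letters are not counted, so fractions may sum to < 1.
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence needs at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]  # overlapping pairs
    return [pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]

peptide = "ACDKRLLKWAGE"  # hypothetical sequence, for illustration only
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
print(len(aac), len(dpc))
```

Concatenating such vectors with the remaining descriptor families in the table would give the full 649-dimensional representation (20 + 400 + 25 + 30 + 168 + 6).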
Performance comparison of our models with existing models on independent validation data.
| Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|
| FIRM-AVP (SVM) | 93.3 | 91.1 | 92.4 | 0.84 |
| FIRM-AVP (RF) | 95.0 | 82.2 | 89.5 | 0.79 |
| FIRM-AVP (DL) | 91.7 | 80.0 | 86.7 | 0.73 |
| AVP-649D (SVM) | 95.0 | 82.2 | 89.5 | 0.79 |
| AVP-649D (RF) | 90.0 | 82.2 | 86.7 | 0.73 |
| AVPcompo | 83.3 | 88.9 | 85.7 | 0.72 |
| AVPphysico | 88.3 | 82.2 | 85.7 | 0.71 |
| RFcompo + structure + agg | 91.7 | 86.7 | 89.5 | 0.79 |
| Meta-iAVP | 95.2 | 96.7 | 93.2 | 0.90 |
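
For readers who want to check how the four columns relate, the following sketch computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) from confusion-matrix counts using their standard definitions. The counts in the example are an assumption: the source does not report confusion matrices, and these values were chosen by hand because they happen to reproduce the FIRM-AVP (SVM) row.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = 100.0 * tp / (tp + fn)             # true positive rate, %
    specificity = 100.0 * tn / (tn + fp)             # true negative rate, %
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": round(sensitivity, 1),
        "specificity": round(specificity, 1),
        "accuracy": round(accuracy, 1),
        "mcc": round(mcc, 2),
    }

# Hypothetical counts (not taken from the source).
metrics = classification_metrics(tp=56, fn=4, tn=41, fp=4)
print(metrics)
```

Because accuracy is a weighted average of sensitivity and specificity, it always lies between the two; this is a quick sanity check when reading any row of the table.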
Top-5 features obtained in SVM and RF methods from RFE analysis.
| Feature rank | Features for SVM | Features for RF |
|---|---|---|
| 1 | Distribution (25% residues) feature for positive charge (group 1) | |
| 2 | Composition feature for intermediate solvent accessibility (group 3) | |
| 3 | PseAAC feature for | |
| 4 | | |
| 5 | Composition feature for neutral hydrophobicity (group 2) | |
Common features are highlighted in bold.
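
To make the top-ranked SVM feature more concrete, here is a minimal sketch of a "distribution" descriptor for one residue group. It reports where the first, 25%, 50%, 75%, and last occurrence of the group falls along the sequence, as a percentage of sequence length. Two things are assumptions here and are not stated in the source: the charge grouping (group 1 = K and R) and the rounding rule used to pick the 25%, 50%, and 75% occurrences. Published implementations differ in these details.

```python
import math

# Assumed charge grouping; the source does not list group membership.
CHARGE_GROUPS = {
    1: set("KR"),                # positive
    2: set("ANCQGHILMFPSTWYV"),  # neutral
    3: set("DE"),                # negative
}

def distribution_descriptor(seq, group_residues):
    """Positions (% of length) of the first, 25%, 50%, 75%, and 100%
    occurrence of residues from one group; zeros if the group is absent."""
    seq = seq.upper()
    positions = [i + 1 for i, aa in enumerate(seq) if aa in group_residues]
    if not positions:
        return [0.0] * 5
    n = len(positions)
    values = []
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        k = max(1, math.floor(fraction * n))  # assumed rounding rule
        values.append(100.0 * positions[k - 1] / len(seq))
    return values

# Hypothetical 8-residue sequence with positive residues at 1, 2, 4, 7, 8.
dist = distribution_descriptor("KRAKDEKR", CHARGE_GROUPS[1])
print(dist)  # second value is the "25% residues" entry
```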
Figure 1. Online FIRM-AVP software interface (https://msc-viz.emsl.gov/AVPR/). (A) The starting page, where users can either paste in a single peptide sequence or upload a FASTA file containing a collection of peptide sequences; example sequences and files are provided. (B) The probability of AVP versus non-AVP returned for each sequence in the pasted input or the uploaded FASTA file.
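
Since the interface accepts FASTA input, a small sketch of reading such a file into (header, sequence) pairs may help when preparing a batch of peptides. This is a generic standard-library parser written for illustration; it is not the parser used by the FIRM-AVP web application, and the record names in the example are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequences may wrap across lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

example = """>pep1 hypothetical
ACDKRLLK
WAGE
>pep2
ACDE
"""
records = parse_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```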