Abid Qureshi, Himani Tandon, Manoj Kumar.
Abstract
Peptide-based antiviral therapeutics have gradually made their way into mainstream drug discovery research. Experimentally determining a peptide's antiviral activity, expressed as its IC50 value, takes considerable effort. We therefore developed "AVP-IC50 Pred," a regression-based algorithm that predicts antiviral activity in terms of IC50 values (μM). A total of 759 non-redundant peptides from AVPdb and HIPdb were divided into a training/test set of 683 peptides (T683) and a validation set of 76 independent peptides (V76). We used peptide sequence features such as amino acid composition, binary profiles of the N8 and C8 residues, physicochemical properties, and their hybrids. Four machine learning techniques (MLTs) were employed: support vector machine, random forest, instance-based classifier, and K-Star. During 10-fold cross validation, these MLTs achieved maximum Pearson correlation coefficients (PCCs) of 0.66, 0.64, 0.56, and 0.55, respectively, using the best combination of feature sets. All the predictive models also performed well on the independent validation dataset, achieving maximum PCCs of 0.74, 0.68, 0.59, and 0.57, respectively, on the best combination of feature sets. The AVP-IC50 Pred web server is expected to help researchers working on antiviral therapeutics to computationally screen many compounds and focus experimental validation on the most promising peptides, thus reducing cost and time. The server is available at http://crdd.osdd.net/servers/ic50avp.
Keywords: IC50; antiviral; machine learning; peptide; prediction
Year: 2015 PMID: 26213387 PMCID: PMC7161829 DOI: 10.1002/bip.22703
Source DB: PubMed Journal: Biopolymers ISSN: 0006-3525 Impact factor: 2.505
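The sequence features named in the abstract and in the feature table can be illustrated with a minimal Python sketch. It shows one plausible way to obtain 20 amino acid composition values, 400 dipeptide composition values, and a 160-value binary profile of 8 terminal residues. The function names, the use of percentages, and the zero-padding of peptides shorter than 8 residues are assumptions made for illustration, not details confirmed by the source.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mono_composition(seq):
    """Percentage of each standard residue in the peptide (20 features)."""
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Percentage of each ordered residue pair among all overlapping pairs (400 features)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [
        100.0 * pairs.count(a + b) / len(pairs)
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def binary_profile(seq, n=8, from_c_terminus=False):
    """One-hot encoding of the n N-terminal (or C-terminal) residues (n * 20 features)."""
    window = seq[-n:] if from_c_terminus else seq[:n]
    profile = []
    for residue in window:
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    # Assumption: peptides shorter than n residues are zero-padded.
    profile.extend([0] * (n * len(AMINO_ACIDS) - len(profile)))
    return profile


# Enfuvirtide (T-20), taken from the mutational analysis table.
t20 = "YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF"
features = mono_composition(t20) + dipeptide_composition(t20)
print(len(features))  # 420, matching the "1 + 2" hybrid row of the feature table
```

Concatenating the individual vectors, as in the last lines, is one simple way to form the hybrid feature sets listed in the table.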
Performance Evaluation During 10‐Fold Cross Validation
| S. No. | Feature | No. of Features | PCC, T683 (10×): SVM | PCC, T683 (10×): RF | PCC, T683 (10×): IBk | PCC, T683 (10×): K* | PCC, V76: SVM | PCC, V76: RF | PCC, V76: IBk | PCC, V76: K* |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Amino acid composition (Mono) | 20 | 0.59 | 0.61 | 0.44 | 0.41 | 0.64 | 0.64 | 0.42 | 0.41 |
| 2 | Di‐peptide composition (Di) | 400 | 0.61 | 0.60 | 0.47 | 0.43 | 0.66 | 0.62 | 0.47 | 0.45 |
| 3 | C8 Binary profile (C8 Bin) | 160 | 0.56 | 0.57 | 0.45 | 0.42 | 0.59 | 0.60 | 0.43 | 0.41 |
| 4 | N8 Binary profile (N8 Bin) | 160 | 0.51 | 0.54 | 0.45 | 0.43 | 0.48 | 0.60 | 0.45 | 0.43 |
| 5 | Physicochemical properties (Physico) | 315 | 0.59 | 0.54 | 0.46 | 0.44 | 0.63 | 0.68 | 0.46 | 0.45 |
| 6 | Solvent accessibility (SA) | 21 | 0.22 | 0.20 | 0.18 | 0.19 | 0.21 | 0.18 | 0.15 | 0.16 |
| 7 | Secondary structure (SS) | 3 | 0.18 | 0.18 | 0.16 | 0.17 | 0.19 | 0.16 | 0.17 | 0.18 |
| 8 | 1 + 2 | 420 | 0.60 | 0.61 | 0.47 | 0.45 | 0.67 | 0.62 | 0.48 | 0.48 |
| 9 | 3 + 4 | 320 | 0.59 | 0.62 | 0.51 | 0.48 | 0.62 | 0.65 | 0.52 | 0.50 |
| 10 | 1 + 2 + 5 | 735 | 0.63 | 0.61 | 0.52 | 0.51 | 0.70 | 0.64 | 0.54 | 0.51 |
| 11 | 3 + 4 + 5 | 635 | 0.63 | 0.60 | 0.51 | 0.50 | 0.72 | 0.67 | 0.52 | 0.50 |
| 12 | 1 + 2 + 3 + 4 | 740 | 0.61 | 0.62 | 0.51 | 0.49 | 0.67 | 0.63 | 0.51 | 0.50 |
| 13 | 1 + 2 + 3 + 4 + 5 | 1055 | 0.62 | 0.61 | 0.50 | 0.51 | 0.66 | 0.64 | 0.54 | 0.53 |
| 14 | 6 + 7 | 23 | 0.22 | 0.20 | 0.18 | 0.21 | 0.23 | 0.19 | 0.20 | 0.18 |
| 15 | 1 + 2 + 5 + 6 + 7 | 758 | 0.66 | 0.63 | 0.55 | 0.54 | 0.74 | 0.68 | 0.59 | 0.57 |
| 16 | 3 + 4 + 5 + 6 + 7 | 658 | 0.65 | 0.64 | 0.56 | 0.55 | 0.73 | 0.70 | 0.58 | 0.56 |
10‐Fold cross validation performance of predictive models on AVP dataset of 683 sequences (T683) and evaluation of performance of predictive models on validation dataset of 76 peptides (V76) using SVM, RF, IBk, and K* MLTs.
Abbreviations: SVM: support vector machine; RF: random forest; IBk: instance‐based classifier (Weka); K*: KStar (Weka); T683: training dataset of 683 AVPs; 10×: 10‐fold cross validation; V76: independent dataset of 76 AVPs.
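The PCC values reported above measure the linear agreement between actual and predicted IC50 values. The following minimal sketch computes the standard Pearson correlation coefficient in plain Python. The function name and the example numbers are hypothetical, and the source does not state which software the authors used for this calculation.

```python
import math


def pearson_cc(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Sums of centered cross-products and squares.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical IC50 values (µM), for illustration only.
actual_ic50 = [0.5, 2.0, 10.0, 45.0, 120.0]
predicted_ic50 = [1.1, 3.5, 8.0, 60.0, 95.0]
print(round(pearson_cc(actual_ic50, predicted_ic50), 2))
```

A value near 1 indicates that predictions rise and fall with the measured values, while a value near 0, as seen for some feature sets in the table, indicates little linear relationship.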
Figure 1. Correlation between actual and predicted IC50 values of the independent dataset using (a) SVM and (b) RF.
Performance of the SVM Model for Each Virus in the 759 AVP Dataset Using LOVOCV Method
| S. No. | Virus | Abbreviation | No. of Peptides (Training) | No. of Peptides (Validation) | PCC (Training) | PCC (Validation) |
|---|---|---|---|---|---|---|
| 1 | Hepatitis C virus | HCV | 635 | 124 | 0.55 | 0.8 |
| 2 | SARS coronavirus | SARS‐CoV | 733 | 26 | 0.58 | 0.53 |
| 3 | Porcine reproductive and respiratory syndrome Virus | PRRSV | 746 | 13 | 0.58 | 0.53 |
| 4 | Hepatitis B virus | HBV | 747 | 12 | 0.58 | 0.53 |
| 5 | Dengue 2 virus | DENV 2 | 752 | 7 | 0.58 | 0.53 |
| 6 | Newcastle disease virus | NDV | 752 | 7 | 0.58 | 0.53 |
| 7 | Transmissible gastroenteritis virus | TGEV | 756 | 3 | 0.64 | 0.53 |
| 8 | West Nile virus | WNV | 756 | 3 | 0.63 | 0.53 |
| 9 | Human papillomavirus | HPV | 753 | 6 | 0.59 | 0.52 |
| 10 | Human metapneumovirus | hMPV | 754 | 5 | 0.68 | 0.52 |
| 11 | Human parainfluenza virus type 3 | HPIV 3 | 734 | 25 | 0.57 | 0.51 |
| 12 | Herpes simplex virus 2 | HSV 2 | 754 | 5 | 0.62 | 0.51 |
| 13 | Hendra Virus | HeV | 755 | 4 | 0.63 | 0.51 |
| 14 | Human cytomegalovirus | HCMV | 755 | 4 | 0.65 | 0.5 |
| 15 | Marek's disease virus | MDV | 754 | 5 | 0.61 | 0.49 |
| 16 | Dengue 1 virus | DENV 1 | 756 | 3 | 0.61 | 0.49 |
| 17 | Feline immunodeficiency virus | FIV | 730 | 29 | 0.55 | 0.46 |
| 18 | Measles virus | MV | 739 | 20 | 0.57 | 0.45 |
| 19 | Human T‐cell leukemia virus 1 | HTLV 1 | 753 | 6 | 0.65 | 0.41 |
| 20 | Herpes simplex virus 1 | HSV 1 | 729 | 30 | 0.58 | 0.2 |
| 21 | Influenza A virus | INFV A | 720 | 39 | 0.59 | 0.15 |
| 22 | Human immunodeficiency virus | HIV | 464 | 295 | 0.6 | 0.13 |
| 23 | Respiratory syncytial virus | RSV | 694 | 65 | 0.58 | 0.02 |
| 24 | Others | Oth | 736 | 23 | 0.65 | 0.51 |
Other viruses include: ASLV‐A, JV, SeV, VACV, AIV, AMV, ASFV, BKV, BoHV 1, BRV, DENV 4, EBoV, HPIV 2, INFV B, JEV, LCMV, MHV, NiV, and SNV.
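In the table above, the training and validation counts in every row sum to 759, which is consistent with holding out all peptides of one virus and training on the rest. A minimal sketch of such a split follows. The record layout `(sequence, virus, ic50)`, the function name, and the toy records are assumptions made for illustration and are not taken from the source.

```python
from collections import defaultdict


def leave_one_virus_out(records):
    """Yield (virus, training, validation) with one virus held out per split.

    `records` is a list of (sequence, virus, ic50) tuples (assumed layout).
    """
    by_virus = defaultdict(list)
    for record in records:
        by_virus[record[1]].append(record)
    for virus, held_out in by_virus.items():
        # Train on every peptide that targets a different virus.
        training = [r for r in records if r[1] != virus]
        yield virus, training, held_out


# Toy records with hypothetical sequences and IC50 values (µM).
toy = [
    ("ACDEFGHIK", "HIV", 1.2),
    ("LMNPQRSTV", "HIV", 30.0),
    ("WYACDEFGH", "HCV", 5.5),
    ("IKLMNPQRS", "RSV", 80.0),
]
for virus, training, validation in leave_one_virus_out(toy):
    print(virus, len(training), len(validation))
```

Each split would then be used to fit a model on `training` and compute the PCC on `validation`, giving one row per virus.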
Performance of the SVM Models Using LOOCV Method on Virus Specific Datasets
| S. No. | Feature | No. of Features | PCC, Training/Testing: HIV | PCC, Training/Testing: HCV | PCC, Validation: HIV | PCC, Validation: HCV |
|---|---|---|---|---|---|---|
| 1 | Amino acid composition (Mono) | 20 | 0.54 | 0.56 | 0.51 | 0.52 |
| 2 | Di‐peptide composition (Di) | 400 | 0.56 | 0.58 | 0.51 | 0.53 |
| 3 | C8/N8 Binary profile (C8/N8 Bin) | 160 | 0.57 | 0.57 | 0.54 | 0.55 |
| 4 | Physicochemical properties (Physico) | 315 | 0.55 | 0.55 | 0.51 | 0.51 |
| 5 | 1 + 2 + 3 | 735 | 0.58 | 0.64 | 0.54 | 0.61 |
| 6 | 1 + 2 + 4 | 635 | 0.60 | 0.67 | 0.54 | 0.58 |
| 7 | 1 + 2 + 3 + 4 | 740 | 0.60 | 0.65 | 0.53 | 0.63 |
Figure 2. Two sample logo (TSL) comparison. TSLs of (a) the 8 N‐terminal and (b) the 8 C‐terminal residues of 97 highly effective peptides (IC50 < 1 µM) and an equal number of least effective peptides (IC50 > 100 µM).
Figure 3. AVP‐IC50 Pred result output.
Mutational Analysis
| Peptide | Length | Mutation Position | IC50 (µM) | Fold Change | PubMed ID |
|---|---|---|---|---|---|
| YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF (Enfuvirtide/T‐20) | 36 | No Mutation | 7.57 | – | 19949052 |
| YTSLIHSLIAESQNQQEKNEQELLELDKWASLWNWF | 36 | E10A | 0.01 | 757.0 | |
| YTSLIHSLIEASQNQQEKNEQELLELDKWASLWNWF | 36 | E11A | 0.02 | 378.5 | |
| YTSLIHSLIEESQNQQVKNEQELLELDKWASLWNWF | 36 | E17V | 0.03 | 252.3 | |
| YTSLIHSLIEESQNQQEKNDQELLELDKWASLWNWF | 36 | E20D | 0.03 | 252.3 | |
| YTSLIHSLIEESQNQQEKNEQELLELDKWASLWGWF | 36 | N34G | 0.03 | 252.3 | |
| KVINPEPIVEPFMSKPFALF (Scr alpha1‐antitrypsin peptide) | 20 | No Mutation | 100.00 | – | 17448989 |
| KVINPEPIVEPFMSKPFLLF | 20 | A18L | 5.24 | 19.1 | |
| LVINPEPIVEPFMSKPFALF | 20 | K1L | 7.38 | 13.6 | |
| KVINPEPIVEPFMSKPFVLF | 20 | A18V | 7.46 | 13.4 | |
| KVINPEPIVEPFMSLPFALF | 20 | K15L | 7.95 | 12.6 | |
| KVILPEPIVEPFMSKPFALF | 20 | N4L | 14.55 | 6.9 | |
| GLQLLGFILAFLGWIGAI (CL58.1 peptide) | 18 | No Mutation | 25.00 | – | 22378192 |
| GLQLLYFILAFLGWIGAI | 18 | G6Y | 2.71 | 9.2 | |
| GLQLLGFILAYLGWIGAI | 18 | F11Y | 2.99 | 8.4 | |
| GLQLLGFILAFLGWIGAY | 18 | I18Y | 2.99 | 8.4 | |
| GLQLLGFILAFLRWIGAI | 18 | G13R | 3.25 | 7.7 | |
| GLQLLGFILAFLGWIYAI | 18 | G16Y | 3.5 | 7.1 | |
| MANAGLQLLGFILAFLGWIG (peptide CL58‐2) | 20 | No Mutation | 17.8 | – | 22378192 |
| MANAGLQLLGFILAFLGWIV | 20 | G20V | 0.46 | 38.7 | |
| MANAGLQLLGFILAFLGWIR | 20 | G20R | 0.51 | 34.9 | |
| MANAGLQLLRFILAFLGWIG | 20 | G10R | 0.9 | 19.8 | |
| MANAGLQLLGFILAFLRWIG | 20 | G17R | 0.99 | 18.0 | |
| MANAGLQLLGFILAFLVWIG | 20 | G17V | 1.07 | 16.6 |
Mutational analysis of different AVPs showing, for each peptide, the five best-performing mutations and their predicted IC50 values.
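The fold change values in the table are consistent with dividing the IC50 of the unmutated peptide by the predicted IC50 of the mutant (for example, 7.57 / 0.01 = 757.0). The sketch below derives a mutation label from a pair of sequences and computes that ratio. Both function names are hypothetical, and the ratio definition is inferred from the table values rather than stated in the source.

```python
def mutation_label(wild_type, mutant):
    """Return a label such as 'E10A' for a single-residue substitution."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    diffs = [
        (pos, w, m)
        for pos, (w, m) in enumerate(zip(wild_type, mutant), start=1)
        if w != m
    ]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substitution")
    pos, w, m = diffs[0]
    return f"{w}{pos}{m}"  # original residue, 1-based position, new residue


def fold_change(wild_type_ic50, mutant_ic50):
    """Ratio of unmutated to mutant IC50; values above 1 mean a lower IC50."""
    return wild_type_ic50 / mutant_ic50


# Enfuvirtide (T-20) and its first listed mutant, taken from the table.
t20 = "YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF"
t20_mutant = "YTSLIHSLIAESQNQQEKNEQELLELDKWASLWNWF"
print(mutation_label(t20, t20_mutant), round(fold_change(7.57, 0.01), 1))
```

Scanning every position with every alternative residue and ranking the predicted IC50 values would be one way to arrive at a top-five list per peptide, although the source does not describe the exact procedure.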