Abid Qureshi, Gazaldeep Kaur, Manoj Kumar.
Abstract
Viral infections constantly jeopardize global public health owing to the lack of effective antiviral therapeutics. Therefore, there is an imperative need to speed up the drug discovery process to identify novel and efficient drug candidates. In this study, we have developed quantitative structure-activity relationship (QSAR)-based models for predicting antiviral compounds (AVCs) against deadly viruses such as human immunodeficiency virus (HIV), hepatitis C virus (HCV), hepatitis B virus (HBV), human herpesvirus (HHV), and 26 others, using publicly available experimental data from the ChEMBL bioactivity database. For the HIV, HCV, HBV, HHV, and general datasets, respectively, support vector machine (SVM) models achieved maximum Pearson correlation coefficients of 0.72, 0.74, 0.66, 0.68, and 0.71 in regression mode and maximum Matthews correlation coefficients of 0.91, 0.93, 0.70, 0.89, and 0.71 in classification mode during 10-fold cross-validation. Furthermore, similar performance was observed on the independent validation sets. We have integrated these models into the AVCpred web server, freely available at http://crdd.osdd.net/servers/avcpred. In addition, the datasets are provided in a searchable format. We hope this web server will assist researchers in the identification of potential antiviral agents. It would also save time and cost by prioritizing new drugs against viruses before their synthesis and experimental testing.
Keywords: QSAR; algorithm; antiviral compounds; drug design; inhibition; prediction
Year: 2016 PMID: 27490990 PMCID: PMC7162012 DOI: 10.1111/cbdd.12834
Source DB: PubMed Journal: Chem Biol Drug Des ISSN: 1747-0277 Impact factor: 2.817
Creation of datasets for the development of prediction models
| S. no. | Virus | Overall data | Data filter: percent inhibition | Data filter: reference | Data filter: non‐redundant |
|---|---|---|---|---|---|
| 1 | Human immunodeficiency virus (HIV) | 1383 | 594 | 535 | 389 |
| 2 | Hepatitis C virus (HCV) | 803 | 648 | 618 | 467 |
| 3 | Hepatitis B virus (HBV) | 416 | 284 | 283 | 112 |
| 4 | Human herpesvirus (HHV) | 473 | 312 | 278 | 124 |
| 5 | General (26 viruses) | 5684 | 2662 | 1635 | 1391 |
Data from ChEMBL were filtered, and only compounds with (1) a percent inhibition value, (2) a reference, and (3) a non‐redundant SMILES were retained.
The general dataset comprises the viruses below, with the number of unique AVCs in brackets: dengue virus 1 (1), dengue virus 2 (16), enterovirus (30), human adenovirus 5 (41), human cox B1 (4), human cox B5 (21), human echovirus 13 (3), human echovirus 9 (2), human enterovirus 71 (19), human enterovirus C (1), human polio virus 1 (4), human rhinovirus (1), human rhinovirus 14 (29), human rhinovirus 1B (18), human rhinovirus 2 (2), human T lymphotropic virus (42), influenza A (36), influenza A (H1N1) (16), influenza B (1), monkeypox virus (1), respiratory syncytial virus (4), Rift Valley fever virus (Cercopithecidae) (1), sandfly fever Sicilian virus (2), SARS coronavirus (23), simian virus 40 (45), Sindbis virus (4), vaccinia virus (12), vaccinia virus WR (22), variola virus (1), vesicular stomatitis virus (63), West Nile virus (17), yellow fever virus (51).
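The three data filters named in the table footnote can be sketched as a small routine. This is a minimal illustration only: the record field names (`smiles`, `percent_inhibition`, `reference`) are hypothetical, duplicates are detected by exact SMILES string match, and the first record seen for a structure is kept. The source does not say how the authors canonicalized SMILES or resolved duplicate measurements.

```python
from typing import Iterable


def filter_records(records: Iterable[dict]) -> list[dict]:
    """Keep records that have a percent-inhibition value and a reference,
    retaining only the first record seen for each SMILES string."""
    seen_smiles = set()
    kept = []
    for rec in records:
        # Filter 1: a percent-inhibition value must be present.
        if rec.get("percent_inhibition") is None:
            continue
        # Filter 2: a literature reference must be present.
        if not rec.get("reference"):
            continue
        # Filter 3: drop redundant structures (assumed: exact string match).
        smiles = rec.get("smiles")
        if not smiles or smiles in seen_smiles:
            continue
        seen_smiles.add(smiles)
        kept.append(rec)
    return kept


# Hypothetical records, not taken from ChEMBL.
example = [
    {"smiles": "CCO", "percent_inhibition": 42.0, "reference": "doc-1"},
    {"smiles": "CCO", "percent_inhibition": 40.0, "reference": "doc-2"},  # duplicate SMILES
    {"smiles": "CCN", "percent_inhibition": None, "reference": "doc-3"},  # no activity value
    {"smiles": "CCC", "percent_inhibition": 10.0, "reference": ""},       # no reference
]
filtered = filter_records(example)
```

Applying the filters in this order mirrors the table columns, where each successive filter reduces the count for a given virus.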
Figure 1. Schematic diagram demonstrating the workflow of AVCpred
Pearson correlation values obtained for each viral dataset on their respective QSAR models
| S. no. | Virus | AVCs: total | AVCs: training | AVCs: validation | No. of selected descriptors | PCC: training (10x) | PCC: validation |
|---|---|---|---|---|---|---|---|
| 1 | Human immunodeficiency virus (HIV) | 389 | 351 | 38 | 45 | 0.72 | 0.63 |
| 2 | Hepatitis C virus (HCV) | 467 | 421 | 46 | 52 | 0.74 | 0.65 |
| 3 | Hepatitis B virus (HBV) | 112 | 101 | 11 | 15 | 0.66 | 0.61 |
| 4 | Human herpesvirus (HHV) | 124 | 112 | 12 | 20 | 0.68 | 0.64 |
| 5 | General (26 viruses) | 1391 | 1252 | 139 | 65 | 0.71 | 0.67 |
(10x): 10‐fold cross‐validation. AVCs: antiviral compounds. PCC: Pearson's correlation coefficient.
Figure 2. Scatter plot between actual and predicted percentage inhibition on independent validation datasets of (A) HIV, (B) HCV, (C) HBV, (D) HHV, and (E) general (26 viruses)
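The Pearson correlation coefficient used to score the regression models measures the linear agreement between actual and predicted percent inhibition. The sketch below computes it from first principles. The function name and the example values are illustrative assumptions, not taken from the study.

```python
import math


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical percent-inhibition values for a handful of compounds.
actual_inhibition = [12.0, 35.0, 58.0, 71.0, 90.0]
predicted_inhibition = [20.0, 30.0, 49.0, 80.0, 85.0]
pcc = pearson_correlation(actual_inhibition, predicted_inhibition)
```

A value near 1 indicates that predictions rise and fall with the measurements; the coefficient does not by itself show whether predictions are biased by a constant offset or scale.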
Performance of QSAR models obtained for each viral dataset using classification mode of machine learning
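| S. no. | Virus | Training/testing (10‐fold): sensitivity (%) | Training/testing (10‐fold): specificity (%) | Training/testing (10‐fold): accuracy (%) | Training/testing (10‐fold): MCC | Validation: sensitivity (%) | Validation: specificity (%) | Validation: accuracy (%) | Validation: MCC |
|---|---|---|---|---|---|---|---|---|---|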
| 1 | HIV | 94.30 | 96.40 | 95.10 | 0.91 | 88.10 | 82.30 | 86.10 | 0.70 |
| 2 | HCV | 96.90 | 96.40 | 96.60 | 0.93 | 86.61 | 87.20 | 86.80 | 0.73 |
| 3 | HBV | 87.10 | 81.60 | 85.80 | 0.70 | 87.20 | 80.40 | 84.30 | 0.69 |
| 4 | HHV | 93.40 | 92.30 | 93.50 | 0.89 | 87.10 | 91.30 | 88.60 | 0.77 |
| 5 | General (26 viruses) | 88.30 | 82.20 | 85.70 | 0.71 | 81.70 | 82.10 | 81.90 | 0.64 |
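The four classification measures in the table follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions, with sensitivity, specificity, and accuracy expressed as percentages to match the table. The function name and the example counts are illustrative assumptions, not data from the study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy (all in %) and MCC from
    true/false positive and negative counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = 100.0 * tp / (tp + fn)   # share of actives recovered
    specificity = 100.0 * tn / (tn + fp)   # share of inactives recovered
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is set to 0 when a marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical confusion-matrix counts.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient stays informative when the active and inactive classes differ in size, which is one reason it is commonly reported alongside the other three measures.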
Machine learning parameters selected for the development of the QSAR models
| S. no. | Model | SMOreg: kernel | SMOreg: optimizer | SMOreg: c | SMOreg: ω | SMOreg: σ | SVMlight: kernel | SVMlight: g | SVMlight: c |
|---|---|---|---|---|---|---|---|---|---|
| 1 | HIV | Puk | RegSMOImproved | 4 | 2 | 3 | RBF | 0.02 | 200 |
| 2 | HCV | Puk | RegSMOImproved | 5 | 5 | 5 | RBF | 0.02 | 50 |
| 3 | HBV | Puk | RegSMOImproved | 0.1 | 0.3 | 0.3 | RBF | 0.001 | 300 |
| 4 | HHV | Puk | RegSMOImproved | 3 | 3 | 3 | RBF | 0.1 | 100 |
| 5 | General (26 viruses) | Puk | RegSMOImproved | 3 | 3 | 5 | RBF | 0.01 | 50 |
Abbreviations: Puk: Pearson VII function‐based universal kernel; RegSMOImproved: optimizer for algorithm speed improvement; c: regularization constant/complexity parameter, which sets the trade‐off between training error and margin; ω: omega exponent value (controls the tailing factor of the peak); σ: sigma bandwidth value (controls the half‐width of the peak); RBF: radial basis function; g: parameter gamma in the RBF kernel.
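The two kernels named in the table can be written out directly. The sketch below uses the commonly cited form of the Pearson VII universal kernel and the usual RBF kernel. It is an illustration under those assumptions, not the SMOreg or SVMlight source code, and the function names are hypothetical.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def puk_kernel(x, y, omega, sigma):
    """Pearson VII universal kernel:
    1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2) ** omega
    """
    dist = math.sqrt(squared_distance(x, y))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||**2)."""
    return math.exp(-gamma * squared_distance(x, y))


# Parameter values taken from the HIV row of the table (omega = 2, sigma = 3, g = 0.02);
# the descriptor vectors are made up for illustration.
u = [0.0, 1.0, 2.0]
v = [1.0, 1.0, 0.0]
k_puk = puk_kernel(u, v, omega=2.0, sigma=3.0)
k_rbf = rbf_kernel(u, v, gamma=0.02)
```

In this form the kernel equals 1 for identical vectors and falls to 0.5 at a distance of σ/2 for any ω, while ω changes how quickly the tails decay.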
Figure 3. ROC curves depicting performance of QSAR models for (A) HIV, (B) HCV, (C) HBV, (D) HHV, and (E) general (26 viruses)
Figure 4. AVCpred submission form with output
Figure 5. Web interface of the ‘AVCpred Draw’ tool
Existing QSAR studies pertaining to antiviral compounds
| S. no. | Compound type | No. of compounds | Correlation | Target virus | Web server/Software | Year | References |
|---|---|---|---|---|---|---|---|
| 1 | PA endonuclease inhibitors | 40 | 0.76 | INFV | No | 2014 | |
| 2 | Thiourea derivatives | 85 | 0.92 | HCV | No | 2013 | |
| 3 | Integrase inhibitors | 77 | 0.98 | HIV | No | 2012 | |
| 4 | Three different series of HBV inhibitors | 30 | 0.92 | HBV | No | 2010 | |
| 5 | HIV‐1 entry inhibitors | 36 | 0.72 | HIV | No | 2010 | |
| 6 | Neuraminidase flavonoid inhibitors | 20 | 0.75–0.97 | H1N1 | No | 2010 | |
| 7 | Protease inhibitors | 170 | 0.6–0.83 | HIV | No | 2010 | |
| 8 | HPV6‐E1 helicase ATPase inhibitors | Full text not available | 0.92 | HPV | No | 2010 | |
| 9 | Thymidine kinase N2‐phenylguanine inhibitors | 20 | 0.85–0.98 | HSV | No | 2000 | |
Figure 6. Applicability domain plots of the QSAR models for (A) HIV, (B) HCV, (C) HBV, (D) HHV, and (E) general (26 viruses)