Sakshi Kamboj, Akanksha Rajput, Amber Rastogi, Anamika Thakur, Manoj Kumar.
Abstract
Hepatitis C virus (HCV) infection causes viral hepatitis leading to hepatocellular carcinoma. Despite the clinical use of direct-acting antivirals (DAAs), treatment still fails in 5-10% of cases. Therefore, it is crucial to develop new antivirals against HCV. To this end, we developed the "Anti-HCV" platform, which uses machine learning and quantitative structure-activity relationship (QSAR) approaches to predict repurposed drugs targeting HCV non-structural (NS) proteins. We retrieved experimentally validated small molecules from the ChEMBL database with bioactivity (IC50/EC50) against the HCV NS3 (454), NS3/4A (495), NS5A (494) and NS5B (1671) proteins. These unique compounds were divided into training/testing and independent validation datasets. Relevant molecular descriptors and fingerprints were selected using a recursive feature elimination algorithm. Four machine learning techniques, namely support vector machine, k-nearest neighbour, artificial neural network and random forest, were used to develop the predictive models. The best models achieved Pearson's correlation coefficients from 0.80 to 0.92 during 10-fold cross validation and similar performance on the independent datasets. The robustness and reliability of the predictive models were also supported by applicability domain, chemical diversity and decoy dataset analyses. The "Anti-HCV" predictive models were then used to identify potential repurposing drugs. Representative candidates were further validated by molecular docking, which showed high binding affinities. This study thus identified promising repurposed drugs targeting HCV NS proteins, including naftifine and butalbital (NS3), vinorelbine and epicriptine (NS3/4A), pipecuronium and trimethaphan (NS5A), and olodaterol and vemurafenib (NS5B). These potential repurposed drugs may prove useful in antiviral drug development against HCV.
Keywords: Antiviral; Drug repurposing; Hepatitis C Virus; Machine learning; NS3-NS3/NS4A-NS5A-NS5B; Non-structural protein; Prediction; QSAR
Year: 2022 PMID: 35832613 PMCID: PMC9271984 DOI: 10.1016/j.csbj.2022.06.060
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Overall methodology used in “Anti-HCV” to develop predictive algorithms for identifying inhibitors targeting the HCV non-structural proteins NS3, NS3/4A, NS5A and NS5B. Inhibitors of the HCV non-structural proteins were taken from ChEMBL. Molecular descriptors were calculated using the PaDEL-Descriptor software, followed by feature selection using support vector regression (SVR), decision tree regression (DTR) and the perceptron method. The selected features were used to develop predictive models with the support vector machine (SVM), random forest (RF), k-nearest neighbour (kNN) and artificial neural network (ANN) machine-learning techniques, with ten-fold cross validation on the training/testing datasets and assessment on the independent validation datasets. Model performance was assessed, repurposed drugs were predicted for these NS proteins, and the predictions were structurally validated using molecular docking.
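The data partitioning described in this workflow can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `holdout_split` and `k_fold_indices`, the 10% hold-out fraction, and the random seed are assumptions. The 10% figure is inferred only from the reported dataset sizes (for example 454 NS3 compounds split into 408 and 46), which are consistent with holding out one tenth of the compounds, rounded up.

```python
import math
import random


def holdout_split(items, holdout_fraction=0.10, seed=0):
    """Shuffle items and set aside a fraction as an independent validation set.

    The 10% fraction and the seed are assumptions made for this sketch.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_holdout = math.ceil(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Compound counts per target as given in the abstract.
dataset_sizes = {"NS3": 454, "NS3/4A": 495, "NS5A": 494, "NS5B": 1671}
split_sizes = {}
for target, n in dataset_sizes.items():
    train_test, validation = holdout_split(range(n))
    split_sizes[target] = (len(train_test), len(validation))
    print(target, split_sizes[target])

# Ten folds over the NS3 training/testing set.
folds = list(k_fold_indices(split_sizes["NS3"][0], k=10))
```

In practice, a model would be fitted on each `train_idx` and scored on the matching `test_idx`, and the held-out validation set would be touched only once, after model selection.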
Statistical measures of performance of the best predictive models developed for the NS3 protein using different machine-learning techniques and selected features, during ten-fold cross validation on the training/testing dataset and on the independent validation dataset.

| ML technique | Feature selection | Parameters | Dataset | MAE | RMSE | R2 | PCC |
|---|---|---|---|---|---|---|---|
| Support Vector Machine | SVR | gamma:0.0005 C:100 | T408 | 0.40 | 0.64 | 0.72 | 0.86 |
| | | | V46 | 0.22 | 0.47 | 0.83 | 0.92 |
| | DTR | gamma:0.01 C:10 | T408 | 0.45 | 0.71 | 0.69 | 0.84 |
| | | | V46 | 0.42 | 0.65 | 0.69 | 0.83 |
| | PCT | gamma:0.001 C:200 | T408 | 0.43 | 0.65 | 0.71 | 0.85 |
| | | | V46 | 0.25 | 0.50 | 0.81 | 0.90 |
| Random Forest | SVR | n:500 depth:None split:2 leaf:4 | T408 | 0.55 | 0.76 | 0.62 | 0.80 |
| | | | V46 | 0.47 | 0.68 | 0.65 | 0.81 |
| | DTR | n:100 depth:12 split:5 leaf:2 | T408 | 0.44 | 0.69 | 0.70 | 0.84 |
| | | | V46 | 0.29 | 0.54 | 0.78 | 0.89 |
| | PCT | n:500 depth:8 split:2 leaf:1 | T408 | 0.56 | 0.74 | 0.62 | 0.80 |
| | | | V46 | 0.42 | 0.65 | 0.68 | 0.83 |
| k-Nearest Neighbour | SVR | k:5 | T408 | 0.56 | 0.76 | 0.62 | 0.80 |
| | | | V46 | 0.31 | 0.56 | 0.77 | 0.88 |
| | DTR | k:9 | T408 | 0.55 | 0.77 | 0.62 | 0.79 |
| | | | V46 | 0.32 | 0.57 | 0.76 | 0.87 |
| | PCT | k:5 | T408 | 0.56 | 0.76 | 0.62 | 0.80 |
| | | | V46 | 0.33 | 0.57 | 0.76 | 0.87 |
| Artificial Neural Network | SVR | solver:sgd activation:tanh learning:constant | T408 | 0.40 | 0.64 | 0.71 | 0.85 |
| | | | V46 | 0.25 | 0.50 | 0.81 | 0.90 |
| | DTR | solver:sgd activation:tanh learning:constant | T408 | 0.65 | 0.81 | 0.54 | 0.76 |
| | | | V46 | 0.52 | 0.72 | 0.61 | 0.81 |
| | PCT | solver:sgd activation:tanh learning:constant | T408 | 0.40 | 0.63 | 0.71 | 0.85 |
| | | | V46 | 0.28 | 0.53 | 0.79 | 0.89 |

* SVR = Support Vector Regression, DTR = Decision Tree Regression, PCT = Perceptron method, MAE = Mean Absolute Error, RMSE = Root Mean Squared Error, PCC = Pearson’s correlation coefficient, R2 = Coefficient of Determination, T = Training/Testing dataset, V = Validation dataset (independent).
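For readers who want to reproduce the four measures named in the footnote, the sketch below computes them in plain Python. It is illustrative only: the function name `regression_metrics` and the toy values are assumptions, RMSE is taken as the root mean squared error, and R2 is computed as one minus the ratio of residual to total sum of squares, which may differ from the definition used by the authors (some tools report the squared Pearson coefficient instead).

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, R2 and PCC for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                    # assumed definition of R2
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pcc = cov / math.sqrt(ss_tot * var_p)         # Pearson's correlation
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "PCC": pcc}


# Hypothetical pIC50 values, used only to exercise the function.
actual = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
metrics = regression_metrics(actual, predicted)
print(metrics)
```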
Statistical measures of performance of the best predictive models developed for the NS3/4A heterodimer protein complex using different machine-learning techniques and selected features, during ten-fold cross validation on the training/testing dataset and on the independent validation dataset.

| ML technique | Feature selection | Parameters | Dataset | MAE | RMSE | R2 | PCC |
|---|---|---|---|---|---|---|---|
| Support Vector Machine | SVR | gamma:0.001 C:100 | T445 | 0.38 | 0.62 | 0.82 | 0.92 |
| | | | V50 | 0.20 | 0.44 | 0.92 | 0.96 |
| | DTR | gamma:0.01 C:10 | T445 | 0.53 | 0.72 | 0.75 | 0.88 |
| | | | V50 | 0.41 | 0.64 | 0.83 | 0.91 |
| | PCT | gamma:0.05 C:1 | T445 | 0.74 | 0.85 | 0.68 | 0.83 |
| | | | V50 | 0.52 | 0.72 | 0.78 | 0.88 |
| Random Forest | SVR | n:300 depth:10 split:5 leaf:1 | T445 | 0.54 | 0.69 | 0.76 | 0.88 |
| | | | V50 | 0.47 | 0.68 | 0.80 | 0.90 |
| | DTR | n:100 depth:8 split:2 leaf:1 | T445 | 0.48 | 0.67 | 0.78 | 0.89 |
| | | | V50 | 0.39 | 0.62 | 0.84 | 0.91 |
| | PCT | n:200 depth:8 split:10 leaf:1 | T445 | 0.72 | 0.79 | 0.69 | 0.84 |
| | | | V50 | 0.44 | 0.67 | 0.81 | 0.90 |
| k-Nearest Neighbour | SVR | k:3 | T445 | 0.53 | 0.74 | 0.76 | 0.88 |
| | | | V50 | 0.46 | 0.68 | 0.80 | 0.90 |
| | DTR | k:7 | T445 | 0.65 | 0.79 | 0.70 | 0.85 |
| | | | V50 | 0.62 | 0.79 | 0.74 | 0.86 |
| | PCT | k:5 | T445 | 0.77 | 0.86 | 0.67 | 0.83 |
| | | | V50 | 0.43 | 0.65 | 0.82 | 0.91 |
| Artificial Neural Network | SVR | solver:sgd activation:tanh learning:adaptive | T445 | 0.50 | 0.68 | 0.76 | 0.89 |
| | | | V50 | 0.32 | 0.57 | 0.86 | 0.93 |
| | DTR | solver:sgd activation:tanh learning:adaptive | T445 | 0.81 | 0.81 | 0.60 | 0.81 |
| | | | V50 | 0.49 | 0.70 | 0.79 | 0.89 |
| | PCT | solver:sgd activation:tanh learning:adaptive | T445 | 1.08 | 0.92 | 0.47 | 0.77 |
| | | | V50 | 0.52 | 0.72 | 0.78 | 0.89 |

* SVR = Support Vector Regression, DTR = Decision Tree Regression, PCT = Perceptron method, MAE = Mean Absolute Error, RMSE = Root Mean Squared Error, PCC = Pearson’s correlation coefficient, R2 = Coefficient of Determination, T = Training/Testing dataset, V = Validation dataset (independent).
Statistical measures of performance of the best predictive models developed for the NS5A protein using different machine-learning techniques and selected features, during ten-fold cross validation on the training/testing dataset and on the independent validation dataset.

| ML technique | Feature selection | Parameters | Dataset | MAE | RMSE | R2 | PCC |
|---|---|---|---|---|---|---|---|
| Support Vector Machine | SVR | gamma:0.001 C:300 | T444 | 0.77 | 0.82 | 0.78 | 0.88 |
| | | | V50 | 1.01 | 1.01 | 0.73 | 0.86 |
| | DTR | gamma:0.01 C:50 | T444 | 0.89 | 0.91 | 0.74 | 0.87 |
| | | | V50 | 0.96 | 0.98 | 0.74 | 0.87 |
| | PCT | gamma:0.05 C:10 | T444 | 1.30 | 1.12 | 0.62 | 0.80 |
| | | | V50 | 1.53 | 1.24 | 0.59 | 0.78 |
| Random Forest | SVR | n:500 depth:None split:2 leaf:1 | T444 | 1.10 | 1.01 | 0.68 | 0.83 |
| | | | V50 | 1.37 | 1.17 | 0.64 | 0.81 |
| | DTR | n:500 depth:12 split:2 leaf:2 | T444 | 0.86 | 0.90 | 0.75 | 0.87 |
| | | | V50 | 0.83 | 0.91 | 0.78 | 0.88 |
| | PCT | n:100 depth:8 split:10 leaf:2 | T444 | 1.20 | 1.04 | 0.64 | 0.81 |
| | | | V50 | 1.12 | 1.06 | 0.70 | 0.85 |
| k-Nearest Neighbour | SVR | k:3 | T444 | 0.99 | 0.97 | 0.71 | 0.85 |
| | | | V50 | 1.17 | 1.08 | 0.69 | 0.83 |
| | DTR | k:5 | T444 | 0.92 | 0.96 | 0.73 | 0.86 |
| | | | V50 | 0.99 | 0.99 | 0.74 | 0.86 |
| | PCT | k:7 | T444 | 1.27 | 1.10 | 0.63 | 0.80 |
| | | | V50 | 1.16 | 1.08 | 0.69 | 0.83 |
| Artificial Neural Network | SVR | solver:sgd activation:tanh learning:constant | T444 | 0.87 | 0.89 | 0.75 | 0.87 |
| | | | V50 | 1.15 | 1.07 | 0.69 | 0.84 |
| | DTR | solver:sgd activation:tanh learning:constant | T444 | 1.05 | 1.05 | 0.69 | 0.84 |
| | | | V50 | 1.13 | 1.06 | 0.70 | 0.84 |
| | PCT | solver:sgd activation:tanh learning:constant | T444 | 1.87 | 1.44 | 0.47 | 0.73 |
| | | | V50 | 2.53 | 1.59 | 0.32 | 0.72 |

* SVR = Support Vector Regression, DTR = Decision Tree Regression, PCT = Perceptron method, MAE = Mean Absolute Error, RMSE = Root Mean Squared Error, PCC = Pearson’s correlation coefficient, R2 = Coefficient of Determination, T = Training/Testing dataset, V = Validation dataset (independent).
Statistical measures of performance of the best predictive models developed for the NS5B protein using different machine-learning techniques and selected features, during ten-fold cross validation on the training/testing dataset and on the independent validation dataset.

| ML technique | Feature selection | Parameters | Dataset | MAE | RMSE | R2 | PCC |
|---|---|---|---|---|---|---|---|
| Support Vector Machine | SVR | gamma:0.05 C:1 | T1503 | 0.54 | 0.74 | 0.70 | 0.84 |
| | | | V168 | 0.57 | 0.76 | 0.70 | 0.84 |
| | DTR | gamma:0.01 C:10 | T1503 | 0.58 | 0.78 | 0.67 | 0.82 |
| | | | V168 | 0.65 | 0.81 | 0.66 | 0.81 |
| | PCT | gamma:1 C:10 | T1503 | 1.10 | 1.04 | 0.38 | 0.62 |
| | | | V168 | 1.24 | 1.11 | 0.35 | 0.60 |
| Random Forest | SVR | n:200 depth:None split:2 leaf:1 | T1503 | 0.56 | 0.75 | 0.69 | 0.83 |
| | | | V168 | 0.66 | 0.81 | 0.65 | 0.81 |
| | DTR | n:400 depth:None split:2 leaf:1 | T1503 | 0.51 | 0.71 | 0.71 | 0.85 |
| | | | V168 | 0.52 | 0.72 | 0.73 | 0.86 |
| | PCT | n:100 depth:12 split:5 leaf:1 | T1503 | 0.99 | 0.99 | 0.44 | 0.67 |
| | | | V168 | 1.12 | 1.06 | 0.41 | 0.64 |
| k-Nearest Neighbour | SVR | k:5 | T1503 | 0.59 | 0.78 | 0.67 | 0.82 |
| | | | V168 | 0.60 | 0.78 | 0.68 | 0.83 |
| | DTR | k:7 | T1503 | 0.57 | 0.76 | 0.68 | 0.83 |
| | | | V168 | 0.55 | 0.74 | 0.71 | 0.85 |
| | PCT | k:9 | T1503 | 1.06 | 1.03 | 0.40 | 0.64 |
| | | | V168 | 1.20 | 1.09 | 0.37 | 0.61 |
| Artificial Neural Network | SVR | solver:adam activation:tanh learning:constant | T1503 | 0.60 | 0.76 | 0.66 | 0.81 |
| | | | V168 | 0.58 | 0.76 | 0.70 | 0.84 |
| | DTR | solver:adam activation:tanh learning:constant | T1503 | 0.62 | 0.81 | 0.65 | 0.81 |
| | | | V168 | 0.70 | 0.84 | 0.63 | 0.80 |
| | PCT | solver:sgd activation:tanh learning:constant | T1503 | 1.19 | 1.08 | 0.33 | 0.59 |
| | | | V168 | 1.17 | 1.08 | 0.39 | 0.62 |

* SVR = Support Vector Regression, DTR = Decision Tree Regression, PCT = Perceptron method, MAE = Mean Absolute Error, RMSE = Root Mean Squared Error, PCC = Pearson’s correlation coefficient, R2 = Coefficient of Determination, T = Training/Testing dataset, V = Validation dataset (independent).
Fig. 2. Williams plots for applicability domain analysis of the support vector machine-based predictive models developed for each HCV NS protein: (A) NS3, (B) NS3/4A, (C) NS5A and (D) NS5B.
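As background for reading such a plot, the quantities conventionally placed on its axes can be written as below. These are the commonly used textbook definitions, given here as an assumption; the source does not state which leverage threshold or residual cut-off the authors applied. Here $\mathbf{X}$ is the $n \times p$ descriptor matrix of the training compounds, $\mathbf{x}_i$ the descriptor vector of compound $i$, $y_i$ and $\hat{y}_i$ its actual and predicted activity, and $s$ the standard deviation of the residuals.

```latex
h_i = \mathbf{x}_i^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_i,
\qquad
h^{*} = \frac{3\,(p + 1)}{n},
\qquad
r_i = \frac{y_i - \hat{y}_i}{s}
```

Under this convention, a compound is usually treated as inside the applicability domain when $h_i \le h^{*}$ and $|r_i| \le 3$.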
Fig. 3. Robustness of the support vector machine-based predictive models, shown by plots of actual versus predicted pIC50 of molecules for each HCV NS protein: (A) NS3, (B) NS3/4A, (C) NS5A and (D) NS5B.
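Because the plots and tables report activity as pIC50, a short conversion sketch may help. It assumes the standard definition, pIC50 as the negative base-10 logarithm of the IC50 expressed in molar units; the source does not spell out the definition or the concentration units used, and the names `to_pic50`, `from_pic50` and `UNIT_TO_MOLAR` are hypothetical.

```python
import math

# Conversion factors from common concentration units to molar (assumed units).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_pic50(value, unit="nM"):
    """Convert an IC50 given in `unit` to pIC50 = -log10(IC50 in molar)."""
    if value <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def from_pic50(pic50, unit="nM"):
    """Convert a pIC50 back to an IC50 expressed in `unit`."""
    return 10 ** (-pic50) / UNIT_TO_MOLAR[unit]


print(round(to_pic50(1, "nM"), 2))     # an IC50 of 1 nM
print(round(to_pic50(10, "uM"), 2))    # an IC50 of 10 uM
print(round(from_pic50(8.0, "nM"), 2)) # a pIC50 of 8 expressed in nM
```

Higher pIC50 values correspond to lower IC50 values, so each unit increase in pIC50 is a tenfold increase in potency.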
Fig. 4. Scatter plots showing the correlation between actual and predicted pIC50 of decoys and active molecules for HCV NS proteins: (A) NS3, (B) NS3/4A, (C) NS5A and (D) NS5B.
Fig. 5. Chemical diversity analysis of inhibitors, shown by three-dimensional multidimensional scaling plots of the compounds for each HCV NS protein: (A) NS3, (B) NS3/4A, (C) NS5A and (D) NS5B.
Top 10 predicted repurposed drugs for the HCV NS3 protein, with DrugBank ID, drug name, primary use, predicted pIC50 and clinical status for HCV.

| DrugBank ID | Drug | Primary use | Predicted pIC50 | Clinical status for HCV |
|---|---|---|---|---|
| DB00970 | Dactinomycin | Anticancer | 8.95 | Not yet tested |
| DB00735 | Naftifine | Antifungal drug | 8.80 | Not yet tested |
| DB01410 | Ciclesonide | Obstructive airway diseases | 8.70 | Not yet tested |
| DB13253 | Proxibarbal | Migraines treatment | 8.61 | Not yet tested |
| DB00241 | Butalbital | Treatment of tension-type headache | 8.51 | Not yet tested |
| DB13170 | Plecanatide | Chronic idiopathic constipation and IBS | 8.48 | Not yet tested |
| DB00474 | Methohexital | Anesthetic for deep sedation | 8.48 | Not yet tested |
| DB15465 | Benzhydrocodone | Pain reliever | 8.42 | Not yet tested |
| DB06711 | Naphazoline | Vasoconstrictor to relieve eyes itching and redness | 8.24 | Not yet tested |
| DB01091 | Butenafine | Antifungal | 8.12 | Not yet tested |
Top 10 predicted repurposed drugs for the HCV NS3/4A protein, with DrugBank ID, drug name, primary use, predicted pIC50 and clinical status for HCV.

| DrugBank ID | Drug | Primary use | Predicted pIC50 | Clinical status for HCV |
|---|---|---|---|---|
| DB01395 | Drospirenone | Oral contraceptive pills | 13.48 | Not yet tested |
| DB06402 | Telavancin | Antibacterial agent | 13.14 | Not yet tested |
| DB00361 | Vinorelbine | Metastatic non-small cell lung carcinoma (NSCLC) | 12.57 | Not yet tested |
| DB11275 | Epicriptine | Idiopathic decline in mental capacity | 12.46 | Not yet tested |
| DB00320 | Dihydroergotamine | Migraine and cluster headache | 12.44 | Not yet tested |
| DB11273 | Dihydroergocornine | Idiopathic decline in mental capacity | 12.42 | Not yet tested |
| DB00696 | Ergotamine | Treatment of migraine disorders | 11.77 | Not yet tested |
| DB04911 | Oritavancin | Antibacterial | 11.52 | Not yet tested |
| DB06663 | Pasireotide | Cushing’s disease treatment | 11.24 | Not yet tested |
| DB00256 | Lymecycline | Acne vulgaris and other infections | 11.10 | Not yet tested |
Top 10 predicted repurposed drugs for the HCV NS5A protein, with DrugBank ID, drug name, primary use, predicted pIC50 and clinical status for HCV.

| DrugBank ID | Drug | Primary use | Predicted pIC50 | Clinical status for HCV |
|---|---|---|---|---|
| DB11585 | Drometrizole trisiloxane | UV ray absorbing agent | 13.66 | Not yet tested |
| DB00728 | Rocuronium | Facilitate tracheal intubation and relax skeletal muscles during surgery | 13.33 | Not yet tested |
| DB01338 | Pipecuronium | Neuromuscular blocking agent, used as anesthetic | 13.17 | Not yet tested |
| DB00210 | Adapalene | Acne vulgaris | 13.04 | Not yet tested |
| DB01116 | Trimethaphan | Ganglionic blocker in hypertension | 12.82 | Not yet tested |
| DB14879 | Cefiderocol | Cephalosporin antibiotic for urinary tract infections | 12.57 | Not yet tested |
| DB11951 | Lemborexant | Insomnia treatment | 12.52 | Not yet tested |
| DB01190 | Clindamycin | Bacterial infections | 12.30 | Not yet tested |
| DB13284 | Meticrane | Diuretic | 12.17 | Not yet tested |
| DB01180 | Rescinnamine | Antihypertensive drug | 12.16 | Not yet tested |
Top 10 predicted repurposed drugs for the HCV NS5B protein, with DrugBank ID, drug name, primary use, predicted pIC50 and clinical status for HCV.

| DrugBank ID | Drug | Primary use | Predicted pIC50 | Clinical status for HCV |
|---|---|---|---|---|
| DB13125 | Lusutrombopag | Thrombocytopenia treatment | 8.06 | Not yet tested |
| DB05294 | Vandetanib | Symptomatic or progressive medullary thyroid cancer treatment | 7.97 | Not yet tested |
| DB00365 | Grepafloxacin | Antibiotic to treat gram positive and gram negative bacterial infections | 7.92 | Not yet tested |
| DB09080 | Olodaterol | Treatment of chronic obstructive pulmonary disease (COPD) | 7.68 | Not yet tested |
| DB12035 | Sarecycline | Inflammatory lesions or acne vulgaris treatment | 7.58 | Not yet tested |
| DB08881 | Vemurafenib | For the treatment of metastatic melanoma | 7.67 | Not yet tested |
| DB00334 | Olanzapine | Antipsychotic drug | 7.37 | Not yet tested |
| DB14033 | Acetyl sulfisoxazole | Antibacterial agent | 7.35 | Not yet tested |
| DB01044 | Gatifloxacin | Treatment of different infections | 7.27 | Not yet tested |
| DB12792 | Boscalid | Glaucoma and Schirmers treatment | 7.17 | Not yet tested |
Molecular docking results showing the protein (PDB ID), ligand (DrugBank ID), binding affinity, interacting residues, distance between interacting residues (Å) and type of molecular interaction.

| Protein (PDB ID) | Ligand (DrugBank ID) | Binding affinity | Interacting residues | Distance (Å) | Interaction type |
|---|---|---|---|---|---|
| NS3 (2XCF) | Naftifine (DB00735) | −7.8 | ALA-A:5 | 5.08 | van der Waals |
| | Butalbital (DB00241) | −6.3 | ALA-A:5 | 4.57 | van der Waals |
| | Proxibarbal (DB13253) | −6.2 | TYR-A:6 | 5.35 | van der Waals |
| NS3/4A (4WF8) | Vinorelbine (DB00361) | −8.9 | ASP-A:1081 | 5.21 | van der Waals |
| | Epicriptine (DB11275) | −8.4 | ALA-A:1013 | 4.2, 5.2, 6 | van der Waals |
| | Drospirenone (DB01395) | −8.1 | ALA-A:1005 | 4.26 | van der Waals |
| NS5A (4CL1) | Pipecuronium (DB01338) | −9.8 | ARG-A:12, 15 | 4.3, 5.8, | Carbon hydrogen bond |
| | Trimethaphan (DB01116) | −9.4 | ARG-A:131 | 4.31 | Conventional hydrogen bond |
| | Cefiderocol (DB14879) | −9.2 | THR-A:65 | 4.64 | van der Waals |
| NS5B | Olodaterol (DB09080) | −8.8 | SER-A:180 | 5.47 | van der Waals |
| | Vemurafenib (DB08881) | −8.7 | SER-A:196 | 4.44 | van der Waals |
| | Grepafloxacin (DB00365) | −8.4 | TYR-A:191 | 6.60 | van der Waals |
Fig. 6. Ribbon structures of the NS3, NS3/4A, NS5A and NS5B proteins bound to their respective ligand molecules: (A) NS3 and naftifine, (B) NS3 and butalbital, (C) NS3 and proxibarbal, (D) NS3/4A and vinorelbine, (E) NS3/4A and epicriptine, (F) NS3/4A and drospirenone, (G) NS5A and pipecuronium, (H) NS5A and trimethaphan, (I) NS5A and cefiderocol, (J) NS5B and olodaterol, (K) NS5B and vemurafenib, (L) NS5B and grepafloxacin. Proteins are shown in rainbow colors and ligand molecules as gray spheres.
Fig. 7. Two-dimensional illustration of the molecular interactions of the NS3, NS3/4A, NS5A and NS5B proteins with their respective ligand molecules: NS3 with (A) naftifine, (B) butalbital and (C) proxibarbal; NS3/4A with (D) vinorelbine, (E) epicriptine and (F) drospirenone; NS5A with (G) pipecuronium, (H) trimethaphan and (I) cefiderocol; and NS5B with (J) olodaterol, (K) vemurafenib and (L) grepafloxacin.