Hui-Heng Lin, Qian-Ru Zhang, Xiangjun Kong, Liuping Zhang, Yong Zhang, Yanyan Tang, Hongyan Xu.
Abstract
Persistent infection with high-risk types of Human Papillomavirus (HPV) can cause diseases including cervical cancers and oropharyngeal cancers. Nonetheless, there is still no effective pharmacotherapy for infection with high-risk HPV types, which therefore remains a severe threat to women's health. Following a drug repositioning strategy, in this work we trained and benchmarked multiple machine learning models to predict antiviral drugs that may be effective against HPV infection. We optimized the models, measured their predictive performance on a dataset of 182 antiviral-target interaction pairs whose antivirals were all approved by the United States Food and Drug Administration (FDA), and benchmarked the Support Vector Machine (SVM), Random Forest, AdaBoost, Naïve Bayes, K-Nearest Neighbors (KNN), and Logistic Regression classifiers. The optimized SVM and KNN classifiers achieved the highest precision scores (0.80 and 0.85, respectively) and were identified as the two best predictors. Applying these two predictors together, we predicted 57 antiviral-HPV protein interactions among 864 candidate antiviral-HPV protein pairs. Our work provides candidate drugs for anti-HPV drug discovery. To our knowledge, this is the first HPV-oriented computational drug repositioning study.
Year: 2021 PMID: 34934067 PMCID: PMC8692573 DOI: 10.1038/s41598-021-03000-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Research framework of this study. Predicting antiviral drug-HPV protein interactions can be framed as a binary classification task, to which machine learning is well suited. In this work, the features of antiviral drug-target pairs were transformed into vectors to construct machine learning predictors. The best predictors, selected through benchmarking, were then used to predict antiviral-HPV protein interactions.
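Framing each antiviral-target pair as one labeled instance can be sketched as follows. This is a minimal illustration, assuming each drug and each protein already has a precomputed numeric feature vector and that a pair's vector is simply their concatenation (an assumption consistent with the single 10,784-long total in the descriptor table, not a confirmed detail of the study). `pair_vector` and `build_dataset` are hypothetical helper names.

```python
from typing import Dict, List, Sequence, Tuple


def pair_vector(drug_fp: Sequence[float], protein_desc: Sequence[float]) -> List[float]:
    """Concatenate a drug fingerprint and a protein descriptor into one feature vector."""
    return list(drug_fp) + list(protein_desc)


def build_dataset(
    pairs: Sequence[Tuple[str, str, int]],
    drug_features: Dict[str, Sequence[float]],
    protein_features: Dict[str, Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Turn (drug, protein, label) triples into a feature matrix X and labels y.

    label is 1 for an interacting pair and 0 for a non-interacting pair,
    which makes the task a binary classification problem.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for drug, protein, label in pairs:
        X.append(pair_vector(drug_features[drug], protein_features[protein]))
        y.append(label)
    return X, y


# Toy inputs with made-up, very short vectors. With the descriptors in the
# molecular-descriptor table, a pair vector would have 1024 + 9760 = 10,784 entries.
drug_features = {"drugA": [1, 0, 1], "drugB": [0, 1, 1]}
protein_features = {"P1": [0.2, 0.8], "P2": [0.5, 0.5]}
X, y = build_dataset([("drugA", "P1", 1), ("drugB", "P2", 0)], drug_features, protein_features)
print(X)  # [[1, 0, 1, 0.2, 0.8], [0, 1, 1, 0.5, 0.5]]
print(y)  # [1, 0]
```

The resulting `X` and `y` are the usual inputs for any binary classifier, such as the SVM or KNN predictors benchmarked in this study.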
Molecular descriptors used for machine learning analysis.
| ID | Molecular descriptor | Vector length |
|---|---|---|
| 1 | Drug molecule fingerprint | 1024 |
| 2 | Amino acid composition | 20 |
| 3 | Dipeptide composition | 400 |
| 4 | Tripeptide composition | 8000 |
| 5 | Normalized Moreau-Broto Autocorrelation | 240 |
| 6 | Moran Autocorrelation | 240 |
| 7 | Geary Autocorrelation | 240 |
| 8 | Composition descriptor | 21 |
| 9 | Transition descriptor | 21 |
| 10 | Distribution descriptor | 105 |
| 11 | Pseudo amino acid composition | 50 |
| 12 | Amphiphilic pseudo amino acid composition | 80 |
| 13 | Conjoint Triad | 343 |
| Total | | 10,784 |
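Three of the protein descriptors above are plain composition counts, and the row lengths add up to the stated total. The sketch below computes amino acid (20), dipeptide (400) and tripeptide (8000) composition as fractions over overlapping windows and checks the 10,784 total. It is an illustrative re-implementation, not the toolkit the authors used; some packages report these values as percentages instead. The autocorrelation, composition/transition/distribution, pseudo amino acid composition and conjoint triad descriptors are omitted because they require property scales or residue groupings not given here.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aac(seq: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue (length 20)."""
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def kmer_composition(seq: str, k: int) -> List[float]:
    """Fraction of each of the 20**k k-mers over overlapping windows.

    k=2 gives dipeptide composition (400 values), k=3 tripeptide composition (8000).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts: Dict[str, int] = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows containing non-standard residues are skipped
            counts[window] += 1
    return [counts[m] / total for m in kmers]


# Vector lengths copied from the descriptor table above.
DESCRIPTOR_LENGTHS = {
    "Drug molecule fingerprint": 1024,
    "Amino acid composition": 20,
    "Dipeptide composition": 400,
    "Tripeptide composition": 8000,
    "Normalized Moreau-Broto Autocorrelation": 240,
    "Moran Autocorrelation": 240,
    "Geary Autocorrelation": 240,
    "Composition descriptor": 21,
    "Transition descriptor": 21,
    "Distribution descriptor": 105,
    "Pseudo amino acid composition": 50,
    "Amphiphilic pseudo amino acid composition": 80,
    "Conjoint Triad": 343,
}

seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary toy sequence
print(len(aac(seq)), len(kmer_composition(seq, 2)), len(kmer_composition(seq, 3)))  # 20 400 8000
print(sum(DESCRIPTOR_LENGTHS.values()))  # 10784
```

Excluding the 1024-bit drug fingerprint, the protein descriptors contribute 9,760 of the 10,784 entries, and the tripeptide composition alone accounts for 8,000 of those.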
Summary of the antiviral-target and antiviral-HPV protein interaction datasets used for machine learning in this study.
| Dataset | Antiviral drug | Target protein | Antiviral-protein interaction pair |
|---|---|---|---|
| Training set | 35 | 34 | 102^c |
| Validation set^a | 61 | 47 | 182^c,d |
| Prediction set | 96 | 9^b | 864 |
^a The validation set consisted of U.S. FDA-approved antiviral drugs and these drugs' binding target proteins.
^b The 9 proteins of HPV-16.
^c The ratio of positive to negative instances was 1:1.
^d The validation set was larger than the training set because (1) more FDA-approved antivirals were desired to validate the real-world application value of our machine learning models, and (2) the models' generalization performance can be assessed by training on a smaller set and validating on a larger one.
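The counts in the dataset table can be checked with simple arithmetic: the 864 prediction pairs equal 96 antivirals × 9 HPV-16 proteins, which is consistent with scoring every combination, and a 1:1 ratio means 51 of the 102 training pairs and 91 of the 182 validation pairs are positive. The sketch below enumerates the prediction set and draws a 1:1 negative set. How the study actually chose its negative instances is not described here, so treating unobserved pairs as negatives and sampling them at random is an assumption; `enumerate_pairs`, `sample_negatives` and the placeholder drug names are hypothetical.

```python
import random
from itertools import product
from typing import List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (antiviral, protein)


def enumerate_pairs(drugs: Sequence[str], proteins: Sequence[str]) -> List[Pair]:
    """Every drug-protein combination (the Cartesian product)."""
    return list(product(drugs, proteins))


def sample_negatives(positives: Set[Pair], drugs: Sequence[str],
                     proteins: Sequence[str], seed: int = 0) -> List[Pair]:
    """Draw as many non-positive pairs as there are positives (1:1 ratio).

    Assumption: any pair not known to interact is treated as a negative.
    """
    candidates = [p for p in enumerate_pairs(drugs, proteins) if p not in positives]
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(candidates, len(positives))  # ValueError if too few candidates


# Prediction set: 96 antivirals x 9 HPV-16 proteins (drug names are placeholders).
antivirals = [f"antiviral_{i}" for i in range(96)]
hpv16_proteins = ["E1", "E2", "E4", "E5", "E6", "E7", "E8^E2C", "L1", "L2"]
print(len(enumerate_pairs(antivirals, hpv16_proteins)))  # 864
```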
Performance of 6 machine learning predictors with default parameters.
| ID | Predictor | Precision | Recall | F1-measure | Accuracy | AUC^a |
|---|---|---|---|---|---|---|
| 1 | SVM | 0.36 | 0.48 | 0.37 | 0.48 | 0.44 |
| 2 | Logistic Regression | 0.52 | 0.57 | 0.54 | 0.59 | 0.56 |
| 3 | KNN | 0.61 | 0.62 | 0.60 | 0.59 | 0.61 |
| 4 | Naïve Bayes | 0.46 | 0.65 | 0.52 | 0.49 | 0.52 |
| 5 | Random Forest | 0.61 | 0.61 | 0.63 | 0.66 | 0.68 |
| 6 | AdaBoost | 0.73 | 0.48 | 0.55 | 0.62 | 0.48 |
^a AUC: area under the receiver operating characteristic (ROC) curve.
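The metrics in this table follow standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), F1 is the harmonic mean of precision and recall, accuracy is the fraction of correct predictions, and AUC equals the probability that a randomly chosen positive pair is scored above a randomly chosen negative one. A dependency-free sketch of these definitions for binary labels (1 = interaction) follows; the function names are hypothetical, and the study's own evaluation code, including any averaging across folds or classes, is not described here.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 means 'interacts'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
print(classification_metrics(y_true, y_pred))  # every value is 2/3 here
print(roc_auc(y_true, scores))  # 8/9, about 0.889
```

Note that precision and AUC answer different questions: precision judges the hard 0/1 calls, while AUC judges how well the continuous scores rank positives above negatives.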
Precision scores of optimized machine learning predictors on the validation dataset of antiviral-HPV protein interaction pairs.
| Predictor | SVM | Logistic regression | KNN | Naïve Bayes | Random Forest | AdaBoost |
|---|---|---|---|---|---|---|
| Precision Score | 0.80^a | 0.50 | 0.85^a | 0.65 | 0.68 | 0.75 |
^a Metrics of the optimized SVM and KNN predictors used to predict antiviral-HPV protein interactions are available in Supplementary Table S4.
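The optimized SVM and KNN predictors were applied together to the candidate antiviral-HPV protein pairs, but how their outputs were combined is not specified in this section. The sketch below assumes a conservative consensus rule in which a pair is reported only when both predictors label it as interacting, then tallies predicted antivirals per HPV-16 protein in the form of the summary table that follows. `consensus_positive`, `count_by_protein` and the toy predictions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (antiviral, HPV-16 protein)


def consensus_positive(svm_pred: Dict[Pair, int], knn_pred: Dict[Pair, int]) -> List[Pair]:
    """Pairs labeled 1 (interaction) by both the SVM and the KNN predictor."""
    return sorted(pair for pair, label in svm_pred.items()
                  if label == 1 and knn_pred.get(pair) == 1)


def count_by_protein(pairs: List[Pair]) -> Dict[str, int]:
    """Number of predicted antivirals per HPV-16 protein."""
    return dict(Counter(protein for _, protein in pairs))


# Made-up predictions for illustration only.
svm_pred = {("drugA", "E6"): 1, ("drugA", "E7"): 1, ("drugB", "E7"): 0, ("drugC", "L1"): 1}
knn_pred = {("drugA", "E6"): 1, ("drugA", "E7"): 0, ("drugB", "E7"): 1, ("drugC", "L1"): 1}
hits = consensus_positive(svm_pred, knn_pred)
print(hits)                    # [('drugA', 'E6'), ('drugC', 'L1')]
print(count_by_protein(hits))  # {'E6': 1, 'L1': 1}
```

An intersection rule trades recall for precision, in line with precision being the criterion used to pick the two predictors; a union of the two outputs would be the looser alternative.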
Summary of the predicted antivirals targeting each HPV-16 protein.
| HPV-16 protein | Number of antivirals^a | Example |
|---|---|---|
| Protein E7 | 7 | Docosanol, which targets the GP340 or GP350 protein of Epstein-Barr Virus, has been approved to treat herpes labialis, fever blisters, etc. |
| Regulatory protein E2 | 5 | Voxilaprevir, which targets the NS3/4A protein of Hepatitis C Virus, has been approved to treat chronic Hepatitis C Virus infection |
| Protein E6 | 6 | Telaprevir is an NS3/4A viral protease inhibitor. It has been approved to treat chronic Hepatitis C Virus infection in combination with other drugs |
| Minor capsid protein L2 | 4 | Grazoprevir targeting NS3/4A protein of Hepatitis C Virus has been approved to treat Hepatitis C viral infection |
| Protein E4 | 8 | Nelfinavir is a potent viral protease inhibitor used to treat Human Immunodeficiency Virus (HIV) infection; it targets the HIV-1 protease |
| Probable protein E5 | 7 | Maraviroc is a chemokine receptor antagonist drug targeting C–C chemokine receptor type 5. It has been approved to treat HIV-1 infection |
| Replication protein E1 | 7 | Pirodavir (investigational drug) targets the genome polyprotein of polioviruses and appears to have broad-spectrum antiviral effects against multiple kinds of Human Rhinoviruses |
| Major capsid protein L1 | 5 | Docosanol, which targets the GP340 or GP350 protein of Epstein-Barr Virus, has been approved to treat herpes labialis, fever blisters, etc. |
| Protein E8^E2C | 7 | TMC-310911 (investigational drug) is a protease inhibitor targeting HIV-1 protease and may be effective in treating HIV-1 infection |
^a Number of antivirals predicted to have a potential interaction with the given HPV-16 protein.