Leonid A Stolbov, Dmitry S Druzhilovskiy, Dmitry A Filimonov, Marc C Nicklaus, Vladimir V Poroikov.
Abstract
Despite the achievements of antiretroviral therapy, the discovery of new anti-HIV medicines remains an essential task because the existing drugs do not provide a complete cure for infected patients, exhibit severe adverse effects, and lead to the appearance of resistant strains. The ligand-based drug design approach is widely applied to predict the interaction of drug-like compounds with multiple targets for HIV treatment. In this study, we evaluated the possibilities and limitations of (Q)SAR analysis aimed at the discovery of novel antiretroviral agents inhibiting the vital HIV enzymes. Local (Q)SAR models are based on the analysis of structure-activity relationships for molecules from the same chemical class, which significantly restricts their applicability domain. In contrast, global (Q)SAR models exploit data from heterogeneous sets of drug-like compounds, which allows their application to databases containing diverse structures. We compared the information on HIV-1 integrase, protease, and reverse transcriptase inhibitors available in the EBI ChEMBL, NIAID HIV/OI/TB Therapeutics, and Clarivate Analytics Integrity databases as sources for (Q)SAR training sets. Using the PASS and GUSAR software, we developed and validated a variety of (Q)SAR models, which can be further used for virtual screening of new antiretrovirals in the SAVI library. The developed models are implemented in the freely available web resource AntiHIV-Pred.
Keywords: (Q)SAR models; GUSAR; HIV-1; PASS; SAVI library; inhibitors; integrase; protease; reverse transcriptase; virtual screening
Year: 2019 PMID: 31881687 PMCID: PMC6983201 DOI: 10.3390/molecules25010087
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Numbers of HIV-1 IN, PR, and RT inhibitors exported from three databases before/after the cleaning procedure.
| Database | IN | PR | RT |
|---|---|---|---|
| | 10,377/3459 | 7604/5972 | 8936/5675 |
| | 2283/1430 | 2387/1437 | 2149/1390 |
| | 563/328 | 316/268 | 731/615 |
Figure 1. Total number of molecules exported from three databases and intersection between different datasets after the cleaning procedure.
Characteristics of classification models (PASS) for the datasets NIAID and ChEMBL / NIAID and Integrity.
| Inhibitors of | Active | Inactive | IAP |
|---|---|---|---|
| HIV-1 IN | 1813/1622 | 2108/1930 | 0.924/0.922 |
| HIV-1 PR | 4762/4504 | 1337/1298 | 0.938/0.937 |
| HIV-1 RT | 3142/3054 | 2854/2752 | 0.878/0.878 |
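The IAP column is not defined in this record. In publications describing PASS, IAP (invariant accuracy of prediction) is described as the probability that a randomly chosen active compound receives a higher estimate than a randomly chosen inactive one, which makes it numerically equivalent to the ROC AUC. The following is a minimal sketch under that assumption; the function name and the scores are hypothetical and are not data from the study.

```python
def pairwise_rank_accuracy(active_scores, inactive_scores):
    """Fraction of (active, inactive) pairs in which the active compound
    scores higher; ties count as half a correct pair."""
    if not active_scores or not inactive_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                correct += 1.0
            elif a == i:
                correct += 0.5
    return correct / (len(active_scores) * len(inactive_scores))


# Hypothetical classifier scores, for illustration only.
actives = [0.9, 0.8, 0.4]
inactives = [0.7, 0.3, 0.2]
print(f"{pairwise_rank_accuracy(actives, inactives):.3f}")  # 8 of 9 pairs -> 0.889
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives.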
Characteristics of the regression models (GUSAR) for the datasets NIAID and ChEMBL / NIAID and Integrity.
| Inhibitors of | N | R2 | Q2 | RMSD | V |
|---|---|---|---|---|---|
| IN | 3987/3597 | 0.960/0.960 | 0.821/0.819 | 0.587/0.592 | 384/371 |
| PR | 6462/6068 | 0.956/0.957 | 0.829/0.827 | 0.696/0.710 | 494/485 |
| RT | 6093/5894 | 0.943/0.942 | 0.723/0.715 | 0.760/0.776 | 455/441 |
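The record does not define the column headers. Assuming the conventional meanings (R2 is the coefficient of determination on the training data, Q2 is the same statistic computed from cross-validated predictions, and RMSD is the root-mean-square deviation in pIC50 units), the statistics could be computed as in the sketch below. The pIC50 values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Passing cross-validated predictions yields Q2 instead of R2."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmsd(observed, predicted):
    """Root-mean-square deviation between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


# Hypothetical observed and predicted pIC50 values.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
print(f"{r_squared(observed, predicted):.3f}")  # 0.800
print(f"{rmsd(observed, predicted):.3f}")       # 0.500
```

Under these definitions, a gap between R2 and Q2 such as the one in the table indicates how much of the fit is lost when each compound is predicted by a model that did not see it.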
Figure 2. Predicted pIC50 vs. observed values for QSAR models based on NIAID and ChEMBL (a) and NIAID and Integrity (b) datasets.
Results of prediction for the test sets with classification models.
| Model | Test Set | Sensitivity | Specificity | Balanced Accuracy |
|---|---|---|---|---|
| IN, NIAID and ChEMBL | IN, Integrity test set | 0.753 | 0.677 | 0.715 |
| IN, NIAID and Integrity | IN, ChEMBL test set | 0.813 | 0.820 | |
| PR, NIAID and ChEMBL | PR, Integrity test set | 0.697 | 0.857 | 0.777 |
| PR, NIAID and Integrity | PR, ChEMBL test set | | 0.788 | 0.807 |
| RT, NIAID and ChEMBL | RT, Integrity test set | 0.611 | | |
| RT, NIAID and Integrity | RT, ChEMBL test set | | | 0.732 |
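Balanced accuracy is the arithmetic mean of sensitivity and specificity. The short sketch below shows the calculation and checks it against the two rows of the table that report all three values (IN and PR models evaluated on the Integrity test sets). The helper names are illustrative and do not come from the PASS software.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Values taken from the fully reported rows of the table above.
print(f"{balanced_accuracy(0.753, 0.677):.3f}")  # 0.715 (IN, Integrity test set)
print(f"{balanced_accuracy(0.697, 0.857):.3f}")  # 0.777 (PR, Integrity test set)
```

Because it weights both classes equally, balanced accuracy is less sensitive than plain accuracy to the unequal numbers of active and inactive compounds in a test set.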
Figure 3. Results of prediction for the Integrity test sets with the GUSAR regression models.
Figure 4. Results of prediction for the ChEMBL test sets with the GUSAR regression models.
Figure 5. Distributions of new MNA descriptors for the ChEMBL test sets.
Characteristics of classification models (PASS) based on the complete dataset.
| Inhibitors of | Active | Inactive | IAP LOO CV | IAP 20-fold CV |
|---|---|---|---|---|
| HIV-1 IN | 1884 | 2139 | 0.922 | 0.921 |
| HIV-1 PR | 4840 | 1351 | 0.937 | 0.936 |
| HIV-1 RT | 3286 | 2935 | 0.876 | 0.875 |
Characteristics of the regression models (GUSAR) based on the complete dataset.
| Inhibitors of | N | R2 | Q2 | RMSD | V |
|---|---|---|---|---|---|
| IN | 4091 | 0.96 | 0.818 | 0.595 | 392 |
| PR | 6554 | 0.954 | 0.824 | 0.709 | 470 |
| RT | 6309 | 0.941 | 0.714 | 0.767 | 452 |
Figure 6. IN (a), PR (b), and RT (c) pIC50 observed vs. predicted values for the (Q)SAR models based on the complete dataset.
Figure 7. Example of prediction for an FDA-approved inhibitor by the AntiHIV-Pred web-service.
Figure 8. Activity distribution in the NIAID PR dataset.
Number of compounds in the test sets.
| Test set | IN | PR | RT |
|---|---|---|---|
| | 104 | 92 | 216 |
| | 494 | 486 | 415 |