Timur I Madzhidov, Assima Rakhimbekova, Alina Kutlushuna, Pavel Polishchuk.
Abstract
Pharmacophore modeling is usually regarded as a special type of virtual screening that has no probabilistic nature: the correspondence of at least one conformation of a molecule to a pharmacophore is taken as evidence of its bioactivity. We show that pharmacophores can be treated as one-class machine learning models, and that a probability reflecting a model's confidence can be assigned to each pharmacophore on the basis of its precision in identifying active compounds on a calibration set. Two schemes (Max and Mean) for calculating the probability of a consensus prediction from individual pharmacophore models are proposed. Both schemes correspond, to some extent, to commonly used consensus approaches such as the common hit approach or the one based on a logical OR operation uniting the hit lists of individual models. Unlike some known approaches, the proposed ones can rank compounds retrieved by multiple models. These approaches were benchmarked on multiple ChEMBL datasets used for ligand-based pharmacophore modeling and externally validated on the corresponding DUD-E datasets. The influence of pharmacophore complexity, and of pharmacophore performance on a calibration set, on the results of virtual screening was analyzed. The Max and Mean approaches showed superior early enrichment compared with the commonly used approaches. Thus, a well-performing, easy-to-implement, probabilistic alternative to existing approaches for pharmacophore-based virtual screening is proposed.
Keywords: ligand-based virtual screening; machine learning; pharmacophores; virtual screening
Year: 2020 PMID: 31963467 PMCID: PMC7024325 DOI: 10.3390/molecules25020385
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
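The consensus idea in the abstract can be illustrated with a minimal Python sketch. It assumes each pharmacophore's probability equals its calibration-set precision, that the Max scheme takes the highest probability among the models a compound matches, and that the Mean scheme averages over all models with non-matching models contributing zero. The last point is an assumption made for this illustration, since the abstract does not state how non-matching models are handled. The names `precision`, `consensus_scores`, `ph1`, `molA`, and so on are hypothetical.

```python
from statistics import mean


def precision(true_positives, false_positives):
    """Fraction of retrieved calibration compounds that are active."""
    retrieved = true_positives + false_positives
    return true_positives / retrieved if retrieved else 0.0


def consensus_scores(hits, model_precision, scheme="max"):
    """Score compounds from the pharmacophore models they match.

    hits            -- dict: compound id -> set of matched model names
    model_precision -- dict: model name -> precision on a calibration set
    scheme          -- "max" or "mean" (assumed definitions, see lead-in)
    """
    if scheme not in ("max", "mean"):
        raise ValueError("scheme must be 'max' or 'mean'")
    scores = {}
    for compound, matched in hits.items():
        # Assumption: a model that does not match contributes probability 0.
        probs = [p if name in matched else 0.0
                 for name, p in model_precision.items()]
        if not probs:
            scores[compound] = 0.0
        elif scheme == "max":
            scores[compound] = max(probs)
        else:
            scores[compound] = mean(probs)
    return scores


# Hypothetical models and hit lists, for illustration only.
model_precision = {"ph1": 0.9, "ph2": 0.6, "ph3": 0.5}
hits = {"molA": {"ph1"}, "molB": {"ph2", "ph3"}, "molC": set()}

max_scores = consensus_scores(hits, model_precision, "max")
mean_scores = consensus_scores(hits, model_precision, "mean")
```

Under these assumptions the two schemes can rank compounds differently: Max favors `molA` (one hit from a high-precision model), while Mean favors `molB` (hits from two moderately precise models). A plain OR-consensus would treat both as equal hits.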
Figure 1. Enrichment factor (EF) curves for the Max, Mean, and common hits approach (CHA) schemes of molecule ranking in virtual screening for selected targets. Levels for OR-consensus models are shown as horizontal lines (only pharmacophores with precision greater than 0.5 and greater than 0.9 were retained). The numbers of the corresponding ChEMBL targets are provided.
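For readers unfamiliar with the metric, the enrichment factor at a given screened fraction can be sketched as follows. This uses the standard definition (hit rate in the top fraction divided by the hit rate in the whole set). The function name and the rounding of the cutoff to a whole number of compounds are assumptions of this sketch, not details taken from the paper.

```python
def enrichment_factor(ranked_labels, fraction):
    """Enrichment factor for the top `fraction` of a ranked list.

    ranked_labels -- list of 1 (active) / 0 (inactive), best-scored first
    fraction      -- share of the list that is screened, 0 < fraction <= 1
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Assumption: the cutoff is rounded to a whole number of compounds,
    # with at least one compound selected.
    n_top = max(1, int(round(fraction * n_total)))
    actives_in_top = sum(ranked_labels[:n_top])
    hit_rate_top = actives_in_top / n_top
    hit_rate_all = n_actives / n_total
    return hit_rate_top / hit_rate_all


# Toy ranking: 3 actives among 10 compounds, two of them ranked first.
toy_ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(toy_ranking, 0.2)
```

An unranked hit list, such as the output of an OR-consensus model, yields a single EF value rather than a curve, which is consistent with the horizontal lines described in the caption.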
Figure 2. BEDROC curves for the Max, Mean, and CHA schemes of molecule ranking in virtual screening for selected targets. The numbers of the corresponding ChEMBL targets are provided.
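BEDROC (Boltzmann-enhanced discrimination of ROC) weights early-ranked actives exponentially. The sketch below computes it by rescaling the robust initial enhancement (RIE) between its minimum and maximum possible values, which is one common way to write the metric. The function names and the default `alpha=20` are assumptions of this sketch; the source does not state which `alpha` values were used.

```python
import math


def rie(ranked_labels, alpha):
    """Robust initial enhancement for a ranked list of 1/0 labels."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Exponentially weighted sum over the (1-based) ranks of the actives.
    weighted = sum(math.exp(-alpha * (rank + 1) / n_total)
                   for rank, label in enumerate(ranked_labels) if label)
    # Expected value of the same weight for a uniformly random rank.
    random_mean = (1 - math.exp(-alpha)) / (
        n_total * (math.exp(alpha / n_total) - 1))
    return weighted / (n_actives * random_mean)


def bedroc(ranked_labels, alpha=20.0):
    """BEDROC in [0, 1]; alpha=20 is an assumed, commonly used setting."""
    value = rie(ranked_labels, alpha)
    ratio = sum(ranked_labels) / len(ranked_labels)
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    if rie_max == rie_min:  # every compound is active
        return 1.0
    return (value - rie_min) / (rie_max - rie_min)


# Toy ranking: 2 actives among 10 compounds, both ranked first.
best_case = bedroc([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
```

A larger `alpha` concentrates the weight on the very top of the ranked list, so BEDROC values are only comparable between runs that use the same `alpha`.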
ChEMBL datasets used for ligand-based pharmacophore modeling and calibration, and DUD-E datasets used for external validation.
| ChEMBL ID | Target Name | Actives (ChEMBL) | Inactives (ChEMBL) | Total (ChEMBL) | Actives (DUD-E) | Inactives (DUD-E) | Total (DUD-E) |
|---|---|---|---|---|---|---|---|
| CHEMBL205 | Carbonic anhydrase II | 1394 | 2382 | 3776 | 492 | 31,172 | 31,664 |
| CHEMBL206 | Estrogen receptor alpha | 395 | 1442 | 1837 | 383 | 20,685 | 21,068 |
| CHEMBL208 | Progesterone receptor | 448 | 848 | 1296 | 293 | 15,650 | 15,943 |
| CHEMBL213 | Beta-1 adrenergic receptor | 155 | 482 | 637 | 247 | 15,850 | 16,097 |
| CHEMBL235 | Peroxisome proliferator-activated receptor gamma | 228 | 1052 | 1280 | 484 | 25,300 | 25,784 |
| CHEMBL239 | Peroxisome proliferator-activated receptor alpha | 121 | 788 | 909 | 373 | 19,399 | 19,772 |
| CHEMBL242 | Estrogen receptor beta | 477 | 972 | 1449 | 367 | 20,199 | 20,566 |
| CHEMBL244 | Coagulation factor X | 676 | 2009 | 2685 | 537 | 28,325 | 28,862 |
| CHEMBL251 | Adenosine A2a receptor | 1476 | 2276 | 3752 | 482 | 31,550 | 32,032 |
| CHEMBL279 | Vascular endothelial growth factor receptor 2 | 139 | 4627 | 4766 | 409 | 24,950 | 25,359 |
| CHEMBL284 | Dipeptidyl peptidase IV | 281 | 2277 | 2558 | 533 | 40,950 | 41,483 |
| CHEMBL1862 | Tyrosine-protein kinase ABL | 411 | 1515 | 1926 | 182 | 10,750 | 10,932 |
| CHEMBL1871 | Androgen Receptor | 586 | 967 | 1553 | 269 | 14,350 | 14,619 |
| CHEMBL1994 | Mineralocorticoid receptor | 102 | 532 | 634 | 94 | 5150 | 5244 |
| CHEMBL2971 | Tyrosine-protein kinase JAK2 | 131 | 2545 | 2676 | 107 | 6500 | 6607 |
| CHEMBL3105 | Poly [ADP-ribose] polymerase-1 | 259 | 1138 | 1397 | 508 | 30,050 | 30,558 |
The number of models generated for individual targets.
| ChEMBL ID | Target Name | Number of Models | Number of Models with Number of Features ≥ 4 a |
|---|---|---|---|
| CHEMBL205 | Carbonic anhydrase II | 270 | 260 |
| CHEMBL206 | Estrogen receptor alpha | 27 | 26 |
| CHEMBL208 | Progesterone receptor | 37 | 32 |
| CHEMBL213 | Beta-1 adrenergic receptor | 19 | 17 |
| CHEMBL235 | Peroxisome proliferator-activated receptor gamma | 31 | 26 |
| CHEMBL239 | Peroxisome proliferator-activated receptor alpha | 15 | 15 |
| CHEMBL242 | Estrogen receptor beta | 61 | 53 |
| CHEMBL244 | Coagulation factor X | 45 | 35 |
| CHEMBL251 | Adenosine A2a receptor | 110 | 101 |
| CHEMBL279 | Vascular endothelial growth factor receptor 2 | 12 | 11 |
| CHEMBL284 | Dipeptidyl peptidase IV | 34 | 34 |
| CHEMBL1862 | Tyrosine-protein kinase ABL | 27 | 27 |
| CHEMBL1871 | Androgen Receptor | 50 | 48 |
| CHEMBL1994 | Mineralocorticoid receptor | 6 | 6 |
| CHEMBL2971 | Tyrosine-protein kinase JAK2 | 4 | 1 |
| CHEMBL3105 | Poly [ADP-ribose] polymerase-1 | 43 | 40 |
a Number of features having distinct coordinates.