Huikun Zhang, Spencer S Ericksen, Ching-Pei Lee, Gene E Ananiev, Nathan Wlodarchak, Peng Yu, Julie C Mitchell, Anthony Gitter, Stephen J Wright, F Michael Hoffmann, Scott A Wildman, Michael A Newton.
Abstract
Prediction of compounds that are active against a desired biological target is a common step in drug discovery efforts. Virtual screening methods seek some active-enriched fraction of a library for experimental testing. Where data are too scarce to train supervised learning models for compound prioritization, initial screening must provide the necessary data. Commonly, such an initial library is selected on the basis of chemical diversity by some pseudo-random process (for example, the first few plates of a larger library) or by selecting an entire smaller library. These approaches may not produce a sufficient number or diversity of actives. An alternative approach is to select an informer set of screening compounds on the basis of chemogenomic information from previous testing of compounds against a large number of targets. We compare different ways of using chemogenomic data to choose a small informer set of compounds based on previously measured bioactivity data. We develop this Informer-Based-Ranking (IBR) approach using the Published Kinase Inhibitor Sets (PKIS) as the chemogenomic data to select the informer sets. We test the informer compounds on a target that is not part of the chemogenomic data, then predict the activity of the remaining compounds based on the experimental informer data and the chemogenomic data. Through new chemical screening experiments, we demonstrate the utility of IBR strategies in a prospective test on three kinase targets not included in the PKIS.
Year: 2019 PMID: 31381559 PMCID: PMC6695194 DOI: 10.1371/journal.pcbi.1006813
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. IBR (Informer-Based Ranking) for compound prioritization on a novel target.
From a complete bioactivity data matrix (blue grid), a subset of informer compounds (green stars) is identified from the broader set of compounds (stars) that have been tested against a large set of targets (pink circles). A previously uncharacterized target (red circle) is assayed with just the informer compounds, and the new bioactivity data are used to reveal the new target's relationship to the other targets. The combined data enable activity predictions (purple) for the remaining, non-informer compounds.
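The workflow in Fig 1 can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the variance-based informer selection, the correlation-weighted prediction, the function names, and the small matrix are hypothetical and are not the RS, CS, or AS methods evaluated in the paper.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def choose_informers(matrix, n_informers):
    """Hypothetical rule: pick the compounds whose activity varies most across targets.

    matrix[t][c] is the activity of compound c on known target t.
    """
    n_compounds = len(matrix[0])
    variances = [pvariance([row[c] for row in matrix]) for c in range(n_compounds)]
    ranked = sorted(range(n_compounds), key=lambda c: variances[c], reverse=True)
    return sorted(ranked[:n_informers])


def rank_non_informers(matrix, informers, new_target_informer_data):
    """Rank non-informers by a similarity-weighted mean of their known activities."""
    # Weight each known target by how well it matches the new target on the informers;
    # anti-correlated targets get weight 0 (an illustrative choice).
    weights = [
        max(pearson([row[c] for c in informers], new_target_informer_data), 0.0)
        for row in matrix
    ]
    total = sum(weights)
    others = [c for c in range(len(matrix[0])) if c not in informers]
    scores = {}
    for c in others:
        if total == 0:
            scores[c] = mean([row[c] for row in matrix])
        else:
            scores[c] = sum(w * row[c] for w, row in zip(weights, matrix)) / total
    return sorted(others, key=lambda c: scores[c], reverse=True)


# Toy bioactivity matrix: 3 known targets (rows) x 5 compounds (columns).
matrix = [
    [0.9, 0.1, 0.8, 0.2, 0.7],
    [0.1, 0.9, 0.2, 0.8, 0.1],
    [0.8, 0.2, 0.9, 0.1, 0.9],
]
informers = choose_informers(matrix, 3)
# Made-up assay readings of the new target on the informer compounds only.
new_target_informer_data = [0.85, 0.15, 0.75]
ranking = rank_non_informers(matrix, informers, new_target_informer_data)
```

In this toy case the new target resembles the first and third known targets on the informers, so the non-informer that is active on those targets is ranked first.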
Retrieval counts by the various methods on new kinase targets (a) PknB, (b) BGLF4, and (c) ROP18 using PKIS1 or PKIS2 matrices.
The total numbers of experimentally determined active compounds and distinct active scaffolds are given in the total column. The values below each of the IBR methods indicate the number of active informers identified, the number of experimentally determined active compounds that were ranked in the top 10% of predicted active compounds by each method, and the number of unique active scaffolds identified in that top 10%. For a given target, this top 10% consists of the active informers plus the top-ranking non-informers, which together make up 10% of all compounds that remain after the inactive informers are removed.

(a) PknB

| matrix | hits | BCw (baseline) | BFw (baseline) | RS | CS | AS | total |
|---|---|---|---|---|---|---|---|
| PKIS1 | active compounds | 1 | 7 | 7 | 2 | 3 | 8 |
| PKIS1 | active scaffolds | 1 | 7 | 7 | 2 | 3 | 8 |
| PKIS2 | active compounds | 0 | 1 | 2 | 3 | 1 | 7 |
| PKIS2 | active scaffolds | 0 | 1 | 2 | 3 | 1 | 7 |

(b) BGLF4

| matrix | hits | BCw (baseline) | BFw (baseline) | RS | CS | AS | total |
|---|---|---|---|---|---|---|---|
| PKIS1 | active compounds | 3 | 9 | 3 | 7 | 10 | 11 |
| PKIS1 | active scaffolds | 2 | 6 | 3 | 5 | 7 | 8 |
| PKIS2 | active compounds | 1 | 1 | 8 | 3 | 1 | 10 |
| PKIS2 | active scaffolds | 1 | 1 | 7 | 3 | 1 | 8 |

(c) ROP18

| matrix | hits | BCw (baseline) | BFw (baseline) | RS | CS | AS | total |
|---|---|---|---|---|---|---|---|
| PKIS1 | active compounds | 4 | 7 | 4 | 4 | 2 | 16 |
| PKIS1 | active scaffolds | 3 | 4 | 2 | 3 | 2 | 11 |
| PKIS2 | active compounds | 7 | 5 | 3 | 3 | 5 | 19 |
| PKIS2 | active scaffolds | 4 | 3 | 2 | 2 | 3 | 12 |

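The way the top 10% is assembled for the retrieval counts (active informers first, then top-ranking non-informers, with inactive informers removed from the pool) can be sketched as follows. The function name, argument layout, and the round-up rule for the 10% cutoff are assumptions; the caption does not state how fractional counts are rounded.

```python
def top_percent_selection(ranked_non_informers, informer_is_active, n_total, percent=10):
    """Return the compounds counted as the 'top percent' for one target.

    ranked_non_informers: non-informer compound ids, best predicted first.
    informer_is_active: dict mapping informer id -> True/False from the new assay.
    n_total: number of compounds in the whole set (informers included).
    """
    active_informers = [c for c, active in informer_is_active.items() if active]
    n_inactive_informers = len(informer_is_active) - len(active_informers)
    # Inactive informers are removed before the cutoff is computed.
    pool_size = n_total - n_inactive_informers
    # Integer ceiling of percent% of the pool (rounding up is an assumption).
    k = (pool_size * percent + 99) // 100
    n_fill = max(k - len(active_informers), 0)
    return active_informers + ranked_non_informers[:n_fill]


# Hypothetical example: 50 compounds, 5 informers of which 2 were active.
informer_results = {0: True, 1: False, 2: False, 3: True, 4: False}
selection = top_percent_selection(list(range(5, 50)), informer_results, n_total=50)
```

Here the pool shrinks from 50 to 47 compounds, so the selection holds 5 compounds: the 2 active informers and the 3 best-ranked non-informers.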
(a) ROCAUC, (b) NEF10, and (c) FASR10 in Leave-One-Target-Out Cross Validation on PKIS1.
IBR methods were evaluated on 224 PKIS1 targets using standard VS metrics that reflect active retrieval: ROCAUC and NEF10. FASR10 was also evaluated to reflect the chemical diversity of the actives retrieved. All baseline outcomes are shown in S4 Table, and p-values from pairwise comparisons are given in S5 Table. *The only non-baseline IBR method that fails to demonstrate a statistically significant improvement (p < 0.0085) over all baselines is CS when using the ROCAUC metric. Note: a Šidák multiple comparison correction was applied for the 6 baselines compared against each non-baseline IBR method, lowering the α threshold from 0.05 to 0.0085.

(a) ROCAUC

| statistic | BCw (baseline) | BFw (baseline) | RS | *CS | AS |
|---|---|---|---|---|---|
| mean | 0.63 | 0.79 | 0.90 | 0.81 | 0.84 |
| median | 0.67 | 0.81 | 0.93 | 0.83 | 0.88 |
| stdev | 0.21 | 0.13 | 0.11 | 0.14 | 0.14 |

(b) NEF10

| statistic | BCw (baseline) | BFw (baseline) | RS | CS | AS |
|---|---|---|---|---|---|
| mean | 0.62 | 0.74 | 0.80 | 0.79 | 0.82 |
| median | 0.60 | 0.72 | 0.81 | 0.79 | 0.85 |
| stdev | 0.13 | 0.13 | 0.13 | 0.14 | 0.13 |

(c) FASR10

| statistic | BCw (baseline) | BFw (baseline) | RS | CS | AS |
|---|---|---|---|---|---|
| mean | 0.31 | 0.52 | 0.68 | 0.65 | 0.71 |
| median | 0.29 | 0.50 | 0.72 | 0.64 | 0.75 |
| stdev | 0.21 | 0.21 | 0.22 | 0.26 | 0.23 |

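The Šidák-corrected threshold quoted in the table note can be reproduced with the standard formula, α_corrected = 1 − (1 − α)^(1/m), for m = 6 comparisons. The function name below is only illustrative.

```python
def sidak_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)


# Six baselines compared against each non-baseline method, family-wise alpha of 0.05.
alpha_per_test = sidak_alpha(0.05, 6)
print(round(alpha_per_test, 4))  # 0.0085
```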
Fig 2. A comparison of models with respect to compound ranking performance as assessed by ROCAUC values.
Each model was evaluated on 224 targets through PKIS1 leave-one-target-out validation. ROCAUC of 0.5 indicates a random ranking of compounds on a given target; ROCAUC of 1.0 represents ideal ranking with all active compounds prioritized above the inactives. The individual target evaluations are shown as light grey dots with median and interquartile ranges displayed as a white circle and black bars, respectively.
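As a reminder of what ROCAUC measures, the minimal sketch below computes it directly as the probability that a randomly chosen active is scored above a randomly chosen inactive, with ties counted as one half. This pairwise form is a standard equivalent of the area under the ROC curve; the function name and the example scores are illustrative only, and the paper's own implementation is not shown in this record.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    scores: predicted scores, higher means more likely active.
    labels: 1/True for active, 0/False for inactive.
    """
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5  # ties contribute half a win
    return wins / (len(actives) * len(inactives))


perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])   # all actives above inactives
mixed = roc_auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])     # one active on top, one at the bottom
```

A perfect ranking yields 1.0, while the mixed example yields 0.5, the value expected from a random ranking.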
Fig 3. A comparison of models with respect to compound ranking performance as assessed by active enrichment in the top 10% of ranked compounds.
Each model was evaluated on 224 targets through PKIS1 leave-one-target-out validation. NEF10 is the fold-enrichment of actives in the top 10% of ranked compounds relative to random selection, normalized by dividing by the maximum fold-enrichment that could theoretically be achieved at the 10% threshold for the target of interest.
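One plain reading of this definition is the enrichment factor at 10% divided by its maximum attainable value. The sketch below implements only that ratio; the exact normalization used in the paper (for example, any further rescaling relative to random ranking) is not given in this caption, so the formula, the function name, and the round-up rule for the cutoff are all assumptions.

```python
def normalized_ef(ranked_labels, percent=10):
    """Enrichment factor in the top `percent`, divided by its theoretical maximum.

    ranked_labels: 1 for active, 0 for inactive, ordered from best to worst rank.
    """
    n = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_actives == 0:
        raise ValueError("target has no active compounds")
    # Size of the top slice, rounded up (rounding rule is an assumption).
    n_top = max(1, (n * percent + 99) // 100)
    hits = sum(ranked_labels[:n_top])
    base_rate = n_actives / n
    ef = (hits / n_top) / base_rate                        # fold-enrichment over random
    ef_max = (min(n_top, n_actives) / n_top) / base_rate   # best achievable enrichment
    return ef / ef_max


# 20 compounds, 4 actives; the top 10% is 2 compounds and contains 1 active.
example = normalized_ef([1, 0, 1, 1, 0, 0, 0, 0, 0, 1] + [0] * 10)
```

In the example the enrichment factor is 2.5 out of a possible 5.0, so the ratio is 0.5.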
Fig 4. A comparison of models with respect to the structural diversity of the active compounds retrieved.
Each model was assessed by FASR10 evaluations on 224 targets through PKIS1 leave-one-target-out validation. The FASR10 metric is the fraction of all active-molecule scaffolds identified for the target of interest that appear in the top 10% of the ranked compounds on that target. Compounds are grouped by the generic (all-carbon skeleton) representations of their Bemis-Murcko scaffolds.
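The FASR10 definition above can be sketched as a set calculation. The sketch assumes each compound already carries a precomputed scaffold label (in practice a cheminformatics toolkit would derive the generic Bemis-Murcko scaffold); the function name, the string labels, and the round-up rule for the 10% cutoff are hypothetical.

```python
def fasr(ranked_ids, is_active, scaffold_of, percent=10):
    """Fraction of a target's active scaffolds recovered in the top `percent`.

    ranked_ids: compound ids ordered from best to worst rank.
    is_active: dict mapping compound id -> True/False.
    scaffold_of: dict mapping compound id -> scaffold label (assumed precomputed).
    """
    # Size of the top slice, rounded up (rounding rule is an assumption).
    n_top = max(1, (len(ranked_ids) * percent + 99) // 100)
    all_active_scaffolds = {scaffold_of[c] for c in ranked_ids if is_active[c]}
    if not all_active_scaffolds:
        raise ValueError("target has no active compounds")
    top_active_scaffolds = {scaffold_of[c] for c in ranked_ids[:n_top] if is_active[c]}
    return len(top_active_scaffolds) / len(all_active_scaffolds)


# 20 hypothetical compounds; actives are c0 and c1 (scaffold A), c5 (B), c9 (C).
ids = [f"c{i}" for i in range(20)]
active = {c: c in {"c0", "c1", "c5", "c9"} for c in ids}
scaffolds = {c: "other" for c in ids}
scaffolds.update({"c0": "A", "c1": "A", "c5": "B", "c9": "C"})
example_fasr = fasr(ids, active, scaffolds)
```

The top 10% here holds two actives that share scaffold A, so only one of the three active scaffolds is recovered. This shows how FASR10 can stay low even when the hit count in the top slice is high.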