
Predicted Biological Activity of Purchasable Chemical Space.

John J Irwin¹, Garrett Gaskins^1,2,3,4, Teague Sterling¹, Michael M Mysinger¹, Michael J Keiser^1,2,3,4.

Abstract

Whereas 400 million distinct compounds are now purchasable within the span of a few weeks, the biological activities of most are unknown. To facilitate access to new chemistry for biology, we have combined the Similarity Ensemble Approach (SEA) with the maximum Tanimoto similarity to the nearest bioactive to predict activity for every commercially available molecule in ZINC. This method, which we label SEA+TC, outperforms both SEA and a naïve-Bayesian classifier via predictive performance on a 5-fold cross-validation of ChEMBL's bioactivity data set (version 21). Using this method, predictions for over 40% of compounds (>160 million) have either high significance (pSEA ≥ 40), high similarity (ECFP4MaxTc ≥ 0.4), or both, for one or more of 1382 targets well described by ligands in the literature. Using a further 1347 less-well-described targets, we predict activities for an additional 11 million compounds. To gauge whether these predictions are sensible, we investigate 75 predictions for 50 drugs lacking a binding affinity annotation in ChEMBL. The 535 million predictions for over 171 million compounds at 2629 targets are linked to purchasing information and evidence to support each prediction and are freely available via https://zinc15.docking.org and https://files.docking.org .


Year: 2017 PMID: 29193970 PMCID: PMC5780839 DOI: 10.1021/acs.jcim.7b00316

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Introduction

The purchasable chemical space has roughly doubled every two and a half years since 1990, owing to steady progress in efficient parallel synthesis[1−8] and the synthesis of new building blocks. There are now over 400 million compounds one can easily purchase using ZINC,[9] which covers 204 commercial catalogs from 145 companies. Each catalog is categorized by ease of purchase, and each compound in turn inherits a purchasability level from its catalog membership. The growth in catalog size is impressive, particularly among the make-on-demand catalogs. Purchasable compounds in the favored lead-like[10] and fragment-like[11] areas have grown from 3 million and a half million in 2007 to 124 million and 9.2 million today, respectively. Many vendors have incorporated the lessons of lead- and fragment-likeness in library design,[47] often filtering for PAINS.[48] About 340 million (85%) of these compounds are affordable enough for the average academic lab to conduct a ligand discovery project, retaining a price point around $100 per sample or less. A further 60 million compounds are available at higher building-block prices, often $400 USD or more and are included here for completeness. We find that synthesis plus delivery of make-on-demand screening compounds often takes little more than a month or so, just twice the time to source many in-stock compounds. The molecular targets (proteins) that these purchasable compounds bind and modulate—if any—are rarely known. Fewer than 1 million compounds—less than 0.25%—have been reported active in a target-specific assay according to public databases such as ChEMBL[12] or other annotated collections indexed by ZINC.[13] Investigators searching for testable ligands might not consider the remaining readily available compounds, as they are not annotated for targets and the sheer number of options can be daunting. 
In the absence of target activity information, the process of selecting compounds for general purpose screening will often be target-naïve, relying on chemical or physical-property diversity to sample chemical and property space, respectively.[14] If information on target bias (the likelihood that a compound is more disposed to bind to a particular target or class of targets) were readily available, libraries more likely to cover biological targets of interest could be designed. Systematically assaying every commercially available compound against every target is experimentally impractical, so prioritizing compounds through computational predictions is a pragmatic alternative. There are many methods for predicting biological activities by chemical similarity;[15−36] here, we use two. The Similarity Ensemble Approach (SEA)[37,38] predicts biological targets of a compound based on its resemblance to ligands annotated in a reference database, such as ChEMBL.[12] SEA relates proteins by their pharmacology by aggregating chemical similarity among entire sets of ligands. By leveraging extreme value statistics, SEA filters out unreliable signals and normalizes the aggregate results against a random chemical background to predict the significance of pharmacological similarity. SEA has successfully predicted targets of marketed drugs,[37−39] toxicity targets,[40] and mechanism of action targets for hits in zebrafish[41] and C. elegans[42] phenotypic screens. We also use the maximum Tanimoto coefficient[43] at 0.40[44] or better based on ECFP4 fingerprints[45] to inform predictions. Neither method generates models incorporating discrete chemotypes, as Naïve Bayes classifiers do, for instance; instead, each considers the molecule holistically. This is advantageous because the method can suggest molecules that do not conform to what has been highly weighted by precedent.
Other methods such as Naïve Bayes[46] can explicitly weight for chemical substructures that are potentially important to bioactivity (“warheads”), and thus a future version might use such an approach to complement this work. To be useful for research, predictions should be accessible, searchable, and downloadable. An interface should allow access to predictions for each compound, as well as for each target, vendor, and gene. A mechanism to select more novel or more conservative predictions would cater to a wide range of requirements. And libraries should be downloadable in 2D formats for chemoinformatics as well as in popular 3D formats for docking screens. The prospective user of such a resource expects some way to evaluate the predictions. As one proxy to assess this data set, we performed a retrospective 5-fold cross-validation on the ChEMBL bioactivity data set for our method as compared to SEA and a naïve-Bayesian classifier, at a variety of threshold parameters (Figure 1; Supporting Information Figures S1 and S2). Second, in assessing performance, we reencountered the observation that whereas the canonical targets of all but a few drugs are known,[47] hundreds of established drugs and investigational compounds nonetheless lack their respective target annotations in ChEMBL. We turned this deficit to our advantage, by testing the method’s prediction of targets for several such drugs, corroborating our predictions with the literature when available. Finally, as these predictions are based on protein−ligand annotations derived from ChEMBL, we expect that this method will be silent about chemotypes and targets not contained in this approximation of the public pharmacopeia.
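To make the MaxTc quantity concrete, the following is a minimal sketch of a Tanimoto coefficient and its maximum over a set of annotated actives. It assumes fingerprints are represented as sets of on-bit indices; the bit values shown are toy numbers, not real ECFP4 output, and the function names are illustrative rather than part of SEA or ZINC.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_tc(query: Set[int], actives: Iterable[Set[int]]) -> float:
    """Highest Tanimoto coefficient between the query and any annotated active."""
    return max((tanimoto(query, active) for active in actives), default=0.0)


# Toy fingerprints (hypothetical bit indices, for illustration only).
query = {1, 4, 7, 9}
actives = [{1, 4, 8}, {2, 3}, {1, 4, 7, 9, 12}]
print(round(max_tc(query, actives), 2))  # 0.8
```

In this toy case the third active shares four of five distinct bits with the query, so the nearest-neighbor similarity is 0.8, well above the 0.40 cutoff discussed in the text.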

Figure 1

Comparative performance of SEA, SEA+TC, and a multinomial naive-Bayesian classifier (NBC) on ChEMBL cross-validation sets. (A) Receiver operating characteristic (ROC) curves from independent 5-fold cross-validation runs for each method. Methods are evaluated on independent cross-validation sets filtered for >5 ligands per ChEMBL protein target (equivalent analyses at >50 ligands per target reported in Supporting Information Figure S2). Overall performance is gauged by the area under the ROC curve (AUROC). Note, for SEA+TC cross-validation sets, ROC curves are the result of stepping a decision threshold across MaxTc values, while holding a separate pSEA decision threshold at 40 (yellow curve) or 80 (cyan curve) (see Methods). Complementary curves stepping across SEA p-values are available in Supporting Information Figures S1 and S2. Dotted lines span the distance between a fully stratified classifier (TPR = 0; FPR = 0) and the minimum point at which both SEA+TC decision thresholds begin to affect performance. Pink and blue circles indicate the recommended upper and lower bounds for MaxTc thresholding on their respective pSEA-threshold curves, respectively (upper = 0.80; lower = 0.40). (B) Corresponding precision-recall curves (PRCs) for cross-validation runs described in part A. Positive-class prevalence (dashed red line) indicates the chance of selecting a positive association from the data set at random (0.0014). Performance is measured by the area under the PRC (AUPRC).

Results

The ZINC database contains 400 million commercially available organic molecules with molecular weight between 50 and 1000 Da, sourced from 204 commercial catalogs published by 145 companies. We have created a database of predicted biological activities for the 171 million compounds that had predictions and have made it freely accessible via ZINC (https://zinc15.docking.org) and our file server (https://files.docking.org). All predictions were computed using a combination of the Similarity Ensemble Approach (SEA)[37] and Tanimoto similarity calculations based on compound annotations derived from ChEMBL Version 21[12] (see Methods). We refer to this combinatorial approach as SEA+TC throughout the text. To enhance this resource’s applicability to a broad audience, we sought to increase the specificity of predictions by using more stringent criteria for what constitutes an annotated ligand. In prior work we had used a 10 μM affinity cutoff, but at this scale, we encountered flawed predictions that appeared to arise from similarity to weak binders, possible PAINS, or promiscuous aggregator compounds. Based on our experience with these encounters, we changed the baseline affinity threshold to 1 μM and further required activities of at least 100 nM for compounds containing PAINS patterns or having Tc ≥ 0.70 to any compound observed to aggregate.[48−50] We adopted a statistical significance threshold of negative log SEA p-value[54] (pSEA) ≥ 40 and a MaxTc cutoff ≥0.40 guided by the work on belief theory from the Abbvie group.[34] MaxTc is complementary to pSEA as it provides a single-nearest-neighbor-molecule view of similarity, compared to SEA’s global view arising from the ensemble of annotated ligands. To quantify how this bivariate threshold improves predictive capability, we evaluated the performance of SEA, SEA+TC, and a Naïve-Bayesian classifier (NBC) via 5-fold cross-validation of ChEMBL’s bioactivity data set (version 21; Figure 1).
SEA+TC’s ability to correctly predict compound−target interactions as either positive (does bind) or negative (does not bind) outperformed both SEA and the NBC, as measured by the area under the receiver operating characteristic (AUROC) curve (AUROC = 0.995, Figure 1A). Further, when predicting a compound−target interaction as positive, SEA+TC was correct in its prediction more often than SEA or the NBC, as indicated by its area under the precision-recall (AUPRC) curve (AUPRC = 0.684, Figure 1B). In performing this analysis, we additionally identified a more stringent bivariate threshold, which some users may wish to adopt. At a threshold of MaxTc ≥ 0.80 with pSEA ≥ 80, the retrospective analyses achieve higher precision than the baseline threshold (Figure 1A and B, blue circle) at acceptable recall (pink circle). Users of the ZINC interface may choose thresholds to suit their needs. In addition to controlling the sensitivity and specificity of predictions, the significance threshold (i.e., pSEA and MaxTc values)[17] also influences the novelty of the predictions. Novel compounds can be desirable because they likely have unrelated off-target effects, which can help establish the signaling and toxicity role of a receptor, as well as selectively activate downstream signaling, which is important for many receptors such as GPCRs.[38] Accordingly, we designed the ZINC interface to help users rapidly identify predictions with their desired precision. The user can control the MaxTc and pSEA limits, and each prediction can be compared with the most similar annotated actives (Figure 2), allowing side-by-side comparison. Each SEA prediction is accompanied by a pSEA to the set of actives and MaxTc to the nearest active. Clicking on the MaxTc value in the interface performs a real-time search for the most similar ligands annotated at 10 μM or better for that target.
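The two thresholds discussed above can be sketched as simple predicates. This is an illustration only: it assumes the baseline accepts a prediction when either pSEA ≥ 40 or MaxTc ≥ 0.40 holds (the "either ... or both" wording of the abstract), and that the stringent threshold requires both pSEA ≥ 80 and MaxTc ≥ 0.80. A missing pSEA is represented as None, and the function names are hypothetical.

```python
from typing import Optional


def passes_baseline(psea: Optional[float], maxtc: float) -> bool:
    """Baseline: high significance, high similarity, or both (assumed OR logic)."""
    significant = psea is not None and psea >= 40
    similar = maxtc >= 0.40
    return significant or similar


def passes_stringent(psea: Optional[float], maxtc: float) -> bool:
    """Stringent: both conditions must hold (assumed AND logic)."""
    return psea is not None and psea >= 80 and maxtc >= 0.80


# Values taken from Table 1 of this article.
print(passes_baseline(47, 0.46), passes_stringent(47, 0.46))    # bufetolol / ADRB2
print(passes_baseline(None, 0.47))                               # alminoprofen / PTGS2
print(passes_stringent(121, 0.75))                               # aranidipine / CACNA1C
```

Under these assumptions, aranidipine at CACNA1C clears the pSEA bound of the stringent threshold but not its MaxTc bound, which illustrates how the stricter setting trades recall for precision.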

Figure 2

Predictions supported by evidence. (A) Here, Bucumolol (ZINC100) is shown with a SEA prediction for ADRB2 at a pSEA = 33 and MaxTc to the nearest annotated compound of 0.44. The user may click on the “44” to go to the URL shown, which lists bucumolol’s closest-match known ADRB2 ligands in decreasing order of similarity (the first four are shown). The user may also click on “Run SEA” to rerun a SEA calculation on the molecule, providing comprehensive statistics.

To find predictions for a given target using ZINC15 (zinc15.docking.org), the user may select Genes from the Biological dropdown menu to browse a listing of all genes and predictions (Figure 3A). In this work, we use genes and their identifiers as convenient shorthand for their protein products, or molecular targets. To find a specific gene, the user may type part of the gene name in the top right search bar, here SLC6, and click the blue search button on the top right. To display predictions for this gene, the user clicks on the link in the predictions column, here for SLC6A1 (Figure 3B). The user may, for example, use the subset selector to specify strong predictions (which we chose to mean pSEA ≥ 80) and purchasability (Figure 3C). Some advanced features are currently only accessible by hand-editing the URL. Here, the user adds table.html?sort=-maxtc and &maxtc-between=40+45 to display the information in a tabular format, to sort by decreasing MaxTc, and to select only predictions between MaxTc of 40 and 45, respectively (Figure 3D). We plan to make these API-level features available via a point-and-click interface soon. Documentation is available via the help pages https://zinc15.docking.org/genes/help and https://zinc15.docking.org/predictions/help.
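The hand-edited URL described above can also be assembled programmatically. The sketch below only builds the URL string and does not contact the server; the path layout follows the URLs quoted in this article, while the function name and its defaults are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE = "https://zinc15.docking.org"


def gene_predictions_url(gene, subset="strong", sort="-maxtc", maxtc_range=(40, 45)):
    """Build a tabular predictions URL for one gene, filtered by a MaxTc window."""
    low, high = maxtc_range
    # urlencode turns the space into '+', giving maxtc-between=40+45.
    query = urlencode({"sort": sort, "maxtc-between": f"{low} {high}"})
    return f"{BASE}/genes/{gene}/predictions/subsets/{subset}/table.html?{query}"


print(gene_predictions_url("SLC6A1"))
```

Passing sort="-pvalue" instead yields the pSEA-sorted variant of the same query.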

Figure 3

Tools to display predictions for a gene and filter and sort them by MaxTc and pSEA. (A) Gene page showing predictions, with search bar to locate genes by name, top right. https://zinc15.docking.org/genes. (B) Gene listings for genes matching “SLC6” https://zinc15.docking.org/genes/search?q=SLC6. (C) Strongly predicted ligands for SLC6A11, showing the popup for subset selections https://zinc15.docking.org/genes/SLC6A11/predictions/subsets/strong. (D) Individual predictions, showing MaxTc and pSEA for each prediction, sorted by pSEA, with a MaxTc (novelty/similarity) limit specified https://zinc15.docking.org/genes/SLC6A1/predictions/subsets/strong/table.html?sort=-pvalue&maxtc-between=40+45.

Predictions are available for 2629 genes[51] (Figure 4). The number of predictions per gene varies substantially, reflecting both the diversity of annotated ligands for the target as well as how well these chemotypes are represented in current vendor catalogs. For example, natural products and their analogs are often difficult to access synthetically and are therefore generally sparsely represented. At the high end of predictions per gene, the eukaryotic GPCRs D4 dopamine receptor (DRD4), C−C chemokine receptor type 3 (CCR3), and the voltage gated ion channels KCNK3 and KCNK9 each have over 4.8 million purchasable predicted ligands. The number of strong predictions (pSEA ≥ 80) varies from over 500 000 for KCNK3 to as few as 9181 for DRD4. Filtering at MaxTc ≥ 0.60 instead, corresponding to a precision exceeding 0.334 using ECFP4 fingerprints,[44] the predictions for these four genes varied from as many as 25 728 for DRD4 to as few as 8912 for KCNK9. At the other extreme of predictions per gene, fungal laccase-2 precursor (LCC2), human C−C chemokine receptor type 6 (CCR6), voltage-gated sodium channel Nav1.9 (SCN11A), and fruit fly DNA topoisomerase 2 (TOP2) each had fewer than 50 predicted commercially available ligands.
The small number of predicted ligands can often be explained by a paucity of reference ligands; here, SCN11A and CCR6 have only 1 ligand each at 10 μM or better. Another reason for the lack of ligands is that the knowns are in an area of chemical space that is difficult to access synthetically, such as natural products for both SCN11A and CCR6.

Figure 4

Predictions available for 2629 genes. (A) The web interface allows genes and their predictions to be found by name or gene symbol: https://zinc15.docking.org/genes. Enter the gene name in the search field (1). Click on the predictions link (2) to display the predicted ligands. (B) Predictions and purchasable compounds for 2629 genes. The horizontal axis is genes, sorted by number of predictions. The vertical axis is number of compounds, log scale, labeled by exponent. Dark gray circles indicate the number of predicted purchasable compounds for a gene. Green triangles represent the number of purchasable annotated compounds for the same gene.

Access by Gene Groupings

In addition to individual genes, predictions may also be accessed by groups of genes. This could be helpful if the investigator is looking for new aminergic GPCR ligands or ligands for voltage gated ion channels or simply wishes to ensure balanced coverage of major target classes in a library. The interface offers convenient ways to access gene groupings based on a protein classification scheme inherited from ChEMBL. There are 15 major target classes (Figure 5A) further organized into 42 target subclasses (Figure 5B). Thus, there are 67 million predictions for membrane proteins, of which 1 million are strong (pSEA ≥ 80). Considered separately, there are 873 000 less chemically novel predictions having a Tanimoto coefficient ≥0.60 to an annotated active. At a higher level of granularity, there are 4.7 million predictions for epigenetic reader proteins, of which 2.4 million are strong predictions (pSEA ≥ 80) and 38 000 are highly similar (Tc ≥ 0.60). At the organism level (Figure 5C), 18 million ligands are predicted for specific bacterial targets, 1.0 million of which are stronger (pSEA ≥ 80) and 92 000 of which are highly similar (Tc ≥ 0.60). The user may select purchasable compounds based on this classification. These compounds will resemble precedented bacterial protein inhibitors far more strongly than compounds selected at random. Ligands predicted for specific bacterial targets are available to browse interactively at https://zinc15.docking.org/organisms/bacteria/genes/ or to download by gene at https://files.docking.org/predictions/current/. A plot of predictions per gene vs annotated ligands per gene shows a general trend toward more predicted ligands when more known ligands are available (see Supporting Information Figure S3).

Figure 5

Prediction counts and purchasable compounds. The gray line indicates the number of predictions, and the green line represents the number of annotated compounds. (A) By major target class. Data from https://zinc15.docking.org/majorclasses. (B) By target subclass. Most target predictions have a maximum Tanimoto coefficient in the 0.30−0.39 or 0.40−0.49 range. The percentage of predictions for each target subclass in each MaxTc bin is plotted in the inset to show the full spread of predictions across bins. (C) By Kingdom, called organism class in ChEMBL and ZINC. Data from https://zinc15.docking.org/organisms.

Benchmarks

To benchmark our approach, we predicted the targets of established drugs that nonetheless lack a protein binding affinity annotation in ChEMBL. We found hundreds of drugs, withdrawn drugs, and investigational compounds with target predictions that agreed with the literature. Fifty of these were selected and tabulated as an illustration of our predictions (Table 1). Thus, the beta blocker bufetolol[52] (ZINC101) is predicted to be a β2 adrenergic receptor ligand with pSEA = 47 and MaxTc = 0.46 and to be a β1 adrenergic receptor ligand with pSEA = 51 and MaxTc = 0.44. Aranidipine[53] (ZINC600803) is predicted for the calcium voltage-gated ion channel CACNA1C with pSEA = 121 and MaxTc = 0.75. Ancarolol (ZINC39) illustrates the discriminatory value of the SEA prediction, with pSEA = 59 and MaxTc = 0.43 for ADRB1: 255 656 purchasable ligands have higher MaxTc than ancarolol to this target while only 46 753 have a higher pSEA score.

Table 1

Drugs with No Binding Data in ChEMBL, Predicted by SEA or MaxTc, Corroborated by the Literature

drug^(ref)	ZINC ID	target	pSEA	MaxTc
Acemetacin[63]	601272	PTGS2	40	0.76
Afeletecan[64]	150339966	TOP1	69	0.41
Alclometasone[65]	4172330	NR3C1	15	0.58
Alminoprofen[66]	22	PTGS2		0.47
Amisulpride[67]	1846088	DRD3	22	0.66
Ancarolol[68]	39	ADRB2	42	0.44
		ADRB1	59	0.43
		ADRB3	29	0.44
Aranidipine[53]	600803	CACNA1C	121	0.75
Aranidipine[53]	600803	CACNA1D	132	0.51
Azasetron[69]	4132	HTR3A	25	0.61
Azelnidipine[70]	38141706	CACNA1C	91	0.56
Azelnidipine[70]	38141706	CACNA1D	124	0.57
Azetirelin[71]	3804057	TRHR	95	0.59
Azetirelin[71]	3804057	TRHR2		0.61
Besifloxacin[72]	3787097	PARC		0.46
Bevantolol[73]	1542891	ADRB1	89	0.51
		ADRB2	73	0.58
		ADRB3	73	0.53
Bilastine[74]	3822702	HRH1	48	0.51
Binospirone[75]	1999423	HTR1A		0.48
Bufetolol[52]	101	ADRB1	51	0.44
Bufetolol[52]	101	ADRB2	47	0.46
Bunazosin[76]	601249	ADRA1B	52	0.61
Bupranolol[77]	106	ADRB2	45	0.44
Bupranolol[77]	106	ADRB1	19	0.45
Butofilolol[78]	112	ADRB1	50	0.40
Butofilolol[78]	112	ADRB2	34	0.46
Calcifediol[79]	12484926	VDRA		0.79
Calcifediol[79]	12484926	GC		0.79
Camazepam[80]	2008504	GABARA5	25	0.53
Camazepam[80]	2008504	GABARA2	15	0.53
Cellcept[81]	21297660	IMPDH1		0.70
Cellcept[81]	21297660	IMPDH2		0.70
Ciprokiren[82]	8214528	REN	178	0.68
Dasotraline[83]	2510873	SLC6A3	25	0.63
Dasotraline[83]	2510873	SLC6A2	29	0.63
Demecarium[84]	3875376	ACHE		0.71
Dienesterol[85]	4742540	ESR1	26	0.46
Dienesterol[85]	4742540	ESR2	15	0.46
Edaglitazone[86]	1483899	PPARG	83	0.66
Edaglitazone[86]	1483899	PPARA	83	0.65
Efonidipine[87]	38139973	CACNA1C	81	0.51
Efonidipine[87]	38139973	CACNA1D	118	0.51
Eptazocine[88]	1846076	OPRD1	30	0.42
		OPRK1	30	0.46
		OPRM1	32	0.46
Etanterol[89]	263	ADRB1	23	0.47
Etanterol[89]	263	ADRB2	47	0.40
Ethylmorphine[90]	3629718	OPRD1	28	0.62
		OPRK1	24	0.62
		OPRM1	32	0.75
		OPRL1		0.57
Etomoxir[91]	1851171	CPT1		0.47
Fiduxosin[92]	29747110	ADRA1A	30	0.53
		ADRA1B	45	0.53
		ADRA1D	38	0.46
Floxacillin[93]	4102187	BLAACC-4		0.80
Flurazepam[94]	537752	GABARA5	28	0.50
Flurazepam[94]	537752	GABARA1	17	0.49
Granisetron[95]	347	HTR3A	25	0.75
Halobetasol[96]	4214603	NR3C2	20	0.60
Hexoprenaline[97]	3872806	ADRB2	77	0.52
Ketobemidone[98]	1600	OPRD1	49	0.46
		OPRK1	45	0.48
		OPRM1	44	0.55
Lercanidipine[99]	19685790	CACNA1B		0.49
		CACNA1C	107	0.70
		CACNA1D	146	0.63
Lexacalcitol[100]	4474609	VDR	144	0.62
Meptazinol[101]	854	OPRD1	44	0.48
		OPRK1	39	0.60
		OPRM1	38	0.55
Metipranolol[102]	494	ADRB1	27	0.45
Metipranolol[102]	494	ADRB2	31	0.52
Ormeloxifene[103]	5104028	ESR1	86	0.51
Ormeloxifene[103]	5104028	ESR2	58	0.44
Paroxypropione[104]	1890	ESR1	38	0.58
Paroxypropione[104]	1890	ESR2	30	0.58
Pipenzolate[105]	601314	CHRM1		0.47
		CHRM2	30	0.43
		CHRM3	57	0.53
		CHRM4	35	0.53
		CHRM5	40	0.53
Pozanicline[106]	6562	CHRNA2	33	0.57
		CHRNA4		0.57
		CHRNA10	53	0.55
Propiverine[107]	1530934	CHRM2	24	0.42
Propiverine[107]	1530934	CHRM3	50	0.57
Revatropate[108]	4214265	CHRM1	55	0.53
		CHRM2	33	0.53
		CHRM3	59	0.57
		GPM3		0.57
Temazepam[109]	740	GABA5	28	0.59
Udenafil[110]	13916432	PDE5A	74	0.61
Unoprostone[111]	8214703	PTGER1	45	0.57
		PTGER2	30	0.40
		PTGER3		0.57
		PTGDR	52	0.40
		PTGFR	85	0.51
Valategrast[112]	72190226	ITGA4	60	0.32
Verubulin[113]	35978229	TUBB3	62	0.51

Among the 535 million predictions of protein−ligand affinity we expect numerous false positives and false negatives. These errors stem from three major classes of problem. (1) Issues with target annotation: annotated ligands may not be representative for a gene, such as curcumin (ZINC100067274), which is annotated for 32 genes and is probably artifactual for many of them.[54] Annotated ligands may also be mis-annotations in ChEMBL, leading to false positives. For instance, nicotinamide (CHEMBL1140) is annotated for fatty-acid amide hydrolase 1 (FAAH), because it shares an abbreviation (NAM) with the actual ligand, N-arachidonylmaleimide.[55] (2) Errors with the SEA method: We use ECFP4 fingerprints, which have little specificity for certain classes of molecules, such as peptides and sterols, which share many common features and thus are not well discriminated using this fingerprint. SEA also has high variance for small ligand sets and low sensitivity for large, diverse ligand sets. For instance, SEA fails to predict the well-known antihistamine drugs chlorcyclizine and propiomazine for histamine H1 receptor (HRH1), despite their having Tc values of 0.79 and 0.69, respectively, to the most similar HRH1 ligands. The pSEA values of 11 in each case have been diluted by the 9000 diverse ligands annotated to this target. A remedy might be to split targets with a large number of ligands, perhaps by chemical clusters, mode of action, or binding site, if known. Note that Naïve Bayesian classifiers can be trained to correctly predict these activities, as can be seen on ChEMBL’s ligand detail pages for these compounds. (3) No explicit model of promiscuity for SEA: We have made some progress here by stringent filtering of ligands we suspect are promiscuous (both PAINS and aggregator-like), but we fail to handle frequent hitters such as staurosporine (ZINC3814434, hits 365 targets in ChEMBL) and its ilk.
Our current approach also performs poorly on sigma nonopioid intracellular receptor 1 (SIGMAR1) and cytochrome P450 3A4 (CYP3A4), because the ligands annotated to them are highly diverse. To remedy this problem for targets with many ligands, we could cluster by chemotype.

Genes Lacking Commercially Available Ligands

When a target has purchasable ligands, they can be used to rapidly probe its biological function without requiring synthetic chemistry expertise. Yet there are 69 targets with 20 or more annotated ligands in ChEMBL where none is readily purchasable (Table 2). To fill these holes in “target space”, we have identified purchasable compounds that are predicted to be active. In one example, voltage dependent calcium channel subunit alpha-2/delta-2 (CACNA2D2) has 26 ligands in ChEMBL, such as CHEMBL1801206 with a pKi of 7.7, none of which is for sale. The compound ZINC36664273, however, is sold by Specs as AO-476/43421055 and has a pSEA of 132 and a MaxTc of 0.72. Looking at these compounds side by side (Table 2) and without detailed experimental knowledge of this target, the Specs compound may be reasonable to try against this target. If successful, such compounds could become a purchasable control for these targets.

Table 2

Selected Plausible Predictions of Purchasable Compounds for Genes with No Purchasable Ligands in ChEMBL

Dark Chemical Matter

Intriguingly, 229 million purchasable compounds have no prediction at all by either pSEA ≥ 40 or Tanimoto similarity Tc ≥ 0.40. Some of these will have just missed our cutoffs, wherever the cutoffs may be drawn. A few will be known actives, or analogs of actives, that simply lack a direct binding annotation in ChEMBL. Still, these compounds are generally interesting because they do not much resemble any direct binding actives in ChEMBL. Should they be found to be active in an assay, they are likely to have fewer off-targets, at least against well-studied targets, and are less likely to be encumbered by patents. A substantial body of literature explores the strengths and pitfalls of dark chemical matter.[56−59] To illustrate what a user of this resource can expect to find in this underexploited yet commercially available space, we have highlighted ten compounds (Table 3). For each commercially available molecule, we show the nearest precedented bioactive from public sources available to ZINC, which may also include compounds not in ChEMBL. Dark chemical matter[56−59] may be browsed online at zinc15.docking.org/substances/having/no-predictions and downloaded at scale by physical property tranches (https://files.docking.org/dark-matter/current), by vendor catalogs (e.g., for ChemBridge at https://files.docking.org/catalogs/50/chbr/chbr.predict.txt.gz) and by the genes they are predicted to bind (https://files.docking.org/genes//.predictions.txt.gz).

Table 3

Compounds with No Predictions (“Chemical Dark Matter”)^a

^a To browse, use: https://zinc15.docking.org/substances/having/no-predictions. To download: https://files.docking.org/special/dark-matter. To browse annotated compounds similar to any compound (e.g., at least 0.30 similar to ZINC compound 14), use https://zinc15.docking.org/substances/having/genes?ecfp4_fp-tanimoto-30=14 or, for 0.30 similarity to a SMILES string, https://zinc15.docking.org/substances/having/genes?ecfp4_fp-tanimoto-30=c1ccccc1NOCOCN. Also try https://zinc15.docking.org/substances/subsets/in-vitro?ecfp4_fp-tanimoto-30={zincorsmiles}. For similarity to natural products, try https://zinc15.docking.org/substances/subsets/biogenic?ecfp4_fp-tanimoto-30=. Please note: these queries are efficient if there are few matches but will time out if too many hits are found. As a general rule, use tanimoto-50 first, which will be fast, and decrease progressively to −40 and then −30 only if no matches are found. This calculation is intensive, and to keep this service freely available we may limit usage if too many queries return multiple thousands of hits.
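The "start at tanimoto-50, then relax" advice can be sketched as a small loop. This is an illustration under stated assumptions: the parameter names ecfp4_fp-tanimoto-50 and ecfp4_fp-tanimoto-40 are inferred from the tanimoto-30 URLs above, the query string is inserted unencoded as in those URLs, and fetch is a caller-supplied function (hypothetical here) that retrieves a URL and returns a list of hits.

```python
from typing import Callable, List, Sequence, Tuple

BASE = "https://zinc15.docking.org/substances/having/genes"


def similarity_urls(query: str, cutoffs: Sequence[int] = (50, 40, 30)) -> List[str]:
    """One search URL per similarity cutoff, strictest first."""
    return [f"{BASE}?ecfp4_fp-tanimoto-{cutoff}={query}" for cutoff in cutoffs]


def search_progressively(
    query: str,
    fetch: Callable[[str], list],
    cutoffs: Sequence[int] = (50, 40, 30),
) -> Tuple[int, list]:
    """Try the strictest cutoff first and relax only while nothing is found."""
    for cutoff, url in zip(cutoffs, similarity_urls(query, cutoffs)):
        hits = fetch(url)
        if hits:
            return cutoff, hits
    return cutoffs[-1], []


for url in similarity_urls("14"):
    print(url)
```

Keeping fetch as a parameter leaves the choice of HTTP client and response parsing to the caller, and stopping at the first non-empty result avoids issuing the slower, lower-cutoff queries unnecessarily.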

Use Case One

The user is interested in a well-studied target such as the serotonin 2A receptor (HTR2A) and seeks compounds to purchase that are likely to work but have not been reported active in ChEMBL21. The user first checks how many ligands are annotated active at 10 μM or better (5031, interactively at https://zinc15.docking.org/genes/HTR2A/substances or statically downloaded at https://files.docking.org/genes/current/HTR2A/HTR2A.smi). The user then queries how many commercially available ligands have SEA predictions at an exceptionally strong statistical significance, with pSEA ≥ 80 (30 952 at https://zinc15.docking.org/genes/HTR2A/predictions/subsets/strong+purchasable). One example is ZINC462039162, available from Enamine (catalog number Z1269906839), with pSEA = 82 and MaxTc = 0.63 (https://zinc15.docking.org/substances/ZINC000462039162/predictions/table.html). Millions of other commercially available molecules can be obtained in a similar way. All predictions may be downloaded at once from https://files.docking.org/genes/current/HTR2A/HTR2A.predictions.txt.gz, from which compounds may be selected.

Use Case Two

The user wishes to obtain a screening library for projects involving several voltage-gated ion channels. The user wishes to find purchasable compounds that do not seem too similar, yet are more likely to be ligands than purely random compounds, i.e., having a high MaxTc between 0.65 and 0.70, corresponding to an expected precision of 0.35−0.40. The library should be downloaded in 2D for chemoinformatics and 3D for docking. In ZINC, there are 14 849 already annotated ligands for any such channel in ChEMBL21 at 10 μM or better (https://zinc15.docking.org/subclasses/vgic/substances). Of these, 1108 (7.5%) are purchasable and may be a good starting point for the library. A further 21 242 purchasable predicted ligands are also available, such as ZINC629100 (https://zinc15.docking.org/substances/ZINC000000629100/predictions/table.html), which is Tc 0.69 to the nearest annotated active CHEMBL1097858, active at pKi of 7.7. To obtain the first 1000 ZINC codes for these molecules, the user accesses: https://zinc15.docking.org/subclasses/vgic/predictions/subsets/purchasable.txt?maxtc-between=65+70&count=1000. To download 3D models of these compounds, please see Obtaining 3D Models, below. A second approach to download predicted compounds for voltage gated ion channels would be to first obtain the names of all the genes: https://zinc15.docking.org/subclasses/vgic/genes.txt:name. Then, the user would use this list to download the static predictions by gene. For example, for the sodium channel protein type 5 subunit alpha (SCN5A), the predictions are in https://files.docking.org/genes/SCN5A/SCN5A.predictions.txt.gz.
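The second approach, turning a list of gene names into per-gene download locations, might look like the sketch below. It assumes the gene list is plain text with one name per line and that every gene follows the same path pattern as the SCN5A example above (other URLs in this article include a current/ path segment, so the pattern may need adjusting). The function names are illustrative, and nothing is downloaded.

```python
from typing import List

FILES_BASE = "https://files.docking.org/genes"


def parse_gene_names(text: str) -> List[str]:
    """Extract non-empty gene names, one per line, from a downloaded gene list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def prediction_file_url(gene: str) -> str:
    """Static predictions file for one gene, following the SCN5A example."""
    return f"{FILES_BASE}/{gene}/{gene}.predictions.txt.gz"


# Gene names mentioned in this article, standing in for the downloaded list.
gene_list_text = "SCN5A\nSCN11A\nKCNK3\n"
for gene in parse_gene_names(gene_list_text):
    print(prediction_file_url(gene))
```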

Use Case Three

The user would like to know all predictions for a particular vendor catalog. Vendors may be interested to know possible targets of their compounds for marketing purposes. Vendors may also wish to know which of their make-on-demand compounds might be prioritized for synthesis based on possible activity. Academic centers that screen vendor libraries may be interested in individual vendors because they have negotiated special pricing, or because the vendor makes plates available at a discount to facilitate the mechanics of screening. We have precomputed these searches to save time. To access them, the user would complete the following steps: (1) Browse to https://files.docking.org/catalogs to select the catalog of interest. (2) Download the file of predictions. For instance, for ChemBridge, the code is chbr and the URL is https://files.docking.org/catalogs/50/chbr/chbr.predict.txt.gz. Each row describes one molecule and contains the vendor code, ZINC ID, InChIKey, predicted gene, MaxTc, and pSEA. (3) Break the downloaded file into subsets using Unix command-line tools to filter by MaxTc, pSEA, and predicted gene. To download these in 3D for docking, please see Obtaining 3D Models, below.
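
As an alternative to Unix command-line tools, the filtering step can be sketched in Python. This is an illustration under stated assumptions: the fields are taken to be whitespace-separated and in the order listed above, and the MaxTc scale used in the file (fraction or percentage) is not specified in the text, so thresholds must be given in the file's own units. The sample rows are invented placeholders, not real predictions.

```python
import gzip  # only needed when reading the real .gz file, see the comment below


def filter_predictions(lines, gene=None, min_maxtc=None, min_psea=None):
    """Yield parsed rows that pass the optional gene, MaxTc, and pSEA filters.

    Assumed row layout: vendor code, ZINC ID, InChIKey, gene, MaxTc, pSEA.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        vendor_code, zinc_id, inchikey, row_gene, maxtc, psea = fields
        try:
            maxtc, psea = float(maxtc), float(psea)
        except ValueError:
            continue  # skip a header row, if the file has one
        if gene is not None and row_gene != gene:
            continue
        if min_maxtc is not None and maxtc < min_maxtc:
            continue
        if min_psea is not None and psea < min_psea:
            continue
        yield vendor_code, zinc_id, inchikey, row_gene, maxtc, psea


# Invented placeholder rows standing in for a downloaded catalog file.
sample_rows = [
    "VENDOR-0001 ZINC000000000001 AAAAAAAAAAAAAA-AAAAAAAAAA-N HTR2A 0.63 82",
    "VENDOR-0002 ZINC000000000002 BBBBBBBBBBBBBB-BBBBBBBBBB-N HTR2A 0.41 35",
    "VENDOR-0003 ZINC000000000003 CCCCCCCCCCCCCC-CCCCCCCCCC-N SCN5A 0.69 55",
]

strong_htr2a = list(filter_predictions(sample_rows, gene="HTR2A", min_psea=80))
similar_any = list(filter_predictions(sample_rows, min_maxtc=0.6))
print(len(strong_htr2a), len(similar_any))

# With a real download, the same function can read the compressed file:
#   with gzip.open("chbr.predict.txt.gz", "rt") as handle:
#       hits = list(filter_predictions(handle, gene="HTR2A", min_psea=80))
```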

Use Case Four

The user wishes to download dark chemical matter screening libraries in 2D or 3D formats. To do so, the user browses to https://files.docking.org/dark-matter. The compounds have been binned into tranches by physical property using our standard scheme (http://wiki.docking.org/index.php/Physical_property_space). The 2D files are available as compressed text files organized by purchasability. Each row contains one molecule with its SMILES, ZINC ID, physical property tranche, purchasability, and reactivity. The 3D files will likewise be prepared in future but are meanwhile available as described in Obtaining 3D Models, below.

Obtaining 3D Models

To download 3D models for a set of molecules in bulk for one of the above use cases, here is a general approach that will work for any arbitrary set of ZINC IDs: (1) Obtain the codes of the molecules to download using the previous use cases or otherwise and store the codes in zinc-codes.txt. (2) Select mol2, db, or db2 file formats. mol2 may be converted to other formats as required. The latter two are used by the UCSF DOCK 3.x programs only. (3) Download the script getfiles.csh from https://files.docking.org/catalogs/getfiles.csh. (4) Edit the file by hand following the instructions within. (5) Run the script, with the list of ZINC codes in the same directory. The 3D files will be downloaded. Please note that 3D models are currently available for about 120 million of the 400 million compounds in ZINC. We are continually building and rebuilding them, prioritizing the popular lead-like and fragment-like areas best suited to docking. If a 3D model is not available, the molecule detail page contains a “Request Generation” button in the 3D representations section. If a 3D model does not exist, it is either because it fails to build or because it is still on our action list.
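
Preparing zinc-codes.txt can be sketched as follows. ZINC codes appear in this work both in short form (ZINC629100) and zero-padded to 12 digits (ZINC000000629100); the sketch normalizes to the padded form and removes duplicates. Whether getfiles.csh expects the padded or the short form is not stated here, so this choice is an assumption and the instructions inside the script take precedence. The function names are ours.

```python
import re
import tempfile
from pathlib import Path


def normalize_zinc_id(code):
    """Return a ZINC code zero-padded to 12 digits, e.g. ZINC000000629100."""
    match = re.fullmatch(r"(?:ZINC)?0*(\d+)", code.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a ZINC code: {code!r}")
    return "ZINC" + match.group(1).zfill(12)


def write_zinc_codes(codes, path):
    """Write unique, normalized codes to path, one per line, keeping input order."""
    unique = list(dict.fromkeys(normalize_zinc_id(c) for c in codes))
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


# Demonstration in a temporary directory; the last two inputs are the same molecule.
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "zinc-codes.txt"
    written = write_zinc_codes(
        ["ZINC462039162", "ZINC629100", "ZINC000000629100"], target
    )
    print(target.read_text(), end="")
```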

Discussion

Four major results emerge in this work. First, using ZINC and ChEMBL, we predict molecular target activities for 171 million commercially available compounds at 2629 targets and store them in an accessible database. Second, we create an interface to search, access, and download the predictions (https://zinc15.docking.org and https://files.docking.org). Predictions can be accessed individually or downloaded in bulk, and are available in a range of formats ready for both docking and chemoinformatics, or for purchase. Third, to demonstrate the utility of these predictions, we perform a retrospective 5-fold cross-validation of the ChEMBL bioactivity data set. Further, we identify likely targets of drugs known in the literature where direct binding annotations are not available in ChEMBL. Finally, this new tool allows us to quantify predicted target biases of purchasable chemical space. Target bias predicted by this model is substantial: some genes are represented by millions of purchasable compounds, whereas others have very few. Nearly 60% of purchasable compounds in ZINC have no prediction at all, allowing us to offer purchasable “dark chemical matter”. We take up each of these results in turn. We predict targets for over 40% of the 400 million compounds currently for sale in ZINC. The number is admittedly arbitrary, as we were obliged to choose pSEA and Tanimoto similarity cutoffs. Knowing that this approach would produce false positives and false negatives, we attempted to strike a useful balance and to equip the user to apply further constraints. Many compounds with MaxTc as low as 0.40 to the nearest active may not bind the predicted target; previous work suggests 18% precision might be a good estimate,[44] which is consistent with the results we found in Figure (blue circle). Likewise, those with a pSEA near our chosen threshold of pSEA = 40 may not be active against the predicted target.
Should such chemically novel predictions be confirmed experimentally, they may represent new starting points for optimization and could lead to new biology. If the user wants higher-confidence hits, more stringent cutoffs in pSEA or MaxTc are easily applied. We refer the reader to the set of thresholds examined in our cross-validation of the ChEMBL bioactivity data set (Supporting Information Figure S1) for guidance in choosing pSEA and MaxTc values to optimize the desired output. For the highest rates of precision at an acceptable recall, we recommend threshold values of pSEA ≥ 80 and MaxTc ≥ 0.80 (Figure , pink circle), noting this may reduce the number of novel compound-target associations that pass the cutoff. Users wishing to buy a compound that works might consider only the most similar compounds, those having a high Tc to a precedented bioactive. Users seeking chemical novelty against a target, for whom testing 10 or even 50 more novel compounds to find new chemical matter is acceptable, may seek less similar compounds. Users of virtual screening methods such as docking may want particularly novel (low MaxTc) compounds, because their screening method makes an independent assessment of each prediction. Some will prefer to pursue the most novel, and potentially most interesting, compounds: the purchasable dark chemical matter, which does not seem similar to any of the annotated compounds used to make these predictions. Whatever the appetite for risk, investigators are empowered by these tools to select predictions that are right for their project. Interfacing the prediction database through ZINC allows predictions to be searched, grouped, filtered, compared, and downloaded using the extensive ZINC machinery. Thus 3D models of predicted compounds may be accessed for molecular docking screens, while SMILES strings or molecular properties may be downloaded for ligand-based methods.
Predicted compounds for any of 2629 genes may be accessed and downloaded in any of eight formats. Results may be filtered by prediction statistics (pSEA, MaxTc), molecular properties (e.g., molecular weight, calculated logP, polar surface area, fraction sp3) and purchasability (in stock, make-on-demand, or by vendor). Both 2D and 3D results can be organized by gene (e.g., ADRB2, SRC), minor class (e.g., GPCR Class B, voltage-gated ion channel), major class (e.g., transcription factor or membrane protein), kingdom (bacterial, eukaryotic, viral), vendor, and physical property tranche. Attributes of predictions may be downloaded in tabular form for analysis. A REST API, exemplified in this work, described previously[9] and documented online,[60] allows automated queries and machine-readable results, so that this database may be incorporated into third-party software applications. We examined drugs and investigational compounds without an established molecular target annotation in ChEMBL to assess the relevance of the predictions. The 50 we highlighted exemplify typical results that can be expected using our approach for the millions of molecules that have never been assayed (Table ). Whereas an exhaustive analysis is impractical, this result supports the view that our predictions are often consistent with experimentally observed binding. A global picture of target bias in commercially available libraries emerges. Of the 535 million compound-target predictions, over 500 000 predictions on 400 000 compounds have a MaxTc better than 0.60 (ECFP4) to a ligand annotated for that target, a level of similarity that suggests 35% precision.[44] A further 1.6 million predictions on 1.4 million compounds with MaxTc between 0.50 and 0.59 are also strong candidates for experimental testing. Many of these two million compounds could have been predicted by pairwise Tanimoto similarity alone, without the help of SEA.
The pSEA adds most value below MaxTc 0.50, where it provides a global similarity measure to the set of annotated ligands as a group instead of a single pairwise one. This becomes even more acute below MaxTc of 0.40, where we only retain predictions with pSEA ≥ 40 as the Tanimoto coefficient alone becomes too untrustworthy, with precision falling rapidly below 10%. Our analysis provides additional resources. We have predicted compounds for 69 targets[61] for which none of the 20 or more actives is commercially available (Table ). If confirmed experimentally, these genes could now be represented in screening panels of commercially available compounds, and these new ligands used as controls or perhaps even starting points for design. For each of 2629 genes, a range of commercially available compounds is now available, from high-confidence compounds with high MaxTc to more novel yet intriguing ones at lower MaxTc. For the most studied targets, there is a deep bench of predictions running into the millions of compounds each. Massive biases for some targets, such as the dopamine D2 (DRD2) and beta-2 adrenergic (ADRB2) receptors, echo our earlier finding[62] that commercial libraries are heavily biased toward long-studied, important biological targets. Correspondingly, less-well-studied targets with few ligands often have sparse representation in commercial libraries, which can occur when the known actives are natural products or their derivatives. We have also assembled a database of “dark chemical matter”, 229 million purchasable compounds that received no target prediction and that generally do not resemble known bioactives, which is available from our website in 2D and 3D formats. If these compounds were active in a screen, they would likely represent new starting points for optimization. Our approach has other liabilities. Our cutoffs in MaxTc and pSEA inevitably exclude sensible predictions.
Some classes of compounds such as sterols, peptides, and nucleotides suffer from higher mis-prediction rates, a subject of continuing research. pKa and explicit charge are poorly treated in our current protocol based on stereochemistry-naïve ECFP4 fingerprints, making amide nitrogens and basic amines too much alike, for instance, leading to some obviously wrong predictions. Massive turnover in the chemical marketplace means stored predictions may lag the appearance of new compounds in ZINC. ChEMBL contains artifacts and errors, which this approach can magnify. The SEA and MaxTc approaches quantify whole-molecule similarities and are thereby naïve of critical chemical moieties (often called warheads). Notwithstanding these limitations, our database of predicted biological activities for purchasable chemical space is a pragmatic tool that should be useful to a broad audience. It affords both a retail view—buy this compound for this target—as well as a wholesale one—this target is well represented, and here are some compounds for it. Our predictions can be rapidly tested because the compounds are purchasable. We intend to continue to update the database as purchasable chemical space evolves and ChEMBL is enhanced. This database is provided in the hope that it will be useful, but you must use it at your own risk.

Methods

Library Preparation

We used ChEMBL21 compounds annotated for targets at 10 μM or better and grouped by UniProt gene symbol across eukaryotes, as previously described in ZINC15.[9] Thus in this scheme, DRD2_HUMAN, DRD2_RAT, and DRD2_MOUSE are all grouped into a single gene annotation DRD2, and predictions are made against the unified collection for the gene and not the individual orthologs. In situations where the target is composed of several gene products, as in some ion channels for instance, we used the ChEMBL name. When no gene has been formally assigned by UniProt, we use the UniProt accession code itself as the gene name, as in ZINC15.
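
The ortholog grouping can be illustrated with a short sketch. It assumes the gene symbol is the part of the entry name before the underscore, as in the DRD2 example; the fallbacks described above (the ChEMBL name for multi-gene targets, the accession code when no gene is assigned) are not handled, and the compound identifiers are invented placeholders.

```python
from collections import defaultdict


def gene_symbol(entry_name):
    """Strip the species suffix: DRD2_HUMAN -> DRD2 (assumed naming pattern)."""
    return entry_name.split("_", 1)[0]


def group_by_gene(annotations):
    """Pool (entry name, compound) pairs into one ligand set per gene."""
    groups = defaultdict(set)
    for entry_name, compound in annotations:
        groups[gene_symbol(entry_name)].add(compound)
    return dict(groups)


# Placeholder compound identifiers, for illustration only.
annotations = [
    ("DRD2_HUMAN", "cmpd-1"),
    ("DRD2_RAT", "cmpd-2"),
    ("DRD2_MOUSE", "cmpd-1"),
    ("ADRB2_HUMAN", "cmpd-3"),
]
ligands = group_by_gene(annotations)
print(sorted(ligands["DRD2"]))
```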

SEA Reference Library Construction

We grouped ligands by affinity. We computed an affinity bin as the negative log of the molar affinity, which is variously expressed as Ki, IC50, and EC50 among others in ChEMBL21 and which we refer to as pKi in this work for simplicity. Thus in this scheme, bin 6 contains all compounds with 1 μM affinity or better. Lower affinity bins were inclusive of compounds from all higher affinity bins. We built three SEA libraries as follows. In the first library, we only proceeded if there were at least five distinct compounds active against a single gene, and we only accepted activities of 1 μM or better. We found 1382 such genes, which we defined as being well described by their ligands. In the second library, we only predicted for those single-gene targets that did not qualify for the first pass, accepting activities as weak as 10 μM and as few as one good ligand. We found 1347 of these less-well-described genes. The third library was an attempt to overcome a statistical weakness, which diluted the signal of genes having many diverse ligands. We clustered ligands to describe individual chemotypes of 302 genes having 300 ligands or more each. For each library we computed a statistical background for SEA based on the 410 624 annotated compounds. For each prediction we computed the pSEA, based on an extreme value distribution, and the maximum Tanimoto similarity of the predicted compound to the annotated compounds (MaxTc). Throughout, we suppressed from the libraries compounds with PAINS patterns or with a similarity of 0.70 (ECFP4) or greater to a precedented aggregator, having an affinity worse than 100 nM.[48] This was likely too conservative, but earlier, more permissive attempts at this library often suffered from excessive erroneous predictions, likely owing to these fraught compounds.
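
The binning and the first two library rules can be sketched as follows. This is an illustration of the rules as stated, not the code used to build the libraries: the range of bins (5 to 10), the function names, and the small tolerance for floating-point rounding are assumptions, and the third, clustered library is not covered.

```python
import math

EPS = 1e-9  # tolerance so that exactly 1 uM is counted as pKi >= 6


def pki_from_nanomolar(value_nm):
    """Negative log10 of the molar affinity, for a value given in nM."""
    return 9.0 - math.log10(value_nm)


def cumulative_bins(pki, lowest=5, highest=10):
    """Bins a compound belongs to: bin b holds every compound with pKi >= b."""
    return [b for b in range(lowest, highest + 1) if pki >= b - EPS]


def assign_library(pkis):
    """Classify a gene from the pKi values of its distinct compounds."""
    at_1um = sum(1 for p in pkis if p >= 6 - EPS)
    if at_1um >= 5:
        return "well-described"        # first library: >= 5 compounds at 1 uM or better
    at_10um = sum(1 for p in pkis if p >= 5 - EPS)
    if at_10um >= 1:
        return "less-well-described"   # second library: >= 1 compound at 10 uM or better
    return None                        # no library


print(cumulative_bins(pki_from_nanomolar(10)))    # a 10 nM compound
print(cumulative_bins(pki_from_nanomolar(1000)))  # a 1 uM compound
print(assign_library([6.0, 6.5, 7.2, 8.0, 9.1]))
print(assign_library([6.1, 6.2, 5.5]))
```

Because the bins are cumulative, a 10 nM compound is counted in bins 5 through 8, whereas a 1 μM compound appears only in bins 5 and 6.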

Database Loading

Predictions were loaded into ZINC. To minimize ligands whose charge differed sharply from precedent, we computed the mean and the standard deviation of the average microspecies charge using ChemAxon’s CXCALC program for each gene. When loading each prediction, if the charge of a 3D representation at pH 7.4 (reference model) was available, we suppressed loading if the charge on the molecule fell outside 1.5 standard deviations from the mean charge for ligands annotated to that gene. This remains an area of ongoing research. The result was to suppress predictions that we likely would have thrown out on inspection, in a scalable if incomplete and imperfect way.
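
A minimal sketch of this charge filter follows. The charges themselves would come from an external calculation (CXCALC in this work) and are supplied here as plain numbers; the use of the sample standard deviation, the behavior when fewer than two ligand charges exist, and the example values are all assumptions.

```python
from statistics import mean, stdev


def charge_window(ligand_charges, n_sd=1.5):
    """Accepted charge range: mean +/- n_sd standard deviations for one gene."""
    mu = mean(ligand_charges)
    sd = stdev(ligand_charges)  # sample standard deviation (assumed)
    return mu - n_sd * sd, mu + n_sd * sd


def keep_prediction(candidate_charge, ligand_charges, n_sd=1.5):
    """Return True if the prediction should be loaded."""
    if candidate_charge is None:
        return True  # no 3D reference-model charge available: filter not applied
    if len(ligand_charges) < 2:
        return True  # too few ligands to estimate a spread (assumed behavior)
    low, high = charge_window(ligand_charges, n_sd)
    return low <= candidate_charge <= high


# Invented average charges for ligands of a gene that mostly binds cations.
gene_ligand_charges = [1.0, 1.0, 0.9, 1.0, 0.8]
print(keep_prediction(1.0, gene_ligand_charges))   # cationic candidate
print(keep_prediction(-1.0, gene_ligand_charges))  # anionic candidate
```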

ChEMBL Cross-Validation

We evaluated the predictive performance of SEA+TC using ChEMBL’s bioactivity data set (version 21). Receiver operating characteristic curves were generated from independent 5-fold cross-validation runs for each method examined (SEA, SEA+TC, NBC). For SEA and NBC cross-validation sets, each point on the curve represents the average true-positive rate (TPR) and false-positive rate (FPR) from all 5 folds. TPRs and FPRs along the curve were determined by stepping a decision threshold across the range of possible SEA p-values (0.0−1.0), for all predicted compound-target interactions. To examine the sensitivity of these results to how well the target is described by ligands, we ran the analysis using targets with a minimum of 5 ligands and also with 50 ligands. For SEA+TC cross-validation sets, TPRs and FPRs along the curve were determined by two separate decision thresholds; one for the SEA p-value and another for the maximum Tanimoto coefficient (MaxTc). As ROC curves evaluate a binary classifier using a single discrimination threshold, assessing performance by simultaneously stepping across both metrics was not ideal. To account for this, we generated ROC curves by stepping across all possible values of MaxTc, while holding the pSEA decision threshold constant (Figure ). Predicted compound−target associations are therefore positive if their pSEA or MaxTc passes either of the respective cutoffs. A consequence of this bivariate thresholding is that the static pSEA threshold prevents the TPR and FPR from ever reaching zero. To highlight this, the distance between a fully stratified classifier (TPR = 0; FPR = 0) and the minimum point at which both decision thresholds begin to affect performance is shown in dashed lines (Figure ). Performance metrics for a range of pSEA decision thresholds are shown in Supporting Information Figure S1A and B. 
Complementary curves stepping across pSEA while holding a separate MaxTc decision threshold constant are shown in Supporting Information Figure S1C and D.
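
The bivariate thresholding can be made concrete with a small sketch. It treats a compound-target pair as predicted positive if either cutoff is passed, steps the MaxTc cutoff while holding the pSEA cutoff fixed, and shows that the curve stops short of the origin. The data are invented, the function name is ours, and the sketch covers a single fold rather than the 5-fold procedure described above.

```python
def roc_points(predictions, psea_cutoff, maxtc_steps):
    """Return (FPR, TPR) for each MaxTc cutoff at a fixed pSEA cutoff.

    predictions: list of (pSEA, MaxTc, is_true_active) tuples.
    """
    n_pos = sum(1 for _, _, truth in predictions if truth)
    n_neg = len(predictions) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive pair")
    points = []
    for maxtc_cutoff in maxtc_steps:
        tp = fp = 0
        for psea, maxtc, truth in predictions:
            # Positive call if EITHER metric passes its cutoff.
            if psea >= psea_cutoff or maxtc >= maxtc_cutoff:
                if truth:
                    tp += 1
                else:
                    fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Invented (pSEA, MaxTc, truth) triples for illustration.
toy = [
    (95, 0.85, True), (60, 0.30, True), (10, 0.55, True), (5, 0.20, True),
    (50, 0.25, False), (8, 0.45, False), (3, 0.15, False), (1, 0.10, False),
]
curve = roc_points(toy, psea_cutoff=40, maxtc_steps=[0.0, 0.4, 0.6, 1.01])
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

Even with the MaxTc cutoff set above 1, pairs that pass the fixed pSEA cutoff are still called positive, so the last point remains at a nonzero FPR and TPR.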

Interface

We added support for SEA predictions to the user interface on the Molecule Detail, Target Detail, and Gene Detail pages of ZINC. The interface classifies each gene by one of 15 major target classes (e.g., membrane receptor, ion channel, transporter) and by one of 42 subclasses (e.g., Class A GPCR, voltage-gated ion channel), whose pages also allow access to the SEA predictions. The results are downloadable in eight formats: SMILES, mol2, SDF, pdbqt, json, xml, txt, and xls. The predictions may be accessed visually via a web browser or programmatically using an application program interface, both located at https://zinc15.docking.org/predictions/home. Static files are accessible via https://files.docking.org/predictions, https://files.docking.org/genes, https://files.docking.org/catalogs, and https://files.docking.org/dark-matter.

Caveats

Vendors often advertise stereochemically ambiguous molecular descriptions and thus the number of compounds and predictions strongly depends on how these are treated. Since ZINC is a 3D focused database, we are obliged to commit to a 3D representation. Where there is ambiguity, we enumerate up to a maximum of four possible stereoisomers (R/S and E/Z) and readily admit that this inflates the numbers in this work.
