
Locating sweet spots for screening hits and evaluating pan-assay interference filters from the performance analysis of two lead-like libraries.

Abstract

The efficiency of automated compound screening is heavily influenced by the design and the quality of the screening libraries used. We recently reported on the assembly of one diverse and one target-focused lead-like screening library. Using data from 15 enzyme-based screenings conducted using these libraries, their performance was investigated. Both libraries delivered screening hits across a range of targets, with the hits distributed across the entire chemical space represented by both libraries. On closer inspection, however, hit distribution was uneven across the chemical space, with enrichments observed in octants characterized by compounds at the higher end of the molecular weight and lipophilicity spectrum for lead-like compounds, while polar and sp(3)-carbon atom rich compounds were underrepresented among the screening hits. Based on these observations, we propose that screening libraries should not be evenly distributed in lead-like chemical space but be enriched in polar, aliphatic compounds. In conjunction with variable concentration screening, this could lead to more balanced hit rates across the chemical space and screening hits of higher ligand efficiency will be captured. Apart from chemical diversity, both screening libraries were shown to be clean from any pan-assay interference (PAINS) behavior. Even though some compounds were flagged to contain PAINS structural motifs, some of these motifs were demonstrated to be less problematic than previously suggested. To maximize the diversity of the chemical space sampled in a screening campaign, we therefore consider it justifiable to retain compounds containing PAINS structural motifs that were apparently clean in this analysis when assembling screening libraries.

Year: 2013 PMID: 23451880 PMCID: PMC3739413 DOI: 10.1021/ci300382f

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Introduction

The advent of high-throughput screening (HTS) for drug discovery in the 1980s enabled the rapid screening of diverse chemical compounds during hit identification. In the early days of HTS, compound collections were mainly assembled from internal resources and often also contained compounds from previous company activities such as dyes and fine chemicals.[1] Through the introduction of combinatorial chemistry, the size of screening libraries expanded. However, their quality could remain poor owing to limited compound diversity and undesirable compound properties.[1] Since the establishment of drug-like and lead-like concepts into drug discovery,[2−4] physicochemical properties represented key parameters for compound selection that led to higher quality hits in HTS campaigns.[1] Nowadays, the focus of enhancing screening libraries lies on increasing scaffold diversity for general screening purposes and the assembly of target- or gene-family tailored libraries, often under consideration of physicochemical property constraints to maintain the lead-like character of the selected compounds.[1,5,6] Lead-like compounds are considered to be smaller and structurally less complex than drug-like molecules.[3,4] This allows expansion of molecules in lead optimization, and at the same time enables more efficient sampling of chemical space since the latter is estimated to expand exponentially for every extra heavy atom in a molecule.[7] Moreover, compound affinities of lead-like molecules can be detected at typical screening concentrations in the low micromolar range using conventional HTS laboratory setup without the need for sensitive detection methodologies such as biophysical techniques commonly required for fragment-based hit discovery.[8] A common problem associated with compound libraries is the presence of compounds displaying promiscuous behavior. 
These false positives, sometimes also termed “frequent hitters” or more recently, pan-assay interference compounds (PAINS),[9] display nonspecific enzyme inhibition through mechanisms including, but not limited to, compound aggregation and covalent protein binding through the presence of reactive functional groups (see the work of Thorne et al. for a comprehensive review).[10] Compounds displaying promiscuous behavior reduce the efficiency of hit identification in library screening by wasting valuable resources in attempted, but unsuccessful, compound optimization efforts. Several recent publications have highlighted structural characteristics and motifs responsible for such behavior that the authors suggested as useful filters in screening libraries to enhance the efficiency of hit discovery.[9,11,12] At the University of Dundee, we have reported the assembly of several screening libraries, including a diverse screening library (DSL) and target-focused libraries against kinases (FKL) and ion channels, all compiled using physicochemical properties compliant to lead-like criteria (Table 1).[13,14] To date, a number of enzyme- and cell-based screens have been carried out using these libraries, with a wide target spectrum across multiple species of organisms using various assay readout technologies. These screening results provide a valuable opportunity to assess the performance of lead-like screening libraries. In the current work, we report the analysis of results collected from 15 enzyme-based screenings conducted using DSL and FKL. We evaluated the utilization of chemical space represented by each library and the distribution of screening hits within this chemical space. We then assessed whether any library compounds should be classified as pan-assay interference (PAINS) according to the definitions of Baell and Holloway. 
Finally, we investigated if compounds containing previously identified structural motifs of PAINS were indeed promiscuous inhibitors in our screens.[9] On the basis of these analyses, we give recommendations on the composition of lead-like libraries and associated screening practice to obtain an even distribution of hit compounds in the chemical space represented, and the application of PAINS filters to remove compounds when assembling screening libraries.

Table 1

Lead-like Selection Criteria Used for Compounds in DSL and FKL[13]

selection criteria	definition
size and physicochemical properties	10–27 heavy atoms
	<4 hydrogen-bond donors
	<7 hydrogen-bond acceptors
	0 < (hydrogen-bond donors + hydrogen-bond acceptors) < 10
	0 ≤ ClogP/ClogD ≤ 4
	at least one nonring atom if compound contains only one ring system
limited complexity	<8 rotatable bonds
	<5 ring systems
	no ring systems with more than two fused rings
absence of unwanted functionalities	exclusion of compounds containing potentially reactive, metabolically labile or toxic groups (as defined in the work of Brenk et al.[13])

Results

Data Collection

The data from 15 enzyme-based screening campaigns were selected for analysis (Tables 2 and 3).[15−29] The applied assays were end point assays using a variety of readout technologies, with typical compound concentration at 30 μM. All campaigns discussed in this analysis had Z′ values >0.5, indicating excellent assay performance.[30] To allow consistent comparison across multiple assays, only compounds that have been screened against all targets were included in the analysis. This led to a collection of 59 443 compounds from DSL against seven targets and 3287 compounds from FKL against 10 targets, which together represented data from five different assay readout technologies (Tables 2 and 3). Primary hits were defined as compounds above a certain threshold percentage inhibition value that was derived from the mean percentage inhibition value and its standard deviation for each individual assay. Compounds interfering with the particular assay readout technology, for example, colored compounds in colorimetric assays, were excluded. Followed-up hits were defined as primary hits that subsequently had identity and purity confirmed using LC-MS, IC50 values determined (a minimum of two independent measurements), and a Hill slope of the log concentration–response curve within the range 0.7–1.5. The latter criterion was applied to only include inhibitors that were potentially competitive with respect to the substrate and to exclude promiscuous inhibitors due to aggregate formation that often result in high Hill slopes.[31] It is noteworthy that the number of primary hits selected for subsequent IC50 determination was dependent on the capacity of the individual biological assay and the presence of structure–activity relationships within the primary screening data. Therefore, not every compound was followed up in certain assays, particularly those which resulted in a large number of primary hits. 
Hence, one should not draw conclusions about false positives based on the difference in the number of compounds between the two stages. In total, DSL delivered 1720 primary hits and 302 followed-up hits, whereas FKL delivered 747 primary hits and 255 followed-up hits (Tables 2 and 3).
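The assay quality check and the primary hit definition above can be sketched as follows. This is a minimal illustration rather than the procedure used in the screens: the Z′ formula is the standard definition, but the multiplier `k = 3` for the hit threshold and all function names are assumptions, since the text only states that the threshold was derived from the mean and standard deviation of the percentage inhibition in each assay.

```python
from statistics import mean, stdev


def z_prime(pos_controls, neg_controls):
    """Z' factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = 3 * (stdev(pos_controls) + stdev(neg_controls))
    window = abs(mean(pos_controls) - mean(neg_controls))
    return 1 - spread / window


def hit_threshold(percent_inhibition, k=3.0):
    """Threshold = mean + k * SD of the percentage inhibition values.

    k = 3 is an assumed multiplier; the source does not state the value.
    """
    return mean(percent_inhibition) + k * stdev(percent_inhibition)


def primary_hits(inhibition_by_compound, k=3.0):
    """Return the compound IDs whose inhibition exceeds the assay threshold."""
    threshold = hit_threshold(list(inhibition_by_compound.values()), k)
    return sorted(c for c, v in inhibition_by_compound.items() if v > threshold)


if __name__ == "__main__":
    # Hypothetical control wells (percentage inhibition).
    print(round(z_prime([98, 100, 102], [-2, 0, 2]), 2))
    # Hypothetical plate: 19 inactive compounds and one strong inhibitor.
    plate = {f"c{i}": 0.0 for i in range(19)}
    plate["hit"] = 90.0
    print(primary_hits(plate))
```

A campaign would pass the Z′ > 0.5 criterion quoted above when `z_prime` returns a value above 0.5 for its control wells.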

Table 2

Number of Compounds Reported As Primary Hits and Followed-up Hits, Together with the Hit Rates and Readout Technology Used, for Each Biological Target Screened Using DSL

target	target class	no. of primary hits (hit rate (%))	no. of followed-up hits	readout technology
HsOGA[28]	glycosidase	38 (0.06)	6	fluorescence
picornaviral 3C cysteine protease[20]	cysteine protease	3 (0.005)	0	fluorescence
TbNMT[15,18]	acyltransferase	275 (0.46)	111	scintillation proximity
HsOGT[27]	glycosyltransferase	132 (0.22)	10	scintillation proximity
TbTryS[29]	ligase	611 (1.03)	127	colorimetric
TbTryR[17]	oxidoreductase	722 (1.21)	51	colorimetric
TbUAP[24]	nucleotidyltransferase	7 (0.01)	3	colorimetric
total		1720ᵃ (2.9)ᵇ	302ᵃ

ᵃ After removing duplicate compounds.

ᵇ Percentage of compounds that were active in at least one screen.

Table 3

Number of Compounds Reported As Primary Hits and Followed-up Hits, Together with the Hit Rates and Readout Technology Used, for Each Biological Target Screened Using FKL

target	target class	no. of primary hits (hit rate (%))	no. of followed-up hits	readout technology
HsOGT[27]	glycosyltransferase	5 (0.15)	1	scintillation proximity
TbTryS[29]	ligase	25 (0.76)	19	colorimetric
BpHSP90[21]	ATP-dependent chaperone	14 (0.43)	1	fluorescence polarization
LmCRK3[16]	Ser/Thr kinase	72 (2.19)	45	fluorescence polarization
PfCDPK5[25]	Ser/Thr kinase	43 (1.31)	20	fluorescence
TbPLK[22]	Ser/Thr kinase	62 (1.89)	6	luminescence
TbGSK3[23]	Ser/Thr kinase	406 (12.4)	55	luminescence
TbPK53[26]	Ser/Thr kinase	199 (6.05)	62	luminescence
TbPK50[26]	Ser/Thr kinase	425 (12.9)	82	luminescence
EcIspE[19]	GHMP kinase	1 (0.03)	1	luminescence
total		747ᵃ (22.7)ᵇ	255ᵃ

ᵃ After removing duplicate compounds.

ᵇ Percentage of compounds that were active in at least one screen.


Hit Compound Distribution in Chemical Space

The chemical space represented by each screening library was defined using 15 descriptors characterizing the physicochemical properties and molecular complexity of the screening compounds (Table 4). These descriptors are mainly common parameters used for describing molecular features and binding capabilities of small molecules.[32,33] All categorical descriptors with discrete unit values were normalized relative to the number of heavy atoms or the number of carbon atoms to reflect the intrinsic trends of each descriptor independent of the size of a molecule.
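The size normalization described above can be illustrated with a short sketch. The dictionary keys and the function name are hypothetical; the only rules taken from the text are that count descriptors are divided by the number of heavy atoms, except for sp³-hybridized carbon atoms, which are divided by the number of carbon atoms.

```python
def normalize_counts(counts, n_heavy_atoms, n_carbon_atoms):
    """Convert raw count descriptors into size-independent fractions.

    counts: dict such as {"HBA": 4, "HBD": 1, "SP3C": 6} (hypothetical keys).
    SP3C is divided by the carbon count, everything else by heavy atoms.
    """
    fractions = {}
    for name, value in counts.items():
        denominator = n_carbon_atoms if name == "SP3C" else n_heavy_atoms
        fractions["f" + name] = value / denominator
    return fractions


if __name__ == "__main__":
    # Hypothetical molecule: 20 heavy atoms, 15 of which are carbon.
    print(normalize_counts({"HBA": 4, "HBD": 1, "SP3C": 6}, 20, 15))
```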

Table 4

Descriptors Used for Describing the Chemical Space Represented by Each Screening Library

descriptor	abbreviation
molecular weight	MW
number of heavy atoms	HevAtoms
logarithmic octanol/water partition coefficient	ALogP
polar surface area	PSA
fraction of hydrogen-bond acceptorsᵃ	fHBA
fraction of hydrogen-bond donorsᵃ	fHBD
fraction of heteroatomsᵃ	fHetAtoms
fraction of rotatable bondsᵃ	fRotBonds
fraction of unsaturated bondsᵃ	fUnsatBonds
fraction of ringsᵃ	fRings
fraction of heterocyclesᵃ	fHetRings
fraction of aromatic ringsᵃ	fAromRings
fraction of ring systemsᵃ	fRingSys
fraction of sp³-hybridized carbon atomsᵇ	fSP3C
normalized functional class extended connectivity fingerprints[32]ᵃ	FCFP4density

ᵃ Normalized relative to the number of heavy atoms unless stated otherwise.

ᵇ Normalized relative to the number of carbon atoms.[33]

Principal component analysis (PCA) was performed on the descriptor matrix to visualize the chemical space represented by each screening library (Figure 1). For DSL, the first three principal components accounted for 22%, 20%, and 16% of the X-variance, respectively, with a cumulative R² of 0.58 (Figure 1a and b). The mapping of hit compounds in the projected chemical space suggested that all the primary hits and followed-up hits were distributed across the entire chemical space, with no particular regions observed where no screening hits were reported. Similarly, the mapping of hit compounds in the 3D PCA projection for FKL (cumulative R² = 0.62, Figure 1c and d) displayed a scattered distribution of all the primary hits and followed-up hits across the entire chemical space. Again, there were no particular regions of the chemical space where no screening hits were reported.

Figure 1

In an attempt to quantify the distribution of primary hits and followed-up hits in the screening libraries, the volume of chemical space represented in the 3D PCA plots was divided into eight regions (octants) around the center of origin (Figure 2a). The percentage of each category of compounds in all eight octants of the PCA plots was then assessed (Figure 2b and c).
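The octant assignment can be sketched as a sign test on the three principal component scores. The numbering used here (octant 1 for all-positive scores, octant 8 for all-negative scores) is an assumption made for illustration; the numbering actually used is defined only graphically in Figure 2a. Points lying exactly on an axis are treated as positive.

```python
from collections import Counter


def octant(pc1, pc2, pc3):
    """Map the signs of three PCA scores to an octant number 1..8.

    Assumed numbering: each negative coordinate sets one bit of a 3-bit index.
    """
    index = (pc1 < 0) + 2 * (pc2 < 0) + 4 * (pc3 < 0)
    return index + 1


def octant_percentages(scores):
    """Percentage of points falling in each of the eight octants."""
    counts = Counter(octant(*s) for s in scores)
    total = len(scores)
    return {o: 100.0 * counts.get(o, 0) / total for o in range(1, 9)}


if __name__ == "__main__":
    # Hypothetical score triples (PC1, PC2, PC3).
    points = [(1, 1, 1), (1, 1, 1), (-1, 1, 1), (-1, -1, -1)]
    print(octant_percentages(points))
```

Applying `octant_percentages` separately to the full library, the primary hits, and the followed-up hits gives the three series compared in Figure 2b and c.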

Figure 2

(a) Illustrative diagram of octant assignments of the PCA diagrams in Figure 1. The center of origin (0, 0, 0) is at the intersection of all eight octants in the middle. (b and c) Percentage distribution of the full library, primary hits, and followed-up hits in each of the eight octants in DSL (b) and FKL (c).

The compounds in DSL were evenly distributed across all eight octants, with 10–15% of compounds in each octant (Figure 2b). Each octant also contained primary hits and followed-up hits from a broad range of targets (Figures S1 and S2, Supporting Information). However, the hit rates per octant varied. Most notable were octants 1 and 2, where an approximately 1.5-fold enrichment of primary hits and followed-up hits was observed relative to the percentage of all screening compounds in the respective octant. Mapping of descriptors in these octants on the loading plot (Table 5) revealed that these regions of chemical space were characterized by aromatics (octant 1) and heavy, lipophilic compounds (octant 2). The average molecular weight of compounds within these octants was, respectively, 21 and 45 Da higher than the average of the full library (318 Da), whereas the average ALogP was increased by 0.6–0.8 units compared to the DSL average (2.6) (Figure 3a). In contrast, octants 4 and 8 displayed a 2-fold decrease in the percentage of primary hits and followed-up hits as compared to the percentage of all screening compounds (Figure 2b). These regions of chemical space featured more polar and heteroatom rich compounds (PSA = 113 Å² (octant 4) vs 77 Å² for the DSL average; fHetAtoms 20% and 32% above the DSL average, respectively), compounds with a higher fraction of heterocycles (fHetRings 23% and 35% above the DSL average, respectively), and compounds with higher FCFP4density (FCFP4density 7% and 13% above the DSL average, respectively) (Figure 3a). A decrease in the percentage of primary hits and followed-up hits was also observed in octant 7 (Figure 2b), where compounds were characterized by a high fraction of sp³-carbon atoms (fSP3C 0.49, 88% above the DSL average, Figure 3a).
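The fold enrichment quoted for each octant is the share of hits located in that octant divided by the share of library compounds located there. A minimal sketch with hypothetical counts (the function name and the example numbers are illustrative, not taken from the screening data):

```python
def fold_enrichment(library_counts, hit_counts):
    """Ratio of hit share to library share for every octant.

    Values above 1 indicate enrichment of hits, values below 1 depletion.
    """
    n_library = sum(library_counts.values())
    n_hits = sum(hit_counts.values())
    result = {}
    for octant, n in library_counts.items():
        library_share = n / n_library
        hit_share = hit_counts.get(octant, 0) / n_hits
        result[octant] = hit_share / library_share
    return result


if __name__ == "__main__":
    # Hypothetical two-octant example: equal library occupancy, unequal hits.
    print(fold_enrichment({1: 100, 2: 100}, {1: 15, 2: 5}))
```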

Table 5

Location of the 15 Descriptors around the Center of Origin of the 3D Loading Plots of the PCA Diagrams in Figure 1b (DSL) and d (FKL)

octant	DSL	FKL
1	fAromRings, fUnsatBonds	ALogP, HevAtoms, MW
2	ALogP, HevAtoms, MW	fAromRings, fUnsatBonds
3	fRotBonds	fHBD
4	fHBA, fHBD, fHetAtoms, PSA	fHBA, fHetAtoms, fRotBonds, PSA
5	fRings, fRingSys	–
6	–	fRings, fRingSys
7	fSP3C	FCFP4density, fHetRings
8	FCFP4density, fHetRings	fSP3C

Figure 3

Plots showing the average values of each descriptor within octants 1, 2, 4, 7, 8, and that for the full library for DSL (a) and FKL (b).

Figure 1. Scoring plots (left) and corresponding loading plots (right) of the PCA of the chemical space represented by DSL (a and b) and FKL (c and d). The gray ellipsoid corresponds to a confidence level of 95% of Hotelling’s T² distribution. Primary hits are colored in blue, and followed-up hits are colored in red.

For FKL, the entire library was again evenly distributed across all eight octants, with each comprising 10–15% of screening compounds (Figure 2c). Again, all octants contained primary hits and followed-up hits from a range of targets (Figures S1 and S2, Supporting Information). A similar trend as in DSL was observed with an enrichment of primary hits and followed-up hits in octants 1 (1.4-fold increase) and 2 (2-fold increase) where the chemical space was characterized by heavy, lipophilic compounds (octant 1) and aromatics (octant 2) (Table 5). For instance, the average molecular weight of compounds in octant 1 was 70 Da higher than the average of the full library (318 Da), and the average ALogP was 1.1 units higher than the FKL average (2.7) (Figure 3b). The regions of chemical space which had a decrease in percentage of primary hits and followed-up hits were octants 4 (3-fold decrease) and 8 (3.2-fold decrease) (Figure 2c), where the chemical space was characterized by polar compounds (octant 4; PSA = 93 vs 72 Å² for the FKL average) and aliphatic compounds (octant 8; fSP3C 0.42, 99% above the FKL average, Figure 3b).
We then proceeded to further evaluate the average ligand efficiency of followed-up hits within each octant (Figure 4).[34] As expected, hits with the highest average ligand efficiency were located in octants characterized by compounds with the lowest average molecular weight (octant 8 for DSL and octants 3 and 7 for FKL). However, we also observed differences in the average ligand efficiency in octants where the average molecular weight of followed-up hits was comparable. For instance, octants 1–4 of DSL contained followed-up hits of similar size (MW = 360–372 Da, Figure 4a). Out of those hits, the polar and heteroatom rich compounds in octant 4 (Figure 3a) displayed the highest average ligand efficiency (0.30 kcal mol⁻¹ per heavy atom). This trend was also present in FKL. Compounds in octant 4 (polar compounds, Figure 3b) achieved the highest average ligand efficiency (0.36 kcal mol⁻¹ per heavy atom) among the octants containing followed-up hits of similar size (MW = 332–342 Da, Figure 4b).

Figure 4

Plots showing the average ligand efficiency and molecular weight (MW) of followed-up hits within each octant and that for the full library for DSL (a) and FKL (b).

PAINS Evaluation

The presence of nonspecific frequent hitters within screening libraries is a common problem associated with false positives from screening campaigns.[11] To investigate if any problematic compounds were present in our screening libraries, we followed the definitions of Baell and Holloway[9] which stated that screening compounds might be displaying PAINS behavior if reported active in more than 50% of the number of assays screened. Compounds within each screening library were grouped according to the number of assays in which each individual compound was reported as active (Table 6). Primary hits and followed-up hits were tabulated separately. Screened against seven different targets (Table 2), DSL had no individual compound reported as primary hits in more than three assays. At the level of followed-up hits, no compound was active in more than two assays (Table 6). Similarly, FKL was screened against ten different targets (Table 3), and no compound was active in more than five assays at the level of primary hits or followed-up hits (Table 6). These observations suggested that both DSL and FKL did not contain any compounds displaying PAINS behavior according to the definitions of Baell and Holloway.
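The compound-level test described above amounts to counting, for every compound, the assays in which it was active and comparing that count with half the number of assays screened. A minimal sketch, with hypothetical compound and assay identifiers:

```python
from collections import Counter


def assay_count_histogram(actives_by_compound):
    """Number of compounds active in 0, 1, 2, ... assays (layout of Table 6)."""
    return Counter(len(assays) for assays in actives_by_compound.values())


def pains_like(actives_by_compound, n_assays):
    """Compounds active in more than 50% of the assays screened."""
    return sorted(
        compound
        for compound, assays in actives_by_compound.items()
        if len(assays) > n_assays / 2
    )


if __name__ == "__main__":
    # Hypothetical activity records: compound -> set of assays with activity.
    activity = {"a": {"t1", "t2", "t3", "t4"}, "b": {"t1"}, "c": set()}
    print(assay_count_histogram(activity))
    print(pains_like(activity, n_assays=7))
```

With seven assays the cutoff is four or more assays, and with ten assays it is six or more, which matches the limits of three (DSL) and five (FKL) assays reported above.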

Table 6

Breakdown of the Number of Assays in Which Each Compound Was Reported Active As a Primary Hit or Followed-up Hit

	no. of assays
DSL	0	1	2	3	4	5	6+	total
primary hits	57723	1657	58	5	0	0	0	59443
followed-up hits	59141	296	6	0	0	0	0	59443

In addition, we evaluated whether compounds containing structural motifs mapping to literature PAINS filters[9] were frequently reported as active in multiple assays using our screening data. We applied the PAINS substructure filters published by Baell and Holloway[9] to flag any compounds within these libraries that contained structural motifs which are likely to display PAINS behavior (Table 7 and Supporting Information). For DSL, 1725 compounds (2.9%) matching 97 literature PAINS structural motifs were flagged by the substructure filters as potential PAINS, whereas 50 compounds (1.5%) matching 9 literature PAINS structural motifs were flagged for FKL (Supporting Information Tables S1 and S2). Only 85 of the flagged 1725 compounds in DSL were reported as a primary hit, with 28 compounds also satisfying our followed-up hit criteria (Table 7). This illustrated that over 95% of the flagged compounds were inactive against all seven targets screened. Switching to FKL, 31 of the 50 flagged compounds were not active against any of the ten targets screened, which represented a 62% clean rate of these flagged compounds. Most of the remaining flagged compounds were only active in one or two assays.
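The percentages in this paragraph follow directly from the counts given in the text and in Table 7. The short check below reproduces them; the variable names are illustrative.

```python
# Counts taken from the text and from Table 7.
dsl_library_size = 59443
dsl_flagged = 1725
dsl_flagged_inactive = 1640  # flagged compounds that were never a primary hit
fkl_library_size = 3287
fkl_flagged = 50
fkl_flagged_inactive = 31

dsl_flagged_pct = 100 * dsl_flagged / dsl_library_size
dsl_clean_pct = 100 * dsl_flagged_inactive / dsl_flagged
fkl_flagged_pct = 100 * fkl_flagged / fkl_library_size
fkl_clean_pct = 100 * fkl_flagged_inactive / fkl_flagged

if __name__ == "__main__":
    print(f"DSL flagged: {dsl_flagged_pct:.1f}%, clean: {dsl_clean_pct:.1f}%")
    print(f"FKL flagged: {fkl_flagged_pct:.1f}%, clean: {fkl_clean_pct:.1f}%")
```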

Table 7

Breakdown of the Number of Assays in Which Each Compound Flagged As PAINS Was Reported Active As a Primary Hit or Followed-up Hit

	no. of assays
DSL	0	1	2	3	4	5	6+	total
primary hits	1640	76	9	0	0	0	0	1725
followed-up hits	1697	27	1	0	0	0	0	1725

Further, we assessed PAINS behavior on a structural motif level instead of on an individual compound level. For this analysis, we grouped the flagged compounds within each library according to the PAINS structural motifs and investigated in how many different assays representatives of each motif appeared as actives. Out of the 97 motifs present in DSL compounds, 55 motifs were considered underrepresented with fewer than five examples and were excluded from the following analysis (Supporting Information Table S1). No active compounds were reported for 19 of the remaining 42 motifs, while another 12 motifs contained compounds that were active only in one assay. Only one motif (5-membered alkylidene heterocycles, ene_five_het_B in Baell and Holloway)[9] contained compounds that were altogether reported as primary hits in more than half of the assays (Tables 8 and S1). In FKL, only two of the nine motifs present were reasonably represented by at least five examples, and none of these contained compounds that were altogether reported as primary hits in more than half of the assays (Tables 8 and S2). Since there were only a small number of flagged compounds that were classified as followed-up hits, we decided that the analysis of followed-up hits grouped into PAINS structural motifs would be inconclusive.
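The motif-level analysis can be sketched as a grouping step followed by the same "more than half of the assays" test. The data layout and function names below are assumptions for illustration; the minimum of five representatives per motif is taken from the text.

```python
from collections import defaultdict


def motif_assay_counts(flagged, min_examples=5):
    """Count the distinct assays in which any representative of a motif is active.

    flagged: iterable of (motif_name, set_of_assays_with_activity), one entry
    per flagged compound. Motifs with fewer than min_examples compounds are
    treated as underrepresented and dropped.
    """
    members = defaultdict(list)
    for motif, assays in flagged:
        members[motif].append(set(assays))
    return {
        motif: len(set().union(*sets))
        for motif, sets in members.items()
        if len(sets) >= min_examples
    }


def promiscuous_motifs(flagged, n_assays, min_examples=5):
    """Motifs whose compounds together hit more than half of the assays."""
    counts = motif_assay_counts(flagged, min_examples)
    return sorted(m for m, n in counts.items() if n > n_assays / 2)


if __name__ == "__main__":
    # Hypothetical flagged compounds.
    flagged = [
        ("motif_A", {"t1"}), ("motif_A", {"t2"}), ("motif_A", set()),
        ("motif_A", {"t1", "t3"}), ("motif_A", {"t4"}),
        ("motif_B", {"t1"}), ("motif_B", {"t2"}),
    ]
    print(motif_assay_counts(flagged))
    print(promiscuous_motifs(flagged, n_assays=7))
```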

Table 8

Breakdown of the Number of Assays in Which Each PAINS Structural Motif Contained Compounds Reported As a Primary Hit

	no. of assays
DSL	0	1	2	3	4	5	6+	total
all motifs	70	15	4	7	1	0	0	97
motifs with at least five representatives	19	12	4	6	1	0	0	42

Discussion

The efficiency of hit identification in automated screening relies heavily on the quality of the screening libraries used. There are numerous ways to evaluate the quality of a screening library. Here, we were interested in the utilization of chemical space represented by a diverse (DSL) and a kinase-focused (FKL) lead-like screening library and the distribution of screening hits within their respective chemical space. We also assessed whether any library compounds were displaying PAINS behavior according to the definitions of Baell and Holloway.[9] Both libraries delivered screening hits across a range of targets (Tables 2 and 3). DSL had hit rates ranging from 0.005 to 1.21%, whereas the hit rates for FKL ranged from 0.03 to 12.9%, with the highest hit rates against protein kinases for which the library was originally designed.[13] These hit rates are comparable to those typically reported for screening campaigns,[35−37] especially considering that most of the targets apart from the protein kinases have not previously been subjected to automated screening. This indicates that the investigated libraries are overall suitable for hit discovery. According to the chemical space analyses, both DSL and FKL were able to deliver hits across the entire chemical space represented, and there were no apparent regions of chemical space where no hits could be found (Figure 1). This illustrates that the entire chemical space covered by these lead-like libraries can be utilized to probe interactions between proteins and small-molecule ligands. However, although hits were identified across the entire chemical space of the respective libraries, the distribution of hits was uneven when we analyzed the occupancy of each octant of the 3D-PCA plots of each library individually (Figure 2). 
We observed, for both libraries, enrichments in the percentage of reported hits in octants occupied by heavy, lipophilic compounds, whereas the percentage of reported hits decreased in octants characterized by polar compounds or compounds containing a high fraction of sp3-carbon atoms (Figure 3). Nonetheless, it should be emphasized that all of the hits are well within lead-like chemical space. We propose that the observed uneven distribution of screening hits across the analyzed chemical space may be explained by the different intrinsic binding capabilities of compounds in the relevant octants. Hit compounds in the enriched octants are relatively lipophilic and bulky and contain a large fraction of aromatic rings (Figure 3). Accordingly, these molecules are rich in unsaturation and represent relatively flat molecular shapes. Owing to the relatively simple and generic molecular shapes, these compounds are more likely to participate in protein–ligand interactions without requiring a stringent spatial complement, therefore leading to higher hit rates. On the contrary, compounds with a high fraction of sp3-carbon atoms represent more complex molecular shapes that require a higher shape complementarity at the protein–ligand interface to accommodate ligand binding.[33] Similarly, a complementary electrostatics match would be required for the successful binding of polar and heteroatom rich compounds. Hence, the hit rate obtained from polar compounds or compounds containing a high fraction of sp3-carbon atoms would inevitably be lower than that from lipophilic aromatic compounds. 
Recently, it was argued that compounds that are polar, heteroatom rich, or contain a high fraction of sp³-carbon atoms represent better prospects for drug discovery, as candidates derived from these compounds are more likely to be successful in clinical trials.[33,38,39] Our analysis demonstrated that these compounds are also better lead candidates in terms of average ligand efficiency, exceeding on average the 0.30 kcal mol⁻¹ per heavy atom cutoff that is generally considered favorable for developing a potent, Rule-of-Five compliant drug candidate (octants 4, 7, and 8; Figure 4).[34] It would therefore be desirable to increase the number of hits obtained from these octants. In order to attain this, we suggest that the entire screening library should not be evenly distributed across the octants but instead be enriched in compounds from the underrepresented octants to achieve a more even distribution of screening hits across the entire chemical space represented. In addition to library composition, we also envisage that a departure from screening at the same fixed molar concentration for all library compounds in a single screening campaign may help balance the distribution of screening hits from bias toward heavy, lipophilic compounds that on average have comparably lower ligand efficiency (octants 1 and 2, Figure 4). Since smaller compounds tend to display a lower potency, the commonly used screening paradigm of one-concentration-fits-all favors the identification of heavy compounds, whereas smaller compounds are disadvantaged even when all compounds are within lead-like chemical space.[40] If compounds are screened at variable concentrations, with higher concentrations used for smaller compounds to match with their theoretical binding capacity,[41] screening hits with lower potency but higher ligand efficiency would no longer be discriminated against. 
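One way to make the idea of variable concentration screening concrete is to ask what affinity a compound of a given size needs in order to reach a fixed ligand efficiency, and to screen at roughly that concentration. The sketch below is only an illustration of this reasoning and is not a protocol proposed in this work: the target efficiency of 0.30 kcal mol⁻¹ per heavy atom, the use of R = 1.98 × 10⁻³ kcal K⁻¹ mol⁻¹ and T = 300 K, and the function name are assumptions, and practical limits such as compound solubility are ignored.

```python
import math

R = 1.98e-3  # kcal K^-1 mol^-1
T = 300.0    # K


def concentration_for_target_le(n_heavy_atoms, target_le=0.30):
    """Affinity (mol/L) at which a compound of this size reaches target_le.

    Rearranged from LE = -R*T*ln(K) / N, giving K = exp(-LE * N / (R * T)).
    Screening near this concentration is an assumed strategy, shown only to
    illustrate why smaller compounds would need higher concentrations.
    """
    return math.exp(-target_le * n_heavy_atoms / (R * T))


if __name__ == "__main__":
    for n in (15, 20, 27):
        micromolar = concentration_for_target_le(n) * 1e6
        print(f"{n} heavy atoms: about {micromolar:.3g} uM")
```

Under these assumptions the required concentration rises steeply as compounds get smaller, which is why a single fixed concentration favors the larger members of a lead-like library.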
Both libraries are free from compounds displaying PAINS behavior according to the definitions of Baell and Holloway (Tables 6 and 7).[9] As frequent hitters are a common source of false positives in screening campaigns,[10] this is a surprisingly positive result. It is noteworthy that in the classification criteria used for primary hits and followed-up hits, compounds which might be interfering with a certain assay readout technology (for example compounds that absorb light at a certain wavelength of a colorimetric assay, or compounds which displayed quenching behavior in a fluorescence assay) were excluded as primary hits from the corresponding assays in the first place. Hence, the presented analysis of PAINS should be clean from assay-dependent problematic compounds. When compiling the libraries, apart from removing obviously colored compounds, no specific filters were used to remove potentially promiscuous compounds.[13] However, reactive compounds that potentially bear toxicity issues were discarded. As there is some overlap between these filters and the PAINS motifs, it appears that this also helped to improve the libraries in terms of promiscuous behavior. Even though some compounds were still flagged to contain PAINS structural motifs, upon detailed analysis, the majority of these compounds did not show any activity against the panel of targets screened (Table 7 and the Supporting Information). This was also valid when the analysis was carried out on a structural motif level instead of on an individual compound level (Table 8 and the Supporting Information). 
Thus, in our hands, many of the reasonably represented PAINS structural motifs in our libraries appeared to be less of a nuisance in biochemical screens for enzyme assays than suggested previously by others.[9,42] For the purpose of enhancing the diversity of a screening library, we therefore consider it justifiable to include compounds containing PAINS structural motifs that were demonstrated to be relatively clean in our analysis, in particular when such compounds contain additional scaffolds that are otherwise not commercially available without the PAINS substituents. However, such compounds should be annotated in the library to ensure that the absence of promiscuous behavior is rigorously verified prior to any optimization efforts.

Conclusions

Using screening data from two lead-like screening libraries against 15 enzyme targets, we demonstrated that both libraries delivered hits across a range of targets. The screening hits spanned the entire lead-like chemical space covered by these libraries, although the distribution of screening hits was found to be uneven. With observed enrichments of screening hits that are at the higher end of the molecular weight and lipophilicity spectrum for lead-like compounds, we propose that screening libraries should in the future be enriched in polar, aliphatic compounds. In conjunction with the introduction of variable-concentration screening, we envisage that this could rectify the uneven distribution of hits observed. Such a shift in future screening library design should assist in discovering a higher proportion of screening hits with higher ligand efficiency and properties that have recently been suggested to lead to better selectivity and reduced likelihood of promiscuity, thereby maximizing potential success in clinical trials. In addition, our analysis suggests a less stringent approach in the application of the literature PAINS filters in removing screening compounds. Both screening libraries were shown to be clean from any PAINS behavior according to the literature definitions. Even though some compounds were flagged as PAINS, the analysis on reasonably represented structural motifs demonstrated that some of these motifs appeared to be less problematic than previously suggested. Although compounds flagged by these PAINS structural motifs may not represent the top candidates for optimization into a drug when there are a large number of screening hits available, it is arguable whether such compounds should be completely excluded from a screening library. 
This is particularly relevant in diverse screening libraries that are compiled for screening against a wide spectrum of targets and phenotypes, since challenging screening campaigns might not always achieve high hit rates. We therefore consider it justifiable to retain compounds containing PAINS motifs demonstrated to be apparently clean in this study to maximize the chemical diversity in a screening library.

Experimental Procedures

Descriptor Calculations

The 15 descriptors were calculated using Pipeline Pilot professional client 8.0 (Accelrys, Inc.) applying the definitions in the software unless stated otherwise. All categorical descriptors with discrete unit values were normalized relative to the number of heavy atoms unless stated otherwise. A heteroatom was defined as the elements S, O, or N. An unsaturated bond was defined as a bond with a bond order greater than one. A heterocycle was defined as a ring containing S, O, or N in the fragment that resulted from generating fragments by rings. An sp3-hybridized carbon atom was defined as any carbon atom which has an atom hybridization of sp3 according to Pipeline Pilot calculations. The fraction of sp3-hybridized carbon atoms was normalized relative to the total number of carbon atoms in the same molecule.[33] FCFP4density was defined as the ratio between the number of bits in the FCFP4 fingerprint generated and the number of heavy atoms.[32] Ligand efficiency (LE) of followed-up hits was determined using the IC50 value (the most potent IC50 was chosen for calculations when a compound had IC50 values for more than one target) following the equation

$$\mathrm{LE} = -\frac{RT \ln(\mathrm{IC}_{50})}{N_{\mathrm{heavy}}}$$

where N_heavy is the number of heavy atoms, R = 1.98 × 10⁻³ kcal K⁻¹ mol⁻¹, and T = 300 K.[34] (IC50 values were typically determined with a substrate concentration close to Km so that IC50 ≈ Ki assuming competitive inhibition.)
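The ligand efficiency calculation described above can be illustrated with a short Python sketch. It assumes the standard definition LE = −RT ln(IC50)/N_heavy with IC50 expressed in mol/L, and uses the values of R and T given in the text. The function name and the example compound are hypothetical and are not taken from the original workflow.

```python
import math

R_KCAL = 1.98e-3   # gas constant in kcal K^-1 mol^-1, as stated in the text
T_KELVIN = 300.0   # temperature in K, as stated in the text


def ligand_efficiency(ic50_molar_values, n_heavy_atoms):
    """Return LE in kcal/mol per heavy atom.

    ic50_molar_values: IC50 values in mol/L, one per target (assumed unit).
    The most potent (lowest) IC50 is used, following the text.
    """
    if not ic50_molar_values:
        raise ValueError("at least one IC50 value is required")
    if n_heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    best_ic50 = min(ic50_molar_values)
    if best_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    return -R_KCAL * T_KELVIN * math.log(best_ic50) / n_heavy_atoms


# Hypothetical hit: IC50 of 1 uM and 10 uM against two targets, 20 heavy atoms.
le = ligand_efficiency([1e-6, 1e-5], 20)
print(round(le, 2))
```

With RT = 0.594 kcal/mol, a 1 µM hit with 20 heavy atoms gives an LE of about 0.41 kcal/mol per heavy atom.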

Chemical Space Analysis

The 3D-PCA plots were generated using Simca-P+ 12.0.1 (Umetrics). The descriptor matrix was normalized to unit variance before carrying out PCA using the PCA-X option under standard settings. The number of principal components was based on automatic cross-validation within the software.
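The scaling step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Simca-P+ implementation: it mean-centers each descriptor column and scales it to unit variance, then extracts only the first principal component by power iteration. The cross-validation used to choose the number of components is not reproduced, and the descriptor values are invented toy data.

```python
import math
from statistics import mean, stdev


def autoscale(matrix):
    """Mean-center each column and scale it to unit (sample) variance."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    if any(sd == 0 for sd in sds):
        raise ValueError("constant descriptor column cannot be scaled")
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in matrix]


def covariance(scaled):
    """Covariance of already centered data (a correlation matrix after autoscaling)."""
    n, p = len(scaled), len(scaled[0])
    return [[sum(row[i] * row[j] for row in scaled) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Dominant eigenvector and eigenvalue by power iteration."""
    p = len(cov)
    # Non-uniform start vector to reduce the chance of starting orthogonal
    # to the dominant eigenvector.
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            raise ValueError("power iteration collapsed to the zero vector")
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return vec, eigenvalue


# Toy descriptor matrix (hypothetical): molecular weight, logP, fraction sp3.
data = [[250, 1.0, 0.6], [300, 2.0, 0.4], [350, 3.1, 0.3], [400, 3.9, 0.1]]
scaled = autoscale(data)
cov = covariance(scaled)
loadings, eigenvalue = first_component(cov)
explained = eigenvalue / len(cov)  # fraction of total variance on PC1
print(loadings, explained)
```

Because every scaled column has unit variance, the total variance equals the number of descriptors, so `eigenvalue / len(cov)` is the fraction of variance captured by the first component.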

PAINS Analysis

The literature PAINS filters in SLN format (Tables S6, S7, and S9 in the Supporting Information from the work of Baell and Holloway)[9] were applied using Sybyl-X 1.2 (Tripos). The flagged compounds were mapped to individual PAINS substructure motifs using in-house Python scripts.
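The in-house scripts are not described further, so the following is only a hypothetical sketch of how flagged compounds might be grouped by PAINS motif and summarized. The input format, the compound and motif identifiers, the `min_compounds` threshold, and the summary fields are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_motif(flags):
    """Map each motif name to the set of compound IDs it flagged.

    flags: iterable of (compound_id, motif_name) pairs (assumed input format).
    """
    by_motif = defaultdict(set)
    for compound_id, motif in flags:
        by_motif[motif].add(compound_id)
    return dict(by_motif)


def summarize(by_motif, assays_hit, min_compounds=2):
    """Summarize hit behavior for motifs with at least min_compounds members.

    assays_hit: dict of compound_id -> number of assays in which it was a hit.
    min_compounds is a hypothetical cutoff for "reasonably represented".
    """
    summary = {}
    for motif, compounds in by_motif.items():
        if len(compounds) < min_compounds:
            continue
        counts = [assays_hit.get(c, 0) for c in compounds]
        summary[motif] = {
            "n_compounds": len(compounds),
            "n_hitting_any_assay": sum(1 for n in counts if n > 0),
            "max_assays_hit": max(counts),
        }
    return summary


# Hypothetical filter output and screening results.
flags = [
    ("cpd-001", "motif_A"),
    ("cpd-002", "motif_A"),
    ("cpd-002", "motif_B"),
    ("cpd-003", "motif_A"),
]
assays_hit = {"cpd-001": 0, "cpd-002": 3, "cpd-003": 1}

by_motif = group_by_motif(flags)
summary = summarize(by_motif, assays_hit)
print(summary)
```

A compound can carry more than one motif, so it is stored under each motif that flagged it. Motifs with too few members are skipped because a per-motif statement about promiscuity would not be meaningful for them.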

42 in total

1. Ligand efficiency: a useful metric for lead selection.

Authors: Andrew L Hopkins; Colin R Groom; Alexander Alex
Journal: Drug Discov Today Date: 2004-05-15 Impact factor: 7.851

Review 2. The influence of the 'organizational factor' on compound quality in drug discovery.

Authors: Paul D Leeson; Stephen A St-Gallay
Journal: Nat Rev Drug Discov Date: 2011-09-30 Impact factor: 84.694

3. Analysing the output from primary screening.

Authors: Dawn Nowlin; Patrick Bingham; Andrew Berridge; Philip Gribbon; Philip Laflin; Andreas Sewing
Journal: Comb Chem High Throughput Screen Date: 2006-06 Impact factor: 1.339

4. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays.

Authors: Jonathan B Baell; Georgina A Holloway
Journal: J Med Chem Date: 2010-04-08 Impact factor: 7.446

5. Hsp90 is essential in the filarial nematode Brugia pahangi.

Authors: Eileen Devaney; Kerry O'neill; William Harnett; Luke Whitesell; Jane H Kinnaird
Journal: Int J Parasitol Date: 2005-03-19 Impact factor: 3.981

6. N-myristoyltransferase inhibitors as new leads to treat sleeping sickness.

Authors: Julie A Frearson; Stephen Brand; Stuart P McElroy; Laura A T Cleghorn; Ondrej Smid; Laste Stojanovski; Helen P Price; M Lucia S Guther; Leah S Torrie; David A Robinson; Irene Hallyburton; Chidochangu P Mpamhanga; James A Brannigan; Anthony J Wilkinson; Michael Hodgkinson; Raymond Hui; Wei Qiu; Olawale G Raimi; Daan M F van Aalten; Ruth Brenk; Ian H Gilbert; Kevin D Read; Alan H Fairlamb; Michael A J Ferguson; Deborah F Smith; Paul G Wyatt
Journal: Nature Date: 2010-04-01 Impact factor: 49.962

7. Escape from flatland: increasing saturation as an approach to improving clinical success.

Authors: Frank Lovering; Jack Bikker; Christine Humblet
Journal: J Med Chem Date: 2009-11-12 Impact factor: 7.446

8. Dihydroquinazolines as a novel class of Trypanosoma brucei trypanothione reductase inhibitors: discovery, synthesis, and characterization of their binding mode by protein crystallography.

Authors: Stephen Patterson; Magnus S Alphey; Deuan C Jones; Emma J Shanks; Ian P Street; Julie A Frearson; Paul G Wyatt; Ian H Gilbert; Alan H Fairlamb
Journal: J Med Chem Date: 2011-09-01 Impact factor: 7.446

9. IspE inhibitors identified by a combination of in silico and in vitro high-throughput screening.

Authors: Naomi Tidten-Luksch; Raffaella Grimaldi; Leah S Torrie; Julie A Frearson; William N Hunter; Ruth Brenk
Journal: PLoS One Date: 2012-04-25 Impact factor: 3.240

10. Trypanosoma brucei Polo-like kinase is essential for basal body duplication, kDNA segregation and cytokinesis.

Authors: Tansy C Hammarton; Susanne Kramer; Laurence Tetley; Michael Boshart; Jeremy C Mottram
Journal: Mol Microbiol Date: 2007-07-27 Impact factor: 3.501
