
Application of a New Scaffold Concept for Computational Target Deconvolution of Chemical Cancer Cell Line Screens.

Ryo Kunimoto¹, Dilyana Dimova¹, Jürgen Bajorath¹.

Abstract

Target deconvolution of phenotypic assays is a hot topic in chemical biology and drug discovery. The ultimate goal is the identification of targets for compounds that produce interesting phenotypic readouts. A variety of experimental and computational strategies have been devised to aid this process. A widely applied computational approach infers putative targets of new active molecules on the basis of their chemical similarity to compounds with activity against known targets. Herein, we introduce a molecular scaffold-based variant for similarity-based target deconvolution from chemical cancer cell line screens that were used as a model system for phenotypic assays. A new scaffold type, termed analog series-based (ASB) scaffold, was used for substructure-based similarity assessment. Compared with conventional scaffolds and compound-based similarity calculations, target assignment centered on ASB scaffolds resulting from screening hits and bioactive reference compounds restricted the number of target hypotheses in a meaningful way and led to a significant enrichment of known cancer targets among candidates.


Year: 2017 PMID: 30023635 PMCID: PMC6044569 DOI: 10.1021/acsomega.7b00215

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Drug discovery research is experiencing a renaissance of phenotypic approaches.[1,2] Especially high-content and phenotypic screening assays have become a hot topic in recent years.[3,4] It is generally thought that phenotypic screens might produce leads that are more relevant for addressing complex biology in vivo than other compounds identified in target-based assays. Whether or not such expectations might generally be true remains to be determined. Be that as it may, phenotypic discovery is challenged by the need to identify—or at least narrow down—cellular targets for compounds with interesting phenotypic readouts, a process often referred to as target deconvolution.[5,6] For compound selection and optimization as well as late-stage preclinical evaluation, target knowledge continues to be required in many cases, regardless of how candidate compounds have originally been identified. In addition, there is strong scientific interest in identifying target(s) whose inhibition in cellular environments might result in interesting functional effects. For target deconvolution from phenotypic screens, different experimental approaches have been developed or adapted,[5−7] including, among others, various proteomics techniques and the use of small molecular probes with confirmed activity against selected targets. Moreover, target identification has also become an attractive task for computational analysis using different methods. For example, drug-target networks[8,9] establish compound-based links between targets and help to better understand complex interactions involving multiple compounds and targets. For drugs, new targets can often be proposed on the basis of network representations that might rationalize side effects.[9] Such networks can also be generated for bioactive compounds other than drugs and can be computationally analyzed. 
Furthermore, machine-learning models combining small molecule and target information (e.g., chemical descriptors and protein sequences) have been generated to predict novel compound-target pairings.[10,11] Moreover, targets of novel active compounds are often inferred from molecular similarity between these compounds and known actives.[12−14] For similarity calculations, a variety of chemical descriptors and functions exist.[15,16] Target hypotheses for new chemical entities can be derived not only by molecular similarity calculations producing numerical values but also by assessing substructure relationships between compounds as a measure of similarity. For example, targets can be predicted for new active compounds by identifying structural analogues and comparing their target annotations[17] or on the basis of molecular scaffolds,[18] which are generated to capture core structures of compounds.[19] As such, scaffolds often represent a series of known active compounds sharing the same core. A systematic scaffold analysis provides a structural organization scheme, and target annotations of compounds containing the same scaffold can be assigned to each scaffold.[19] This approach generates activity-annotated scaffold libraries to which new active compounds without known targets can be mapped. If scaffolds of new actives match the existing ones, target hypotheses can be inferred. The classical way of defining scaffolds for medicinal chemistry applications is according to Bemis and Murcko, which gave rise to BM scaffolds.[20] These scaffolds are obtained from compounds by the removal of all R-groups while retaining the ring systems and linker fragments connecting the rings.[20] Various extensions of the BM scaffold concept such as the Scaffold Tree[21] have been introduced. 
The Scaffold Tree decomposes BM scaffolds along tree branches according to chemical rules until only individual rings remain and thereby establishes structural relationships between the scaffolds.[21] Herein, we report the application of a new scaffold concept termed analog series-based (ASB) scaffold[22] to computationally assign potential targets to hits from cancer cell line screens, which are a major resource for phenotypic discovery.[23]

Results and Discussion

ASB Scaffold Concept and Substructure-Based Similarity Assessment

Figure 1 compares the generation of ASB and BM scaffolds. Compared with conventional scaffolds, ASB scaffolds were designed to further increase the medicinal chemistry relevance by (i) omitting a formal hierarchical distinction of ring systems, linkers, and substituents; (ii) representing a series of analogues (rather than individual compounds); and (iii) incorporating reaction rules.[22] The definition of ASB scaffolds is thus more inclusive and restrictive than compound-based scaffold concepts. From an ASB scaffold, all analogues of the corresponding series can be regenerated following retrosynthetic rules. The ASB scaffold contains all substructures that are conserved within a series and a consensus substitution site where R-groups distinguish the different analogues comprising the series.

Figure 1

Generation of ASB and BM scaffolds. For a compound series (A–C), the generation of ASB and BM scaffolds is illustrated. Two unique BM scaffolds were isolated from these compounds by removing substituents. RECAP-MMP cores of compounds A–C are shown. The core shared among all compounds (highlighted in orange) represents the ASB scaffold. For a substructure-based similarity assessment, all compounds represented by the same (BM or ASB) scaffold were assigned to the scaffold and classified as similar. For a compound-based similarity evaluation, pairwise Tanimoto coefficient values for the chosen reference (ChEMBL) and query compounds (from NCI screens) were calculated and a similarity threshold was applied.

Analysis Concept and Protocol

A major goal of our analysis was the evaluation of a new scaffold concept for the assignment of potential targets to hits from cancer cell line screens. This setup served as a model system for target deconvolution from phenotypic assays. The underlying idea was that structurally very similar active compounds are likely to share targets (which is well-appreciated in medicinal chemistry). Therefore, analog series were systematically extracted from combined screening and ChEMBL compounds to comprehensively capture structural relationships, and the resultant ASB scaffolds were collected. ASB scaffolds representing both screening hits and ChEMBL compounds were prioritized, and known target annotations of ChEMBL compounds were assigned to hits sharing the same ASB scaffold. Then, target annotations were collected for each cell line. The analysis was centered on ASB scaffolds to ensure that only close structural analogues were considered for target transfer from known bioactive compounds to hits. As such, ASB scaffolds provided a “meta structure” for target deconvolution. The analysis protocol that was systematically applied to all 73 cell line screens is illustrated in Figure 2.

Figure 2

Analysis scheme. For a given cell line, screening compounds (hits, colored in blue; inactive compounds, pink) and bioactive compounds from ChEMBL (green) were pooled. From this compound pool, analog series were extracted (depicted as clusters) and series yielding ASB scaffolds (orange) identified. ASB scaffolds resulting from series containing screening hits and ChEMBL compounds (i.e., ASB3 and ASB4) were determined. Target annotations of all bioactive compounds represented by the shared ASB scaffolds were assembled and the union of these targets (i.e., T1, T2, and T3) was assigned to screening hits of this cell line.

The approach is conceptually based on molecular similarity to derive compound-target hypotheses, specifically on substructure-based similarity; that is, compounds are classified as similar if they are represented by the same scaffold. Accordingly, we have compared ASB scaffolds and conventional BM scaffolds in the same analysis context and, in addition, carried out conventional similarity searching as another reference calculation. In the latter case, screening hits were used as templates for similarity searching in ChEMBL. If similar compounds were identified, their target annotations were assigned to the hits. For our analysis, many properties assigned to scaffolds such as promiscuity, selectivity, or privileged substructure characteristics that are often discussed in medicinal chemistry[19] are not relevant. Neither do we need to consider relative contributions of core structures and R-groups to biological activity. Rather, in the context of our analysis, the use of scaffolds for the structural organization of active compounds becomes critically important, which is only one of many aspects often considered in the scaffold-based analysis of compound activity data.[19]
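The target transfer step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `assign_targets`, the compound identifiers, and the toy scaffold and target labels (chosen to mirror ASB3/ASB4 and T1–T3 in Figure 2) are hypothetical, and scaffold membership is assumed to have been computed beforehand.

```python
from collections import defaultdict


def assign_targets(scaffold_of, hits, reference_targets):
    """Return (shared scaffolds, union of transferred targets) for one cell line.

    scaffold_of:        compound id -> scaffold label (hypothetical, precomputed)
    hits:               set of screening hit ids for the cell line
    reference_targets:  reference (e.g. ChEMBL) compound id -> set of target ids
    """
    # Group all pooled compounds by the scaffold that represents them.
    members = defaultdict(set)
    for compound, scaffold in scaffold_of.items():
        members[scaffold].add(compound)

    references = set(reference_targets)
    # A scaffold is "shared" if it represents at least one hit
    # and at least one annotated reference compound.
    shared = {s for s, cpds in members.items() if cpds & hits and cpds & references}

    # Union of target annotations of reference compounds on shared scaffolds.
    targets = set()
    for scaffold in shared:
        for compound in members[scaffold] & references:
            targets |= reference_targets[compound]
    return shared, targets


# Toy data (all labels are illustrative only).
scaffold_of = {
    "hit1": "ASB3", "chembl1": "ASB3",
    "hit2": "ASB4", "chembl2": "ASB4", "chembl3": "ASB4",
    "hit3": "ASB2", "inactive1": "ASB1", "chembl4": "ASB5",
}
hits = {"hit1", "hit2", "hit3"}
reference_targets = {
    "chembl1": {"T1"},
    "chembl2": {"T2"},
    "chembl3": {"T2", "T3"},
    "chembl4": {"T4"},
}

shared, targets = assign_targets(scaffold_of, hits, reference_targets)
print(sorted(shared), sorted(targets))
```

In this toy case, T4 is not transferred because its scaffold (ASB5) represents no screening hit, which is the mechanism that keeps the number of target hypotheses small.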

Scaffold and Compound Statistics

Our analysis protocol identified 99 unique ASB scaffolds shared by screening hits and ChEMBL compounds, 927 shared BM scaffolds, and 25 390 ChEMBL compounds classified on the basis of similarity searching as being similar to screening hits (Table 1). Hence, there were many more compound-based BM than ASB scaffolds and many more similar compounds than scaffolds. For shared ASB and BM scaffolds, 7–40 and 56–388 scaffolds were obtained per cell line screen, with a mean of 18.8 and 209.7, respectively (Table 1). Thus, many scaffolds were detected multiple times in different cell line screens. In addition, the number of similar compounds per cell line ranged from 962 to 9465, with a mean of 4883.

Table 1

Scaffold and Similarity Search Statistics

	per cell line
	MIN–MAX	AVG	TOTAL
ASB Scaffolds
# shared ASB scaffolds	7–40	18.8	99
# targets	30–119	73.7	232
# cancer targets	14–62	26.5	108
cancer target rate (%)	23.3–59.8	36.4	46.6
BM Scaffolds
# shared BM scaffolds	56–388	209.7	927
# targets	595–1030	925.1	1130
# cancer targets	197–303	275.9	330
cancer target rate (%)	29.0–34.0	30.0	29.2
Similarity Search
# similar ChEMBL CPDs	962–9465	4883	25 390
# targets	393–972	756.8	1249
# cancer targets	147–311	264.1	366
cancer target rate (%)	31.1–39.5	34.8	34.1

The table reports statistics for scaffold analysis and similarity searching. For ASB and BM scaffolds, ranges (MIN–MAX), averages (AVG), and total numbers (TOTAL) of scaffolds from screening hits and scaffolds that were shared with ChEMBL reference compounds, corresponding targets, and cancer targets are provided across all 73 cell lines. For similarity search calculations, ranges, averages, and total numbers are reported for similar compounds, all targets, and cancer targets.

Exemplary shared ASB scaffolds are shown in Figure 3 together with the compound series from which they originated. These examples illustrate another important aspect of the ASB scaffold analysis. In these cases, close screening compound analogues were detected that were either active or inactive in the cell line screen, thus providing immediate opportunities for reassessing assay results by retesting selected hits and/or inactive compounds, prior to the target analysis. In many other instances, shared ASB scaffolds represented only active compounds, as illustrated in Figure .

Figure 3

Shared ASB scaffolds. Examples of shared ASB scaffolds (orange background) are shown for (a) SNB-75 (CNS cancer) and (b) HT-29 (colon cancer) cell lines together with corresponding hits (blue box), inactive compounds (pink), and ChEMBL compounds (green). R-groups distinguishing these analogs are shown in red.

Target Assignment

Global Target Distribution

For each cell line screen, the union of targets associated with shared scaffolds was determined. The 927 shared BM scaffolds yielded a total of 1130 unique targets across all cell lines, with a range of 595 to more than 1000 targets per line, as reported in Table 1. Thus, on the basis of BM scaffolds, approximately 70% of all investigated human targets were assigned to screening hits as potential targets. Similarity searching suggested a larger total number of unique targets of screening hits (1249). However, when targets per cell line screen were considered instead of total numbers of unique targets, BM scaffold analysis yielded more targets than similarity searching, with an average of approximately 925 versus 757 targets per cell line, respectively (Table 1). Thus, on the basis of compound similarity, individual targets were much less frequently detected than on the basis of shared BM scaffolds. For similarity searching, the number of similar compounds and the resultant targets might be reduced by further increasing the similarity threshold value. Regardless, the control calculations showed that generally applied compound similarity criteria would not be suitable for target assignment across cell line screens. At face value, implicating approximately 70% or more of all preselected targets in activity signals from cell line screens, on the basis of BM scaffolds or compound similarity, was not considered realistic, despite variations observed across different cell lines. By contrast, the structurally more conservative ASB scaffold approach involving multiple compounds significantly reduced the number of target assignments. On the basis of 99 identified shared ASB scaffolds (approximately an order of magnitude fewer than shared BM scaffolds), a total of 232 unique targets were assigned, with a mean of 74 targets per cell line. Thus, shared ASB scaffolds implicated only approximately 14% of all targets in cell line screens and also controlled the number of targets per line.

Cancer Targets

To relate the observed differences in target distributions specifically to cancer cell line screening, the assignment of known cancer targets was analyzed, which represented a subset of all monitored targets. ASB scaffolds, BM scaffolds, and similarity searching identified 108, 330, and 366 known cancer targets, respectively, as potential targets for screening hits across all cell lines, with ranges of 14–62 (ASB), 197–303 (BM), and 147–311 (similarity) cancer targets per line (Table 1). With one exception (macrophage colony stimulating factor receptor; CSF1R; ChEMBL TID 1844), the set of targets identified on the basis of ASB scaffolds overlapped with the other sets. Table S2 reports the cancer targets assigned on the basis of ASB scaffolds to each cell line screen. ASB scaffolds assigned approximately one-third of cancer targets compared with BM scaffolds and similarity searching, although the number of all targets differed by more than one order of magnitude. This corresponded to a significant enrichment of cancer targets among all assigned targets, as illustrated in Figure 4. Although the application of ASB scaffolds resulted in comparably low numbers of assigned targets (Figure 4a), the ratio of cancer targets relative to all targets was higher for ASB than for BM scaffolds and similarity searching (Figure 4b). Given that absolute target numbers were more realistic for ASB than BM scaffolds and similarity searching, the observed enrichment of cancer targets for ASB scaffolds was considered a significant finding. Corroborating evidence for cancer target assignment was provided by the frequent occurrence of established cancer targets across different cell lines, which was clearly evident for ASB scaffolds, given the reduced “target background”. For example, on the basis of ASB scaffolds, well-known cancer targets such as P-glycoprotein 1 and tyrosine-protein kinases Fyn and Src were implicated in 73, 62, and 66 cell line screens, respectively. In total, for ASB scaffolds, 46.6% of all assigned targets were cancer targets, with an average of 36.4% per cell line.
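The total cancer target rates for the two scaffold types can be reproduced from the counts in Table 1, and compared with the share of cancer targets in the complete target set (429 of 1687, as given in Materials and Methods). The short sketch below only performs this arithmetic; the background rate and the enrichment ratio are derived here for illustration and are not figures reported by the authors.

```python
def rate(cancer_targets, all_targets):
    """Percentage of cancer targets among all assigned targets."""
    return 100.0 * cancer_targets / all_targets


# (cancer targets, all targets) taken from the TOTAL column of Table 1.
asb_rate = rate(108, 232)
bm_rate = rate(330, 1130)

# Share of cancer targets in the full ChEMBL target set used as the pool.
background_rate = rate(429, 1687)

# Ratio of the ASB rate to the background share (illustrative only).
asb_enrichment = asb_rate / background_rate

print(round(asb_rate, 1), round(bm_rate, 1), round(background_rate, 1))
print(round(asb_enrichment, 2))
```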

Figure 4

Target distribution. For ASB scaffolds (orange), BM scaffolds (cyan), and similarity searching (SIM, magenta), boxplots report the distribution of (a) all targets and (b) the percentage of cancer targets for all 73 cell lines. Boxplots show the smallest value (bottom), first quartile (lower boundary of the box), median value (red line), third quartile (upper boundary of the box), largest value (top), and outliers (blue dots).

Conclusions

In this work, we have investigated a substructure-based similarity approach to computationally deconvolute targets from 73 chemical cancer cell line screens used as a model system for phenotypic assays. Assigning targets on the basis of ligand similarity is a major approach to target identification in phenotypic discovery. The analysis was focused on a recently introduced molecular scaffold definition, ASB scaffolds, designed to further increase the medicinal chemistry relevance of scaffolds as core structure representations. Calculations on the basis of conventional BM scaffolds and whole-molecule Tanimoto similarity served as references. ASB scaffolds are structurally more comprehensive and conservative than other molecular representations for similarity assessment, given their default dependence on compound series. As a consequence, ASB scaffolds produced fewer target hypotheses than BM scaffolds and similarity searching, thereby counteracting the “target inflation” observed for ligand similarity-based target prediction. Moreover, for ASB scaffolds, a significant enrichment of known cancer targets among candidates assigned to screening hits was observed, suggesting that the ASB scaffold approach provides a promising addition to current computational target deconvolution methods.

Materials and Methods

Scaffolds

Conventional BM scaffolds were generated from active compounds by the removal of all R-groups while retaining ring systems and linker fragments connecting rings.[20] Furthermore, new ASB scaffolds[22] were isolated from compounds. To generate ASB scaffolds, analog series were first systematically identified by applying the matched molecular pair (MMP) approach.[24] An MMP is defined as a pair of compounds that are distinguished only by a chemical change at a single site.[24] As such, an MMP consists of a conserved MMP core structure and a pair of exchanged substituents. MMPs were generated by applying an algorithm that systematically fragments molecules at exocyclic single bonds and stores resulting cores and substituent fragments in an index table from which MMPs are enumerated.[25] Retrosynthetic (RECAP) rules[26] were applied to fragment source compounds in which exchanged fragments conform to chemical reactions (thereby replacing random fragmentation steps), yielding RECAP-MMPs.[27] From all RECAP-MMPs of active compounds, a network was computed in which nodes represented compounds and edges represented pairwise RECAP-MMP relationships.[28] In this network, each disjoint cluster contained a unique series of analogs[28] from which ASB scaffolds were isolated.[22] A series of analogs often yielded multiple MMP cores. Therefore, for each series, a computational search was carried out for a core that matched all MMP relationships within the series. If identified, the largest qualifying core then represented the ASB scaffold of the series.[22] The generation of ASB scaffolds is computationally efficient as it relies on effective MMP enumeration. Therefore, ASB scaffolds can be generated for large data sets comprising millions of compounds (such as the entire ChEMBL database).[22] The generation of BM and ASB scaffolds is schematically illustrated in Figure 1. BM scaffolds were calculated with an in-house implementation using the OpenEye toolkit.[29]
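The sequence of steps from the index table to the ASB scaffold can be sketched as follows. This is a simplified illustration, not the published algorithm: the chemical fragmentation itself (RECAP rules, exocyclic single bonds) is assumed to have been done upstream, cores and substituents are represented by opaque placeholder strings, and "a core that matched all MMP relationships" is interpreted here as a core common to every MMP in the series. All function names and the `heavy_atoms` size table are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def enumerate_mmps(fragmentations):
    """Index (compound, core, substituent) records by core and enumerate MMPs.

    Returns a dict: frozenset({cpd_a, cpd_b}) -> set of cores they share.
    """
    index = defaultdict(dict)  # core -> {compound: substituent}
    for compound, core, substituent in fragmentations:
        index[core][compound] = substituent
    pairs = defaultdict(set)
    for core, members in index.items():
        for a, b in combinations(sorted(members), 2):
            if members[a] != members[b]:  # same core, different substituent
                pairs[frozenset((a, b))].add(core)
    return pairs


def analog_series(pairs):
    """Connected components of the MMP network (nodes: compounds, edges: MMPs)."""
    adjacency = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, series = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one disjoint cluster
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        series.append(component)
    return series


def asb_scaffold(component, pairs, size):
    """Largest core shared by every MMP of the series, or None if there is none."""
    common = None
    for pair, cores in pairs.items():
        if pair <= component:
            common = set(cores) if common is None else common & cores
    if not common:
        return None
    return max(common, key=lambda core: (size(core), core))


# Toy records with placeholder labels (not real structures).
fragmentations = [
    ("A", "coreX", "R1"), ("B", "coreX", "R2"), ("C", "coreX", "R3"),
    ("A", "coreY", "R4"), ("B", "coreY", "R5"),
    ("D", "coreZ", "R1"), ("E", "coreZ", "R2"),
]
heavy_atoms = {"coreX": 14, "coreY": 18, "coreZ": 10}  # hypothetical core sizes

pairs = enumerate_mmps(fragmentations)
series = analog_series(pairs)
scaffolds = [asb_scaffold(s, pairs, heavy_atoms.get) for s in series]
print(series, scaffolds)
```

In the toy data, coreY is larger than coreX but links only compounds A and B, so it does not qualify for the series A–C; coreX, which is common to all three MMPs, is selected instead.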

Similarity Calculations

As a control for scaffold-based similarity assessment, similarity search calculations were carried out using the extended connectivity fingerprint with bond diameter 4 (ECFP4)[30] and a similarity threshold of 0.4 for the Tanimoto coefficient.[16] This threshold value is often used for ECFP4 in virtual compound screening.[16]
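For reference, the Tanimoto coefficient of two fingerprints with on-bit sets A and B is |A ∩ B| / |A ∪ B|. The sketch below applies it with the 0.4 threshold to toy bit sets; it does not compute ECFP4 fingerprints, the integer bit positions are arbitrary, and treating the threshold as inclusive (≥ 0.4) is an assumption, since the text does not say whether the comparison is strict.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def similar_references(query_bits, reference_fps, threshold=0.4):
    """Names of reference compounds whose similarity to the query meets the threshold.

    The inclusive comparison (>=) is an assumption of this sketch.
    """
    return [
        name
        for name, bits in reference_fps.items()
        if tanimoto(query_bits, bits) >= threshold
    ]


# Toy fingerprints (arbitrary bit indices, not real ECFP4 output).
query = {1, 2, 3, 4}
reference_fps = {
    "ref_close": {1, 2, 3, 5},    # 3 shared bits, 5 in the union
    "ref_far": {1, 9, 10, 11},    # 1 shared bit, 7 in the union
}

matches = similar_references(query, reference_fps)
print(matches)
```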

Cell Lines and Screening Data

The human tumor cell line growth inhibition assay data from the National Cancer Institute (NCI)[31] were extracted from PubChem.[32] Only compounds screened in confirmatory assays originating from NCI Developmental Therapeutics Program (DTP/NCI) were considered. In total, 2 396 398 assay compounds were screened in 73 cell lines representing 10 different neoplasias (including breast, CNS, colon, leukemia, melanoma, nonsmall cell lung, ovarian, prostate, and renal cancers). Table 2 reports screening statistics for each neoplasia type. Details for all cell lines are provided in Table S1. Assay compounds were designated as active or inactive on the basis of PubChem records. In the following, active compounds are also referred to as hits.

Table 2

Cancer Cell Lines and Screening Data

	neoplasia	cell lines	assayed CPDs	active CPDs	inactive CPDs
1	breast	6	161 953	10 031	151 922
2	CNS	8	265 511	13 865	251 646
3	colon	9	310 533	17 070	293 463
4	leukemia	8	231 398	20 082	211 316
5	melanoma	10	360 686	18 693	341 993
6	nonsmall cell lung	11	378 082	19 683	358 399
7	ovarian	7	242 571	12 446	230 125
8	prostate	2	56 284	3195	53 089
9	renal	10	324 513	16 244	308 269
10	small cell lung	2	27 527	1882	25 645

The table provides statistics for the 10 neoplasia types and corresponding screening data. For each neoplasia, the name and number of cell lines are given. In addition, the total number of assayed compounds (CPDs) and the number of active and inactive compounds are reported.

Reference Compounds

For the scaffold-based similarity analysis, reference compounds were assembled from ChEMBL version 22.[33] Only compounds for which high-confidence activity data were available were considered. Therefore, compounds with direct interactions (type “D”) with human targets at the highest confidence level (ChEMBL confidence score 9) were selected. Only assay-independent equilibrium constants (Ki values) and assay-dependent IC50 values were considered as potency measurements. Approximate measurements (e.g., “>” or “∼”) were discarded. If multiple Ki or IC50 values were available for the same compound, their geometric mean was calculated as the final potency annotation, provided all values fell within the same order of magnitude. Otherwise, the measurements were discarded. Applying these selection criteria, a total of 224 532 unique compounds were obtained with activity against 1687 human targets.
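The potency aggregation rule can be illustrated with a small helper. This is a sketch under one stated assumption: "within the same order of magnitude" is interpreted here as the largest value being less than ten times the smallest; the original text does not spell out the exact criterion, and the function name `potency_annotation` is hypothetical.

```python
import math


def potency_annotation(values):
    """Geometric mean of potency values, or None if they should be discarded.

    values: positive Ki or IC50 measurements in a common unit (e.g. nM).
    Assumption: values qualify only if max(values) < 10 * min(values).
    """
    if not values:
        return None
    if max(values) >= 10 * min(values):
        return None  # spread of one order of magnitude or more: discard
    # Geometric mean computed in log space to avoid overflow for long lists.
    return math.exp(sum(math.log(v) for v in values) / len(values))


consistent = potency_annotation([10.0, 40.0])     # geometric mean of 10 and 40
inconsistent = potency_annotation([5.0, 500.0])   # 100-fold spread
print(consistent, inconsistent)
```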

Targets

The set of 1687 ChEMBL targets (in the following referred to as targets) was used to assign targets to screening compounds. The subset of known cancer targets was determined. To this end, known cancer targets were collected from the Therapeutic Target Database,[34] and targets implicated in malignant neoplasm were identified on the basis of the ICD-10 code.[35] The 1687 ChEMBL targets were found to contain 429 cancer targets.

29 in total

1. Computational Exploration of Molecular Scaffolds in Medicinal Chemistry.

Authors: Ye Hu; Dagmar Stumpfe; Jürgen Bajorath
Journal: J Med Chem Date: 2016-02-03 Impact factor: 7.446

2. Molecular similarity in medicinal chemistry.

Authors: Gerald Maggiora; Martin Vogt; Dagmar Stumpfe; Jürgen Bajorath
Journal: J Med Chem Date: 2013-11-11 Impact factor: 7.446

3. Drug-target network.

Authors: Muhammed A Yildirim; Kwang-Il Goh; Michael E Cusick; Albert-László Barabási; Marc Vidal
Journal: Nat Biotechnol Date: 2007-10 Impact factor: 54.908

4. RECAP--retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry.

Authors: X Q Lewell; D B Judd; S P Watson; M M Hann
Journal: J Chem Inf Comput Sci Date: 1998 May-Jun

5. Systematic identification of scaffolds representing compounds active against individual targets and single or multiple target families.

Authors: Ye Hu; Jürgen Bajorath
Journal: J Chem Inf Model Date: 2013-02-05 Impact factor: 4.956

6. Phenotypic screening in cancer drug discovery - past, present and future.

Authors: John G Moffat; Joachim Rudolph; David Bailey
Journal: Nat Rev Drug Discov Date: 2014-07-18 Impact factor: 84.694

7. Many approved drugs have bioactive analogs with different target annotations.

Authors: Ye Hu; Eugen Lounkine; Jürgen Bajorath
Journal: AAPS J Date: 2014-05-29 Impact factor: 4.009

8. Target deconvolution techniques in modern phenotypic profiling.

Authors: Jiyoun Lee; Matthew Bogyo
Journal: Curr Opin Chem Biol Date: 2013-01-18 Impact factor: 8.822

9. Chemical informatics and target identification in a zebrafish phenotypic screen.

Authors: Christian Laggner; David Kokel; Vincent Setola; Alexandra Tolia; Henry Lin; John J Irwin; Michael J Keiser; Chung Yan J Cheung; Daniel L Minor; Bryan L Roth; Randall T Peterson; Brian K Shoichet
Journal: Nat Chem Biol Date: 2011-12-18 Impact factor: 15.040

10. PubChem BioAssay: 2017 update.

Authors: Yanli Wang; Stephen H Bryant; Tiejun Cheng; Jiyao Wang; Asta Gindulyte; Benjamin A Shoemaker; Paul A Thiessen; Siqian He; Jian Zhang
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971
