Literature DB >> 35508359

DRUG-seq Provides Unbiased Biological Activity Readouts for Neuroscience Drug Discovery.

Jingyao Li, Daniel J Ho, Martin Henault, Chian Yang, Marilisa Neri¹, Robin Ge, Steffen Renner¹, Leandra Mansur, Alicia Lindeman, Brian Kelly, Tayfun Tumkaya, Xiaoling Ke, Gilberto Soler-Llavina, Gopi Shanker, Carsten Russ, Marc Hild, Caroline Gubser Keller¹, Jeremy L Jenkins, Kathleen A Worringer, Frederic D Sigoillot, Robert J Ihry.

Abstract

Unbiased transcriptomic RNA-seq data has provided deep insights into biological processes. However, its impact in drug discovery has been narrow given high costs and low throughput. Proof-of-concept studies with Digital RNA with pertUrbation of Genes (DRUG)-seq demonstrated the potential to address this gap. We extended the DRUG-seq platform by subjecting it to rigorous testing and by adding an open-source analysis pipeline. The results demonstrate high reproducibility and ability to resolve the mechanism(s) of action for a diverse set of compounds. Furthermore, we demonstrate how this data can be incorporated into a drug discovery project aiming to develop therapeutics for schizophrenia using human stem cell-derived neurons. We identified both an on-target activation signature, induced by a set of chemically distinct positive allosteric modulators of the N-methyl-d-aspartate (NMDA) receptor, and independent off-target effects. Overall, the protocol and open-source analysis pipeline are a step toward industrializing RNA-seq for high-complexity transcriptomics studies performed at a saturating scale.

Entities: Chemical

Mesh：

Substances：
RNA

Year: 2022 PMID： 35508359 PMCID： PMC9207813 DOI： 10.1021/acschembio.1c00920

Source DB: PubMed Journal: ACS Chem Biol ISSN： 1554-8929 Impact factor: 4.634

Introduction

In the pharmaceutical industry, it is standard to test thousands of compounds in high-throughput screens to identify regulators of a target or a biological process.[1,2] This massive scale is made possible by focusing on a single readout. However, biological systems are inherently complex, and there is a need for scalable screening methods that can capture the total biological activity of small-molecule libraries. Whole-transcriptome analysis, by RNA-seq, offers a high-dimensional readout but is cost prohibitive and is typically performed on a small number of samples.[3,4] To reduce costs, targeted RNA-seq approaches such as L1000 or RASL-seq have been deployed successfully as large-scale transcriptomic profiling methods.[5,6] Targeted approaches such as these can be tailored to any gene set of interest, but it takes time to optimize gene sets for any given disease or cell model,[7] and this approach may miss unexpected and potentially important transcriptomic signatures. Single-cell RNA-seq can detect a few thousand genes per cell in an unbiased way. Recently, the sci-Plex method identified cell-specific transcriptional responses to hundreds of compounds by labeling the cells in each treated well with a single-stranded DNA barcode that binds to nuclei prior to a single-cell (sc)RNA-seq sample processing, which enables sample multiplexing.[8] It will be promising to apply methods like sci-Plex to study the single-cell effects of treatment in complex tissue models, such as brain organoids.[9] As cell diversity increases, 1000s of cells must be profiled to detect responses in lower abundance or rare cell types. For more homogeneous cell culture models, scRNA-seq plus perturbation methods may offer less of an advantage over bulk RNA-seq methods and cellular resolution comes at the expense of throughput of perturbations tested and the number of genes detected. Digital RNA with pertUrbation of Genes (DRUG)-seq is a low-cost, high-throughput bulk RNA-seq method that uses a direct in-well lysis of cells in 384-well plates and is ideal for studying the transcriptomic effect of many compound treatments in parallel.[10] Multiple groups have been working to develop unbiased whole transcriptome-wide methods that are lower cost.[11−13] The pace of development in this field makes it difficult to compare data or perform exhaustive benchmarking studies to compare the performance of the methods. As a key step toward this, we created an experimental design to thoroughly test the DRUG-seq platform and made the methods, data, and code available for independent verification and reproduction of the results. By exhaustively testing reproducibility across batches and plates, we demonstrate that DRUG-seq provides the granularity to bin compounds by the mechanism of action (MoA) and meets the performance standards required of an industry-scale RNA-seq platform. We also demonstrate how DRUG-seq can impact drug development projects with an example from a study designed to probe both the on- and off-target effects of novel N-methyl-d-aspartate (NMDA) receptor potentiators as therapeutics for schizophrenia.

Results

Summary of Protocol and Experimental Design

DRUG-seq pairs high-throughput cell culture with miniaturized RNA-seq (Figure A). Cells are first cultured in 384-well plates and then treated with compounds for a desired period of time. After the treatment, the cells are directly lysed in each well, without RNA purification. The RNA in each well is used as a template for reverse transcription (RT), where the DRUG-seq RT primers incorporate both the well barcodes and unique molecular identifiers (UMI) (Figure B). After RT, the samples are pooled and used as a template for second-strand cDNA synthesis and subsequent library construction. During library construction, Tn5-mediated cDNA tagmentation is performed, which enzymatically fragments and adds Illumina adaptors to each insert. Next plate-level barcodes are incorporated to track multiple plates simultaneously. Finally, for optimal sequencing, libraries are size-selected and quality checked by DNA fragment analysis. After quantification, libraries for each plate(s) are normalized and pooled, such that we can accurately target sequencing at a read depth equivalent to 1 million reads per well on the Illumina platform. The final product is 3′ biased; full-length transcripts are not sequenced. The protocol scale can be adapted depending on the experimental need and resources available.

Figure 1

Overview of the DRUG-seq protocol. (A) Once cells are plated and treated with perturbagens, the DRUG-seq protocol can be performed and sequenced in 5 days. The process and equipment depicted here is for batches of 18 or fewer plates. Scale up or down is possible depending upon infrastructure available. Echo 555 image—©2022 Beckman Coulter, Inc. Used with permission. Bravo image—©Agilent Technologies, Inc. Reproduced with Permission, Courtesy of Agilent Technologies, Inc. C1000 touch image—Courtesy of Bio-Rad Laboratories, Inc., © 2022. ViiA 7 image—ThermoFisher Scientific Inc. HiSeq. 4000 image—Used under license from Illumina, Inc. All Rights Reserved. (B) Depiction of the DRUG-seq high-throughput RNA-seq chemistry (adapted from Ye et al., 2018). Day 1: the scale of DRUG-seq is enabled using a direct in-well lysis of cells without mRNA purification, which is coupled to a barcoding and sequencing strategy that tracks transcripts (UMI), wells (well barcode) on day 2. On day 3, library construction incorporates plate barcodes and generates cDNA libraries (illumina Nextera or custom Illumina indexes) to be quantified and sequenced on the illumina platform (days 4–5). We designed an experiment to evaluate the reproducibility of the DRUG-seq protocol (Figure A). We generated three independent batches of U-2 OS cells on different days, plated in three replicate 384-well plates per batch for a total of 9 × 384-well plates. In each plate, we treated cells with 14 compounds (Figure B, File S1, Table ), each with an eight-point dose response (3.2 nM to 10 μM) with three replicates for each dose. This design was used to test the performance between the batches of cells and plate(s).

Figure 2

DRUG-seq reproducibility experimental design. (A) Experimental design depicted by nine plate maps. Within each plate, 14 compounds were plated in 8 doses (3.2 nM to 10 μM) in 3 sectors for a total of 3 replicates per condition per plate (n = 3 per condition). Each sector contained 8 DMSO wells for a total of 24 per plate (n = 24). For each week (or batch) of cells, there were three replicate plates. Each batch of cells was plated on independent days to reflect biological variability. (B) Table depicts compound name, identifier, target, and cluster from Ye et al., 2018 publication. Fourteen compounds were selected to represent a diverse set of MoAs.

Table 1

Fourteen Compounds

compound	NSN identifier	target	Ye et al., 2018
triptolide	ND-09-QY33	ERCC3i	IV
CHIR118637	CA-90-VK59	GSK3Bi
Cmp_341	JC-68-AL90	JAK2i	IV
fedratinib	MB-03-IE36	JAK2i	III
NVS–SM2	JC-43-ZO95	SNRPCmod	II
CPI-203	PA-33-CK45	BRD4i	III
brutasol	KA-73-NB69	NRF2i	VI
homoharringtonine	QC-05-UB63	EIF4Ei	VI
BTdCPU	SE-15-AV21	EIF2AK1a	VI
AZD8055	NV-67-DX31	MTORi	VI
NVP-BVB808/Cmp_334	DB-85-YA47	PI3K	I
dilazep	NA-37-JQ34	SLC29A2i
BLU9931	VA-76-OV33	FGFR4	IV
(S)-crizotinib	LD-22-SA99	ALK	IV

DRUG-seq Analysis Pipeline and Activity Threshold

The data processing and primary analysis pipeline is a series of manually run steps, which allow for stepwise review and quality control. After sequencing, reads are demultiplexed using Illumina’s standard bcl2fastq2 method. The upstream DRUG-seq data processing employs highly parallelized mapping, as well as barcode and UMI counting and filtering to efficiently generate count tables (Figure A). Mapping of sequencing reads is performed using a custom version of the Ensembl GRCh38 reference, in which each gene was constructed using all annotated exons.

Figure 3

DRUG-seq analysis pipeline. The DRUG-seq analysis pipeline is composed of three steps (A) outline of the steps required to convert raw next-generation sequencing data into a count matrix by converting to fastq files, aligning to the transcriptome and counting transcripts associated with well and plate barcodes. (B) Flow chart describing how differential expression (DE) results are obtained from the UMI count matrix generated in part A. In step 1, the true null is calculated by performing 500 random DMSO to DMSO comparisons to generate DE results. Next, RSA analysis is used to rank DMSO wells by the number of DE genes they contributed to. Bad DMSO wells are removed and the nine best DMSO wells are used as reference controls (RCs) to calculate a differential expression for the compound-treated wells. (C) Flow chart describing the filtering steps required for secondary analysis. Step 2 of the true null calculation is performed and 500 random DMSO to DMSO comparisons generate DE gene results in the absence of bad wells. The 95th percentile of the 500-comparison distribution is then used as an activity threshold. The threshold is the minimal number of DE genes required to be greater than the technical noise in DMSO 95% of the time. The true null activity threshold is used to filter active samples. DE gene filtering, normalization, and removal of batch effects are applied to generate the final UMAP visualization. The following steps of the analysis code are shared through GitHub (https://github.com/Novartis/DRUG-seq). Before generating differential expression (DE) results, an activity threshold is empirically determined based on the technical noise of the experiment and is coined the “true null calculation.” This takes advantage of the high number of DMSO wells that can be sampled in this low-cost RNA-seq assay. The baseline for transcriptionally active treatments is set by performing multiple permutations of randomly sampled DMSO vs DMSO comparisons using all available wells (step one). In step 1, for each batch of three plates, three random DMSO wells were selected (one per plate) and were compared to the remaining 69 DMSO wells to then calculate DE. This process was iterated 500 times (Figures B and 4A, File S2). Out of the 500 iterations, we identified both DMSO wells that contributed to the fewest DE genes and DMSO wells that contributed to the most DE genes using the redundant siRNA activity (RSA) statistic.[14] The nine DMSO wells per batch (three per plate) that contributed to the fewest DE genes were then used as reference controls to calculate DE for the compound-treated samples (Figure B). It should be noted that the number of iterations, as well as the number of DMSO wells chosen for an optimal reference control, can be customized for each experiment. Next, we carry out DE analysis using limma-trend[15] to quantitate gene-level changes in the transcriptome, which can serve as the input for additional analyses (Figure B).

Figure 4

DRUG-seq activity threshold set by the true null calculation. (A) Left panel depicts the typical analysis of a DRUG-seq experiment using compounds with DMSO as a control. When setting a contrast for DE analysis, three replicates of a standard active (SA) sample (compound plus dose, colored black) are compared to the DMSO reference controls (RC, colored gold). Each compound treatment has 3 SA well replicates and 24 RC or DMSO well replicates per plate. The middle panel depicts step 1 of the true null calculation. For this, the number of differentially expressed genes is quantified when comparing three DMSO well replicates as mock treatments (DMSO/RC turned SA wells, green) relative to the remaining 69 DMSO/RC (gold) wells per batch of three plates. Five hundred randomly chosen differential expression comparisons of 3 DMSO versus the remaining 69 DMSO are performed. Next outlier DMSO wells (colored red) and the best DMSO wells (colored blue) are identified using the redundant siRNA activity (RSA) statistical ranking analysis. The right panel depicts step 2 of the true null calculation. Five hundred random DMSO to DMSO differential expression comparisons are recalculated, this time in the absence of the bad DMSO wells with the nine best DMSO as the RC. (B) Histogram shows the frequency of the number of DE genes per comparison of randomized DMSO SA to DMSO RC comparisons. In step 1, all DMSO wells are compared (top), and in step 2, the bad DMSO wells are removed and the randomly chosen DMSO wells are compared to the best nine DMSO wells (bottom). Y-axis is the frequency of the 500 DMSO comparisons, across 3 batches, binned by the number of differentially expressed genes on the X-axis. (C) Bar graph plots the number of DE genes (y-axis) against the percentile from the distribution of 500 randomized DMSO to DMSO comparisons per batch of three plates. The left panel depicts true null calculated using all DMSO wells (step 1) and selected a mean threshold of 221 differentially expressed genes (DEG) at a 95th percentile. Depicted on the right panel is the true null calculation using removing the outlier DMSO wells across three plates (mean 84 DEG at the 95th percentile). The result is interpreted as only 5% of the time DEGs in DMSO are above the noise detected by comparing DMSO treatments. Removing the outlier DMSO wells for analysis lowers the threshold per batch. Light blue is 1407 DEGs and dark blue is 1 DEG. In step 2, DMSO wells that potentially inflate the number of DEGs were removed (“bad” wells marked red, Figures C and 4A). The true null was recalculated by comparing three random DMSO wells versus the nine best DMSO wells with 500 iterations, where we selected the three random wells from the remaining DMSO wells (Figures C and 4A, File S3). When comparing the results from steps 1 and 2, the frequency of DMSO to DMSO comparisons yielding more than 100 DE genes was reduced with reciprocal gains in the frequency of DMSO comparisons with 20 or fewer DE genes (Figure B). We typically define transcriptionally active compounds as those with more DE genes than the 95th percentile of the DMSO to DMSO true null distribution (Figures C and 4C). This indicates that the treatment is more active than DMSO, 95% of the time. Results obtained with all DMSO wells (Step 1: active >221 DE genes) have a higher number of DE genes (Figure C). Results obtained by removing outlier DMSO wells (Step 2: active >84 DE genes), lowered the minimal number of DE genes to be considered active (Figure C). The step 2 true null threshold to filter active samples for the results is discussed below. The secondary analysis pipeline integrates count table data and experimental metadata for quality control and batch correction analyses. In addition to the true null (step two), gene-level thresholds are applied to reduce technical variation. Dimensionality reduction analysis by Uniform Manifold Approximation and Projection (UMAP)[16] is then performed to globally visualize the data.

Batch and Plate Reproducibility Experiment with 14 Tool Compounds

After using the true null threshold to select for active treatments, we used the secondary analysis pipeline and generated a UMAP from the DRUG-seq UMI counts matrix to visualize the global relationship between the 14-compound treatments across an active dose range (Figure A). After labeling the UMAP by either Louvain cluster number[17] or compound treatment, it was evident that DRUG-seq identified many transcriptional groups. The 13 Louvain clusters were spatially distinct in the UMAP plot and often displayed substructures in the UMAP projection. We next examined how each compound was distributed across the UMAP and identified that the majority (12 of 14) of compounds were localized in a compound-specific cluster (Figure B, File S4, interactive plot). Dilazep and (S)-crizotinib were the exception, as these two compounds clustered near DMSO, suggesting low activity in the U2-OS cell line. Some compounds exhibited a single cluster at a certain dose range like BLU9931, CPI-203, CHIR118637, and Cmp_341. For most compounds, dose influenced the clustering and we observed dose-specific clusters for triptolide, homoharringtonine, brusatol, NVS–SM2, fedratinib, Cmp_334, and AZD8055. These compounds may exhibit polypharmacology, and the targets they engage either change and/or interact across doses. However, validation would be required to confirm these multitargeted transcriptional activities. We observed coclustering of compounds with related MoAs. For example, the translational inhibitors homoharringtonine and brusatol exhibited matched dose-dependent clustering, which reflects similar potencies and targets. Cmp_334, a PI3K inhibitor, and AZD8055, an mTOR inhibitor, exhibit a shared cluster at low doses that diverge with increasing concentration. This is not surprising given that PI3K is the upstream of mTOR,[18] and perhaps both compounds have selective effects at the lower doses, while a broader range of scaffold-related activity is induced at higher doses. In addition to this descriptive analysis, we quantified cluster composition across batches. We used the Louvain method to define 13 clusters and generated a heatmap to indicate the proportion of each active compound treatment within the Louvain clusters per each batch (Figure C). Each compound exhibits an enrichment in a specific Louvain cluster, and the same result was observed across the three batches of cells. This indicates high reproducibility despite both technical and biological variation. Overall, these results indicate that DRUG-seq has sufficient resolution to group compounds by MoA, and because it is target agnostic, it can detect many MoAs in a single assay.

Figure 5

Reproducibility of DRUG-seq data using a set of diverse MoA compounds. (A, B) UMAP plots depicting a dimensionality reduced 2D transcriptome using DRUG-seq data colored by Louvain cluster number for (A) and colored by compound ID for (B). The 14 compounds represent a diverse set of MoAs and only active treatments above the true null threshold are plotted and form distinct clusters. Brusatol and Homoharringtonine are both translation inhibitors and cocluster. DMSO wells are included for an activity reference. Compounds with subtle effects cluster near DMSO. Each dot represents a single well for each treatment. The size of dots is scaled to represent doses from 0 to 10 μM. On the left and right panels, each compound or Louvain cluster, respectively, is labeled by text in the color of the corresponding dots. (C) Heatmap indicates the proportion of each compound-treated sample within Louvain clusters 1–13. Most compounds have a predominant composition in a single or a few Louvain clusters. Cluster 12 includes DMSO and has a mixed composition of many compound and dose combinations of weaker effects. Results across three independent batches of cells show a similar composition indicating reproducibility despite biological and technical variation. Proportion for each cluster is indicated with a color (scale 0 = white and 1 = dark blue). Next, the reproducibility of single compounds across doses was examined. Comparing a single dose of homoharringtonine at 10 μM across all three batches revealed an average overlap of 68%, which is close to the expected overlap with a false discovery rate threshold of 0.1 (Figure A). Homoharringtonine exhibited a dose response and high overlap of DEGs across consecutive doses (Figure B,C). Within a batch, adjacent doses 1, 3.16, and 10 μM exhibited an average of 82% overlap in differentially expressed genes (Figure C). We performed a pairwise comparison across all doses and identified a high Pearson’s correlation between samples within a batch. Higher doses and adjacent doses exhibited the highest correlations (0.8–0.98) (Figure D). Overall, these studies represent a high bar for vetting the stability of a platform and will allow for comparisons with emerging technologies. We deposited the NGS data, metadata, and analysis code for this study to enable other teams to reproduce the transcriptome signatures and to use in additional benchmarking studies.

Figure 6

Dose response and batch reproducibility of Homoharringtonine. (A) Venn diagram showing overlapping differentially expressed genes across batches 1–3 at 10 μM dose. (B) Volcano plots depicting dose response for homoharringtonine. Y-axis adjusted p-value, X-axis log2(FC). Red = upregulated genes in 10 μM, Blue = downregulated genes in 10 μM, and n.s. = not significant in 10 μM (significant = adj.p<0.5 +/- 1 log2FC). (C) Venn diagram depicting the overlap of differentially expressed genes between adjacent doses 1–10 μM. (D) Pair plot comparing all doses. The top right half indicates Pearson’s correlation between samples. Color scale from white = 0 to red = 1. Bottom left scatter plots pairwise compare log2(FC) in gene expression from conditions labeled on the top and right edges. The dashed lines label log2(FC) threshold equal to 1. Red indicates common DEGs across conditions. Blue are DEGs specific to condition on the y-axis and green are DEGs the x-axis for each comparison. The histogram on the diagonal depicts the distribution of gene expression for each condition.

DRUG-seq Detects On- and Off-Target Effects for NR2A Drug Discovery Program

To demonstrate the applicability of DRUG-seq, we describe an example from a neuroscience drug discovery program aimed at developing therapeutics for schizophrenia (Figure ). Schizophrenia afflicts approximately 1% of the population.[19] NMDA receptor (NMDAR) hypofunction is implicated in schizophrenia,[20,21] and the gene encoding the NR2A subunit, GRIN2A, is associated with schizophrenia risk, as evidenced by genome- and transcriptome-wide association studies and rare de novo mutations, all of which can modify disease risk.[22−25] The NMDA receptor is heterotetrameric with many subunits, and among the possible combinations, it can be composed of two dimers of GRIN2A and GRIN1[26] (Figure A). The NMDA receptor functions as an ion channel that opens in the presence of its ligands, glutamate, glycine/D-serine, and NMDA.[27] Furthermore, the channel can be inhibited by phencyclidine (PCP), ketamine, dizocilpine (MK-801), and zinc.[27] Screening campaigns identified two independent chemical scaffolds (hit series 1 and 2) as novel NR2A potentiators. Examination of the structure of one of these scaffolds, NR2AS1-1, indicated a potential to chelate zinc (Figure A). Although this could be part of the on-target MoA by removing zinc from the extracellular surface of NMDAR to potentiate receptor activity,[28,29] it also presents potentially undesirable pharmacology by chelating zinc, an essential ion with many biological functions.[30−32]

Figure 7

DRUG-seq detects on- and off-target effects for NR2A Drug Discovery Program. (A) Schematic depicting an idealized heterotetrameric NMDA receptor composed of two dimers of GRIN2A and GRIN1. The presence of ligands glycine/D-serine and glutamate trigger the release of ions. Zinc ions bind and inhibit channel function (red). The chemical structure of NR2AS1-1 revealed a potential to chelate zinc (Zn2+). (B) Experimental paradigm combines human Ngn2 neurons with both an on-target NMDA positive allosteric modulator (PAM) and off-target zinc chelators. Neurons were treated for 6 h in the absence or presence of ligands. (C) Volcano plots depicting ligand-dependent differential expression of genes induced by the NMDA PAM GNE-0723. Y-axis −log 10 (adjusted p-value); X-axis log 2 (fold change). Red = upregulated differential expressed genes in 10 μM, blue = downregulated differential expressed genes in 10 μM, and n.s. = not differential expressed genes in 10 μM (differential expression = adj.p<0.1 and abs(log2FC)>1). (D) Pair plot comparing tool compounds (GNE-0723 and ZX1) to both NR2A hit series 1 and 2. The top right half indicates Pearson’s correlation between samples. Color scale is blue = −1, white = 0, and red = 1. Bottom left scatter plots pairwise comparison of log2(FC) in gene expression from conditions labeled on the top and right edges. The dashed lines label log2(FC) threshold equal to 1. Red indicates common DEGs across conditions. Blue shows DEG specific to condition on the y-axis and green is DEG the x-axis for each comparison. The histogram on the diagonal depicts the distribution of gene expression for each condition. (E) UMAP projection of the DRUG-seq transcriptome for active compound treatments without ligands. NR2AS1-1 clusters with the zinc chelators TPEN and TPA. Other compounds from hit series 1 and GNE-0723 do not cluster with zinc chelators. Compounds from hit series 2 are very active without ligands and exhibit a second off-target effect caused by an unknown mechanism. To study both on- and off-target effects, we performed DRUG-seq profiling of human pluripotent stem cell (hPSC)-derived neurons treated with our two internal hit series of NR2A potentiators (Figure B and Table ). We compared these to neurons treated with the extracellular zinc chelator ZX1, the cell-permeant zinc chelators TPEN and TPA,[33−35] and GNE-0723, a selective NR2A NMDAR positive allosteric modulator.[36] In the presence of NMDAR ligands (NMDA and D-serine) alone, we saw no DE genes (Figure C). In iPSC-derived neurons,[37] we observed a transcriptional response to GNE-0723 that was enhanced by ligands (Figure C). In the presence of ligands, we detected a similar NMDAR activation signature in neurons treated with five additional compounds (NR2AS1-1, -2, and -3; NR2AS2-1 and -2) from the two independent potentiator hit series scaffolds (Figure D), with correlations ranging from 0.736 to 0.834. As expected, the extracellular zinc chelator ZX1 produced a similar signature to the NR2A potentiators, presumably by inhibiting the repressive effect of extracellular zinc binding on NMDAR signaling (Figure D). Cumulatively, we identified seven compounds that induced an NMDAR activation signature from four distinct chemotypes.

Table 2

NR2A On- and Off-Target Compounds

ID	InChIKey	description
NR2A_DRUG-PC1	DFBIRQPKNDILPW-KTGKZQHOSA-N	triptolide—Drug-Seq control
NR2A_DRUG-PC3		Drug-Seq control
TPA	VGUWFGWZSVLROP-UHFFFAOYSA-N	TPA
TPEN	CVRXLMUYFMERMJ-UHFFFAOYSA-N	TPEN
ZX1	AXBINWBONNNKDF-UHFFFAOYSA-N	ZX1
NR2AS1-1	YTSDVHCTYSWFSK-UHFFFAOYSA-N	hit series 1
NR2AS1-2		hit series 1
NR2AS1-3		hit series 1
NR2AS1-4		hit series 1
NR2AS1-5		hit series 1
NR2AS1-6		hit series 1
NR2AS2-1		hit series 2
NR2AS2-2		hit series 2
NR2AS2-3		hit series 2
NR2AS2-4		hit series 2
NR2AS2-5		hit series 2
NR2AS2-6		hit series 2
NR2AS2-7		hit series 2
GNE-0723	FTIBNGABJNFFAI-SVRRBLITSA-N	GNE NR2A PAM

In the absence of ligands, ZX1 and GNE-0723 produced a minimal change in gene expression, thus indicating that these compounds selectively affect the NR2A signaling pathway in vitro (Figure C–E). However, members of hit series 1 and 2 had two distinct off-target signatures in the absence of ligands (Figure E). Unlike ZX1 and GNE-0723, which produced few transcriptional effects without ligands, NR2AS1-1 produced both an on- and off-target signature at the same dose (10 μM), in the presence and absence of ligands, respectively. Without ligands, NR2AS1-1, of hit series 1, clustered with both TPEN and TPA (Figure E), indicating that NR2AS1-1 induced off-target effects similar to known zinc chelators. NR2AS1-2 and -3 of the hit series 1 compound were able to produce an NMDA activation signature with ligands, but without ligands had subtle effects that did not overlap with TPEN and TPA (Figure D,E). Additionally, NR2AS2-1, -2, -4, and -5 of series 2, without ligands present, formed a distinct cluster and induced a significant number of DEGs (Figure E). This indicates that these compounds have a common but unknown, off-target that changes gene expression in human neurons. In a single assay, DRUG-seq discerned NR2A on-target effects and two distinct off-target effects, zinc chelation and an undefined activity produced by hits from series 2. This example highlights a key feature of DRUG-seq—the ability to look beyond the expected to identify additional biological activities early in the process of drug discovery. DRUG-seq helped gain a deeper picture of the total biological activity of compounds from both hit series and contextualized them relative to additional tool compounds. Overall, we envision that DRUG-seq can be incorporated into drug discovery projects at early stages to help prioritize potential therapeutic candidates.

Discussion

DRUG-seq is a target-agnostic high-throughput screening method with a transcriptome readout, and it can be broadly applied to new cellular models without redesign of the approach or a priori assumptions about key genes or pathways that will be measured. It is well suited for high-complexity RNA-seq studies in which many variables and perturbations are tested, such as the dose and length of treatment. DRUG-seq is a bulk RNA-seq readout, and, as such, is best applied to cell cultures with moderate to low heterogeneity. The bandwidth of DRUG-seq accommodates the profiling of a chemical series of related or unrelated chemotypes with different potencies and known or unknown on- and off-target activities. The total cost of DRUG-seq is $3–10 per condition including triplicates at a read depth of 0.25–1 million reads per well, respectively. This makes it possible to screen thousands of conditions while still providing a high-dimensional readout with greater than 7000 genes. The resulting high-dimensional data can be used to group compounds by MoA, conduct user-defined signature queries, or search for compounds that may reverse disease signatures. Although the work reported here describes the usage of compounds, one can leverage DRUG-seq profiling for other perturbagens, depending on the question and biological model. The low cost allowed us to systematically test both the technical and biological variability across plates and batches of cells. By standardizing the experimental design, performance metrics can be tracked long term across many experiments. Using this information, we set statistically defined thresholds to determine the activity of treatments tested in a DRUG-seq experiment. The true null threshold allows us to pick treatments that are statistically defined as active and provides a minimal range of DE genes that we can trust to produce a reliable signature of expression. We also demonstrated that the results were highly reproducible across plates and batches of cells. The open-source analysis pipeline and available data will facilitate future analytical improvements and lower the barrier for new labs to adopt the platform. By deploying transcriptomics at scale, we stand to gain biological insights beyond a single target or pathway. With the selection of a set of compounds with diverse MoAs, we demonstrated the granularity of DRUG-seq to discern specific MoAs, dose responses, and dose-dependent polypharmacology. This would be easy to miss if only a single or a few doses were tested. DRUG-seq is unbiased, as a selection of panels of genes is not required, and it quantifies 5–10x more transcripts than L1000 or other targeted amplicon approaches.[5,6] The wide range of activity detected by DRUG-seq allows it to detect expected and unexpected biological responses. Being blind to the latter has likely contributed to the failure of many drugs at various stages of discovery and development. Furthermore, the dose-dependent detection of on- and off-target phenotypes and the switch between phenotypes can be used to determine potencies for each effect[38] and potentially quantify selectivity and safety windows in the absence of dedicated assays. Lastly, we show how DRUG-seq can impact a neuroscience drug development project. DRUG-seq was successfully used to compare NMDAR potentiators and zinc chelators. Not only did we detect on-target NMDAR activity signatures from two independent internal hit series, but we also uncovered that some representatives of each hit series had unique off-target effects. We were aware that series 1 had the potential to chelate zinc and confirmed this by detecting similarity to a signature induced by known chelators TPA and TPEN. In addition, we demonstrated that series 2 also had significant transcriptional effects not related to NMDAR and were likely caused by an undefined off-target effect. These results demonstrate that a high-dimensional unbiased transcriptomic readout has the potential to improve the efficiency of the drug development process to save both time and resources.

Materials and Methods

Cell Culture

U2-OS (ATCC HTB-96) was grown in DMEM, 10% FBS, and 1% Pen/Strep. The sufficient number of cells was grown prior to trypsin dissociation, the day of plating. Twenty microliters of cells were dispensed into 384-well black uClear polystyrene cell culture-treated plates (Griener, Cat#: 781090) using a bottle valve washer/dispenser from the Genomics Institute of the Novartis Research Foundation (GNF, http://www.gnfsystems.com) with a concentration of 5000 cells per well, a day prior to compound treatment. The GNF system is critical for large-scale experiments, but other standard plate wash/dispense equipment or multichannel pipettes will suffice for a smaller scale. Density optimization is required for each cell line for optimal downstream steps. H9-hESCs were grown and expanded in mTESR media on hESC-qualified Matrigel. Ngn2 neurons were generated by exposing transgenic H9-hESCs, harboring a dox-inducible Ngn2 gene that was stably integrated on a piggyBac transposon, to doxycycline (1.9 μg/mL) for 3 days in DMEM/F-12 with Glutamax 95%, Pen/Strep 1% and N2 1%. Immature neurons at day 3 were dissociated using accutase, frozen in CryoStor CS5, and stored in liquid nitrogen. Thawed Ngn2 neurons were then replated in matrigel-coated 384-well plates at a density of 12 000 per well in 80 μL of media (DMEM/F-12 Glutamax 95%, B27 2%, Pen/Strep 1%, N2 1%, NT3 9.5 ng/mL, 3.8 ng/mL BDNF) with doxycycline (1.9 μg/mL). The Ngn2 neurons were hemifed every other day until day 14 of differentiation, at which point the neurons were treated with compounds for 6 h prior to lysis for DRUG-seq.

Day 1–2: Compound Treatment and Lysis

One day after plating, 20 nL of each compound was added using an acoustic dispense Echo 555 Liquid Handler (Labcyte). After a 6 h (Ngn2) or 24 h (U2-OS) of compound treatment, the media was aspirated down to 7.5 μL and an equal amount of 2× lysis buffer was added to all wells using the bottle valve washer/dispenser from GNF (U2-OS plates were sealed and placed on a microplate shaker HT-91002 (BigBear automation) for 4 min at 900 rpm). Lysis duration is cell type and density-dependent and requires optimization. Ngn2 neurons plates were lysed for 8 min. Plates were then centrifuged at 2000 rpm for 1 min before storage at −80 C until ready for further sample processing. Plates can be stored in −80 C for up to 3–4 weeks, after which RNA quality may begin to deteriorate.

Day 3: Reverse Transcription and Library Construction

On the day of sample processing, the assay plates were placed on ice until thawed. In an Armadillo PCR plate (ThermoFisher, AB2384), 2.75 μL of RT mix was added using a Multidrop Combi (ThermoFisher, 5840300). Once the assay plates were thawed, they were centrifuged at 2000 rpm for 1 min and then 15 μL of cell lysate was transferred using a Bravo (Agilent Technologies) into the plate containing RT mix. The assay plates are then centrifuged again at 2000 rpm for 1 min and 10 nL of 1 μM barcoded DRUG-seq RT primers were then dispensed into each well using an Echo 555 Liquid Handler (Labcyte). Plates were sealed (Bio-Rad, MSB1001), centrifuged for 1 min at 2000 rpm, and incubated at 42 C on a ProFlex PCR system (ThermoFisher, 4484077) for 2 h. After RT, samples were centrifuged for 1 min at 2000 rpm. Next, each individual plate was pooled into a reagent reservoir (ThermoFisher, 1064-05-7) using a Bravo Automated Liquid Handling Platform (Agilent). Samples were then transferred from the reservoir into a 50 mL conical tube, purified and concentrated using the DNA clean and concentrator-100 kit (Zymo Research, Cat#: D4030), and eluted in 150 μL of water. Due to the high volume after the addition of DNA binding buffer, samples were run three times through the same DNA clean and concentrator filter before elution. We further purified the materials eluted from the columns by adding 150 μL (1:1) of AMPure beads RNA clean XP (Beckman coulter, A63987) and incubating for 5 min. The bound beads were pelleted with a magnet and washed twice for 30 s with enough freshly made 80% ethanol to submerge the beads. After removal of ethanol, the beads were allowed to dry completely before eluting with 32 μL of water. To remove single-stranded DNA and excess nucleotides, exonuclease I (Exol) treatment was performed on all samples by adding 4 μL of ExoI buffer and 4 μL of ExoI (New England Biolabs, M0293L). Samples were incubated at 37 C for 30 min, heat inactivated at 85 C for 15 min, and held at 4 C. cDNA was then amplified by adding 50 μL 2X Kapa HIFI PCR ReadyMix (Kapa Biosystems, KK2602), 10 μL of the 10 μM DRUG-seq PCR primer (File S5), and then running the following program: cDNA samples were purified using the Agencourt RNA clean beads as described above but eluted with 11 μL of water. We ran 1 μL on a Bioanalyzer (Agilent, G2939BA) with a DNA high-sensitivity chip (Agilent) or a Fragment Analyzer (Agilent, M5310AA). We expect to see a wide range of fragment sizes, as represented in the figure below. Preamp abundance will be determined by cell type, but for U2-OS cells, we generally observe quantities of 1–5 ng/μL. Sizes range from 200 to 6000 (representative below).

Day 4: Tagmentation, Purification, and Quantification of DRUG-seq Libraries

For tagmentation, 5 μL of preamp material, measured by the fragment analyzer, was mixed with nuclease-free water to a final volume of 20 μL. The 20 μL of preamp was then mixed with 25 μL of TD buffer and 5 μL of TDE1 buffer (Nextera kit, FC-131-1096, Illumina) and incubated for 5 min at 55 C and held at 10 C. Tagmented DNA was purified with the Qiagen MinElute PCR Purification Kit (Qiagen, Cat#: 28004) and eluted with 25 μL of nuclease-free water. Each 25 μL sample is then PCR amplified using 15 μL of NPM (Nextera XT DNA library preparation kit, FC-131-1024/FC-131-1096), 5 μL of DRUG-seq_p5_PCR primer (5 μM), and 5 μL DRUG-seq indexing primer (Table S4). The PCR cycles were as follows: The amplicons were then purified using the Agencourt RNA clean beads as described above but this time eluted with 20 μL of nuclease-free water. The samples were then size-selected for 200–600 bp fragments using a PippinHT 2% agarose precast gel cassette (Sage Science, HTC2010). One microliter of the samples from the PippinHT was analyzed on a Fragment Analyzer (Agilent) using a DNF-474 High-Sensitivity NGS Fragment Kit 1-6000bp (Agilent, DNF-474-0500). We generally observe approximately 5–10 ng/μL, averaging a size of 300 bp. To quantify the libraries, qPCR was performed using Kapa library quantification kit for Illumina (KAPA #KK4824 Roche #07960140001). Following the Kapa kit manual, a premix of 5 mL Kapa SYBR FAST qPCR Master Mix, 1 mL Illumina primer premix, and 200 μL ROX low was combined, and the libraries were diluted 1:20 000 with nuclease-free water. The diluted libraries as well as the six standards provided in each kit were plated in triplicate using 4 μL per well. The reagent premix was plated using 6 μL per well. The plate was sealed and run on a QuantStudio 12k Flex with the following cycling: qPCR data analysis was performed using automatic threshold and baseline settings in QuantStudio software. The quantities calculated by the software were then dilution corrected by multiplying by the dilution factor above and size corrected following the Kapa kit manual. The libraries were normalized and pooled based on the qPCR quantities. On the following day, library denature was made and sequencing was performed on Illumina’s HiSeq. 4000 utilizing a custom Read 1 primer (following manufacturer’s protocol). Primer sequences (seeFile S5) Template switching primer AAGCAGTGGTATCAACGCAGAGTGAATrGrGrG DRUG-seq Barcoded RT primers: AAGCAGTGGTATCAACGCAGAGTACAACAAGGTACNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTV DRUG-seq PCR primer AAGCAGTGGTATCAACGCAGAGT DRUG-seq_p5_PCR primer AATGATACGGCGACCACCGAGATCTACACGCCTGTCCGCGGAAGCAGTGGTATCAACGCAGAGT*A*C DRUG-seq indexing CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTCTCGTGGGCTCGG DRUG-seq custom read 1 primer GCCTGTCCGCGGAAGCAGTGGTATCAACGCAGAGTAC

Code and Data

The data analysis pipeline starts at UMI counts the matrix, which is shown in Figure A. We first used true null calculation to find the bad and best DMSOs and then filtered out the bad DMSO for the following differential expression gene analysis. Before conducting differential expression gene analysis, we applied 75% percentile gene-level filter across all compounds treated wells and then calculated the fold change using limma-trend package. The differential expressed genes are defined as effect size is greater than 2-fold and adjusted p-value after the multiple comparisons is less than 0.1. In addition, we also generated the UMAP plot to show the global transcriptomic information of the most active compounds. We applied second true null to selected best DMSOs for investigating the technical noise. We chose 95% percentile as the number of DEGs threshold for active compounds selection and used all differentially expressed genes in the differential gene expression analysis. The batch effect was removed and applied to the plate level before generating the UMAP plot. The NR2A study was designed differently from the 14 compounds. This study had a small number of compounds to treat in the absence or presence of ligands. All replicates are on a single plate for the NR2A experiment. In addition, the DESeq2 R package was applied to calculate the DEGs. Raw data and processed data: GSE176150 Github: https://github.com/Novartis/DRUG-seq

35 in total

DRUG-seq Provides Unbiased Biological Activity Readouts for Neuroscience Drug Discovery.

Introduction

Results

Summary of Protocol and Experimental Design

DRUG-seq Analysis Pipeline and Activity Threshold

Batch and Plate Reproducibility Experiment with 14 Tool Compounds

DRUG-seq Detects On- and Off-Target Effects for NR2A Drug Discovery Program

Discussion

Materials and Methods

Cell Culture

Day 1–2: Compound Treatment and Lysis

Day 3: Reverse Transcription and Library Construction

Day 4: Tagmentation, Purification, and Quantification of DRUG-seq Libraries

Code and Data

1. A probability-based approach for the analysis of large-scale RNAi screens.

2. Using transcriptomics to guide lead optimization in drug discovery projects: Lessons learned from the QSTAR project.

3. Dimensionality reduction for visualizing single-cell data using UMAP.

Review 4. NMDA receptor subunit diversity: impact on receptor properties, synaptic plasticity and disease.

5. De Novo Mutations and Rare Variants Occurring in NMDA Receptors.

Review 6. Organogenesis in a dish: modeling development and disease using organoid technologies.

7. PLATE-Seq for genome-wide regulatory network analysis of high-throughput screens.

Review 8. Zinc as a Neuromodulator in the Central Nervous System with a Focus on the Olfactory Bulb.

9. Comprehensive functional annotation of susceptibility SNPs prioritized 10 genes for schizophrenia.

10. Efficient Generation of Transcriptomic Profiles by Random Composite Measurements.