Marco Jost, Daniel A Santos, Reuben A Saunders, Max A Horlbeck, John S Hawkins, Sonia M Scaria, Thomas M Norman, Jeffrey A Hussmann, Christina R Liem, Carol A Gross, Jonathan S Weissman.
Abstract
A lack of tools to precisely control gene expression has limited our ability to evaluate relationships between expression levels and phenotypes. Here, we describe an approach to titrate expression of human genes using CRISPR interference and series of single-guide RNAs (sgRNAs) with systematically modulated activities. We used large-scale measurements across multiple cell models to characterize activities of sgRNAs containing mismatches to their target sites and derived rules governing mismatched sgRNA activity using deep learning. These rules enabled us to synthesize a compact sgRNA library to titrate expression of ~2,400 genes essential for robust cell growth and to construct an in silico sgRNA library spanning the human genome. Staging cells along a continuum of gene expression levels combined with single-cell RNA-seq readout revealed sharp transitions in cellular behaviors at gene-specific expression thresholds. Our work provides a general tool to control gene expression, with applications ranging from tuning biochemical pathways to identifying suppressors for diseases of dysregulated gene expression.
Year: 2020 PMID: 31932729 PMCID: PMC7065968 DOI: 10.1038/s41587-019-0387-5
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1. Mismatched sgRNAs titrate GFP expression at the single-cell level. (a) Experimental design to test knockdown conferred by all mismatched variants of a GFP-targeting sgRNA. (b) Distributions of GFP levels in cells with a perfectly matched sgRNA (top), mismatched sgRNAs (middle), and a non-targeting control sgRNA (bottom). Sequences of sgRNAs are indicated on the right (without the PAM). (c) Relative activities of all mismatched sgRNAs, defined as the ratio of fold-knockdown conferred by a mismatched sgRNA to fold-knockdown conferred by the perfectly matched sgRNA. Data represent mean relative activities obtained from two replicate transductions.
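The relative-activity definition in Figure 1c can be written out as a short calculation. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and it assumes fold-knockdown values have already been computed for each replicate transduction (how fold-knockdown is derived from GFP levels is not specified in the legend).

```python
from statistics import mean


def relative_activity(fold_kd_mismatched, fold_kd_perfect):
    """Ratio of the fold-knockdown of a mismatched sgRNA to the
    fold-knockdown of the perfectly matched sgRNA (Figure 1c definition)."""
    if fold_kd_perfect == 0:
        raise ValueError("fold-knockdown of the perfectly matched sgRNA must be non-zero")
    return fold_kd_mismatched / fold_kd_perfect


def mean_relative_activity(replicates):
    """Average relative activity over replicate transductions.

    `replicates` is a list of (fold_kd_mismatched, fold_kd_perfect) pairs,
    one pair per replicate (hypothetical input layout).
    """
    return mean(relative_activity(mm, pm) for mm, pm in replicates)


# Illustrative numbers only, not data from the study.
example = mean_relative_activity([(4.0, 16.0), (8.0, 16.0)])
```

Under this definition a value near 1 means the mismatched sgRNA knocks down GFP about as strongly as the perfectly matched sgRNA, and smaller values indicate weaker knockdown.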
Figure 2. A large-scale CRISPRi screen identifies factors governing mismatched sgRNA activity. (a) Design of large-scale mismatched sgRNA library. (b) Schematic of pooled CRISPRi screen to determine activities of mismatched sgRNAs. (c) Growth phenotypes (γ) in K562 and Jurkat cells for four sgRNA series, with the perfectly matched sgRNAs shown in darker colors and mismatched sgRNAs shown in corresponding lighter colors. Phenotypes represent the mean of two replicate screens. Differences in absolute phenotypes likely reflect cell type-specific essentiality. A γ of 0 is equivalent to the average phenotype of non-targeting control sgRNAs. (d) Comparison of mismatched sgRNA relative activities in K562 and Jurkat cells. Marginal histograms depict distributions of relative activities along the corresponding axes. n = 41,512 sgRNAs; r² = squared Pearson correlation coefficient. (e) Distribution of mismatched sgRNA relative activities stratified by position of the mismatch. Position –1 is immediately adjacent to the PAM. n = 1372–3374 sgRNAs. (f) Distribution of mismatched sgRNA relative activities stratified by type of mismatch, grouped by mismatches located in positions –19 to –13 (PAM-distal region), positions –12 to –9 (intermediate region), and positions –8 to –1 (PAM-proximal/seed region). Division into these regions was based on previous work[13,16] and the patterns in panel e. n = 437–2342 sgRNAs. (g) Comparison of mean apparent on-rates measured in vitro for mismatched variants of a single sgRNA[29] and mean relative activities from large-scale screen. Values are compared for identical combinations of mismatch type and mismatch position; mean relative activities were calculated by averaging relative activities for all mismatched sgRNAs with a given combination. Data are from n = 57 unique combinations of mismatch type and position; r² = squared Pearson correlation coefficient. Lines in violin plots (e, f) denote distribution quartiles.
Figure 3. Identification and characterization of intermediate-activity constant regions. (a) Design of constant region variant library. (b) Mean relative activities of constant region variants, calculated by averaging relative activities for all targeting sequences; n = 995 constant region variants, gray margins denote 95% confidence interval of 30 targeting sequences. Inset: Focus on 6 constant region variants with higher activity than the original constant region. Black diamonds denote mean relative activity, gray dots denote relative activities of individual targeting sequences. (c) Mapping of constant region variant relative activities onto the constant region structure. Each constant region base is colored by the average relative activity of the three constant region variants carrying a single mutation at that position. Positions mutated in 6 highly active constant regions (inset in panel b) are indicated by colored dots. The BlpI site (gray) is used for cloning and was not mutated. (d) Constant region activities by targeting sequence, plotted against ranked mean constant region activity. For each gene, the activities with the strongest targeting sequence are shown as rolling means with a window size of 50. (e-g) Constant region activities by targeting sequence for all three targeting sequences against the indicated genes. Growth phenotypes (γ) of each targeting sequence paired with the unmodified constant region are indicated in the legend.
Figure 4. Neural network predictions of sgRNA activity. (a) Schematic of a singly-mismatched sgRNA feature array (X) and the convolutional neural network architecture trained on pairs of such arrays and their corresponding relative activities (y). Black squares in X represent the value 1 (the presence of a base at the indicated position); white represents 0. The mean prediction from 20 independently trained models was used to assign a final prediction (ŷ) to each sgRNA in the hold-out validation set (orange). (b) Comparison of measured relative growth phenotypes from the large-scale screen and predicted activities assigned by the neural network. Marginal histograms show distributions of relative activities along the corresponding axes. n = 5,241 sgRNAs; r² = squared Pearson correlation coefficient. (c) Distribution of Pearson r values (predicted vs. measured relative activity) for each sgRNA series in the validation set. n = 406 series. (d) Comparison of measured relative activity (i.e. relative knockdown) in the GFP experiment and predicted relative sgRNA activity. Two outliers with lower-than-predicted activity are annotated with their respective mismatch position and type. Predictions are shown as mean ± S.D. from the 20-model ensemble. n = 57 sgRNAs; r² = squared Pearson correlation coefficient.
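Two ideas from Figure 4 lend themselves to a small illustration: the binary feature array, in which a 1 marks the presence of a base at a position, and the combination of predictions from independently trained models into a mean ± S.D. The sketch below is an assumption-laden simplification, not the published model input: it encodes only a plain nucleotide sequence with rows ordered A, C, G, T, whereas the feature array in the legend describes a singly-mismatched sgRNA. The function names are hypothetical, and the use of the sample standard deviation is likewise an assumption.

```python
from statistics import mean, stdev

BASES = "ACGT"  # assumed row order of the binary array


def one_hot(sequence):
    """Return a 4 x len(sequence) binary array.

    Entry [i][j] is 1 if base BASES[i] is present at position j, else 0.
    """
    seq = sequence.upper()
    for base in seq:
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
    return [[1 if base == row_base else 0 for base in seq] for row_base in BASES]


def ensemble_prediction(predictions):
    """Combine per-model predictions for one sgRNA into (mean, sample S.D.).

    Requires at least two predictions; the legend describes 20 models.
    """
    return mean(predictions), stdev(predictions)


# Illustrative input only.
features = one_hot("GA")
```

Averaging over independently trained models is a common way to reduce the variance of any single model's prediction; the spread across models gives a rough sense of how stable a given prediction is.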
Figure 5. Compact mismatched sgRNA library targeting essential genes. (a) Design of library. For activity bins lacking a previously measured sgRNA, novel mismatched sgRNAs were included according to predicted activity. (b) Distribution of relative activities from the large-scale library (gray) and the compact library (purple) in K562 cells. The dashed line represents sgRNAs that were selected based on predicted activity from the deep learning model. (c) Comparison of relative activities of mismatched sgRNAs in HeLa and K562 cells. Marginal histograms show the distributions of relative activities along the corresponding axes. n = 9,514 sgRNAs; r² = squared Pearson correlation coefficient.
Figure 6. Rich phenotyping of cells with intermediate-activity sgRNAs by Perturb-seq. (a) Distributions of HSPA9 and RPL9 expression in cells with indicated perturbations. Expression is quantified as target gene UMI count normalized to total UMI count per cell. sgRNA activity is calculated using relative γ measurements from the Perturb-seq cell pool after 5 days of growth. (b) Distributions of total UMI counts in cells with indicated perturbations. (c) Comparison of median UMI count per cell and target gene expression in cells with GATA1- or POLR2H-targeting sgRNAs. (d) Right: Expression profiles of 100 genes in populations with HSPA9-targeting sgRNAs. Genes were selected by lowest FDR-corrected p-values in cells with the strongest sgRNA from a two-sided Kolmogorov-Smirnov test (Methods). Expression is quantified as z-score relative to population of cells with non-targeting sgRNAs. Left: Growth phenotype and knockdown for each sgRNA. (e) Distribution of gene expression changes in populations with indicated sgRNAs. Magnitude of gene expression change is calculated as sum of z-scores of genes differentially expressed in the series (FDR-corrected p < 0.05 with any sgRNA in the series, two-sided Kolmogorov-Smirnov test, Methods), with z-scores of individual genes signed by the direction of change in cells with the perfectly matched sgRNA. Distribution for negative control sgRNAs is centered around 0 (dashed line).
For a-e, the cell numbers for each perturbation are listed in Table S14. Box plots inside violin plots denote quartile ranges (box), median (center mark), and 1.5 × interquartile range (whiskers).
(f) Comparison of relative growth phenotype and magnitude of gene expression change for all individual sgRNAs. Growth phenotype and magnitude of gene expression change are normalized in each series to those of the sgRNA with the strongest knockdown. (g) Comparison of magnitude of gene expression change and target gene knockdown, as in f. (h) UMAP projection of all single cells with assigned sgRNA identity in the experiment, colored by targeted gene. Clusters clearly assignable to a genetic perturbation are labeled. Cluster labeled * contains a small number of cells with residual stress response activation and could represent apoptotic cells. Note that ~5% of cells appear to have confidently but incorrectly assigned sgRNA identities (Methods). Given the strong trends in the other results, we concluded that such misassignment did not substantially affect our ability to identify trends within cell populations and in the future may be avoided by approaches to directly capture the expressed sgRNA[41]. n = 19,587 cells. (i) UMAP projection, as in h, with selected series colored by sgRNA activity. n = 19,587 cells. (j) Comparison of extent of ISR activation to ATP5E UMI count in cells with knockdown of ATP5E or control cells.
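The quantities defined in Figure 6a, d, and e (UMI-normalized expression, z-scores relative to cells with non-targeting sgRNAs, and the signed sum of z-scores) can be sketched in a few lines. This is an illustrative simplification under stated assumptions: the function names are hypothetical, the population standard deviation of the control cells is assumed as the z-score denominator, and the selection of differentially expressed genes by the Kolmogorov-Smirnov test is taken as already done.

```python
from statistics import mean, pstdev


def normalized_expression(target_umi, total_umi):
    """Target gene UMI count divided by the cell's total UMI count (Figure 6a)."""
    return target_umi / total_umi


def z_score(value, control_values):
    """z-score of `value` relative to a control population,
    here the cells carrying non-targeting sgRNAs (Figure 6d)."""
    sd = pstdev(control_values)
    if sd == 0:
        raise ValueError("control population has zero variance")
    return (value - mean(control_values)) / sd


def signed_magnitude(z_scores, reference_z_scores):
    """Sum of z-scores over the selected genes (Figure 6e).

    Each gene's z-score is signed by its direction of change in cells with
    the perfectly matched sgRNA: genes that decrease there contribute with
    flipped sign, so a change in the same direction always adds positively.
    """
    return sum(z if ref >= 0 else -z
               for z, ref in zip(z_scores, reference_z_scores))


# Illustrative numbers only, not data from the study.
magnitude = signed_magnitude([1.0, -2.0, 0.5], [2.0, -3.0, -1.0])
```

With this sign convention, a population whose expression changes mirror those of the perfectly matched sgRNA scores well above 0, while negative control populations are expected to center around 0, consistent with the dashed line described in panel e.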