Dane Z Hazelbaker1, Amanda Beccard1, Anne M Bara1, Nicole Dabkowski1, Angelica Messana1, Patrizia Mazzucato1, Daisy Lam1, Danielle Manning2, Kevin Eggan3, Lindy E Barrett4. 1. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2. Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA. 3. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA. Electronic address: eggan@mcb.harvard.edu. 4. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA. Electronic address: lbarrett@broadinstitute.org.
Abstract
Scaling of CRISPR-Cas9 technology in human pluripotent stem cells (hPSCs) represents an important step for modeling complex disease and developing drug screens in human cells. However, variables affecting the scaling efficiency of gene editing in hPSCs remain poorly understood. Here, we report a standardized CRISPR-Cas9 approach, with robust benchmarking at each step, to successfully target and genotype a set of psychiatric disease-implicated genes in hPSCs and provide a resource of edited hPSC lines for six of these genes. We found that transcriptional state and nucleosome positioning around targeted loci was not correlated with editing efficiency. However, editing frequencies varied between different hPSC lines and correlated with genomic stability, underscoring the need for careful cell line selection and unbiased assessments of genomic integrity. Together, our step-by-step quantification and in-depth analyses provide an experimental roadmap for scaling Cas9-mediated editing in hPSCs to study psychiatric disease, with broader applicability for other polygenic diseases.
Scaling of CRISPR-Cas9 technology in human pluripotent stem cells (hPSCs) represents an important step for modeling complex disease and developing drug screens in human cells. However, variables affecting the scaling efficiency of gene editing in hPSCs remain poorly understood. Here, we report a standardized CRISPR-Cas9 approach, with robust benchmarking at each step, to successfully target and genotype a set of psychiatric disease-implicated genes in hPSCs and provide a resource of edited hPSC lines for six of these genes. We found that transcriptional state and nucleosome positioning around targeted loci was not correlated with editing efficiency. However, editing frequencies varied between different hPSC lines and correlated with genomic stability, underscoring the need for careful cell line selection and unbiased assessments of genomic integrity. Together, our step-by-step quantification and in-depth analyses provide an experimental roadmap for scaling Cas9-mediated editing in hPSCs to study psychiatric disease, with broader applicability for other polygenic diseases.
Combining CRISPR-Cas9 gene editing and directed differentiation of human pluripotent stem cells (hPSCs) into somatic cell types offers a powerful approach to elucidate mechanisms across the spectrum of human disease. As methodologies are fine-tuned in each of these arenas, the field is challenged to increase scale in order to keep pace with emerging human genetic data. Indeed, biobanking initiatives for hPSCs (including both human embryonic stem cells [hESCs] and human induced pluripotent stem cells [hiPSCs]) have emerged in recent years with collections aiming to include thousands of cell lines (Avior et al., 2016, McKernan and Watt, 2013). In parallel, high-throughput platforms are being developed for phenotypic characterization of hPSCs and their differentiated progeny (Leha et al., 2016, Paull et al., 2015).These scaled initiatives hold great promise for accelerating our understanding of polygenic diseases such as diabetes (Flannick and Florez, 2016), heart disease (Kessler et al., 2016), and psychiatric disease. In the case of psychiatric disease, human genetic data have implicated hundreds of loci or genes in schizophrenia (SCZ), autism spectrum disorder (ASD), and intellectual disability (ID) (Sanders et al., 2015, Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014), yet only a fraction of implicated genes have well-defined neurobiological functions. Here, stem cell models are particularly relevant, as human brain tissue is largely inaccessible for in vitro studies, and common model organisms lack implicated brain structures (Sanders et al., 2015, Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014, Yoon et al., 2014). Indeed, stem cell differentiation technologies now allow for the generation of many human brain cell types in vitro (Chambers et al., 2009, Kriks et al., 2011, Lu et al., 2016, Mertens et al., 2016, Zhang et al., 2013), facilitating the use of genetically defined hPSC lines to study disease mechanisms in relevant neuronal populations (Hendriks et al., 2016, Santos et al., 2016).Several studies have identified disease-relevant phenotypes in cells with single-gene loss-of-function (LoF) mutations generated using CRISPR-Cas9. For example, Yi et al. (2016) generated isogenic hESCs with mutations in the ASD-associated gene SHANK3 and derived neurons in vitro to identify specific channel defects, demonstrating how a LoF strategy was valuable for making discoveries of normal neurobiological function to provide insight into the disease state. Similar strategies were employed in hiPSCs to study the ASD-associated gene CHD8 (Wang et al., 2015) and the SCZ-associated gene DISC1 (Srikanth et al., 2015).Despite these advances, empirical knowledge of best practices for scaling Cas9-mediated gene editing in hPSCs is lacking. Compared with more commonly used HEK293Ts or cancer cell lines (Doench et al., 2016, Hsu et al., 2013), hPSCs display poor viability at a single-cell state (Amit et al., 2000, Hendriks et al., 2016), and studies report lower frequencies of gene correction in hPSCs compared with cancer cell lines (Hockemeyer and Jaenisch, 2016). Pluripotent cells may have a lower tolerance for double-strand breaks (DSBs) compared with non-pluripotent cells (Liu et al., 2014) and can acquire chromosomal abnormalities with continued passage or cellular stress (Amit et al., 2000, Martins-Taylor and Xu, 2012, Taapken et al., 2011). At the level of the targeted locus, some studies report that chromatin accessibility is a significant factor in the ability of Cas9 complexes to bind DNA (Daer et al., 2016, Isaac et al., 2016, Kuscu et al., 2014, O'Geen et al., 2015, Wu et al., 2014) while others report no correlation (Perez-Pinera et al., 2013) or a decrease of Cas9 binding in heterochromatic regions (Knight et al., 2015). Thus, it is difficult to predict the outcome of an individual gene-targeting experiment in an hPSC line, a problem surmountable for single-gene studies, but one that is amplified with increased experimental scale.Here, we applied an established Cas9-mediated gene-targeting strategy to target and genotype 58 distinct genes implicated in psychiatric disease. Our dataset, comprising nearly 5,000 individual hPSC clones, allowed us to benchmark efficiencies at each experimental step, as well as the potential influence of transcription state and nucleosome positioning on gene-editing outcome. To derive indel frequencies and features, we applied both a publicly available sequence-evaluation tool as well as an in-house tool. Finally, to address whether different hPSC lines may be more amenable to the gene-editing workflow, we targeted a subset of genes in an independent hPSC line.
Results
Gene Selection, Cell-Line Selection, and Gene-Editing Workflow
We selected 58 genes implicated in SCZ, ASD, and/or ID (Basak et al., 2015, Durand et al., 2007, Golzio et al., 2012, Hamdan et al., 2009, Nishimura et al., 2007, O'Roak et al., 2012, Pickard et al., 2005, Sanders et al., 2015, Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014, Too et al., 2016) distributed across 16 chromosomes (Figure 1A and Table S1) and applied a standardized gene-editing approach (Santos et al., 2016) (Figure 1B) to generate heterozygous and homozygous indels in hPSCs utilizing paired single-guide RNAs (sgRNAs) and Cas9D10A nickase (Ran et al., 2013b). We used Cas9D10A nickase based on studies suggesting equal on-target cutting efficiency and enhanced specificity with this strategy compared with single sgRNAs and Cas9 nuclease (Ran et al., 2013a). Paired sgRNAs were designed to target early constitutive exons (Table S1) in order to generate frameshift indels in the maximal number of gene isoforms.
Figure 1
Selected Genes and Experimental Workflow
(A) Genes selected for editing and their chromosomal location. For gene target and sgRNA plasmid details, see Table S1.
(B) Schematic of Cas9D10A nickase gene-editing workflow. For amplicon primer sequences, see Table S2.
Selected Genes and Experimental Workflow(A) Genes selected for editing and their chromosomal location. For gene target and sgRNA plasmid details, see Table S1.(B) Schematic of Cas9D10A nickase gene-editing workflow. For amplicon primer sequences, see Table S2.We then selected two well-characterized hESC lines, HUES63 (XY) and WA01 (XY) (Thomson et al., 1998), with the goal of creating LoF mutations in isogenic backgrounds in order to further understand gene function and ultimately compare cellular phenotypes with those observed in patient-derived iPSCs in downstream studies. Both HUES63 and WA01 were genotyped using a high-density Illumina microarray to probe psychiatric disease risk, and we confirmed a lack of significant enrichment for marker SNPs associated with common psychiatric diseases (<2 SDs from the mean; data not shown). Both cell lines were also whole-exome sequenced at 60× coverage, confirming the absence of known structural variants predicted to have a negative impact on stem cell biology or brain development (Merkle et al., 2017).Initially, HUES63 cells were transfected with paired sgRNA plasmids and Cas9D10A nickase-GFP (PX286; kindly provided by F. Zhang). Individual GFP positive (GFP+) cells were then isolated by fluorescence-activated cell sorting (FACS) and plated at low density to allow clonal stem cell recovery. Individual clones were isolated and expanded in 96-well plates and duplicated for cell freezing and genomic DNA (gDNA) extraction. For each clonal well, barcoded PCR amplicons spanning the Cas9 target site were generated (Figure 1B and Table S2) and pooled to create a sequencing library for each targeted gene. To assess indel generation, sequence files were input into OutKnocker (www.OutKnocker.org) (Schmid-Burgk et al., 2014).
Efficiency of Indel Generation in hPSCs
We first quantified efficiencies at key experimental steps. Considering transfection efficiency of Cas9D10A nickase-GFP in hPSCs across 45 separate gene-targeting experiments, we obtained an average of 24.8% GFP+ cells (Figure 2A) with a range of 6.43%–45.1% GFP+ cells (Figure S1A). In all cases, we recovered a sufficient number of live GFP+ cells to plate at clonal density and isolate colonies. On average, we plated 10,000 cells in a 10 cm dish (range, 6,400–30,000 cells) and picked 87.3 clones per gene (considering data for 57 genes; 4,976 total clones) (Figure 2B). After analyzing sequencing data with OutKnocker, we obtained indel data on an average of 74.8 clones per gene-targeting experiment (n = 58 genes; 4,333 total clones) (Figure 2B). In total, we obtained indel data on approximately 86% of isolated clones with an average read depth of 18,878 reads per clone across all experiments. These results validated a high rate of return for each clone picked and analyzed using the above conditions.
Figure 2
Indel Generation Efficiency for 58 Targeted Genes in the hPSC Line HUES63
(A) Percentage of GFP+ cells following Cas9D10A-GFP transfection (n = 45 genes). Individual transfection percentages are shown in Figure S1A.
(B) Number of clones isolated (“picked”; n = 57 genes) and number of clones that yielded indel data (“called”; n = 58 genes) per gene.
(C) Percentage of clones that were putative WT, HET/MIXED, NULL, IN FRAME, or UND for 58 targeted genes (n = 4,333 clones). For examples of clone genotyping, see Figure S1B.
(D) Editing landscape across 58 genes with percentage of clones that were putative HET/MIXED (red), NULL (orange), IN FRAME (blue), UND (black), or WT (gray) for each gene.
In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles, and the circles represent outliers.
Indel Generation Efficiency for 58 Targeted Genes in the hPSC Line HUES63(A) Percentage of GFP+ cells following Cas9D10A-GFP transfection (n = 45 genes). Individual transfection percentages are shown in Figure S1A.(B) Number of clones isolated (“picked”; n = 57 genes) and number of clones that yielded indel data (“called”; n = 58 genes) per gene.(C) Percentage of clones that were putative WT, HET/MIXED, NULL, IN FRAME, or UND for 58 targeted genes (n = 4,333 clones). For examples of clone genotyping, see Figure S1B.(D) Editing landscape across 58 genes with percentage of clones that were putative HET/MIXED (red), NULL (orange), IN FRAME (blue), UND (black), or WT (gray) for each gene.In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles, and the circles represent outliers.We applied the following nomenclature in our genotype analysis to identify clones for downstream LoF analyses. Clones that had less than 25% unaligned reads and no indels were scored as putative wild-type (WT), while clones with greater than 25% unaligned reads were defined as undetermined (UND). Clones that had at least two sequence species with a minimum of one sequence species containing a frameshift indel and one sequence species not corresponding to a frameshift indel were considered putative heterozygous or heterozygous mixed clones (HET/MIXED). The designation of heterozygous mixed clones is important as sub-cloning of these species may yield clones that are heterozygous or homozygous for frameshift indels. In cases where all sequences corresponded to identical or non-identical frameshift indels and no WT sequence, clones were considered putative frameshift null (NULL). Finally, when reads contained an in-frame indel or mix of WT and in-frame indels, but no frameshift indels, clones were designated as putative in-frame (IN FRAME). Examples of read alignments for putative WT, HET/MIXED, and NULL clones are shown in Figure S1B.Across all experiments, we obtained an average of 39.8% WT clones, 27.2% HET/MIXED clones, 11.5% NULL clones, 9.7% IN FRAME clones, and 11.7% UND clones (Figure 2C). Individual genes showed a high degree of variability in terms of the number of clones with indels, ranging from 0% (e.g., PTN) to nearly 90% (e.g., KDM4A and MAN2A1) (Figure 2D). While we cannot distinguish between poor activity of sgRNAs versus indels that may be deleterious to stem cell viability and therefore selected against, 56 of 58 genes targeted resulted in clones with putative frameshift indels (Figure 2D), suggesting that the Cas9D10A nickase approach was effective in hPSCs across a broad range of genes. Indeed, while we failed to genotype any clones with putative frameshift indels in CUL3, this gene has been shown to play an important role in mammalian cell division (Singer et al., 1999).
Influence of Gene-Specific Features on Indel Generation
To probe the contribution of gene-specific features to Cas9-mediated indel generation, we examined how transcript expression and nucleosome positioning around target genes affected outcome. By comparing editing efficiency in 53 genes with their corresponding FPKM (fragments per kilobase of transcript per million mapped reads) values in HUES63 (Cacchiarelli et al., 2015), we found that transcript levels were a poor predictor of editing efficiency (Figure 3A). We observed examples of high editing efficiency in genes expressed with FPKM <1 and low editing efficiency in genes expressed with FPKM >10 and vice versa. Indeed, editing efficiency appeared randomly distributed among high and low expressed genes, with an R2 value of 0.01846 (Figure S2). While these analyses do not account for the influence of sgRNA expression/stability for an individual gene, our data suggest that neither high nor low transcript expression was an accurate predictor of editing outcome. Furthermore, we examined nucleosome positioning around 58 target genes using a published MNase-seq dataset (Yazdi et al., 2015) and found no correlation between editing efficiency and nucleosome dyad distance to the nearest PAM sequence, with an R2 value of 0.00242 (Figure 3B). This suggests that efficient editing can occur at target loci in close proximity to the predicted location of nucleosomes.
Figure 3
Overall Editing Efficiency Versus Steady-State Transcript Levels and Predicted Nucleosome Position
(A) Percentage of clones EDITED (putative HET/MIXED, NULL, IN FRAME, or UND clones with greater than 5% non-WT reads; blue) or UNEDITED (WT; gray) for each gene (n = 53) plotted with their respective FPKM values (top). For the correlation plot of gene edits relative to FPKM, see Figure S2.
(B) Correlation between editing efficiency and the distance between the closest PAM of either Cas9D10A target site (n = 58 genes) to the nearest nucleosome dyad. R2 = 0.00242. Examples of genes with a high percentage of edited clones (PPP1R16B, MPP6) and a low percentage of edited clones (CUL3, PTN) are highlighted.
Overall Editing Efficiency Versus Steady-State Transcript Levels and Predicted Nucleosome Position(A) Percentage of clones EDITED (putative HET/MIXED, NULL, IN FRAME, or UND clones with greater than 5% non-WT reads; blue) or UNEDITED (WT; gray) for each gene (n = 53) plotted with their respective FPKM values (top). For the correlation plot of gene edits relative to FPKM, see Figure S2.(B) Correlation between editing efficiency and the distance between the closest PAM of either Cas9D10A target site (n = 58 genes) to the nearest nucleosome dyad. R2 = 0.00242. Examples of genes with a high percentage of edited clones (PPP1R16B, MPP6) and a low percentage of edited clones (CUL3, PTN) are highlighted.
Indel Properties with Cas9D10A Nickase
We next analyzed the properties of indels generated in HUES63 and found that, on average, 84.72% of indels were deletions, and 15.28% of indels were insertions (Figure 4A). As expected, 33.65% of indels were in-frame, and 66.35% of indels were frameshift (Figure 4B). We observed an average deletion size of 25.7 bp (range, 1–90 bp) and an average insertion size of 5.4 bp (range, 1–10 bp) (Figures 4C–4E). There was no apparent size difference for in-frame versus frameshift deletions or in-frame versus frameshift insertions (Figure 4C) and no clear preference for indels of a specific length (Figures 4D and 4E). It is important to note that version 1.0 of OutKnocker did not call insertions greater than 10 bp, so there may be additional larger insertions present but not detected. Further, indels that made up a sizable percentage of the overall PCR amplicon (average amplicon length = 166 bp) likely failed in the library generation step, which weighted our analyses toward the detection of smaller indels. We also sought to examine the influence of 5′ overhang length generated by Cas9D10A on editing efficiency by binning our sgRNA pairs into 35–44 base, 45–54 base, and 55–64 base 5′ overhang groups and determining the editing efficiency of each group. As shown in Figure 4F, 5′ overhang length had little effect on editing efficiency, in line with previous observations that optimal sgRNA cut site spacing was approximately 30–70 bases (Shen et al., 2014).
Figure 4
Indel Properties for 58 Targeted Genes in HUES63
(A) Percentage of Cas9D10A-mediated indels called as deletions (blue; n = 7,235 calls) or insertions (green; n = 1,305 calls).
(B) Percentage of in-frame indels (green; n = 2,874 calls) or frameshift indels (blue; n = 5,666 calls).
(D) Distribution of deletion sizes (1–90 bp; n = 7,235 calls).
(E) Distribution of insertion sizes (1–10 bp; n = 1,305 calls).
(F) Schematic of Cas9D10A-nickase generated 5′ overhangs at target sites (top). Boxplots depicting the influence of 5′ overhang length of paired sgRNAs on editing efficiency for 58 genes (bottom).
(G) Locations and identity of inserted DNA sequences in selected clones (C) for FXR1, KCTD13, and TLE1 genes and their alignment to the Cas9D10A target sequences. Inserted sequences for each clone are shown in red.
In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles.
Indel Properties for 58 Targeted Genes in HUES63(A) Percentage of Cas9D10A-mediated indels called as deletions (blue; n = 7,235 calls) or insertions (green; n = 1,305 calls).(B) Percentage of in-frame indels (green; n = 2,874 calls) or frameshift indels (blue; n = 5,666 calls).(C) Average size of in-frame insertions (n = 383 calls), in-frame deletions (n = 2,491 calls), frameshift insertions (n = 922 calls), and frameshift deletions (n = 4,744 calls). Error bars represent SD.(D) Distribution of deletion sizes (1–90 bp; n = 7,235 calls).(E) Distribution of insertion sizes (1–10 bp; n = 1,305 calls).(F) Schematic of Cas9D10A-nickase generated 5′ overhangs at target sites (top). Boxplots depicting the influence of 5′ overhang length of paired sgRNAs on editing efficiency for 58 genes (bottom).(G) Locations and identity of inserted DNA sequences in selected clones (C) for FXR1, KCTD13, and TLE1 genes and their alignment to the Cas9D10A target sequences. Inserted sequences for each clone are shown in red.In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles.During the course of our analyses, we noted that a significant number of insertions (≥3 bp) contained sequences with micro-homology to portions of the sgRNA and/or DNA target sites for Cas9D10A nickase (Figure 4G). It is possible that upon generation of a DSB, the transfected sgRNA plasmids serve as readily available donor templates for homology-directed DNA repair (Cong et al., 2013), microhomology-mediated end joining (van Overbeek et al., 2016), or RNA-directed DNA repair pathways (Keskin et al., 2014). The presence of these insertions underscores the need to consider plasmid-free Cas9 delivery in hPSCs, such as RNPs, that may reduce the potential integration of unwanted sequences with high or degenerate homology to sequences adjacent to the DSB.
Utilization of a Second Sequence-Evaluation Tool
Next-generation sequencing is a powerful tool to identify indels generated by gene targeting, however, downstream analytical pipelines often use different alignment methodologies, threshold criteria, and variant effect predictors, which can substantially influence outcomes (Hasan et al., 2015). To further validate our findings, we developed a sequence-evaluation tool, which we called BaySnpper (Figure S3A; https://github.com/dat4git/BaySnpper). BaySnpper utilized FreeBayes (FB) (Garrison and Marth, 2012) to create composite calls from forward and reverse sequence reads and SnpEff to make predictions about the indel effect (Cingolani et al., 2012). Data were exported to generate Tab Separated Values (TSV) files as well as annotated Variant Call Format (VCF) files for visualization in Integrative Genomics Viewer (Robinson et al., 2011) or other genome browsers (Figure S3A). We then re-analyzed 30 individual gene-targeting experiments using BaySnpper and compared calls with those obtained via OutKnocker (Figures 5A, 5B, and S3B). As shown in Figure 5A, indel distribution was similar across 30 genes using data analyzed by OutKnocker and BaySnpper. Both OutKnocker and BaySnpper called an average of approximately 19–22 HET/MIXED clones, 6–7 NULL clones, and 6–8 IN FRAME clones per gene. On a gene-by-gene basis, we also observed similar proportions of HET/MIXED, NULL, and IN FRAME clones (Figure 5B). It is important to note that clone-to-clone differences did arise, presumably due to differences in alignment algorithms, read thresholds, and quality scores between the programs (Figures S3C and S3D). For example, TSNARE1 showed highly similar proportions of HET/MIXED, NULL, and IN FRAME calls (Figure 5B), but there were discrepant indel calls between individual clones analyzed with OutKnocker and BaySnpper (Figures S3C and S3D). Thus, utilization of two analysis programs in parallel increased our confidence at the level of an individual clone.
Figure 5
Comparison between OutKnocker and BaySnpper Indel Analysis Tools
(A) Average number of putative HET/MIXED, NULL, or IN FRAME clones analyzed by OutKnocker (OK) or BaySnpper (BSNP) for 30 Cas9D10A targeted genes (n = 2,128 clones).
(B) Distribution of putative HET/MIXED (red), NULL (orange), or IN FRAME (blue) clones analyzed by OK or BSNP tools for 30 Cas9D10A targeted genes. For examples of data visualization and call comparisons between OK and BSNP, see Figures S3A–S3D.
In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles, and the circles represent outliers.
Comparison between OutKnocker and BaySnpper Indel Analysis Tools(A) Average number of putative HET/MIXED, NULL, or IN FRAME clones analyzed by OutKnocker (OK) or BaySnpper (BSNP) for 30 Cas9D10A targeted genes (n = 2,128 clones).(B) Distribution of putative HET/MIXED (red), NULL (orange), or IN FRAME (blue) clones analyzed by OK or BSNP tools for 30 Cas9D10A targeted genes. For examples of data visualization and call comparisons between OK and BSNP, see Figures S3A–S3D.In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles, and the circles represent outliers.
Gene Editing in Independent hPSC Lines
To compare results in HUES63 with another hESC line, we repeated seven gene-targeting experiments in WA01 using identical sgRNAs and Cas9D10A nickase-puro (PX462; Ran et al., 2013b) or Cas9 nuclease (PX459; Ran et al., 2013b) (Figure 6A and Table S3). Although seven out of seven genes targeted in WA01 resulted in clones with putative frameshift indels, average indel generation efficiency was consistently lower in WA01 compared with HUES63. To assess whether this could be attributed to the use of Cas9D10A nickase specifically, we repeated four gene-targeting experiments using Cas9 nuclease (PX459 (Ran et al., 2013b)). As shown in Figure 6B, both Cas9D10A nickase and Cas9 nuclease approaches yield similar outcomes, suggesting that WA01 may be less amenable to Cas9 editing. However, the properties of the indels generated remained comparable between WA01 and HUES63 (Figures 4A–4C and S4A–S4H), and there was a trend toward smaller deletions using the Cas9 nuclease (Figure S4G). These results suggest that inherent or culture-acquired properties of individual hPSC lines may significantly affect Cas9-mediated gene-editing efficiency. From WA01 targeting experiments, clones containing putative heterozygous and homozygous frameshift indels in CSMD1, CYFIP1, FXR1, SYNGAP1, TLE1 (Herring et al., 2016), and TLE3 (Bara et al., 2016) are described in Table S4.
Figure 6
Gene Editing in Independent hPSC Lines
(A) Comparison of Cas9D10A nickase editing efficiencies in HUES63 (blue; n = 519 clones, PX286) and WA01 (yellow; n = 302 clones, PX462) for seven selected targeting experiments using identical sgRNA pairs for CSMD1, CYFIP1, FURIN, FXR1, SYNGAP1, TLE1, and TLE3. Gene target and sgRNA plasmid details are in Table S3 and for indel properties, see Figures S4A–S4C.
(B) Comparison of editing efficiencies in WA01 with Cas9D10A nickase (blue; n = 178 clones, PX462) and Cas9 nuclease (yellow; n = 177 clones, PX459) for four selected gene-targeting experiments with identical paired and single sgRNAs for CSMD1, CYFIP1, FXR1, and TLE1. Indel comparisons are found in Figures S4D–S4H.
(C) Percentage of clones analyzed by SNP genotyping or G-banded karyotyping analysis with no chromosomal aberration detected (blue), CNVs only (orange), or trisomies with or without CNVs (green) for HUES63 (n = 83 clones) and WA01 (n = 83 clones).
(D) Examples of G-banded metaphase analysis for parental HUES63 (46, XY) and an abnormal clone after gene targeting (51, XY +11, +12, +14, +17, +X). Arrows denote instances of aneuploidy.
(E) Generation of induced neurons from select WA01 clones with indels in CSMD1, CYFIP1, FXR1, SYNGAP1, TLE1, and TLE3. For detailed clone information, see Table S4. For pluripotency and tri-lineage differentiation potential of select WA01 clones, see Figures S5A and S5B.
In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles, and the circles represent outliers.
Gene Editing in Independent hPSC Lines(A) Comparison of Cas9D10A nickase editing efficiencies in HUES63 (blue; n = 519 clones, PX286) and WA01 (yellow; n = 302 clones, PX462) for seven selected targeting experiments using identical sgRNA pairs for CSMD1, CYFIP1, FURIN, FXR1, SYNGAP1, TLE1, and TLE3. Gene target and sgRNA plasmid details are in Table S3 and for indel properties, see Figures S4A–S4C.(B) Comparison of editing efficiencies in WA01 with Cas9D10A nickase (blue; n = 178 clones, PX462) and Cas9 nuclease (yellow; n = 177 clones, PX459) for four selected gene-targeting experiments with identical paired and single sgRNAs for CSMD1, CYFIP1, FXR1, and TLE1. Indel comparisons are found in Figures S4D–S4H.(C) Percentage of clones analyzed by SNP genotyping or G-banded karyotyping analysis with no chromosomal aberration detected (blue), CNVs only (orange), or trisomies with or without CNVs (green) for HUES63 (n = 83 clones) and WA01 (n = 83 clones).(D) Examples of G-banded metaphase analysis for parental HUES63 (46, XY) and an abnormal clone after gene targeting (51, XY +11, +12, +14, +17, +X). Arrows denote instances of aneuploidy.(E) Generation of induced neurons from select WA01 clones with indels in CSMD1, CYFIP1, FXR1, SYNGAP1, TLE1, and TLE3. For detailed clone information, see Table S4. For pluripotency and tri-lineage differentiation potential of select WA01 clones, see Figures S5A and S5B.In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles, and the circles represent outliers.Despite reduced editing efficiency, WA01 displayed greater genomic stability compared with HUES63. Indeed, despite starting with a normal karyotype, HUES63 cells acquired chromosomal aberrations in 78 of 83 clones analyzed in the laboratory to date by SNP genotyping or G-banded karyotyping analysis (Figures 6C and 6D). We observed recurrent aberrations in HUES63 across experiments, namely trisomies 4, 11, 12, 14, 17, and X, in addition to the presence of a small number of copy number variants (CNVs; 0.27–2.4MB). Although individual clones differed in the number and specific combination of trisomies, the vast majority of clones shared trisomy 12, and many also shared trisomies 11, 14, and 17. Trisomy 4 was observed in approximately 35% of clones (data not shown). Importantly, none of the 58 genes included in this dataset were localized on chromosomes prone to trisomy. Trisomies 12, 14, 17, and X have been repeatedly identified in hPSCs in vitro and are thought to endow cells with a competitive growth advantage (Baker et al., 2016, International Stem Cell Initiative et al., 2011, Mayshar et al., 2010, Nguyen et al., 2013, Peterson and Loring, 2014), while trisomies 4 and 11 are less frequently reported in the literature. By contrast, WA01 acquired chromosomal aberrations in only 6 of 83 clones analyzed in the laboratory to date (Figure 6C), including one clone with trisomy 20, and five clones each harboring a single CNV (0.43–2.39 MB). The potential growth advantage conferred upon cells by chromosomal aberrations may indeed have contributed to the increased editing efficiency observed in HUES63 compared with WA01. However, it is also likely that the bottleneck of clonal selection was a significant contributor to instability rather than DNA damage caused by Cas9, as we identified similar chromosomal aberrations in HUES63 clones using the same gene-targeting protocol in the absence of Cas9 and sgRNA expression (data not shown).
Generation of Induced Neurons from Edited WA01 hPSC Lines
To further validate our hPSC lines after gene targeting, we confirmed expression of pluripotency markers as well as germ-layer differentiation capacity following embryoid body formation for a set of six WA01 clones containing putative frameshift indels in CSMD1, CYFIP1, FXR1, SYNGAP1, TLE1, and TLE3 (Figures S5A and S5B) with no chromosomal aberrations detected (Table S4). To assess the relevance of these clones for the interrogation of downstream cellular phenotypes, we then generated induced neurons via ectopic expression of Ngn2 (Figure 6E) (Pak et al., 2015, Yi et al., 2016, Zhang et al., 2013).
Discussion
By directly comparing editing outcomes across nearly 5,000 hPSC clones from a diverse set of 58 genes, our findings outline a readily applicable experimental and computational framework for Cas9-mediated editing in hPSCs for the study of psychiatric disease. In addition, our findings help to set expectations for large-scale studies of other complex diseases.Without pre-testing guide pairs, we obtained putative frameshift indels in approximately 40% of clones in HUES63 and 25% of clones in WA01. The fact that the Cas9D10A nickase and Cas9 nuclease strategies were equally efficient in terms of indel generation suggests that results obtained with the Cas9D10A nickase strategy are broadly representative of Cas9-mediated indel generation in these hPSCs. Given our limited ability to predict editing efficiency without pre-testing guide pairs in vitro and the variability in editing efficiency between different cell lines, we opted to pick a relatively large number of clones per gene-targeting experiment. A successful sequencing and analysis rate of 86% provided us with multiple clones with putative frameshift indels for the majority of genes, even in cases of lower overall editing efficiency. Based on the variability in indel generation observed at a gene-to-gene as well as cell line-to-cell line level, we recommend selecting a similarly large set of clones for initial screening until cell-line-specific data can be ascertained.Within our gene set, we failed to see either a positive or negative correlation between steady-state transcript levels or nucleosome positioning and Cas9 editing efficiency. A priori, we expected highly expressed genes to be more efficiently edited due to a more open chromatin structure (Kuscu et al., 2014, O'Geen et al., 2015, Wu et al., 2014). It is possible that some discrepancies in the literature may result from applications that require Cas9 to have stable as opposed to transient binding to target DNA. For example, several studies reported a positive correlation between chromatin accessibility and Cas9 functionality using nuclease-dead Cas9 (Kuscu et al., 2014, Wu et al., 2014) as opposed to nuclease-active Cas9. As discussed by Horlbeck et al. (2016), nuclease-dead Cas9 requires persistent access to DNA for optimal functionality while nuclease-active Cas9 requires only transient access, which could perhaps be achieved following disruption of nucleosomes during replication.Computationally, both OutKnocker and BaySnpper effectively identified indels in our dataset, and we advocate the use of a minimum of two programs simultaneously to increase confidence in individual indel calls. Indeed, similar combinatorial strategies have been suggested for evaluating indels from whole-genome sequencing data (Hasan et al., 2015). One advantage of BaySnpper is that it gave detailed information on variant effect prediction, including the level of effect (i.e., high, moderate, or low), location (i.e., coding region, splice site, intron/exon boundary), and effect (i.e., silent, frameshift, missense) as the prediction was made in the context of the whole genic structure (Cingolani et al., 2012).Importantly, we found one of two parental cell lines, HUES63, to be chromosomally unstable, acquiring aberrations that likely favored their propagation in vitro. While HUES63 and WA01 were cultured under standard conditions, current gene-targeting protocols require single-cell passaging for transfection and clonal selection, which may contribute to the emergence of aberrations in vitro. Our analyses detected large structural aberrations, but did not address the potential for small CNVs or single nucleotide variants, which could further complicate comparisons between clonal lines assumed to be isogenic. We speculate there may be a relationship between stem cell growth properties and Cas9 editing efficiency, highlighting the importance of identifying factors that contribute to the propensity of a given hPSC line to become aneuploid, especially for applications utilizing large-scale gene editing in hPSCs.Finally, we have validated a set of 58 sgRNAs in hPSCs for the study of diverse genes implicated in psychiatric disease as well as generated a banked resource of CRISPR-edited hPSC lines for six genes, including CSMD1, CYFIP1, FXR1, SYNGAP1, TLE1, and TLE3. It will be crucial for future studies to incorporate additional genetically diverse cell lines, including hiPSCs, which may include highly informative behavioral and developmental data from patients, to accelerate our understanding of basic underlying disease mechanisms as well as drug discovery efforts.
Experimental Procedures
Plasmid DNA Construction
Paired sgRNAs targeting early exons of selected genes were designed using http://crispr.mit.edu. Oligonucleotides (IDT) corresponding to antisense and sense pairs of sgRNAs were cloned into a pU6-sgRNA vector (kindly provided by Neville Sanjana and Feng Zhang, Broad Institute) to generate individual sgRNA plasmids (Bara et al., 2016, Cong et al., 2013, Herring et al., 2016). The sequence and locations for each gene-specific sgRNA target site are described in Table S1.
Stem Cell Culture and Gene Targeting
The hESC line HUES63 was obtained from the Harvard Stem Cell Institute (Cambridge, MA; http://ipscore.hsci.harvard.edu) and the hESC line WA01 was obtained from WiCell Research Institute (Madison, WI) (Thomson et al., 1998). Stem cells were grown in mTeSR1 medium (Stem Cell Technologies 05850) on Geltrex-coated plates (Life Technologies A1413301) and tested to confirm the absence of mycoplasma contamination (Lonza MycoAlert) as described (Bara et al., 2016, Herring et al., 2016).For transfection, cells were pre-incubated with 1:1 medium composed of a 1:1 mixture of mTeSR1 medium and hPSC medium (KnockOut DMEM [Gibco 10829-018] with 20% KnockOut serum replacement [Gibco 10828-028], 1% Glutamax [Gibco 35050-061], 1% non-essential amino acids [Corning 25-025-Cl], 0.1% 2-mercaptoethanol [Gibco 21985-023], and 20 ng/mL basic fibroblast growth factor [EMD Millipore GF0003AF]) supplemented with 10 μM ROCK inhibitor (Y-27632). For each transfection, 2.5 × 106 cells were electroporated at 1,050 V, 30 ms, 2 pulses (NEON, Life Technologies MPK10096), as described (Bara et al., 2016, Herring et al., 2016). The following DNA concentrations were used: 1.4 μg per sgRNA plasmid and 7 μg Cas9 plasmid. Cells were dispensed into 10 cm Geltrex-coated plates in 1:1 medium plus 10 μM Y-27632. Twenty-four hours after transfection, GFP+/phosphotidylinositol− cells were isolated by FACS or cells were treated with 1 μM puromycin (Sigma-Aldrich P8833) for 24 hr and then maintained in 1:1 medium for 1–2 weeks to allow clonal stem cell recovery.Individual hPSC colonies were picked and seeded into Geltrex-coated 96-well plates, expanded for 1–2 weeks, and duplicated for cell freezing and gDNA extraction. Clones were frozen in 96-well plates using 50% 1:1 medium plus 10 μM Y-27632, 40% fetal bovine serum (VWR SH30070.03), and 10% DMSO (Sigma D2650). gDNA was extracted overnight at 55°C in Tail Lysis Buffer (Viagen 102-T) with Proteinase K (Roche 03115828001) followed by a 1 hr incubation at 90°C. G-band karyotyping analysis was performed by Cell Line Genetics (Madison, WI), and SNP genotyping was performed using the Illumina Infinium PsychArray-24 Kit (Broad Institute, Cambridge, MA). SNP genotyping was also utilized for cell-line authentication.Using WA01 for gene targeting, we observed low cell viability following FACS and therefore transfected with Cas9D10A nickase-puromycin (PX462) and selected for individual clones with puromycin treatment. To ensure that PX462 was competent to generate indels, we repeated four gene-targeting experiments in HUES63 using identical sgRNAs (Table S3) with PX462 instead of Cas9D10A nickase-GFP (PX286). We observed similar average indel frequencies overall using PX286 and PX462 in HUES63 across four genes analyzed (data not shown) indicating that GFP versus puromycin selection did not significantly alter our ability to generate indels.
DNA Sequencing
To create gene-specific libraries from extracted gDNA in 96-well format, primers containing Illumina multiplexing adapters (Ilumina) and gene-specific sequences (Table S2) were used to PCR amplify sequencing amplicons with Q5 Hot Start High-Fidelity Master Mix (NEB M04945). A second round of PCR amplification incorporated well-specific barcode IDs (Broad Institute). Barcoded PCR products were pooled and gel purified (Zymo Research D4008), run on a 2100 BioAnalyzer (Agilent Technologies) for quality assessment, and submitted for MiSeq paired-end sequencing (Broad Institute), as described (Bara et al., 2016, Herring et al., 2016).
Indel Analyses Using OutKnocker
Gene-specific FASTQ sequence files, reference amplicons, and target sites corresponding to the sense sgRNA for each gene in forward (F) and reverse (R) reads were uploaded into the OutKnocker indel analysis program (www.OutKnocker.org) (Schmid-Burgk et al., 2014). To standardize analyses, we applied a cut-off threshold of 5% read species for each individual clone. Read species that fell below the 5% threshold were not included in our analyses. To determine both indel effect on each clone (i.e., putative NULL, HET/MIXED, IN FRAME, UND) and overall editing efficiency for each gene via OutKnocker, we selected the read direction (F or R) that provided the most complete calls overall.To determine frequencies of insertions versus deletions and putative in-frame versus frameshift indels, we tallied calls from all available OutKnocker data, including both F and R sequence reads to give an overall view of the indel landscape (with the exception of CACNB2, CNOT1, SCN2A, ZSWIM6, CYP26B1, DPP4, and SYNGAP1 for which reads from only one direction were available). In all boxplots, the dark horizontal line represents the median, the box represents the 25th and 75th percentiles, the whiskers represent the 5th and 95th percentiles, and the circles represent outliers.
Indel Analyses Using BaySnpper
To determine indel effect and editing efficiency on selected genes with the BaySnpper indel calling tool, FASTQ files containing sequence data for F and R reads, along with the relevant barcode and gene reference amplicon sequences, were submitted to The Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA, for processing.
Transcript and Nucleosome Position Analyses
For transcript analyses, edited and unedited percentages for each gene (n = 53) were plotted against corresponding FPKM values for HUES63 (Cacchiarelli et al., 2015). The genes SHANK3, ZNF536, AKT3, GALNT10, and TBC1D5 were excluded from analyses as they were not present in the RNA-seq dataset. For nucleosome position analyses, edited and unedited percentages for each gene (n = 58) were plotted against the distance between the closest PAM sequence (NGG, where N = 0, bases 5′ of the PAM are negative, and bases 3′ of the N are positive) of either target site 1 or target site 2 for each gene (Table S1) to the nearest nucleosome dyad base pair as determined by MNase-seq for WA01 (Yazdi et al., 2015).
Resource Distribution
Clones derived from CSMD1, CYFIP1, FXR1, SYNGAP1, TLE1, and TLE3 editing experiments performed in WA01 with no chromosomal aberrations detected are available upon request (Table S4). All sgRNA plasmids created in this study are available via Addgene (91573–91688). The plasmid pPN234 contains an additional 21 bp in the scaffold region, of which 18 bp directly match the CYFIP1 target sequence, indicating that pPN234 contains a dimeric target sequence that will express a longer, dimeric sgRNA.
Author Contributions
K.E. and L.E.B. conceived the study and L.E.B., K.E., and D.Z.H. analyzed and interpreted data and wrote the manuscript with input from all co-authors. D.Z.H., A.B., A.M., and D.M. performed the CRISPR-Cas9 design, cloning, and screening experiments. A.M.B., N.D., P.M., and D.L. performed the cell biology experiments.
Authors: F Ann Ran; Patrick D Hsu; Chie-Yu Lin; Jonathan S Gootenberg; Silvana Konermann; Alexandro E Trevino; David A Scott; Azusa Inoue; Shogo Matoba; Yi Zhang; Feng Zhang Journal: Cell Date: 2013-08-29 Impact factor: 41.582
Authors: Anindita Basak; Miroslava Hancarova; Jacob C Ulirsch; Tugce B Balci; Marie Trkova; Michal Pelisek; Marketa Vlckova; Katerina Muzikova; Jaroslav Cermak; Jan Trka; David A Dyment; Stuart H Orkin; Mark J Daly; Zdenek Sedlacek; Vijay G Sankaran Journal: J Clin Invest Date: 2015-05-04 Impact factor: 14.808
Authors: Xuebing Wu; David A Scott; Andrea J Kriz; Anthony C Chiu; Patrick D Hsu; Daniel B Dadon; Albert W Cheng; Alexandro E Trevino; Silvana Konermann; Sidi Chen; Rudolf Jaenisch; Feng Zhang; Phillip A Sharp Journal: Nat Biotechnol Date: 2014-04-20 Impact factor: 54.908
Authors: Ben S Pickard; M Pat Malloy; Leanne Clark; Stéphanie Lehellard; Henrik L Ewald; Ole Mors; David J Porteous; Douglas H R Blackwood; Walter J Muir Journal: Psychiatr Genet Date: 2005-03 Impact factor: 2.458
Authors: Bin Shen; Wensheng Zhang; Jun Zhang; Jiankui Zhou; Jianying Wang; Li Chen; Lu Wang; Alex Hodgkins; Vivek Iyer; Xingxu Huang; William C Skarnes Journal: Nat Methods Date: 2014-03-02 Impact factor: 28.547
Authors: Davide Cacchiarelli; Cole Trapnell; Michael J Ziller; Magali Soumillon; Marcella Cesana; Rahul Karnik; Julie Donaghey; Zachary D Smith; Sutheera Ratanasirintrawoot; Xiaolan Zhang; Shannan J Ho Sui; Zhaoting Wu; Veronika Akopian; Casey A Gifford; John Doench; John L Rinn; George Q Daley; Alexander Meissner; Eric S Lander; Tarjei S Mikkelsen Journal: Cell Date: 2015-07-16 Impact factor: 41.582
Authors: Brian J O'Roak; Laura Vives; Wenqing Fu; Jarrett D Egertson; Ian B Stanaway; Ian G Phelps; Gemma Carvill; Akash Kumar; Choli Lee; Katy Ankenman; Jeff Munson; Joseph B Hiatt; Emily H Turner; Roie Levy; Diana R O'Day; Niklas Krumm; Bradley P Coe; Beth K Martin; Elhanan Borenstein; Deborah A Nickerson; Heather C Mefford; Dan Doherty; Joshua M Akey; Raphael Bernier; Evan E Eichler; Jay Shendure Journal: Science Date: 2012-11-15 Impact factor: 47.728
Authors: Pablo Perez-Pinera; D Dewran Kocak; Christopher M Vockley; Andrew F Adler; Ami M Kabadi; Lauren R Polstein; Pratiksha I Thakore; Katherine A Glass; David G Ousterout; Kam W Leong; Farshid Guilak; Gregory E Crawford; Timothy E Reddy; Charles A Gersbach Journal: Nat Methods Date: 2013-07-25 Impact factor: 28.547
Authors: Ralda Nehme; Olli Pietiläinen; Mykyta Artomov; Matthew Tegtmeyer; Vera Valakh; Leevi Lehtonen; Christina Bell; Tarjinder Singh; Aditi Trehan; John Sherwood; Danielle Manning; Emily Peirent; Rhea Malik; Ellen J Guss; Derek Hawes; Amanda Beccard; Anne M Bara; Dane Z Hazelbaker; Emanuela Zuccaro; Giulio Genovese; Alexander A Loboda; Anna Neumann; Christina Lilliehook; Outi Kuismin; Eija Hamalainen; Mitja Kurki; Christina M Hultman; Anna K Kähler; Joao A Paulo; Andrea Ganna; Jon Madison; Bruce Cohen; Donna McPhie; Rolf Adolfsson; Roy Perlis; Ricardo Dolmetsch; Samouil Farhi; Steven McCarroll; Steven Hyman; Ben Neale; Lindy E Barrett; Wade Harper; Aarno Palotie; Mark Daly; Kevin Eggan Journal: Nat Commun Date: 2022-06-27 Impact factor: 17.694
Authors: Sara G Susco; Sulagna Ghosh; Patrizia Mazzucato; Gabriella Angelini; Amanda Beccard; Victor Barrera; Martin H Berryer; Angelica Messana; Daisy Lam; Dane Z Hazelbaker; Lindy E Barrett Journal: Cell Rep Date: 2022-09-06 Impact factor: 9.995