
In vivo genome editing using Staphylococcus aureus Cas9.

F Ann Ran¹, Le Cong², Winston X Yan³, David A Scott⁴, Jonathan S Gootenberg⁵, Andrea J Kriz⁶, Bernd Zetsche⁷, Ophir Shalem⁷, Xuebing Wu⁸, Kira S Makarova⁹, Eugene V Koonin⁹, Phillip A Sharp¹⁰, Feng Zhang¹¹.

Abstract

The RNA-guided endonuclease Cas9 has emerged as a versatile genome-editing platform. However, the size of the commonly used Cas9 from Streptococcus pyogenes (SpCas9) limits its utility for basic research and therapeutic applications that use the highly versatile adeno-associated virus (AAV) delivery vehicle. Here, we characterize six smaller Cas9 orthologues and show that Cas9 from Staphylococcus aureus (SaCas9) can edit the genome with efficiencies similar to those of SpCas9, while being more than 1 kilobase shorter. We packaged SaCas9 and its single guide RNA expression cassette into a single AAV vector and targeted the cholesterol regulatory gene Pcsk9 in the mouse liver. Within one week of injection, we observed >40% gene modification, accompanied by significant reductions in serum Pcsk9 and total cholesterol levels. We further assess the genome-wide targeting specificity of SaCas9 and SpCas9 using BLESS, and demonstrate that SaCas9-mediated in vivo genome editing has the potential to be efficient and specific.

Year: 2015 PMID: 25830891 PMCID: PMC4393360 DOI: 10.1038/nature14299

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

Introduction

Cas9, an RNA-guided endonuclease derived from the Type II CRISPR-Cas bacterial adaptive immune system[1-7], has been harnessed for genome editing[8,9] and holds tremendous promise for biomedical research. Genome editing of somatic tissue in post-natal animals, however, has been limited in part by the challenge of delivering Cas9 in vivo. For this purpose, adeno-associated virus (AAV) vectors are attractive vehicles[10] because of their low immunogenic potential, reduced oncogenic risk from host-genome integration[11], and broad range of serotype specificities[12-15]. Nevertheless, the restrictive cargo size of AAV (~4.5 kb) presents an obstacle to packaging the commonly used Streptococcus pyogenes Cas9 (SpCas9, ~4.2 kb) and its sgRNA in a single vector; although technically feasible[16,17], this approach leaves little room for customized expression and control elements. In search of smaller Cas9 enzymes for efficient in vivo delivery by AAV, we have previously described a short Cas9 from the CRISPR1 locus of Streptococcus thermophilus LMD-9 (St1Cas9, ~3.3 kb)[8] as well as a rationally designed truncated form of SpCas9[18] for genome editing in human cells. However, both systems have important practical drawbacks: the former requires a complex protospacer adjacent motif (PAM) sequence (NNAGAAW)[3], which restricts the range of accessible targets, whereas the latter exhibits reduced activity. Given the substantial diversity of CRISPR-Cas systems present in sequenced microbial genomes[19], we therefore sought to identify and characterize additional Cas9 enzymes that are small, efficient, and broadly targeting.

In vitro cleavage by small Cas9s

Type II CRISPR-Cas systems require only two main components for eukaryotic genome editing: a Cas9 enzyme and a chimeric single guide RNA (sgRNA)[6] derived from the CRISPR RNA (crRNA) and the noncoding trans-activating crRNA (tracrRNA)[4,20]. Analysis of over 600 Cas9 orthologs shows that these enzymes cluster into two length groups, with characteristic protein sizes of approximately 1,350 and 1,000 amino acid residues, respectively[19,21] (Extended Data Fig. 1a); the shorter Cas9s have significantly truncated REC domains (Fig. 1a). From these shorter Cas9s, which belong to the Type IIA and IIC subtypes, we selected six candidates for profiling (Fig. 1a and Extended Data Fig. 1b). To determine the cognate crRNA and tracrRNA for each Cas9, we computationally identified regularly interspaced repeat sequences (direct repeats) within a 2-kb window flanking the CRISPR locus. We then predicted the tracrRNA by detecting sequences with strong complementarity to the direct repeat sequence (an anti-repeat region), at least two predicted stem-loop structures, and a Rho-independent transcriptional termination signal up to 150 nt downstream of the anti-repeat region. Although a truncated tracrRNA can support robust DNA cleavage in vitro[6], previous reports show that the secondary structures of the tracrRNA are important for Cas9 activity in mammalian cells[8,9,18,22]. Therefore, we designed sgRNA scaffolds for each ortholog by fusing the 3′ end of a truncated direct repeat to the 5′ end of the corresponding tracrRNA, including its full-length tail, via a 4-nt linker[6] (Extended Data Fig. 1b and Supplementary Table 1). To identify the PAM sequence for each Cas9, we first constructed a plasmid DNA library containing a constant 20-bp target followed by a degenerate 7-bp sequence (5′-NNNNNNN). We then incubated the plasmid library with the in vitro transcribed sgRNA and with lysate from human embryonic kidney 293FT (293FT) cells expressing the corresponding Cas9 ortholog.
By generating a consensus from the 7-bp sequence found on successfully cleaved DNA plasmids (Fig. 1b), we determined putative PAMs for each Cas9 (Fig. 1c). We observed that, similar to SpCas9, most Cas9 orthologs cleaved targets 3-bp upstream of the PAM (Extended Data Fig. 2). To validate each putative PAM from the library, we then incubated a DNA template bearing the consensus PAM with cell lysate and the corresponding sgRNA. We found that the Cas9 orthologs, in combination with the sgRNA designs, successfully cleaved the appropriate targets (Fig. 1d and Supplementary Table 2).
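The consensus step can be illustrated with a small position-frequency calculation. This is a minimal sketch, not the authors' analysis: it assumes the 7-bp degenerate regions have already been extracted from cleaved plasmid reads and contain only A/C/G/T, and it uses a simple majority rule (the hypothetical `min_fraction` cutoff) in place of the WebLogo and Bayesian-interval analysis shown in Fig. 1c. The function names are illustrative.

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(regions):
    """Per-position base frequencies across equal-length degenerate regions."""
    length = len(regions[0])
    if any(len(r) != length for r in regions):
        raise ValueError("all regions must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(r[i] for r in regions)
        total = sum(counts[b] for b in BASES)  # assumes only A/C/G/T are present
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs


def majority_consensus(freqs, min_fraction=0.5):
    """Most frequent base per position, or 'N' if no base reaches min_fraction.

    min_fraction is a hypothetical cutoff chosen for illustration only.
    """
    out = []
    for f in freqs:
        base, frac = max(f.items(), key=lambda kv: kv[1])
        out.append(base if frac >= min_fraction else "N")
    return "".join(out)


# Toy regions invented for illustration (not data from the study).
regions = ["ACGAGTA", "TTGAGTC", "GAGAGTG", "CGGGGTT"]
print(majority_consensus(position_frequencies(regions)))  # NNGAGTN
```

A majority rule cannot emit degenerate codes such as R; reporting them would require inspecting the full frequency profile, which is what the sequence logos in Fig. 1c convey.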

Extended Data Figure 1

Selection of Type II CRISPR-Cas loci from eight bacterial species

a, Distribution of lengths for >600 Cas9 orthologs[19]. b, Schematic of Type II CRISPR-Cas loci and sgRNAs from eight bacterial species. Spacer or “guide” sequences are shown in blue, followed by the direct repeat (gray). Predicted tracrRNAs are shown in red and folded based on the Constraint Generation RNA folding model[46].

Figure 1

Biochemical screen for small Cas9 orthologs

a, Phylogenetic tree of selected Cas9 orthologs. Subfamily and sizes (amino acids) are indicated, with nuclease domains highlighted in colored boxes and conserved sequences in black. b, Schematic illustration of the in vitro cleavage-based method used to identify the first seven positions (5′-NNNNNNN) of protospacer adjacent motifs (PAMs). c, Consensus PAMs for eight Cas9 orthologs from sequencing of cleaved fragments. Error bars are Bayesian 95% confidence intervals[45]. d, Cleavage using different orthologs and sgRNAs targeting loci bearing the putative PAMs (consensus shown in red). Red triangles indicate cleavage fragments.

Extended Data Figure 2

Cas9 ortholog cleavage pattern in vitro

Stacked bar graph indicates the fraction of targets cleaved at 2, 3, 4, or 5-bp upstream of PAM for each Cas9 ortholog; most Cas9s cleave stereotypically at 3-bp upstream of PAM (red triangle).

To test whether each Cas9 ortholog can facilitate genome editing in mammalian cells, we co-transfected 293FT cells with individual Cas9s and their respective sgRNAs targeting human endogenous loci containing the appropriate PAMs. Of the six Cas9 orthologs tested, only the one from Staphylococcus aureus (SaCas9) produced indels with efficiencies comparable to those of SpCas9 (Extended Data Fig. 3a, b and Supplementary Table 3), suggesting that DNA-cleavage activity in cell-free assays does not necessarily reflect the activity in mammalian cells. These observations prompted us to focus on harnessing SaCas9 and its sgRNA for in vivo applications.

Extended Data Figure 3

Test of Cas9 ortholog activity in 293FT cells

a, SURVEYOR assays showing indel formation at human endogenous loci following co-transfection of Cas9 orthologs and sgRNAs. PAM sequences for individual targets are shown above each lane, with the consensus region for each PAM highlighted in red. Red triangles indicate cleaved fragments. b, SaCas9 generates indels efficiently at multiple targets. c, Box-whisker plot of indel formation as a function of SaCas9 guide length L, with unaltered guides (perfect match of L nucleotides, gray bars) or with the 5′-most base of the guide replaced by guanine (G + (L − 1) nucleotides, blue bars) (n = 8 guides).

SaCas9 sgRNA design and PAM discovery

Although mature crRNAs in S. pyogenes are processed to contain 20-nt spacers (guides) and 19- to 22-nt direct repeats[4], RNA sequencing of crRNAs from other organisms reveals that spacer and direct repeat lengths can vary[4,20,23]. We therefore tested SaCas9 sgRNAs with variable guide lengths and repeat:anti-repeat duplexes. We found that SaCas9 achieves the highest editing efficiency in mammalian cells with guides 21 to 23 nt long and can accommodate a range of direct repeat:anti-repeat lengths (Fig. 2a, b, Extended Data Fig. 4). This contrasts notably with SpCas9, whose natural 20-nt guide can be truncated to 17 nt without significantly compromising nuclease activity, while increasing specificity[24]. Additionally, replacing the first base of the guide with guanine further improved SaCas9 activity (Extended Data Fig. 3c).

Figure 2

Characterization of Staphylococcus aureus Cas9 (SaCas9) in 293FT cells

a, SaCas9 sgRNA scaffold (red) and guide (blue) base-pairing at the target locus (black) immediately 5′ of the PAM. b, Box-whisker plot showing that indel levels vary with the length of the guide sequence (n = 4). c, dSaCas9-ChIP reveals peaks associated with seed + PAM. Text to the right indicates the total number of peaks and the percentage containing a significant (FDR < 0.1) match to the guide motif followed by an NNGRRT PAM. d, Pooled indel values for NNGRR(A), (C), (G), or (T) PAM combinations (n = 12, 21, 39, and 44, respectively).

Extended Data Figure 4

Optimization of SaCas9 sgRNA scaffold in mammalian cells

a, Schematic of the Staphylococcus aureus subspecies aureus CRISPR locus. b, Schematic of the SaCas9 sgRNA with a 21-nt guide, crRNA repeat (gray), tetraloop (black) and tracrRNA (red). The number of base pairs between the crRNA repeat and the tracrRNA anti-repeat is indicated above the gray boxes. SaCas9 cleaves targets with varying repeat:anti-repeat lengths in c, HEK 293FT and d, Hepa1-6 cell lines (n = 3; error bars show s.e.m.).

To fully characterize the SaCas9 PAM and the seed region within its guide sequence[25], we performed ChIP using catalytically inactive forms of SaCas9 (dSaCas9, carrying D10A and N580A mutations chosen based on homology to SpCas9) or SpCas9 (dSpCas9, D10A and H840A mutations) together with their corresponding sgRNAs. We targeted two loci in the human EMX1 gene with composite NGGRRT PAMs, which allow targeting by both Cas9 variants. A search for motifs containing both the guide region and the PAM within 50 nt of the ChIP peak summits revealed seed sequences of 7–8 nt for dSaCas9 (Fig. 2c). In addition, NNGRRT and NGG PAMs were found adjacent to the seed sequences for dSaCas9 and dSpCas9, respectively (Extended Data Fig. 5). Although the 6th position of the PAM is predominantly thymine, we observed low levels of degeneracy at this position in both the biochemical and ChIP-based PAM discovery assays (Fig. 1c and Extended Data Fig. 5a). We therefore tested the base preference at this position and determined that, although SaCas9 cleaves genomic targets most efficiently with NNGRRT, all NNGRR PAMs can be cleaved and should be considered as potential targets, especially in off-target evaluations (Fig. 2d, Extended Data Fig. 6, and Supplementary Table 4).
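PAM patterns such as NGG, NNGRRT and NNGRR use IUPAC nucleotide codes (R = A or G, N = any base). The sketch below, with a hypothetical helper `matches_pam`, shows why a composite NGGRRT site is usable by both enzymes and why relaxing the sixth position to NNGRR admits additional candidate sites. It only checks the bases immediately 3′ of a protospacer and makes no claim about cleavage efficiency.

```python
# IUPAC nucleotide codes used in PAM notation (e.g. R = A/G, N = any base).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pam(site: str, pattern: str) -> bool:
    """True if the first len(pattern) bases of `site` satisfy the IUPAC PAM pattern."""
    if len(site) < len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(site.upper(), pattern))


composite = "TGGAGT"  # hypothetical 6-bp sequence 3' of a protospacer
print(matches_pam(composite, "NGG"))     # True: usable by SpCas9
print(matches_pam(composite, "NNGRRT"))  # True: usable by SaCas9
print(matches_pam("TGGAGC", "NNGRRT"))   # False: 6th base is not T
print(matches_pam("TGGAGC", "NNGRR"))    # True: still a candidate off-target PAM
```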

Extended Data Figure 5

Genome-wide binding by Cas9-chromatin immunoprecipitation (dCas9-ChIP)

a, Unbiased identification of the PAM motif for dSaCas9 and dSpCas9. Peaks were analyzed for the best match by motif score to the guide region only, within 50 nt of the peak summit. The alignment was extended by 10 nt at the 3′ end and visualized using WebLogo. Numbers in parentheses indicate the number of called peaks. b, Histograms showing the distribution of peak summits relative to the motif for dSaCas9 and dSpCas9. Position 1 on the x-axis indicates the first base of the PAM.

Extended Data Figure 6

Indel measurements at candidate off-target sites based on ChIP

Indels at top off-target sites predicted by dCas9-ChIP for each Cas9 and sgRNA pair, based on ChIP peaks ranked by sequence similarity of the genomic loci to the guide motif (heatmap in purple), or p-value of ChIP enrichment over control (heatmap in red). Lines connect the common targets (EMX1) and off-targets between the two Cas9s.

Unbiased profiling of Cas9 specificity

As advances in Cas9 technology promise to enable a broad range of in vivo and therapeutic applications, accurate genome-wide identification of off-target nuclease activity has become increasingly important. Although a number of studies have employed sequence similarity-based off-target searches[22,26-30] or dCas9-ChIP[31,32] to predict off-target sites for Cas9, such approaches cannot assess the nuclease activity of Cas9 in a comprehensive and unbiased manner. To directly measure the genome-wide cleavage activity of SaCas9 and SpCas9, we applied BLESS (direct in situ breaks labeling, enrichment on streptavidin and next-generation sequencing)[33] to capture Cas9-induced DNA double-strand breaks (DSBs) in cells. We transfected 293FT cells with SaCas9 or SpCas9 and the same EMX1-targeting guides used in the ChIP experiment, or with pUC19 as a negative control. After the cells were fixed, free genomic DNA ends from DSBs were captured using biotinylated adaptors and analyzed by deep sequencing (Fig. 3a). To identify candidate Cas9-induced DSB sites genome-wide, we established a three-step analysis pipeline following alignment of the sequenced BLESS reads to the genome (Extended Data Fig. 7a, Supplementary Discussion). First, we applied nearest-neighbor clustering to the aligned reads to identify groups of DSBs (DSB clusters) across the genome. Second, we sought to separate potential Cas9-induced DSB clusters from background DSB clusters, which result from low-frequency biological processes and technical artifacts as well as from high-frequency telomeric and centromeric DSB hotspots[33]. Using the on-target sites and a small subset of verified off-target sites (predicted by sequence similarity using a previously established method[22] and sequenced to detect indels), we found that reads in Cas9-induced DSB clusters mapped to characteristic, well-defined genomic positions, in contrast to the more diffuse alignment pattern at background DSB clusters.
To distinguish between the two types of DSB clusters, we calculated in each cluster the distance between all possible pairs of forward and reverse-oriented reads (corresponding to 3′ and 5′ ends of DSBs), and filtered out the background DSB clusters based on the distinctive pairwise-distance distribution of these clusters (Extended Data Fig. 7b, c). Third, the DSB score for a given locus was calculated by comparing the count of DSBs in the experimental and negative control samples using a maximum-likelihood estimate (MLE)[22] (Supplementary Discussion). This analysis identified the on-target loci for both SaCas9 and SpCas9 guides as the top scoring sites, and revealed additional sites with high DSB scores (Fig. 3b–d).
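The pairwise-distance filter in the second step can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: each read is reduced to a single mapped coordinate, a pair is counted as "tight" when the forward and reverse coordinates lie within 6 bp of each other (the cutoff marked in Extended Data Fig. 7), and the `min_fraction` threshold for keeping a cluster is a hypothetical value. The exact filtering criteria are given in the Supplementary Discussion.

```python
from itertools import product


def pairwise_distances(forward_starts, reverse_starts):
    """All (reverse - forward) coordinate differences for reads in one DSB cluster."""
    return [r - f for f, r in product(forward_starts, reverse_starts)]


def fraction_within(distances, max_abs_bp=6):
    """Fraction of pairwise distances whose magnitude is at most max_abs_bp."""
    if not distances:
        return 0.0
    return sum(abs(d) <= max_abs_bp for d in distances) / len(distances)


def is_well_defined(forward_starts, reverse_starts, max_abs_bp=6, min_fraction=0.5):
    """Hypothetical filter: keep clusters whose read ends pile up tightly."""
    d = pairwise_distances(forward_starts, reverse_starts)
    return fraction_within(d, max_abs_bp) >= min_fraction


# Toy coordinates: a sharp, Cas9-like break versus a diffuse background cluster.
sharp = ([1000, 1000, 1001], [1000, 1001, 1003])
diffuse = ([100, 400, 900], [150, 700, 1200])
print(is_well_defined(*sharp), is_well_defined(*diffuse))  # True False
```

Using all forward/reverse pairs, rather than only nearest neighbours, makes the statistic sensitive to the overall spread of a cluster: a diffuse hotspot produces many large distances even if a few reads happen to land close together.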

Figure 3

Characterization of genome-wide nuclease activity of SaCas9 and SpCas9

a, Schematic of BLESS processing steps. b, Manhattan plots of genome-wide DSB clusters generated by each Cas9 and sgRNA pair, with on-target loci shown above. c, Correlation between DSB scores and indel levels for top-scoring DSB clusters. Trendlines, r, and p-values are calculated using ordinary least squares. d, Off-target loci from BLESS with detectable indels through targeted deep sequencing (n=3) are shown. Heatmaps indicate DSB score (blue), motif score from ChIP (purple), or sequence similarity score (green) for each locus. Blue triangles indicate peak positions of BLESS signal.

Extended Data Figure 7

Analysis pipeline of sequencing data from BLESS

a, Overview of the data analysis pipeline, starting from the raw sequencing reads. Representative sequencing read mappings and corresponding histograms of the pairwise distances between all forward-orientation (red) and reverse-orientation (blue) reads, displayed for representative b, DSB hotspots and poorly defined DSB sites and c, Cas9-induced DSBs with detectable indels. The fraction of pairwise distances between reads overlapping by no more than 6 bp (dashed vertical line) is indicated above each histogram.

Next, we assessed whether DSB scores correlated with indel formation. We used targeted deep sequencing to measure indel formation at the ~30 top-ranking off-target sites identified by BLESS for each Cas9 and sgRNA combination. We found that only sites containing a PAM and homology to the guide sequence exhibited indels (Extended Data Fig. 8). We observed a strong linear correlation between DSB scores and indel levels for each Cas9 and sgRNA pairing (r² = 0.948 and 0.989 for the two EMX1 targets with SaCas9, and r² = 0.941 and 0.753 for those with SpCas9) (Fig. 3c, Extended Data Fig. 9b–d). Furthermore, BLESS identified additional off-target sites not previously predicted by sequence similarity to the target or by ChIP (Extended Data Figs. 7 and 9, Supplementary Tables 5 and 6). These new off-target sites include not only those containing Watson-Crick base-pairing mismatches to the guide, but also the recently reported insertion and deletion mismatches in the guide:target heteroduplex (Fig. 3d)[29,30]. Together, these results highlight the need for a more precise understanding of the rules governing Cas9 nuclease activity, a requisite step towards improving the predictive power of computational guide design programs.
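The r² values above come from ordinary least-squares fits of indel level against DSB score (Fig. 3c). A minimal sketch of that calculation is shown below; the function name and the toy points are hypothetical, and p-values are omitted. For a straight-line fit with an intercept, this r² equals the square of the Pearson correlation coefficient.

```python
def ols_r_squared(x, y):
    """Slope, intercept and r^2 of an ordinary-least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Toy points (not study data): a perfectly linear relationship gives r^2 = 1.
print(ols_r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # (2.0, 0.0, 1.0)
```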

Extended Data Figure 8

Indel measurements at off-target sites based on DSB scores

List of top off-target sites ranked by DSB scores for each Cas9 and sgRNA pair. Indel levels are determined by targeted deep sequencing. Blue triangles indicate positions of peak BLESS signal, and where present, PAMs and targets with sequence homology to the guide are highlighted. Lines connect the common on-targets (EMX1) and off-targets between the two Cas9s. N.D. not determined.

Extended Data Figure 9

Indel measurements of top candidate off-target sites based on sequence similarity score

Off-targets are predicted based on sequence similarity to on-target, accounting for number and position of Watson-Crick base-pairing mismatches as previously described[22]. NNGRR and NRG are used as potential PAMs for SaCas9 and SpCas9, respectively. Lines connect the common targets (EMX1) and off-targets between the two Cas9s. Correlation plots between indel percentages and b, prediction based on sequence similarity, c, ChIP peaks ranked by motif similarity, or d, DSB scores for top ranking off-target loci. Trendlines, r, and p-values are calculated using ordinary least squares.

In vivo genome editing using SaCas9

Following in vitro characterization, we incorporated SaCas9 and its sgRNA into an AAV vector to test its efficacy and specificity in vivo. The small size of SaCas9 enables packaging of both a U6-driven sgRNA and a CMV- or TBG-driven SaCas9 expression cassette into a single AAV vector within the 4.5-kb packaging limit. Using the hepatocyte-tropic AAV serotype 8, we targeted the mouse apolipoprotein B (Apob) gene (Extended Data Fig. 10a). One week after intravenous administration of virus into C57BL/6 mice, we observed ~5% indel formation in liver tissue; after four weeks, histological analysis using Oil Red O staining revealed the hepatic lipid accumulation characteristic of Apob knockdown[34-37] (Extended Data Fig. 10b–d).

Extended Data Figure 10

SaCas9 targeting Apob locus in the mouse liver

a, Schematics illustrating the mouse Apob gene locus and the positions of the three guides tested. b, Experimental time course and c, SURVEYOR assay showing indel formation at target loci after intravenous injection of AAV2/8 carrying thyroxine-binding globulin (TBG) promoter-driven SaCas9 and U6-driven guide at 2E11 total genome copies (n = 1 animal each). d, Oil Red O staining of liver tissue from AAV- or saline-injected animals. Male C57BL/6 mice were injected at 8 weeks of age and analyzed 4 weeks post-injection.

We next targeted proprotein convertase subtilisin/kexin type 9 (Pcsk9), a therapeutically relevant gene involved in cholesterol homeostasis[38]. Inhibitors of the human convertase PCSK9 have emerged as a promising new class of cardioprotective drugs after human genetic studies revealed that loss of PCSK9 is associated with a reduced risk of cardiovascular disease and lower levels of LDL cholesterol[39-41]. We designed two Pcsk9-targeting sgRNAs and validated their activity in vitro. Each sgRNA was packaged into AAV-SaCas9 and injected into mice (2E11 total genome copies) (Fig. 4a). One week after administration, we observed greater than 40% indel formation at either locus in whole liver tissue, with similar levels two and four weeks post-injection (Fig. 4b). To determine the effect of Pcsk9-targeting AAV-SaCas9 dosage on serum Pcsk9 and total cholesterol levels, we administered a range of AAV titers from 0.5E11 to 4E11 total genome copies. With all titers, we observed a ~95% decrease in serum Pcsk9 and a ~40% decrease in total cholesterol one week after administration, both of which were sustained throughout the course of four weeks (Fig. 4c, d).

Figure 4

AAV-delivery of SaCas9 for in vivo genome editing

a, Single-vector AAV system and experimental timeline. b, Indels at Pcsk9 targets in liver tissue following injection of AAV at 2E11 total genome copies (n=3 animals). Time course of c, serum Pcsk9 and d, total cholesterol in animals (n=3 for all titers and time points, error bars show S.E.M.). e, Manhattan plots of BLESS-identified DSB clusters in N2a cells. Inset indicates indel levels at top DSB scoring loci. f, Indels in liver tissue (n=3 animals, error bars indicate Wilson intervals) at BLESS-identified off-target loci. Heatmap indicates DSB scores.

Given the importance of targeting specificity in a therapeutic context, we next considered SaCas9 off-target modifications in vivo. To identify candidate off-target cleavage sites for the two Pcsk9-targeting guides, we transiently transfected an AAV-CMV::SaCas9 vector into mouse Neuro-2a (N2a) neuroblastoma cells and applied BLESS to detect Cas9-induced DSBs in the genome. For both guides, we found very low levels of DSB signal across the genome except at the on-target locus (Fig. 4e). Targeted deep sequencing of the candidate off-target sites identified by BLESS in N2a cells did not reveal appreciable levels of indels in either N2a cells or liver tissue (4 weeks after injection of 2E11 total genome copies) (Fig. 4e, f and Supplementary Table 8). We additionally sequenced off-target sites predicted by target sequence similarity and likewise did not detect indel formation (Supplementary Table 9). Finally, we examined the titer-matched Pcsk9-targeting and TBG-GFP cohorts, as well as naïve animals, for signs of toxicity or acute immune response. At 1 week post-injection, necropsy and gross examination of the liver revealed no abnormalities in any cohort; further histological examination of the liver by hematoxylin and eosin (H&E) staining showed no signs of inflammation, such as aggregates of lymphocytes or macrophages (Fig. 5a). Throughout the time course of the experiment, serum ALT, albumin, and total bilirubin levels were not elevated in any of the cohorts. We observed a slight trend toward increased AST across all cohorts at four weeks, including the un-injected animals; these levels did not exceed the upper limit of normal and are not indicative of hepatocellular injury (Fig. 5b). However, a larger cohort study should be conducted to further evaluate potential in vivo toxicity.

Figure 5

Liver function tests and toxicity examination in injected animals

a, Histological analysis of the liver at 1 week post-injection by H&E stain. Scale bar = 10 μm. b, Liver function tests in Pcsk9-targeted (both Pcsk9-sg1 and Pcsk9-sg2; 2E11 total genome copies, n ≥ 4), TBG::EGFP-injected (2E11 total genome copies, n = 3), and un-injected (n = 5) animals. Dashed lines show the upper and lower ranges of normal values in mice where applicable.

Discussion

Here, we develop a small and efficient Cas9 from S. aureus for in vivo genome editing[17]. These results highlight the power of comparative genomic analysis[19,42] for expanding the CRISPR-Cas toolbox. Identification of new Cas9 orthologs[19,42], together with structure-guided engineering, could yield a repertoire of Cas9 variants with expanded capabilities and minimized molecular weight for nucleic acid manipulation, further advancing genome and epigenome engineering. The AAV-SaCas9 system is able to mediate efficient and rapid editing of Pcsk9 in the mouse liver, resulting in reductions of serum Pcsk9 and total cholesterol levels. To assess the specificity of SaCas9, we used an unbiased DSB detection method, BLESS, to identify a list of candidate off-target cleavage sites in mouse cells. We examined these sites in liver tissue transduced by AAV-SaCas9 and did not observe any indel formation within the detection limits of targeted deep sequencing. However, the off-target sites identified in vitro might differ from those in vivo; this needs to be further evaluated by in vivo applications of BLESS or of other unbiased techniques, such as those published during the revision of this work[43,44]. Finally, we did not observe any overt signs of acute toxicity one to four weeks after virus administration. Although further studies are needed to improve the SaCas9 system for in vivo genome editing, such as assessing the long-term impact of Cas9 and sgRNA expression, these findings suggest that in vivo genome editing using SaCas9 has the potential to be highly efficient, specific, and well-tolerated.

Methods

In vitro transcription and cleavage assay

Cas9 orthologs were human codon-optimized and synthesized by GenScript, and transfected into 293FT cells as described below. Whole cell lysates from 293FT cells were prepared with lysis buffer (20 mM HEPES, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, 0.1% Triton X-100) supplemented with Protease Inhibitor Cocktail (Roche). T7-driven sgRNA was transcribed in vitro using custom oligos (Supplementary Information) and HiScribe T7 In vitro Transcription Kit (NEB), following the manufacturer’s recommended protocol. The in vitro cleavage assay was carried out as follows: for a 20 μl cleavage reaction, 10 μl of cell lysate was incubated with 2 μl cleavage buffer (100 mM HEPES, 500 mM KCl, 25 mM MgCl2, 5 mM DTT, 25% glycerol), 1 μg in vitro transcribed RNA and 200 ng EcoRI-linearized pUC19 plasmid DNA or 200 ng purified PCR amplicons from mammalian genomic DNA containing target sequence. After 30 min incubation, cleavage reactions were purified using QiaQuick Spin Columns and treated with RNase A at final concentration of 80 ng/μl for 30 min and analyzed on a 1% Agarose E-Gel (Life Technologies).
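Because the cell lysate is itself in lysis buffer, both the lysate and the cleavage buffer contribute salts to the 20 μl reaction. The sketch below works through the C1V1 = C2V2 arithmetic for each component; it assumes that the remaining 8 μl (RNA, DNA and water) adds no buffer components and treats the lysate as plain lysis buffer. The dictionary keys and function name are illustrative.

```python
# Stock compositions from the Methods; keys encode units (mM or % v/v).
LYSIS_BUFFER = {"HEPES_mM": 20, "KCl_mM": 100, "MgCl2_mM": 5, "DTT_mM": 1,
                "glycerol_pct": 5, "TritonX100_pct": 0.1}
CLEAVAGE_BUFFER = {"HEPES_mM": 100, "KCl_mM": 500, "MgCl2_mM": 25, "DTT_mM": 5,
                   "glycerol_pct": 25}


def final_concentrations(components, total_ul):
    """Sum C1*V1/V_total over every (stock, volume) pair added to the reaction."""
    totals = {}
    for stock, volume_ul in components:
        for name, conc in stock.items():
            totals[name] = totals.get(name, 0.0) + conc * volume_ul / total_ul
    return totals


# 10 ul lysate (in lysis buffer) + 2 ul cleavage buffer in a 20 ul reaction.
final = final_concentrations([(LYSIS_BUFFER, 10), (CLEAVAGE_BUFFER, 2)], total_ul=20)
for name, value in final.items():
    print(f"{name}: {value:g}")
```

Under these assumptions each source contributes half of the final salt, and the reaction ends up at 20 mM HEPES, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol and 0.05% Triton X-100, i.e. the cleavage buffer restores the lysis-buffer composition that the lysate alone would have diluted twofold.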

In vitro PAM screen

Rho-independent transcriptional termination was predicted using the ARNold terminator search tool[47,48]. For the PAM library, a degenerate 7-bp sequence was cloned into a pUC19 vector. For each ortholog, the in vitro cleavage assay was carried out as above with 1 μg T7-transcribed sgRNA and 400 ng pUC19 carrying the degenerate PAM. Cleaved plasmids were linearized by NheI, gel extracted, and ligated with Illumina sequencing adaptors. Barcoded and purified DNA libraries were quantified using the Quant-iT PicoGreen dsDNA Assay Kit or a Qubit 2.0 Fluorometer (Life Technologies) and pooled in an equimolar ratio for sequencing on the Illumina MiSeq Personal Sequencer. MiSeq reads were filtered by requiring an average Phred quality (Q score) of at least 23, as well as perfect sequence matches to barcodes. For reads corresponding to each ortholog, the degenerate region was extracted. All extracted regions were then grouped and analyzed with WebLogo[45].
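The read-filtering step can be sketched as below. The Q ≥ 23 mean-quality cutoff and the perfect-barcode requirement come from the text; everything else is an assumption for illustration: Phred+33 quality encoding, a hypothetical read layout (barcode at the 5′ end, followed somewhere downstream by the constant target and then the 7-bp degenerate region), and the helper names and toy sequences.

```python
PHRED_OFFSET = 33  # assumes Sanger / Illumina 1.8+ quality encoding


def mean_phred(quality: str) -> float:
    """Average Phred score of a FASTQ quality string."""
    return sum(ord(c) - PHRED_OFFSET for c in quality) / len(quality)


def extract_pam_region(seq, qual, barcodes, target, min_mean_q=23, pam_len=7):
    """Return (sample, degenerate_region) for a passing read, else None.

    Hypothetical read layout: barcode at the 5' end, then (somewhere downstream)
    the constant target sequence, immediately followed by the degenerate region.
    """
    if not qual or mean_phred(qual) < min_mean_q:
        return None
    sample = next((name for name, bc in barcodes.items() if seq.startswith(bc)), None)
    if sample is None:  # require a perfect barcode match
        return None
    idx = seq.find(target, len(barcodes[sample]))
    if idx < 0:
        return None
    start = idx + len(target)
    region = seq[start:start + pam_len]
    return (sample, region) if len(region) == pam_len else None


# Toy example: hypothetical barcodes and a short stand-in for the constant target.
barcodes = {"SaCas9": "ACGT", "St1Cas9": "TGCA"}
target = "GAGCTCGTAC"
read = "ACGT" + "TT" + target + "AAGGATT" + "CC"
print(extract_pam_region(read, "I" * len(read), barcodes, target))  # ('SaCas9', 'AAGGATT')
```

The extracted regions for each ortholog would then be pooled and summarized, for example as a sequence logo.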

Cell culture and transfection

Human embryonic kidney 293FT (Life Technologies), Neuro-2a (N2a), and Hepa1-6 (ATCC) cell lines were maintained in Dulbecco’s modified Eagle’s Medium (DMEM) supplemented with 10% FBS (HyClone), 2 mM GlutaMAX (Life Technologies), 100 U/ml penicillin, and 100 μg/ml streptomycin at 37 °C with 5% CO2. Cells were seeded into 24-well plates (Corning) one day prior to transfection at a density of 240,000 cells per well and transfected at 70–80% confluency using Lipofectamine 2000 (Life Technologies) following the manufacturer’s recommended protocol. For each well of a 24-well plate, a total of 500 ng DNA was used. For ChIP and BLESS, a total of 4.5 million cells were seeded into a 100-mm plate the day before transfection, and a total of 20 μg DNA was used.

DNA isolation from cells and tissue

Genomic DNA was extracted using the QuickExtract DNA Extraction Solution (Epicentre). Briefly, pelleted cells were resuspended in QuickExtract solution and incubated at 65 °C for 15 min, 68 °C for 15 min, and 98 °C for 10 min[8]. Genomic DNA was extracted from bulk liver tissue fragments using a microtube bead mill homogenizer (Beadbug, Denville Scientific) by homogenizing approximately 30–50 mg of tissue in 600 μL of DPBS (Gibco). The homogenate was then centrifuged at 2,000–3,000 × g for 5 min at 4 °C, and the pellet was resuspended in 300–600 μL QuickExtract DNA Extraction Solution (Epicentre) and incubated as above.

Indel analysis and guide:target basepairing mismatch search

Indel analyses by SURVEYOR assay and targeted deep sequencing were carried out as previously described[8,22]. The method for identifying potential off-target sites for SpCas9 based on Watson-Crick base-pairing mismatches between guide RNA and target DNA has been previously described[22]; we adapted it for SaCas9 by considering NNGRR as possible off-target PAMs.
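A simplified version of such a search scans both strands for guide-length protospacers followed by a PAM that matches the chosen pattern (NNGRR for SaCas9, NRG for SpCas9) and counts mismatches to the guide. This sketch is not the previously described method[22]: it uses an unweighted mismatch count (the published method accounts for mismatch number and position), ignores insertion and deletion bulges, and reports minus-strand start positions on the reverse-complemented sequence. The names and the toy guide are hypothetical; the guide is written as DNA.

```python
# Only the IUPAC codes needed for NNGRR and NRG; extend if other PAMs are used.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def candidate_off_targets(genome, guide, pam="NNGRR", max_mismatches=3):
    """List (strand, start, protospacer, pam, mismatches) hits, fewest mismatches first.

    Simplifications: unweighted mismatch counts, no insertion/deletion bulges,
    and minus-strand starts are indexed on the reverse-complemented sequence.
    """
    genome, guide = genome.upper(), guide.upper()
    L, P = len(guide), len(pam)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - L - P + 1):
            site_pam = seq[i + L:i + L + P]
            if not all(b in IUPAC[c] for b, c in zip(site_pam, pam)):
                continue
            protospacer = seq[i:i + L]
            mismatches = sum(a != b for a, b in zip(protospacer, guide))
            if mismatches <= max_mismatches:
                hits.append((strand, i, protospacer, site_pam, mismatches))
    return sorted(hits, key=lambda h: h[4])


guide = "GACCTTAGCAGTCGATCCATG"  # hypothetical 21-nt guide
genome = "TTT" + guide + "GAGAA" + "TTT"
print(candidate_off_targets(genome, guide)[0])
```

Sorting by mismatch count gives a crude ranking; a position-weighted score would be needed to reproduce the published ranking.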

Chromatin immunoprecipitation and analysis

Cells were passaged at 24 hours post-transfection into a 150-mm dish and fixed for ChIP processing at 48 hours post-transfection. For each condition, 10 million cells were used for ChIP input, following experimental protocols and analyses as previously described[31] with the following modifications: instead of pairwise peak calling, ChIP peaks were required to be enriched over both ‘empty’ controls (dSpCas9 only, dSaCas9 only) as well as over the sample with the other Cas9 and the other sgRNA (e.g., SpCas9/EMX-sg2 peaks must be enriched over SaCas9/EMX-sg1 peaks in addition to the empty controls). This was done to avoid, as much as possible, filtering out real peaks present in two related samples. To identify off-targets ranked by motif or sequence similarity to the guide, motif scores for ChIP peaks were calculated as follows. For a given ChIP peak, the target is the 100-nt interval around the peak summit and the query is the sgRNA guide region of length L; an alignment score is calculated for every subsequence of length L in the target, and the subsequence with the highest score is reported as the best match to the query. For each subsequence alignment, the score calculation begins at the 5′ end of the query. At each position in the alignment, 1 is added for a match or subtracted for a mismatch between the query and the target. If the score becomes negative, it is set to 0 and the calculation continues for the remainder of the alignment. The score at the 3′ end of the query is reported as the final score for the alignment. MACS scores (−10 log10 of the p-value relative to the empty control) were determined as previously described[49]. For unbiased determination of the PAM from ChIP peaks, the peaks were analyzed for the best match by motif score to the guide region only, within 50 nt of the peak summit; the alignment was extended by 10 nt at the 3′ end and visualized using WebLogo[45].
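The motif-scoring rule just described translates directly into a short ungapped scan. The sketch below follows the stated rule (+1 per match, −1 per mismatch, reset to 0 whenever the running score goes negative, score read at the 3′ end of the query); the function names are illustrative, and the Methods do not specify strand handling, so this sketch scans only the strand it is given.

```python
def alignment_score(query: str, window: str) -> int:
    """Ungapped score: +1 match, -1 mismatch, floored at 0, read at the 3' end.

    Scanning runs 5' -> 3' along the query, so a mismatch near the 5' end can be
    absorbed by the reset to 0, whereas a 3'-terminal mismatch is not.
    """
    score = 0
    for q, t in zip(query, window):
        score += 1 if q == t else -1
        if score < 0:
            score = 0
    return score


def best_match(query: str, target: str):
    """Return (score, start, subsequence) of the best length-L window in target."""
    L = len(query)
    best = (-1, -1, "")
    for i in range(len(target) - L + 1):
        window = target[i:i + L]
        s = alignment_score(query, window)
        if s > best[0]:
            best = (s, i, window)
    return best


print(alignment_score("AAAA", "TAAA"), alignment_score("AAAA", "AAAT"))  # 3 2
query = "GACCTTAGCA"  # hypothetical guide fragment
print(best_match(query, "TTTT" + query + "TTTT"))  # (10, 4, 'GACCTTAGCA')
```

As the first usage line shows, a 5′-terminal mismatch lowers the score by 1 while a 3′-terminal mismatch lowers it by 2, so the score is more sensitive to mismatches at the PAM-proximal end of the guide.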
To determine, for each sample, the motif-score threshold corresponding to FDR < 0.1, the 100-nt sequences centered on peak summits were shuffled while preserving dinucleotide frequencies. The best motif-score match to the guide plus PAM (NGG for SpCas9, NNGRRT for SaCas9) was then found in each shuffled sequence. The FDR < 0.1 threshold was defined as the score above which fewer than 10% of shuffled peaks scored.
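
The FDR threshold procedure above can be sketched as follows. The text does not name the shuffling method. This sketch swaps in the Altschul-Erickson dinucleotide shuffle, an Eulerian-walk construction that preserves dinucleotide counts exactly. It also extends the motif score to IUPAC codes, so that a guide followed by NGG or NNGRRT can be scored, and it shuffles each peak sequence once. All function names are hypothetical.

```python
import random
from collections import defaultdict

# IUPAC codes needed for the guide+PAM queries in the text (NGG, NNGRRT).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def dinucleotide_shuffle(seq, rng):
    """Altschul-Erickson shuffle: a random permutation of `seq` with exactly the
    same dinucleotide counts (and the same first and last letter)."""
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)  # letter -> successors, one entry per dinucleotide
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]

    def leads_to_last(v, last_exit):
        # Follow last-exit edges from v; revisiting a letter means a cycle.
        seen = set()
        while v != last:
            if v in seen:
                return False
            seen.add(v)
            v = last_exit[v]
        return True

    # Pick a random final exit edge for every letter except `last`, resampling
    # until these edges form a tree rooted at `last` (ensures an Eulerian walk).
    while True:
        last_exit = {v: rng.choice(out) for v, out in edges.items() if v != last}
        if all(leads_to_last(v, last_exit) for v in last_exit):
            break

    # Shuffle each letter's remaining exits and put its chosen final exit last.
    order = {}
    for v, out in edges.items():
        out = list(out)
        if v in last_exit:
            out.remove(last_exit[v])
        rng.shuffle(out)
        if v in last_exit:
            out.append(last_exit[v])
        order[v] = out

    # Walk from the original first letter, consuming each letter's exits in order.
    result, used, v = [first], defaultdict(int), first
    for _ in range(len(seq) - 1):
        nxt = order[v][used[v]]
        used[v] += 1
        result.append(nxt)
        v = nxt
    return "".join(result)


def best_motif_score(query, target):
    """Best floored running score (+1 match, -1 mismatch, reset to 0 when
    negative) of an IUPAC `query` over all length-L windows of `target`."""
    L, best = len(query), 0
    for i in range(len(target) - L + 1):
        score = 0
        for q, t in zip(query, target[i:i + L]):
            score = max(score + (1 if t in IUPAC[q] else -1), 0)
        best = max(best, score)
    return best


def threshold_from_null(null_scores, fdr=0.1):
    """Smallest observed score s such that fewer than `fdr` of the shuffled
    (null) peaks have a motif score above s."""
    n = len(null_scores)
    for s in sorted(set(null_scores)):
        if sum(x > s for x in null_scores) / n < fdr:
            return s


def shuffled_threshold(peak_seqs, guide_plus_pam, fdr=0.1, seed=0):
    """Hypothetical wrapper: shuffle each peak sequence once and derive the
    motif-score threshold from the resulting null scores."""
    rng = random.Random(seed)
    null = [best_motif_score(guide_plus_pam, dinucleotide_shuffle(s, rng))
            for s in peak_seqs]
    return threshold_from_null(null, fdr)
```

For example, if 80 shuffled peaks score 1, 15 score 2, and 5 score 3, then `threshold_from_null` returns 2. Only 5% of the shuffled peaks score above 2, whereas 20% score above 1.
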

BLESS for DSB detection

Cells were harvested 24 hours post-transfection and processed as previously described[33], with the following alterations. A total of 10 million cells were fixed for nuclei isolation and permeabilization, then treated with Proteinase K for 4 min at 37°C before inactivation with PMSF. All deproteinized nuclei were used for DSB labeling with 100 mM of annealed proximal linkers overnight. After Proteinase K digestion of the labeled nuclei, chromatin was mechanically sheared with a 26G needle before sonication (BioRuptor, 20 min on High, 50% duty cycle). Sheared chromatin (20 μg) was captured on streptavidin beads, washed, and ligated to 200 mM of distal linker. Linker hairpins were then cleaved off by I-SceI digestion for 1 hour at 37°C. The products were PCR-enriched for 18 cycles before library preparation with the TruSeq Nano LT Kit (Illumina). As a negative control, cells mock-transfected with Lipofectamine 2000 and pUC19 DNA were processed through the assay in parallel.

BLESS Analysis

FASTQ files were demultiplexed, and the 30-bp genomic sequences were separated from the BLESS ligation handles for alignment. Bowtie was used to map the genomic sequences to hg19 or mm9, allowing a maximum of 2 mismatches. After alignment, reads from all biological replicates of an individual sample were pooled. Nearest-neighbor clustering was then performed with a 30-bp moving window to identify regions of enrichment across the genome. Within each cluster, the pairwise distance was calculated between all forward- and reverse-strand read mappings (Extended Data Figure 7b, c). The pairwise distance distributions were used to filter out wide, poorly defined DSB clusters, while retaining the well-defined DSB clusters characteristic of Cas9-induced cleavage sites (see Supplementary Information). Finally, the count of predicted Cas9-induced DSBs at each locus was adjusted using a binomial model. This model calculated the maximum-likelihood estimate (MLE) of peak enrichment in the Cas9-sgRNA-treated samples, given the BLESS measurements from an untreated negative control. After the MLE calculation, loci were ranked by their DSB scores and plotted (Figure 3b, Extended Data Figure 8). Additional descriptions can be found in the Supplementary Information. The ~30 top-ranking sites from the list of Cas9-induced DSB clusters were sequenced for indel formation (Extended Data Figure 8; validated targets in Figure 3d). Within these loci, PAMs and regions of target homology were identified in two steps. First, all PAM sites within a ±50 bp window around the DSB cluster were found. Then the adjacent sequence with the fewest mismatches to the target sequence was selected.
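
The clustering and strand-distance steps above can be sketched as follows. The sketch assumes that each aligned read is reduced to a (chromosome, position, strand) tuple. The text does not define the moving window precisely, so this sketch treats it as a maximum 30-bp gap between consecutive reads. The sign convention for distances (reverse minus forward) is also an assumption. The binomial MLE step is not sketched, because its model is given only in the Supplementary Information. Function names are hypothetical.

```python
from collections import defaultdict


def cluster_reads(reads, window=30):
    """Nearest-neighbour clustering of mapped reads.

    `reads` is an iterable of (chrom, pos, strand) tuples. Reads are sorted by
    position within each chromosome. A read joins the current cluster when it
    lies within `window` bp of the previous read; otherwise it starts a new one.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, strand in reads:
        by_chrom[chrom].append((pos, strand))
    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        for pos, strand in sorted(by_chrom[chrom]):
            if current and pos - current[-1][1] > window:
                clusters.append(current)
                current = []
            current.append((chrom, pos, strand))
        if current:
            clusters.append(current)
    return clusters


def strand_pair_distances(cluster):
    """Signed distances (reverse minus forward position) between every
    forward-strand read and every reverse-strand read in one cluster."""
    fwd = [pos for _, pos, strand in cluster if strand == "+"]
    rev = [pos for _, pos, strand in cluster if strand == "-"]
    return [r - f for f in fwd for r in rev]
```

Consider reads at chr1:100(+), chr1:105(−), chr1:120(+), chr1:200(−), and chr2:100(+). They form three clusters. In the first cluster, the pairwise distances are 5 and −15. A tight distance distribution like this is what the filtering step would treat as a well-defined DSB.
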

Code Availability

BLESS analysis code is available upon request.

Virus Production and Titration

For in-house viral production, 293FT cells (Life Technologies) were maintained as described above in 150 mm plates. For each transfection, 8 μg of pAAV8 serotype packaging plasmid, 10 μg of pDF6 helper plasmid, and 6 μg of AAV2 plasmid carrying the construct of interest were added to 1 mL of serum-free DMEM. Then 125 μL of PEI "Max" solution (1 mg/mL, pH 7.1) was added, and the mixture was incubated at room temperature for 5 to 10 seconds. After incubation, the mixture was added to 20 mL of warm maintenance medium and applied to each dish to replace the old growth medium. Cells were harvested 48 to 72 h post-transfection by scraping and pelleted by centrifugation. The AAV2/8 viral particles (AAV2 ITR vectors pseudotyped with the AAV8 capsid) were then purified from the pellet according to a previously published protocol[50]. High-titer, high-purity viruses were also produced by the vector core facilities at Children's Hospital Boston and the Massachusetts Eye and Ear Infirmary (MEEI). The AAV vectors were titered by real-time qPCR using a custom TaqMan probe against the transgene. All viral preparations were titer-matched across batches and production facilities before experiments. Vector purity was further verified by SDS-PAGE.
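
As a quick check on the transfection mix above, the DNA and PEI masses per dish work out as follows. This is simple arithmetic from the stated amounts. The resulting PEI:DNA mass ratio is derived here and is not reported in the original.

```latex
m_{\mathrm{DNA}} = 8 + 10 + 6 = 24~\mu\mathrm{g}, \qquad
m_{\mathrm{PEI}} = 125~\mu\mathrm{L} \times 1~\mathrm{mg/mL} = 125~\mu\mathrm{g}, \qquad
\frac{m_{\mathrm{PEI}}}{m_{\mathrm{DNA}}} = \frac{125}{24} \approx 5.2
```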

Animal Injection and Processing

All mouse cohorts were maintained at the animal facility with a standard diet and housing, following IRB-approved protocols. AAV vector was delivered intravenously to 5–6-week-old male C57BL/6 mice via lateral tail vein injection. All AAV doses were brought to a volume of 100 μL or 200 μL with sterile phosphate-buffered saline (PBS), pH 7.4 (Gibco), before injection. Apart from the pre-bleed fasting noted below, animals were not immunosuppressed or otherwise handled differently before injection or during the experiment. Animals were randomized to the experimental conditions, and the investigator was not blinded to group assignments. To track serum levels of Pcsk9 and total cholesterol, animals were fasted overnight for 12 hours before blood collection by saphenous vein bleeds (no more than 100 μL, or 10% of total blood volume, per week). Multiple bleeds were taken before tail vein delivery of AAV vector or control. These bleeds served both to collect pre-injection samples and to habituate the animals to handling. After the blood had clotted at room temperature, serum was separated by centrifugation and stored at −20°C for subsequent analysis. For terminal procedures to collect liver tissue and larger serum volumes for chemistry panels, mice were euthanized by carbon dioxide inhalation, and blood was then collected by cardiac puncture. Transcardial perfusion with 30 mL PBS removed the remaining blood, after which liver samples were collected. The median lobe of the liver was removed and fixed in 10% neutral buffered formalin for histological analysis. The remaining lobes were cut into blocks smaller than 1 × 1 × 3 mm and frozen for subsequent DNA or protein extraction.

Histology and serum analysis

After tissue harvesting as described above, flash-frozen mouse liver samples were embedded in O.C.T. compound (Tissue-Tek, Cat # 4583), snap-frozen, and stored at −80°C before processing. Frozen tissues were cryosectioned at 4 μm thickness and stained with Oil Red O following the manufacturer's recommended protocol. Liver histology was assessed by H&E staining of sections from liver fixed in 10% neutral buffered formalin. Serum Pcsk9 levels were determined by ELISA using the Mouse Proprotein Convertase 9/PCSK9 Quantikine ELISA Kit (MPC-900, R&D Systems), according to the manufacturer's instructions. Total cholesterol levels were measured using the Infinity Cholesterol Reagent (Thermo Fisher) per the manufacturer's instructions. Serum ALT, AST, albumin, and total bilirubin were measured on an Olympus AU5400 analyzer (IDEXX, Memphis, TN).

Extended Data figure titles (figures not reproduced)

Selection of Type II CRISPR-Cas loci from eight bacterial species

Cas9 ortholog cleavage pattern in vitro. The stacked bar graph indicates the fraction of targets cleaved 2, 3, 4, or 5 bp upstream of the PAM for each Cas9 ortholog; most Cas9 orthologs cleave stereotypically 3 bp upstream of the PAM (red triangle).

Test of Cas9 ortholog activity in 293FT cells

Optimization of SaCas9 sgRNA scaffold in mammalian cells

Genome-wide binding by Cas9-chromatin immunoprecipitation (dCas9-ChIP)

Indel measurements at candidate off-target sites based on ChIP

Analysis pipeline of sequencing data from BLESS

Indel measurements at off-target sites based on DSB scores

Indel measurements of top candidate off-target sites based on sequence similarity score

SaCas9 targeting Apob locus in the mouse liver

References (partial list; 50 in total)

1. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA.

Authors: Josiane E Garneau; Marie-Ève Dupuis; Manuela Villion; Dennis A Romero; Rodolphe Barrangou; Patrick Boyaval; Christophe Fremaux; Philippe Horvath; Alfonso H Magadán; Sylvain Moineau
Journal: Nature; Date: 2010-11-04

2. Therapeutic in vivo gene transfer for genetic disease using AAV: progress and challenges. (Review)

Authors: Federico Mingozzi; Katherine A High
Journal: Nat Rev Genet; Date: 2011-05

3. State-of-the-art gene-based therapies: the road ahead. (Review)

Authors: Mark A Kay
Journal: Nat Rev Genet; Date: 2011-04-06

4. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence.

Authors: Ekaterina Semenova; Matthijs M Jore; Kirill A Datsenko; Anna Semenova; Edze R Westra; Barry Wanner; John van der Oost; Stan J J Brouns; Konstantin Severinov
Journal: Proc Natl Acad Sci U S A; Date: 2011-06-06

5. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.

Authors: Martin Jinek; Krzysztof Chylinski; Ines Fonfara; Michael Hauer; Jennifer A Doudna; Emmanuelle Charpentier
Journal: Science; Date: 2012-06-28

6. Mechanisms and optimization of in vivo delivery of lipophilic siRNAs.

Authors: Christian Wolfrum; Shuanping Shi; K Narayanannair Jayaprakash; Muthusamy Jayaraman; Gang Wang; Rajendra K Pandey; Kallanthottathil G Rajeev; Tomoko Nakayama; Klaus Charrise; Esther M Ndungo; Tracy Zimmermann; Victor Koteliansky; Muthiah Manoharan; Markus Stoffel
Journal: Nat Biotechnol; Date: 2007-09-16

7. Analysis of AAV serotypes 1-9 mediated gene expression and tropism in mice after systemic injection.

Authors: Carmela Zincarelli; Stephen Soltys; Giuseppe Rengo; Joseph E Rabinowitz
Journal: Mol Ther; Date: 2008-04-15

8. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.

Authors: Elitza Deltcheva; Krzysztof Chylinski; Cynthia M Sharma; Karine Gonzales; Yanjie Chao; Zaid A Pirzada; Maria R Eckert; Jörg Vogel; Emmanuelle Charpentier
Journal: Nature; Date: 2011-03-31

9. The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli.

Authors: Rimantas Sapranauskas; Giedrius Gasiunas; Christophe Fremaux; Rodolphe Barrangou; Philippe Horvath; Virginijus Siksnys
Journal: Nucleic Acids Res; Date: 2011-08-03

10. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases.

Authors: Shengdar Q Tsai; Zongli Zheng; Nhu T Nguyen; Matthew Liebers; Ved V Topkar; Vishal Thapar; Nicolas Wyvekens; Cyd Khayter; A John Iafrate; Long P Le; Martin J Aryee; J Keith Joung
Journal: Nat Biotechnol; Date: 2014-12-16
