Tobias Anton (1), Elisabeth Karg (1), Sebastian Bultmann (1)
(1) Department of Biology II and Center for Integrated Protein Science Munich (CIPSM), LMU Munich, 82152 Martinsried, Germany.
Abstract
Since the discovery of the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated system (Cas) as a tool for gene editing, a plethora of locus-specific as well as genome-wide approaches have been developed that allow efficient and reproducible manipulation of genomic sequences. However, the seemingly unbounded potential of CRISPR/Cas does not stop with its utilization as a site-directed nuclease. Mutations in its catalytic centers turn Cas9 into a catalytically dead variant (dCas9) that serves as a universal recruitment platform, which can be utilized to control transcription, visualize DNA sequences, investigate in situ proteome compositions and manipulate epigenetic modifications at user-defined genomic loci. In this review, we give a comprehensive introduction and overview of the development, improvement and application of recent dCas9-based approaches.
Prokaryotes not only employ a variety of innate defense mechanisms against foreign viral or plasmid DNA, such as restriction/modification systems or blocking of phage adsorption [1], but also possess a sophisticated adaptive defense mechanism [2] called clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas).
First observed in Escherichia coli, these systems are found in approximately 40% of bacteria and nearly all archaea (∼90%) [3-5] and rely on small CRISPR-RNAs (crRNAs) to guide nucleases to invading nucleic acids. All CRISPR/Cas systems comprise a set of Cas genes, organized in operons, and a CRISPR locus harboring an array of foreign DNA-derived genome-targeting sequences (termed spacers) flanked by identical direct repeats [6].
In general, CRISPR/Cas-mediated adaptive immunity occurs in the following steps. First, upon phage infection or plasmid uptake, short stretches (∼30 bp) of exogenous DNA (termed protospacers) are recognized and integrated into the CRISPR array. Second, transcription of this array results in a pre-CRISPR-RNA (pre-crRNA), which is subsequently processed into small mature crRNAs containing a portion of the direct repeat sequence and the spacer. Third, the crRNA forms a ribonucleoprotein complex with Cas protein(s) and, in the case of type II systems, a trans-activating crRNA (tracrRNA). Finally, the complementarity between the spacer of the crRNA and the protospacer of the invading DNA, together with a short protospacer-adjacent motif (PAM), leads to binding of this complex and target degradation [6].
Based on the presence of unique Cas proteins, the modes of crRNA maturation and RNA-guided interference, CRISPR/Cas systems are subdivided into two classes and five main types (type I, II, III, V and VI). Class I systems include type I as well as type III systems and rely on multi-subunit effector proteins for target degradation. 
In type I systems, the endonucleases Cas6 or Cas5d facilitate the maturation of crRNA [7-9], which in turn interacts with a complex of five Cas proteins (CasA–E) called Cascade (CRISPR-associated complex for antiviral defense) for target recognition via complementary base pairing [10]. Upon target binding, conformational changes (R-loop formation) lead to recruitment of the nuclease Cas3, which facilitates DNA degradation [11]. In type III systems, Cascade is replaced by a complex consisting of repeat units of Csm or Cmr proteins and target degradation requires Cas10 [11-15].
Contrary to Class I, Class II CRISPR/Cas systems (types II, V and VI) require only a single effector protein for target degradation. Cas9 is the hallmark of type II CRISPR/Cas systems and forms a bi-lobed structure with a larger recognition lobe (REC lobe), a smaller nuclease lobe (NUC lobe) and a C-terminal PAM-interacting domain. Cas9 interacts with the repeat:anti-repeat sequences of the tracrRNA:crRNA duplex via a positively charged arginine-rich motif at the inner surface of the REC lobe. RNA binding induces a conformational change and reorientation of the NUC lobe toward the REC lobe. The two lobes form a clam-like shape with a positively charged central channel, which accommodates both the tracrRNA:crRNA duplex and the target DNA. Furthermore, the two nuclease domains RuvC and HNH in the NUC lobe are positioned favorably for subsequent target cleavage [16, 17].
For genome engineering purposes, the site-directed nuclease Cas9 has been used in combination with a fusion of the crRNA and tracrRNA into a single guide RNA (sgRNA) to induce a double-stranded break (DSB) at a desired locus in a great variety of cell types and organisms, including human, mouse, fly, worm and zebrafish [18-22]. DSBs are resolved by the host's repair machinery via non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ) or homology-directed repair (HDR). 
While the NHEJ pathway results in a high frequency of insertions or deletions (indels) near the break site, MMEJ leads to deletions only. Both pathways, however, can introduce frameshift mutations or premature stop codons and thereby knock out gene function [23]. Yet, in the presence of a homologous donor DNA, DSBs can be repaired while introducing defined sequences [24].
To increase the likelihood of HDR and thus the insertion of defined homologous DNA sequences, the Cas9 nuclease domains have been mutated to convert Cas9 into a nicking enzyme with only one functional nuclease domain. The resulting single-strand breaks are preferentially repaired by HDR and the frequency of NHEJ is decreased. The most commonly used point mutations to inactivate either of the two nuclease domains are D10A and H840A for RuvC and HNH, respectively [16, 17, 21, 25, 26]. The Cas9 RuvC domain shares structural similarities with an RNase H fold found in RuvC Holliday junction resolvases of prokaryotes, such as E. coli and Thermus thermophilus [27, 28]. These nucleases are characterized by four catalytic amino acids and cleave their target via a two-metal mechanism. In addition to the D10A mutation, it was shown that substituting either Glu762, His983 or Asp986 with alanine also results in a Cas9 nickase [17]. The catalytic center of the HNH domain comprises a ββα-metal fold and shares structural similarities with the phage T4 endonuclease VII and the Vibrio vulnificus nuclease. Contrary to the RuvC domain, HNH nuclease activity depends on three catalytic residues and is facilitated by a single-metal mechanism [29, 30]. Besides the H840A mutation, it was demonstrated that HNH-mediated substrate cleavage is abolished in an N863A mutant Cas9 [17].
Importantly, by mutating both nuclease domains simultaneously, Cas9 can be engineered into an RNA-guided DNA-binding platform, which still binds DNA in a sequence-specific manner without cleaving the underlying target [26, 31]. 
To date, this catalytically dead Cas9 (dCas9) has been employed outside the gene editing context, for example, for genome visualization, transcriptional regulation, epigenetic manipulation and investigation of chromatin composition (Fig. 1). In this review, we present a current overview of those dCas9-based CRISPR/Cas methodologies.
Figure 1:
Overview of dCas9-based applications to study and manipulate chromatin. (A) Catalytically inactive Cas9 (dCas9) represents an RNA-guided DNA-binding platform that can be harnessed to target fluorescent proteins (FP) to a pre-defined genomic sequence, allowing the visualization of spatiotemporal chromatin dynamics in living cells. (B) Fused to a variety of effector proteins, dCas9 can be employed to directly alter the transcriptional state of specific genes or to precisely manipulate epigenetic marks, such as CpG methylation or post-translational modifications of histones. (C) Additionally, dCas9 can be fused to biotinylating enzymes (tag = BirA* or APEX2). This approach makes it possible to biotinylate (red pentagons) locus-associated proteins and to subsequently identify them by mass spectrometry.
Earlier approaches
For decades, sequence-specific visualization of chromatin mainly relied on fluorescence in situ hybridization (FISH)-based approaches. Here, the target sequence is detected via complementary base pairing with an epitope- or fluorophore-labeled nucleic acid probe after the genomic DNA has been denatured. Using this technique on fixed cells, entire chromosomes, chromosome arms or single loci have been visualized. By combining different fluorophores, simultaneous detection of several loci or even entire sets of chromosomes can be achieved [32]. By omitting the DNA denaturation step, the FISH approach can be adapted for the visualization of RNA. Moreover, sophisticated methods, such as the fusion of MS2 aptamers (discussed in Section Site-specific visualization of genomic elements) to the target RNA or the microinjection of molecular beacons, facilitate imaging of individual RNA molecules in living cells [33]. In living cells, bulk chromatin can be visualized by cell-permeable DNA-dyes (e.g. DRAQ5), fluorescently labeled nucleotides (e.g. Cy5-dUTP) or histones fused to fluorescent proteins (FP) (e.g. H2B-GFP) [34-36]. Furthermore, specific genomic sequences that are characterized by a defined protein composition, such as centromeres or telomeres can be traced by fluorescently tagging their associated binding proteins [37, 38]. Since loci that meet this requirement are limited, lac or tet operator repeats were introduced at specific genomic regions via genome engineering. These sequences can then be recognized by Lac- or Tet-repressor proteins, respectively, fused to FP [39, 40]. However, since these large artificial sequences might interfere with chromatin dynamics, this method only provides indirect information about the native locus. 
Hence, live-cell approaches to examine the spatiotemporal dynamics of endogenous loci rely on DNA-interacting proteins, which bind their target in a sequence-specific manner.
Recognition and labeling of user-defined sequences was first realized utilizing zinc finger (ZnF) proteins [41]. This highly diverse group of proteins serves a large variety of biological functions, including transcriptional activation, protein folding, regulation of apoptosis and nucleic acid binding [42]. Notably, the classic Cys2His2 ZnF structural motif, which was first described in the transcription factor IIIA from Xenopus laevis, is conserved among higher eukaryotes and represents the predominant DNA-binding domain in humans [43-45].
Individual Cys2His2 ZnF domains comprise ∼30 amino acids, forming a ββα-motif, in which two cysteines and two histidines coordinate a single Zn2+ ion. Target recognition is predominantly facilitated by the α-helix, which establishes contact with three bases within the major groove of the target DNA [46, 47]. Since each ZnF motif recognizes a distinct base triplet, tandem arrangement of up to six modules into a polydactyl ZnF protein (PZF) enables the recognition of unique genomic loci [48]. Fused to green fluorescent protein (GFP), PZFs were the first modular DNA-binding proteins used to visualize repetitive genomic sequences in vivo [49]. It was shown that PZFs can be harnessed to trace major satellite sequences in mouse cells, as well as centromeric repeats in Arabidopsis, demonstrating that this method can be applied in different organisms. Although PZFs have additionally been successfully employed for genome editing and transcriptional regulation, it has been demonstrated that individual zinc fingers display a preference for GC-rich substrates and that neighboring modules affect each other's target specificity. 
Hence, target sequence prediction is limited and newly designed PZFs have to be subjected to a rigorous selection process, rendering this approach laborious and expensive [50-52].
To overcome these limitations, fluorescently tagged designer transcription activator-like effectors (dTALEs) have been employed to substitute PZFs [53-57]. Contrary to PZFs, target recognition by dTALEs follows a simple repeat variable diresidue (RVD)-based code. The fact that one RVD specifically recognizes one nucleotide greatly simplifies the process of designing new dTALEs. Importantly, dTALEs are characterized by a high target specificity and, in fact, were shown to distinguish differences of 1–2 nucleotides in the target sequence [58]. Hence, fluorescent dTALEs allow the detection of individual chromosomes by single nucleotide polymorphisms (SNPs) [54]. Despite these evident advantages over PZFs, the highly repetitive central domain of dTALEs has been suggested to cause self-assembly and formation of protein aggregates, preventing effective binding of the cognate target DNA. To address this, a recent study demonstrated that fusion of dTALEs to the chaperone-like protein thioredoxin (TRX) enhances their solubility in human cells [55]. However, the repetitive nature of the DNA-recognition domain makes it difficult to reassemble dTALEs for different sequences and necessitates the use of laborious cloning techniques [59-61].
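The one-RVD-per-nucleotide principle described above can be illustrated with a short Python sketch. The RVD assignments used here (NI for A, HD for C, NN for G, NG for T) are the commonly cited canonical pairings and are not listed in this review, so they should be read as an assumption; the function name design_rvds is hypothetical and chosen only for illustration.

```python
# Commonly cited canonical RVD-to-base pairings (assumption: not taken from
# this review). Note that NN is also reported to tolerate A.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str) -> list:
    """Return one RVD per nucleotide of the target, in 5'-to-3' order."""
    target = target.upper()
    unknown = sorted(set(target) - set(RVD_FOR_BASE))
    if unknown:
        raise ValueError(f"unsupported nucleotides: {unknown}")
    return [RVD_FOR_BASE[base] for base in target]


if __name__ == "__main__":
    # One repeat module is needed per target nucleotide.
    print(design_rvds("GATTACA"))
```

Because each position is designed independently, changing a single target nucleotide changes exactly one RVD, whereas a PZF module covers a base triplet and may be influenced by its neighbors.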
Site-specific visualization of genomic elements
CRISPR/Cas-based imaging (CRISPR imaging) can be regarded as the next generation of tools to study chromatin dynamics in living cells. Contrary to PZF- or dTALE-based approaches, target specificity is solely mediated by the sgRNA. Therefore, CRISPR imaging can easily be customized to visualize new sequences by simply exchanging the sgRNA without the need to replace the protein itself. By employing CRISPR imaging, repetitive sequences have been successfully visualized [62, 63]. Although such tandem repeats are present in virtually all eukaryotes, it might prove difficult to find one near a locus of interest. Tracing of single-copy loci, therefore, requires targeting of many different consecutive sequences in order to enrich the signal over background levels [63]. In this context, CRISPR imaging is likely to be superior, as it only requires the introduction of small RNAs, whereas for dTALE- or PZF-based approaches, a different protein has to be expressed for each target sequence. This is illustrated by the fact that labeling of a non-repetitive sequence via dTALEs has not been realized so far. Moreover, recent studies presented a method that employs a cocktail of restriction enzymes and controlled micrococcal nuclease (MNase) digests to generate a genome-wide library of sgRNAs [64, 65]. Further refinement of these methods could allow the production of sgRNA libraries that are specific for whole chromosomes or non-repetitive subsets of chromosomes. Together with suitable means to deliver these libraries into living cells (e.g. via lentiviral transduction), CRISPR imaging could make it possible to monitor the spatiotemporal dynamics of chromosomes in real time.
Since the first demonstration of CRISPR imaging, great efforts have been made to further optimize this system (Fig. 2). One issue of visualizing single-copy loci via CRISPR imaging is a rather low signal-to-noise ratio, caused by freely diffusing dCas9-eGFP molecules (Fig. 2A and B). 
To address this, it has recently been demonstrated that the SunTag (SUperNova tag) is suitable to significantly amplify the fluorescence signal at targeted loci [66, 67]. This system comprises a peptide, which contains up to 24 tandemly arranged copies of a short peptide epitope (GCN4), and a GFP-tagged cognate single-chain variable fragment antibody (scFv). By combining the SunTag with CRISPR imaging of telomeres, it was shown that a ∼20-fold signal enhancement is possible. Due to this bright signal, it would be feasible to visualize genomic loci with lower light illumination, thus reducing phototoxic effects during long-term imaging. Moreover, high signal-to-noise ratios enabled the visualization of two low-repeat loci in human embryonic kidney (HEK293T) cells [67].
Figure 2:
Expanding the CRISPR imaging toolkit. (A, B) To enhance the signal-to-noise ratio of CRISPR imaging, dCas9 is fused to arrays of small peptide epitopes (GCN4 and sfGFP11). These epitopes then either recruit fluorescent molecules (scFv-GFP) or are complemented (sfGFP1–10), reconstituting fluorescence. (C–E) Multi-color CRISPR imaging can be achieved by either co-expressing differentially labeled orthogonal dCas9 proteins (Sp-dCas9-GFP and Nm-dCas9-RFP) or by fusion of RNA aptamers (PP7 or MS2) to the sgRNA and co-expressing the cognate, fluorescently tagged binding proteins (PCP-GFP and MCP-GFP, respectively). Moreover, sgRNAs can be appended by PUF-binding sites (PBS1 or PBS2, respectively). These sites are then recognized by differentially tagged PUF proteins (PUF1-GFP and PUF2-RFP). (F) By substituting the PAM sequence in the form of an oligonucleotide (PAMmer), dCas9 can be targeted to single stranded RNA molecules.
Similar to the SunTag, a split version of super-folder GFP (sfGFP) has been adopted as an epitope tag for signal enhancement in targeted gene activation and fluorescence imaging [68]. For this approach, sfGFP is split between the 10th and 11th β-strand, resulting in the non-fluorescent sfGFP1–10 and sfGFP11, which serves as a short epitope. A tandem array comprising up to seven repeats of the sfGFP11 epitope can be fused to a protein of interest (POI), such as dCas9 (dCas9-sfGFP11x7), and is subsequently recognized by the co-expressed sfGFP1–10 fragment. Upon self-complementation, the chromophore matures and sfGFP regains its fluorescence [69]. By expressing dCas9-sfGFP11x7 together with VP64-sfGFP1–10, it was demonstrated that this system is capable of drastically increasing the expression level of the targeted CXCR4 gene. Hence, it would be interesting to evaluate whether the sfGFP11-tag can be adopted for CRISPR imaging. 
Whereas the SunTag system still might cause background fluorescence due to unbound scFv-GFP molecules, this GFP-derived epitope tag would be particularly promising, since sfGFP1–10 can be overexpressed without causing background signals. Moreover, a similar tag has also been derived from super-folder Cherry (sfCherry11), which could enhance the versatility of this approach [68].
In comparison to dTALE-based visualization of genomic elements, one inherent drawback of the original CRISPR imaging technique is the fact that simultaneous multi-color labeling of several loci is not feasible. To overcome this limitation, orthogonal dCas9 proteins from the three different bacterial species Streptococcus pyogenes (Sp-dCas9), Neisseria meningitidis (Nm-dCas9) and Streptococcus thermophilus (St1-dCas9) were harnessed (Fig. 2D) [70]. Importantly, target recognition of these dCas9 variants is constrained by different PAM sequences, enabling multiplexed sgRNA-guided recruitment to multiple genomic sites [71]. Hence, by simultaneously co-expressing differently tagged dCas9 orthologs with their cognate sgRNAs, each specific for a distinct locus, it was possible to resolve the inter- and intrachromosomal distances between two repetitive sequences in living human cells. In addition, another study repurposed the small Cas9 ortholog from Staphylococcus aureus (Sa-dCas9) for multiplexed CRISPR imaging and demonstrated that a combination of Sp-dCas9-eGFP and Sa-dCas9-mCherry is capable of resolving two different loci that are separated by less than 300 kb [72].
Although representing a significant improvement in CRISPR imaging, orthogonal CRISPR/Cas systems are characterized by more complex PAM requirements. For instance, whereas Sp-dCas9 recognizes a 5'-NGG-3' sequence, Nm-dCas9 is only targeted to sequences followed by a 5'-NNNNGATT-3' motif, thus restricting the flexibility toward target sequences. 
To circumvent this issue, a second strategy for multi-color CRISPR imaging has been developed (Fig. 2C). Here, the S. pyogenes sgRNA is fused to the RNA aptamers MS2 or PP7 that are specifically bound by the bacteriophage coat proteins MCP (MS2 coat protein) and PCP (PP7 coat protein), respectively [73-75]. The resulting scaffold RNA (scRNA) is still capable of guiding dCas9 to the desired target and can be detected by fluorescently tagged coat proteins. A similar approach combined dCas9-mediated DNA recognition with Pumilio-assisted RNA binding (Fig. 2E) [76]. This technique, termed Casilio, utilizes the fact that the RNA-binding domain of the Drosophila protein Pumilio (PUF) can be reprogrammed to bind specific 8-mer RNA sequences (PUF-binding site, PBS) [77]. Therefore, appending PUF-binding sites, specific for differently tagged PUF domains, to different sgRNAs facilitates simultaneous tracing of multiple genomic loci.
Recently, the scope of CRISPR imaging has been further expanded by demonstrating that dCas9 can be harnessed as a tool to visualize endogenous, unmodified RNA (Fig. 2F) [78]. Although the type II CRISPR/Cas system has evolved to target double-stranded DNA, Cas9 can be guided to RNA when the PAM is provided in trans by an oligomer (PAMmer) that is partially complementary to the targeted RNA [79]. However, due to high background signals, RNA could so far only be visualized when large quantities accumulated within stress granules. In this case, fusing dCas9 to the aforementioned sfGFP11-tag is likely to enhance the signal-to-noise ratio, thus facilitating the tracking of less abundant RNAs. It is worth mentioning that an additional Cas protein, Cas13, has recently been harnessed for targeted RNA visualization, as well as RNA knockdown and editing [80, 81]. 
This CRISPR effector is a member of the Class II type VI system, which specifically targets single-stranded RNA in a crRNA-guided manner and requires two Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) domains for RNA degradation [82]. Similar to Cas9, Cas13 can be converted into a catalytically inactive variant (dCas13) by substituting catalytic residues within the HEPN domains. By fusing dCas13 to a monomeric superfolder GFP (msfGFP), accumulation of actin beta (ACTB) transcripts in stress granules could be tracked in living cells [80]. Interestingly, when fused to a hyperactive mutant of adenosine deaminase acting on RNA 2 (ADAR2), dCas13 has been demonstrated to correct 33 out of 34 tested disease-relevant G-to-A mutations [81]. This ability to alter protein-coding information at the RNA level could open a therapeutic avenue for a variety of diseases.
Collectively, CRISPR imaging offers high flexibility and represents a time- and cost-effective alternative to visualization techniques based on modular DNA-binding proteins. In particular, the potential to trace non-repetitive sequences renders CRISPR imaging a versatile tool to study chromatin dynamics.
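The PAM constraints discussed in this section for orthogonal dCas9 proteins can be made concrete with a minimal Python sketch that scans a DNA sequence for candidate target sites. The two PAM motifs (NGG for Sp-dCas9 and NNNNGATT for Nm-dCas9) are taken from the text; the 20-nt spacer length, the restriction to the forward strand and the function names are simplifying assumptions made only for illustration.

```python
import re

# PAM motifs as stated in the text; N stands for any nucleotide.
PAMS = {"Sp-dCas9": "NGG", "Nm-dCas9": "NNNNGATT"}


def pam_regex(pam: str) -> str:
    """Translate a PAM written with N wildcards into a regular expression."""
    return "".join("[ACGT]" if char == "N" else char for char in pam)


def find_targets(seq: str, ortholog: str, spacer_len: int = 20) -> list:
    """Return (start, protospacer, pam) tuples found on the forward strand.

    spacer_len=20 is an assumption, not a value given in this review.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(
        rf"(?=([ACGT]{{{spacer_len}}})({pam_regex(PAMS[ortholog])}))"
    )
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    demo = "A" * 20 + "TGG" + "C" * 5
    print(find_targets(demo, "Sp-dCas9"))
    print(find_targets(demo, "Nm-dCas9"))
```

In this toy sequence the short NGG motif yields a candidate site, whereas the longer NNNNGATT motif does not, which mirrors the reduced targeting flexibility of orthologs with more complex PAM requirements.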
Transcriptional regulation and epigenetic manipulation
The eukaryotic transcriptional landscape is regulated by a complex network of epigenetic modifications, transcription factors and chromatin organization. To dissect cause and consequence of any of these elements, it is crucial to have tools at hand that allow their precise manipulation. Since the first description of Cas9-mediated genome engineering, the CRISPR/Cas system has in parallel been refined for site-specific transcriptional regulation and manipulation of epigenetic marks and represents a powerful tool to interrogate the mechanistic relationship between chromatin state and regulation of gene expression (Fig. 1B).
The applicability of the CRISPR/Cas system to specifically repress gene transcription was first described in E. coli. In this approach, termed CRISPR interference (CRISPRi), DNA-bound dCas9 can either interfere with transcription elongation by physically blocking RNA polymerase progression or hinder the binding of essential transcription factors [31, 83]. In prokaryotes, this method can lead to up to a ∼1000-fold reduction of mRNA levels. In mammalian cells, however, only a modest ∼2-fold reduction of reporter transcription was observed [31]. Since these initial studies, the CRISPRi system has been further refined. For a more efficient repression of transcription in mammalian cells, dCas9 can be fused either to the repressor domain Krüppel-associated box (KRAB) or to four linked mSin3 interaction domains (SID4X). Using these fusion proteins, endogenous genes, such as octamer-binding protein 4A (OCT4A), sex determining region Y (SRY)-box 2 (SOX2), C-X-C chemokine receptor type 4 (CXCR4) and cluster of differentiation 71 (CD71), have been specifically and efficiently repressed [84-86].
In addition to CRISPRi, the CRISPR/Cas system has also been harnessed to specifically activate gene expression. In E. coli, CRISPR-mediated gene activation (CRISPRa) was realized by fusing the ω-subunit of RNA polymerase to dCas9. 
Targeted to the promoter of a reporter gene, this fusion protein led to the recruitment of the polymerase holoenzyme, resulting in an up to 3-fold increase in gene expression [83]. In eukaryotic cells, CRISPRa is commonly achieved by fusing dCas9 to heterologous transcription activation domains, such as VP64 or p65 [84, 87–89]. To achieve significant activation of endogenous genes, however, it is often necessary to recruit dCas9-effector fusions to neighboring sites via multiple different sgRNAs [88, 89]. By screening more than 20 effector proteins with known transcriptional roles, one study led to the development of a powerful hybrid activation domain [90]. This tripartite domain, termed VPR, consists of tandemly linked VP64, p65 and the Epstein-Barr virus R transactivator (Rta), which synergistically enhance gene expression. Targeting dCas9-VPR fusion proteins to endogenous genes revealed that, compared to dCas9-VP64 alone, transcription can be amplified up to ∼300-fold. Moreover, this dCas9-VPR approach was successfully tested in diverse eukaryotic cells, such as human, mouse, fly and yeast.
Analogous to signal amplification in CRISPR imaging approaches, both dCas9 and the sgRNA have been engineered to indirectly target effector proteins to specific loci. As mentioned before, co-expressing dCas9-sfGFP11x7 with VP64-sfGFP1–10 and an sgRNA targeting the CXCR4 locus in K562 cells resulted in a ∼45-fold increase of CXCR4 expression when compared to dCas9-VP64 [68]. Additionally, a SunTag array, consisting of 10 copies of the GCN4 epitope, was fused to dCas9. This construct was successfully used to recruit scFv-VP64 fusions to CXCR4 and CDKN1B loci in K562 cells, resulting in significant activation of gene expression [66]. The efficiency of both CRISPRi and CRISPRa approaches highly depends on the genomic position to which dCas9 is recruited. 
To elucidate targeting rules for CRISPR-based modulation of transcription, one study performed an extensive screen, targeting a set of 49 genes with a total of 54,810 sgRNAs. It was demonstrated that for CRISPRi (via dCas9-KRAB), active sgRNAs cluster in a window from -50 bp to +300 bp relative to the transcription start site (TSS). For CRISPRa (via dCas9-SunTag/VP64), a peak in transcription activation was observed when dCas9 was targeted to a window from -400 bp to -50 bp relative to the TSS [91].
Besides engineering the dCas9 protein, several studies demonstrated that the sgRNA can be modified to recruit effector proteins to a desired genomic locus. In a first attempt, two MS2 RNA stem-loops were incorporated at the 3' end of the sgRNA, which subsequently can be recognized by MCP-VP64 fusion proteins. Although this system was capable of activating ZFP42 expression, it was outperformed by dCas9-VP64 [92]. To further improve sgRNA-mediated manipulation of transcription, the synergistic activation mediator (SAM) technology was developed. For this, MS2 aptamers were inserted at the tetraloop and the stem-loop 2 of the sgRNA. Notably, at these positions, the sgRNA does not interact with the dCas9 protein. MS2 then recruits MCP fused to p65 and HSF1 (heat shock factor 1). Combining this modified sgRNA with a dCas9-VP64 fusion resulted in drastically increased expression levels of targeted genes and was shown to simultaneously upregulate 10 genes [114]. In contrast to RNA-hairpin structures, the Casilio technique introduces specific 8-mer sequences (PBS; see above) that are recognized by PUF domains. Using this approach, it was demonstrated that recruitment of PUF-p65-HSF1 results in robust activation of SOX2 and OCT4 [76].
Direct fusions of dCas9 with epigenetic effectors have been successfully employed to activate transcription of previously silenced genes. 
For instance, recruiting the catalytic core of the human acetyltransferase p300 to promoters or enhancers leads to robust transcriptional activation and directly implicates H3K27ac in this process [93]. Additionally, targeted DNA demethylation mediated by dCas9-ten-eleven translocation 1 (TET1) fusions results in gene reactivation [94-96]. Analogous to targeted activation, the CRISPR/Cas system has been utilized to recruit de novo DNA methyltransferases or histone demethylases to specific gene regulatory regions, resulting in a local repressive epigenetic state and thus gene silencing [97-99].
As for CRISPRi and CRISPRa, dCas9-based epigenetic editing approaches have been further improved to amplify the enzymatic activity at the target site and to enable the use of more modular strategies. The dCas9-SunTag system was employed to recruit epigenetic effectors to influence the transcriptional state of specific genes. Whereas targeting the catalytic domain of TET1 (scFv-TET1CD) to the imprinted H19 locus led to DNA demethylation and gene expression, dCas9-SunTag-DNMT3A was able to increase CpG methylation and thereby repress HOXA5 transcription [100, 101]. Furthermore, a recent study used MS2 stem-loops incorporated into the sgRNA to recruit epigenetic effector proteins fused to MCP. This setup allowed the efficient and specific demethylation of the RANKL and MAGEB2 genes using the TET1 catalytic domain. Interestingly, demethylation was not observed directly at the target site but at a distance of 100–300 bp, pointing toward so far uncharacterized mechanisms that restrict access of dCas9 to the target site or cellular pathways that protect certain CpGs from demethylation [96]. 
Another approach utilized dCas9 fused to a GFP-binding nanobody (GBP), enabling specific and effective recruitment of GFP-tagged epigenetic effectors to manipulate local DNA methylation [102].
Interestingly, several studies have shown that epigenetic editing is only transient and does not result in stable memory once the targeting system is removed [103-105]. However, a recent study demonstrated that combinatorial targeting of multiple effector domains involved in the silencing of endogenous retroviruses results in stable silencing of gene expression even after the engineered repressors were removed [106]. This underlines the importance of the concerted action and recruitment of components from multiple epigenetic pathways for the establishment of efficient long-term epigenetic memory.
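The targeting windows reported for CRISPRi and CRISPRa in the sgRNA screen described above [91] can be turned into a simple position filter. The following Python sketch is only an illustration: the function names, the inclusive window boundaries and the convention of describing an sgRNA by a single genomic coordinate are assumptions made here and are not taken from the cited study.

```python
from typing import List

# Windows in bp relative to the TSS, as reported for the screen in [91].
# Treating both boundaries as inclusive is an assumption of this sketch.
CRISPRI_WINDOW = (-50, 300)    # dCas9-KRAB
CRISPRA_WINDOW = (-400, -50)   # dCas9-SunTag/VP64


def offset_from_tss(sgrna_pos: int, tss_pos: int, strand: str) -> int:
    """Return the signed distance of an sgRNA from the TSS.

    Negative values lie upstream of the TSS, positive values downstream,
    taking the orientation of the gene into account.
    """
    if strand == "+":
        return sgrna_pos - tss_pos
    if strand == "-":
        return tss_pos - sgrna_pos
    raise ValueError("strand must be '+' or '-'")


def suitable_modes(offset: int) -> List[str]:
    """List the approaches whose reported window contains the offset."""
    modes = []
    if CRISPRI_WINDOW[0] <= offset <= CRISPRI_WINDOW[1]:
        modes.append("CRISPRi")
    if CRISPRA_WINDOW[0] <= offset <= CRISPRA_WINDOW[1]:
        modes.append("CRISPRa")
    return modes


if __name__ == "__main__":
    # Hypothetical sgRNA 200 bp upstream of a TSS on the minus strand.
    offset = offset_from_tss(sgrna_pos=1200, tss_pos=1000, strand="-")
    print(offset, suitable_modes(offset))
```

Such a filter only narrows down candidate positions. It does not predict sgRNA activity, which also depends on sequence features and chromatin context that are not modeled here.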
Investigating chromatin composition
Besides visualizing the spatiotemporal dynamics of chromatin and manipulating the transcriptional state of endogenous genes, it is crucial to assess the protein composition of specific genomic loci to fully decipher the function of chromatin. To this end, several methods have been developed that facilitate the profiling of genome-wide or local binding sites of chromatin proteins, as well as the detection of protein-protein associations.
Chromatin immunoprecipitation (ChIP) represents a powerful and well-established method to study the distribution of a DNA-binding protein along the genome. For this, DNA-protein complexes are crosslinked in vivo. Subsequently, the chromatin is fragmented and DNA fragments that are bound by the POI are immunoprecipitated via specific antibodies. Finally, DNA-POI crosslinks are reversed and the released DNA is assayed by next-generation sequencing (ChIP-seq), PCR (ChIP-qPCR) or microarray (ChIP-chip) to determine the sequences bound by the POI [107]. However, these classical ChIP approaches are often limited by the availability of suitable antibodies and only provide information on the global distribution of POIs. To identify proteins that are associated with a specific genomic locus, engineered DNA-binding molecule-mediated ChIP (enChIP) has been developed (Fig. 3A) [108]. Here, FLAG-tagged dCas9 is recruited to a locus of interest, chromatin is crosslinked and fragmented, and dCas9-bound fragments are enriched via FLAG-tag-specific antibodies. After reversal of crosslinks, the isolated complexes can then be analyzed by mass spectrometry. Another approach to identify the chromatin composition of individual genomic loci, termed CasID, combines proximity-dependent biotin identification (BioID) with the target specificity of the CRISPR/Cas system (Fig. 3B) [109, 110]. For this, dCas9 is fused to the promiscuous biotin ligase BirA*. 
The culture medium of BirA*-dCas9 expressing cells is then supplemented with exogenous biotin, resulting in the biotinylation of proteins that are located in close proximity (∼10 nm) to the targeted locus. After affinity purification via streptavidin-coated beads, biotinylated proteins can be analyzed by mass spectrometry. Using this technique, known and previously unknown proteins at telomeres as well as at minor and major satellites could be identified. Recently, a similar method employed a fusion of dCas9 with the engineered ascorbate peroxidase APEX2 (CASPEX) [111]. Comparable to BirA*, APEX2 also ligates biotin to nearby lysine residues; the labeling radius, however, is smaller. Whereas CasID has only been tested on repetitive sequences, CASPEX successfully identified proteins at the single-copy loci hTERT and c-MYC. In addition to CRISPR-mediated recruitment of biotin ligases, dCas9 carrying a biotin acceptor tag has been employed to study chromatin composition. Co-expressing BirA leads to in vivo biotinylation of this modified dCas9. Subsequent enrichment of targeted loci via streptavidin allowed a comprehensive study of proteins and long-range DNA interactions associated with the β-globin cluster in K562 cells [112].
Figure 3:
Identification of locus-associated proteins by dCas9. (A) For enChIP, dCas9 is fused to a FLAG-tag and targeted to a locus of interest. Chromatin is then crosslinked and fragmented. dCas9-bound chromatin fragments are subsequently isolated by FLAG-specific antibodies and analyzed via mass spectrometry. (B) In contrast to enChIP, CasID requires the expression of dCas9 fused to the promiscuous biotin ligase BirA*. After the culture medium has been supplemented with exogenous biotin, BirA* catalyzes the addition of biotin to lysine residues of proteins that are in close proximity to the dCas9-BirA* fusion protein. Lysis of the cells and denaturation of proteins are then followed by affinity purification of biotinylated peptides, which are identified via tandem MS.
Summary
The recent advances in dCas9-based approaches constitute powerful and versatile tools to investigate molecular pathways at a site-specific level. dCas9 as a recruitment platform for transcriptional activators and epigenetic modifiers offers the unique opportunity to dissect the cause and effect of epigenetic modifications on transcriptional regulation and vice versa. Epigenome-wide association studies (EWAS) are generating hundreds to thousands of epigenomic markers associated with human disease. dCas9-based epigenomic editing will be invaluable for distinguishing disease-causing changes from indirect effects, opening the door to new treatment options and a better understanding of human pathogenesis.
While chromatin immunoprecipitation and sequencing (ChIP-seq) approaches have been invaluable for investigating the localization of individual proteins throughout the genome, it is the complex interplay of multiple proteins and pathways that governs the regulation of gene expression. Technologies such as CasID and CASPEX will allow unprecedented insights into the complex regulatory mechanisms that control gene expression by unraveling locus-specific changes in chromatin composition in response to signalling, differentiation and disease progression. Similarly, visualization of genomic sequences using dCas9 will greatly enhance our understanding of the dynamics of nuclear architecture during cellular differentiation and the cell cycle in living cells.
Taken together, the CRISPR/Cas system represents a revolution that goes far beyond gene editing and genome engineering approaches. By inactivating the catalytic activity of Cas9, the enzyme becomes a universal and site-specific recruitment platform that opens a plethora of new applications for basic as well as medical research.