Allosteric regulation in CRISPR/Cas1-Cas2 protospacer acquisition mediated by DNA and Cas2.

Chunhong Long¹, Liqiang Dai², Chao E³, Lin-Tai Da⁴, Jin Yu⁵.

Abstract

Cas1 and Cas2 are highly conserved proteins across clustered-regularly-interspaced-short-palindromic-repeat-Cas systems and play a significant role in protospacer acquisition. Based on the crystal structure of twofold symmetric Cas1-Cas2 in complex with dual-forked protospacer DNA (psDNA), we conducted all-atom molecular dynamics simulations to study the psDNA binding, recognition, and response to cleavage on the protospacer-adjacent-motif complementary sequence, or PAMc, of Cas1-Cas2. In the simulation, we noticed that the two active sites of Cas1 and Cas1' bind asymmetrically to the two identical PAMcs on the psDNA captured from the crystal structure. For the modified psDNA containing only one PAMc, as is generally expected for the psDNA recognized by Cas1-Cas2, our simulations show that the non-PAMc association site of Cas1-Cas2 remains destabilized until the stably bound PAMc is cleaved at the corresponding association site. Thus, long-range correlation appears to exist upon the PAMc cleavage between the two active sites (∼10 nm apart) on Cas1-Cas2, which can be allosterically mediated by the psDNA and by the bridging Cas2 and Cas2'. To substantiate these findings, we conducted repeated runs and further simulated Cas1-Cas2 in complex with synthesized psDNA sequences psL and psH, which have been measured with low and high acquisition frequencies, respectively. Notably, the intersite correlation becomes even more pronounced for Cas1-Cas2 in complex with psH but remains low for Cas1-Cas2 in complex with psL. Hence, our studies demonstrate that PAMc recognition and cleavage at one active site of Cas1-Cas2 may allosterically regulate non-PAMc association or even cleavage at the other site, and such regulation can be mediated by noncatalytic Cas2 and the DNA protospacer to possibly support the ensuing psDNA acquisition.

Year: 2021 PMID: 34197800 PMCID: PMC8390960 DOI: 10.1016/j.bpj.2021.06.007

Source DB: PubMed Journal: Biophys J ISSN: 0006-3495 Impact factor: 3.699

Significance

Cas1-Cas2 proteins, which are responsible for the initial acquisition of protospacer DNA (psDNA), are highly conserved among all types of CRISPR-Cas systems. We employed all-atom molecular dynamics simulations to probe how a twofold symmetric Cas1-Cas2 can conduct two cleavages quickly on a psDNA that contains only one PAMc sequence for recognition. We hypothesized that communication exists between the two remotely separated active sites of Cas1-Cas2, which associate with one PAMc and one non-PAMc DNA, respectively. Consistently, we found long-range correlation or allosteric communication that allows the non-PAMc-binding site to be stabilized or cleaved upon the PAMc cleavage at the other site, coordinated by the psDNA and the associated Cas2. The study provides a working scenario of Cas1-Cas2 acquisition in the absence of additional protein factors.

Introduction

Bacteria and archaea use clustered regularly interspaced short palindromic repeats (CRISPRs) along with associated protein (CRISPR-Cas) adaptive immune systems to protect against invading foreign nucleic acids from phages and plasmids (1, 2, 3, 4, 5). The CRISPR-Cas system captures foreign DNA segments, or protospacers, and integrates them into the CRISPR array, which consists of identical short repeats and variable spacers of similar sizes (1,2,6). The CRISPR-Cas system works in three steps. First, in the protospacer acquisition or adaptation stage, a new spacer is captured and stored in the host genomic CRISPR array (7, 8, 9), which is achieved by the highly conserved Cas1-Cas2 protein complex, possibly with assistance from additional protein factors (10). Second, the CRISPR locus is transcribed and processed into short mature CRISPR RNA (crRNA), which then binds to additional Cas proteins and forms a protein-crRNA complex (2,11). Last, at the targeted interference step, the secondary invading nucleic acid complementary to crRNA is recognized and degraded precisely by the protein-crRNA complex (12,13). Although the overall mechanism of the CRISPR-Cas immune system is well established, some key steps remain to be elucidated. In particular, the acquisition conducted by Cas1-Cas2 is among the least understood steps, even though this step is fundamental and ubiquitous to all types of CRISPR systems (9,14, 15, 16, 17, 18, 19). According to the evolutionary classification of the CRISPR-Cas system and Cas genes, Cas1 and Cas2 are the only two Cas proteins conserved across all CRISPR-Cas systems (20, 21, 22, 23, 24). Previous biochemical studies identified Cas1 and Cas2 as metal-dependent nucleases (25,26). Cas1 is capable of cleaving DNA of various forms in a sequence-independent manner and demonstrates catalytic activity in the protospacer acquisition. 
Cas2 is able to cleave single-stranded DNA and double-stranded DNA (dsDNA), yet no catalytic activity of Cas2 has been shown in the protospacer acquisition (17,27). Correspondingly, the functional role of Cas2 in the acquisition remains to be elucidated. The conserved Cas1 and Cas2 proteins can assemble into a twofold symmetric or dimeric Cas1-Cas2 complex (28,29), with two active sites formed by four Cas1 proteins (Cas1a and Cas1b, and Cas1a’ and Cas1b’, respectively), whereas two Cas2 proteins (Cas2 and Cas2’) are sandwiched between the two Cas1 dimers. Such a Cas1-Cas2 complex captures the protospacer DNA (psDNA) before integrating it into the host CRISPR locus. This acquisition or adaptation process can possibly include four substeps: psDNA binding and selection, the 3′ overhang cleavages, integration, and DNA repair (5,18,19,28,30, 31, 32) (see Fig. 1 A). In this study, we focus on the first two substeps, i.e., the psDNA binding and selection and the cleavages at the two active sites, assuming that the dimeric Cas1-Cas2 plays a dominant role in the absence of additional protein factors. During the psDNA binding, the protospacer-adjacent motif (PAM) recognition is key for the selection (28,29,33). The recognition of the protospacer-adjacent-motif complementary sequence (PAMc) via hydrogen bonding (HB) and amino-acid side-chain stacking in the binding pocket is shown in Fig. 1 B (bottom).

Figure 1

(A) Schematics of psDNA acquisition or adaptation into the host CRISPR array for the CRISPR-Cas adaptive immunity. The acquisition process includes four substeps (from top to bottom): protospacer binding and selection, 3′ overhang cleavage, integration, and DNA synthesis and repair. During the integration step, PAMc sequence C’-OH (cleavage at site1 of Cas1-Cas2) is integrated into the spacer side of Repeat1 in the CRISPR array (likely via the second nucleophilic attack), whereas non-PAMc (cleavage at site2) is integrated into the leader side of Repeat1 (likely via the first nucleophilic attack) (19,28). (B) Crystal structure of E. coli Cas1-Cas2 bound to dual-forked psDNA (Protein Data Bank: 5DQZ) (28). Cas1a and Cas1a’ are colored in orange. Cas1b and Cas1b’ are colored in magenta. Cas2 and Cas2’ are shown in cyan and green, respectively. The two active sites, site1 and site2, are shown with black circles. In the crystal structure, both sites are bound with PAMc (CTT). The PAMc recognition via hydrogen bonding (HB; shown between heavy atoms in participation) and amino-acid side-chain stacking in the binding pocket are indicated in black and gray dotted lines (bottom left and right), respectively.

It has been shown that PAM is important for recognition and selection of the protospacer during the acquisition (15,16,34). In particular, psDNAs flanked by the correct PAM (GAA) can be cleaved and integrated efficiently into the CRISPR array, which ensures that the foreign psDNA containing the right PAM is incorporated (33,35,36). Meanwhile, in the crystal structures captured for the psDNA acquisition complex of Cas1-Cas2 (28), symmetrical PAM complementary sequences, or PAMc (CTT), were constructed at nucleotides (nts) 28–30 in the two 3′ overhangs of the DNA protospacer, and both active sites (labeled as site1 and site2) formed by Cas1a and Cas1a’ are indeed bound with PAMc. 
Correspondingly, the 3′ PAMc sequence in this system is supposed to be cleaved between C and T, and C remains with the spacer to be integrated into the CRISPR array (see the schematics in Fig. 1 A, bottom). Upon the psDNA association and the 3′ overhang PAMc recognition, the CTT can be cleaved by the Cas1-Cas2 complex at one active site because of the high specificity of such a reaction (18,19,28,29). Meanwhile, the other active site, which generally binds to a non-PAMc at the other 3′ overhang, is also supposed to conduct the catalytic cleavage, though it is not clear whether the cleavage can proceed sufficiently fast on the non-PAMc. Notably, high-resolution structural studies have captured a half-site intermediate integration complex of Cas1-Cas2 (30), which has its non-PAMc side 3′-OH linked to the CRISPR array between the leader and Repeat1, as if this linkage results from the first nucleophilic attack on the CRISPR array (see Fig. 1 A, schematics). Assuming only Cas1-Cas2 works on the psDNA during this process, if the non-PAMc cleavage conducted by Cas1-Cas2 were much slower than the first PAMc cleavage, then an alternative half-site intermediate integration complex would likely dominate, which has its PAMc 3′-OH linked to the CRISPR array as the first nucleophilic attack (i.e., between Repeat1 and the previous spacer or spacer1). The existence of the first half-site integration complex of Cas1-Cas2, with the non-PAMc 3′ overhang linked to CRISPR, thus suggests that the non-PAMc can be cleaved (at site2, labeled for convenience) almost as fast as, or immediately after, the PAMc cleavage (at site1) when Cas1-Cas2 is the dominant player in charge of this acquisition process. We therefore want to explore how the non-PAMc site can be cleaved sufficiently fast in the Cas1-Cas2 acquisition complex. We hypothesized that certain communication exists between the two active sites in the Cas1-Cas2 complex so that the efficient non-PAMc cleavage is enabled. 
We accordingly conducted atomistic molecular dynamics (MD) simulations on the high-resolution crystal structure of the twofold Cas1-Cas2 complex in association with a dual-forked psDNA (28,29), which contains one PAMc on each of the two fork regions. In this dual-PAMc complex, the psDNA lies on the surface of Cas1-Cas2 in a head-to-head orientation (see Fig. 1 B). Two identical PAMcs are bound by the two active sites (site1 and site2), respectively. We simulated the original Cas1-Cas2 binding complex with the dual PAMc and identified instead an asymmetrical equilibrium binding pattern between the two active sites. Then, we modified one PAMc to a non-PAMc (at site2) while keeping the other PAMc intact (at site1) to mimic PAMc binding and recognition in general. We subsequently examined the Cas1-Cas2 systems from the psDNA binding state to the precatalytic state and to a half-site postcatalytic state (right after the site1-PAMc cleavage) and monitored structural and correlation dynamics between the two active sites in the respective systems. To substantiate our findings, which reveal certain correlations between the active sites upon the site1-PAMc cleavage modeled in the original Cas1-Cas2 complex, two synthetic psDNA sequences, which were experimentally identified with low and high acquisition frequencies or efficiencies (37), were further tested in the Cas1-Cas2-psDNA simulation systems at the half-site postcatalytic state. Via comparative studies, we demonstrate negative cooperativity between the two active sites, which seems to rely on allosteric propagation over ∼10 nm from the site1-PAMc to the site2-non-PAMc upon the site1-PAMc cleavage. This propagation is mediated by the psDNA in acquisition and by the noncatalytic Cas2 and Cas2’ in close association with the psDNA, which are harbored between Cas1a and Cas1b (containing site1) and Cas1a’ and Cas1b’ (containing site2).

Materials and methods

By employing all-atom MD simulations, we analyzed the Cas1-Cas2 protospacer binding and recognition by examining the formation of potential HBs and base-stacking characteristics at the active site (see Fig. 1). We first examined the original psDNA binding complex captured from the crystal structure with two identical PAMcs bound at both sites (site1 and site2); then, we focused on the modified system, with one PAMc (CTT) bound at site1 and one non-PAMc (TTT) in association with site2, as is generally expected for the psDNA recognized by Cas1-Cas2. For such a one-PAMc system, we examined not only the psDNA binding state (with PAMc but not non-PAMc stably bound) but also a precatalytic state (with catalytic magnesium ions incorporated into active site1) and a half-site postcatalytic state (with PAMc modeled as cleaved at site1, as if catalyzed via an endonuclease reaction). See Supporting materials and methods and Figs. S1–S4 for modeling at the active site, simulation equilibration, water box impacts, and histidine protonation status. For each state of the one-PAMc system, the protein-DNA internal correlations with active site1 were measured for individual residues and compared among different simulation systems to reveal the allosteric signal from the PAMc-binding site to the non-PAMc-binding site and the rest of the Cas1-Cas2 psDNA complex. Repeated equilibrium simulations were conducted for the above systems (up to three repetitive simulation runs of 200 ns each per system) to ensure robust results. Because the above analyses indicated that Cas2 and psDNA, which bridge the two active sites, play particular roles in the allosteric communication, Cas1-Cas2 in complex with the modified synthetic psL and psH DNA sequences was further examined (with simulations extended from 200 to 500 ns for computational convergence) to verify the allosteric effects and the role of psDNA and Cas2.
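The HB and stacking analyses described above reduce to two kinds of distance per trajectory frame: a distance between two participating atoms and a distance between the centers of mass of two groups. The following is a minimal sketch of both measurements, assuming coordinates in Å have already been extracted from a frame. The function names and the toy coordinates are hypothetical and are not taken from the authors' analysis scripts.

```python
import numpy as np

def atom_distance(p, q):
    """Euclidean distance between two atoms (e.g., an HB donor and acceptor)."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

def center_of_mass(coords, masses):
    """Mass-weighted mean position of an atom group; coords has shape (n_atoms, 3)."""
    coords = np.asarray(coords, float)
    masses = np.asarray(masses, float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()

def com_distance(coords_a, masses_a, coords_b, masses_b):
    """Distance between the centers of mass of two groups (e.g., a side chain and a base)."""
    delta = center_of_mass(coords_a, masses_a) - center_of_mass(coords_b, masses_b)
    return float(np.linalg.norm(delta))

# Toy coordinates for illustration only (not taken from the simulations).
side_chain = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
base = [[1.0, 0.0, 4.0]]
stacking = com_distance(side_chain, [12.0, 12.0], base, [14.0])
hb = atom_distance([0.0, 0.0, 0.0], [0.0, 3.0, 4.0])
print(stacking, hb)
```

Applying either function frame by frame gives the distance time series that the HB and stacking figures plot.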

Results

Cas1-Cas2 bound with two identical PAMcs is asymmetrically stabilized at one site and nonstabilized at the other site

We first performed an equilibrium MD simulation of the original psDNA binding complex of Cas1-Cas2, which was crystallized with twofold symmetry and with two identical PAMcs bound symmetrically at both active sites, site1 and site2 (Protein Data Bank: 5DQZ) (28) (see Fig. S4). Starting from the HBs and stacking interactions formed between the protein and PAMc at site1 and site2 in the crystal structure (also in Fig. S5), we examined whether these interaction characteristics are maintained in the equilibrium simulation. To keep the simulation model close to the crystal structure, positional restraints were implemented for up to 10 ns at the beginning of the simulation (see Supporting materials and methods). When simulated within a large water box (∼180-Å size, i.e., with a minimal distance between the protein-DNA complex and the boundary reaching ∼25 Å and a simulation system of ∼0.5 million atoms), the HBs and stacking interactions at site1 were maintained well during the simulation (see Fig. 2, A–C; see Fig. S6 for results from two repetitive simulation runs). Note that with a small simulation box (∼150-Å size or a minimal distance of ∼11 Å), the PAMc-site1 could not be stabilized (see Fig. S3); with an even larger simulation box (∼200 Å, a minimal distance of ∼35 Å), site1 becomes stabilized as in the 180-Å case (see Figs. S6 and S7). Such water-box-size-dependent behaviors have been demonstrated systematically in a previous work simulating the tetrameric complex of human hemoglobin via all-atom MD (38), which highlights the importance of simulating the bulk water environment and the hydrophobicity of the protein system in conformational sampling and energetics. It is also possible that the multimeric or quaternary structure of the protein matters essentially in the comparatively large assembly of the systems.
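The three box sizes quoted above can be cross-checked with simple arithmetic. The sketch below assumes a cubic box with the complex centered along its longest axis, so that the box edge equals the complex extent plus twice the minimal padding; this geometry is an assumption and is not stated in the text. Under it, the three setups imply nearly the same complex extent, and they differ mainly in the water gap between the complex and its nearest periodic image.

```python
def implied_extent(box_edge, min_padding):
    """Complex extent (Å) along the limiting axis, assuming equal padding on both sides."""
    return box_edge - 2.0 * min_padding

def image_gap(min_padding):
    """Water gap (Å) between the complex and its nearest periodic image."""
    return 2.0 * min_padding

# (box edge, minimal complex-to-boundary distance) in Å, as quoted in the text.
setups = [(150.0, 11.0), (180.0, 25.0), (200.0, 35.0)]
extents = [implied_extent(edge, pad) for edge, pad in setups]
gaps = [image_gap(pad) for _, pad in setups]
for (edge, pad), extent, gap in zip(setups, extents, gaps):
    print(f"box {edge:.0f} Å: extent ≈ {extent:.0f} Å, image gap ≈ {gap:.0f} Å")
```

The implied extents come out at about 128 to 130 Å in all three cases, which suggests the quoted figures are mutually consistent. The gap to the periodic image grows from about 22 Å to 50 Å and 70 Å.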

Figure 2

The equilibrium MD simulation of the Cas1-Cas2 complex with dual-PAMc psDNA and respective association characteristics at both site1 and site2. (A and B) The expected HB interactions with distances measured between residues R138, Y165, K211, and PAMc (CTT) at binding site1 and site2, respectively. (C and D) The expected stacking interactions with distances measured between the centers of mass of residues Y165, Y217, Q287, I291, and PAMc at binding site1 and site2, respectively. (E and F) The structural views at site1 and site2, at the end of the equilibrium simulations. The PAMc nts are colored blue and red at site1 (E) and site2 (F), respectively. The amino acids are colored by atom types (cyan, red, and blue for carbon, oxygen, and nitrogen, respectively). The expected HBs (according to the crystal structure; these may not form in the simulation, e.g., at site2) are highlighted by dotted lines.

In contrast to site1, at site2 we found that the expected HB between residue R138 and T29 of PAMc (CTT) was broken at ∼70 ns of the simulation, and the distance between them reached up to ∼12 Å (Fig. 2 B). The expected stacking between Q287 and T29 at binding site2 also became unstable after ∼100 ns as the distance between the centers of mass of Q287 and T29 rose above ∼6 Å (Fig. 2 D). The binding configurations of active site1 and site2 at the end of the simulation are shown in Fig. 2, E and F, respectively. In the repetitive simulation runs as well as in the larger simulation box, site2 remains more or less destabilized (see Figs. S6 and S7). These observations indicate that the two active sites largely maintain asymmetrical binding patterns to PAMc, even though the crystal structure shows twofold symmetry with two identical PAMcs associated with site1 and site2. 
The asymmetrical PAMc-binding dynamics of the two active sites of Cas1-Cas2 also imply that the complex can easily accommodate binding and recognition of psDNA containing only one PAMc, considering that two PAMcs separated by ∼30 bp is a rare configuration on psDNA. Below, we focus on the system that is commonly encountered, i.e., Cas1-Cas2 binding to psDNA with only one PAMc.
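Statements such as "the HB was broken at ∼70 ns" can be made operational by locating the first time a distance trace stays above a cutoff for several consecutive frames, which ignores brief excursions. The sketch below is one way to do this. The 3.5-Å cutoff, the five-frame persistence window, and the synthetic trace are assumptions for illustration and are not the criteria or data used in the study.

```python
import numpy as np

def first_persistent_break(times_ns, distances, cutoff, min_frames):
    """Return the start time of the first run of at least `min_frames` consecutive
    frames with distance > cutoff, or None if no such run exists."""
    above = np.asarray(distances, float) > cutoff
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run >= min_frames:
            return float(times_ns[i - min_frames + 1])
    return None

# Synthetic trace: bound (3.0 Å) until 70 ns, then separated (12.0 Å),
# with one transient excursion at 20 ns that should be ignored.
times = np.arange(0.0, 200.0, 1.0)
trace = np.where(times < 70.0, 3.0, 12.0)
trace[20] = 5.0
break_time = first_persistent_break(times, trace, cutoff=3.5, min_frames=5)
print(break_time)
```

Requiring persistence matters because a single frame above the cutoff does not imply that the interaction has been lost.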

The site2-non-PAMc cannot be stabilized until after the PAMc cleavage conducted at the site1

Apart from the original dual-PAMc psDNA binding complex of Cas1-Cas2, all the following systems were modified to site1-PAMc (CTT) and site2-non-PAMc (TTT), and we modeled the system from the psDNA binding state to the precatalytic state and then to the half-site postcatalytic state (see Supporting materials and methods). As before, we checked the protein-DNA HBs and stacking characteristics expected for stabilized PAMc binding and recognition. First, we found that the HB between residue R138 and T29 from non-PAMc (TTT) at site2 was absent in both the binding and precatalytic states during the respective simulations (see Fig. 3, A and B). Second, the expected HB between residue K211 and T28 from TTT at site2 in the precatalytic state was also absent (Fig. 3 B). In addition, the stacking interaction between Q287 and T29 at site2 was also unstable, in particular in the binding state (see Fig. S8, A and B). In contrast, such expected HBs and stacking interactions were stably maintained at site1 bound with PAMc. Hence, the asymmetrical binding configuration of site1 and site2 fits well with the PAMc and non-PAMc association, respectively.

Figure 3

The association patterns at the site1-PAMc and site2-non-PAMc in the equilibrium MD simulation of the Cas1-Cas2 psDNA complexes from the binding to precatalytic and to half-site postcatalytic state. Shown are the expected HBs with distances measured between residues Y165, R138, K211 at site1-PAMc (CTT; top), and similarly at site2-non-PAMc (TTT; bottom), in the binding state (A), the precatalytic state (B), and the half-site postcatalytic state with PAMc cleaved at site1 (C). Shown is the molecular view at the end of the equilibrium MD simulation for the site1-PAMc (top) and site2-non-PAMc (bottom), from the site1 binding (D) to precatalytic (E) and to half-site postcatalytic state (F), with site2 changing from nonstabilized (D and E) to stabilized (F). The coloring scheme is the same as in Fig. 2, E and F.

Remarkably, in the half-site postcatalytic state, i.e., once the PAMc was cleaved artificially at site1 (by preparing and modeling the site1-PAMc as cleaved for the corresponding simulation system), site2 in association with non-PAMc soon became stabilized within the 200-ns equilibrium simulation, according to the same HBs and stacking characteristics expected for the stable binding configuration (see Fig. 3 C; Fig. S8 C). The molecular views toward the end of each equilibrium MD simulation, from the binding to the precatalytic state and to the postcatalytic state, for both the site1-PAMc and site2-non-PAMc are also provided (see Fig. 3, D–F). From the molecular views, one can directly see, for example, that the expected HB between residue R138 and T29 from TTT at site2, which was absent in both the binding and precatalytic state equilibrium simulations, soon became present in the simulation of the half-site postcatalytic state, i.e., upon the PAMc cleavage at site1. 
In the two sets of repetitive equilibrium MD simulation runs (2 × 3 simulations) for the site1-PAMc and site2-non-PAMc (TTT) systems from the psDNA binding to the precatalytic state and to the half-site postcatalytic state, consistent results show that the site2-non-PAMc cannot be stabilized until after the site1-PAMc cleavage, though detailed HB or stacking dynamics vary slightly among simulation runs (see Figs. S9 and S10). These results suggest that the two active sites can exhibit allosteric communication and hence negative cooperativity with each other, and such cooperativity allows only one active site to be stabilized in close association with the psDNA for binding and recognition on one PAMc. As a result, site2 can bind stably with the non-PAMc only after the PAMc is cleaved at site1, as the close association between site1 and psDNA becomes lost. We thus infer that the cleavage of the PAMc at the corresponding binding and recognition site of Cas1-Cas2 can be necessary for the non-PAMc to be bound favorably by the other site and likely to be cleaved fast as well. In addition, we examined two other Cas1-Cas2 psDNA complex systems: 1) with site1-non-PAMc (TTT) and site2-PAMc (CTT), and 2) with both site1-non-PAMc (TTT) and site2-non-PAMc (TTT) (see Fig. S11). For system 1), we found that site2 with PAMc became stabilized, whereas site1 with non-PAMc was unstable. For system 2), both site1 and site2 became highly unstable with the non-PAMc, e.g., the expected HBs between residues K211 and T28 and between residues R138 and T29 were absent.

The allosteric propagation from the site1-PAMc to the site2-non-PAMc upon the site1-PAMc cleavage

For each equilibrium simulation of the Cas1-Cas2-psDNA complex, from the binding to precatalytic and to half-site postcatalytic state (for PAMc-bound site1), the dynamic correlation within the protein-DNA complex was calculated (see Supporting materials and methods), in particular, between the active site1-PAMc and the rest of the protein-DNA complex (see Fig. 4 A). Note that high self-correlation values (>30) appear for Cas1a, where active site1 is located, and for the PAMc regions in close association with active site1. Residues 93–108, residues 200–220, and residues 275–295 in Cas1b around active site1 also show high correlation values (∼20) with active site1 due to local interactions. For nonlocal regions (Cas1a’ and Cas1b’, the rest of the DNA aside from PAMc, and Cas2 and Cas2’, labeled in Fig. 4 A), the site2-non-PAMc, located in Cas1a’, demonstrates an elevated correlation with active site1 in the half-site postcatalytic state (i.e., up to 5–10; see Fig. 4 C); psDNA and Cas2 and Cas2’ also show slightly increased correlations with site1 in the postcatalytic state (Fig. 4, B and D).

Figure 4

The internal correlation strength between the active site1 and the rest of the Cas1-Cas2 psDNA complex (site1-PAMc and site2-non-PAMc) during the equilibrium MD simulations. (A) The correlations between site1 (PAMc) and the rest of the complex, from the binding state (dark line) to the precatalytic state (light blue) and to the half-site postcatalytic state (orange). (B–D) The zoomed-in views of the correlation patterns between the active site1 and nonlocal residues from Cas2 and Cas2’ (B), Cas1a’ hosting site2 (C), and psDNA hosting non-PAMc (D).

Indeed, in comparison with the binding and precatalytic states, the protein-DNA complex upon the site1-PAMc cleavage shows lowered self or local correlations with the site1-PAMc but enhanced nonlocal correlations with site1 at remote regions, including Cas1a’ and Cas1b’ (hosting site2), psDNA (aside from PAMc), and Cas2 and Cas2’. That is, correlation between the two far-apart active sites, site1 (at Cas1a and Cas1b) and site2 (at Cas1a’ and Cas1b’), exists and strengthens upon the PAMc cleavage at site1, which indicates that allostery, i.e., action at a distance (39,40), is triggered. Therefore, it is inferred from the lowered local correlation and elevated nonlocal correlation with active site1 (from binding to precatalytic and to postcatalytic state) that the allosteric signal propagates from active site1 to the rest of the protein-DNA complex, and such allosteric propagation is mediated by psDNA and Cas2 and Cas2’, reaching remotely to the site2-non-PAMc (∼10.3 nm away from site1), upon the site1-PAMc cleavage. Similarly, we performed two sets of repetitive 200-ns equilibrium simulations on the one-PAMc-binding, precatalytic, and half-site postcatalytic states of Cas1-Cas2 (site1-PAMc and site2-non-PAMc) and calculated the protein internal correlations between active site1 (with PAMc) and the rest of the protein-DNA complex (see Fig. S12). 
For further comparison, we also examined the protein internal correlations in the original psDNA binding complex of Cas1-Cas2 with dual PAMc, i.e., with site1 stabilized and site2 destabilized in the simulation. The correlations were calculated for both active site1 and site2 to see how the two sites respectively couple with the rest of the protein-DNA complex. Overall, the correlations in the dual-PAMc system appear larger than those in the single-PAMc system, and one finds that the remote correlations to the stabilized site1 and to the nonstabilized site2 are similarly large (see Fig. S13). Hence, allosteric propagation between the two active sites remote to each other on Cas1-Cas2 appears mutual or bidirectional.
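The site-to-residue correlations discussed in this section follow a general pattern: compare the positional fluctuations of a chosen site with those of every other residue over a trajectory. The sketch below computes a normalized dynamic cross-correlation, which is a common generic choice and is used here as a stand-in. It is an assumption and not the authors' measure, which is defined in the Supporting materials and methods and takes values well outside the range of −1 to 1. The function name is hypothetical, and the trajectory is assumed to be pre-aligned with one position per residue.

```python
import numpy as np

def site_correlation(traj, site_idx):
    """Normalized correlation between the mean displacement of the site residues
    and the displacement of every residue. traj has shape (n_frames, n_residues, 3)."""
    traj = np.asarray(traj, float)
    n_frames = traj.shape[0]
    disp = traj - traj.mean(axis=0)               # fluctuations about the mean structure
    site = disp[:, site_idx, :].mean(axis=1)      # (n_frames, 3) site displacement
    cov = np.einsum("fd,frd->r", site, disp) / n_frames
    site_var = np.einsum("fd,fd->", site, site) / n_frames
    res_var = np.einsum("frd,frd->r", disp, disp) / n_frames
    return cov / np.sqrt(site_var * res_var)

# Toy 4-frame, 4-residue trajectory: residue 1 moves with the site (residue 0),
# residue 2 moves against it, and residue 3 moves independently of it.
x = np.array([1.0, -1.0, 1.0, -1.0])
y = np.array([1.0, 1.0, -1.0, -1.0])
traj = np.zeros((4, 4, 3))
traj[:, 0, 0] = x
traj[:, 1, 0] = x + 5.0   # constant offset is removed with the mean structure
traj[:, 2, 0] = -x
traj[:, 3, 0] = y
corr = site_correlation(traj, [0])
print(corr)
```

With a measure of this kind, a rise in the values for residues far from the site between two simulated states is what would be read as enhanced nonlocal coupling.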

Cas1-Cas2 in complex with psDNA of high acquisition efficiency (psH) shows the most prominent allosteric propagation upon the PAMc cleavage

According to a synthetic approach to sequence-dependent protospacer acquisition of CRISPR-Cas1-Cas2, different psDNA sequences can lead to different acquisition frequencies or efficiencies (37). In particular, two representative protospacer sequences with particularly low and high acquisition efficiencies identified experimentally were incorporated into our simulation systems (psL and psH) so that we could examine whether allosteric propagation between the two active sites persists in the Cas1-Cas2 complex with psL or psH in comparison with the original psDNA. Similarly, we performed a series of equilibrium simulations on the one-PAMc binding and recognition for both the psL and psH systems at the binding, precatalytic, and half-site postcatalytic states of Cas1-Cas2 and calculated the protein internal correlations between active site1 (with PAMc) or site2 (with non-PAMc) and the rest of the protein-DNA complex. Comparing the three sets of systems with the original psDNA, psL, and psH, we found that the overall protein internal correlation is highest for Cas1-Cas2 bound with the psH DNA sequence, particularly in the half-site postcatalytic state (see Fig. 5 for the site1 correlation in the left panel and the site2 correlation in the right panel), as the simulations were extended from 200 to 500 ns. The results thus suggest that the high acquisition efficiency of the psH system can possibly be explained by significant allosteric communication between the two active sites inside Cas1-Cas2. With a significant allosteric signal propagated from the site1-PAMc upon cleavage to the site2-non-PAMc in the psH system, the non-PAMc stabilization and subsequent cleavage are expected to happen fast so that the acquisition or integration efficiency can be high in the psH system. 
The results also support the idea that the allosteric propagation from the site1-PAMc to the site2-non-PAMc is regulated by the psDNA and by Cas2 and Cas2’, which are in close association with the duplex region of the psDNA.
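Because the psL and psH simulations were extended from 200 to 500 ns for convergence, a natural check is to recompute the same quantity over cumulative windows (0–100, 0–200, and so on up to 0–500 ns) and see whether it settles. The helper below is a minimal sketch of that bookkeeping. The function name and the placeholder signal are hypothetical, and `statistic` stands for whatever per-window analysis is of interest.

```python
import numpy as np

def cumulative_statistic(times_ns, data, window_ends_ns, statistic):
    """Apply `statistic` to the frames with time <= each window end.
    Returns a dict mapping window end (ns) to the statistic over 0..end."""
    times = np.asarray(times_ns, float)
    data = np.asarray(data)
    return {end: statistic(data[times <= end]) for end in window_ends_ns}

# Placeholder signal (the time axis itself), used only to show the windowing.
times = np.arange(0.0, 501.0, 1.0)
window_ends = [100, 200, 300, 400, 500]
result = cumulative_statistic(times, times, window_ends, np.mean)
print(result)
```

If the values for the longest windows differ little from one another, the analysis is less likely to be dominated by the early part of the trajectory.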

Figure 5

The protein internal correlation from extended equilibrium MD simulations (up to 500 ns) between the active site1/site2 and the rest of the Cas1-Cas2 psDNA complex, in the half-site postcatalytic state (site1-PAMc cleaved) with different psDNAs. The correlations between site1 and the rest of the protein-DNA complex (left) and between site2 and the rest of the protein-DNA complex (right) are both shown for the psL psDNA (A), the original psDNA (B), and the psH psDNA (C), with the correlation measurements obtained from 0–100, 0–200, 0–300, 0–400, and 0–500 ns.

A color map of the correlation strength (blue, white, and red: high, medium, and low correlation, respectively) on Cas1-Cas2 and psDNA is provided to visualize the allosteric propagation patterns for all the different psDNA systems (original, psL, and psH) at the half-site postcatalytic state (see Fig. 6, top). The psH system is clearly the most highly correlated overall upon the first cleavage at PAMc, and the correlation or allosteric propagation proceeds largely via Cas2 and Cas2’ and the dsDNA regions in the middle of the complex. In contrast, psL appears to have the lowest correlation strength going through the psDNA or reaching the site2-non-PAMc region. In the original psDNA system, noticeable correlation still appears between site2 and site1, along with a medium level of correlation propagation through the psDNA compared with psH and psL. Additionally, we calculated the Cas2 and Cas2’-dsDNA association energetics for the respective Cas1-Cas2 psDNA complexes (in the half-site postcatalytic state). Notably, the electrostatic interactions between the Cas2 and Cas2’ proteins and the duplex part of the DNA are strongest in the psH system (see Fig. 6, bottom), which again supports the idea that psDNA contributes directly to the correlation or allosteric communication in the Cas1-Cas2 acquisition complex by associating closely with Cas2 and Cas2’.

Figure 6

The color maps of correlation strength between the active site1 (bound with PAMc) and the rest of the protein-DNA complex viewed on the structures with different psDNA sequences in the half-site postcatalytic state for the 500-ns equilibrium simulations. Shown is the correlation map on the Cas1-Cas2 structure and dual-fork psDNA (blue, white, and red: high, medium, and low correlation values, respectively) for the original psDNA sequence (A), psL (B), and psH (C). Note that the correlation data are the same as those used in Fig. 5, measured from the equilibrium MD simulations. (D and E) The electrostatic and Van der Waals energies between Cas2 and Cas2’ and dsDNA for various psDNA complexes: original, psL, and psH (with average values −51 ± 6, −55 ± 6, and −72 ± 14 kcal/mol for the electrostatic energy and −5.9 ± 1.9, −5.8 ± 2.2, and −6.1 ± 2.5 kcal/mol for the Van der Waals energy).

We also examined internal correlations for the psL, psH, and original psDNA systems in the initial psDNA binding state and the precatalytic state (see Fig. S14). Indeed, Cas1-Cas2 in complex with psH produces comparatively low remote correlation in the initial binding state, i.e., even lower than that in the psL or the original system. In the precatalytic state (see detailed interaction characteristics and active site views in Figs. S15–S17), however, the psH system correlation becomes slightly higher than that in the psL or the original DNA system. In the postcatalytic state, i.e., upon the site1-PAMc cleavage, the psH system does show quenched fluctuations or stabilized association at the site2-non-PAMc compared with the psL system (see Fig. S17). The results again suggest that the catalysis or cleavage at the site1-PAMc is necessary to activate the allosteric propagation. In addition, we conducted principal component analysis on the psL, original, and psH psDNA systems (see Fig. S18), and the results show that the psH system demonstrates comparatively stable motions along the first principal component, or PC1, which may support collective motion in the system or facilitate the allostery. In particular, under the PC1 motion, one finds significant loop motions around the site2-non-PAMc in the psL system, little change around site2 but subdomain movements in the original psDNA system, and a medium level of local loop motions combined with some subdomain movements in the psH system (see Fig. S18 C).

Last, we examined the HBs formed between Cas1-Cas2 and psDNA at the half-site postcatalytic state, comparing the systems with different psDNA sequences, i.e., the original, psL, and psH. In particular, we calculated HB occupancies for these systems. Considering HBs between Cas1-Cas2 and DNA with occupancies >50% during the simulation, the psH system shows many more stabilized HBs than the original and psL systems (see Fig. 7, left panel; Table S1). One notices that there are three protein-DNA binding zones in the original psDNA system that show more HBs than in the psL system: zone I (Cas1a and Cas1b and Cas2’ with nts 1–5 on the top strand of psDNA), zone II’ (Cas1b’ and Cas2 and Cas2’ with nts 13–16 on the top strand), and zone II (Cas1b and Cas2 and Cas2’ with nts 13–16 on the bottom strand). Furthermore, the regions that show more HBs in the psH system than in the original system include an additional zone I’ (Cas1a’ and Cas1b’ and Cas2 with nts 1–5 on the bottom strand), whereas the zone II interaction is also stronger in the psH system than in the original or psL system. As a result, one can see that Cas2 and Cas2’ dominate all the essential protein-DNA HB interaction zones connecting the two active sites, with enhanced HB interactions from the psL to the original and to the psH system. Hence, Cas2 and Cas2’ appear critical for the involved allosteric propagation. 
The key residues at the Cas2 and Cas2’-DNA interface mediating long-range communication are also shown in the right panel of Fig. 7 for structural reviews of respective systems.
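The HB occupancy used above can be illustrated with a minimal sketch: the occupancy of a candidate hydrogen bond is the fraction of trajectory frames in which it satisfies geometric criteria. The distance and angle cutoffs below (3.5 Å and 30°), the function names, and the toy data are assumptions for illustration only; the criteria actually used in the simulations are not stated in this section.

```python
import numpy as np

def hb_occupancy(dist, angle, d_cut=3.5, a_cut=30.0):
    """Fraction of frames in which each candidate HB meets both cutoffs.

    dist  : (n_frames, n_pairs) donor-acceptor distances in angstroms
    angle : (n_frames, n_pairs) deviation of the D-H...A angle from 180 degrees
    d_cut, a_cut : assumed cutoffs, not taken from the paper
    """
    dist = np.asarray(dist, dtype=float)
    angle = np.asarray(angle, dtype=float)
    formed = (dist <= d_cut) & (angle <= a_cut)  # boolean per frame and pair
    return formed.mean(axis=0)                   # occupancy per pair, in [0, 1]

def stable_hbs(occupancy, labels, threshold=0.5):
    """Labels of pairs whose occupancy exceeds the threshold (>50% by default)."""
    return [lab for lab, occ in zip(labels, occupancy) if occ > threshold]

# Toy data (hypothetical): 4 frames, 2 candidate protein-DNA pairs.
dist = np.array([[2.9, 3.8], [3.1, 3.2], [3.0, 4.1], [3.4, 3.9]])
angle = np.array([[10.0, 15.0], [20.0, 25.0], [35.0, 10.0], [5.0, 12.0]])
labels = ["pair_A", "pair_B"]  # placeholder names, not real residues

occ = hb_occupancy(dist, angle)
stable = stable_hbs(occ, labels)
print(occ, stable)
```

In practice the per-frame distances and angles would come from a trajectory analysis tool, and comparing the lists of pairs above the 50% threshold across the psL, original, and psH systems would correspond to the comparison summarized in Fig. 7.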

Figure 7

The schematics of the Cas1-Cas2 psDNA complex at the half-site postcatalytic state with high HB occupancies obtained from three simulation systems with different psDNA sequences (psL, original, and psH). (A–C) The HB occupancies are compared among the three simulation systems; HBs with occupancies >50% during the simulation time are labeled with black arrows, and those <50% are labeled as well (in the left panel). Molecular views of the Cas2/2’-DNA interface likely involved in the allosteric communication are shown in the right panel for the respective systems. The residues that contribute to increased HB associations (from psL to original and to psH) are the following: N10, R14, R16, and R77 from Cas2; N10 and R14 from Cas2’; D29 and R41 from Cas1b; and R138, R163, K211, R245, and R248 from Cas1a’ (see also Table S1). The residues R77, R78, N10, and N16 of Cas2 and Cas2’ that are important for the long-distance communication are also shown (in the right panel) for the psL, original, and psH psDNA systems. To see this figure in color, go online.

Among the residues forming these HBs, R77, R78, R16, and R14 from Cas2 and Cas2’, which interact with DNA substantially in the psH or the original system, have also been found crucial experimentally. Recent high-resolution structural studies show that mutating R77 and R78 to Ala can reduce spacer acquisition efficiency; in addition, no spacer acquisition was observed for the R16A and R14A mutants (28,29). The HB patterns at the protein-DNA interface and the protein-DNA interaction energies together suggest that the correlation builds up between the two active sites primarily via the Cas2-Cas2’-psDNA interaction region, and that such allosteric communication becomes prominent in the psH system in the postcatalytic state, once the PAMc is cleaved at site1 and the system awaits engagement of site2 with the non-PAMc.

Discussion

In this work, we performed multiple sets of atomistic simulations to investigate how CRISPR-Cas1-Cas2 binds, recognizes, and possibly cleaves the psDNA efficiently in the adaptation stage of the CRISPR-Cas immunity process, assuming that Cas1-Cas2 plays a dominant role in the adaptation. To simulate Cas1-Cas2, consisting of Cas1a and Cas1b, Cas1a’ and Cas1b’, and Cas2 and Cas2’, in stable association with psDNA, i.e., to stabilize at least one active site of Cas1-Cas2, a particularly large solvent box with a size of ∼18 nm was used, which led to a large solvated simulation system of slightly more than 0.5 million atoms. A series of sub-microsecond equilibrium MD simulations were then performed systematically on individual systems: 1) Cas1-Cas2 associating with psDNA of different PAMc and non-PAMc configurations at the two active sites for binding (dual PAMc, tested with three water box sizes of ∼15, 18, and 20 nm; two single PAMc; and dual non-PAMc); 2) the site1-PAMc and site2-non-PAMc configuration, with systems from PAMc binding to the precatalytic and to a half-site postcatalytic state (i.e., three states with the original psDNA); and 3) two synthetic psDNAs of low and high acquisition efficiency (psL and psH, two additional three-state systems), for a total of >20 simulations and over 5 μs in aggregate. Interestingly, we found that even in association with two identical PAMc, the original twofold symmetric Cas1-Cas2 demonstrates an asymmetric binding pattern between the two active sites: the active site1 residues form stabilized HBs and stacking interactions with PAMc from one fork region of the psDNA, whereas the active site2 cannot bind stably or specifically to PAMc on the symmetric fork region. Such results are shown consistently in repeated simulation runs at two large water box sizes (18 and 20 nm; 2 × 3 simulation runs conducted).
Therefore, it appears that only one active site of Cas1-Cas2 is capable of binding and recognizing PAMc at a time, which suits locating the single PAMc on the psDNA during the Cas1-Cas2 target search. This raises the question of how Cas1-Cas2 binds and cleaves the non-PAMc sufficiently fast at site2, e.g., to support formation of a half-site intermediate integration complex (30). In such a complex, psDNA has the 3′-OH overhang on its non-PAMc side linked to the CRISPR locus, as if resulting from the first nucleophilic attack, which is likely only if the non-PAMc is cleaved almost as fast as the PAMc by Cas1-Cas2, presumably in the absence of additional protein factors.

A solution to have non-PAMc on the psDNA cleaved sufficiently fast by Cas1-Cas2 is suggested by our simulation studies of Cas1-Cas2 in complex with dual-forked psDNA containing one PAMc and one non-PAMc at the respective fork regions, i.e., in the site1-PAMc and site2-non-PAMc configuration. We found that even though the active site2 associates only loosely with non-PAMc at the beginning (in the Cas1-Cas2 initial binding state with psDNA), once site1 recognizes PAMc (past the precatalytic state) and cleaves the PAMc (modeled as the half-site postcatalytic state), site2 immediately becomes stabilized with the non-PAMc, ready for the second cleavage (3 × 3 repeated simulation runs conducted). The mechanism is summarized schematically in Fig. 8. The two active sites of Cas1-Cas2 can demonstrate negative cooperativity in a “seesaw” manner: one site is capable of binding specifically to the PAMc for recognition and conducting the first catalytic cleavage at that site; the other site then catches up with stabilized DNA association to cleave nonspecifically immediately after the first-site cleavage, taking advantage of the negative cooperativity.
Such negative or seesaw-type cooperativity has been reported in other enzyme systems, for example, in the homodimeric insulin receptor, a membrane-spanning tyrosine kinase that allows insulin to dock into two binding pockets (41); in the Mo-bisPGD enzyme arsenate oxidase, which impacts early-life metabolic reactions, with “the redox seesaw” cooperativity induced by the pyranopterin ligands (42); and in the “seesaw” model of enzyme regulation of mTORC1, which produces nonlinear, ultrasensitive responses (43).

Figure 8

The schematics of the “seesaw” type of negative cooperativity identified between the two remote active sites (separated by ∼10 nm) on the Cas1-Cas2 protospacer acquisition complex. The psDNA is represented by red and blue strands. Cas1 and Cas2 consist of dimeric Cas1a-Cas1b, Cas1a’-Cas1b’, and Cas2-Cas2’, which are colored differently and labeled. (A) The preferred binding to PAMc in the first active site of Cas1-Cas2 in psDNA binding and selection. (B) The consequent non-PAMc binding in the second active site of Cas1-Cas2 immediately after the PAMc cleavage at the first active site from (A). The Cas2 and Cas2’ interaction zones with the duplex psDNA, where essential HBs form, are labeled (I and I’ and II and II’). The interaction zones can play important roles in mediating allosteric propagation from the site1-PAMc to the site2-non-PAMc. To see this figure in color, go online.

The two active sites in the dimeric Cas1-Cas2 system are located symmetrically at two remote regions on Cas1a-Cas1b and Cas1a’-Cas1b’, respectively, ∼10 nm apart. The ∼23-bp dsDNA region of the psDNA is bound primarily by Cas2 and Cas2’, in between the Cas1a-Cas1b and Cas1a’-Cas1b’ dimers that contain the two active sites. To find out how communication between the two remote sites is achieved in the Cas1-Cas2 psDNA acquisition complex, we analyzed dynamic correlations within the complex from the MD trajectories, focusing on how the rest of the protein-DNA complex correlates with the active site1 that is bound with PAMc, from the psDNA binding at site1 to the precatalytic state and to the half-site postcatalytic state. Indeed, one could identify nontrivial correlation between the site2-non-PAMc and site1-PAMc, and such correlation increases in the half-site postcatalytic state, right after the site1 cleavage of the PAMc.
The comparative correlation analysis therefore suggests that allosteric communication exists that propagates fluctuations from the active site1-PAMc cleavage, as a mechanical signal, to the remote site2-non-PAMc. Note that such allosteric communication is triggered not by effector binding (44) but by catalytic cleavage (of PAMc) on the psDNA. Closer examination shows that it is primarily the Cas2-Cas2’-psDNA association, i.e., the protein-DNA electrostatic interactions as well as the HB interactions formed at the protein-DNA interface (particularly at the Cas2 and Cas2’-dsDNA interaction zones), that is responsible for such allosteric propagation. In these critical Cas2 and Cas2’-dsDNA interaction zones, the R14, R16, R77, and R78 residues seem to play important roles in facilitating the allosteric propagation, and mutations of these arginine residues indeed reduce the protospacer acquisition efficiency significantly (28,29). Correspondingly, we propose that Cas2-Cas2’-dsDNA, embedded in the middle of the Cas1-Cas2 dimeric complex, allosterically mediates between the two remotely located active sites, site1 and site2. With such mediation, negative cooperativity arises between the two remote sites, which allows the nonspecific non-PAMc cleavage to possibly take place quickly once a specific cleavage of PAMc by Cas1-Cas2 has been conducted.

To substantiate this idea, we further constructed Cas1-Cas2 complexes containing two synthetic DNA protospacers, psL and psH, which were experimentally identified with low and high integration efficiency, respectively (37). Similar MD simulations and comparative correlation analyses were conducted on the modified Cas1-Cas2 psDNA complexes containing psL and psH, and the correlation patterns between site1-PAMc and site2-non-PAMc were monitored in comparison with the original psDNA system.
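One common way to quantify dynamic correlation between remote sites from an MD trajectory is the normalized displacement cross-correlation, C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩). The sketch below illustrates that measure only; it is an assumption that a measure of this kind is comparable to the correlation analysis used in this work, and the function names and toy trajectory are hypothetical.

```python
import numpy as np

def cross_correlation(coords):
    """Normalized displacement cross-correlation matrix.

    coords : (n_frames, n_atoms, 3) positions, assumed already aligned
             to a common reference so that overall rotation/translation
             is removed. Atoms with zero fluctuation are not handled.
    Returns an (n_atoms, n_atoms) matrix with values in [-1, 1].
    """
    coords = np.asarray(coords, dtype=float)
    disp = coords - coords.mean(axis=0)               # displacement from mean
    cov = np.einsum("tid,tjd->ij", disp, disp) / coords.shape[0]
    norm = np.sqrt(np.diag(cov))                      # RMS fluctuation per atom
    return cov / np.outer(norm, norm)

def site_correlation(corr, site_a, site_b):
    """Mean absolute correlation between two groups of atom indices."""
    return np.abs(corr[np.ix_(site_a, site_b)]).mean()

# Toy trajectory (hypothetical): atom 1 moves with atom 0, atom 2 moves opposite.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
coords = np.stack([base, base + 5.0, -base], axis=1)  # shape (200, 3, 3)

corr = cross_correlation(coords)
site12 = site_correlation(corr, [0], [2])
print(np.round(corr, 3), site12)
```

Applied to groups of atoms at site1 and site2, a quantity like `site_correlation` could be compared across the binding, precatalytic, and half-site postcatalytic states, or across the psL, original, and psH systems.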
Notably, one finds that the intersite correlation or allosteric propagation becomes significantly enhanced in the psH system, again in the half-site postcatalytic state, upon the PAMc cleavage at site1. Such correlation or allosteric propagation remains low, however, in the psL system. Hence, these studies further support that allosteric propagation from the site1-PAMc to the site2-non-PAMc can impact the acquisition efficiency, i.e., by accelerating the non-PAMc cleavage that could otherwise be rate limiting. Such an allosteric effect can be particularly significant for certain psDNA sequences that associate tightly with Cas2 and Cas2’.

Besides, in the allosteric propagation of Cas1-Cas2 via psDNA, a dsDNA of ∼23 bp (or 33 bp in full length) sets a ruler length for the protospacer acquisition. It is interesting to note that such a length is right around an upper bound (25–30 bp) for DNA allostery to effectively regulate binding cooperativity between a pair of proteins at neighboring locations on the DNA (45). Hence, such a psDNA length may indeed be evolutionarily optimized to support certain cooperativity between the two active sites on Cas1-Cas2.

Our work also suggests that Cas2-Cas2’ plays a key role in the allosteric regulation that supports psDNA acquisition, besides its obvious structural role of bridging Cas1a-Cas1b and Cas1a’-Cas1b’ in association with psDNA. Indeed, Cas2-Cas2’ appears highly stable in the Cas1-Cas2 psDNA association complex in simulation, whereas psDNA is comparatively flexible. Consistently, Cas2-Cas2’ and psDNA become more flexible in the half-site postcatalytic state than in the binding or precatalytic state. Such a role may also explain why Cas2 does not exhibit enzymatic activity in the acquisition process, even though it is enzymatically capable of cleaving single-stranded DNA or dsDNA (17,27).
Notably, recent studies discovered that two Cas1 dimers alone can still form a mini-integrase that binds psDNA of a shorter length (∼18 bp) in the absence of Cas2 (46). Without Cas2, the two sites presumably come closer on the Cas1-psDNA complex (with a distance of ∼6 nm, i.e., ∼40% shorter than that in the original Cas1-Cas2 system). It would be interesting to find out whether the intersite communication and cooperativity still exist in such a mini-integrase and whether efficient protospacer acquisition was already part of the ancestral Cas1 function before Cas2 adoption (46). On the other hand, it is important to realize that the Cas1-Cas2 protospacer acquisition process can also be supported by Cas3 (6), Cas4 (9), Cas9, and Csn2 in various CRISPR systems (47). Recently, it was also found that DnaQ exonuclease can process the Cas1-Cas2-loaded prespacer precursors into mature prespacers of a suitable size for integration (10). Hence, our suggested scenario of efficient Cas1-Cas2 cleavages on the protospacer for integration may be one potential mechanism employed under certain conditions.

The PAM-specific DNA acquisition of Cas1-Cas2 has been implemented for designing molecular recording (37,48). By using the nt content, temporal ordering, and orientation of defined DNA sequences within a CRISPR array, Cas1-Cas2 seems able to encode arbitrary information within genomes and has the potential to record and store DNA information for long periods of time. It was in such efforts to integrate synthetic DNA sequences via Cas1-Cas2 that the psDNA sequences with particularly low and high acquisition efficiency (psL and psH) were identified (37). To allow Cas1-Cas2 to identify different PAMs in the synthetic approach, many mutants of Cas1-Cas2 have also been generated by directed evolution in the lab (37). The physical mechanisms revealed in this work can be further tested, for example, in such a variety of Cas1-Cas2 mutants.
It is expected that by combining information from experimental synthetic approaches, computational work will reveal substantial physical mechanisms to enable further rational redesign of molecular function.
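As a rough consistency check on the lengths quoted in this Discussion, duplex lengths in bp can be converted to nm. The sketch below assumes an ideal, straight B-form duplex with a rise of 0.34 nm per bp; the psDNA in the complex need not be ideal B-form, so these values are only approximate, and the function names are hypothetical.

```python
B_DNA_RISE_NM = 0.34  # assumed rise per base pair for ideal B-form DNA

def contour_length_nm(n_bp, rise_nm=B_DNA_RISE_NM):
    """Approximate length of a straight duplex of n_bp base pairs."""
    return n_bp * rise_nm

def fractional_shortening(d_new_nm, d_ref_nm):
    """Fraction by which d_new is shorter than d_ref."""
    return 1.0 - d_new_nm / d_ref_nm

# Duplex lengths mentioned in the text: mini-integrase, Cas1-Cas2 dsDNA, full psDNA.
for n_bp in (18, 23, 33):
    print(n_bp, "bp ->", round(contour_length_nm(n_bp), 2), "nm")

# Intersite distance of ~6 nm without Cas2 versus ~10 nm with Cas2.
shortening = fractional_shortening(6.0, 10.0)
print("shortening:", round(shortening, 2))
```

Under this assumption, an ∼18-bp duplex spans roughly 6 nm, in line with the ∼6 nm intersite distance quoted for the mini-integrase, and 6 nm is 40% shorter than the ∼10 nm separation in the Cas1-Cas2 complex.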

Author contributions

J.Y. conceived and designed this computational work. C.L. performed the computation. C.E. and L.-T.D. constructed an initial model. C.L., L.D., and J.Y. analyzed the data. J.Y. and C.L. wrote the article.

61 in total

1. CRISPR provides acquired resistance against viruses in prokaryotes.

Authors: Rodolphe Barrangou; Christophe Fremaux; Hélène Deveau; Melissa Richards; Patrick Boyaval; Sylvain Moineau; Dennis A Romero; Philippe Horvath
Journal: Science Date: 2007-03-23 Impact factor: 47.728

2. Selective loading and processing of prespacers for precise CRISPR adaptation.

Authors: Sungchul Kim; Luuk Loeff; Sabina Colombo; Slobodan Jergic; Stan J J Brouns; Chirlmin Joo
Journal: Nature Date: 2020-02-19 Impact factor: 49.962

3. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair.

Authors: Mohan Babu; Natalia Beloglazova; Robert Flick; Chris Graham; Tatiana Skarina; Boguslaw Nocek; Alla Gagarinova; Oxana Pogoutse; Greg Brown; Andrew Binkowski; Sadhna Phanse; Andrzej Joachimiak; Eugene V Koonin; Alexei Savchenko; Andrew Emili; Jack Greenblatt; Aled M Edwards; Alexander F Yakunin
Journal: Mol Microbiol Date: 2010-12-07 Impact factor: 3.501

4. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus.

Authors: Philippe Horvath; Dennis A Romero; Anne-Claire Coûté-Monvoisin; Melissa Richards; Hélène Deveau; Sylvain Moineau; Patrick Boyaval; Christophe Fremaux; Rodolphe Barrangou
Journal: J Bacteriol Date: 2007-12-07 Impact factor: 3.490

5. Structural basis for DNase activity of a conserved protein implicated in CRISPR-mediated genome defense.

Authors: Blake Wiedenheft; Kaihong Zhou; Martin Jinek; Scott M Coyle; Wendy Ma; Jennifer A Doudna
Journal: Structure Date: 2009-06-10 Impact factor: 5.006

Review 6. Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants.

Authors: Kira S Makarova; Yuri I Wolf; Jaime Iranzo; Sergey A Shmakov; Omer S Alkhnbashi; Stan J J Brouns; Emmanuelle Charpentier; David Cheng; Daniel H Haft; Philippe Horvath; Sylvain Moineau; Francisco J M Mojica; David Scott; Shiraz A Shah; Virginijus Siksnys; Michael P Terns; Česlovas Venclovas; Malcolm F White; Alexander F Yakunin; Winston Yan; Feng Zhang; Roger A Garrett; Rolf Backofen; John van der Oost; Rodolphe Barrangou; Eugene V Koonin
Journal: Nat Rev Microbiol Date: 2019-12-19 Impact factor: 60.633