Clarisse G Ricci1, Janice S Chen2, Yinglong Miao3, Martin Jinek4, Jennifer A Doudna2,2,2,2,2, J Andrew McCammon1,1,1, Giulia Palermo5. 1. Department of Pharmacology, Department of Chemistry and Biochemistry, and National Biomedical Computation Resource, University of California San Diego, La Jolla, California 92093, United States. 2. Department of Molecular and Cell Biology, Department of Chemistry, Howard Hughes Medical Institute, Innovative Genomics Institute, and Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, University of California Berkeley, Berkeley, California 94720, United States. 3. Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66047, United States. 4. Department of Biochemistry, University of Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland. 5. Department of Bioengineering, Bourns College of Engineering, University of California Riverside, 900 University Avenue, Riverside, California 92521, United States.
Abstract
CRISPR-Cas9 is the state-of-the-art technology for editing and manipulating nucleic acids. However, the occurrence of off-target mutations can limit its applicability. Here, all-atom enhanced molecular dynamics (MD) simulations-using Gaussian accelerated MD (GaMD)-are used to decipher the mechanism of off-target binding at the molecular level. GaMD reveals that base pair mismatches in the target DNA at distal sites with respect to the protospacer adjacent motif (PAM) can induce an extended opening of the RNA:DNA heteroduplex, which leads to newly formed interactions between the unwound DNA and the L2 loop of the catalytic HNH domain. These conserved interactions constitute a "lock" effectively decreasing the conformational freedom of the HNH domain and hampering its activation for cleavage. Remarkably, depending on their positions at PAM distal sites, DNA mismatches responsible for off-target cleavages are unable to "lock" the HNH domain, thereby leading to the unselective cleavage of DNA sequences. In consistency with the available experimental data, the ability to "lock" the catalytic HNH domain in an inactive "conformational checkpoint" is shown to be a key determinant in the onset of off-target effects. This mechanistic rationale contributes in clarifying a long lasting open issue in the CRISPR-Cas9 function and poses the foundation for designing novel and more specific Cas9 variants, which could be obtained by magnifying the "locking" interactions between HNH and the target DNA in the presence of any incorrect off-target sequence, thus preventing undesired cleavages.
CRISPR-Cas9 is the state-of-the-art technology for editing and manipulating nucleic acids. However, the occurrence of off-target mutations can limit its applicability. Here, all-atom enhanced molecular dynamics (MD) simulations-using Gaussian accelerated MD (GaMD)-are used to decipher the mechanism of off-target binding at the molecular level. GaMD reveals that base pair mismatches in the target DNA at distal sites with respect to the protospacer adjacent motif (PAM) can induce an extended opening of the RNA:DNA heteroduplex, which leads to newly formed interactions between the unwound DNA and the L2 loop of the catalytic HNH domain. These conserved interactions constitute a "lock" effectively decreasing the conformational freedom of the HNH domain and hampering its activation for cleavage. Remarkably, depending on their positions at PAM distal sites, DNA mismatches responsible for off-target cleavages are unable to "lock" the HNH domain, thereby leading to the unselective cleavage of DNA sequences. In consistency with the available experimental data, the ability to "lock" the catalytic HNH domain in an inactive "conformational checkpoint" is shown to be a key determinant in the onset of off-target effects. This mechanistic rationale contributes in clarifying a long lasting open issue in the CRISPR-Cas9 function and poses the foundation for designing novel and more specific Cas9 variants, which could be obtained by magnifying the "locking" interactions between HNH and the target DNA in the presence of any incorrect off-target sequence, thus preventing undesired cleavages.
CRISPR-Cas9 (CRISPR,
clustered regularly interspaced short palindromic
repeats) is an adaptive immune system found in bacteria and archaea
conferring protection from foreign DNA.[1] By enabling deletion, insertion, or correction of DNA at specific
targeted sites within an organism’s genome, CRISPR-Cas9 is
used as a genome editing technology holding enormous promises for
medical, pharmaceutical, and (bio)technological applications, while
also being of invaluable impact for fundamental research.[1,2] The CRISPR-Cas9 technology is based on a single protein—the
endonuclease Cas9—programmed with single guide RNAs (sgRNAs)
to site-specifically target any desired DNA sequence. The presence
of a short sequence (i.e., a protospacer adjacent motif, PAM) in close
proximity to the cleavage site enables recognition of the desired
DNA sequence across the genome and allows programmable applications.
Upon PAM recognition, the DNA binds Cas9 by base pairing the RNA guide
with one strand (the target strand, TS) and forming an RNA:DNA hybrid,
while the nontarget DNA strand (NTS) is unwound and also accommodated
within the protein complex. Structures of the CRISPR-Cas9 complex
indicate a recognition lobe, which mediates the binding of the nucleic
acids through three REC1–3 regions, flanked by a PAM interacting
(PI) domain and two nuclease domains, HNH and RuvC, which cleave the
TS and NTS, respectively (Figure A).
Figure 1
(A) Crystal structure of the S. pyogenes CRISPR-Cas9
system, including the endonuclease Cas9, a guide RNA (orange), the
target DNA (TS, cyan), and nontarget DNA (NTS, violet) strands.[26] Cas9 is shown in molecular surface, with protein
domains in different colors. The X-ray structure captures the inactive
state of the HNH domain, which is a “conformational checkpoint”
between DNA binding and cleavage. The right panel highlights the PAM
distal sites on the RNA:DNA hybrid and the conformational change of
the HNH domain required for catalysis, which is shown with an arrow,
indicating the movement of catalytic H840 toward the cleavage site
on the TS. (B) Diagram of the DNA and RNA filaments in CRISPR-Cas9,
showing the location of base pair mismatches (mm) associated with
off-target effects, at PAM distal sites. The model systems considered
for Gaussian accelerated MD (GaMD) simulations include the on-target
DNA sequence (on-target) and DNA sequences containing base pair mismatches
at PAM distal sites (i.e., mm@20, mm@19–20, mm@18–20,
and mm@17–20).
(A) Crystal structure of the S. pyogenes CRISPR-Cas9
system, including the endonuclease Cas9, a guide RNA (orange), the
target DNA (TS, cyan), and nontarget DNA (NTS, violet) strands.[26] Cas9 is shown in molecular surface, with protein
domains in different colors. The X-ray structure captures the inactive
state of the HNH domain, which is a “conformational checkpoint”
between DNA binding and cleavage. The right panel highlights the PAM
distal sites on the RNA:DNA hybrid and the conformational change of
the HNH domain required for catalysis, which is shown with an arrow,
indicating the movement of catalytic H840 toward the cleavage site
on the TS. (B) Diagram of the DNA and RNA filaments in CRISPR-Cas9,
showing the location of base pair mismatches (mm) associated with
off-target effects, at PAM distal sites. The model systems considered
for Gaussian accelerated MD (GaMD) simulations include the on-target
DNA sequence (on-target) and DNA sequences containing base pair mismatches
at PAM distal sites (i.e., mm@20, mm@19–20, mm@18–20,
and mm@17–20).In spite of the remarkable advantages of the CRISPR-based
genome
editing technology with respect to traditional therapies, safety and
efficacy issues have to be fully addressed prior to clinical applications.
The most severe issues limiting the applicability of CRISPR-Cas9 in vivo are the off-target DNA cleavages, which produce
mutations at sites in the genome other than the desired target site,
causing unwanted phenotypes.[3]In
this respect, a promising strategy to fight off-target cleavages
is the molecular engineering of highly specific Cas9 proteins,[4−6] which would enable safer and easy-to-use applications of the CRISPR
technology on a genome-wide scale.[7] However,
rational design of CRISPR-Cas9 requires detailed knowledge of the
molecular bases underlying off-target effects, which are not yet understood.[8,9] At the molecular level, off-target effects are the unselective cleavages
of DNA sequences that do not fully match the guide RNA, bearing base
pair mismatches within the DNA:RNA hybrid (Figure ). Extensive biophysical experiments have
been performed to understand the molecular basis of off-target binding.[5,10−13] Kinetic and single molecule (sm) Förster resonance energy
transfer (FRET) experiments have shown that the occurrence of off-target
cleavages is directly related to the conformational state adopted
by the catalytic HNH domain.[5,10] Upon DNA binding, HNH
undergoes a conformational change from an inactive state, in which
the catalytic H840 is far away from the cleavage site on the TS, to
an activated state that is prone for catalysis (Figure A). The inactive state of the HNH domain
has been identified as a “conformational checkpoint”
between DNA binding and cleavage, in which the RNA:DNA complementarity
is recognized before the HNH domain assumes an activated conformation.[10] sm FRET experiments have shown that the presence
of DNA mismatches at the PAM distal ends of the DNA:RNA hybrid can
trap the HNH domain in the “conformational checkpoint”
state, whose population increases by augmenting the number of mismatches
at PAM distal sites. Early experimental characterizations also revealed
that the presence of base pair mismatches at PAM distal ends leads
to the formation of a stable CRISPR-Cas9 complex.[13] In this scenario, however, it is unknown how the presence
of DNA mismatches at these sites can favor the inactivation of HNH.
Detailed molecular knowledge of this mechanism is of major importance
for developing more specific Cas9 variants, in which a single base
pair mismatch is sufficient for trapping HNH in the inactive state,
thus preventing the cleavage of any incorrect DNA sequence.Here, we make use of extensive molecular dynamics (MD) simulations
to characterize at the atomic level the molecular determinants of
off-target binding, providing critical insights on the mechanism of
off-target effects in CRISPR-Cas9. MD simulations have been shown
to be a valuable tool for understanding the molecular basis of the
CRISPR-Cas9 function.[14−18] Among other studies, our group has successfully applied MD simulations
to disclose a mechanism for the conformational activation of Cas9,
clarifying the activation process by which the HNH domain is repositioned
for cleavage, in good agreement with structural and sm FRET experiments.[19] Remarkably, MD simulations as well as experimental
studies have indicated that the conformational changes underlying
the CRISPR-Cas9 function occur over and beyond micro-to-millisecond
time scales.[15,20] Such long time scales require
the use of computational methods that enable enhanced sampling of
the configurational space, such as accelerated MD (aMD) simulations.[21] The aMD method adds a boost potential to certain
regions of the potential energy surface, effectively decreasing the
energy barriers and thus accelerating transitions between low-energy
states. In this way, aMD simulations can capture biological processes
occurring over milliseconds (and in some cases beyond), allowing the
study of complex conformational transitions in folded or unstructured
systems.[22] Recent advances have led to
the development of a robust aMD methodology—i.e., Gaussian
accelerated MD (GaMD)[23]—that extends
the use of aMD to larger and more complex biological systems. Besides
enabling the description of the activation mechanism of the Cas9 protein,[19] GaMD simulations aided to the disclosure of
a catalytically active CRISPR-Cas9 complex.[14] As well, GaMD has been used to determine ligand binding in G-protein-coupled
receptors[24] and, remarkably, the mechanism
of a G-protein mimetic nanobody binding of a medically important GPCR
with intracellular signaling proteins.[25]By using GaMD we investigate the mechanistic basis of binding
of
off-target sequences at PAM distal sites. GaMD simulations reveal
that the presence of base pair mismatches at specific PAM distal sites
of the RNA:DNA hybrid can reduce the conformational mobility of the
HNH domain. Indeed, depending on the positions of base pair mismatches
at PAM distal sites, newly formed interactions are shown to “lock”
the catalytic HNH domain in an inactive “conformational checkpoint”,
preventing its activation toward DNA cleavages. In consistency with
the available experimental data, the ability to “lock”
the catalytic HNH domain in the “conformational checkpoint”
is shown to be a key determinant in the onset of off-target effects.
Overall, this study characterizes at the atomic level the molecular
features of off-target binding in CRISPR-Cas9. This information poses
the foundations for future biophysical investigations and structure-based
design of the system toward improved genome editing.
Results
To determine how off-target sequences affect the conformational
activation of the HNH domain, we performed all-atom GaMD simulations
of the Cas9 protein in the “conformational checkpoint”
state of the HNH domain (i.e., PDB 4UN3).[26] GaMD simulations
have already been used to successfully describe the activation process
of the HNH domain[19] and are therefore ideal
to characterize the effects caused by base pair mismatches in the
conformational dynamics of CRISPR-Cas9 and its HNH domain. For this
purpose, the CRISPR-Cas9 complex has been simulated in complex with
the fully matched RNA:DNA hybrid (considered the reference on-target
system) and in the presence of base pair mismatches at the PAM distal
ends of the RNA:DNA hybrid (Figure B). Specifically, we introduced 1–4 mismatches
(mm) at PAM distal sites (i.e., at positions 20–16 of the RNA:DNA
hybrid), resulting in the following models: mm@20, mm@19–20,
mm@18–20, mm@17–20. These systems are consistent with
experimental models for which the DNA cleavage rates have been measured.[5] Indeed, experimental characterization has shown
that the mm@20, mm@19–20, and mm@18–20 systems cleave
their DNA substrates with rates similar to the on-target system and
thus can be considered “productive” for DNA cleavage.
Contrariwise, the mm@17–20 system cleaves the DNA substrates
at a significant slower rate and is “unproductive” for
DNA cleavage. Each GaMD run has been carried out for >1 μs,
with simulation conditions well-suited for protein/nucleic acid complexes[27] and acceleration parameters that allow for sufficiently
broad exploration of the conformational space and consistent observation
of the molecular consequences of off-target DNA binding.
DNA:RNA Conformational
Dynamics in the Presence of Mismatches
During GaMD simulations,
we observe an opening of the RNA:DNA hybrid
at PAM distal sites with disruption of the Watson–Crick base
pairing in the systems including off-target DNA sequences (Figure ). Contrariwise,
in the system containing the on-target DNA, the DNA:RNA hybrid stably
maintains its Watson–Crick base pairing, in good agreement
with previous conventional and aMD simulations of CRISPR-Cas9.[14−19] Visual inspection of the trajectories reveals that all systems containing
base pair mismatches display ill-behaved base pairs at the very end
of the DNA-RNA hybrid (positions 20–18), whereas the on-target
system remains well-behaved throughout the entire DNA:RNA hybrid (Figure ). The difference
between the “productive” systems (i.e., mm@20, mm@19–20,
and mm@18–20, which include one to three base pair mismatches
at the RNA:DNA ends) and the “unproductive” system (i.e.,
mm@17–20, with four mismatches) lies at position 17 of the
RNA:DNA hybrid, where a remarkable loss of the Watson–Crick
base pairing is observed in the “unproductive” system
(Figure , right panel).
With the aim of exploring more in-depth the effect of base pair mismatches
around position 17, as well as to confirm the outcomes of the mm@17–20
system, a further replica of the system has been simulated including
base pair mismatches at positions 16–17. As a result, this
simulation confirms the remarkable loss of base pairing at position
17 (Figure S1). At position 15, all systems
keep their Watson–Crick base pairing intact, regardless of
mismatches (Figure and Figure S1). It is important to note
that transient openings at the end of a DNA duplex are not unusual
in long time scale MD simulations, as well as that base flipping can
occur in well-matched DNAs, as shown by independent research groups
including ours.[28−32] However, in the simulations of the on-target CRISPR-Cas9 system,
the RNA:DNA hybrid maintains the Watson–Crick base pairing,
most likely stabilized by the protein framework, as previously observed
in conventional and aMD simulations of this system.[14−19] Clearly, the opening of the RNA:DNA hybrid at PAM distal sites only
occurs in the presence of base pair mismatches.
Figure 2
Conformations adopted
by the RNA:DNA hybrid along GaMD simulations.
Representative snapshots extracted from GaMD simulations of the CRISPR-Cas9
system, including the on-target DNA (A) and base pair mismatches (mm)
at different positions of the hybrid: mm@20 (B), mm@19–20 (C),
mm@18–20 (D), and mm@17–20 (E). “Productive”
systems, which efficiently cleave their DNA substrate at rates similar
to the on-target Cas9, are highlighted using cool colors (black for
the “on-target” CRISPR-Cas9 and blue for the “off-target”
systems), whereas the “unproductive” mm@17–20
system, which slowly cleaves the DNA substrate, is highlighted in
red (warm color).[5] The RNA (orange) and
the target DNA (TS, cyan) are shown as ribbons. Mismatched bases on
the TS are highlighted in magenta. The protein environment is shown
as a molecular surface. These configurations are representative of
the conformational changes detailed in Figures S3–S5 and in Figure .
Conformations adopted
by the RNA:DNA hybrid along GaMD simulations.
Representative snapshots extracted from GaMD simulations of the CRISPR-Cas9
system, including the on-target DNA (A) and base pair mismatches (mm)
at different positions of the hybrid: mm@20 (B), mm@19–20 (C),
mm@18–20 (D), and mm@17–20 (E). “Productive”
systems, which efficiently cleave their DNA substrate at rates similar
to the on-target Cas9, are highlighted using cool colors (black for
the “on-target” CRISPR-Cas9 and blue for the “off-target”
systems), whereas the “unproductive” mm@17–20
system, which slowly cleaves the DNA substrate, is highlighted in
red (warm color).[5] The RNA (orange) and
the target DNA (TS, cyan) are shown as ribbons. Mismatched bases on
the TS are highlighted in magenta. The protein environment is shown
as a molecular surface. These configurations are representative of
the conformational changes detailed in Figures S3–S5 and in Figure .
Figure 3
RNA:DNA geometrical properties along GaMD
simulations. (A) The
RNA:DNA minor groove width has been computed in the PAM distal region
at six different levels (i–vi, from positions 20 to 16 of the
TS), which are schematically shown on the 3D structure of the RNA:DNA
hybrid. The probability distributions of the RNA:DNA minor groove
width at the iii–v levels are shown for the on-target CRISPR-Cas9
and for the systems including base pair mismatches at different positions
of the hybrid (i.e., mm@20, mm@19–20, mm@18–20, and
mm@17–20). A vertical bar indicates the experimental minor
groove width (i.e., ∼11 Å from X-ray crystallography,
enlarged by ∼1 Å if NMR data are considered).[29] Full data are reported in Figure S2. (B) Scatter plots of the geometrical base pair
descriptors, computed at position 17 of the RNA:DNA hybrid for all
studied systems (full data are in Figures S3–S5). Translational (i.e., shear, stretch, and stagger) and angular
(i.e., buckle, propeller, and opening) descriptors are expressed in
Å and degrees, respectively. The RNA:DNA hybrid is shown on the
right for the on-target CRISPR-Cas9 (gray) superposed to the mm@17–20
(red) system.
To better estimate the extent
and the precise location of the nucleic
acid distortions promoted by mismatches, we performed an in-depth
analysis of the minor groove width at the PAM distal region of the
DNA:RNA hybrid. In detail, the minor groove width has been computed
at six different levels (i–vi, from positions 20 to 16 of the
TS, Figure A), perpendicularly to the global helical axis (full
details are in the Materials and Methods section).
As an effect of the instability promoted by the loss of base pairing,
we detect an increase of the minor groove width at PAM distal ends
of the RNA:DNA hybrid in the systems containing off-target DNA sequences
(Figure S2). Interestingly, while “productive”
systems promote the minor groove widening only at the very end of
the hybrid (i.e., levels i and ii, Figure S2), the “unproductive” system mm@17–20 shows
a remarkable increase of the minor groove at more upstream regions
of the hybrid, as shown by a shift of the probability distribution
of the minor groove width toward larger values at the levels iii–v
(Figure A). To complement
the conformational analysis of the hybrid, the geometrical descriptors
defining the base pair complementarity (shear, propeller, stagger,
buckle, stretch, opening) have been computed. At position 17 (Figure B), the broadly distributed
scatter plots produced by the “unproductive” system
mm@17–20 are indicative of significant loss of base pairing.
Contrariwise, systems that are “productive” for DNA
cleavage (on-target, mm@20, mm@19–20, and mm@18–20)
display confined distributions of scattered dots at position 17, as
consistent with well-matched base pairs. Figures S3–S5 report data for all simulated systems computed
at positions 20–15, showing that the “productive”
systems lose the complementarity only at the very end of the hybrid
(i.e., positions 20–18), while the “unproductive”
mm@17–20, as well as the simulation replica including base
pair mismatches at positions 16–17, also shows remarkable loss
of the complementarity at upstream positions. Indeed, in these systems,
the opening and distortion of the RNA:DNA hybrid includes position
17 and reaches position 16 (Figures S3–S5). At position 15 all systems keep their Watson–Crick base
pairing intact, consistently with the visual inspection of the trajectory
(Figure and Figure S1).RNA:DNA geometrical properties along GaMD
simulations. (A) The
RNA:DNA minor groove width has been computed in the PAM distal region
at six different levels (i–vi, from positions 20 to 16 of the
TS), which are schematically shown on the 3D structure of the RNA:DNA
hybrid. The probability distributions of the RNA:DNA minor groove
width at the iii–v levels are shown for the on-target CRISPR-Cas9
and for the systems including base pair mismatches at different positions
of the hybrid (i.e., mm@20, mm@19–20, mm@18–20, and
mm@17–20). A vertical bar indicates the experimental minor
groove width (i.e., ∼11 Å from X-ray crystallography,
enlarged by ∼1 Å if NMR data are considered).[29] Full data are reported in Figure S2. (B) Scatter plots of the geometrical base pair
descriptors, computed at position 17 of the RNA:DNA hybrid for all
studied systems (full data are in Figures S3–S5). Translational (i.e., shear, stretch, and stagger) and angular
(i.e., buckle, propeller, and opening) descriptors are expressed in
Å and degrees, respectively. The RNA:DNA hybrid is shown on the
right for the on-target CRISPR-Cas9 (gray) superposed to the mm@17–20
(red) system.Taken together, the base
pair geometrical descriptors well agree
with the analysis of the minor groove widths, indicating for the “unproductive”
mm@17–20 system a significant minor groove widening at levels
iii–v (Figure A) and the loss of base pairing at position 17 (Figure B), which is not observed in
the “productive” systems. This pinpoints that the conformational
dynamics of the RNA:DNA hybrid is affected by the presence of base
pair mismatches at PAM distal ends, and that the extent and location
of the distortions in the hybrid are related to the DNA cleavage activity.
Indeed, by taking together the outcomes from the here presented GaMD
simulations and the available experimental DNA cleavage assays,[5] mismatches that perturb the hybrid downstream
of position 18 are “productive” for DNA cleavage, while
mismatches whose distortions occur (or are propagated) up to position
17 are “unproductive” for DNA cleavage, as is the case
of mismatches mm@17–20 (Figure ).[5]
Effect of off-Target Binding
on the Catalytic Domains
The observed opening of the RNA:DNA
hybrid causes novel interactions
with the protein framework. Here, we detail the interactions established
by the RNA:DNA hybrid with the catalytic domains HNH and RuvC. Specifically,
we measure the tight contacts (i.e., within 4 Å radius) established
along the dynamics by the hybrid and the neighboring residues.[33] As a result, we detect a remarkable increase
of interactions between the hybrid and the HNH domain in the “unproductive”
system (mm@17–20), which is particularly relevant with the
polar and charged residues (Figure and Figure S6). Remarkably,
the relevance of these newly formed contacts is confirmed by the statistical
error nonoverlapping with the “productive” systems.
An increase of interactions for the “unproductive” system
is also observed with the RuvC domain. Importantly, the HNH domain
connects RuvC through two flexible loops: L1 (residues 765–780)
and L2 (residues 902–918), which are part of HNH and are important
in the function of CRISPR-Cas9 (Figure A, right panel). Indeed, L1/L2 intervene in the repositioning
of HNH from the inactivated (i.e., “conformational checkpoint”)
to the activated state, by changing configuration.[19,34] As well, L1/L2 exert an allosteric control on the activity of HNH
and RuvC, enabling the information transfer for concerted cleavages
of the two DNA strands.[17,35] To understand the role
of these two loops in the interaction with the hybrid, we specifically
measured the interactions established by L1/L2 and the DNA:RNA hybrid.
As a result, the interactions of L1 do not show relevant differences
among the investigated systems (Figure S6B). Indeed, L1 is a disordered loop that is highly flexible, resulting
in overlapping error bars. Contrariwise, the interactions established
by the RNA:DNA hybrid and L2 are remarkably increased in the “unproductive”
mm@17–20 system (Figure and Figure S6B). As noted above,
the increase in interactions involving L2 in these systems has nonoverlapping
error bars with the “productive” systems and is particularly
relevant for polar and charged residues.
Figure 4
Quantitative evaluation
of the interactions between the RNA:DNA
hybrid and the Cas9 protein. The number of tight contacts (i.e., within
4 Å radius) established along the dynamics by the RNA:DNA hybrid
with the neighboring residues of the HNH and RuvC domains, as well
as with the L2 loop connecting HNH to RuvC at PAM distal ends, has
been computed for the simulated systems (i.e., on-target, mm@20, mm@19–20,
mm@18–20, and mm@17–20). The polar, apolar, and charged
groups of residues have been considered. A cartoon of the mm@17–20
system, highlighting the RNA:DNA hybrid and its interactions with
HNH and RuvC, as well as with the L1/L2 loops, is shown (right panel).
Quantitative evaluation
of the interactions between the RNA:DNA
hybrid and the Cas9 protein. The number of tight contacts (i.e., within
4 Å radius) established along the dynamics by the RNA:DNA hybrid
with the neighboring residues of the HNH and RuvC domains, as well
as with the L2 loop connecting HNH to RuvC at PAM distal ends, has
been computed for the simulated systems (i.e., on-target, mm@20, mm@19–20,
mm@18–20, and mm@17–20). The polar, apolar, and charged
groups of residues have been considered. A cartoon of the mm@17–20
system, highlighting the RNA:DNA hybrid and its interactions with
HNH and RuvC, as well as with the L1/L2 loops, is shown (right panel).Analysis of the interactions between
the hybrid and the HNH domain
excluding the residues belonging to L2 corroborates that the increase
of interactions between HNH and the hybrid occurs at the level of
L2—practically vanishing when L2 is excluded from the analysis—and
mainly involves polar and charged residues (Figure S6B). A more detailed inspection of the trajectories reveals
specific interactions that are formed in the “unproductive”
mm@17–20 system and could be key for explaining the effect
of base pair mismatches on the capability of Cas9 to cleave off-target
sequences.[5] We detect the formation of
a salt bridge between R904 of L2 and the TS backbone at position 17
(Figure ; Movie S1). This interaction with the TS backbone
is also formed along the simulation of the system including base pair
mismatches at positions 16–17, whereas it does not form in
the on-target Cas9 or in the “productive” off-target
systems (Figure S7). In the “unproductive”
mm@17–20 system, additional interactions are also established
upon ∼0.45 μs between the D911 and S908 residues of L2
and the base 18 of the TS, which is flipped out of the hybrid (Figure ). Remarkably, the
interaction with the TS backbone is conserved along the dynamics of
the “unproductive” system and also observed in the presence
of base pair mismatches at positions 16–17 (Figure S7), as it is favored by the local unwinding of the
DNA:RNA hybrid at position 17, which is in turn promoted by the presence
of base pair mismatches at this level (Figures and 3). Contrariwise,
when the hybrid is well-formed at this level (i.e., position 17),
as in the case of the “productive” mismatches and for
the on-target DNA, the TS backbone is not free to approach and bind
the L2 loop.
Figure 5
Locking interactions between the DNA target strand (TS)
and the
L2 loop of the HNH domain, which decrease the HNH conformational flexibility
in the presence of 4 base pair mismatches at PAM distal sites. (A)
Representative snapshot of the mm@17–20 system, showing the
interactions established by the RNA:DNA hybrid with the residues of
the L2 loop. (B) The time evolution of the interactions established
by R904, D911, and S908 and the TS at positions 17 (R904) and 18 (D911
and S908) is reported. Data for the on-target Cas9 (black) are compared
to the mm@17–20 system (red).
Locking interactions between the DNA target strand (TS)
and the
L2 loop of the HNH domain, which decrease the HNH conformational flexibility
in the presence of 4 base pair mismatches at PAM distal sites. (A)
Representative snapshot of the mm@17–20 system, showing the
interactions established by the RNA:DNA hybrid with the residues of
the L2 loop. (B) The time evolution of the interactions established
by R904, D911, and S908 and the TS at positions 17 (R904) and 18 (D911
and S908) is reported. Data for the on-target Cas9 (black) are compared
to the mm@17–20 system (red).When formed, these interactions constitute a “lock”
for the L2 loop, decreasing its conformational flexibility. This is
confirmed by a lower root-mean-square deviation (RMSD) of the L2 loop
in the “unproductive” mm@17–20 system as compared
to the “productive” systems (Figure S8). Considering that the activation of the HNH domain requires
a substantial conformational rearrangement of L2,[19,34,35] which moves away from the PAM distal region
of the hybrid, the newly formed interactions observed in the “unproductive”
system can prevent HNH from undergoing its conformational activation.
This mechanism is consistent with several experimental studies showing
that the presence of base pair mismatches up to positions 17 trap
HNH in a “conformational checkpoint” state, preventing
it from reaching the active conformation.[5,10,36] On the contrary, the presence of up to 3
mismatches at positions 20–18 still allows HNH to correctly
reposition for cleavage, although with slightly slower rates. Remarkably,
this experimental evidence is consistent with the outcomes of GaMD,
showing that 3 mismatches at positions 20–18 are unable to
“lock” the HNH domain (Figure S7). In light of this experimental evidence, our simulations reveal
the interactions—formed between a locally unwound TS and the
L2 loop—by which specific base pair mismatches prevent HNH
activation for cleavage.
Effect of Off-Target Binding on the Hybrid
Recognition
The formation of the RNA:DNA hybrid has been
shown to be a prerequisite
for the on-target selectivity.[5] Single
molecule and bulk experiments have shown that the REC lobe of CRISPR-Cas9
exerts a key role in the recognition. Specifically, the REC3 region,
which directly contacts the very end of the hybrid, would “sense”
the formation of the DNA:RNA hybrid and allow for HNH nuclease activation.[5,15]To understand the role of REC3 in the hybrid recognition and
in the on-target selectivity, we monitored the interactions established
by the RNA:DNA hybrid with REC3 in the simulated systems. In the “unproductive”
mm@17–20 system, we observe a significant increase of interactions
between the hybrid and REC3 (involving polar, apolar, and charged
residues; Figure A
and Figure S9). In this system, due to
the extended opening of the RNA:DNA hybrid, the 692–700 α-helix
inserts within the heteroduplex, causing novel electrostatic interactions
and steric contacts mediated by apolar residues (Figure B). On the contrary, in the
on-target Cas9, as well as in the “productive” systems
including off-target sequences, the 692–700 α-helix does
not insert within the hybrid (Figure C). The 692–700 α-helix includes a set
of key residues (N692, M694, Q695, and H698), whose mutation to alanine
confers increased selectivity.[5] Indeed,
the hyper accurate Cas9 (HypaCas9) variant, including the N692A, M694A,
Q695A, and H698A mutations, cleaves the on-target DNA with rates similar
to the wt-Cas9, but the cleavage is reduced in the presence of mismatches.
The specific interactions of these residues have been monitored during
GaMD. As a result, in the on-target Cas9, N692 and Q695 bind the TS
backbone throughout the dynamics, while M694 and H698 establish additional
interactions (Figure S10), contributing
also to the stabilization of the fully matched RNA:DNA hybrid (Figure ). Remarkably, the
interactions established by N692 and Q695 with the TS are also observed
in the “productive” off-target systems, but are lost
in the “unproductive” mm@17–20 system, as also
shown by including base pair mismatches at positions 16–17
(Figure S11). Indeed, in the presence of
“unproductive” mismatches, the 692–700 α-helix
inserts within the heteroduplex, contributing in the opening of the
RNA:DNA hybrid. These data indicate that the 692–700 α-helix
is a key element for the recognition of the complementarity of the
RNA:DNA hybrid and for the selection of the “productive”
systems. Indeed, in the “unproductive” mm@17–20
system, the insertion of the α-helix 692–700 within the
heteroduplex contributes to the observed opening of the RNA:DNA hybrid
and, in turn, to the establishment of interactions between the TS
and the L2 loop (Figure ). This establishes a mechanism of selectivity where, in the “productive”
systems, REC3 “senses” the formation of a formed hybrid
through the 692–700 α-helix, in agreement with the experimental
hypothesis that the REC3 region would “sense” the formation
of the DNA:RNA hybrid and allow for HNH nuclease activation.[5,15] Indeed, in the on-target Cas9 and in the presence of 1–3
mismatches at PAM distal ends, the N692 and Q695 stably bind the TS
backbone, contributing also to the stabilization of the hybrid itself.
Contrariwise, in the presence of an extended opening of the RNA:DNA
hybrid, as including base pair mismatches at positions 17–16,
the 692–700 α-helix moves apart losing its interaction
with the TS backbone (Figure S11) and results
in the insertion within the hybrid.
Figure 6
(A) Number
of tight contacts (i.e., within 4 Å radius) established
along the dynamics by the RNA:DNA hybrid and the neighboring residues
of the REC3 region computed for the simulated systems (i.e., on-target,
mm@20, mm@19–20, mm@18–20, and mm@17–20). The
polar, apolar, and charged groups of residues have been considered.
(B) Representative snapshot from GaMD simulations of the mm@17–20
system, showing the extended opening of the RNA:DNA hybrid and the
insertion of the 692–700 α-helix (gray) within the heteroduplex.
(C) Snapshot from GaMD of the on-target CRISPR-Cas9, showing a well-behaved
RNA:DNA hybrid and the conserved interactions established by Q965,
N692, and the DNA TS. The RNA (orange) and the DNA TS (cyan) are shown
as ribbons. The 692–700 α-helix (gray) is shown as a
cartoon; interacting protein residues are shown as sticks.
(A) Number
of tight contacts (i.e., within 4 Å radius) established
along the dynamics by the RNA:DNA hybrid and the neighboring residues
of the REC3 region computed for the simulated systems (i.e., on-target,
mm@20, mm@19–20, mm@18–20, and mm@17–20). The
polar, apolar, and charged groups of residues have been considered.
(B) Representative snapshot from GaMD simulations of the mm@17–20
system, showing the extended opening of the RNA:DNA hybrid and the
insertion of the 692–700 α-helix (gray) within the heteroduplex.
(C) Snapshot from GaMD of the on-target CRISPR-Cas9, showing a well-behaved
RNA:DNA hybrid and the conserved interactions established by Q965,
N692, and the DNA TS. The RNA (orange) and the DNA TS (cyan) are shown
as ribbons. The 692–700 α-helix (gray) is shown as a
cartoon; interacting protein residues are shown as sticks.These computational outcomes also provide a rationale
for the “energy
excess” hypothesis[37] that has been
used as a foundation for the recent engineering of Cas9 systems with
improved on-target specificity, such as HypaCas9.[4,5,7] Accordingly to this hypothesis, the disruption
of the contacts between the RNA:DNA hybrid and the neighboring residues
of the REC3 region might alter the energetics of the complex, such
that it might somehow retain a diminished ability to cleave mismatched
off-target sequences. GaMD simulations show that the residues belonging
to REC3, which are responsible for the “energy excess”
in HypaCas9 (N692, M694, Q695, and H698), have a key role in the discrimination
between “productive” and “unproductive”
sequences. Indeed, while these residues tightly bind the TS backbone
in the “productive” systems conferring stabilization
to the RNA:DNA hybrid, in the presence of “unproductive”
mismatches the residues contribute to the widening and destabilization
of the RNA:DNA hybrid. This indicates a tight electrostatic and steric
control at the level of the RNA:DNA hybrid, which is exerted by the
residues of REC3. As such, the substitution of these residues by alanine,
as in HypaCas9,[5] would reduce the electrostatic
and steric interactions with the hybrid, altering the mechanism depicted
above and the energetics of the complex.Overall, these observations
provide a rationale for the role of
REC3 in the on-target selectivity, revealing how, by “sensing”
the RNA:DNA hybrid, REC3 discriminates “productive”
from “unproductive” sequences for cleavage. As well,
by taking together these observations and the “energy excess”
hypothesis,[37] the mutation of the N692,
M694, Q695, and H698 residues to alanine, as in HypaCas9,[5] would reduce the electrostatic and steric interactions
with the hybrid, altering the capability of REC3 to bind and “sense”
the formation of the RNA:DNA hybrid. In light of the above, and to
fully understand the role of the REC3 residues responsible for increased
specificity, future GaMD studies of HypaCas9 will be required to further
understand and ultimately capture the role of alanine substitution
in the mechanism of altered specificity.
Discussion
Off-target
effects represent a severe issue hindering a full exploitation
of the CRISPR-Cas9 technology.[3−7] At the molecular level, off-target effects are the unselected cleavage
of DNA sequences that do not fully match the guide RNA, bearing base
pair mismatches within the RNA:DNA hybrid. Here, we investigated the
molecular basis for the binding of base pair mismatches at PAM distal
sites, by using accelerated MD simulations. A Gaussian accelerated
MD methodology[23]—enabling the capture
of long time scale conformational changes that are not accessible
via conventional MD simulations—has been applied to characterize
the binding of off-target DNA sequences, compared to a fully matching
on-target DNA. The simulations reveal that while the on-target DNA
remains fully matched to its complementary RNA, the presence of base
pair mismatches induces an opening of the RNA:DNA hybrid (Figure ). We observe that
1–3 mismatches up to position 18 perturb the hybrid at its
very end, whereas base pair mismatches up to positions 17–16
produce an extended opening of the RNA:DNA heteroduplex, which includes
position 17 and reaches position 16 (Figures S3–S5). The mechanical distortion of the DNA in the presence of mismatches
has been very recently also observed through optical tweezers and
fluorescence experiments, further supporting the conformational changes
observed through molecular simulations.[47] Remarkably, the presence of 1–3 base pair mismatches up to
position 18 at PAM distal sites has been experimentally shown to still
enable productive cleavages of the DNA substrate at similar rates
of the on-target Cas9.[5] Contrariwise, by
increasing the number of base pair mismatches up to positions 17,
Cas9 is rendered unproductive, exhibiting slower DNA cleavage rates.[5] In light of this experimental evidence, mismatches
that perturb the hybrid downstream of position 18 are “productive”
and lead to unselective DNA cleavages, while mismatches whose distortions
occur (or are propagated) up to positions 17 are “unproductive”
for DNA cleavage.As an effect of the extended opening observed
in the presence of
base pair mismatches at positions including 17, the DNA TS engages
in conserved interactions with the L2 loop of the HNH domain (Figure and Figure S7). In particular, R904 of the L2 loop
stably binds the TS backbone at position 17 in the “unproductive”
mm@17–20 system (Movie S1). This
stable interaction is confirmed in the mm@16–17 system, which
also displays an extended opening of the RNA:DNA hybrid (Figures S3–S5). Additional interactions
with the TS bases also involve D911 and S908. These interactions constitute
a “lock” for the L2 loop, which decreases its conformational
freedom in the presence of “unproductive” mismatches
(Figure S8). Considering that the activation
of the HNH domain requires the conformational rearrangement of L2,[19,34,35] which enables HNH to properly
relocate for the TS cleavage, the newly formed interactions prevent
HNH from undergoing its conformational activation. This suggests that
base pair mismatches up to (or including) position 17 of the RNA:DNA
hybrid can “lock” the conformational activation of the
catalytic HNH domain by binding at the level of the L2 loop. This
is consistent with the experimental evidence that 4 base pair mismatches
at PAM distal ends (up to position 17) trap HNH in an inactive “conformational
checkpoint” state, preventing it from reaching the active conformation.[5,10,36] Contrariwise, GaMD simulations
show that the presence of up to 3 base pair mismatches at positions
20–18 does not result in stable interactions “locking”
L2 (Figure S7). This computational finding
is also in agreement with the experimental fact that 3 mismatches
(at positions 20–18) still allow the repositioning of HNH and
are “productive” for DNA cleavage.[5,10,36] In light of this experimental evidence,
GaMD simulations suggest a mechanism for the binding of off-target
sequences at PAM distal sites, capturing the interactions—formed
between a locally unwound TS backbone and the L2 loop—by which
base pair mismatches including position 17 prevent HNH activation
for cleavage. GaMD also shows that the REC3 region of the Cas9 recognition
lobe undergoes a significant conformational rearrangement in the presence
of base pair mismatches including position 17 of the RNA:DNA hybrid
(Figure ). Indeed,
the 692–700 α-helix of REC3 inserts within the heteroduplex,
further contributing to the observed opening of the RNA:DNA hybrid
and, in turn, to the establishment of interactions between the TS
and the L2 loop. Contrariwise, in the presence of an on-target DNA,
as well as including up to 3 base pair mismatches at PAM distal sites,
the 692–700 α-helix does not insert within the hybrid,
while the N692 and Q695 residues stably bind the TS backbone. This
clarifies the mechanism by which REC3 “senses” the formation
of a formed hybrid through the 692–700 α-helix,[5,15] contributing also to the stabilization of the hybrid itself.
Conclusions
Off-target effects are a severe issue limiting the applicability
of the CRISPR-Cas9 technology.[3−7] In this paper, we investigated the molecular basis for the binding
of DNA off-target sequences at the molecular level, by using all-atom
Gaussian accelerated MD (GaMD) simulations. The simulations reveal
that, by introducing up to 4 base pair mismatches in the target DNA
at PAM distal sites, an extended opening of the RNA:DNA hybrid is
observed, which leads to newly formed interactions between the unwound
DNA and the L2 loop of the catalytic HNH domain. These conserved interactions
constitute a “lock” effectively decreasing the conformational
freedom of the HNH domain and its activation for cleavage. Contrariwise,
up to 3 base pair mismatches at PAM distal ends are unable to stably
“lock” HNH in an inactivated state, still allowing its
repositioning for cleavage.[5,10,36] This mechanism agrees with the experimental evidence that base pair
mismatches up to position 17 of the hybrid trap HNH in an inactive
“conformational checkpoint” state, while up to 3 base
pair mismatches (up to position 18) allow HNH to reach its active
conformation for cleavage.[5,10,36] Therefore, the ability to “lock” the catalytic HNH
domain in an inactive “conformational checkpoint” is
shown to be a key determinant in the onset of off-target effects.
Overall, the outcomes of the here presented simulations provide a
mechanistic rationale that contributes in clarifying a long lasting
open issue in the CRISPR-Cas9 function. Building on this study, novel
computational investigations are ongoing in our laboratories to fully
characterize the dynamic and energetic features of a large database
of off-target sequences, exploring the effect of base pair mismatches
along the DNA:RNA hybrid[11,13] and implementing key
point mutations.[4,5,7] This
will provide a more comprehensive understanding of the CRISPR-Cas9
specificity. Finally, the findings reported here also pose the basis
for novel rational engineering of the system toward improved genome
editing. Indeed, the structural modifications of the L2 loop—by
using for instance non-natural amino acids—could be implemented
with the goal of magnifying the “locking” interactions
with the TS in the presence of off-target sequences. This could help
the development of more specific Cas9 variants, in which a single
base pair mismatch is sufficient for trapping HNH in its “conformational
checkpoint”[10] state, thus preventing
the cleavage of any incorrect DNA sequence.
Materials and Methods
Structural
Models
MD simulations have been based on
the X-ray structure of the Streptococcus pyogenes Cas9 in complex with RNA and DNA (PDB 4UN3), solved at 2.58 Å resolution,[26] which identifies the inactivated state of the
HNH domain (i.e., the so-called “conformational checkpoint”).[10] Based on this X-ray structure, five model systems
have been built, including the on-target DNA sequence (on-target)
and base pair mismatches at different positions of the RNA:DNA (namely,
mm@20, mm@19–20, mm@18–20, mm@17–20, as in Figure B). Base pair mismatches
within the RNA:DNA hybrid have been introduced by substituting the
nucleobases in the DNA TS crystallized in the PDB 4UN3 (i.e., 3′–TATT–5′, Figure B), with nucleobases that do
not match the RNA guide, resulting in the following base pair mismatches
within the RNA:DNA hybrid: A:C (at position 20), U:G (at position
19), A:C (at position 18), and A:C (at position 17). These systems
are consistent with the experimental models for which the DNA cleavage
rates have been measured.[5] Moreover, to
further explore the effect of base pair mismatches around position
17, as well as to confirm the outcomes of the mm@17–20 system,
a further replica of the system has been simulated including base
pair mismatches at positions 16–17 (namely, mm@16–17).
At this position, the C:A base pair mismatch has been introduced in
the RNA:DNA hybrid. All model systems have been embedded in explicit
waters, while Na+ ions were added to neutralize the total
charge, leading to an orthorhombic periodic simulation cell of ∼145·110·145
Å,[3] for a total of ∼220 000
atoms.
Conventional Molecular Dynamics (MD) Simulations
MD
simulations have been performed to equilibrate the systems and to
provide starting points for GaMD simulations. A simulation protocol
tailored for RNA/DNA endonucleases has been adopted,[27] embracing the use of the Amber ff12SB force field, which
includes the ff99bsc0 corrections for DNA[31] and the ff99bsc0+χOL3 corrections for RNA.[38,39] The Ȧqvist[40] force field has been
employed for Mg ions, as in previous studies on similar Mg-aided RNA/DNA
nucleases.[27,41] An integration time step of 2
fs has been employed. All bond lengths involving hydrogen atoms were
constrained using the SHAKE algorithm. Temperature control (300 K)
has been performed via Langevin dynamics,[42] with a collision frequency γ = 1. Pressure control was accomplished
by coupling the system to a Berendsen barostat,[43] at a reference pressure of 1 atm and with a relaxation
time of 2 ps. The system has been subjected to energy minimization
to relax water molecules and counterions, keeping the protein, the
RNA, DNA, and Mg ions fixed with harmonic position restraints of 300
kcal/(mol Å2). Then, the system has been heated up
from 0 to 100 K in the canonical ensemble (NVT), by running two simulations
of 5 ps each, imposing position restraints of 100 kcal/(mol Å2) on the above-mentioned elements of the system. The temperature
was further increased up to 200 K in ∼100 ps of MD in the isothermal–isobaric
ensemble (NPT), reducing the restraint to 25 kcal/(mol Å2). Subsequently, all restraints were released, and the temperature
of the system was raised up to 300 K in a single NPT simulation of
500 ps. After ∼1.1 ns of equilibration, ∼10 ns of NPT
runs were carried out allowing the density of the system to stabilize
around 1.01 g/cm3. Finally, the production run was carried
out in the NVT ensemble, collecting ∼100 ns. Simulations have
been performed using the GPU version of AMBER 16.[44] The well-equilibrated systems have been used as a starting
point for GaMD simulations.
Gaussian Accelerated MD (GaMD) Simulations
Accelerated
MD (aMD) is an enhanced sampling method that works by adding a non-negative
boost potential to smoothen the system potential energy surface (PES),
thus effectively decreasing the energy barriers and accelerating transitions
between the low-energy states.[21] Here,
aMD simulations have been performed using the novel and more robust
Gaussian aMD (or GaMD)[23] implementation,
in which the boost potential follows Gaussian distribution. This allows
reconstructing the original shape of the potential energy surface,
therefore obtaining the canonical ensemble, through accurate reweighting
using cumulant expansion to the second order. As such, even with biasing
potential, the same low-energy physical states are sampled in the
GaMD simulations. Hence, reweighting can allow for quantitative recovery
of conformational distributions, while un-reweighted results, as here,
capture the low-energy physical states and also provide a useful semiquantitative
ranking of their probabilities. The capability of the method in accurately
describing the canonical distribution has been shown for a large biomolecular
system, such as CRISPR-Cas9[14,19] and G-protein-coupled
receptors,[24,25] in agreement with the available
experimental data. GaMD therefore extends the use of aMD to large
biological systems, for which it has been difficult to attain the
canonical ensemble, given the large statistical noise.[45]Considering a system with N atoms at positions , when the system
potential is lower than a threshold energy E, the energy
surface is modified by adding a boost potential
aswhere k is the harmonic force
constant. The two adjustable parameters E and k are automatically determined by applying the following
three criteria. First, for any two arbitrary potential values and found on the original energy surface,
if , ΔV should be a
monotonic function that does not change the relative order of the
biased potential values, i.e., . Second, if , the potential difference observed on the
smoothened energy surface should be smaller than that of the original,
i.e., < . By combining the first two criteria and
plugging in the formula of and ΔV,
we obtainwhere Vmin and Vmax are
the system minimum and maximum potential
energies. To ensure that eq is valid, k has to satisfy k ≤ 1/Vmax – Vmin. By defining k ≡ k0(1/Vmax) – Vmin, then 0 < k ≤
1. Third, the standard deviation of ΔV needs
to be small enough (i.e., narrow distribution) to ensure accurate
reweighting using cumulant expansion to the second order: σΔ = k(E – Vavg)σ ≤ σ0, where Vavg and σ are the average
and standard deviation of the system potential energies, σΔ is the standard deviation of ΔV, and σ0 is a user-specified upper limit
(e.g., 10 kBT) for accurate reweighting. When E is
set to the lower bound, E = Vmax, according to eq , k0 can be calculated asAlternatively, when the threshold
energy E is set to its upper bound E = Vmin + 1/k, k0 isif k0″ is calculated between
0 and 1. Otherwise, k0 is calculated using eq instead of being set to
1 directly as described in the original paper.[23]Based on extensive testing, performed in our previous
study on
the CRISPR-Cas9 conformational dynamics,[19] the system threshold energy has be set to E = Vmax for all GaMD simulations. The boost potential
has been applied in a dual-boost scheme, in which two acceleration
potentials are applied simultaneously to the system: (i) the torsional
terms only and (ii) across the entire potential. A time step of 2
fs has been used. Given an average system size of ∼220 K atoms,
the maximum, minimum, average, and standard deviation values of the
system potential (Vmax, Vmin, Vavg and σ) has been obtained from an initial ∼100
ns NVT simulation with no boost potential (see details above). Each
GaMD simulation proceeded with a ∼50 ns run, in which the boost
potential has been updated every 1.6 ns, thus reaching equilibrium
values. Finally, ∼1 μs of GaMD simulations have been
carried out in the NVT ensemble. For each system, ∼1 μs
(GaMD production) + 50 ns (GaMD equilibration) + 100 ns (pre-equilibration
conventional MD) have been carried out. Since GaMD has been applied
on 6 model systems, including on- and off-target DNAs, a total of
∼7 μs (i.e., ∼1 μs of production + 150 ns
of equilibration × 6 systems) of simulations has been produced.
These simulations have been performed with the GPU version of AMBER
16.[44] Importantly, GaMD simulations have
been preceded by the equilibration of the structures by means of conventional
MD simulations, as described above. These runs have been performed
for ∼100/120 ns, for a total of ∼0.7 μs (i.e.,
∼120 ns × 6 systems).
Analysis of the Results
Analysis of the RNA:DNA conformational
dynamics has been done over the GaMD production runs using the CURVES+[46] code. Specifically, we computed the geometrical
parameters defining the base pair complementarity (i.e., shear, stretch,
stagger, buckle, propeller, and opening) and the minor groove width
at the PAM distal sites of the RNA:DNA hybrid. As standard in Curves+,
the minor groove has been measured between cubic spline curves running
through the phosphorus atoms of the nucleic backbone and then reduced
by 5.8 Å (2 × 2.9 Å) to discount the average radius
of two adjacent phosphodiester backbones. Hence, the computed widths
do not correspond to the phosphorus–phosphorus distance between
the base pair on the RNA:DNA hybrid, but are a measure of the groove
width with respect to (i.e., perpendicularly to) the global helical
axes. This measurement enables a reliable description of the groove
widths, which is superior to the simple measurement of the distance
between phosphates on different strands. The minor groove widths have
been measured at six different levels (i–vi, from positions
20 to 16 of the TS), as illustrated in Figure A and Figure S2.The interactions established by the RNA:DNA hybrid and the
surrounding protein domains (HNH, RuvC, REC3, and the L1/L2 loops
interconnecting HNH and RuvC) have been characterized by calculating
the statistical distribution of the direct contacts.[33] In detail, the number of tight contacts (i.e., within 4
Å radius) established by the hybrid and the neighboring residues
has been measured and averaged over the last ∼400 ns of GaMD
simulations. This allowed us to consider fully established interactions.
Indeed, as shown in Figure B, D911 and S908 engage in interactions with the TS after
∼0.45 μs. Data are reported in Figures and 6A and Figure S6 with the associated statistical errors.
Authors: Alberto Pérez; Iván Marchán; Daniel Svozil; Jiri Sponer; Thomas E Cheatham; Charles A Laughton; Modesto Orozco Journal: Biophys J Date: 2007-03-09 Impact factor: 4.033
Authors: Martin Jinek; Krzysztof Chylinski; Ines Fonfara; Michael Hauer; Jennifer A Doudna; Emmanuelle Charpentier Journal: Science Date: 2012-06-28 Impact factor: 47.728
Authors: Marie Zgarbová; Michal Otyepka; Jiří Sponer; Arnošt Mládek; Pavel Banáš; Thomas E Cheatham; Petr Jurečka Journal: J Chem Theory Comput Date: 2011-08-02 Impact factor: 6.006
Authors: Levi C T Pierce; Romelia Salomon-Ferrer; Cesar Augusto F de Oliveira; J Andrew McCammon; Ross C Walker Journal: J Chem Theory Comput Date: 2012-07-27 Impact factor: 6.006
Authors: Kyle W East; Erin Skeens; Jennifer Y Cui; Helen B Belato; Brandon Mitchell; Rohaine Hsu; Victor S Batista; Giulia Palermo; George P Lisi Journal: Biophys Rev Date: 2019-12-14
Authors: Kyle W East; Jocelyn C Newton; Uriel N Morzan; Yogesh B Narkhede; Atanu Acharya; Erin Skeens; Gerwald Jogl; Victor S Batista; Giulia Palermo; George P Lisi Journal: J Am Chem Soc Date: 2020-01-09 Impact factor: 15.419