Christopher Maffeo1, Thuy T M Ngo2, Taekjip Ha3, Aleksei Aksimentiev4. 1. Department of Physics, University of Illinois at Urbana -Champaign, Urbana, Illinois, United States. 2. Center for Biophysics and Computational Biology, University of Illinois at Urbana -Champaign, Urbana, Illinois, United States. 3. Department of Physics, University of Illinois at Urbana -Champaign, Urbana, Illinois, United States ; The Howard Hughes Medical Institute , Chevy Chase, Maryland, United States. 4. Department of Physics, University of Illinois at Urbana-Champaign , Urbana, Illinois, United States ; Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign , Urbana, Illinois, United States.
Abstract
A simple coarse-grained model of single-stranded DNA (ssDNA) was developed, featuring only two sites per nucleotide that represent the centers of mass of the backbone and sugar/base groups. In the model, the interactions between sites are described using tabulated bonded potentials optimized to reproduce the solution structure of DNA observed in atomistic molecular dynamics simulations. Isotropic potentials describe nonbonded interactions, implicitly taking into account the solvent conditions to match the experimentally determined radius of gyration of ssDNA. The model reproduces experimentally measured force-extension dependence of an unstructured DNA strand across 2 orders of magnitude of the applied force. The accuracy of the model was confirmed by measuring the end-to-end distance of a dT14 fragment via FRET while stretching the molecules using optical tweezers. The model offers straightforward generalization to systems containing double-stranded DNA and DNA binding proteins.
A simple coarse-grained model of single-stranded DNA (ssDNA) was developed, featuring only two sites per nucleotide that represent the centers of mass of the backbone and sugar/base groups. In the model, the interactions between sites are described using tabulated bonded potentials optimized to reproduce the solution structure of DNA observed in atomistic molecular dynamics simulations. Isotropic potentials describe nonbonded interactions, implicitly taking into account the solvent conditions to match the experimentally determined radius of gyration of ssDNA. The model reproduces experimentally measured force-extension dependence of an unstructured DNA strand across 2 orders of magnitude of the applied force. The accuracy of the model was confirmed by measuring the end-to-end distance of a dT14 fragment via FRET while stretching the molecules using optical tweezers. The model offers straightforward generalization to systems containing double-stranded DNA and DNA binding proteins.
It has become apparent that
physical properties of DNA can play a fundamental role in its biological
function and can determine the utility of DNA for nanotechnological
applications.[1−3] Although the genetic code is stored in double-stranded
form, it is the single-stranded form of DNA that plays active roles
in central biological processes such as transcription, replication,
and DNA repair.[4−7] The emerging field of DNA nanotechnology exploits self-assembly
of single-stranded DNA (ssDNA) to create novel nanostructures[8] where DNA is used as a structural element,[9,10] functionalization and assembly agent,[11,12] transport
system,[13] etc. Thus, ssDNA is ubiquitous
in biology and bio(nano)technology, yet much less is known about its
properties in comparison to double-stranded DNA (dsDNA).Computer
simulations can provide detailed insights into the structure,
dynamics, and energetics of a biological or nanotechnological system.[14−16] In this regard, all-atom molecular dynamics (MD) simulations that
explicitly represent every atom of the system can offer the most detailed
account of the system’s inner workings.[17,18] However, all-atom MD simulations are currently limited to time scales
that are short in comparison to the relaxation time scales of even
quite small ssDNA molecules.[19] By describing
DNA using a less detailed, “coarse-grained” model, the
time scale accessible to simulation can be significantly expanded.[20−22]All coarse-grained (CG) models of DNA feature coarse-grained
interaction
sites, which typically represent groups of atoms on the nucleic acid.
Having many interaction sites,[23−26] aspherical sites[23,27,28] or multibody nonbonded potentials[25] can often enhance the accuracy of the model, usually at
the cost of computational performance and/or portability to existing
MD codes. Potentials describing interactions between sites are usually
obtained by a “top-down” approach,[26−32] in which functional forms are assumed with parameters typically
assigned by hand to match experimental observables such as the dsDNA
structure and/or thermodynamic melting data. In some cases, the single-stranded
DNA of such models has exhibited clear signs of overoptimization to
the dsDNA conformation.[30] The alternative
“bottom-up” approach to parametrization usually involves
systematic optimization of interaction parameters to replicate features
observed in atomistic[23,24,33−35] or quantum mechanical[23,36] simulations.
Some models fall between the usual classifications by employing “bottom-up”
style optimization to match structural distributions from experimentally
obtained atomic structures.[25,37] Each approach to parametrization
has its drawbacks. Top-down models rely on physical intuition, trial-and-error,
and (typically) low spatial and temporal resolution experimental measurements
to describe interactions. Bottom-up models rely on expressiveness
of the underlying atomic force field, which have historically been
corrected many times for various deficiencies.Although many
models of DNA are capable of representing the single-stranded
form,[23,24,26−28,32−34,36,38] the existing coarse-grained
models of DNA have been foremost optimized to reproduce the properties
of the double-stranded form.[23,24,26−28,32−36,38] Validation of the single-stranded
properties produced by DNA models against experimental observables
has been sparse. Because RNA structures typically contain a significant
amount of single-stranded material, we may expect that CG models of
RNA[25,37] may better represent single-stranded nucleic
acid conformations. However, the properties of ssRNA and ssDNA differ
significantly in solution.[39]We have
developed a simple, CG model of unstructured ssDNA. Although
we have used a homopolymer of thymine nucleotides as a target for
our parametrization, our model can describe any single-stranded DNA
lacking a secondary structure. In our model, each nucleotide is represented
using two interaction sites, B (base) and P (backbone/phosphate) beads,
shown schematically in Figure 1. The interactions
between beads are described through interaction potentials tabulated
to accurately reproduce the conformations of ssDNA observed in all-atom
simulations. The part of the potentials that describes chemical bonds
between DNA atoms—bonded potentials—explicitly specifies
pairwise interaction of each B bead with two neighboring P beads and
also includes four three-bead angle terms and three four-bead dihedral
angle terms. The three nonbonded potentials describe interactions
between one P and one B bead, two P beads, and two B beads using 1–3
exclusions: the nonbonded potential is zero between beads separated
by one or two bonds. The solvent surrounding the DNA is modeled implicitly
through a Langevin thermostat and ion concentration-dependent nonbonded
potentials. Supporting Information Figures
S1, S2, S4, and S5 illustrate the CG potentials of our model.
Figure 1
Scheme used
to map atoms onto coarse-grained beads. The left panel
shows a portion of an all-atom model of ssDNA. The backbone/phosphate
and sugar/base atoms of two nucleotides are enclosed by green or cyan
semitransparent surfaces, respectively. The enclosed groups of atoms
are mapped onto the P (backbone/phosphate) and B (sugar/base) beads,
which are shown in the right panel as green and cyan spheres, respectively.
Scheme used
to map atoms onto coarse-grained beads. The left panel
shows a portion of an all-atom model of ssDNA. The backbone/phosphate
and sugar/base atoms of two nucleotides are enclosed by green or cyan
semitransparent surfaces, respectively. The enclosed groups of atoms
are mapped onto the P (backbone/phosphate) and B (sugar/base) beads,
which are shown in the right panel as green and cyan spheres, respectively.To obtain CG potentials that are
consistent with the all-atom model,
we performed a set of reference all-atom MD simulations of a dT60 strand submerged in 100 and 1000 mM NaCl electrolytes. All
our atomistic simulations employed the CHARMM36 force field,[40] which, according to a recent report, represents
conformational dynamics of ssDNA more accurately than the AMBER force
field.[18] We used custom corrections to
vdW interactions of DNA phosphateoxygens and sodium ions[41] to accurately describe the ionic atmosphere
of DNA.[42] A full description of the reference
simulations is provided in the Supporting Information.The resulting all-atom trajectories (9.6 μs of aggregate
simulation time) were converted into our CG representation (P and
B beads). For a given nucleotide, the P bead represented the O5T′,
O5′, P, O1P, O2P, and C5′ atoms of that nucleotide and
the C3′ and O3′ atoms of the adjacent nucleotide such
that the bead was roughly centered on the phosphorus group of the
DNA backbone. The remaining atoms of the nucleotide were mapped onto
the B bead. For both types of beads, the conversion procedure was
done by computing the center of mass of the respective groups of atoms;
hydrogen atoms were neglected during the conversion procedure.An initial guess for each CG potential was obtained via Boltzmann
inversion of the corresponding distribution extracted from the CG-mapped
all-atom trajectory. The effect of ion concentration was taken into
account by introducing two sets of CG potentials (for 100 and 1000
mM electrolytes) parametrized using the corresponding all-atom trajectories.
Using a CG system identical in composition and size to the all-atom
one, the bonded—but not nonbonded—potentials were refined
by performing 30 iterations of the iterative Boltzmann inversion (IBI)
procedure. In IBI, the CG potentials are iteratively adjusted until
the distributions obtained from CG simulations converge to the target
all-atom distributions, see Supporting Information for details.Upon completion of the IBI refinement of bonded
interactions, the
distributions of CG beads corresponding to the bonded interactions
(i.e., the distribution of bond lengths, angles, and dihedrals) were
in excellent agreement with equivalent distributions obtained by CG-mapping
the reference all-atom simulations. Next, nonbonded interaction potentials
were refined through the IBI procedure applied to the 1–3 excluded
pair distribution functions of the P and B beads; the bonded potentials
were kept fixed. The IBI refinement of the nonbonded interactions
yielded a CG model that produced pair distribution functions of the
beads that were in excellent agreement with equivalent distributions
obtained by CG-mapping the reference all-atom simulations; the agreement
between bonded distributions was maintained. All CG simulations were
performed using a custom version of the MD software NAMD.[43] Details of the IBI refinement, final potentials,
and distributions (along with the target distributions extracted from
all-atom simulations) are provided in the Supporting
Information.Having completed the “bottom-up”
stage of parametrization,
the radius of gyration—a representation of the size of a molecule—was
determined for a dT60 molecule using the 100 and 1000 mM
parametrizations of the CG model. The radius of gyration was found
to be 89% and 75% of the experimentally measured values (44.6 and
38.2 Å).[44] This disagreement likely
originates from the imperfections of the all-atom model but may also
be caused by the finite size of the all-atom reference system. Thus,
we further refined the non-bonded potentials of our CG model until
agreement with experimentally measured radii of gyrations was reached.
For this “top-down” refinement, the nonbonded potential
describing the interaction of the P beads was systematically altered
by adding or subtracting a Yukawa potential of the appropriate Debye-length
value, see Supporting Information for details.
Upon completion of this procedure, the radius of gyration obtained
in CG simulations agreed well with that measured in experiments across
a broad range of polymer lengths and at both ion concentrations, see
Figure 2.
Figure 2
Radius of gyration, R, of a thymine homopolymer versus the
homopolymer length. (A) Illustration
of the experimental method, small-angle X-ray scattering, used to
measure R. (B) Radius
of gyration as a function of the polymer length. Good agreement was
obtained between experimental measurements (filled symbols) and coarse-grained
simulation (open symbols) under NaCl concentrations of 100–125
mM (blue squares) and 1000–1025 mM (green triangles). Experimental
data were extracted from Sim et al.[44]
Radius of gyration, R, of a thymine homopolymer versus the
homopolymer length. (A) Illustration
of the experimental method, small-angle X-ray scattering, used to
measure R. (B) Radius
of gyration as a function of the polymer length. Good agreement was
obtained between experimental measurements (filled symbols) and coarse-grained
simulation (open symbols) under NaCl concentrations of 100–125
mM (blue squares) and 1000–1025 mM (green triangles). Experimental
data were extracted from Sim et al.[44]Without any further refinement
of the model, the simulated force–extension
dependence of a dT200 molecule was in excellent agreement
with the experimentally measured dependence of one strand of λ-phage
DNA (48 500 nts) under high applied force and similar ionic
conditions,[45] see Figure 3. Similar simulations were performed using two“top-down”
CG models 3SPN.2[27] and oxDNA,[31] see Supporting Information for details of these simulations. Our model, which was optimized
specifically for ssDNA, performs extremely well for dT200 if compared to the 3SPN.2 model. Comparison with the oxDNA model
is not entirely possible because that model was parametrized for 500
mM monovalent electrolyte. In the high force regime (above 20 pN),
where electrolyte conditions are not expected to influence the extension
of ssDNA,[46] the oxDNA model fits the experimental
data well.
Figure 3
Simulated and measured force–extension dependence of ssDNA.
(A) Illustration of the experimental method used to apply the force
and measure the extension. Double-stranded λ-phage DNA (48.5
kbp) was caught between two beads in a dual optical trap. Melting
and washing off the complementary DNA strand allowed the force–extension
curve of the remaining strand to be determined. (B) Force–extension
curve of ssDNA obtained through CG simulations and experiment. Data
obtained using our CG model (open symbols) are in good agreement with
experimental results[45] (filled symbols)
at high force and both NaCl concentrations. Quantitative comparison
at low force is not possible because of the secondary structure of
λ-phage DNA that shortens DNA extension in the experiment. For
comparison, we present force–extension dependence of dT200 in 100 mM electrolyte obtained using the 3SPN.2 model[27] and of 200 base-average nucleotides in 500 mM
electrolyte obtained using the oxDNA model.[31]
Simulated and measured force–extension dependence of ssDNA.
(A) Illustration of the experimental method used to apply the force
and measure the extension. Double-stranded λ-phage DNA (48.5
kbp) was caught between two beads in a dual optical trap. Melting
and washing off the complementary DNA strand allowed the force–extension
curve of the remaining strand to be determined. (B) Force–extension
curve of ssDNA obtained through CG simulations and experiment. Data
obtained using our CG model (open symbols) are in good agreement with
experimental results[45] (filled symbols)
at high force and both NaCl concentrations. Quantitative comparison
at low force is not possible because of the secondary structure of
λ-phage DNA that shortens DNA extension in the experiment. For
comparison, we present force–extension dependence of dT200 in 100 mM electrolyte obtained using the 3SPN.2 model[27] and of 200 base-average nucleotides in 500 mM
electrolyte obtained using the oxDNA model.[31]At forces below ∼10 pN,
secondary structures form in λ-phage
DNA, reducing its extension if compared to poly(dT). In one experimental
study, glyoxal was used to chemically denature λ-phage DNA,
allowing the authors of that study to probe the low-force extension
of ssDNA in the absence of secondary structure formation.[46] Unfortunately, the denaturation process may
have introduced chemical cross-links so that the absolute length of
the ssDNA molecule was unknown, precluding comparison of absolute
extension per nucleotide. Nevertheless, after dividing the extension
values by the extension at 20 pN for each measurement, the simulated
extension of dT200 was in good agreement with the experimentally
measured extension across 2 orders of magnitude of the applied force,
see Figure 4. The 3SPN.2 and oxDNA models also
agree well with the rescaled force–extension curves.
Figure 4
Simulated and
measured force–extension dependence of ssDNA.
(A) Illustration of the experimental method used to apply the force
and measure the extension. Chemically denatured λ-phage DNA
was stretched between a glass slide and a magnetic bead. The denaturant,
which prevents formation of secondary structure, also introduced cross-links
between parts of the DNA, which made determination of the absolute
extension not possible. (B) Force–extension curve of ssDNA
obtained through CG simulations and experiment. Good agreement was
observed between experimental measurements[46] (filled symbols) and coarse-grained simulations performed using
our model (open symbols), the 3SPN.2 model[27] in 100 mM electrolyte and the oxDNA model[31] in 500 mM electrolyte.
Simulated and
measured force–extension dependence of ssDNA.
(A) Illustration of the experimental method used to apply the force
and measure the extension. Chemically denatured λ-phage DNA
was stretched between a glass slide and a magnetic bead. The denaturant,
which prevents formation of secondary structure, also introduced cross-links
between parts of the DNA, which made determination of the absolute
extension not possible. (B) Force–extension curve of ssDNA
obtained through CG simulations and experiment. Good agreement was
observed between experimental measurements[46] (filled symbols) and coarse-grained simulations performed using
our model (open symbols), the 3SPN.2 model[27] in 100 mM electrolyte and the oxDNA model[31] in 500 mM electrolyte.The use of long, mixed-sequence DNA molecules in previous
experimental
studies of ssDNA elasticity[45,46] complicates direct
comparison with the simulation data. To validate our CG model for
very short, chemically unmodified DNA fragments we turned to advanced
single molecule techniques. Specifically, we used fluorescence resonance
energy transfer (FRET) detection to measure the extension of a dT14 molecule under tension applied by an optical trap,[47] see Figure 5A. In this
assay, the DNA construct was immobilized on a poly(ethylene glycol)
coated glass slide at one end using the biotin–neutravidin
interaction. The other end of the construct was connected to a micrometer
polysterene bead via a λ-phage DNA linker. The preparation of
the DNA construct is described in detail in the Supporting Information. The bead was optically trapped, putting
the DNA construct under tension. A pair of dyes (Cy3 and Cy5) was
attached at the two ends of the dT14 fragment to provide
a FRET signal that effectively allowed the end-to-end distance to
be monitored as a function of the applied force.
Figure 5
Force–extension
dependence of dT14. (A) Illustration
of the experimental method used to simultaneously stretch DNA and
measure the FRET signal. A green laser excites the Cy3 donor dye;
some energy is nonradiatively transferred to the Cy5 acceptor dye.
The amount of energy transferred, the FRET efficiency, is related
to the distance between the dyes. An optical trap applies tension.
(B) FRET efficiency vs force observed in the experiment (filled symbols)
and calculated from the CG simulations of dT14 under tension
(open symbols). The following expression was used to compute FRET
based on the distance r between the terminal P beads:
⟨1/(1 + ((r + δ)/R0)6)⟩, where R0 = 60 Å is the Förster distance and δ = 22 Å
is a constant factor associated with the physical dimensions and placement
of the dyes. The angle brackets represent an average over the simulation
trajectory. The agreement between simulation and experiment was good
at 100 (blue squares) and 1000 mM (green triangles).
Force–extension
dependence of dT14. (A) Illustration
of the experimental method used to simultaneously stretch DNA and
measure the FRET signal. A green laser excites the Cy3donor dye;
some energy is nonradiatively transferred to the Cy5 acceptor dye.
The amount of energy transferred, the FRET efficiency, is related
to the distance between the dyes. An optical trap applies tension.
(B) FRET efficiency vs force observed in the experiment (filled symbols)
and calculated from the CG simulations of dT14 under tension
(open symbols). The following expression was used to compute FRET
based on the distance r between the terminal P beads:
⟨1/(1 + ((r + δ)/R0)6)⟩, where R0 = 60 Å is the Förster distance and δ = 22 Å
is a constant factor associated with the physical dimensions and placement
of the dyes. The angle brackets represent an average over the simulation
trajectory. The agreement between simulation and experiment was good
at 100 (blue squares) and 1000 mM (green triangles).Figure 5B shows the FRET
vs force curves
obtained at two different salt conditions as described in the Supporting Information. As the tension increases,
the FRET signal decreases, indicating extension of the dT14 fragment of the construct. In the low-force regime, the FRET values
depend on the ionic conditions but converge in the high-force regime
(>10 pN). The low-force FRET is larger under high NaCl concentration,
implying greater compaction of ssDNA caused by stronger electrostatic
screening. This observation is consistent with our earlier work[48] and with the observed shrinking of ssDNA under
high salt conditions.[44] At high force,
the FRET curves converge as the extension of the polymer approaches
its contour length. The FRET efficiency computed from our CG simulations
of dT14 under tension is in good agreement with the experimental
FRET traces at 100 and 1000 mM NaCl, see Figure 5B.From a practical perspective, our CG model permits microsecond-per-day
simulations of hundreds of nucleotides on a single processor core.
One should note that dynamics are usually enhanced in coarse-grained
simulations compared to all-atom, for example, due to smoothing of
the free energy landscape.[49] For dT60 using our model, each CG nanosecond corresponds to ∼80
real-world nanoseconds. However, we found that the enhancement for
a DNA molecule depends on its length, see the Supporting Information for details. The only nonstandard features
of an MD code required to perform CG simulations using our model are
tabulated bonded and nonbonded potentials. It must also be possible
to apply bonds by bead index rather than bead type as the P–B and B–P potentials
differ.The most significant limitations of our model are 3-fold.
First,
the model is currently limited to simulations of unstructured ssDNA,
such as poly(dT). Second, the base beads are spheres that lack orientation
that may be important for accurate modeling of base-pairing and base-stacking
in duplex DNA. However, anisotropic base–base interactions
require additional computation and reduce portability of the model.
Finally, our model lacks a description of hydrodynamic interactions,
which makes interpretation of kinetic information difficult, see the Supporting Information for details.Despite
its simplicity, our CG model provides a structurally accurate
portrayal of a poly(dT) molecule across a wide range of polymer lengths,
applied tensions and ion concentrations. This makes our model immediately
suitable for CG studies of ssDNA systems where sequence-specific effects,
including base-pairing and strong adenosine stacking,[50] can be neglected. Thus, our model should be best suited
to the simulation of ssDNA comprising thymine and cytosine nucleotides.
We have already used a preliminary version of our model to study the
effect of local heating on the process of ssDNA transport through
a solid-state nanopore.[51]With only
two sites per nucleotide, extensions to the model can
be easily made. For example, it was trivial to create a toy model
of double-stranded DNA by adding a set of harmonic potentials to describe
base-paring of two CG DNA strands, Figure 6. The physical properties of the resulting dsDNA model could be easily
adjusted by changing the strength of the interstrand harmonic potentials.
Other extensions are also possible. For example, by representing proteins
using a grid-based potential, one may study the interaction between
a CG DNA molecule and a DNA-binding protein. Our simple, computationally
efficient, yet accurate model of ssDNA is a step toward a complete
physical model of the DNA processing machinery of a living cell.
Figure 6
Custom
flexibility CG model of dsDNA. (A) Definition of the persistence
length Lp. The persistence length is a
measure of a polymer’s flexibility that can be directly derived
from a simulation trajectory. (B) Representative conformations of
a 100-bp dsDNA fragment having different parametrization of interstrand
interactions (a = 0.4, blue; 1.0, teal; and 2.0,
green). In our dsDNA model, parameter a defines the
strength of interstrand interactions. Individual DNA strands are described
using our CG model of ssDNA. (C) Angular correlation of two DNA fragments
versus distance between the fragments. An exponential fit reveals
the persistence length. Colors are as in B. The flexibility of dsDNA
can be controlled by changing the scaling factor a. The harmonic restraints used to enforce base-pairing preclude the
possibility of melting. Additional details are provided in the Supporting Information.
Custom
flexibility CG model of dsDNA. (A) Definition of the persistence
length Lp. The persistence length is a
measure of a polymer’s flexibility that can be directly derived
from a simulation trajectory. (B) Representative conformations of
a 100-bp dsDNA fragment having different parametrization of interstrand
interactions (a = 0.4, blue; 1.0, teal; and 2.0,
green). In our dsDNA model, parameter a defines the
strength of interstrand interactions. Individual DNA strands are described
using our CG model of ssDNA. (C) Angular correlation of two DNA fragments
versus distance between the fragments. An exponential fit reveals
the persistence length. Colors are as in B. The flexibility of dsDNA
can be controlled by changing the scaling factor a. The harmonic restraints used to enforce base-pairing preclude the
possibility of melting. Additional details are provided in the Supporting Information.
Authors: Jonathan P K Doye; Thomas E Ouldridge; Ard A Louis; Flavio Romano; Petr Šulc; Christian Matek; Benedict E K Snodin; Lorenzo Rovigatti; John S Schreck; Ryan M Harrison; William P J Smith Journal: Phys Chem Chem Phys Date: 2013-10-11 Impact factor: 3.676
Authors: Farhan Chowdhury; Isaac T S Li; Thuy T M Ngo; Benjamin J Leslie; Byoung Choul Kim; Joshua E Sokoloski; Elizabeth Weiland; Xuefeng Wang; Yann R Chemla; Timothy M Lohman; Taekjip Ha Journal: Nano Lett Date: 2016-05-12 Impact factor: 11.189
Authors: Oliver Henrich; Yair Augusto Gutiérrez Fosado; Tine Curk; Thomas E Ouldridge Journal: Eur Phys J E Soft Matter Date: 2018-05-10 Impact factor: 1.890
Authors: Tamara Frembgen-Kesner; Casey T Andrews; Shuxiang Li; Nguyet Anh Ngo; Scott A Shubert; Aakash Jain; Oluwatoni J Olayiwola; Mitch R Weishaar; Adrian H Elcock Journal: J Chem Theory Comput Date: 2015-04-30 Impact factor: 6.006