Literature DB >> 25400520

DNA Duplex Formation with a Coarse-Grained Model.

Maciej Maciejczyk¹, Aleksandar Spasic², Adam Liwo³, Harold A Scheraga⁴.

Abstract

A middle-resolution coarse-grained model of DNA is proposed. The DNA chain is built of spherical and planar rigid bodies connected by elastic virtual bonds. The bonded part of the potential energy function is fit to potentials of mean force of model systems. The rigid bodies are sets of neutral, charged, and dipolar beads. Electrostatic and van der Waals interactions are parametrized by our recently developed procedure [Maciejczyk, M.; Spasic, A.; Liwo, A.; Scheraga, H.A. J. Comp. Chem.2010, 31, 1644]. Interactions with the solvent and an ionic cloud are approximated by a multipole-multipole Debye-Hückel model. A very efficient R-RATTLE algorithm, for integrating the movement of rigid bodies, is implemented. It is the first coarse-grained model, in which both bonded and nonbonded interactions were parametrized ab initio and which folds stable double helices from separated complementary strands, with the final conformation close to the geometry of experimentally determined structures.

Entities: Chemical Disease Species

Year: 2014 PMID： 25400520 PMCID： PMC4230386 DOI： 10.1021/ct4006689

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

DNA is a blueprint of life. Since Watson and Crick proposed its three-dimensional structure,[1] much effort has been spent to understand its structure and dynamics. Formation of the Watson–Crick double-helix from separated strands is a biologically very important process observed during replication, translation, or DNA repair. It has been the subject of many experimental biochemical and biophysical studies. Also, recently, the rapidly growing field of computer simulations of biological systems has developed models and tools that can provide some insight into this important process. All-atom molecular dynamics (MD) methods offer the deepest insight into the time evolution of DNA at the atomic level. The CHARMM[2] and AMBER[3] simulation packages have been used to study the duplex dynamics of DNA, mainly in the vicinity of equilibrium. The highlights of near-equilibrium all-atom DNA molecular dynamics simulations include the systematic study of nearest-neighbor effects on various short DNA sequences performed by the Ascona B-DNA Consortium,[4] and the microsecond simulations of the Drew-Dickerson dodecamer[5] or the κB DNA element.[6] Although all-atom MD simulations have proven to be very useful for studying near-equilibrium dynamics, their very high computational cost, related to the large number of degrees of freedom and high-frequency oscillations, forcing a short time-step of integration, currently render them unsuitable for simulating long time-scale nonequilibrium processes, such as formation of duplex DNA from separate strands. Only recently, those enhanced sampling methods, such as replica exchange,[7] facilitated simulations of formation of very short (4 base pair (bp)) duplexes of DNA.[8] The combination of metadynamics[9] and replica exchange, called BE-META method,[10] was applied to slightly larger system, DNA hexamer,[11] but the more-important biological process of formation of long helices remains prohibitively expensive for all-atom MD simulations and has recently become a target for coarse-graining methodology. Coarse-grained models can be divided into three categories: statistical models, continuum models, and reduced interaction-center models. In the first category, the early work of Zimm and Rice[12] and Poland and Scheraga[13] neglected all structural and dynamical information and used only the free-energy gain per base-pair formation for computation of the partition function, which, in turn, was transformed into a thermodynamic picture of the process of helix formation. The continuum models approximate dsDNA as a continuum elastic rod and by design cannot be applied to the DNA formation process.[14−16] The last category covers all models in which groups of atoms are replaced by a reduced number of interaction centers connected by elastic and/or rigid virtual bonds.[17−45] A properly designed coarse-graining procedure should reduce the number of interaction centers, leading to a significant speedup of computations, but simultaneously, the simplified potential energy function should include the most important interactions that determine the behavior of the real system. There are two common approaches to the problem of parametrization of a coarse-grained model.[37] In the “top-down” parametrization method, selected parameters are adjusted to reproduce large-scale properties of the investigated systems. In the “bottom-up” parametrization method, interactions are directly determined from either all-atom molecular dynamics simulations or quantum chemistry computations applied to some model systems. In the past decade, many coarse-grained models of DNA have been developed, but only several of them managed to form double-stranded DNA from separated complementary chains. A three-bead-per-nucleotide model of de Pablo and co-workers[27] was successfully used to study DNA melting[30] and the renaturation processes.[31] The model was designed around the potential energy function, which effectively depends on the single parameter ϵ, which was optimized to reproduce experimental melting curves. The base–base interactions were approximated by a Go̅-like potential.[46] The model of Ouldridge et al.[37] successfully addresses the phenomena of single-strand stacking, duplex hybridization, and DNA hairpin formation processes. The model was also applied to study DNA Holliday junctions self-assembly[32] and DNA nanotweezers.[36] Although both models successfully fold double-stranded DNA from separate complementary chains, they rely on base–base nonbonded potentials, which either distinguish between native and non-native pairs[27] or nearest-neighbors and all remaining pairs.[32] Such an approach should speed up the in silico folding process, because many “unwanted” free energy local minima are eliminated, but in the real physical system, the nonbonded potential energy of two interacting bases does not depend on their sequential position in the polynucleotide chain but rather depends on their relative position and orientation in space. The third model named NARES-2P, which does not incorporate any Go̅-like potentials and successfully folds double-stranded DNA from separate strands was recently published by our group.[43] NARES-2P model was developed in parallel to the rigid-body model presented in this paper. NARES-2P is a lower-resolution model with two interaction centers per nucleotide (located on phosphate groups and bases), and it seems to be a minimal physics-based model capable of folding of a double-helix from separate strands. The rigid-body model presented here has more interaction centers and more internal degrees of freedom than NARES-2P, and consequently, it is computationally more demanding. Although the conformational space of rigid-body model is larger than in case of NARES-2P, it is still capable of locating double-helical structures in simulated annealing process started from separate strands. The advantage of the dipolar-bead model over recently published NARES-2P is its higher resolution. Another model, which is capable of folding dsDNA from separate strands, was published recently by Cragnolini et al.[45] The degree of coarse-graining (6–7 interaction sites per nucleotide) is similar to that of our dipolar-bead model. However, as opposed to our dipolar-bead model, which places more emphasis on the representation of nucleic-acid bases, the model of Cragnolini et al. elaborates on the nucleotide backbone, which is represented by 5 sites per nucleotide than on the nucleic-acid bases, which are represented by 1 or 2 interaction sites, depending on the kind of base; it also utilizes a classical bonded potential and a many-body nonbonded potential responsible for correct Watson–Crick hydrogen-bonding and was parametrized in a “top-down” fashion. This model does not incorporate any Go̅-like potentials and does not treat the interactions between nearest-neighbor nucleotides differently than those between the other pairs of nucleotides. In this paper, we present a physics-based, middle-resolution model of DNA, which is capable of folding the double-helical structure from separate complementary strands. The model was parametrized in the “bottom-up“ fashion, although some adjustment of parameters was necessary to obtain correct balance of key interactions. The non-bonded pairwise potentials do not depend on the sequential position of interacting bases in the polynucleotide chain because the native and non-native, nearest-neighbors, and all other pairs rely on the same potential energy function, which includes a Lennard-Jones and an electrostatic term. The full, physics-based (PB model) version of model keeps stable canonical B DNA double helices in a long room-temperature simulations and folds short DNA molecules. The reduced version of the model, in which base–base intrastrand interactions are limited to nearest neighbors (NN model), successfully folds canonical B DNA from separate complementary strands for all tested systems.

Methods

The Model

The coarse-grained model of DNA is built of three types of particles: neutral bead, uncharged Lennard-Jones sphere, charged bead, Lennard-Jones sphere with an electric charge located in its center, dipolar bead, Lennard-Jones sphere with an electric dipole located in its center. The DNA chain is built of six types of chemical units: phosphate group (P), deoxyribose (S), adenine (A), thymine (T), guanine (G), and cytosine (C). In the coarse-grained approximation, atoms that constitute each unit are replaced by the three types of beads defined above. The deoxyribose ring is replaced by one neutral bead, the phosphate group is replaced by one charged bead and each base is replaced by a set of dipolar beads. The distances between the dipolar beads of each base are fixed.[47] A schematic representation of the coarse-grained units overlapped on their atomic representation is shown in Figure 1. The parametrization of the bead model is described in the Parametrization section.

Figure 1

Coarse-grained representation of fragments of the DNA chain superimposed on the atomic representation. Four basic building blocks A, T, G, C attached to small fragments of backbones are shown. Charged beads, which replace phosphate groups, are marked red; neutral beads, which replace deoxyriboses, are marked dark blue. Dipolar beads have different colors for different bases: A, pink; G, green; T, cyan; and C, yellow. Dipolar beads symbols are defined for all building blocks. The internal degrees of freedom used for the definition of the bonded part of the potential energy function (eq 2) are defined in Tables 1–4. For clarity, symbols of beads used for definition of bonded potentials of the backbone are shown for thymine only. The longer fragment of coarse-grained strand of DNA with ATGC sequence overlapped on the atomic model is shown in the right panel. Sequential numbers of consecutive building blocks are assigned.

Table 1

Definitions of Virtual Bonds and Parameters of the Bonded Part of the Potential Energy Functiona

virtual bond	k_d (kcal/(mol Å²))	d₀ (Å)
P5–S	12.45	3.86
P3–S	25.85	3.58
S–B(Pur)	26.55	4.82
S–B(Pyr)	54.23	4.20

For definition of symbols see Figure 1.

Table 4

Definitions of the Virtual-Bond Improper Dihedral Angles and the Parameters of the Bonded Part of the Potential Energy Functiona

virtual-bond improper dihedral angle	k_γ (kcal/(mol·rad²))	γ₀ (rad)
P5–P3–S–B(Pur)	7.74	2.10
P5–P3–S–B(Pyr)	7.74	1.90

cf. Figure 1.

Each chemical unit in the coarse-grained approximation constitutes a rigid body. The origin of the local coordinate frame of each rigid body is located at its center of mass, and its axes are aligned with the principal axes of the moment of inertia tensor. The position and orientation of each rigid body with respect to the global coordinate frame is described by a vector–tensor pair (R,Q), where R is the position of the center of mass and Q is the rotation matrix. In the model, two types of rigid bodies can be distinguished—spherical (P and S) and planar (A,T,G,C). The spherical rigid bodies interact with other particles only by central forces (see “Potential Energy Function”), and their orientation is irrelevant; therefore, a constant rotation matrix Qsphere = const is assigned to them. The orientation of each planar rigid body is described by a 3 × 2 rotation matrix Qplanar and varies as the rigid body changes its orientation during a simulation run. The geometry of the chain is defined in vector–tensor space (R,Q) where i = 1,...,N and N is the number of rigid bodies. The relations between global coordinates (R, Q) and distances between various beads, used for definitions of nonbonded interactions, are shown in Figure 2.

Figure 2

Relations between global coordinates (R,Q) and distances between beads for interactions of (a) two spherical rigid-bodies, where there is only one distance r = |R|; (b) spherical and planar rigid body, where distances between interacting beads are functions of global coordinates given by eq 9; and (c) two planar rigid-bodies, where distances between interacting beads are given by eq 13. Rigid bodies are connected by elastic virtual bonds and form a nucleic acid chain as shown in Figure 1. Three rigid bodies connected by two virtual bonds form a virtual-bond angle and four rigid bodies connected by three virtual bonds form a virtual-bond dihedral angle. All virtual bonds, virtual-bond angles, and virtual-bond dihedral angles are listed in Tables 1, 2, and 3, respectively.

Table 2

Definitions of the Virtual-Bond Angles and the Parameters of the Bonded Part of the Potential Energy Functiona

virtual-bond angle	k_α (kcal/(mol·rad²))	α₀ (rad)
P5–S–P3	10.69	2.11
S5–P5–S	7.59	1.66
P5–S–B(Pur)	7.02	1.83
P5–S–B(Pyr)	8.68	1.62
P3–S–B(Pur)	10.47	1.90
P3–S–B(Pyr)	13.18	2.00
S–B–X(Ade)	32.87	2.71
S–B–Y(Ade)	32.87	1.97
S–B–X(Thy)	42.82	1.10
S–B–Y(Thy)	42.82	2.64
S–B–X(Gua)	31.61	2.22
S–B–Y(Gua)	31.61	0.67
S–B–X(Cyt)	43.95	2.31
S–B–Y(Cyt)	43.95	2.38

S–B–X and S–B–Y are virtual-bond angles between the S–B virtual bond and the X and Y axes of the moment of inertia tensor of a rigid body. (cf. Figure 1)

Table 3

Definitions of the Virtual-Bond Dihedral-Angle Parameters of the Bonded Part of the Potential Energy Functiona,b

virtual-bond dihedral angle	C₀	C₁	C₂	C₃	C₄	D₁	D₂	D₃	D₄
P5–S–P3–S3	0.553	–0.340	0.453	0.001	–0.044	0.102	–0.240	0.090	–0.145
S5–P5–S–P3	0.985	0.422	–0.485	–0.002	–0.053	–0.230	–0.343	–0.084	–0.057
S5–P5–S–B (Pur)	1.168	–0.772	–0.310	–0.008	0.023	0.530	–0.380	0.150	–0.245
S5–P5–S–B (Pyr)	1.246	–0.403	–0.212	0.095	0.082	–0.092	–0.661	–0.038	–0.067
S3–P3–S–B (Pur)	0.478	0.098	0.126	–0.043	–0.028	0.251	0.202	–0.045	–0.033
S3–P3–S-B (Pyr)	0.706	–0.069	0.316	–0.151	–0.125	–0.201	–0.085	0.043	–0.049
P5–S–B–X (Ade)	3.500	3.202				1.412
P5–S–B–X (Thy)	3.500	3.499				0.731
P5–S–B–X (Gua)	3.500	–3.497				0.650
P5–S–B–X (Cyt)	3.500	–3.497				0.153

P5–S–B–X virtual-bond dihedral angle describes rotation of the base around the S–B pseudoglycosydic bond.

cf. Figure 1. All parameters in kcal/mol.

For definition of symbols see Figure 1. S–B–X and S–B–Y are virtual-bond angles between the S–B virtual bond and the X and Y axes of the moment of inertia tensor of a rigid body. (cf. Figure 1) P5–S–B–X virtual-bond dihedral angle describes rotation of the base around the S–B pseudoglycosydic bond. cf. Figure 1. All parameters in kcal/mol.

Equations of Motion

The algorithm of Leimkuhler and Reich,[48] which is based on the RATTLE[49] constraint method applied to rotation matrices, was used for the description of rotation of rigid bodies. A slightly different version of this algorithm, based on the SHAKE[50] (rather than the RATTLE) constraint method, showed superior stability[51] compared to the other commonly used quaternion-based integration scheme. However, this algorithm (based on SHAKE) was not implemented in our work. The time-evolution of the system is defined in the vector-tensor phase space (R, p, Q, P), where p and P are the linear momentum and generalized angular momentum of the i-th particle, respectively. For a planar rigid body, the rotation matrix is a 3 × 2 tensor, subject to the orthogonality condition QT·Q = I2, where I2 is a 2 × 2 unit diagonal matrix. The leapfrog (Verlet) algorithm was used for propagation of translations, and the R-RATTLE (rotational RATTLE) discretization scheme was applied to the rigid-body rotations.[48]

Potential Energy Function

As in our UNRES model of polypeptide chains developed earlier,[52] the effective energy function of our nucleic-acid model stems from the potential of mean force (PMF) of polynucleotide systems immersed in water. The PMF is then to be expanded into a cluster-cumulant series, which, in general, gives rise to the neo-classical local and two-body and nonclassical multibody terms.[52] However, because we keep more explicit degrees of freedom than in the UNRES model, the multibody terms do not seem to be necessary. In particular, in contrast to the UNRES and NARES-2P models, we do not average over any degrees of freedom of the nucleic-acid bases, the interactions between which are highly directional. In the UNRES model, we did average over the angles of rotation (λ) of the peptide groups about the Cα...Cα axes; hence, the multibody terms in the UNRES force field are important.[52] In our highly reduced NARES-2P model of nucleic acids,[43] averaging is carried out over the rotation of nucleic bases about their long axes. The potential energy function of the system is a sum of the bonded and nonbonded parts:Although both the bonded and nonbonded parts depend on the Cartesian-rigid-body degrees of freedom (R,Q), it is more convenient to define the former as a function of the internal degrees of freedom defined in Tables 1–4 (the internal degrees of freedom are functions of the (R,Q) vector). The bonded part has a commonly used formwhere:Both bond-stretching and angle-bending terms are approximated by harmonic potentials. k and d are harmonic force constants and equilibrium distances for bond stretching interactions; kα and α are harmonic force constants and equilibrium angles for the bond-angle bending potential. The general form of the dihedral-angle torsion potentials is a fourth-order Fourier series, although such a long expansion was used only for the backbone dihedral angles (see details in the Parametrization section). C and D are fourth-order Fourier expansion coefficients. The torsional terms are the only higher (second[52]) order terms that are present in the energy function; however, they still belong to the neo-classical term set. The harmonic Uimproper potential was introduced to maintain the correct position of the center of mass of the base with respect to the sugar ring and surrounding phosphate groups. kγ and γ are harmonic force constant and equilibrium values of the improper dihedral angles. For a detailed geometric definition of the internal degrees of freedom related to the bonded potentials see Figure 1 and Tables 1–4. The force constants and equilibrium positions were determined by fitting the analytical expressions to the potentials of mean force for model systems. The procedure is described in the Parametrization section. cf. Figure 1. The nonbonded energy is a sum of van der Waals and electrostatic energies and is given bywhere the P, B, and S symbols denote the phosphate group, base, and sugar ring, respectively. The sugar–sugar and sugar–phosphate energyinclude only the Lennard-Jones term and depend only on the distance r between two beads. The Lennard-Jones potential has a commonly used formwhere, ALJ and BLJ are Lennard-Jones coefficients. The phosphate–phosphate energy additionally includes the charge–charge repulsion termThe three remaining energy terms of eq 4 involve interactions with the planar rigid-bodies (bases). The sugar–base energy is a sum of excluded volume interactions between a neutral bead and the dipolar beads. This kind of all-repulsive term is also used in the UNRES force field to account for excluded-volume interactions between the peptide groups and side chains[53]where N is the number of dipolar beads of the j-th base and r is the distance between the neutral bead of the i-th sugar and the k-th dipolar bead of the j-th base. It should be noted that r distances are functions of the distance R between the sugar and the base and the orientation Q of the base (see Figure 2).where r′ is the position of the k-th dipolar bead in its rigid-body local coordinate frame. The excluded volume potential has a formwhere the parameters AEX and BEX depend on the type of interacting beads. The phosphate-base interaction additionally includes the charge-dipole terms The final term of eq 4 has the most complicated form and describes the interaction between the dipolar beads of two baseswhere N and N are the number of dipolar beads located on the i-th and j-th bases, respectively. The distances r between dipolar beads are functions of the relative position R and orientations Q, Q of the rigid bodieswhere r′ is the position of the k-th (l-th) dipolar bead in the local coordinate frame of i-th (j-th) base. The electrostatic interactions are approximated by a multipole–multipole Debye–Hückel model[54]where q = e is the negative unit charge located on phosphate group, n = r/r is a unit vector pointing from the first to the second interaction center, and p are electric dipole vectors described in the global coordinate frame:where p′ is an electric dipole vector in the local coordinate frame of the i-th (j-th) base. Four functionsapproximate the screening of the electrostatic interactions by ions and solvent, where κ–1 is the Debye screening length. For charge–charge interactions distance-dependent dielectric constant of the following form was applied:[40,55]where ϵinf = 78.0 is a dielectric constant of bulk water and ϵint is a dielectric constant of the helix “interior”, which was taken to be 8.0. The switching distances r0 and r1, which determine boundary between unscreened and screened electrostatic interactions, were set to the values of 4 Å and 13 Å.[40] The α parameter was adjusted to keep continuity of dielectric constant. Because the interactions between the nearest neighbors, that is, particles separated by less than three virtual bonds, are included in the bonded part of the potential, they were excluded from the nonbonded interactions.

Parametrization

Nonbonded Interactions

The nonbonded interactions and solvation effects are very important for DNA folding and its conformational stability and were approximated by the excluded-volume potential (eq 10) for P–B and S–B, the Lennard-Jones potential for the remaining nonbonded interactions (eq 6), and a multipole-multipole potential energy function with Debye–Hückel screening. The optimization procedure developed previously[47] was applied to the extended system, which now includes the phosphate group, deoxyribose, and the four bases. The parameters of the B–B interactions, including the positions r′ of the dipolar beads in the local coordinate frame, the electric dipole vectors p′ in the local coordinate frame, and the Lennard-Jones parameters ALJ,BLJ, were retained from the 3445 model of ref (47). For convenience, they can also be found in Tables A-II to A-V of Supporting Information. The missing parameters for the extended system are the charges located on neutral and charged beads, their positions in the local coordinate frame, and the Lennard-Jones parameters of P–P, P–S, S–S, S–B and P–B interactions. The charge of a neutral bead was, by definition, set to zero, the charge of a charged bead was set to −|e|, which is equal to the charge of the phosphate group in the DNA backbone. The positions of the charges in both the neutral and charged beads were fixed at the center of mass of the corresponding unit. Consequently, the only parameters, which must be determined by the optimization procedure were the Lennard-Jones ALJ and BLJ for P–P, P–S, S–S, S–B, and P–B interactions. The reference energy grids for missing Lennard-Jones interactions were computed in the same manner as the reference grids in ref (47) and the optimization procedure[47] was applied to the extended system. Although it was initially planned that the Lennard-Jones potential would be applied to all pairwise interactions in the system, extensive testing of dsDNA dynamics showed that its stability was improved when the Lennard-Jones potential of P–B and S–B interactions was replaced by the excluded-volume one. Therefore, the potential was switched to an excluded volume of the form of eq 10, and the ALJ and BLJ parameters were retained from the fitting procedure, that is, AEX = ALJ, BEX = BLJ. The values of the computed Lenard-Jones and the excluded volume parameters, presented in the form of σ = (A/B)1/6 and ϵ = B2/(4A), are collected in Table A-I of the Supporting Information. The Debye screening length κ–1 was set to 8 Å, which corresponds to the physiological ionic concentration.[56] In this work, the dielectric constant parameter of the bulk water ϵ∞ was set to 78.0. The stability of the helix and folding efficiency was tested with several values of ϵint and various distances r0 and r1 of the dielectric constant function (eq 17) (see Conformational Stability in the Results and Discussion section).

Bonded Interactions

The virtual bond, virtual-bond angle, and virtual-bond dihedral angle parameters were derived to reproduce the behavior of model systems in the all-atom representation. The interactions between atoms were modeled by the AMBER ff99bsc0[57−59] force field. The potentials of mean force (PMF) of each internal degree of freedom of the coarse-grained model were calculated by using the umbrella sampling method.[60−62] Analytical functions of the form of eqs 3a–3d were then fitted to the numerical PMF’s obtained from the WHAM procedure. The parameters of all virtual bond, virtual-bond angle, and virtual-bond dihedral angle degrees of freedom were computed by using four model systems, which were obtained from three-nucleotide backbones with methyl end groups. The base of the middle nucleotide was either adenine, cytosine, guanine, or thymine, for a total of four different model systems. One such system with a cytosine base in the middle nucleotide is shown in Figure 3. This is a minimal size molecule for which every internal degree of freedom of our coarse-grained model can be defined. Terminal bases were removed to avoid implicitly including electrostatic and van der Waals interactions of bases since our model has the nonbonded parameters derived separately.[47]

Figure 3

One of four model systems used in deriving the bonded potentials. Model systems for parametrization were created from three-nucleotide chains by removing the bases from the nucleotides at the 5′- and 3′-end and replacing them with methyl groups. The middle nucleotide was left unchanged and can be cytosine (in the figure), thymine, adenine, or guanine. Umbrella sampling simulations, described in the Supporting Information, of different model systems produced results that are similar for each internal degree of freedom. Figure A-I in the Supporting Information shows the individual and average PMFs for several virtual bonds, virtual-bond angles, and virtual-bond dihedral angles. The other degrees of freedom from Tables 1–3 show similar trends. Therefore, it was decided to average the degrees of freedom for purines (model systems with adenine or guanine) and pyrimidines (model systems with cytosine or thymine), except for the degrees of freedom such as the S–P–S virtual-bond angle and the P5–S–P3–S3, S5–P5–S–P3 virtual-bond dihedral angle, which contains two deoxyriboses, and therefore, the base type cannot be uniquely assigned. For these angles, the averaging was done across all four model systems (bases). Some degrees of freedom, which influence orientation of bases with respect to glycosidic bond, were assigned separately for each type of base (see Tables 2 and 3). For each degree of freedom, the distributions of data from each window and for appropriate model systems were combined, the biasing potential was removed, and the PMF along that coordinate was calculated by using the WHAM procedure[60−62] implemented in a program by Dr. A. Grossfield.[63] The analytical expressions of the form of eqs 3a–3d were then fitted to PMF’s. The fitting was done using a trust-region[64] nonlinear least-squares fit algorithm as implemented in MATLAB software. Figure A-II shows the comparison of several PMFs derived from the simulations with the potential calculated using analytical expressions. Although both the force constants and the equilibrium positions for the virtual bonds and virtual-bond angles were determined by fitting to the PMFs of model systems, it was decided to shift them slightly to match the equilibrium values of the “ideal” B-DNA conformation,[65] in the coarse-grained model. The minima of the virtual-bond dihedral angles were left unmodified. Additionally, harmonic virtual-bond improper P5–P3–S–B with arbitrary force constant and an equilibrium values set to those of the “ideal“ B-DNA[65] were incorporated. The rotation of the base around the virtual glycosidic bond (S–B) was restricted by a cosine-shaped potential with the minimum corresponding to the anti conformation of the base and the barrier height was preliminary set to 7.0 kcal/mol. The internal degrees of freedom for which the bonded terms were determined and used in the model are listed in Tables 1–4. We note that a model system and bonded degrees of freedom almost identical to those presented here were used in work by Morriss-Andrews et al.[34] However, the predicted values for the equilibrium values and force constants were somewhat different. There are several reasons for this, first, Morriss-Andrews et al. use statistics from free simulations and CHARM27 force field to calculate the degrees of freedom while this work used umbrella sampling along the degrees of freedom and Amber’s ff99bsc0 force field. Most importantly though, the position of the sugar in our model is at the center of mass of sugar ring while they set it at the C1′ atom. This affects not only the equilibrium values but the force constants as well, which makes the direct comparison between bonded degrees of freedom of two models difficult.

Model Systems

The model was tested on the following tree systems, for which the structure was determined experimentally: Drew–Dickerson dodecamer, 12 base-pair dsDNA with the sequence 5-CGCGAATTCGC-3′ and its compliment, which has exactly the same sequence (PDB code 1BNA), Narayana–Weiss hexadecamer, 16 base-pair dsDNA with the sequence 5′-ACTACAATGTTGCAAT-3′ and its compliment 5′-ATTGCAACATTGTAG-3′ (PDB code 3BSE). 21 base-pair dsDNA with the sequence 5′-ACAGCTTATCATCGATCACGT-3′ and its compliment 5′-ACGTGATCGATGATAAGCTGT-3′ (PDB code 2JYK). The following 60-nuclotide sequence of ref (34) was used for computation of persistence lengths of single and double stranded DNA: HETSS = d(CATCCTCGACAATCGGAACCAGGAAGCGCCCCGCAACTCTGCCGCGATCGGTGTTCGCCT) HETDS = HETSS + complementary strand

Canonical and Microcanonical Ensemble Simulations. Simulated Annealing

For canonical ensemble simulations, Berendsen’s thermal-bath weak-coupling algorithm[66] was implemented. The coupling time was set to 0.1 ps. The initial velocities of the rigid bodies were drawn from the Maxwell distribution. The stability of the integration algorithm was checked by simulations of the model system in the microcanonical ensemble. The energy of the system was minimized and initial velocities were assigned from the Maxwell distribution; then, the integration of the equations of motion was performed without coupling of the system to the thermal bath. The ability of the model to fold double-stranded DNA from separated chains was tested with a two-stage simulated-annealing[67,68] procedure. Two complementary strands of each molecule were separated by a random distance and randomly rotated with respect to each other. The separation distance ensured that two strands did not overlap for any random orientation and was different for chains of various lengths. For 1BNA, the separation distance between centers of mass was picked randomly from 40 to 45 Å interval and for 3BSE and 2JYK separation intervals were 50–55 Å and 60–65 Å, respectively. In the first stage, the initial structure was heated up to 600 K (620 for 2JYK) with velocities randomly assigned from Maxwell distribution for 0.1 ns and then cooled to 300 K (320 for 2JYK) with a temperature step ΔT = 50 K. The final temperature was set approximately around 20 K below melting temperature of each molecule. The simulation at each temperature was 0.1 ns long. At the end of each simulation, the presence of contacts between any two bases from separate strands was checked. A contact was assumed to be present when the centers of mass of any two bases from separate chains came within a distances of at most 6.5 Å. If at least one contact was present, the simulation was extended by 1.5 ns for Drew–Dickerson (2.0 ns and 2.5 ns for Narayana–Weiss and 2JYK, respectively) at 300 K in the second stage of the canonical simulation; otherwise, the cycle was terminated and the final structure was excluded from the analysis. The total simulation time of a two-stage simulated annealing cycle was 2.2–3.2 ns (depending on the chain length), and each one was followed by a short energy minimization with Powell’s algorithm.[69] The cycle was repeated several thousand times. The number of SA cycles performed for each system is shown in Table 5. Dissociation of chains was prevented by application of a flat-bottom restraint on the distance between their centers of mass with a force constant kCM–CM = 1 kcal/(mol Å2) and a minimum distance of d0 = 40 Å for 1BNA (50 Å for 3BSE and 60 Å for 2JYK), at which the attractive harmonic force was switched on. The artificial restraint force was switched off when the distance between the centers of mass dropped below 40 Å for 1BNA (50 Å for 3BSE and 60 Å for 2JYK).

Table 5

Statistics of dsDNA Folding for the Nearest-Neighbor (NN) Modela

	1BNA		3BSE		2JYK
trajectories	1352		2428		1937
long trajectories	670	100%	147	100%	161	100%
0 native contacts	160	24%	63	43%	68	42%
<25% no. contacts	277	41%	54	37%	67	42%
25–50% no. contacts	21	3%	2	1%	2	1%
50–75% no. contacts	54	8%	4	3%	0	0%
>75% no. contacts	158	24%	24	16%	24	15%

The first row shows the total numbers of simulated annealing cycles computed for each molecule. The second row shows the number of trajectories for which the cooling cycle ended up with at least one contact between bases. Statistics of contacts for these trajectories, which were extended by 1.5 ns to 2.5 ns canonical simulation, is shown in the next 5 rows. Numbers of long trajectories with various percentage of native contacts are presented in rows 3–7.

Mechanical Properties of DNA: Persistence Length

The persistence length lp is a measure of flexibility of a polymer and can be computed from decay of correlation function of unit vectors tangent to helical axiswhere s is the position along the length of the chain.[70,71] For ssDNA the tangent vector is defined as a normalized vector between consecutive neutral beads (centers of mass of deoxyriboses). For dsDNA tangent vector is defined as normalized vector connecting midpoints of sugar–sugar distances of complementary bases, separated by 5 base pairs.

Implementation

A new software package for this coarse-grained model, composed of rigid-bodies, was developed. The software is written in C/C++ programming language and currently can perform the following tasks: read the PDB-format structure and transform it into the coarse-grained representation, optimize the geometry of the molecule (by energy minimization), perform microcanonical and canonical ensemble simulations with Berendsen’s weak thermal-bath coupling, and perform simulations of heating and simulated annealing cycles. The force and energy calculations were parallelized using OpenMP libraries.[72] The efficiency of the parallelization up to 12 cores is shown in Figure 4. The test was performed on the Drew–Dickerson dodecamer.

Figure 4

Efficiency of code parallelization.

Results and Discussion

Numerical Stability, Time Step, and Speedup

The stability of the Verlet-R-RATTLE algorithm was tested on the Drew–Dickerson double-helix. Microcanonical simulations of 10 ns length were run with various time-steps: dt = 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 fs. The stability of the total energy is shown in Figure 5. The algorithm maintains good stability up to dt = 22 fs resulting in an order of magnitude speedup compared to the commonly used all-atom time-step of 2 fs (with SHAKE applied on hydrogens), and shows superior stability compared to the quaternion-based rigid-body integration scheme, applied previously[19] to the model of a DNA chain. The drift of the energy over a 20 ns trajectory with dt = 22 fs was only 0.017 kcal/mol, compared to ∼10 kcal/mol over 1 ns simulation with the same time-step using a quaternion integration scheme (see Figure 5 in ref (19)). The drifts of the energy and the RMS fluctuations for various time-steps are collected in Table 6. All canonical simulations and simulated annealing cycles utilized time-step dt = 10 fs.

Figure 5

Stability of the Verlet-R-RATTLE integration algorithm over 0.1 μs microcanonical simulation of the Drew–Dickerson dodecamer with time-steps ranging from 10 fs to 28 fs.

Table 6

RMS Fluctuations and Drifts of the Total Energies Obtained from 10 ns Simulations of the Drew–Dickerson Dodecamer in the Microcanonical Ensemblea

Δt (fs)	10	12	14	16	18	20	22	24	26	28
RMSF (kcal/mol)	0.05	0.06	0.09	0.12	0.15	0.20	0.23	0.39	1.0	1.7
drift (kcal/(mol ns))	0.0002	0.007	0.0011	0.0021	–0.015	–0.015	0.017	0.053	0.371	1.8

RMS fluctuations were calculated for each trajectory with drift subtracted. Drift was calculated as a difference between the averages of the last and the first 1 ns of each trajectory.

Stability of the Verlet-R-RATTLE integration algorithm over 0.1 μs microcanonical simulation of the Drew–Dickerson dodecamer with time-steps ranging from 10 fs to 28 fs. RMS fluctuations were calculated for each trajectory with drift subtracted. Drift was calculated as a difference between the averages of the last and the first 1 ns of each trajectory. The efficiency of the model was compared to that of all-atom simulations performed with the AMBER 11 package. The model system was the Drew–Dickerson dodecamer immersed in a truncated octahedron box filled with 5099 TIP3P water molecules. A 10 Å cutoff and the particle-mesh Ewald summation method were applied for electrostatic interactions. The simulation of 1 ns took 17476 s on 8-cores using Intel Xeon E5645 2.4 GHz CPUs of the computing cluster of the Chemistry Research Computing Facility of Baker Laboratory of Chemistry and Chemical Biology, Cornell University. The same simulation with our coarse-grained model and dt = 20 fs took 235 s with the same CPU, leading to almost 2 orders of magnitude speedup. It should be noted that almost 2 orders of magnitude speedup was achieved without any cutoffs applied to the nonbonded interactions. Implementation of cutoffs, especially to the short-range Lennard-Jones (U ∼ r–6) and the dipole–dipole (U ∼ r–3) interactions will further increase the speed of computation, especially for long chains.

Conformational Stability

The conformational stability of double-helices was tested with the following procedure. The initial crystal structure (3BNA, 3BSA, and 2JYK) was heated from 100 to 300 K with ΔT = 50 K. The simulation at each temperature was 0.1 ns long, and the production run at 300 K was 0.1 μs long. Snapshots of trajectories were recorded every 1 ps. The initial simulations of the Drew–Dickerson dodecamer led to a collapse of the helical structure, caused by overestimation of the strength of the Lennard-Jones interactions with phosphates and sugars, which were fitted to reproduce the AMBER potential energy in vacuum. The stability of the system was improved by scaling of Lennard-Jones interaction energy between neutral and charged beads (P–P, S–P, and S–S) by a factor of 0.1. The stability of a double-helix was also tested with varying r0 and r1 parameters of eq 17. Table 7 shows the average number of native contacts as a function of distances r0 and r1 for Drew-Dickerson dodecamer with ϵint = 8.0. The largest average number of native contacts is located in the lower right corner of Table 7, which means that sufficient electrostatic repulsion of phosphate groups of interacting chains is necessary for stability of double-helix. Smaller switching distances r0 and r1 causes instability of the helix termini, which can even destabilize helix interior. On the other hand the reduced (and less physical) model, in which intrastrand base–base interactions were limited only to nearest-neighbors, was tested. The NN model generates very stable double helices even for short switching distances r0 = 4 Å and r1 = 13 Å, as applied in the ref (40). It seems that stronger electrostatic repulsion of phosphate groups is necessary to prevent deformations of helix termini caused by base–base intrastrand interactions. On the other hand, stronger electrostatic repulsion affects folding rate in the fast simulated annealing procedure (see section Folding of dsDNA from Separate Chains).

Table 7

Average Number of Native Contacts Obtained from 0.1 μs Trajectories of Drew–Dickerson Dodecamer for Various Dielectric Constant Switching Values r0 and r1 (See Equation 17)a

	r₁(Å)
r₀(Å)	13	14	15	16	17	18	19	20
4	7.2	8.6	8.6	8.2	8.5	9.0	9.1	9.0
6	8.0	6.7	7.8	8.9	9.4	9.0	9.5	8.6
7	7.0	8.6	9.2	8.9	9.7	9.9	9.5	9.9
8	8.3	7.8	8.3	9.3	9.9	10.0	9.6	10.2

The dielectric constant of helix interior was set to 8.0.

The dielectric constant of helix interior was set to 8.0. Table 8 collects mean helical parameters obtained from trajectories for three model systems for full physics-based model and for the simplified nearest-neighbor model. These parameters were calculated from coarse-grained representations of experimental reference structures and the structures obtained in simulations, respectively. A precise determination of all helical parameters would require conversion of the coarse-grained structurs to an all-atom representation; however, the values calculated by using a coarse-grained representation are sufficient for comparison of the simulated structures with the respective experimental structures. For all model systems, the average number of native contacts is larger for the NN model. For shorter molecules, 1BNA and 3BSE, the radius of gyration within NN approximation is close to the value obtained for experimental structure. For the PB model applied to 1BNA and 3BSE the radius of gyration is slightly too short, which is caused by deformation of helix termini. For the longest molecule both models overestimates the radius of gyration. Both models underestimate helix radius and overestimate helical rise for all dsDNAs, although these discrepancies are relatively small. Larger deviations are observed for the size of minor groove, which is significantly overestimated by both models. The major groove size is relatively well preserved.

Table 8

Mean Parameters Describing the Geometry of the Helix During 0.1 μs Simulation and Comparison to the Crystal Structure Geometrya

param. (Å)	native contacts	radius of gyration	helix diameter	riseb	major groove width	minor groove width
Drew–Dickerson (PB)	7.3(2.0)	12.5(0.4)	18.1(0.9)	3.4(0.6)	15.5(2.2)	15.5(2.0)
Drew–Dickerson (NN)	10.5(1.2)	13.5(0.3)	18.1(0.7)	3.7(0.4)	16.2(2.6)	15.6(1.8)
Drew–Dickerson (Exp)	12.0	13.2	18.9(0.6)	3.5(0.2)	17.5(0.5)	11.1(1.8)
Narayana–Weiss (PB)	11.8(1.9)	16.6(0.5)	18.2(0.8)	3.5(0.5)	15.9(2.3)	15.7(2.1)
Narayana–Weiss (NN)	13.3(1.6)	17.1(0.4)	18.3(0.7)	3.7(0.4)	16.3(2.2)	15.8(2.0)
Narayana–Weiss (Exp)	16.0	17.1	19.2(1.0)	3.4(0.1)	18.2(0.8)	11.6(1.2)
2JYK (PB)	16.3(1.9)	20.7(0.6)	18.3(0.7)	3.5(0.6)	16.2(2.5)	15.7(2.0)
2JYK (NN)	18.0(1.6)	21.4(0.6)	18.3(0.7)	3.7(0.4)	16.6(2.4)	15.8(1.9)
2JYK (Exp)	21.0	19.0	19.3(0.2)	3.5(0.3)	17.7(1.0)	10.8(1.3)

Numbers in parentheses are standard deviations. PB and NN denotes the full physics-based and the nearest-neighbor model, respectively.

The helical rise is approximated by the distance between consecutive centaral points of lines connecting centers of mass of Watson–Crick-paired bases.

Numbers in parentheses are standard deviations. PB and NN denotes the full physics-based and the nearest-neighbor model, respectively. The helical rise is approximated by the distance between consecutive centaral points of lines connecting centers of mass of Watson–Crick-paired bases.

Persistence Length

Persistence lengths of both HETSS and HETDS were computed from 0.1 μs canonical simulations at 300 K with 145 mM salt concentration for both full PB and simplified NN model. The results are presented in Table 9. For both chains PB model produces smaller persistence lengths than NN model, although the relative difference is much bigger for ssDNA. This behavior is expected, because switching off some of the intrastrand base–base interactions should decrease bending of the chain, especially for ssDNA, in which bases are exposed to solvent and can easily form intrastrand hairpins with complementary bases. Figure 6 presents typical configurations of HET60SS and HET60DS molecules taken from T = 300 K canonical simulation. The conformation of ssDNA taken from full PB model simulation differs significantly from the one taken from NN approximation. The ssDNA obtained from full PB model bends and forms hairpins, which can act as kinetic traps in the folding process of dsDNA.

Table 9

Persistence Lengths of HET60SS and HET60DS for Full PB and Reduced NN Modelsa

persistence lengths (Å)	HET60_SS	HET60_DS
full PB model	6.5	151.7
reduced NN model	28.3	158.3

Persistence lengths were obtained by fitting of eq 18 the the correlation functions obtained from canonical ensemble trajectory computed at 300 K with 145 mM salt concentration.

Figure 6

Typical configurations of HET60DS (left panel), HET60SS within nearest-neighbor approximation (middle panel) and HET60SS for full physics-based model (right panel).

Typical configurations of HET60DS (left panel), HET60SS within nearest-neighbor approximation (middle panel) and HET60SS for full physics-based model (right panel). Persistence lengths were obtained by fitting of eq 18 the the correlation functions obtained from canonical ensemble trajectory computed at 300 K with 145 mM salt concentration. The value of persistence length of ssDNA obtained with NN model agrees reasonably well with experimental values.[73] The persistence length of dsDNA appears to be around three times too small compared to experimental values,[74−77] and it is close to the value computed by early version of 3SPN model.[27] Although the agreement with experimental data is not perfect it appears to be quite satisfactory for “bottom-up”-parametrized physics-based coarse-grained model and this discrepancy will be corrected in next versions of the model.

Mismatches

The influence of mismatches was tested on the 30-bp system of 10 G/G pairs flanked by 10 A/T pairs at both 5′ and 3′-end. During the 0.1 μs canonical simulation at 300 K, opening of mismatched G/G pairs was not observed. Although interaction of complementary bases is different than that between mismatched bases, the model does not seem to be very specific with respect to base–base interactions. This problem may be result of exaggeration of base–base Lennard-Jones interactions with respect to electrostatics and will be addressed in the next version of the model.

Folding of dsDNA from Separate Chains

The simulated annealing (SA) procedure described in the Methods section was applied to some model systems. The folding efficiency of the full PB model was low. Only 3 out of 2100 trajectories led to the final helical structures with more than 10 native base–base contacts for Drew–Dickerson dodecamer. Longer molecules did not fold at all. Much better folding rate was achieved with NN model in which intrastrand kinetic traps (hairpins), resulting from intrastrand long-range base–base interactions, were eliminated. The statistics of the folding process of three model systems within NN approximation are presented in Table 5. After the majority of simulated annealing cooling cycles (first stage), the chains did not come into contact because of electrostatic repulsion of the highly charged chains and the relatively short time of the simulation. The frequency of making at least one contact after a cooling cycle is different for each simulated system and changes from 50% for the dodecamer to 6% for the hexadecamer, and to 8% for the 21-bp system. This phenomenon can be explained by increasing volume of an accessible space for longer chains, which decreases probability of making contact during fast cooling process. The effect is enhanced by increased electrostatic repulsion of the longer negatively charged backbones. After the second stage of simulated annealing, native-like structures were obtained. The percentage of the native-like structures, based on the number of native contacts, is shown in Table 5. It should be noted that the numbers summarized in Table 5 were obtained from SA calculations which do not reflect the actual amount of the native-like structures that a force-field can produce. The cooling time is always too small to obtain an equilibrated ensemble. The amounts of contacts in Table 5 are, therefore, underestimated. Nevertheless 24% of extended trajectories led to double-helical structure with more than 10 native contacts for 1BNA molecule. Roughly the same number of structures broke up into separate chains or formed misfolded structures without native contacts. 41% trajectories ended up in misfolded structures with 1–3 native contacts, which were mostly parallel dsDNA’s, in which 5′-end of one chain makes nonative contacts with 5′-end of the other chain. Figure 7 shows the potential energy of final structures as a function of all-bead RMSD computed with respect to experimental structure. Two low-energy clusters of structures are clearly visible. The one located in the lower left corner of Figure 7 is a cluster of correctly folded antiparallel dsDNA’s and the other one represents misfolded parallel dsDNA’s, in which 3′-ends (and 5′-ends) of chains are paired. The important observation that comes from this graph is that lowest energy structure is clearly located in the correctly folded group and its energy is around 10 kcal/mol lower than misfolded parallel dsDNA, although the lowest energy structure is not the structure with lowest RMSD. The mean potential energy of antiparallel dsDNA cluster is around 8 kcal/mol lower than the mean energy of the parallel DNA cluster, which clearly favors correctly folded structures, although this difference for real systems might be larger (experimental structures of the parallel Drew–Dickerson dodecamer were not observed) and further refinement of the electrostatics and/or Lennard-Jones balance of the model is required to increase this gap and improve specificity of base–base interactions.

Figure 7

Energy vs RMSD with respect to the experimental structure for the 1BNA dodecamer. Two major clusters, corresponding to antiparallel and parallel dsDNA, and the energy-minimized experimental structure are pointed by arrows. The reference structure marked by bigger dot in Figure 7 is the energy-minimized experimental structure. The optimization of experimental structure was necessary to remove small steric clashes in original structure caused by imperfectness of the coarse-grained model. The potential energy of original 1BNA molecule is more than 100 kcal/mol higher than that of the optimized one, but the all-bead RMSD between the optimized and original structure is only 0.4 Å. This effect can be expected for the coarse-grained model with force-field parameters developed in the “bottom-up” fashion. The potential energy of the reference structure is almost exactly the same (<1 kcal/mol) as the lowest-energy structure obtained from simulated annealing procedure. The folding efficiency of longer polymers decreases as the dimensionality of phases spaces increases (see Table 5). 16% of long trajectories of 3BSE molecule led to final structures with more than 75% of native contacts (13–16 contacts). For the 2JYK molecule, the folding rate is almost the same as that for 3BSE and equals 15%. Nevertheless, the energy vs RMSD graphs (Figures 8 and 9) are qualitatively similar to these obtained for 1BNA molecule (Figure 7). In both cases, two clusters of antiparallel and parallel dsDNAs are clearly visible and correct—antiparallel cluster has mean potential energy around 20 kcal/mol lower than this of incorrect—parallel dsDNA cluster. These values should effectively discriminate misfolded structures in the real-, long-time folding process. This mean-energy difference increases to 22 kcal/mol for the longest 2JYK molecule. The energy-minimized reference structures, marked by bigger dot in Figures 8 and 9, have RMSD only 0.6 Å away from original structure. In both cases, the reference structures have potential energy around 10 kcal/mol lower than these obtained from simulated annealing procedure. This observation suggests that longer cooling process should lead to improvements in geometry of final structures, as the minimum of potential energy was not reached by any simulated annealing cycle for longer model systems.

Figure 8

Same as Figure 7 but for the 3BSE hexadecamer.

Figure 9

Same as Figure 7 but for the 2JYK molecule.

Same as Figure 7 but for the 3BSE hexadecamer. Same as Figure 7 but for the 2JYK molecule. Figure 10 presents the three lowest-energy structures of tested molecules superimposed on their experimental structure. The all-bead RMSDs of 1BNA, 3BSE, and 2JYK molecules with respect to experimental references are 2.1 Å, 3.1 Å, and 4.2 Å, respectively, and as expected, they increase with the length of polymers. All structures clearly form double-helices, although some discrepancies of the geometry are visible, especially for the longest molecule, for which overestimation of the size of minor groove is clearly visible.

Figure 10

Lowest-energy structures of 1BNA, 3BSE, and 2JYK molecules obtained from extended simulated annealing procedure, superimposed on experimental structures. Their all-bead RMSDs are 2.1 Å, 3.1 Å, and 4.2 Å, respectively. The reference structures are shown in red. The backbone of the computed structures is shown in blue and the bases are A (cyan), T (pink), G (green), C (yellow). Analysis of 158 trajectories of 1BNA, which led to structures with 10–12 native contacts revealed two qualitatively different mechanisms of folding. The representative of the first most common mechanism (93 of 158 trajectories) is shown in Figure 11. Two separated chains (0 ns) were heated to 600 K and evolved into random chains (0.11 ns, T = 550 K). At 0.56 ns (T = 350 K) the 5′-GCGC-end established native contacts with the complementary 3′-CGCG-end, and since then, more native contacts were formed, and the dsDNA molecule was zipped. At 0.68 ns, all native Watson–Crick hydrogen bonds were formed. This type of folding mechanism is referred to as zippering.[31]

Figure 11

Snapshots of trajectory of a zippering mechanism of the folding process of the Drew–Dickerson dodecamer.

Snapshots of trajectory of a zippering mechanism of the folding process of the Drew–Dickerson dodecamer. A different and less common folding process (65 of 158 trajectories) is shown in Figure 12. It was started from two separated ssDNA chains, which were heated to 600 K and, after 0.3 ns (T = 500 K), evolved into a completely random conformation. First, contacts between the bases of the 5′-ends of the chains were established after 0.49 ns. It should be noted that these are non-native Watson–Crick G/C contacts. After that, the chains started to slide with respect to each other along a Watson–Crick interface. After 0.89 ns, two G/C contacts were estabilished but also A/C and A/G mismatched pairs were formed. A 1.4 ns snapshot shows a molecule with 10 contacts established and with distorted termini. After 1.7 ns a full helix with all native base–base contacts was formed. This type of folding mechanism is referred to as slithering.[31]

Figure 12

Snapshots of trajectory of a slithering mechanism of the folding process of the Drew–Dickerson dodecamer.

Snapshots of trajectory of a slithering mechanism of the folding process of the Drew–Dickerson dodecamer. As can be expected, the zippering mechanism was also observed for longer molecules, 3BSE and 2JYK. Nevertheless, some trajectories with a slithering folding mechanism were also recorded. At first, this observation is surprising because, as opposed to the 1BNA molecule, the initial mis-alignment of the two strands creates a number of mismatched base–base pairs. Therefore, the fact that the simulated folding of 3BSE and 2JYK proceeded almost as frequently by the zippering and by the slithering mechanism probably resulted from low base–base pairing specificity of the model. The snapshot from helix initiation process for 3BSE molecule is shown in Figure 13a). The 3′-end of a complementary strand made contacts with 3′-end of the leading strand. Besides three nonative Watson–Crick contacts (two A/T and one G/C) and two mismatched contacts (G/T and T/T) were established. Three Watson–Crick pairs and two mismatched pairs form initial structure from which monomers start slithering with respect to each other until final proper double-helix is formed. The helix nucleation occurred between nucleotides of 3′-ends but also the mechanisms where 3′-end (or 5′-end) of one chain made a contact with midsection of the other chain were observed.

Figure 13

Examples of slithering pathways of DNA hybridization of (a) 3BSE and (b) 2JYK molecules.

Examples of slithering pathways of DNA hybridization of (a) 3BSE and (b) 2JYK molecules. For the longest chain, the slithering mechanism was observed only for two out of 22 folding trajectories. In one case, nucleation started from formation of contacts of 3′-end of one chain with middle section of the other chain (see Figure 13b). In the second slithering trajectory, the 5′-end made initial contact with middle section of the other chain. No trajectory where two 3′-ends (or 5′-ends) made initial contact was recorded. At this point, the analysis of DNA duplex hybridization is reduced only to qualitative observations presented above. Further development of the model is necessary, especially further reduction of computational cost via implementation of cutoffs, to obtain large set of trajectories, which is required for proper analysis of thermodynamics and kinetics of DNA hybridization process. Nevertheless, the mechanism of hybridization can be qualitatively compared to hybridization mechanisms proposed by de Pablo[30] and Doye[44] groups. The former group showed two mechanisms also called slithering and zippering, which follows double-helix nucleation process.[30] The former mechanism is dominant for repetitive sequences and zippering is characteristic of random sequences. Both mechanisms seem to be similar to folding mechanisms obtained with our dipolar bead model, although it should be noticed that sequences of simulated systems are not exactly the same. The Drew–Dickerson dodecamer is symmetric and might have pathways of folding that are similar to slithering mechanism observed for 3SPN model. Our model shows that slithering and zippering mechanisms for 1BNA are almost equally probable. The 3BSE and 2JYK molecules have a sequence that can be described as random, and their dominant folding mechanism is molecular zippering. The zippering mechanism was also observed for the model of Ouldridge et al.[44] They showed that there is a bias toward initial structures with contacts between ends of the strands, an observation that can be confirmed by our model. Almost all zippering trajectories were initiated by contacts of monomers termini. One of the mechanisms of duplex formation observed for repetitive sequences, so call inchworm mechanism, was not observed in our simulations, although we did not test polymers with exactly the same sequence. The inchworm mechanism may also be related to the specificity of base–base interactions. It should also be noticed that tendency for slithering, instead of inchworm-like behavior, in our model may result from insufficient specificity of base–base interactions. This problem will be addressed in the future development of the model. Finally, it should be stressed that these results were obtained in unrestrained simulations started from separated chains. This mode of simulations is different from that of many other works in which simulations are started from the experimental structure. Unlike many other force-fields developed for DNA, our force-field, therefore, has ab initio folding capacity.

Influence of Backbone Dihedral Angle Potentials on the Formation of Double-Helix

In our recent paper, it was shown that dihedral angle bonded potentials are not prerequisites for double-helix formation, which is rather driven by base–base dipole interaction.[43] Also, the model developed by Ouldridge et al. does not require backbone dihedral angle potentials to obtain DNA duplex hybridization.[36] A question can, therefore, be asked: Are the backbone potentials a necessary factor for double-helix formation, or is this process solely dependent on the base stacking and pairing? The problem was addressed by running the simulated annealing procedure with P5–S–P3–S3, S5–P5–S–P3, S5–P5–S–B, and S3–P3–S–B dihedral angle potentials switched off. The test performed on the Drew–Dickerson dodecamer showed the same efficiency of double-helix hybridization process (24%). The lowest energy structure shown in Figure 14a is only 3.2 Å away from the experimental structure. Next, improper rotamer-like potentials P5–P3–B–S, P5–S–B, and P3–S–B were additionally switched off. The lowest energy structures have a ladder-like shape, as shown in Figure 14b. It seems that backbone dihedral angle potentials are not really necessary for DNA duplex hybridization. Switching off rotamer-like potentials (P5–S–B, P3–S–B, and P5–P3–S–B) causes formation of ladder-like structures instead of double-helices. This effect might be related to the lack of specificity of base–base interactions, which might be the result of overestimation of base–base van der Waals interactions with respect to dipole–dipole electrostatic interactions. Nevertheless, the dipolar-bead model suggests that only backbone bond-stretching, angle-bending, and rotamer-like potentials are necessary for double stranded DNA hybridization.

Figure 14

Lowest energy structures obtained from simulated annealing procedure with the NN model with some of bonded potentials switched off. (a) Dihedral angle potentials S5–P5–S–P3, P5–S–P3–S3, S5–P5–S–B, and P3–S3–S–B switched off. (b) Additionally, rotamer-like potentials P5–S–B, P3–S–B, and P5–P3–S–B switched off.

Conclusions

A physics-based rigid-body coarse-grained model of DNA is proposed. The model was parametrized in the “bottom-up” fashion. The bonded interactions are fitted to the potentials of mean force of the model system. The equilibrium virtual-bond lengths and virtual-bond angles were set to the values of the ideal B-DNA double-helix. The nonbonded interactions are approximated by Lennard-Jones, excluded volume and electrostatic interactions of charges and dipoles. The solvation and ionic cloud screening is approximated by the Debye–Hückel model. The model does not incorporate any Go̅-like potentials, which force the structure toward the experimental one. In the full physics-based model, nonbonded potentials do not distinguish between near-neighbors and any other base pairs. A full physics-based version of the model the nearest-neighbor approximation, in which intrastrand base–base interactions are limited to nearest-neighbors only, was tested. It should be stressed that in the reduced model all interstrand base–base interactions are present. The R-RATTLE rigid-body integration algorithm, implemented in our software package, demonstrated superior stability compared to a quaternion-based algorithm, with low drift of the total energy even for a time-step as long as 20 fs. The total speedup of computations compared to all-atom simulations is close to 2 orders of magnitude. This result was achieved without application of cutoffs on the nonbonded interactions. Stable double-helix trajectories were obtained for both PB and NN models after adjustment of several force-field parameters as discussed in the Conformational Stability section. The geometry of two experimental DNA structures is preserved during 0.1 μs simulations at 300 K. The persistence lengths of ssDNA and dsDNA are underestimated by the factor of 2–3 compared to experimental data, discrepancy that seems to be reasonable for physics-based model designed in the “bottom-up” fashion. The other drawback of the model, which seems to have the same origin as discrepancy in persistence lengths, is insufficient specificity of base–base interactions, which affects simulations of dsDNA with mismatches. Opening of mismatched pairs was not observed during 0.1 μs canonical simulations at 300 K. Nevertheless, the model successfully folds short (12–21 bp) DNAs from separated strands. The full PB model is only able to fold shortest double-stranded DNA dodecamer from separated chains with relatively low efficiency caused by formation of intrastrand hairpins—kinetic traps in the folding process. The reduced NN model removes kinetic traps and greatly improves the folding efficiency. All three tested molecules with lengths varying from 12 to 21 bp were folded from separate strands by NN model, although the efficiency of folding drops with increasing size of the simulated system. The possible explanation is that the cooling process is too fast to sample conformational space effectively and find the global minimum on the complicated multi-minima free energy hyper-surface. Application of a replica-exchange method, which will be implemented in our software package, should help to explain the cause of the low helical populations produced by the simulated annealing procedure. The model predicts two mechanisms of dsDNA hybridization. The more frequent process called zippering starts from formation of native base–base contacts between 3′ and 5′-ends of two chains and is followed by helix propagation. In the second, less frequent process called slithering, initial non-native Watson–Crick contacts (also mismatched contacts for 3BSE and 2JYK) are established between 5′-ends (or 3′-ends) of two strands; then, chains slide with respect to each other along the Watson–Crick interface until all native contacts are formed. This process was observed for all model systems although it is much more frequent for the dodecamer, which has specific sequence, than for other molecules. The slithering process might also be an artifact caused by insufficient specificity of base–base interactions. Finally, the influence of backbone dihedral angle and rotamer-like potentials on a double-helix formation process was tested. The backbone dihedral potentials do not seem to be a necessary prerequisite for dsDNA hybridization, but without rotamer-like potentials, ladder-like structures instead of double helices were observed. The model has some weaknesses, such as low specificity of base–base interactions, underestimation of persistence lengths of both single and double-stranded DNA, and small influence of mismatched pairs on double-helix dynamics. The origin of mentioned problems most probably arises from overestimation of Lennard-Jones base–base interactions, and we will address these problems in the next evaluations of the dipolar bead model.

49 in total

1. Escaping free-energy minima.

Authors: Alessandro Laio; Michele Parrinello
Journal: Proc Natl Acad Sci U S A Date: 2002-09-23 Impact factor: 11.205

2. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid.

Authors: J D WATSON; F H CRICK
Journal: Nature Date: 1953-04-25 Impact factor: 49.962

3. Coarse-grained force field for the nucleosome from self-consistent multiscaling.

Authors: Karine Voltz; Joanna Trylska; Valentina Tozzini; Vandana Kurkal-Siebert; Jörg Langowski; Jeremy Smith
Journal: J Comput Chem Date: 2008-07-15 Impact factor: 3.376

4. Distance-dependent dielectric constants and their application to double-helical DNA.

Authors: J Mazur; R L Jernigan
Journal: Biopolymers Date: 1991-11 Impact factor: 2.505

5. Sequence effects in the melting and renaturation of short DNA oligonucleotides: structure and mechanistic pathways.

Authors: E J Sambriski; V Ortiz; J J de Pablo
Journal: J Phys Condens Matter Date: 2008-12-17 Impact factor: 2.333

6. A systematically coarse-grained model for DNA and its predictions for persistence length, stacking, twist, and chirality.

Authors: Alex Morriss-Andrews; Joerg Rottler; Steven S Plotkin
Journal: J Chem Phys Date: 2010-01-21 Impact factor: 3.488

7. Simulation of DNA double-strand dissociation and formation during replica-exchange molecular dynamics simulations.

Authors: Srinivasaraghavan Kannan; Martin Zacharias
Journal: Phys Chem Chem Phys Date: 2009-09-15 Impact factor: 3.676

8. Overstretching B-DNA: the elastic response of individual double-stranded and single-stranded DNA molecules.

Authors: S B Smith; Y Cui; C Bustamante
Journal: Science Date: 1996-02-09 Impact factor: 47.728

9. Prediction of protein conformation on the basis of a search for compact structures: test on avian pancreatic polypeptide.

Authors: A Liwo; M R Pincus; R J Wawak; S Rackovsky; H A Scheraga
Journal: Protein Sci Date: 1993-10 Impact factor: 6.725

10. Coarse-grained model of nucleic acid bases.

Authors: Maciej Maciejczyk; Aleksandar Spasic; Adam Liwo; Harold A Scheraga
Journal: J Comput Chem Date: 2010-06 Impact factor: 3.376

10 in total

1. Open-Boundary Molecular Dynamics of a DNA Molecule in a Hybrid Explicit/Implicit Salt Solution.

Authors: Julija Zavadlav; Jurij Sablić; Rudolf Podgornik; Matej Praprotnik
Journal: Biophys J Date: 2018-04-09 Impact factor: 4.033

2. The "sugar" coarse-grained DNA model.

Authors: N A Kovaleva; I P Koroleva Kikot; M A Mazo; E A Zubova
Journal: J Mol Model Date: 2017-02-09 Impact factor: 1.810

Review 3. Adaptive resolution simulations of biomolecular systems.

Authors: Julija Zavadlav; Staš Bevc; Matej Praprotnik
Journal: Eur Biophys J Date: 2017-09-13 Impact factor: 1.733

4. Coarse-grained simulation of DNA using LAMMPS : An implementation of the oxDNA model and its applications.

Authors: Oliver Henrich; Yair Augusto Gutiérrez Fosado; Tine Curk; Thomas E Ouldridge
Journal: Eur Phys J E Soft Matter Date: 2018-05-10 Impact factor: 1.890

5. Sequence-Dependent Three Interaction Site Model for Single- and Double-Stranded DNA.

Authors: Debayan Chakraborty; Naoto Hori; D Thirumalai
Journal: J Chem Theory Comput Date: 2018-06-26 Impact factor: 6.006

6. Physics-Based Potentials for Coarse-Grained Modeling of Protein-DNA Interactions.

Authors: Yanping Yin; Adam K Sieradzan; Adam Liwo; Yi He; Harold A Scheraga
Journal: J Chem Theory Comput Date: 2015-04-14 Impact factor: 6.006

7. cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA.

Authors: Lennart De Bruin; John H Maddocks
Journal: Nucleic Acids Res Date: 2018-07-02 Impact factor: 16.971

8. Focus on PNA Flexibility and RNA Binding using Molecular Dynamics and Metadynamics.

Authors: Massimiliano Donato Verona; Vincenzo Verdolino; Ferruccio Palazzesi; Roberto Corradini
Journal: Sci Rep Date: 2017-02-17 Impact factor: 4.379

9. Determining Sequence-Dependent DNA Oligonucleotide Hybridization and Dehybridization Mechanisms Using Coarse-Grained Molecular Simulation, Markov State Models, and Infrared Spectroscopy.

Authors: Michael S Jones; Brennan Ashwood; Andrei Tokmakoff; Andrew L Ferguson
Journal: J Am Chem Soc Date: 2021-10-13 Impact factor: 15.419

10. Accurate Sequence-Dependent Coarse-Grained Model for Conformational and Elastic Properties of Double-Stranded DNA.

Authors: Salvatore Assenza; Rubén Pérez
Journal: J Chem Theory Comput Date: 2022-04-08 Impact factor: 6.578

10 in total