Literature DB >> 35394775

Accurate Sequence-Dependent Coarse-Grained Model for Conformational and Elastic Properties of Double-Stranded DNA.

Abstract

We introduce MADna, a sequence-dependent coarse-grained model of double-stranded DNA (dsDNA), where each nucleotide is described by three beads localized at the sugar, at the base moiety, and at the phosphate group, respectively. The sequence dependence is included by considering a step-dependent parametrization of the bonded interactions, which are tuned in order to reproduce the values of key observables obtained from exhaustive atomistic simulations from the literature. The predictions of the model are benchmarked against an independent set of all-atom simulations, showing that it captures with high fidelity the sequence dependence of conformational and elastic features beyond the single step considered in its formulation. A remarkably good agreement with experiments is found for both sequence-averaged and sequence-dependent conformational and elastic features, including the stretching and torsion moduli, the twist-stretch and twist-bend couplings, the persistence length, and the helical pitch. Overall, for the inspected quantities, the model has a precision comparable to atomistic simulations, hence providing a reliable coarse-grained description for the rationalization of single-molecule experiments and the study of cellular processes involving dsDNA. Owing to the simplicity of its formulation, MADna can be straightforwardly included in common simulation engines. Particularly, an implementation of the model in LAMMPS is made available on an online repository to ease its usage within the DNA research community.

Entities: Chemical

Mesh：

Substances：
Nucleotides
DNA

Year: 2022 PMID： 35394775 PMCID： PMC9097290 DOI： 10.1021/acs.jctc.2c00138

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.578

Introduction

Sequence-dependent conformational and elastic properties of DNA are of the utmost importance for its regulation in vivo, as they directly affect DNA–protein interactions.[1] The detailed shape of a DNA fragment and its deformability are indeed key determinants of protein recognition and binding.[1,2] For instance, the unique conformational properties of A-tracts are known to affect nucleosomal organization,[3] as well as DNA replication and recombination.[4] Analogously, TATA-box elements present a strong deviation from the canonical B-DNA conformation, which is exploited to enhance binding by the TATA-box binding protein.[1] Moreover, proteins continuously exert mechanical stress on bound DNA. For example, torsion is applied on DNA by topoisomerases and polymerases[5,6] and plays a central role in chromatin remodeling.[7] Also, DNA stretching is relevant in vivo, e.g., in the action of recombinases[8] or for site recognition in nucleosomes.[9] Together with the practical implications of DNA elasticity in nanotechnological applications,[10,11] this fundamental interest has prompted a conspicuous amount of experimental efforts devoted to the detailed characterization of DNA conformational ensemble and elasticity. For the study of conformers, classical crystallographic and NMR studies are nowadays being complemented by X-ray interferometry[12,13] and by single-molecule imaging with atomic force microscopy (AFM)[14,15] or cryoelectron microscopy.[16] Single-molecule techniques like AFM, optical tweezers, and magnetic tweezers are employed to assess the elastic properties of long DNA molecules, providing stiffness values for various mechanical perturbation modes such as stretching,[17−21] twisting,[20,22−27] bending,[17−19,21,28−30] and the coupling between them.[20,31−35] At short length scales, the interpretation of experiments becomes cumbersome due to both theoretical and experimental challenges. Indeed, direct AFM imaging has provided conflicting results on the bendability of short DNA fragments.[36,37] For other experimental approaches such as cyclization assays,[38−41] the elastic parameters can be obtained only by interpreting the data within specific theoretical frameworks, and conflicting conclusions have been reported based on different physical assumptions.[12,42−44] Molecular simulations provide a valuable means to complement experimental studies, particularly in view of the discrepancies mentioned above. All-atom molecular dynamics has been successfully employed to capture DNA elastic and conformational features[45−49] as well as their dependence on sequence.[35,50−53] An evident limit of atomistic simulations originates from the associated high computational cost, which puts severe boundaries on the length and time scales that can be simulated. The need to overcome this barrier has fostered the development of coarse-grained approaches, where one selects only the degrees of freedom relevant to the problem at hand. An elegant solution is implemented in the software packages cgDNA[54] and MC-eNN,[55] where the nucleotides are represented as rigid frames and the energy of the system is described by means of a stiffness matrix. This approach is extremely efficient from a computational perspective and enables assessing the conformational ensemble of long DNA chains.[56] However, an important drawback lies in the absence of an explicit description of the backbone and the related difficulty in interfacing this representation with external perturbations present in most systems of interest, e.g., mechanical stress, confinement, or a binding protein. A good compromise between computational efficiency and modeling flexibility is achieved by considering coarse-grained models in which effective particles represent the various moieties along the DNA molecule.[57−65] These models have been employed to describe a wide palette of systems involving DNA, including, e.g., DNA origami,[66] nucleosomes,[67] and cyclization assays.[44] Despite this success, the coarse-grained models of the latter kind present in the literature do not satisfactorily capture all the main elastic features of double-stranded DNA at once (see, e.g., refs (62 and 68)). In this work, we fill this gap by introducing the mechanically-accurate DNA (MADna) model, a sequence-dependent coarse-grained model whose parameters are tuned to reproduce local conformational features of double-stranded DNA and the stiffness of the various mechanical perturbation modes obtained via all-atom molecular dynamics.[52] We show that the model satisfactorily reproduces the atomistic values also for DNA fragments different than the ones employed for fitting, as well as experimental data from the literature. Particularly, and at variance with existing models, our coarse-grained description captures the negative twist–stretch coupling recently unveiled by single-molecule force spectroscopy[31,32] as well as the experimentally determined sequence dependence of dsDNA helical pitch and persistence length.[28] The full details of MADna and the definitions of the various physical and geometrical quantities are provided in Methods. Nevertheless, they are shortly introduced also in Results and Discussion, making this section somewhat self-standing. In this way, the main findings of this work are presented succinctly, leaving the interested reader to look up the technical details within the Methods section. Finally, we remark that an online repository is available (https://github.com/saassenza/MADna/tree/main/LAMMPS) where MADna has been implemented in LAMMPS[69] without the need to introduce additional custom packages. The repository also contains a molecular builder which automatically creates the initial coordinates and topology for simulation in LAMMPS.

Methods

Coarse-Grained Model

MADna describes a molecule of double-stranded DNA (dsDNA) by considering three effective particles per nucleotide, located in the geometric centers of the phosphate group, the sugar, and the base, as proposed in the past in the 3SPN[57] and TIS[65] models. In Figure , two cartoons are reported to compare the atomistic and coarse-grained description of a representative sequence. Further details on the coarse-graining procedure are reported in section S1 in the Supporting Information. In spite of the similar level of coarse-graining, MADna diverges from the other models in terms of the choice of bonded interactions both in the way they are built and in their parametrization. Particularly, our choice of bonded interactions enables capturing the structural heterogeneity predicted by atomistic simulations, as will be showed below when we comment on the results in Figure and Figure i. On the other hand, we chose to fix the double-stranded structure by means of bonded interactions, which simplifies the implementation of the model (for instance, MADna does not need modulation factors to account for anisotropic interactions) but makes it less general than the other ones as it cannot account for DNA thermodynamics. In this regard, MADna is more akin to the MARTINI representation of double-stranded DNA, where the double helix is maintained through a suitably defined elastic network.[64] We will add the possibility of strand separation in future versions of MADna. As for the parametrization, MADna extracts information from a large data set provided by atomistic simulations, while both TIS and 3SPN are parametrized by means of experiments. Experimental results are of course the ultimate source of information, but we decided to tune the parameters of MADna from atomistic simulations based on both the far richer amount of available information and their demonstrated accuracy in capturing local conformational and elastic features of dsDNA. As we show in the Results and Discussion section, this choice is rewarded with a very good performance of MADna in capturing many distinct experimental observations on the structure and mechanical response of dsDNA.

Figure 1

Figure 2

Average values obtained by coarse-graining the atomistic simulations for the angles SBB (a) and the dihedrals SBBS (b). Vertical lines separate the distinct sequences, which are listed in Section S3.1 in the Supporting Information. In the legends, the various labels correspond to the particular bases involved in the local conformation under consideration. For instance, SAT considers a SBB angle in which an adenine is bound to the sugar. Note also that the dihedrals are symmetric with respect to the inversion of the involved bases, as it just corresponds to changing the arbitrary reference strand.

Figure 5

Scatter plot comparing atomistic and coarse-grained results at f = 1 pN for various structural features: crookedness β (panel a); helical diameter (b); helical rise (c); helical twist (d); width of major (e) and minor (f) groove; depth of major (g) and minor (h) groove; SBBS dihedrals (i). Training and testing sequences are denoted by red circles and green diamonds, respectively. Black lines indicate the bisector of the first and third quadrant. For each panel, the Pearson coefficient indicating the linear correlation between the two data sets is reported. The atomistic results were obtained by coarse-graining the trajectories obtained from all-atom simulations and performing the analysis reported in the Methods.

Atomistic (left) and coarse-grained (right) description for a representative dsDNA molecule with leading sequence 5′-CGCTACTTCGAGG-3′ in the B-DNA form as obtained by employing the NAB software.[70] In the coarse-grained cartoon, the color code is the following: sugar ↔ black; phosphate group ↔ red; adenine ↔ green; cytosine ↔ cyan; guanine ↔ yellow; thymine ↔ orange. The size of each bead is proportional to the WCA radius of the corresponding moiety. Average values obtained by coarse-graining the atomistic simulations for the angles SBB (a) and the dihedrals SBBS (b). Vertical lines separate the distinct sequences, which are listed in Section S3.1 in the Supporting Information. In the legends, the various labels correspond to the particular bases involved in the local conformation under consideration. For instance, SAT considers a SBB angle in which an adenine is bound to the sugar. Note also that the dihedrals are symmetric with respect to the inversion of the involved bases, as it just corresponds to changing the arbitrary reference strand.

Bonded Interactions

The double-stranded topology is fixed once and for all and is maintained by introducing several two-, three-, and four-body bonded interactions (from here on, we will refer to them as bonds, angles, and dihedrals, respectively), which connect beads within the same strand as well as providing interstrand links. Before going into the details of the bonded interactions considered, we first proceed to clarify the nomenclature used in this section. In order to label the bonded interactions, we indicate the sugar, phosphate, and base beads as S, P, and B, respectively. In this way, a bond between, e.g., a sugar and a base is indicated as SB. Whenever the bonded interaction runs along the 5′-3′ direction of a strand, we add a corresponding tag at the two ends of the label. For instance, a bond between a sugar and a phosphate in the 5′-3′ direction is indicated as 5′-SP-3′. Analogously, we denote as 5′-SPSB-3′ a dihedral involving a sugar, a phosphate, the following sugar, and the base attached to it (ordered according to the 5′-3′ direction of the strand). In order to keep in place the double-stranded structure, some bonded interactions involve both strands. Such bonded interactions are marked by underlining their label. For instance, the bonds accounting for Watson–Crick base pairing are indicated as BB-WC (in this specific case we also added “WC” in order to clearly distinguish these bonds from 5′-BB-3′, which account for stacking interactions). As another example, we denote as 5′-PSBB-3′ a dihedral involving in the 5′-3′ direction a phosphate, a sugar, the attached base, and the base paired to it. We further note that, due to the asymmetry of the 5′-3′ direction, the order in which the same beads are considered is important. For instance, the bonds 5′-SP-3′ and 5′-PS-3′ are different from each other. This can be clearly seen by comparing the corresponding entries in Table S2 in the Supporting Information, where we show the equilibrium values computed by coarse-graining the atomistic simulations used to parametrize MADna (see below). Analogously, the dihedrals 5′-SPSP-3′ and 5′-PSPS-3′ are different from each other, as can be seen from Table S4 in the Supporting Information. For the bonds potential Ubond, a harmonic function was considered:where r is the distance separating the two beads connected by the bond, kbond is the elastic constant specific to the type of bond considered, and r0 is the corresponding equilibrium distance. An analogous formula was used for the angle potential Uangle:where θ is the angle characterizing the three-body bonded interaction, kangle is the bending constant relative to the type of angle considered, and θ0 is the equilibrium value of θ. Finally, the following formula was chosen for the dihedral potential Udihedral:where ϕ is the dihedral angle, kdihedral is the elastic constant of the particular type of dihedral considered, and ϕ0 = ϕmin – 180°, with ϕmin being the equilibrium dihedral angle. This choice for the bonded interactions relies on the observation of normal distributions for most of the corresponding observables (see Section S2.2 in the Supporting Information). The general guideline in choosing how to build the topology was to obtain a minimal set of bonded interactions which reproduces the structural features obtained in all-atom simulations. With this spirit, we considered only bonded interactions involving up to a single step and implemented the sequence dependence by setting different values of the parameters according to the particular step under consideration. For instance, if along the sequence we have a CT step, the bond 5′-BB-3′ accounting for the stacking interaction is implemented by means of eq , where kbond and r0 are equal to the values assigned to a CT step. Analogously, the dihedral 5′-PSBB-3′ is implemented by means of eq , where kdihedral and ϕ0 are set to the values corresponding to CT. The same is repeated for all the step-dependent bonded interactions considered, so that 16 different values are considered for the corresponding parameters depending on the involved step. Some local features are symmetric with respect to the 5′-3′ direction. The corresponding bonded interactions were then implemented as being dependent only on the corresponding base pair. For instance, for the bond SB we considered four possibilities (A, C, G, and T), while for the bond BB-WC there were just two choices (AT or CG). Yet, this assumes that these local features do not depend on the neighboring steps, which is not always the case. As an example, we report in Figure the averages obtained by coarse-graining the atomistic simulations for the angles SBB (Figure a) and the dihedrals SBBS (Figure b). The angles SBB are obtained by considering a sugar, the base attached to it, and the corresponding Watson–Crick partner. For instance, we indicate as SAT the angle formed by a sugar, the adenine attached to it, and the thymine paired with the adenine. As for the dihedrals SBBS, we similarly consider a Watson–Crick base pair and the corresponding sugars. In Figure , the points are color-coded according to a classification based on a single base pair. Hence, four possible choices are present for SBB, depending on the base being attached to the sugar (A, C, G, T), while only two choices are available for SBBS, corresponding to the Watson–Crick base pairs AT and CG. As shown in Figure a, the data for SBB nicely cluster around their average values, indicating that this angle is in practice independent of the rest of the sequence of the hosting DNA molecule and that it can thus be considered for bonded interactions depending only on the base pair. In contrast, the dihedrals SBBS show a wide variability, particularly in the case of CG base pairs (Figure b). This demonstrates that these dihedrals depend on their neighborhood and are thus not apt for implementing bonded interactions depending only on the base pair. As we show in the Results and Discussion, the heterogeneity of the SBBS dihedrals is quantitatively captured by MADna as an emergent property resulting from the interplay between the other bonded interactions. We also note that this is a distinguishing feature of MADna when compared to the TIS[65] and 3SPN2[71] models, where instead the base-pair-dependent SBBS interactions are introduced as part of the model, thus preventing the possibility of capturing the heterogeneity suggested by the atomistic simulations. The choice of the bonded interactions was obtained in practice by trial and error: starting from a far too simplistic description, where the beads were barely kept together by a minimal set of bonds, we incrementally added bonded interactions. At each iteration, we ran simulations with the coarse-grained model and computed the averages obtained for local conformational features that were not yet directly implemented. We then compared these averages with the results obtained by coarse-graining the atomistic trajectories. If the comparison was not good, we then proceeded to add other bonded interactions. The iterative procedure stopped when we reached a satisfactory comparison. We also considered an opposite approach, where we started by considering far too many bonded interactions and deleted some of them until further eliminations resulted in disrupting the structure. The process has some arbitrarity in the properties being monitored, the order in which we add bonded interaction, etc. Yet, the high fidelity of the coarse-grained model in reproducing the atomistic values (as shown in the Results and Discussion) validates the final choice of bonded parameters that we considered. The final set of employed bonds, angles, and dihedrals is reported in Figure , together with some representative examples based on the dsDNA molecule sketched in Figure . Some of these bonded interactions are directly related to physically meaningful features, such as hydrogen bonds between Watson–Crick pairs (labeled as BB-WC in Figure ) and stacking interactions (5′-BB-3′ in Figure ). Summing up, there are in total 202 distinct bonded interactions in the model. Further details are reported in Section S2 in the Supporting Information.

Figure 3

List of the various bonded interactions considered in the model, together with representative examples based on the same molecule as in Figure . Step-dependent bonded interactions are indicated by the presence of the tags 5′ and 3′ in their label. For interstrand interactions, the corresponding label is underlined. The letters present in the labels indicate sugars (S), phosphate groups (P), or generic bases (B). For clarity, all the selected examples involve beads belonging to the same strand, whose 5′-3′ direction is indicated by the arrow in the top-left panel. Importantly, since all the bonded interactions involve at most two consecutive base pairs, any feature at scales larger than a single step is an emergent property originating from the propagation of the local interactions in combination with the electrostatic repulsion between phosphates (see below).

Excluded Volume

Excluded-volume interactions were implemented by means of a Weeks–Chandler–Andersen (WCA) potential, i.e., by retaining the repulsive part of a Lennard-Jones interaction:In the previous formula, r is the distance between the two particles, ϵ = 1 kcal/mol, while σ depends on the bead considered, as reported in Table S1 in the Supporting Information.

Electrostatics

In order to account for the presence of charges along the backbone, a charge q·e0 was assigned to the beads corresponding to phosphate groups, with e0 = 1.6 × 10–19 C being the elementary charge. Although each phosphate carries a unit negative charge, in order to account for counterions condensation an effective reduced value q = −0.6 was considered.[65,71,72] The salt-induced electrostatic screening was modeled via a Debye–Hückel interaction:[73]In the previous formula, ϵ0 = 8.859 × 10–12 F/m is the absolute permittivity; ϵr = 78.3 is the relative permittivity of water; and lD is the Debye length, defined aswhere kB = 1.38 × 10–23 J/K is Boltzmann’s constant; T is the temperature of the system in Kelvin; NA = 6.022 × 1023 mol–1 is Avogadro’s constant; and I is the ionic strength of the solution in mM.

Determination of Parameters

The parameters of the various sequence-dependent bonded interactions were tuned in order to reproduce the results of atomistic simulations from ref (52), which were performed in AMBER14[74] with the parm99 force field[75] including the bsc0 modifications.[76] In those simulations, dsDNA molecules with different sequences were pulled under the action of forces ranging from 1 to 20 pN. The main details of the atomistic simulations are reported in Section S2.1 in the Supporting Information. Here, we used a subset of the atomistic simulations covering all possible base steps (we refer to this subset as “training sequences”, see Section S3.1 in the Supporting Information for the sequences) to determine the bonded parameters, while the rest of sequences were employed as test cases to benchmark the coarse-grained model (“testing sequences”, also reported in Section S3.1 in the Supporting Information). It has to be noted that, although parm99+bsc0 simulations reproduce well most structural features determined in experiments,[77] more precise modifications have been recently introduced, namely bsc1[78] and OL15.[79] Our reason to choose bsc0 was rooted in the convenience of having the atomistic data already available in our group,[46,52] together with the reasonable precision of these modifications.[77] Future developments of MADna will include the reparameterization of the model via atomistic simulations based on bsc1 or OL15. A first estimation of the parameters was obtained via Boltzmann inversion of the all-atom simulations performed at 1 pN. In this regard, the atomistic trajectories were first coarse-grained according to the three-beads representation. Then, for each bond, angle, and dihedral the ensemble averages were computed from the coarse-grained trajectories, which enabled fixing the values of r0, θ0, and ϕ0. Analogously, the elastic constants were determined in order to reproduce the size of fluctuations starting from eqs , 2, and 3. Further details on this procedure can be found in Section S2.2 in the Supporting Information. Coarse-grained simulations performed using the obtained force field showed that the set of parameters obtained by Boltzmann inversion provides reasonable values for the elastic constants (see Figure S1 in the Supporting Information). Nevertheless, some of the parameters were further tuned in order to improve the quantitative agreement with the atomistic simulations, focusing on reproducing the elastic constants of the training sequences (see Results and Discussion and Section S2.2 in the Supporting Information).

Molecular Dynamics Simulations

All the simulations were performed in LAMMPS (http://lammps.sandia.gov[69]). The temperature T = 300 K was maintained through a Langevin thermostat with damping constant τdamp = 20 ps. The integration step was set to 20 fs.

Benchmark Simulations

In order to benchmark the coarse-grained model, we performed pulling simulations of the training and testing sequences following the same protocol as in refs (46 and 52). A pulling force f was applied to the center of mass ξ2 of the sugars belonging to the second base pair. Analogously, an opposite force −f was applied to the center of mass ξ of the second-to-last base pair (lseq is the length of the sequence) . The direction of the forces was taken along the line connecting ξ2 and ξ. Simulations were performed for f = 1, 5, 10, 15, and 20 pN. For each simulation, the system was initialized by considering the average structure created by the molecular builder (defined below), and the equations of motion were integrated for 20 ns. In order to ensure full equilibration, only the last 10 ns were considered for analysis. The convergence of a representative simulation is reported in Figure S8 in the Supporting Information. For each sequence and force, 100 independent simulations were performed. The ionic strength I was set at the same value as in the atomistic simulations from refs (46 and 52) and was computed as I = Nions/NAV, where Nions is the number of counterions and V is the volume of the simulation box.

Persistence-Length Simulations

For the computation of the sequence-averaged persistence length, we considered 20 random sequences made of 100 base pairs. The sequences are listed in Section S3.2 in the Supporting Information. For each realization, the molecule was initialized by considering the average structure obtained by the molecular builder (see below), and the simulation was run for 500 ns. In order to ensure full equilibration, the first 100 ns were discarded from the analysis. The convergence of a representative trajectory is reported in Figure S9 in the Supporting Information. For each sequence, 10 independent simulations were performed. The ionic strength was set at I = 150 mM. In a second set of simulations, we studied the sequence dependence of persistence length and helical pitch and compared our results with the experiments from ref (28). We considered 14 sequences of 100 base pairs obtained as the central fragments of the corresponding experimental ones.[28] The sequences are listed in Section S3.2 in the Supporting Information. The same protocol as for the sequence-averaged study was applied. The ionic strength was set at I = 1000 mM to ease the comparison with the predictions of CGDNA.[56] The convergence of a representative trajectory is reported in Figure S10 in the Supporting Information. Simulations were performed with MADna and, for comparison, with oxDNA2[80] (using the LAMMPS implementation[81] and considering sequence-dependent stacking interactions) and the sequence-dependent model 3SPN2C.[82]

Stretch–Torsion Simulations

In order to determine the elastic constants, a simulation setup was considered where a 40bp dsDNA molecule was subjected to the simultaneous action of a pulling force f and a torque τ directed along the z axis and with constant magnitude. For each simulation, the molecule was initialized via the molecular builder and was aligned with the z axis. The two bottom base pairs and the corresponding sugars and phosphates were tethered to their initial position via harmonic constraints with constant 100 kcal/mol Å2. A constant pulling force f oriented along the positive z direction was exerted on the center of the sugars belonging to the second base pair counting from the top. A harmonic constraint with constant 100 kcal/mol Å2 was also applied to the x and y coordinates of the center to keep the molecule aligned with the z axis and avoid spurious effects originating from the overall orientational entropy.[83] Finally, a constant torque τ was imposed on the two top base pairs and the corresponding sugars and phosphates. Simulations were performed at τ = 0 for values of the force f = 2, 4, 6, 8, 10, 15, 20, 25, 30, 35, and 40 pN and at f = 2 pN for values of the torque τ = 0, 5, 10, 15, 20, 25, and 30 pN·nm. To account for sequence-induced heterogeneity, five random sequences were generated, which we refer to as sequences ST1, ..., ST5 (the sequences are listed in Section S3.3 in the Supporting Information). Moreover, to check the dependence of the elastic constants on the length of the dsDNA molecule, we also performed simulations on shorter sequences containing 20 base pairs, which we refer to as sequences ST1-short, ..., ST5-short (the sequences are listed in Section S3.3 in the Supporting Information). For each combination of sequence, force, and torque, 10 independent simulations of length 250 ns were run. The first 50 ns were not considered for analysis in order to let the system equilibrate. The convergence of a representative trajectory is reported in Figure S11 in the Supporting Information. Simulations were performed with MADna, oxDNA2, and 3SPN2C. In all cases, the ionic strength was set at 150 mM. The four base pairs at each end were discarded from the analysis. Finally, in order to study the effect of sequence on the elasticity, we considered another set of simulations with phased A-tracts, which are listed in Section S3.3 in the Supporting Information. The same protocol as for the rest of the simulations was applied.

Twist–Bend Coupling Simulations

In order to determine the softer response to torque due to the coupling between twist and bending, we performed simulations similar to those of the previous section. Here, 150bp long sequences were placed under the action of a pulling force directed along the z axis and acting on one end of the molecule. The other end was tethered with the same protocol as in the previous section. Note that the pulled end is here free to move in the x and y directions in order to let the molecule bend. Simulations were performed for values of the force f in steps of 0.05 pN within the range 0.05 pN ≤ f ≤ 0.50 pN and in steps of 0.25 pN within the range 0.50 pN ≤ f ≤ 2.50 pN. Three different sequences were considered, which we refer to as TB1, TB2, and TB3 (the sequences are listed in Section S3.4 in the Supporting Information). For each combination of sequence and force, three independent simulations of length 2000 ns were run. The first 1000 ns were not considered for analysis in order to let the system equilibrate. The convergence of a representative trajectory is reported in Figure S12 in the Supporting Information. Simulations were performed with MADna, 3SPN2C, and oxDNA2.

Definitions and Protocols

Basic Definitions

We will indicate as P, S, and B the position vectors of the beads corresponding to the ith phosphate group, sugar, and base, respectively, and located on strand s = 1, 2. The index i is counted in the 5′-3′ direction of the arbitrarily chosen strand 1. Note that i = 1, ..., n for sugar and base beads, while i = 2, ..., n for the phosphate groups, where n is the total number of base pairs (compare Figure ). Hence, there are in total N = 2[(n – 1) + n + n] = 6n – 2 beads, where the factor of 2 is included to account for the presence of the two strands. Handles were always discarded from the analysis, so the various indexes introduced here refer to the subfragment under consideration.

Crookedness

The crookedness β accounts for the deviation of a dsDNA molecule from a straight line[52] (Figure b). For short fragments (such as the training and testing sequences considered here), thermally induced bending is negligible; hence, in this case β effectively quantifies the spontaneous bending of the molecule. The definition of β is here modified with respect to its original formulation to adapt it to the coarse-grained context. The contour length L of the line connecting the base-pair centers along the molecule is L = ∑ |Γ – Γ|, where Γ ≡ (B1, + B2,)/2. The crookedness β is defined asIf the line connecting the centers is perfectly straight, then β = 0. The more curved the line, the larger the corresponding β. As a reference, we computed the crookedness for the conformations obtained via NAB[70] for the training and testing sequences, obtaining an average value β ≃ 0.17.

Figure 4

Sketches showing pictorially the definitions to characterize the geometry of DNA. (a) The helical axis is obtained as the axis of the best-fitting cylinder. (b) The crookedness is obtained by computing the arccosine of the ratio between the end-to-end distance (black arrow) and the contour-length of DNA (red line). These quantities are obtained along the line formed by the points Γ, which are determined as the centers between bases belonging to the same pair (inset). (c) Grooves are defined by considering the lines interpolating the phosphate beads (brown and black lines). For any couple of phosphates on the first strand (P1, and P1, in this case), we define the midpoint (M). From the midpoint, we find the closest points on the second strand. The groove widths are obtained as the corresponding minimum distances (orange segments), suitably shifted to account for the excluded volume of the backbone. (d) The h-twist is defined by considering the vectors ζ ≡ S2, – S1, joining the two sugars within each base. The vectors ζ are projected onto the plane perpendicular to the helical axis, thus obtaining ζ. The h-twist is then defined as the angle depicted in red, corresponding to cos h-twist = ζ·ζ·e) The h-rise is defined by considering the geometrical centers of the sugars ξ ≡ (S1, + S2,)/2 and projecting the vector separating two consecutive centers onto the helical axis, thus obtaining h-rise = (ξ – ξ)·ĥ, corresponding to the red segment in the figure.

Helical Parameters

Due to the limited information on local coordinates, the definition of the helical parameters has to be adapted to the coarse-grained framework. Particularly, the representation of the bases as single beads prevents defining the local reference frames, which are crucial for the standard definitions of helical parameters.[84,85] Taking advantage of the short length of the training and testing sequences, the helical axis ĥ in this case was defined globally as the axis of the cylinder best fitting the position of phosphates in the double helix (Figure a; see Section S4.1 in the Supporting Information for the technical details), which was also employed to estimate the diameter of the DNA. In order to define the h-twist, we first introduce the sugar separation vector ζ ≡ S2,–S1, (Figure d). The h-twist for the step i, i + 1 was defined by cos(h-twist) ≡ ζ̂·ζ̂, where ζ̂ ≡ ζ/|ζ| and ζ ≡ ζ – (ζ·ĥ) ĥ is the projection of ζ onto the plane perpendicular to the helical axis. Following the usual convention,[84] the sign of the h-twist was determined according to the sign of the product (ζ × ζ)·ĥ. The employment of the positions of the sugars in the definition of the h-twist was motivated by the high correlation found with the standard definition in 3DNA[84] (see Figure S2 in the Supporting Information). Finally, the h-rise for the step i, i + 1 was computed as h-rise = (ξ–ξ)·ĥ, where ξ ≡ (S1, + S2,)/2 is the center of the sugars of base pair i (Figure e).

Grooves

For a visual description of the definition of groove geometry, see Figure c. The quantification of the geometry of grooves was assessed in a similar way as in Curves+.[85] The positions of the phosphate groups in each strand were interpolated via centripetal Catmull-Rom splines.[86] Particularly, the helical fragment connecting P and P was interpolated by considering P, P, P, and P as control points. For each of the terminal phosphates, the missing external control point was built by extrapolation of the h-rise and h-twist of the last step. In order to determine the groove geometry corresponding to base pair i, we first considered the midpoint M1, between P1, and P1, along the interpolated curve. Then, starting from the analogous midpoint M2, on the second strand, we followed the corresponding interpolating curve in both directions while computing the distance from M1, until a local minimum in the distance was first encountered at two points denoted as F+, and F–,. From the two identified minimal distances, we subtracted 0.58 nm to account for the van der Waals radius of the backbone[85] and assigned the resulting values to the width of the major and minor groove. In order to compute the depth of each groove, we considered the midpoints on the width vectors, i.e., 0.5(M1, + F+,) and 0.5(M1, + F–,) and computed their distance from the base-pair center Γ. Again, in order to account for the van der Waals radii, we estimated the final groove depths by subtracting 0.35 nm from the calculated distance.[85]

Determination of Elastic Constants

In order to interpret single-molecule experiments, dsDNA is often modeled as an elastic rod. When a pulling force f and a torque τ are exerted along the direction of the rod, the elastic energy U reads[22,31]where S is the stretching modulus; C is the twist modulus; L is the extension; θ is the twist in radians; ΔL ≡ L – L0; and Δθ ≡ θ – θ0, with L0 and θ0 being the equilibrium extension and twist at zero force and torque. On the right-hand side of eq , the first and second terms are stretching and twisting energy, respectively; the third term is the stretch–twist coupling energy; the fourth and fifth terms are the work performed by the external force and torque, respectively. Simultaneous minimization with respect to extension and force (∂U/∂ΔL = 0 and ∂U/∂Δθ = 0) leads to the following equations for the elastic equilibrium:where S̃ ≡ S – g2/C is the effective stretching modulus. From the previous formulas, the sign of g can be immediately ascertained by looking at the behavior of ΔL as a function of τ at constant force, with positive and negative correlation implying g < 0 and g > 0, respectively (an analogous response is obtained at constant τ for Δθ as a function of f). For a quantitative assessment of the elastic constants, three equations are needed. Different strategies were pursued for the two simulation protocols considered in this work.

Benchmark Simulations

In this case, the torsion angle is estimated as θ = ∑h-twist. The extension is computed as L = |Γ – Γ1|. Moreover, no torque is applied, τ = 0. From eq , one has ΔL = A1f and Δθ = A2f, with A1 ≡ L0/S̃ and A2 ≡ −gL0/CS̃. The third equation is provided by applying the equipartition theorem to eq and neglecting the twist–stretch coupling term.[31] This is not a problem for the present purposes since it affects equally the analysis of both atomistic and coarse-grained results, thus not jeopardizing the quality of their comparison, although this detail should be kept in mind for the physical interpretation of the results. Application of the equipartition theorem gives var(Δθ) = ⟨(Δθ)2⟩ – ⟨Δθ⟩2 = kBTL0/C. Inverting these formulas, one findsFitting the linear dependence of L and θ on the pulling force f, we computed the constants L0, θ0, A1, and A2. var(Δθ) was instead calculated directly from the simulations performed at f = 1 pN. Plugging the values of these quantities into eq , we finally obtained the values of S̃, C, and g reported in Figure .

Figure 6

Scatter plot comparing atomistic and coarse-grained results for the effective stretching modulus S̃ (panel a), the crookedness rigidity kβ (b), the torsion modulus C (c), and the twist–stretch coupling constant g (d). Training and testing sequences are denoted by red circles and green diamonds, respectively. Black lines indicate the bisector of the first quadrant. For each panel, the Pearson coefficient indicating the linear correlation between the data sets corresponding to the testing sequences is reported.

As for the crookedness constant kβ, following ref (52) we define it via the relation cos β = cos β0(1 + f/kβ), where β0 is the crookedness at zero force. kβ is then suitably extracted from the slope of cos β versus f.

Stretch–Torsion Simulations

Taking advantage of the directionality imposed by the external force, the torsion angle in this case is computed as θ = ∑ψ, where the angle ψ is calculated as the h-twist but considering the rejection of ζ from the direction of the force instead of the helical axis. In the same way as done above, for τ = 0 one has ΔL = A1f, with A1 ≡ L0/S̃. At variance with the benchmark simulations, the presence of an imposed torque enables deriving all the equations without resorting to the analysis of the fluctuations. Indeed, for a constant pulling force f = f0, ΔL = constant + A3τ and Δθ = constant + A4τ, with A3 = −gL0/CS̃ and A4 = L0S/CS̃ = L0/C(1 + g2/CS̃). From the knowledge of the constants A1, A3, and A4 one can obtain the elastic constants asFitting the linear dependence of ΔL on the pulling force f at τ = 0, we computed the constants L0 and A1. Analogously, fitting of ΔL and Δθ versus τ at f = 2 pN enabled the computation of A3 and A4. Plugging the values of these quantities into eq , we finally obtained the values of S̃, C, and g reported in Figure .

Figure 7

(a) Schematic description of the simulation setup for the stretch–torsion simulations, prescribing a constant force and torque applied along a fixed direction. (b) Twist response to the external force in the absence of imposed torque for MADna (orange circles), oxDNA2 (green squares), 3SPN2C (blue diamonds), and rotor-bead experiments[31] (gray triangles). The black dashed line indicates Δθ = 0; therefore, overwinding and unwinding responses are characterized by points lying above and below the line, respectively. The simulation data correspond to the average of the five sequences considered. (c) Effective stretching modulus S̃, torsion modulus C, and twist–stretch coupling constant g obtained by the three models for the five sequences reported in Section S3.3 in the Supporting Information. Symbols are the same as in panel b. The gray triangles correspond to experimental measures obtained for unrelated sequences in refs (17−20) for S̃, refs (20, 27, 31, and 92) for C, and refs (20, 31, 32, and 93) for g. In the case of g, the dashed black line denotes g = 0, thus separating two regimes characterized by qualitatively different twist–stretch coupling.

Computation of Persistence Length

In order to compute the persistence length, for each base pair i we considered the geometrical center ξ = (S1, + S2,)/2 of the sugar beads. By introducing the displacement vector R ≡ ξ – ξ, the ith tangent vector is defined as t̂ ≡ R/|R|. The contour length l separating the points of application of two consecutive tangent vectors is computed as the average modulus of the displacement vector. The overall correlation function cp is defined aswhere s is the contour length, while ⟨···⟩th, denotes the average over both the conformational ensemble and the value chosen for i. Therefore, cp computes the decay of the orientation of the tangent vectors by taking thermal fluctuations into account. Analogously, the static correlation function cs is defined aswhere min is the ith tangent vector computed on the minimum-energy structure and ⟨···⟩ denotes the average over the value chosen for i. The static correlation function quantifies the decay of the orientation of tangent vectors along the conformation corresponding to the ground state. For each sequence, the minimum-energy conformation was estimated by simulated annealing: starting from the average structure obtained via the molecular builder, we ran 100 short simulations at 300 K, after which the temperature was quenched at 0 K, while maintaining the Debye length corresponding to 300 K (this ensured that the minimum-energy configuration corresponded to the electrostatics at the temperature of interest). The ground state was then estimated as the final conformation of the quenched simulation attaining the smallest value of the total energy. Finally, following ref (56), the dynamic correlation function is defined asThis is in fact equivalent to impose, for each sequence, that the harmonic relation 1/lp = 1/ls + 1/ld proposed by Trifonov and co-workers holds exactly.[56,87]

Errors

Indeterminacies on the computed quantities were estimated according to the procedures below. Note that error bars are not present in the figures whenever they are smaller than the size of symbols.

Observables

For all coarse-grained simulations, several independent realizations were performed for each case, whose number depends on the particular set of simulations considered (see above). For each computed observable, the indeterminacy on the average was obtained as the standard error of the mean computed across the independent realizations. As for var(Δθ), the standard error of the variance was considered. In the case of the atomistic trajectories, the time series of each observable was decorrelated by performing a block analysis, from which the error was then estimated.[88]

Elastic Constants

In order to account for the errors on the elastic constants, the following procedure was employed. In all cases, the constants were obtained starting from a linear relation y = ax + y0, where x is either the force or the torque, while y is a certain observable (e.g., the extension). Let be the average values of the variables, and let dy be the error associated with y (x does not have an associated error). Assuming the values of y to be independent and distributed normally, we iteratively considered sets of variables y̅ extracted from Gaussians with average y and standard deviation dy, where indicates the particular realization considered. For each such realization, we obtained the fitting parameters aα and y0,α by linear regression. The mean slope was finally obtained as the average , while the associated error was estimated as the corresponding standard deviation. We checked numerically that a satisfying convergence was obtained for .

Molecular Builder

The results obtained for the training sequences were employed to devise a molecular builder, which provides sequence-dependent average structures to be used as initial configurations. For each step XY, with both X and Y chosen among the four bases A, C, G, and T, we considered the simulations run for the training sequence which contains it. For instance, the sequence PolyAA was considered for the steps AA and TT. The corresponding trajectories were aligned in order to minimize the root-mean-square deviation (rmsd) of the central step XY, computed according to the Kabsch algorithm.[89] After this operation, the average coordinates of the 10 beads belonging to the step (four sugars, four bases, and two phosphates) were computed. These coordinates can be employed to build the average structure of a molecule, where the junctions between adjacent steps are aligned in order to minimize the rmsd of the overlapping base pair. As an example, let us consider the molecule with sequence 5′-ACT-3′. Two steps are present: AC and CT. The molecule is initiated by considering the average coordinates of the step AC. At this point, we note that the four beads (two sugars and two bases) belonging to the second base pair of the step AC are the same as the four beads belonging to the first base pair of the next step, CT. The average coordinates of the latter are then translated and rotated in order to minimize the rmsd between the two sets of coordinates for the overlapping beads. The final coordinates for the four overlapping beads are computed as the mean of the two aligned sets. This procedure can be iterated for molecules of arbitrary length. As a last step, the final structure is translated so that the origin corresponds to the first sugar bead and is subsequently aligned to the z axis. The molecular builder is provided at https://github.com/saassenza/MADna/tree/main/LAMMPS and, using as input only the sequence, creates both the initial coordinates and the double-stranded topology in LAMMPS format.

Results and Discussion

Summary of the Coarse-Grained Model

In line with previous work,[57,65] the coarse grain considered in MADna describes each nucleotide by means of three beads, each centered on the sugar, phosphate group, and base, respectively (Figure ). The beads interact with each other via steric interactions and, in the case of phosphate groups, electrostatic repulsion. The double-stranded molecular structure is maintained by introducing bonded interactions (eqs , 2, and 3 and Figure ), with parameters depending on the sequence up to the level of a single step. The parameters were determined by Boltzmann inversion of atomistic trajectories from the literature,[52] whose sequences encompass all the possible steps (hereby referred to as training sequences, see section S3.1 in the Supporting Information for details). Further tuning was performed in order to reproduce the elastic response to an external pulling force computed for the same set of sequences. Full details of the model and the coarse-graining procedure are found in the Methods and in Sections S1 and S2 in the Supporting Information.

MADna Reproduces Conformational Features from Atomistic Simulations

In order to benchmark the optimized simulation setup, we proceeded with a systematic comparison between coarse-grained predictions and results from atomistic simulations. We performed coarse-grained simulations of dsDNA molecules under the action of a pulling force f ranging in the interval 1–20 pN, following the same protocol as in refs (46 and 52) (see Methods for details on the implementation). Apart from the training sequences, we considered a second, independent set of molecules (testing sequences), which has also been studied in refs (46 and 52). The testing sequences are reported in Section S3.1 in the Supporting Information and include biologically relevant structures as well as synthetic A-tracts fragments. Both conformational and elastic properties were considered for the analysis. The study of various conformational quantities is reported in Figure . For each observable, a scatter plot is depicted to compare coarse-grained and atomistic results at the smallest force considered, f = 1 pN. Training and testing sequences are indicated by red circles and green diamonds, respectively, while the black line indicates the bisector of the first and third quadrant. Overall, an excellent agreement is found, as indicated by the localization of the plotted points in the vicinity of the bisector and by the large values attained by the Pearson coefficient for all the considered quantities. In Figure a, we plot the crookedness β,[52] a dimensionless parameter accounting for the global deviation of the helical axis with respect to a straight line, i.e., the presence of a spontaneous curvature. Larger values of β correspond to more curved structures, and it was found that A-tracts show the straightest conformations.[52] In Figure b, we analyze the helical diameter. Also in this case, the coarse-grained model faithfully captures the dependence on sequence within a range spanning about 0.2 nm, although a slight overestimation of the atomistic values can be appreciated. In Figure c–h we plot the comparison of various helical features, namely h-rise (Figure c), h-twist (Figure d), and groove geometry (Figure e–h). The coarse -grained predictions closely follow the sequence dependence of the atomistic values, although a systematic overestimation of about 0.2 nm is present in the case of the depth of the major groove (Figure g). Finally, in Figure i we compare the values for the local dihedral SBBS, i.e., the dihedral formed by the sugars and bases within each base pair (inset in Figure i). Again, MADna reproduces with high precision the large variability induced by sequence heterogeneity, showing its ability to capture features emerging from the interaction between neighboring base pairs. Scatter plot comparing atomistic and coarse-grained results at f = 1 pN for various structural features: crookedness β (panel a); helical diameter (b); helical rise (c); helical twist (d); width of major (e) and minor (f) groove; depth of major (g) and minor (h) groove; SBBS dihedrals (i). Training and testing sequences are denoted by red circles and green diamonds, respectively. Black lines indicate the bisector of the first and third quadrant. For each panel, the Pearson coefficient indicating the linear correlation between the two data sets is reported. The atomistic results were obtained by coarse-graining the trajectories obtained from all-atom simulations and performing the analysis reported in the Methods. The close agreement between coarse grain and atomistic simulations indicates the high accuracy of MADna in describing the conformational properties of dsDNA. Several striking features are worth mentioning. First, as mentioned above, the testing sequences were not employed to build the model, so that in this case the coarse-grained results are pure predictions. Second, none of the quantities reported in Figure were directly used to build the model; thus, a quantitative agreement is not trivial also in the case of the training sequences. Third, these observables are related to different scales, encompassing a single base pair (SBBS), a base step (h-twist and h-rise), multiple-step geometry (grooves), and the molecule as a whole (β and diameter). Fourth, the selected quantities have a stark dependence on sequence, as shown by their wide range of variability, indicating that the model captures also this intrinsic heterogeneity. Particularly striking is the case of the dihedral SBBS: despite the only two possible choices for the bases (the Watson–Crick pairs AT and CG), this quantity displays a strong variability, ranging from −30 to 10 deg (Figure i). This is evidently an emergent behavior induced by the interaction of these base pairs with their neighbors and is perfectly reproduced by the coarse-grained model.

MADna Reproduces Elastic Constants Obtained from Atomistic Simulations

Next, we focused on the elastic properties of training and testing sequences (Figure ). The effective stretching modulus S̃ (Figure a) and the crookedness elastic constant kβ (Figure b) are related to the elastic response of extension and crookedness to the external force, respectively, and are obtained as the slopes of the corresponding observables as a function of f (see Methods and Figure S3 in the Supporting Information). The torsional modulus C (Figure c) accounts for the change of the h-twist upon application of an external torque. For the present setup, C was computed by analyzing the fluctuations of the h-twist for f = 1 pN. Finally, the twist–stretch coupling constant g (Figure d) quantifies the torsional response to the external force. The negative sign displayed by g (Figure d) implies that the molecule overwinds when stretched, in agreement with experimental observations.[31,32] Precise definitions of the elastic constants and their computation are reported in the Methods. Scatter plot comparing atomistic and coarse-grained results for the effective stretching modulus S̃ (panel a), the crookedness rigidity kβ (b), the torsion modulus C (c), and the twist–stretch coupling constant g (d). Training and testing sequences are denoted by red circles and green diamonds, respectively. Black lines indicate the bisector of the first quadrant. For each panel, the Pearson coefficient indicating the linear correlation between the data sets corresponding to the testing sequences is reported. As discussed in the Methods and in Section S2 in the Supporting Information, the values of S̃, kβ, C, and g for the training sequences were used to refine the parameters of the bonded interactions obtained by the Boltzmann inversion of the atomistic trajectories. Due to the large number of bonded parameters, we opted for a pragmatic approach and adjusted a minimal subset of parameters. Particularly, we empirically observed that a major impact on the elastic constants was obtained by tuning the rigidities of bond 5′-BB-3′, angle 5′-PSB-3′, and dihedral 5′-SPSP-3′. Interestingly, these three bonded interactions have a clear physical meaning (compare Figure ). Indeed, the bond 5′-BB-3′ accounts for stacking interactions, which are likely to affect the stretching stiffness[52] as well as the coupling between twist and stretch deformations. Moreover, the range of values encompassed by the angle 5′-PSB-3′ is related to the flexibility of the sugar pucker, which has been shown to play a key role in dsDNA elasticity.[46] Finally, the 5′-SPSP-3′ dihedral is likely to affect the backbone response and was found to be the quantity most affecting the crookedness rigidity. In this context, it is also worth mentioning that the BB-WC elastic constants for the pairs AT and CG approximately follow the ratio 2:3 (see Table S2 in the Supporting Information). The BB-WC bonds account for the interactions between Watson–Crick base pairs; hence, this ratio nicely reflects the presence of two and three hydrogen bonds for AT and CG, respectively. In Figure , we show the scatter plots comparing S̃, kβ, C, and g for both the training and the testing sequences, while the corresponding statistical indicators are reported in Table . MADna closely captures the average and sequence-induced standard deviation of all the elastic quantities (Table ). While this is expected for the training sequences, which were used to refine the model, the excellent agreement found for the testing sequences (average within 6% and standard deviation within 30% of the atomistic values) indicates the good performance of our model. The capacity of MADna to describe the detailed sequence dependence can be assessed by looking at Figure . MADna precisely accounts for S̃ and kβ (Figure a and Figure b, respectively), as indicated by the large values obtained for the Pearson coefficient (calculated considering only the testing sequences). In contrast, despite reproducing the average and standard deviation of C, MADna does not capture its detailed sequence dependence, as indicated by the low value of the Pearson coefficient. While this can be due to an intrinsic limit of our model, we note that also the input data set might be at fault: in contrast with S̃ and kβ, which are obtained by considering average values of the corresponding observables, C is determined by computing the fluctuations of the cumulated twist. Fluctuations are knowingly slower to converge when compared to averages, suggesting that the lack of accuracy of MADna might be due to sampling limitations of the atomistic simulations. Since to our knowledge there are currently no experiments addressing the sequence dependence of the twist modulus, and given the excellent performance of MADna in capturing the average value of C (see Figure c below), we decided to leave a thorough assessment of the matter to future investigation. Finally, the twist–stretch coupling g displays a good linear correlation. The modest value of the Pearson coefficient is an expected side effect of the large indeterminacy of the atomistic results, for which the error bars have sizes comparable to the dispersion of the average values. It has to be noted that some outliers are present; in particular for sequence A8T the coarse-grained model predicts a twist–stretch coupling markedly different from the atomistic result g ≃ 0. Nevertheless, as discussed in Section S5.1 in the Supporting Information, this disagreement is likely to originate from a possible lack of convergence of the atomistic simulations at large forces for this specific sequence. Based on this observation, we excluded this point for the quantitative evaluation of the agreement between the two data sets by means of the Pearson coefficient reported in Figure d and for the corresponding statistical indicators in Table .

Table 1

Comparison between the Average Elastic Constants Obtained by MADna and Atomistic Simulations (Compare Figure )a

		training sequences	testing sequences
S̃ (pN)	all atom	1214 (475)	1358 (481)
S̃ (pN)	coarse-grained	1188 (477)	1307 (490)
k_β (pN)	all atom	1617 (1060)	1901 (1226)
k_β (pN)	coarse-grained	1609 (1014)	1790 (894)
C (pN·nm²)	all atom	445 (77)	392 (36)
C (pN·nm²)	coarse-grained	435 (77)	409 (26)
g (pN·nm)	all atom	–216 (63)	–229 (87)
g (pN·nm)	coarse-grained	–219 (62)	–228 (60)

In parentheses the standard deviations are reported.

In parentheses the standard deviations are reported. (a) Schematic description of the simulation setup for the stretch–torsion simulations, prescribing a constant force and torque applied along a fixed direction. (b) Twist response to the external force in the absence of imposed torque for MADna (orange circles), oxDNA2 (green squares), 3SPN2C (blue diamonds), and rotor-bead experiments[31] (gray triangles). The black dashed line indicates Δθ = 0; therefore, overwinding and unwinding responses are characterized by points lying above and below the line, respectively. The simulation data correspond to the average of the five sequences considered. (c) Effective stretching modulus S̃, torsion modulus C, and twist–stretch coupling constant g obtained by the three models for the five sequences reported in Section S3.3 in the Supporting Information. Symbols are the same as in panel b. The gray triangles correspond to experimental measures obtained for unrelated sequences in refs (17−20) for S̃, refs (20, 27, 31, and 92) for C, and refs (20, 31, 32, and 93) for g. In the case of g, the dashed black line denotes g = 0, thus separating two regimes characterized by qualitatively different twist–stretch coupling.

Comparison with Experiments: Sequence-Averaged Persistence Length

The persistence length lp quantifies the bending rigidity of a polymer and is possibly the most characterized mechanical property of dsDNA.[16−19,21,23,29,37,49] In its classical formulation from polymer theory, lp corresponds to the length of the fragment to be considered in order to observe significant thermally induced bending effects.[90] More technically, lp is defined as the decay length of the thermally averaged tangent-vector correlation function cp, that is, cp = exp(−s/lp), where s is the contour length of the polymer fragment under consideration.[90] This definition relies on the assumption that the minimum-energy conformation for the polymer is that of a straight rod. In the case of dsDNA, according to the sequence a spontaneous curvature may be present. In order to account separately for intrinsic bending and thermal fluctuations, it is customary to distinguish the static (ls) and dynamic (ld) contributions to the persistence length, which characterize the decay length of the suitably defined correlation functions cs = exp(−s/ls) (computed from the minimum-energy structure) and cd = exp(−s/ld).[56] The three lengths are approximately related by a harmonic sum, 1/lp ≃ 1/ls+1/ld.[87] In order to compute the persistence length and its contributions, we performed simulations for 20 random sequences of length 100 base pairs. The minimum-energy structures (see Methods) and the equilibrated trajectories were then employed to compute the correlation functions cp, cs, and cd according to eqs , 13, and 14, respectively. The results are reported in Figure S5 in the Supporting Information for various definitions of the tangent vector. Here, we focus on the results of one of such definitions, according to which the tangent vectors are obtained by joining the geometrical centers of sugars of base pairs separated by 10 steps (see Methods). Fitting the correlation function cp with an exponential decay resulted in computed values of the persistence length lp ranging between 46 and 64 nm, with an average equal to lp = 56 ± 1 nm. This is in good agreement with experimental values on random sequences and standard ionic conditions (45–55 nm[16−19,21,23,29,37,49]), particularly since lp was not employed in the parametrization of the model. Other coarse-grained models give predictions for lp within the experimental range,[60,62,65,68,71] although in most of these works the persistence length of double-[60,68,71] or single-stranded[65] DNA was employed as a target quantity in the construction of the force field. In the present case, the slight overestimation of lp is in line with previous results of atomistic simulations, where an average lp = 57 ± 3 was found.[49] As expected,[56] for each sequence the static correlation function c is found to decrease more slowly than cp (Figure S5 in the Supporting Information) since it lacks the disordering action of thermal fluctuations. The corresponding static persistence length ls was computed by fitting cs via an exponential decay, in analogy to cp. The obtained values vary widely, ranging from 199 to 796 nm, with an average equal to ls = 485 ± 42 nm. This is somewhat smaller than previous estimations (ls = 576 ± 191 nm[49]), although we note that this value varies wildly by considering different sequences and, even for the same sequences, by employing different definitions (see below). As discussed in ref (49), its large heterogeneity may rationalize the markedly different results reported in experiments, where values of ls as different as 130 nm and >1000 nm have been estimated.[16,91] Fitting cd via an exponential decay yields values for the dynamic persistence length ld ranging between 56 and 77 nm, with an average equal to ld = 64 ± 1 nm. This prediction is in line with a previous report based on atomistic simulations[49] (ld = 64.7 ± 1.4 nm) and lies between the results obtained from cryoelectron microscopy[16] (ld = 82 ± 15 nm) and cyclization experiments[91] (ld = 50 ± 1 nm). The difference between the two experimental values might be ascribed to the different techniques employed, for which future simulations with the present model may shed some light. As a final consideration, it has to be noted that the estimation of the persistence length may vary according to the definition of the tangent vectors.[56] Alternative choices are considered in Section S6 in the Supporting Information, yielding average values in the ranges lp = 56–63 nm, ls = 485–1641 nm, and ld = 64–69 nm. This indicates the robustness of the values of lp and ld with respect to the definition of the tangent vectors. The average value of ls is sensitive to the definition, although the observed variation can be partly ascribed to the intrinsic large variability characterizing this quantity.

Comparison with Experiments: Sequence-Averaged Elastic Constants

A further set of simulations was devoted to determine the sequence-averaged elastic constants. These quantities were already estimated for the training and testing sequences (Figure ), but their quantitative determination is likely to depend on the details of the microscopic definitions, particularly for the h-twist. In order to enable a quantitative comparison with experimental values, we designed a simulation framework which avoids relying on such definitions and which is more akin to single-molecule experimental setups. As shown schematically in Figure a, in this set of simulations a dsDNA molecule is subjected to a constant force f and torque τ acting along the z axis. This enables defining unequivocally the twist angle θ starting from the projection of the base pairs onto the xy plane, in analogy with single-molecule experiments based on rotor beads.[31] The linear response of the extension L as a function of f provides access to the effective stretch modulus S̃. Analogously, the torsion modulus C can be computed from the dependence of θ on τ. Finally, the twist–stretch coupling g can be obtained by looking at the cross-dependence, i.e., the dependence of L on τ or the response of θ to changes in f. The quantitative details can be found in the Methods. We performed simulations for five randomly generated sequences of length equal to 40 base pairs at several values of f and τ (see Methods for full details and Section S3.3 in the Supporting Information for the sequences, which are named ST1, ..., ST5). With this choice of sequence length, the molecules are long enough so that several turns of the double helix are present but short enough to neglect the effect of bending. For comparison, we also run simulations for the same set of sequences and parameters by employing the two most widely used coarse-grained models from the literature, namely oxDNA2[80,81] and the sequence-dependent model 3SPN2C.[82] From a qualitative perspective, a marked difference between MADna and the other two models becomes evident when analyzing the change in twist Δθ as a function of the force (Figure b). Indeed, the present model prescribes that dsDNA overwinds when stretched (Δθ > 0), which is in agreement with experimental observations.[31,32] In contrast, neither oxDNA2 nor 3SPN2C capture this feature, predicting unwinding upon pulling (in the case of oxDNA this fact was already observed in the original publication[68]). To our knowledge, no other coarse-grained model available in the literature has been shown to predict this unintuitive behavior of dsDNA. MADna shows better agreement with experiments also from a quantitative standpoint. As shown in Figure c and Table , it predicts values for S̃, C, and g in good quantitative agreement with the experiments. As for the other models, oxDNA2 shows a similar performance for C, while it tends to overestimate S̃. Coherently with the results shown in Figure b, the wrong sign for g is found for both oxDNA2 and 3SPN2C, with the latter showing in general a significant overestimation of the elastic constants. The results reported in Figure c and Table correspond to forces larger than 10 pN, which was chosen as a reasonable threshold to avoid bending effects. Nevertheless, performing the analysis with the full set of forces resulted in a small change in S̃ (roughly 10%) and in virtually no change in C and g. It is also worth mentioning that the range considered for the torque (0–30 pN·nm) corresponds to supercoiling densities below the threshold value σ ≃ 0.05 usually associated with torque-induced denaturation,[22] hence making the results obtained for MADna relevant for real dsDNA molecules within the whole range of applied torques. Particularly, we computed the supercoiling density as σ = Δθ/θ0, since the lack of bending results in the absence of relevant writhe. For the largest torque applied (τ = 30 pN·nm), we found σ ≃ 0.046 for MADna, σ ≃ 0.043 for oxDNA2, and σ ≃ 0.014 for 3SPN2C, which is lower than for the other two models due to the larger value of the twist modulus C.

Table 2

Elastic Constants Obtained by the Various Models and Their Comparison with Experiments (see Figure c)a

	S̃ (pN)	C (pN·nm²)	g (pN·nm)
experiment	1045 ± 92	436 ± 17	–90 ± 10
MADna	1038 ± 21	386 ± 3	–125 ± 6
oxDNA2	1448 ± 5	399 ± 1	+120 ± 1
3SPN2C	2589 ± 71	1180 ± 10	+514 ± 36

Experimental values are obtained as averages of the results reported in refs (17−20) for S̃, refs (20, 27, 31, and 92) for C, and refs (20, 31, 32, and 93) for g, while the corresponding indeterminacies are computed as the standard error of the mean. The values of g obtained for MADna in Table and Table differ by a factor of 2 with respect to each other. This striking disagreement appears to be too large to be simply ascribed to the different sequences considered or the different definitions employed for the twist angle. In this regard, based on atomistic simulations it has been suggested that the stretching modulus depends on the size of the fragment under consideration.[94] Since in the present case we are comparing sequences of size 20 and 40 base pairs, it might be that a similar effect is present for g. In order to check this hypothesis, we performed another set of simulations following the same protocol as in the present section but considering sequences made of 20 base pairs, which are listed as ST1-short, ..., ST5-short in Section S3.3 in the Supporting Information. The resulting elastic constants are reported in Figure S6 in the Supporting Information. Particularly, we found that for the short sequences g = −258 ± 16 pN·nm, which is indeed in line with the results reported in Figure . This confirms the presence of size effects in MADna for the prediction of elastic constants. To our knowledge, this is the first instance in which a length dependence for g has been predicted. This feature might explain the systematic overestimation of g in atomistic simulations that are restricted to short sequences.[46] Another interesting effect observed in experiments is the coupling between twist and bending, which is relevant at pulling forces up to a few piconewtons.[33,34] As a consequence of this coupling, the twist of the DNA molecule appears to be softer; i.e., the effective twist modulus is Ceff < C. In order to study this feature, we considered a set of simulations involving DNA molecules of 150 base pairs, which is comparable to the persistence length, thus enabling the presence of bending fluctuations. The simulation protocol was similar to the case just analyzed, although here no torque was applied. We considered three independent sequences which are reported in Section S3.4 in the Supporting Information. For each sequence, simulations were performed for MADna, oxDNA2, and 3SPN2C at pulling forces ranging between 0.05 and 2.5 pN. Further simulation details can be found in the Methods. In Figure we report the results obtained and compare them with the experimental values from refs (33 and 34). In line with the trend observed in Figure , MADna and oxDNA2 follow quite closely the experimental curve, while 3SPN2C systematically overestimates the value of Ceff.

Figure 8

Effective torsion modulus Ceff as a function of force for MADna (orange circles), oxDNA2 (green squares), 3SPN2C (blue diamonds), and experiments (gray triangles). Experimental data were extracted from refs (33 and 34).

Comparison with Experiments: Sequence-Dependent Conformation and Elasticity

Having characterized and benchmarked the sequence-averaged mechanical properties of MADna, we now turn to sequence-dependent features. In this regard, we considered the sequences studied experimentally in ref (28), where the authors characterized the sequence-dependent persistence length and helical repeat by means of cyclization experiments. We performed simulations for DNA molecules of 100 base pairs obtained by taking the central parts of 14 different experimental sequences as listed in Section S3.2 in the Supporting Information. For each simulated sequence, we computed the helical pitch by dividing the cumulated helical twist by 2π, while for the persistence length we employed the same approach as above. In Figure we report the comparison between experiments and MADna predictions. As the plots show, there is a high correlation between the two data sets, thus indicating that MADna satisfactorily captures the sequence dependence of two key features of DNA conformation (helical pitch) and elasticity (persistence length). From a quantitative perspective, we see from Figure a that the simulated values for the pitch are larger than the experimental ones. Nonetheless, a quantitative comparison has to be performed with care. Indeed, the experimental values were obtained indirectly by analyzing cyclization data by means of a wormlike chain with twist.[28] Although the relative differences observed in the experimental results for the different sequences are robust, the values depend on the validity of this model down to the scale of single steps and on the value assigned to the twist modulus when fitting the data. Moreover, it is likely that the details of the definition of the h-twist in the simulations affect the quantitative determination of the helical pitch. To our knowledge, this is the first instance in which these structural data have been compared to predictions from simulations. As for the persistence length, the strong correlation found has to be mitigated by the slight quantitative overestimation of lp in simulations, in line with the results reported above for the sequence-averaged persistence length. Moreover, the values of lp from simulations appear to be more heterogeneous than in experiments. While this can be ascribed to intrinsic limitations of MADna, it has to be noted that the smaller dispersion of values determined from experiments might originate from some assumptions made in their analysis. Indeed, experimental persistence lengths were obtained by fitting j-factors from cyclization experiments, where the theoretical formulas depend on lp, the helical pitch, the twist modulus C, and the contour length L.[28] The data reported in ref (28) were obtained by assuming sequence-independent C and L, which neglects two sources of sequence-driven heterogeneity, suggesting that the values obtained in experiments might underestimate the actual range of variability of lp.

Figure 9

Comparison between experimental values and MADna predictions for the sequence dependence of the helical pitch (a) and the persistence length lp (b). The lines correspond to the linear fits of the scatter plots and are included as a guide to the eye. The value of the Pearson coefficient is reported in each plot. For comparison, we also performed simulations for oxDNA2 and 3SPN2C, for which the results are reported in Figure S7 in the Supporting Information. We found that oxDNA2 predictions do not correlate with experiments for either the helical pitch or the persistence length. This is expected, since in this model the sequence dependence is implemented by tuning the base-pairing and stacking interactions to account for thermodynamic data but not for the elasticity of DNA. In the case of 3SPN2C, we found a weak correlation for the helical pitch but a strong correlation for the persistence length. This high correlation was also expected, since the persistence-length data from ref (28) were used to parametrize the model.[82] For completeness, we mention that also CGDNA has been used to reproduce these experimental persistence-length data, for which a similar correlation as for MADna was found (r = 0.73).[56] As a further test, we performed stretching simulations for phased A-tracts, i.e., dsDNA molecules obtained by alternating fragments of consecutive adenines and random sequences, with each fragment having a length of 5–10 base pairs. It was experimentally found that the stretching modulus for such molecules is roughly 50% larger than for random sequences.[21] We run some stretching simulations for five phased A-tracts of 40 base pairs, whose sequences were extracted as fragments of the experimental ones[21] and are reported as A-tract-1, ..., A-tract-5 in Section S3.3 in the Supporting Information. Simulation protocols and analysis were the same as in the case of random sequences studied above. The results from the three models were S̃ = (1402 ± 48) pN for MADna, S̃ = (1444 ± 2) pN for oxDNA2, and S̃ = (2793 ± 67) pN for 3SPN2C. By comparing these values with the results reported in Table , we thus conclude that MADna predicts an increase of roughly 35%, oxDNA2 does not predict any increase, and 3SPN2C predicts a 8% increase. As expected, the absence of elasticity-oriented sequence dependence for oxDNA2 prevents it from capturing this feature. The prediction of MADna is the one most closely resembling the significant increase observed experimentally, although still underestimating its extent.

Strengths and Limits

The comparison between MADna, oxDNA2, 3SPN2C, and experiments can be summarized as follows. MADna performs in general better at capturing both sequence-averaged and sequence-dependent conformational and elastic features. As for the other two models, oxDNA2 reproduces well the sequence-averaged properties, but it cannot account for sequence dependence. In contrast, 3SPN2C captures well the sequence dependence of elasticity in the case of the persistence length, while its accuracy for the change in stretch modulus or conformation is more limited. Moreover, it tends in general to overestimate the magnitude of the elastic constants. The accuracy of MADna in addressing the sequence dependence of conformational and elastic properties of dsDNA appoints it as the ideal choice to interpret the outcome of single-molecule experiments, as well as to study the conformational changes induced by mechanic stress, which are relevant for many systems in vivo. Moreover, it can be employed as a virtual laboratory to test analytical theories on DNA elasticity.[95,96] Nonetheless, at the present stage MADna cannot account for breaking events such as the formation of kinks or local melting. Hence, it is important to assess the relevance of such mechanisms for the system under study before drawing conclusions based on simulations performed with the present model. We are currently working to surpass such limitations, in order to further extend the palette of possible systems which can be analyzed through the lens of MADna. Further limitations are introduced by the use of an implicit solvent, which prevents studying DNA solvation. Similarly, the use of reduced charges interacting via the Debye–Hückel potential does not account for ion–ion correlations, which are particularly relevant for multivalent ions.[97] We note that MADna shares these limitations with all the coarse-grained models based on a similar philosophy, including, e.g., oxDNA2 and 3SPN2C. In view of future modeling of the interaction of DNA with proteins by means of MADna, the potential obstacle provided by these limitations can however be overcome by considering a distance-dependent dielectric constant for the electrostatic interaction between charged sites on DNA and proteins.[98]

LAMMPS Implementation and Availability

The standard potentials employed in MADna enable a straightforward implementation of the model in all common Molecular Dynamics simulators. The molecular builder, the main scripts used for the analysis, and some sample scripts for simulation in LAMMPS are provided at https://github.com/saassenza/MADna/tree/main/LAMMPS. The molecular builder takes as input the sequence and provides both the topology and the initial coordinates in LAMMPS format. In order to run MADna in LAMMPS (at the time of writing, the stable version is 29Sep2021), the latter has to be built with the packages MOLECULE, EXTRA-MOLECULE, and EXTRA-PAIR.

Conclusion

We have introduced MADna, a novel coarse-grained model for the simulation of double-stranded DNA. MADna captures the sequence dependence of conformational and elastic properties with accuracy comparable to that of atomistic simulations. Key conformational features which closely follow atomistic results include the main helical parameters, the groove geometry, the diameter of the double helix, and the spontaneous curvature as quantified by the crookedness. The model also predicts sequence-averaged and sequence-dependent elasticity and conformation in agreement with experimental results for a wide set of features, namely the stretching and torsion moduli, the counterintuitive negative twist–stretch coupling, the twist–bend coupling, the persistence length, and the helical pitch. At the present stage, the model does not account for breaking events, which are being addressed in ongoing work. The implementation of MADna in Molecular Dynamics software is straightforward due to the common potentials employed. Sample scripts for LAMMPS and a molecular builder are openly provided on GitHub. Due to both its accuracy and its simplicity of use, we believe that MADna provides a significant addition to the toolbox of coarse-grained simulations and will enable precise theoretical studies of a wide set of large-scale DNA phenomena.

84 in total

Introduction

Methods

Coarse-Grained Model

Bonded Interactions

Excluded Volume

Electrostatics

Determination of Parameters

Molecular Dynamics Simulations

Benchmark Simulations

Persistence-Length Simulations

Stretch–Torsion Simulations

Twist–Bend Coupling Simulations

Definitions and Protocols

Basic Definitions

Crookedness

Helical Parameters

Grooves

Determination of Elastic Constants

Benchmark Simulations

Stretch–Torsion Simulations

Computation of Persistence Length

Errors

Observables

Elastic Constants

Molecular Builder

Results and Discussion

Summary of the Coarse-Grained Model

MADna Reproduces Conformational Features from Atomistic Simulations

MADna Reproduces Elastic Constants Obtained from Atomistic Simulations

Comparison with Experiments: Sequence-Averaged Persistence Length

Comparison with Experiments: Sequence-Averaged Elastic Constants

Comparison with Experiments: Sequence-Dependent Conformation and Elasticity

Strengths and Limits

LAMMPS Implementation and Availability

Conclusion

Review 1. Origins of specificity in protein-DNA recognition.

Review 4. Chromatin remodelling: the industrial revolution of DNA around histones.

Review 10. Poly(dA:dT) tracts: major determinants of nucleosome organization.