Literature DB >> 36032773

Molecular Simulations Matching Denaturation Experiments for N⁶-Methyladenosine.

Valerio Piomponi¹, Thorben Fröhlking¹, Mattia Bernetti¹, Giovanni Bussi¹.

Abstract

Post-transcriptional modifications are crucial for RNA function and can affect its structure and dynamics. Force-field-based classical molecular dynamics simulations are a fundamental tool to characterize biomolecular dynamics, and their application to RNA is flourishing. Here, we show that the set of force-field parameters for N6-methyladenosine (m6A) developed for the commonly used AMBER force field does not reproduce duplex denaturation experiments and, specifically, cannot be used to describe both paired and unpaired states. Then, we use reweighting techniques to derive new parameters matching available experimental data. The resulting force field can be used to properly describe paired and unpaired m6A in both syn and anti conformation, which thus opens the way to the use of molecular simulations to investigate the effects of N6 methylations on RNA structural dynamics.

Entities: Chemical

Year: 2022 PMID： 36032773 PMCID： PMC9413829 DOI： 10.1021/acscentsci.2c00565

Source DB: PubMed Journal: ACS Cent Sci ISSN： 2374-7943 Impact factor: 18.728

Introduction

Post-transcriptionally modified nucleotides are crucial for RNA function.[1,2] Methylation of adenine in the N6 position (m6A) is the most prevalent chemical modification in mRNAs and has been observed in both coding and noncoding RNAs.[1−4] m6A can finely regulate the interaction of RNA with specific proteins known as m6A readers. In addition, similarly to other chemical modifications,[5] it can directly affect RNA stability and structural dynamics (see, e.g., refs (6−8)). Specifically, m6A has been suggested to weaken Watson–Crick pairings due to the incompatibility of its most stable conformation (syn) with duplex formation[9−11] (see also Figure ). Interestingly, recent nuclear magnetic resonance (NMR) experiments have identified syn/anti dynamics in both paired and unpaired m6A, recapitulating the effect of N6 methylation on RNA conformational kinetics.[11] Molecular dynamics (MD) simulations give access to structural dynamics at the atomistic resolution[12] and are thus an ideal tool to complement NMR studies. The capability of classical force fields to predict the dynamics of difficult structural motifs is steadily increasing.[12−14] However, the number of applications of MD simulations to N6-methylated RNAs reported to date is still limited.[11,15−20] Force fields parameters for m6A developed by Aduri et al.[21] were determined in a bottom-up fashion and are compatible with the AMBER force field, which is widely adopted for RNA simulations.[12] These parameters are the default choice for a structure refinement tool.[22] However, they have been quantitatively validated against a limited set of experimental results.[18] The availability of denaturation experiments on a number of m6A-containing duplexes[9,23,24] calls for a more extensive validation of force-field parameters and, ideally, for fitting force-field parameters directly on experiments.[25]

Figure 1

(a) N6-Methyladenosine (m6A) nucleobase in anti (less stable) and syn (more stable) conformations.[9,11] Atom names in red correspond to charges reparametrized in this work. (b) Example of a duplex containing m6A in anti conformation, which is the expected conformation for the nucleotide when Watson–Crick paired. A 6 is used to denote m6A in secondary structures for compactness. (c) Example of a duplex with m6A as a dangling end in syn conformation. The m6A methyl group is highlighted in yellow. In this paper, we validate and improve the parameters introduced in ref (21) by using alchemical free-energy calculations (AFECs).[26] To this end, an unmodified adenine is converted to a modified one by switching on/off nonbonded interactions of specifically chosen atoms, in both single-stranded and double-stranded RNAs[18] (see also Figure ). We then develop a reweighting technique that can be used to predict results corresponding to a different set of charges without the need to perform new MD simulations. Additionally, we extend a recently introduced force-field fitting strategy[27] to be usable in the context of alchemical simulations. The introduced approach allows training six charges and a dihedral potential so as to quantitatively reproduce methylation effects in denaturation experiments. The resulting force field can be used to properly describe paired and unpaired m6A in both syn and anti conformations.

Figure 2

Thermodynamic cycle. Alchemical free-energy calculations (AFECs) allow computing ΔG by integrating along an alchemical path λ describing the transformation of nonmethylated adenosine into m6A, by switching on/off the nonbonded interaction of specifically chosen atoms. The relative free-energy change due to the modification can be estimated as the ΔΔG between AFECs performed on a duplex and on the corresponding single strand. This quantity can be directly compared to the difference in thermodynamic stability of duplexes with or without the modification, which can be measured experimentally through denaturation experiments.

Material and Methods

Simulated Systems

We simulated the isolated m6A nucleoside, 9 m6A-methylated duplexes for which denaturation experiments are available in the literature[9,24] (see Table ), and the corresponding single-stranded RNAs. For the isolated m6A nucleoside, we computed the ΔG by taking the difference in the ΔGs obtained with AFEC by methylating the adenosine in syn or anti conformations. We then chose several systems from ref (9). For systems A4 and A5, where m6A is present as a dangling end and thus unpaired, we only performed AFECs corresponding to the syn conformation. For the other systems, we performed AFECs in the expected anti conformation. For the A2 and A3 systems, we additionally performed AFECs in the unexpected syn conformation as a validation (population reported in ref (11) is ∼1%). In addition, we chose 5 more systems from ref (24), with the following criterion: they have a single methylation per strand, and the methylation occurs in an internal position of the duplexes. For all of these systems, we performed AFECs in the expected anti conformation.

Table 1

List of Systems Involved in the Fitting and Relative Experimental ΔΔGa

	system	ΔΔG (kJ/mol)
A1	m⁶A ΔG_syn/anti	6.3[9]
A2	UACG6CUG	1.7 ± 0.9[9]
	AUGCUGAC
A3	CGAU6GGU	7.1 ± 0.9[9]
	GCUAUCCA
A4	6CGC	–2.5 ± 1.2[9]
	GCG
A5	GCG6	–1.7 ± 0.9[9]
	CGC
B1	GUC6CUG	2.5 ± 2.1[24]
	CAGUGAC
B2	ACU6UAGU	2.1 ± 1.3[24]
	UGAU6UCA
B3	AGUU6ACU	5.4 ± 1.3[24]
	UCA6UUGA
B4	CGGUGC6UCG	8.6 ± 0.8[24]
	GCU6CGUGGC
B5	ACUUA6GU	1.7 ± 1.0[24]
	UG6AUUCA

The first system is the single nucleotide, and the experimental value corresponds to ΔG. In A2–A5 and B1–B5, the “6” in the double strand sequences is used to identify m6A for compactness. ΔΔGs for systems A2–A5 were measured by Roost et al.[9] In A4 and A5, the m6A is positioned as a dangling end and has a stabilizing effect on the duplex. Experiments for systems B1–B5 were performed by Kierzek et al.[24] In B2–B5 systems, the methylation occurs in both strands; however, the ΔΔGs reported are intended per methylation. Starting structures for MD simulations were built using the proto-Nucleic Acid Builder.[28] Single strands were generated by deleting one of the chains from duplex structures. All of the MD simulations were performed using a modified version of GROMACS 2020.3[29] which also implements the stochastic cell rescaling barostat.[30] The AMBER force field was used for RNA,[31−33] the TIP3 model for water,[34] and Joung and Cheatham parameters for ions.[35] As a starting parametrization for m6A, we used AMBER adenosine parameters combined with modrna08[21] charges for the nucleobase, adjusted to preserve the total charge of the nucleoside. Details on the implementation of these parameters and the initial tests are reported in Section S1. Charges are given in Table S2. We refer to this parametrization as the Aduri force field.

Alchemical Simulations

For AFECs, we included a hybrid adenosine with double topology in the force-field definition: the first topology corresponding to standard adenosine, and the second one corresponding to m6A. We used 16 replicas in which Lennard-Jones (LJ) parameters and partial charges were simultaneously interpolated. In order to avoid singularities due to electrostatic interaction when the repulsive LJ potential is switched off,[26] we used soft-core potentials as implemented in GROMACS (sc-alpha = 0.5, sc-sigma = 0.3, and sc-power = 1; see Section S2).[36] Simulation boxes consist of rhombic dodecahedrons containing RNA, water, and Na+ and Cl– ions with an excess salt concentration of 0.1 M. For a subset of the systems, further simulations were performed for a salt concentration of 1 M. The systems were energy-minimized and subjected to a multistep equilibration procedure for each replica: 100 ps of thermalization to 300 K in the NVT ensemble was conducted through the stochastic dynamics integrator (i.e., Langevin dynamics),[37] and another 100 ps was run in the NPT ensemble simulations using the Parrinello–Rahman barostat.[38] In production runs, the stochastic dynamics integrator was used in combination with the stochastic cell rescaling barostat[30] to keep the pressure at 1 bar. Equations of motion were integrated with a time-step of 2 fs. Long-range electrostatic interactions were handled by particle-mesh Ewald.[39] During production, a Hamiltonian replica exchange was used proposing exchanges every 200 fs. The set of λ values defining the replica’s Hamiltonians was chosen in such a way to guarantee transition probabilities above 20% and as homogeneous as possible (see Section S2), ensuring a sufficient phase space overlap between replicas. Each replica was simulated for 10 ns, for a total of 16 × 10 ns = 160 ns for each system. To decrease numerical errors in energy recalculations, trajectories were saved in an uncompressed format. At the end of the production phase, the 16 independent “demuxed” (i.e., continuous) trajectories were processed to recompute energies for each of the 16 Hamiltonian functions to compute ΔG via the binless weighted histogram analysis method (WHAM).[40−42] Specifically, for each trajectory, a weight w was found for each snapshot x that allows computing statistics for the unmodified adenine as a weighted average over the set of concatenated replicas (see Section S3). These weights were then used to compute the ΔG associated with the methylation aswhere ΔE(x) = Eλ=1(x) – Eλ=0(x) is the difference between the total energy computed with the Hamiltonian energy functions associated with m6A and adenosine, respectively. We used a bootstrapping procedure to compute the statistical error on ΔG estimates by resampling the 16 continuous trajectories 200 times with replacement.[43] As a control, we computed ΔGs using the standard Bennett-acceptance-rate estimate implemented in GROMACS.[44,45] As can be seen in Table S3, these estimates are numerically equivalent to those obtained with binless WHAM. ΔΔGs were obtained taking the difference between ΔGs obtained by methylating the adenosine in anti or syn conformation on the duplex or dangling end, respectively, and the ΔG obtained methylating in syn conformation on the relative single strand. Transitions between syn and anti states were never detected during the alchemical simulations. In this way, the contribution to the free energy given by the syn (anti) conformation in the duplex (single strand or dangling end) was ignored. Indeed, we expect these contributions to be negligible based on the experimental evidence,[9,11] which show a syn/anti isomer preference when paired (∼1:100) versus unpaired (∼10:1). This was additionally verified with supplementary simulations performed on the A2 and A3 systems (see Table S3). Moreover, we computed ΔG by performing the alchemical transformations on the isolated nucleoside in solution for the two isomers and computing their difference (see Table S3).

Fitting Procedure

We employ a fitting strategy based on reweighting[27] where a subset of the partial charges and a dihedral potential are adjusted to match experimental data. Specifically, we decided to fit the atoms that are closer to the methyl group (N6, C6, H61, C10, H101/2/3, and N1, see Figure ). The total charge was maintained, leading to 5 free parameters associated with the partial charges. A single cosine was added to the η6 torsional angle identified by atoms N1–C6–N6–C10: U(x) = Vη [1 + cos(η6(x) – π)]. This angle controls the syn/anti relative populations, leading to a total of 6 parameters, and the shift is chosen so that a positive value of Vη favors syn configurations over anti. To optimize the calculation of the total energy of the system at every iteration of our fitting procedure, where up to 6 charges were possibly modified, we notice that the total energy of the system is a quadratic function of the charge perturbations ΔQ. Without loss of generality, one can write the energy change associated with charges and torsion perturbation asIn total, for every analyzed snapshot (x), 20 coefficients (K and K) can be precomputed that allow obtaining the energy change for arbitrary choices of ΔQ with simple linear algebra operations, without the need to recompute electrostatic interactions explicitly. The coefficients were obtained by using GROMACS in rerun mode for 20 sets of test charge perturbation, which were extracted from a Gaussian with zero average and standard deviation set to 1 e. The perturbations were constructed to maintain constant the total charge. Importantly, this approach correctly takes into account the effect of charge perturbations on 1–4 interactions, where electrostatics is scaled with a force-field-dependent fudge factor, as well as on 1–2 and 1–3 interactions, for which it is discarded, and interaction with all the periodic images. The second order expansion above is exact if one neglects round-off errors. The magnitude of charge perturbations was chosen to minimize such errors. Equation should then be suitably modified replacing ΔE with ΔE + ΔU. Its derivatives with respect to the free parameters (charge and dihedral potential coefficient) can be computed as well (see Section S6). Our fitting is based on the minimization of an L2-regularized cost function defined as follows:Here, σ corresponds to the experimental error, and the χ2 measures the discrepancy between computations and experiments, whereas the regularization terms on the charges and the torsional η6 are governed by the hyperparameters α and β and are needed to avoid overfitting on the training set. This function is minimized using the L-BFGS-B method[46] as implemented in SciPy.[47] The result crucially depends on the choice of the hyperparameters α and β. Lower values for the hyperparameters imply that larger corrections are allowed, with the risk of overfitting, and thus lower transferability to new experiments. Higher values for the hyperparameters imply that lower corrections are allowed, with the risk of underfitting and thus lower accuracy in reproducing experimental data. The sweet point could be in principle found with a cross-validation (CV) procedure and a scan over possible values for α and β.[25,27] For the smallest data set (set A in Table ), we used a leave-one-out CV strategy; i.e., we trained the parameters on all systems except one. For the largest data set (set AB in Table ), we used a leave-3-out strategy, iteratively training the parameters on 7 randomly chosen experiments and validating on the 3 left-out experiments. In both cases, we then assessed the transferability of the model by evaluating its average χ2 on the system (or the subset of systems) that was left out.

Statistical Significance

When recomputing energies through a reweighting procedure, particular attention must be taken toward the statistical significance that may be lost during the computation, by reducing the effective sample size of the data set. This is usually monitored by computing the Kish effective sample size.[48,49] In our case, the most affected ensemble is the one corresponding to m6A (λ = 1). We thus monitor the Kish size computed using weights corresponding to the λ = 1 ensemble, defined asWe then compare it with the Kish size obtained with the original force field, defined asTo quantify how much statistical efficiency is lost due to the reweighting to a modified set of parameters, we use the Kish size ratio (KSR) that we define as

Results

In this work, we fitted point charges and a single torsional potential correction for an m6A RNA residue using alchemical MD simulations and a set of experimental data, following the scheme shown in Figure . In all of the fittings, charges and torsional potential were subject to L2 regularization with hyperparameters α and β, respectively. We initially employed only the first 5 experimental data points of Table , namely, (A1) ΔG for a nucleobase and (A2–A5) ΔΔG in melting experiments.[9] Thus, we first report the results obtained with such a set of charges, including a validation done on a more recent set of melting experiments (B1–B5).[24] We then report results obtained with charges that were fitted on the entire data set (A1–A5 and B1–B5). As a reference, results obtained with the Aduri et al.[21] modifications (modrna08) for the commonly used AMBER force field are also reported, either as is or complemented with a custom torsional correction that results in a ΔG matching experiment A1. All of the calculated ΔGs are reported in Table S3. A complete list of the performed alchemical simulations is reported in Section S9. Structural snapshots for all simulation systems are reported in Figures S9–S18. We note that, in a preliminary version of this work, some of the results were obtained by reweighting simulations performed with a slightly different set of charges, obtained by an incorrectly regularized fitting. The original results are very close to those reported here and can be seen in the preprint available at https://arxiv.org/abs/2203.14886v2.

Fitting on the Smaller Data Set

For this first fitting, we only employed data set A1–A5 from Table . χ2 errors were computed using eq and setting the experimental error of each data point (σ) to be equal to each other and to 1 kJ/mol. Figure a reports the results of a cross-validation test performed with a leave-one-out procedure. Namely, we fit the whole experimental data set leaving out one experimental data point at a time and report the average error on the left-out experiment. In this leave-one-out procedure, we decided not to iterate on the ΔG experiment (A1), since this is expected to be crucial to correctly reproduce the conformation of non-Watson–Crick-paired residues (mostly syn). From this map, we can hardly appreciate any variation of the χ2 along the vertical axis corresponding to the β hyperparameter. This suggests that β could be set to zero, thus simplifying all subsequent hyperparameter scans. Conversely, the χ2 grows significantly for low α values. This implies that regularization of charges is required to avoid overfitting. In general, one should expect a minimum to be observed in this type of hyperparameter scan.[25,27] This is not the case here for the α scan (see also Figure S3, showing projection on α for β = 0), implying that the performance of the parameters on a given system is not improved when excluding that system from the training set. This is likely due to the small data set employed.

Figure 3

Results obtained with parameters fitted on the initial data set, A1–A5 in Table . (a) Cross validation error obtained with a leave-one-out-procedure, shown as a function of the two regularization hyperparameters α, for charges, and β, for the torsional potential. Darker green colors correspond to lower values of the average χ2 computed on the systems left out iteratively from the fitting. (b) Parameters (ΔQ and Vη) obtained from the entire initial data set as a function of α, with β = 0. (c) χ2 errors for individual experiments and Kish size ratio (KSR, see text for definition) obtained using parameters fitted on the entire initial data set as a function of α, with β = 0. (d) Validation on the second data set (B1–B5 in Table ) of the parameters obtained on the first data set. Results using Aduri parameters are shown as horizontal lines, either as reported in the original paper (green) or including a single torsional correction to obtain the correct syn/anti population (data point A1). Figure b shows the optimized parameters (charge and torsional corrections) as a function of the regularization hyperparameter α while fixing β = 0. A transition can be seen at α ≈ 10. Namely, when α > 10, parameters have a smooth dependence on α, whereas when α < 10, both the charges and the torsional potential change suddenly. In the limit α → ∞, it can be seen that charge corrections tend to zero with an inverse law dependence, which is expected for L2 regularization, and the torsional correction tends to Vη ≈ 1.5 kJ/mol, which corresponds to the amplitude of the torsional potential that optimizes the χ2 without modifying the charges of the reference Aduri et al. model. We notice that ΔG obtained when using the Aduri et al. force field is ∼1.7 kJ/mol, and thus, this correction results in ΔG ≈ 1.7 + 2 × 1.5 = 4.7 kJ/mol, which is still smaller than the experimental reference ∼6 kJ/mol. The obtained parameters indeed strike a balance between favoring the syn state in the isolated nucleoside and not favoring it too much in the single-stranded calculations used to predict the ΔΔG from melting experiments, which would lead to too large destabilizations associated with the methylation. When α is decreased, the optimal torsional correction changes, since all of the parameters are coupled. This confirms that charges and torsional parameters should be fitted simultaneously. Figure c shows the individual χ2 associated with the same hyperparameter scan. The average χ2 error is, by construction, monotonically increasing with α, and most of the individual errors follow the same trend. Figure c also shows the statistical efficiency of the analysis, quantified by the relative reduction of the Kish effective sample size associated with reweighting. A low number here indicates that the tested charges are so different from those employed in the simulation to make the result statistically not significant. The Kish size displays a significant drop for α < 10, indicating that results in this regime might be not significant. This is a likely explanation for the discontinuous behavior observed in Figure b. We then tested the charges obtained with this reduced training set on the newer data set B1–B5 (see Table ), which was not included in the training phase. This set of data involves 5 recently published melting experiments,[24] 4 of which have m6A occurring in both chains of the duplex. We notice that double methylations are expected to lead to an even lower statistical efficiency of the reweighting procedure. We thus performed this analysis by reweighting simulations that were generated using the set of parameters derived fitting on systems A1–A5 for α = 10 and β = 0. Since this parametrization is closer to the right solution of the fitting when compared with the Aduri one, it obtains higher Kish size values in the relevant α range (see Figure d). The χ2 computed on the second data set shows that an optimal result can be obtained by setting α ≈ 10. We also compared with results obtained using the original Aduri charges and optionally including a torsional correction to fix the syn/anti balance. These results are obtained with direct simulation, that is, without reweighting. It can be seen that the results with the parameters trained on systems A1–A5 largely outperform those obtained with Aduri parameters on systems B1–B5, thus confirming the transferability of the parameters. Aduri+tors parametrization corresponds to setting Vη = 2.35 kJ/mol in such a way to perfectly fit experiment A1 (single nucleoside) without modifying charges. The χ2 computed for Aduri+tors demonstrates that acting exclusively on the torsional is not sufficient to reproduce both ΔG and melting experiments. It is also important to note that the improvement in reproducing experiments is obtained by changes in the partial charges that are small when compared to differences between charges derived with the standard restrained electrostatic potential protocol[50] in different conformations (see Section S5).

Fitting on the Full Data Set

Next, we perform a fitting using the full data set reported in Table . Since the variability of error in this data set is larger, we here computed χ2 using the experimental errors reported in Table . For the ΔG experiment, for which an experimental error is not reported, we used a nominal σ = 0.5 kJ/mol to assign to this experiment a larger weight when compared to the other data points corresponding to melting experiments. Figure a reports the results of a cross-validation test performed with a leave-three-out procedure. Namely, we randomly select seven systems to be used in training, and we report the average χ2 error obtained for the remaining three systems. This time, system A1 was also allowed to be left out of the training set. Results are qualitatively consistent with those obtained with the smaller data set (see Figure ). It is difficult to appreciate any variation of the χ2 along the vertical axis corresponding to the β hyperparameter, suggesting that we can safely set β = 0. We also do not find any clear minimum when scanning over α (see also Figure S4, showing projection on α for β = 0). Figure b shows the parameters as a function of the regularization hyperparameter α while fixing β. A clear transition can be seen at α ≈ 20. The average χ2 error is monotonically increasing with α, but some of the systems have a nontrivial behavior (Figure c). The Kish size shows a significant drop for α < 50, showing that results in this regime might not be statistically reliable. We thus select the parameters obtained with α = 50 as the optimal ones trained on the entire data set.

Figure 4

Results obtained with parameters fitted on the full data set, A1–A5 and B1–B5 in Table . (a) Cross-validation error obtained with a leave-three-out-procedure, shown as a function of the two regularization hyperparameters α, for charges, and β, for the torsional potential. Darker green colors correspond to lower values of the average χ2 computed on the systems left out iteratively from the fitting. (b) Parameters (ΔQ and Vη) obtained from the entire data set as a function of α, with β = 0. (c) χ2 errors for individual experiments and Kish size ratio (KSR, see text for definition) using parameters fitted on the entire initial data set as a function of α, with β = 0. (d) ΔΔG computed for each of the 10 analyzed systems with 4 different sets of parameters. fit_A is parameters obtained fitting on the first data set (A1–A5) with regularization α = 10. fit_AB is derived fitting on the entire data set (A1–A5 and B1–B5) for α = 50. χ2 values obtained for each force field set of parameters are shown in the table inside panel d. We then compare the performance of several different sets of parameters in reproducing all of the available experimental data points. Namely, we compare (a) the original Aduri parameters (Aduri), (b) the Aduri parameters augmented with a torsional correction to enforce the correct syn/anti balance in a nucleobase (Aduri+tors), (c) the parameters obtained fitting on the initial data set (A1–A5) with hyperparameter α = 10 (fit_A), and (d) the parameters obtained fitting on the full data set (A1–A5 and B1–B5) with hyperparameter α = 50 (fit_AB). Free energies are computed directly from the alchemical simulations, that is, without reweighting. Results are reported in Figure d. The quality of the fit is also summarized in the reported χ2 values. The addition of a simple torsional correction to the Aduri parameters results in a decrease in the overall χ2 from 15.23 to 9.17. However, this decrease is dominated by the χ2 of the A1 data point, which is reduced from χ2 = 84.64 to zero. Conversely, the χ2 averaged on all of the other experiments increases from χ2 = 7.57 to χ2 = 10.19. This indicates that including in the fitting the single A1 data point makes the agreement with denaturation experiments worse. On the other hand, the two sets of parameters obtained in this work (fit_A and fit_AB) display a significantly better agreement with experimental data. Note that fit_A, surprisingly, performs moderately better than fit_AB. The reason is that fit_AB, based on systems with double methylation and thus lower statistical efficiency, was performed with a higher regularization hyperparameter and thus parametrization closer to the reference one, as shown in Figures S6 and S7. Finally, as observed in the previous subsection, we note that the improvement in reproducing experiments is obtained by relatively small changes in the partial charges (see also Section S5). The fitted parameters are summarized in Table .

Table 2

Charge Modifications (ΔQ) and Torsional Potential (Vη) for the Fitting Performed on the Smaller Data Set (fit_A, α = 10) and for the Fitting Performed on the Larger Data Set (fit_AB, α = 50)a

	C6 (e)	N6 (e)	H61 (e)	N1 (e)	C10 (e)	H100 (e)	V_η (kJ/mol)
fit_A	0.019	0.077	0.099	–0.046	0.004	–0.051	2.46
fit_AB	0.009	0.049	0.067	–0.053	0.033	–0.035	2.49

For future simulations, we recommend using fit_A, which leads to a lower error on the larger set of available experiments.

Relative Stability of syn and anti Conformations

One piece of the experimental information that we implicitly used in our fitting procedure is the relative stability of syn and anti conformations in a nucleotide. We indeed assumed a predominant population of syn conformation for the unpaired nucleotides used in the reference single-stranded systems. We also assumed that m6A adopts exclusively its anti conformation when paired, in agreement with experiments.[9,11] In particular, ref (11) reports that, for the most common G6C sequence, m6A forms a Watson–Crick base pair with uridine that transiently exchanges on the millisecond time-scale between the main substate (anti) and a lowly populated (1%), singly hydrogen-bonded and mismatch-like conformation through isomerization of the methylamino group to the syn conformation. This population corresponds to a ΔGduplex ≈ −11 kJ/mol. We a posteriori validated this population by performing alchemical transformations on the duplex systems enforcing the syn conformation. The predicted ΔG values for a nucleotide and two of the tested duplexes are reported in Table , where the corresponding experimental values are also included. For the A1 experiment, as expected, the proposed sets of parameters closely match the experimental value that was used during training. The Aduri et al. force field underestimates the ΔG, resulting in a relatively high population of the unexpected anti conformation in a nucleoside. This difference can be directly corrected with a torsional potential applied on the η torsion (Aduri+tors). However, when analyzing duplexes A2 and A3 with the Aduri+tors parameters, we found that the predicted ΔG would be close to zero, in fact resulting in the assumption of neglecting the syn conformation in duplexes in our alchemical calculations being difficult to justify, and in disagreement with experimental findings. In other words, the original Aduri charges allow reproduction of the relative stability of syn and anti conformations in either the paired state (Aduri parameters) or the unpaired state (with torsional correction), but not in both simultaneously. Remarkably, the sets of parameters proposed here, which also contain a torsional term penalizing the anti conformation, result in a significantly higher value for ΔGduplex, much closer to a qualitative agreement with the experiment. This suggests that the proposed parameters better describe the interactions of the m6A nucleobase with the surrounding environment and are thus more transferable. We notice that the relative stability of syn and anti conformations is predicted to be sequence-dependent, being different for system A3 (sequence U6G).

Table 3

Free-Energy Differences between syn and anti Isomer States in Systems A1–A3a

	Aduri	Aduri+tors	fit_A	fit_AB	exp
(A1) ΔG_syn/anti	1.71 ± 0.25	6.33 ± 0.25	6.07 ± 0.21	6.04 ± 0.26	6.3
(A2) ΔG_syn/anti^duplex	–7.7 ± 0.5	–3.1 ± 0.4	–10.4 ± 0.6	–7.8 ± 0.4	∼−11
(A3) ΔG_syn/anti^duplex	–5.4 ± 0.5	–0.8 ± 0.4	–4.9 ± 0.6	–5.8 ± 0.5

The last column corresponds to experimental estimates, whereas the other columns correspond to computed ΔΔG for different parametrizations. Energies are given in kJ/mol.

The last column corresponds to experimental estimates, whereas the other columns correspond to computed ΔΔG for different parametrizations. Energies are given in kJ/mol. To gain insight into how the m6A–U pairings occur in the duplexes, we analyzed snapshots of system A2, for m6A in both syn and anti, together with histograms of distances between atoms belonging to the two nucleobases (Figure ). The reported histograms are unimodal and with an increased average associated with the distortion of the A–U Watson–Crick pairings due to the steric clash induced by the methylation. However, the hydrogen bond between A–N1 and U–H3 is present, in agreement with what has been suggested previously.[11]

Figure 5

Interfacing atom distances for m6A–U pairing in system A2 in Table , for anti conformation (left) and syn (right). Histograms show unimodal distributions, and the averaged value are indicated in the box. Distances are sampled from the alchemical trajectories considering only the λ = 1 replica. In the syn conformation, m6A–H10 corresponds to the hydrogen of the methyl group closest to the uracil oxygen O4.

Interpretation of Parameters and Dependence on Ionic Strength and Temperature

To provide an interpretation for the obtained parameters, we performed a few additional fittings. In particular, we investigated which charges have a major impact on enforcing agreement with experiments. For this purpose, we performed a fitting on exclusively two charges at a time plus the torsional term, considering pairs of charges that, in fit_A and fit_AB, have systematically positive and negative ΔQs. Results are discussed in Section S7 and Figure S9. Overall, the results suggest that the main contribution of the fitted correction is to increase the stability of Watson–Crick hydrogen bonds by making N1 and H61 more polar and at the same time using the η torsional potential to control the syn/anti relative population. As a further test, we simulated a subset of the systems using a salt concentration of 1 M, which is consistent with that used in experiments. As shown in Section S8, results are equivalent to those obtained at a 0.1 M salt concentration. These tests were performed using the fit_AB set of charges, but for one system, we repeated them with the original Aduri charges, with equivalent results. We interpret this result with the fact that the methylation is not sufficiently altering the electrostatic environment to be sensitive to changes in ion concentration. This implies that training using simulations performed at a different ion concentration would result in an equivalent set of parameters and further confirms the robustness of our results. In addition, considering that the experimental results refer to a temperature T = 310 K, which is different from the simulation temperature (T = 300 K), we performed control simulations and analyses to assess the impact of this choice. Results are reported in Section S9 and show that such a temperature difference is not relevant for the results discussed here.

Discussion

In this work, we proposed a protocol to parametrize charges in modified nucleobases using available melting experiments. The approach is applied to m6A and leads to a set of charges that can reproduce a set of 10 independent experimental values. The approach is based on the force-field fitting strategies introduced in earlier works,[27,51,52] which are here extended with several technical improvements. A first methodological contribution is a formalism that allows alchemical calculations to be used as a reference. Previous works were only using observables computed with a single set of force-field parameters.[14,27,51,52] The method introduced here allows free-energy differences between different sets of parameters to be evaluated and compared with the experiment. This opens the way to the optimization of parameters based on experimentally measured ΔΔGs. We based our analysis on optical melting experiments, which are commonly employed in the nucleic acids community,[53] but other types of experiments might be considered. In our specific application, only the parameters of one of the two end states were refined, but one could similarly fit parameters for both adenosine and m6A, at the price of increasing the number of parameters and thus the risk of overfitting. A second improvement is that we developed a way to efficiently recompute the total energy of the system using test charges. This is achieved by precomputing the total electrostatic energy of the system with a set of randomly perturbed charges. Given the high cost of electrostatic calculations, this makes the cost of each of the iterations performed during force-field fitting significantly faster and implicitly takes into account combination rules, nonbonded exclusions, and periodicity. These two improvements can be readily integrated into other MD-based force-field optimization strategies. A limitation of optimizing charges with the introduced procedure is that the statistical efficiency of reweighting is significantly decreased even by small charge perturbations. This implies that simultaneously parametrizing many copies of the same nucleotide, or parametrizing a larger number of charges for the same nucleotide, would be more difficult. In our case, we had to include at most two m6A residues in the same simulation. If more copies of the same reparametrized nucleotide are needed in the same system, one might have to design strategies where only a few copies at a time are reparametrized, or follow an iterative procedure where modifications are included in consecutive steps.[27] In this application, this was not necessary. Overfitting was avoided by using a standard L2 regularization term on the charge increments. This penalty does not depend on the charge location. Importantly, the regularization hyperparameters tune the relative weight of the experimental data and of the reference charges, here taken from ref (21), thus allowing a meaningful set of parameters to be obtained also in regimes where the number of data points is very limited. It is worth noting that the standard restrained ESP fitting is performed including a restraint that acts as a hyperbolic regularization term,[50] which is introduced to keep the absolute values of the obtained charges as small as possible. Our regularization, instead, keeps the resulting charges as close as possible to the initial guess obtained with the restrained ESP procedure.[21] This allows the implicit inclusion in the fitting of the result of the corresponding quantum-mechanical calculation. More effective regularization strategies might be designed based on the molecular dipole, as done in ref (54), to minimize the perturbation of the electrostatic potential at a large distance from the molecule. Alternatively, one might directly use as a regularization term the deviation from the quantum-mechanical electrostatic potential at a short distance. In the limit of a large regularization hyperparameter, this would lead to ESP charges.[55] Finally, other regularization criteria might be used.[14] When comparing our procedure with standard ESP charge fitting, it is important to realize that we are aiming to reproduce experimentally observed ΔΔGs, which are nonlinear functions of the energy of each configuration, which in turn is a quadratic function of the charges. These nonlinearities make it possible for multiple local minima of the cost function to exist and could thus make the minimization not reproducible. However, when sufficiently regularized, the fitting procedure results in reproducible charges that depend smoothly on the control parameters. In standard ESP fitting, instead, the electrostatic potential is fitted, thus resulting in a linear fit with a unique solution. We notice that the parameters of the unmethylated force field were not modified. This was based on the assumption that the employed set of force-field parameters is already capable of reproducing ΔΔG experiments associated with mutations between nonmodified nucleobases.[56] The m6A charge optimization could be easily repeated using another set of initial parameters, and the parameters of nonmodified nucleobases might be adjusted as well, although with the caveat discussed above. Another possible limitation of the employed alchemical simulations is the sufficient sampling of the end states. The duplex is expected to be stable and well structured, so sampling multiple structures should not be necessary. For selected cases, we also explored the possibility to include the unlikely syn paired state, which, as expected, gives a negligible contribution to the stability of the duplex. For single strands, instead, we only sampled the syn state. More importantly, our simulations were short enough to avoid any significant reconformation of the single strand. Sampling the conformations of flexible, single-stranded RNAs is notoriously difficult.[12] In addition, the generated ensemble might contain artificially stabilized intercalated structures, whose population is known to be overestimated by the RNA and water force fields adopted here.[57,58] This would make the correct sampling of the single-stranded state unfeasible. We also notice that the experimental results that we aimed to reproduce were performed on systems designed to have the isolated strands unstructured, to capture the effect of methylation on hybridization. Putting everything together, we conclude that the approximation of a single strand ensemble that does not depart too much from the initial A-form helix is a sensible choice for this specific application. An important finding of this work is that the parameters of Aduri et al. cannot reproduce the syn/anti balance expected for m6A residues. This balance is extremely important and is related to the mechanism by which m6A modifications modulate duplex stability.[9] This could not be rectified with a straightforward correction of the single torsion involved. The optimized charges, instead, allow the correct syn/anti balance to be recovered in both paired and unpaired nucleobases as well as a heterogeneous set of optical melting experiments to be reproduced. Interestingly, the Aduri et al. parameters were tested in a recent work,[18] with results for system A2 in Table consistent with ours and with experiments. However, systems A1 and A3 were not tested, and thus, the problem that we observed here could not be identified. Another interesting finding is that the ΔΔGs associated with N6 methylation are here predicted to be independent of ion concentration. We are not aware of any experimental validation of this finding, which could be obtained by comparing melting experiments at different ion concentrations. Finally, our results suggest that the relative population of the syn excited state in duplexes[11] might significantly depend on the identity of the neighboring nucleotides. The precise hybridization kinetics could thus be quantitatively different for RNAs with different sequences. A convenient property of our approach is that it does not require changing the functional form of the interaction potential so that new parameters can be readily incorporated into existing MD software. This is not the case if ad hoc corrections are employed.[14,59] In addition, it is worth noting that the charge modifications obtained are very small, and in particular, they are smaller than the typical difference between sets of charges derived with slightly different procedures or using different reference conformations. Despite this small difference, the effect on experimental observables is significant. These observations imply that there is still significant space to improve the performance of current force fields without necessarily modifying the functional form if experimental information is used during training.[25] Using our approach, it is possible to dissect the individual contribution of the modified force-field parameters. The main factors playing a role in the change of duplex stability induced by m6A methylation are (a) the penalty for switching to the unfavored anti isomer,[9] (b) the stabilization induced by hydrophobic shielding of the methyl group against surrounding bases (see also Figure S8),[60,61] (c) the impact of partial charges on stacking interactions,[60] and (d) the impact on the strength of Watson–Crick hydrogen bonds. Since, on average, experimental ΔΔGs for denaturation experiments performed on duplexes are smaller than the anti isomer penalty, we could expect that the sum of the other factors has a stabilizing effect on the majority of the considered duplexes. We notice that Aduri charges for N1 and H61, which are involved in Watson–Crick pairings with the complementary uridine, have partial charge absolute values significantly lower compared to the standard adenine parameters (0.289 48 vs 0.411 50 for H61, – 0.675 968 vs −0.761 50 for N1). This may lead to a weakening of hydrogen bonds which may cause an overestimation of destabilization induced on duplexes, as we observed in Aduri+tors cases (see Figure d). The results of our fitting systematically increase the absolute value of H61 and N1 partial charges, hence resulting in a stronger Watson–Crick pairing. At the same time, the torsional term allows a reproduction of the correct anti isomer penalty. Parameters are coupled so that it is necessary to fit them simultaneously to avoid double counting effects. To the best of our knowledge, this is the first attempt to tune partial charges of a biomolecular force field based on experiments performed on macromolecular complexes. We expect that this approach could be used in the future to improve the capability of biomolecular force fields to match experimental observations by exploiting a part of the functional form that has been traditionally derived in a bottom-up fashion. For future work, we recommend using the fit_A set of parameters (parameters available at https://github.com/bussilab/m6a-charge-fitting), which leads to a lower error on the larger set of available experiments. Remarkably, the parameters derived here for m6A allow a proper description of paired and unpaired m6A in both syn and anti conformations and thus open the way to the use of molecular simulations to quantitatively investigate the effects of N6 methylations on RNA structural dynamics.

48 in total

Molecular Simulations Matching Denaturation Experiments for N⁶-Methyladenosine.

Introduction

Material and Methods

Simulated Systems

Alchemical Simulations

Fitting Procedure

Statistical Significance

Results

Fitting on the Smaller Data Set

Fitting on the Full Data Set

Relative Stability of syn and anti Conformations

Interpretation of Parameters and Dependence on Ionic Strength and Temperature

Discussion

1. Iterative Optimization of Molecular Mechanics Force Fields from NMR Data of Full-Length Proteins.

2. A uniform mechanism correlating dangling-end stabilization and stacking geometry.

3. Experimental parameterization of an energy function for the simulation of unfolded proteins.

4. W-RESP: Well-Restrained Electrostatic Potential-Derived Charges. Revisiting the Charge Derivation Model.

5. Atomistic and Thermodynamic Analysis of N6-Methyladenosine (m⁶A) Recognition by the Reader Domain of YTHDC1.

6. m(6)A RNA methylation promotes XIST-mediated transcriptional repression.

7. Stacking in RNA: NMR of Four Tetramers Benchmark Molecular Dynamics.

8. Additive CHARMM force field for naturally occurring modified ribonucleotides.