Literature DB >> 31230075

Bending of DNA duplexes with mutation motifs.

Michal Růžička1,2, Přemysl Souček1, Petr Kulhánek1,3, Lenka Radová1, Lenka Fajkusová4, Kamila Réblová1.   

Abstract

Mutations can be induced by environmental factors but also arise spontaneously during DNA replication or due to deamination of methylated cytosines at CpG dinucleotides. Sites where mutations occur with higher frequency than would be expected by chance are termed hotspots while sites that contain mutations rarely are termed coldspots. Mutations are permanently scanned and repaired by repair systems. Among them, the mismatch repair targets base pair mismatches, which are discriminated from canonical base pairs by probing altered elasticity of DNA. Using biased molecular dynamics simulations, we investigated the elasticity of coldspots and hotspots motifs detected in human genes associated with inherited disorders, and also of motifs with Czech population hotspots and de novo mutations. Main attention was paid to mutations leading to G/T and A+/C pairs. We observed that hotspots without CpG/CpHpG sequences are less flexible than coldspots, which indicates that flexible sequences are more effectively repaired. In contrary, hotspots with CpG/CpHpG sequences exhibited increased flexibility as coldspots. Their mutability is more likely related to spontaneous deamination of methylated cytosines leading to C > T mutations, which are primarily targeted by base excision repair. We corroborated conclusions based on computer simulations by measuring melting curves of hotspots and coldspots containing G/T mismatch.
© The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

Entities:  

Keywords:  DNA bending; Muts protein; free energy calculations; hotspots–coldspots; mutations

Year:  2019        PMID: 31230075      PMCID: PMC6704406          DOI: 10.1093/dnares/dsz013

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


1. Introduction

Mutations in DNA are responsible for various inherited disorders and cancer, but they are also crucial for evolution. Here we focus on elastic properties of DNA sequences with hereditary mutations, which can be obtained from parents (from a mother via oocyte or from a father through sperm) or they arise de novo (during formation of parent’s germ cells or post-zygotically)., Mutations can be induced by environmental factors, but they also arise spontaneously: (i) due to deamination of the methylated cytosine (mC),, (ii) during DNA replication. With a fully functional repair system, the frequency of spontaneous error is ca 1 per 109–1010 base pairs per replication. Mutations are not equally distributed along the DNA sequence., Sites that possess higher mutation frequency than would be expected by chance are considered as hotspots, while sequences with lower frequencies than would be expected by chance are coldspots. The CpG dinucleotide is a well-known hotspot where mC converts to thymine resulting in TpG (or CpA on the other strand) transitions. Hypermutability of CpG reflects spontaneous methylation of cytosine, which is a common post-replicative modification of DNA in both bacteria and eukaryotes. These errors are recognized by thymine DNA glycosylases within the base excision repair (BER) pathway where the glycosylic bond between the thymine base and the ribose is cleaved, which triggers downstream events of the BER., Despite the effective repair, this mutation is very frequent. As a result, CpG sites occur less frequently in the genome, than would be expected by chance., Except for CpG dinucleotide, CpHpG motif (where H stands for A, C or T) was also detected as a hotspot. Another example is GTAAGT sequence, which is mainly associated with insertion/deletion errors. It was observed that a sequence of ±2 nucleotides (nt) around a mismatch affects relative rates of single nucleotide variations and that mismatches are not repaired with the same efficiency, which highlights the importance of both the mismatch type and the neighbouring nucleotide sequence. Mismatches and small insertion/deletion loops are targeted by MutS protein (homologue of eukaryotic MutSα) within mismatch repair (MMR). Two subunits (MSH2 and MSH6) form human MutSα, which moves along DNA and searches for mismatches., Mismatch detection is coupled with conformational changes of the subunits that create a transient clamp around DNA. When the MutSα identifies a mismatch, it provokes ATP binding that induces a sliding clamp with high stability. MutL protein is then recruited, which initiates the repair process. The clamp relaxation is associated with ATP hydrolysis, thus the MutSα switches between open/closed states. Crystallographic structures of MutS/DNA (or MutSα/DNA) complexes revealed that DNA with a mismatch is bent by ca 60°. A model where MutSα slides along DNA and scans for flexible regions corresponding to mismatches has been proposed. The mismatch is further recognized by phenylalanine 432 and glutamine 434 from conserved Phe-X-Glu motif in MutSα. The phenylalanine intercalates into minor groove and stacks onto mismatched base while the glutamine creates a hydrogen bond with it. Molecular dynamics (MDs) is a computational method that can describe the behaviour of molecules at the atomic level on nanosecond (ns) to microsecond (μs) time scale. Although this method utilizes simplified potential energy functions and fixed atomic charges it has provided valuable information about DNA flexibility. Bending of DNA with different mismatches (in the absence of protein) was analysed by Sharma et al. where a biased MD method (umbrella sampling) was used because DNA cannot bend spontaneously to the conformation observed in the complex with MutS protein. Analysis of mismatches in one sequence context revealed different bending free energies which partially correlated with the binding affinities of MutS to the mismatches. The umbrella sampling method was also used in the study of bending of various B-DNA sequences (AT, GC and A-tract sequences) containing G/T, G/A and C/C mismatches and abasic sites., Structural analysis of DNA mismatches in three different sequence environments was also carried using unrestrained MD simulations. The study showed to which extent different mismatches induce alterations of the local structure and which base pair arrangements are most populated. The relation between DNA mutability and flexibility was however not reported so far. In our previous study, we analysed the presence of germline mutations (obtained from the Human Gene Mutation Database—HGMD, http://www.hgmd.cf.ac.uk/ac/index.php) in the 5-nt DNA segments repeatedly occurring in five genes (PAH, LDLR, CFTR, F8 and F9) associated with common inherited disorders. This approach contrasts with the conventional strategy because hotspots are usually derived from a mutation spectrum. The occurrence of mutations in the mutation spectrum, however, often reflects particular population where an effect of a common ancestor plays a role (founder effect), i.e. frequent mutation originate from a single mutation event and is spread throughout the population as observed in our and other studies., We detected statistically significant coldspots and hotspots and determined characteristic sequence patterns for each group, particularly, coldspots contained purine tract while hotspots showed alternating purine-pyrimidine sequence often with the presence of CpG dinucleotide. In the next step, we employed an adaptive biasing force (ABF) MD method to better understand the relation between DNA-bending properties and DNA mutability. In particular, we analysed the global bending properties of two coldspots and two hotspots with a G/T pair and derived the free energy required to bend a straight DNA duplex towards the geometry found in the complex with MutSα protein. We observed that the coldspots are inherently more flexible than the hotspots. In this study, we extended analysed dataset and derived bending free energies for 10 coldspots and ten hotspots with different mismatches. Primary attention was paid to G/T and A+/C mismatches which are results of transitions occurring more frequently than transversions. In addition, G/T and A+/C base pair geometries are near-isosteric to canonical pairs. We also analysed other mismatches (A/A, G/G, A/G, C/T, C/C and T/T), but for the symmetrical G/G, A/A and C/C mismatches, we did not get stable base pairs during bending. Moreover, we carried out a bending analysis of 10 Czech population hotspots and 15 de novo mutations with G/T and A+/C pairs. The computations are complemented by experimental melting curves measured for eight coldspots and eight hotspots with G/T mismatch.

2. Materials and methods

The building of DNA 15-mers with mismatches

The 15-nt long DNA duplexes with central mismatch pair were built as a right-handed B-DNA by the nab module of AMBER 14. Terminal 5-nt segments (underlined) were identical for all systems, only the central 5-nt region varied according to the analysed motif, i.e. forward sequence was 5’GAACC XXXXX CTAGG3’, the reverse sequence was 5’CCTAG XXXXX GGTTC3’, the position of the mismatch is in bold. To maintain consistency during bending analysis when we analysed purine/pyrimidine mismatches, the purine base and the surrounding ±2 nt sequence was always put into the forward sequence. Our tests showed that the placement into the reverse strand has negligible impact on the resulting bending free energy. DNA was then solvated with TIP3P water molecules in a truncated octahedral periodic box and neutralized with sodium ions using the tleap module of AMBER, a new version of DNA force field parmbsc148 was used. Similarly to the study, DNA mismatched base pairs (G/T, A/C, A/G, G/A, C/T, T/C, C/C, T/T and A/A) were built in anti-anti form which is prevailing geometry.Figure 1 shows all analysed mismatch geometries, indicated C1’–C1’ distances for A-T and G = C pairs were taken from the previous study while C1’–C1’ distances of mismatched base pairs are average values calculated from 150 ns production simulations described below. In the anti-anti form, both bases have glycosidic bonds in anti-conformation and mutually interact through Watson-Crick (WC) edge (there are three edges: WC, Hoogsteen (H) and Sugar, through which base can interact with another base). The G/G mismatch was built in anti-syn geometry where the glycosidic bond of one guanine is rotated about 180° and adopts a syn conformation (Fig. 1), which is considered to be a more stable arrangement., In the anti-syn G/G mismatch, first base interacts through WC edge, whereas the second one through H edge. Further, A/C and C/C pairs were built protonated. Protonated forms of adenine at N1 and cytosine at N3 positions were taken from the AMBER library. The protonated pairs represent stable arrangements where each pair consists of two H-bonds (Fig. 1). The previous study has shown that protonated A+/C pair is the relevant state for DNA bending.
Figure 1

Base pair geometries: (A) Canonical WC base pairs A-T and G = C. (B) Mismatches A+/C and G/T used for calculations of bending free energies. (C) The remaining tested mismatches: A/A, C+/C, G/A, G/G, T/C and T/T.

Base pair geometries: (A) Canonical WC base pairs A-T and G = C. (B) Mismatches A+/C and G/T used for calculations of bending free energies. (C) The remaining tested mismatches: A/A, C+/C, G/A, G/G, T/C and T/T.

Equilibration and production dynamics

The systems were equilibrated in three steps. First, a minimization of the system was performed in 3,000 steps, then the system was heated to 300 K during 100 ps at a constant volume and finally, a 500 ps long simulation was run at 300 K and 100 kPa at the NPT conditions. Then we ran production dynamics for 150 ns at the NPT conditions. During both equilibration and production dynamics, terminal base pairs were restrained by distance restraints imposed on hydrogen bonds to avoid a formation of flanking bases that could perturb the DNA structure. The built models contained mismatched bases mutually oriented with WC edges close to the optimal base pair arrangement (visualized in Fig. 1). During the minimization/equilibration procedure the mismatched pair spontaneously adopted the optimal arrangement in the nearest energy minimum (see below where minima are described). Formed mismatches were stable during 150 ns production dynamics. Similarly, we prepared the G/G pair in the anti-syn form using xleap module of AMBER; the first guanine was oriented with WC edge and the second one with H edge.

ABF calculations

We used the ABF method to calculate free energies during DNA bending and employed the Multiple Walkers Approach (MWA) to accelerate the calculations. All the ABF simulations were performed in the modified PMEMD programme from AMBER, connected with PMFLib. The length of each ABF simulation was 200 ns (40 ns for each MWA client). Convergence of these 200 ns simulations was tested in our previous study. The collective variable used in the free energy calculations was the mass-weighted root-mean-square distance (RMSD) to a target structure calculated over all of the heavy atoms from terminal 5-nt long segments and the heavy atoms of the backbone from the central 5-nt long segment corresponding to the AMBER mask notation ((: 1–5, 11–20, 26–30) | (:6–10, 21–25@P, OP1, OP2, O3', O5', C3', C4', C5')) & (!@H=). The target structure used in the calculations was bent X-ray DNA duplex from the complex with MutSα protein (PDB ID 2O8B) (Fig. 2A).
Figure 2

(A) Superimposed bent X-ray DNA structure (from PDB ID 2O8B) (green) and relaxed MD structure (blue) (structures are superimposed over lower part). In the MD structure, a central 5-nt segment which was either coldspot or hotspot is cyan, and G/T mismatch is highlighted in sphere representation. Detail of G/T pair in the X-ray structure (B) and in MD structure (C).

(A) Superimposed bent X-ray DNA structure (from PDB ID 2O8B) (green) and relaxed MD structure (blue) (structures are superimposed over lower part). In the MD structure, a central 5-nt segment which was either coldspot or hotspot is cyan, and G/T mismatch is highlighted in sphere representation. Detail of G/T pair in the X-ray structure (B) and in MD structure (C). Further, we imposed a shear wall restraint to keep the mismatch pair stable inside the DNA duplex because the mismatch tends to disrupt during the DNA bending, which affects the calculated free energy. Note that in the X-ray DNA-MutSα complex the G/T pair is modestly disrupted, it shows a change in base pair parameter shear, buckle and opening (Fig. 2B) and it is shifted towards minor groove. In our computations, we compared global bending of different DNA systems, with stable mismatch base pair inside the duplex. Since various mismatch pairs have different free energy minima for shear, we set up differently the shear restraint ranges to keep the pair stable. The ranges for each mismatch were chosen based on (i) behaviour of the mismatch in the production 150 ns simulations where we determined the distribution of shear values and (ii) extra ABF simulations, where we probed each mismatch via collective variable shear sampled in the range from −5 to 5 Å. Determined shear minima from ABF calculations were in agreement with the distribution of shear values observed in the production simulations (Supplementary Fig. S1). In particular, free energy minima of G/T and A+/C mismatches are −2.25 and −2.58 Å, respectively, so the shear restraint range was set from −10 to 0 Å. Shear restraints for other mismatches are indicated in Supplementary Material. The free energy profiles describing DNA bending were calculated for RMSD in the interval from 1.45 to 5.45 Å. ABF simulations employed a single window approach, which spans the entire range of the sampled collective variable. This window was split into 41 bins, which serve for numerical evaluation of mean force only. The bins are visited by free diffusion once enough samples are collected in individual bins to obtain a good estimate of mean force, which is subtracting from MM potential force to remove any barriers along the collective variable. Five ABF walkers were commenced from five restart files of the production dynamics each taken after 30 ns to establish the independence of walkers. As result, ∼2 million samples taken at each MD step were accumulated in each bin, see Supplementary Figure S2. The time step used in our MD simulations was 2 fs. The snapshots were saved to the trajectory file every 5,000 steps (10 ps). The relaxed DNA is shown as a minimum on the free energy profile (Supplementary Fig. S3), representing a stable thermodynamic state. On the contrary, the bent structure is not thermodynamically stable because of the absence of MutSα in our model. Therefore, we had to select RMSD value which would represent the bend state and serve to compare coldspots and hotspots. To guess a fair estimate, we have analysed the behaviour of the relaxed DNA state observed in the 150 ns production dynamics. RMSD towards an average DNA structure was calculated over the entire production dynamics. In other words, we have reversed the definition of the target structure for RMSD calculation from the bend conformation to the relaxed conformation. Depending on the system, the RMSD using this target fluctuated from 1 to 4 Å, with maximum occurrence at about 1.55 Å for the set of atoms defining used RMSD, see Supplementary Figure S2 in ref.39 This value was then set as a threshold representing a bend structure to deduct bending free energies from the ABF/MWA simulations. To evaluate the significance of calculated bending free energies between coldspots and hotspots we calculated errors directly from the ABF/MWA simulations. The error of the mean force [σ(dG/dRMSD)] is shown in Supplementary Fig. S4 (left). The error is small and approximately constant in the entire range of RMSD indicating no problem with sampling. The integrated value of errors for the resulting free energy is shown in Supplementary Figure S4 (right). Its value at RMSD threshold of 1.55 Å is ∼0.3 kcal/mol (Supplementary Table S1). We also repeated ABF calculations (i.e. we ran new independent simulations with different starting restart files) for C01, C02, H01 and H02 motifs with G/T. Comparison with original calculations can be found in Supplementary Table S2, the found differences indicate errors in the calculated free bending energies, which is in the same range as provided by the previous analysis, i.e. 0.3 kcal/mol. The MD trajectories were processed using the ptraj module of AMBER and visualized using the VMD programme. Base pair parameters were calculated with Curves and 3DNA. Base pair geometries in Figure 1 were drawn using the Nemesis programme. Local bend of base pair steps was calculated as described by Sherer et al. based on tilt and roll parameters obtained with the following formula:

Logistic regression

We performed logistic regression based on calculated bending free energies of coldspots/hotspots to classify other studied groups of motifs (i.e. motifs with CpG or CpHpH sequence, population hotspots and motifs with de novo mutations). The analysis was performed in R. Coldspots were defined as Group 0 and hotspots as Group 1. Obtained P-values close to 1 indicate that particular motif behaves as a hotspot while P-values close to 0 indicate that motif behaves as coldspot. The classification was performed based on bending free energies for G/T pair, which best-distinguished coldspots and hotspots.

DNA melting curve analysis

We measured a DNA duplex stability for 15-nt long DNA oligonucleotides carrying coldspot/hotspot motifs. We designed the same sequences, which were used in our calculations, i.e. the forward sequence was 5’GAACC XXGXX CTAGG3’, the reverse complementary sequence was (5’CCTAG XXTXX GGTTC3’). The central region contained coldspot or hotspot with G/T pair (in bold). The mixture of both primers (1 μM each; final concentration was tuned to maintain equal starting fluorescence signal for all analysed samples) in 1 mM KCl was denatured (90°C) and slowly (0.1°C/s) cooled to 4°C. Hybridized oligonucleotides were kept on the ice, and EvaGreen dye (Biotium) was added to the final concentration 1× (according to the manufacturer’s instructions). The melting analysis was performed in the range 20–90°C (0.2°C/s; fluorescence acquired at each temperature step) on Rotor-Gene 6000 (Qiagen). The melting curves were analysed using Melt curve analysis implemented in RotorGene software as the derivative of the raw data.

3. Results and discussion

Bending of 10 top coldspots and 10 hotspots with G/T and A+/C mismatches

We used two coldspots (C01-AAGAA and C02-CAGTG) and two hotspots (H01-AGGTA and H02-TGGAA) from our previous study (systems with G/T described in our previous study were recalculated with the new version of PMFLib, which was used in free energy calculations of other systems in this study) and added eight new coldspots and eight new hotspots with lowest Fisher combined P-values calculated in our previous study. These P-values were determined based on analysis of mutations in five genes (LDLR, PAH, CFTR, F8 and F9) and indicate significance for each motif to be a coldspot or hotspot (see Tables 2 and 3 in). In the case of hotspots, we excluded motifs with CpG dinucleotide in the middle of the 5-nt segment and also CpHpG motifs because mutability of these sequences can be associated with deamination of mC leading to C > T mutations that are primarily targeted by BER pathway (analysis of motifs with CpG and CpHpG was thus conducted separately, see below).
Table 2

Bending free energies of hotspots with CpG and CpHpG sequence

TypeMotif 5’→3’/5’→3’Fisher combined P-value39Bending free energy for G/T (kcal/mol)Bending free energy for A+/C (kcal/mol) P-value from logistic regression
CpGACGGC/GCCGT5.51E-0512.012.40.08
CpGCCGAG/CTCGG2.11E-0513.111.80.52
CpGTCGCA/TGCGA2.69E-0813.213.10.58
CpHpGCTGTG/CACAG2.97E-0613.011.80.46
CpHpGCTGGG/CCCAG2.24E-0512.811.50.35
CpHpGCAGTA/TACTG2.87E-0412.312.80.14
Average values12.712.2

Bending free energies smaller than 13.0 kcal/mol are in grey fields.

Table 3

Bending free energies of 10 Czech population hotspots with G/T and A+/C

GeneMutation at protein, DNA level, transcript numberNumber of allelesaMotif 5’→3’/5’→3’Bending free energy for G/T (kcal/mol)Bending free energy for A+/C (kcal/mol) P-value from logistic regression
ALOX12b p.Tyr521cys, c.1562A>G, NM_001139.26/12TTACC/GGTAA14.014.10.89
ALOXE3 p.Arg234Term, c.700C>T, NM_021628.26 TTCGA/TCGAA 13.114.40.51
TGM1 p.Arg142His, c.425G>A, NM_000359.21/2 GCGCC/GGCGC 12.513.70.21
CAPN3 p.Arg448His, c.1343G>A, NM_000070.22/4 CCGCA/TGCGG 13.012.40.43
ANO5 p.Arg758Cys, c.2272C>T, NM_213599.21/5 CCCGT/ACGGG 11.812.60.05
SGCA p.Arg77Cys, c.229C>T, NM_000023.34/6 TCCGC/GCGGA 11.711.80.04
LDLR p.Gly592Glu, c.1775G>A, NM_000527.4104/188 CGGGG/CCCCG 12.412.70.17
PAH p.Arg408Trp, c.1222C > T, NM_000277.2560 CTCGG/CCGAG b 13.111.80.52
ATP7B p.Trp779Term, c.2336G>A, NM_000053.314GTGGC/GCCAC13.112.00.49
CLCN1 p.Arg894Term, c.2680C>T, NM_000083.238 CTCGA/TCGAG 13.113.90.52
Average values12.812.9

CpG or CpHpG motifs are in bold and energy below 13 kcal/mol is in the grey field.

Number of alleles indicated in our publications (see above)/number of alleles detected up to date if differ.

This motif is already included among top hotspots with CpG see Table 2.

Bending free energies of top 10 coldspots and top 10 hotspots with G/T and A+/C pairs Bending free energies smaller than 13.0 kcal/mol are in grey fields. Bending free energies of hotspots with CpG and CpHpG sequence Bending free energies smaller than 13.0 kcal/mol are in grey fields. Bending free energies of 10 Czech population hotspots with G/T and A+/C CpG or CpHpG motifs are in bold and energy below 13 kcal/mol is in the grey field. Number of alleles indicated in our publications (see above)/number of alleles detected up to date if differ. This motif is already included among top hotspots with CpG see Table 2. Since coldspots have mostly A-T pair in the middle of the 5-nt segment, whereas hotspots have G = C pair (see sequence logo in Fig. 3 in ref.39), different representation of mismatches exists in coldspots and hotspots. In particular, G/T, A+/C, A/G and C/T are equally represented, but A/A and T/T pairs occur more frequently in coldspots while C+/C and G/G exist more often in hotspots. Base pairs G/T and A+/C can substitute the canonical pairs G = C and A-T without large structural perturbation of the duplex., Their H-bonding scheme and the C1’–C1’ distances are similar to the canonical pairs (Fig. 1). Other non-canonical pairs show larger differences in the C1’–C1’ distances (Fig. 1) and have more variable base pairing. In addition, G/T and A+/C are results of transitions (purine-to-purine or pyrimidine-to-pyrimidine substitutions), which are more frequent than transversions (purine-to-pyrimidine or pyrimidine-to-purine substitutions). Therefore, we primarily focussed on bending analysis of coldspots and hotspots with these mismatches.
Figure 3

Melting curves of oligonucleotide duplexes with the G/T mismatch acquired using RotorGene software, the derivative of the raw fluorescence with respect to temperature (dF/dT) is shown. (A) systems with 0 G = C base pair, (B) systems with 1 G = C base pair and (C) systems with 2 G = C base pairs. Bending free energies of analysed motifs are indicated. (D) The melting curve fitted by exponential function in the range of 52–90°C.

Melting curves of oligonucleotide duplexes with the G/T mismatch acquired using RotorGene software, the derivative of the raw fluorescence with respect to temperature (dF/dT) is shown. (A) systems with 0 G = C base pair, (B) systems with 1 G = C base pair and (C) systems with 2 G = C base pairs. Bending free energies of analysed motifs are indicated. (D) The melting curve fitted by exponential function in the range of 52–90°C. Our calculations showed that coldspots with G/T and A+/C pairs are in average more flexible than hotspots about 1.2 and 0.6 kcal/mol, respectively (see Table 1 and Supplementary Fig. S3 where bending free energy profiles are visualized). Considering the error size of calculated free energies, which is around 0.3 kcal/mol (see Supplementary Fig. S4 and Supplementary Tables S1 and S2), the difference between coldspots and hotspots with G/T is significant while the difference between coldspots and hotspots with A+/C is comparable with the error size. We also found several exceptions, e.g. hotspots H06 and H10 show relatively low bending free energies for G/T while some coldspots exhibit higher bending free energies (e.g. C04, C06 and C10, see Table 1). Based on our hypothesis we assume that coldspots are on average (considering all mismatch types) more flexible than hotspots, but we do not expect that every coldspot with any mismatch will be more flexible than any hotspot. Also, we do not consider here that mutability of DNA could be affected by other factors than by sequence elasticity and that MD methods suffer from various approximations. For instance study by LaRosa and Zacharias showed that local untwisting and bending of the DNA lowers the penalty for flipping of modified base (8oxoG), which is recognized by bacterial MutM protein (homologue of human hOGG1) within BER. Most probably, a similar mechanism plays a role in MMR, where DNA bending can be facilitated by disrupted mismatch base pair. However, in this study, we restrained the central mismatch base pair inside duplex because it opened spontaneously only in some ABF simulations, which prevented comparison of the global bending properties among different systems. Thus, our analysis aimed to detect a potential difference in global elasticity between coldspots and hotspots.
Table 1

Bending free energies of top 10 coldspots and top 10 hotspots with G/T and A+/C pairs

IDMotif 5’→3’/5’→3’Fisher combined P-value39Bending free energy for G/T (kcal/mol)Bending free energy for A+/C (kcal/mol) P-value from logistic regression
Coldspots
C01AAGAA/TTCTT1.27E-0612.412.40.17
C02CAGTG/CACTG9.22E-0512.412.80.17
C03AAAGA/TCTTT9.37E-0911.612.30.03
C04AAAAT/ATTTT1.08E-0813.014.10.46
C05AAAAA/TTTTT1.36E-0811.912.70.06
C06AGAAA/TTTCT1.78E-0813.413.90.68
C07GAAAA/TTTTC4.50E-0812.213.10.12
C08GAAGA/TCTTC6.47E-0711.612.50.03
C09GGAGA/TCTCC1.22E-0613.212.70.59
C10GGAAA/TTTCC3.04E-0613.214.00.55
Average values12.513.1
Hotspots
H01AGGTA/TACCT1.24E-1113.714.00.80
H02TGGAA/TTCCA6.58E-0414.414.50.95
H03AGGTG/CACCT2.44E-0314.213.60.93
H04TGAGT/ACTCT1.37E-0314.214.50.93
H05TGGCT/AGCCA8.02E-0414.414.70.96
H06ACATG/CATGT5.38E-0412.613.20.26
H07GGGCA/TGCCC4.70E-0313.013.30.48
H08TTGTA/TACAA4.79E-0313.712.90.82
H09GGATG/CATCC4.90E-0313.613.30.78
H10GCATG/CATGC7.46E-0312.512.50.22
Average values13.613.7

Bending free energies smaller than 13.0 kcal/mol are in grey fields.

Based on logistic regression, we calculated P-values for hotspots and coldspots (Table 1). It can be seen that P-values of the most hotspots are around 0.8 or 0.9, only H06, H07 and H10 motifs exhibit lower values (Table 1). In the case of coldspots, six motifs exhibit P-values below 0.3, other four motifs: C04, C06, C09 and C10 have P-values between 0.46 and 0.68, so none of them behaves as a typical hotspot.

Bending of 10 top coldspots and 10 hotspots with other mismatches

In the bending analysis of G/A, C/T, A/A, C+/C, T/T and G/G, we encountered a problem with the stability of some mismatched base pairs. We observed disruptions of symmetrical A/A, G/G and C+/C mismatches during bending which lasted for several ns during the ABF simulations. Although shear of the mismatch was restrained (as described in Supplementary Material), the mismatch was perturbed in such extent that its bases stack on top of each other (Supplementary Fig. S5). Our tests revealed that DNA with a disrupted mismatch has lower bending free energy than when the pair is perfectly stable (observed for DNA with G/T pair). This indicates that DNA with perturbed pair can be bent more easily, which correlates with observation in ref.35 where abasic DNA compared with DNA with various mismatches exhibited lowest bending free energy. We were not able to stabilize these mismatches in the ABF simulations. Therefore, we did not derive bending free energies for these systems. Other mismatches C/T, A/G and T/T were stable during bending. In the case of C/T and A/G mismatches, we did not detect a significant difference between bending free energies of coldspots and hotspots, although coldspots were slightly more flexible than hotspots (Supplementary Table S3). In the case of T/T pair we observed the opposite trend, i.e. hotspots were more flexible than coldspots (Supplementary Table S3). However, it should be noted that this mismatch is not equally distributed in coldsots and hotspots, thus we analysed only four hotspots with T/T and eight coldspots with T/T. In the available X-ray complexes of DNA/MutS A/A, G/G and A/C pairs exist in anti-syn conformation,, only G/T pair was detected in anti-anti (Fig. 2). Most probably the initial anti-anti form, which is prevailing in the free form of DNA, is not suitable geometry for DNA complexed with protein. Although C/T, A/G and T/T pairs stayed stable in the ABF simulations, we assume that mismatches less compatible with the canonical pairs require base-pair rearrangement during bending to optimize its geometry. Keeping mismatch in one base pair geometry can create tension which might affect bending free energy. It is also possible that rearrangement of the A/C mismatch during bending (we performed bending analysis with A+/C in anti-anti form, while there is unprotonated anti-syn form in the X-ray structure of DNA/MutS) would increase a difference in bending free energies between coldspots and hotspots. However, description of controlled mismatch rearrangement during bending would require more collective variables, which would be very computationally demanding. In the next sections, we analysed the bending of other DNA motifs but only with G/T and A+/C pairs.

Bending of CpG and CpHpG motifs with G/T and A+/C

In our previous study, motifs with CpG or CpHpG were frequently detected as hotspots; particularly, these motifs represented 14 out of top 20 hotspots (see Table 3 in ref.39). Thus, we separately analysed three hotspots with CpG and three hotspots with CpHpG possessing G/T and A+/C pairs (Table 2). These hotspots have low Fisher combined p-values and belong among top 10 hotspots in our previous study. Calculated bending free energies of these systems are low and correspond to coldspots rather than hotspots, particularly, average bending free energies were 12.7 and 12.2 kcal/mol for G/T and A+/C, respectively. Calculated P-values from the logistic regression were below or around 0.5. This correlates with previous computational studies, where pyrimidinepurine and purinepurine (or pyrimidinepyrimidine) dinucleotides were found to be more flexible than purinepyrimidine.

Bending of DNA with Czech population hotspots

We investigated the bending properties of 10 population hotspots detected in Czech patients. Theoretically, a DNA site can become a population hotspot due to only one mutation event, when the arisen mutation is spread throughout the population. Thus, bending properties of these sequences do not have to exhibit stiffness which we detected in hotspots described in Table 1. Our cooperating genetic laboratory (in Centre of Molecular Biology and Gene Therapy, University Hospital Brno) analyses mainly genes associated with neuromuscular, skin and metabolic diseases. Therefore, we focussed on the most significant Czech population hotspots in these genes detected in our previous studies. In particular, we analysed hotspots within 5-nt segments in: TGM1, ALOXE3, ALOX12b genes associated with congenital ichthyoses, in CAPN3, SGCA and ANO5 genes associated with limb-girdle muscular dystrophies, in LDLR gene associated with familial hypercholesterolaemia, in PAH gene associated with phenylketonuria, in ATP7B gene associated with Wilson disease and in CLCN1 gene associated with Myotonia Congenita (Table 3). As mentioned earlier we analysed only population hotspots leading to transitions. Many analysed population hotspots contain CpG or CpHpG (8 out of 10). It can be seen that five motifs with CpG or CpHpG have low bending free energies (around or below 13 kcal/mol) for both mismatches, only three motifs with CpG (in ALOXE3, TGM1 and CLCN1) exhibited higher bending free energies for one mismatch (Table 3). In the case of motif in ALOX12B, which does not contain CpG or CpHpG sequence, its bending free energies are high for both mismatches (14.0 kcal/mol for G/T and 14.1 kcal/mol for A+/C), the second non-CpG motif (in ATP7B gene); however, has low bending free energies for both mismatches. Thus, we can see a quite wide range of bending free energies, but on average, bending free energies of these systems are low. P-values derived from the logistic regression are similar to CpG/CpHpG motifs, i.e. 9 out of 10 motifs have a P-value below 0.53 (Table 3), which indicates that bending properties of these motifs are closer to coldspots described in Table 1.

3.5. Bending of DNA with de novo mutations

Analysis of coldspots/hotspots with G/T and A+/C showed that mutations arise more often in the rigid sequences. To test this observation, we performed bending analysis of 15 de novo mutations within 5-nt segments on non-CpG/non-CpHpG sites and we selected only mutations leading to G/T and A+/C pairs. Mutations from ADA, IL2RG, SERPING1, STAT3 and WAS were provided by the cooperating department at the Centre for Cardiovascular Surgery and Transplantation, Brno, particularly, mutations in ADA, SERPING1 and STAT3 were entirely novel (not published before), mutations in IL2RG and WAS were recently published by the cooperating department.ADA, IL2RG, STAT3 and WAS genes are associated with immunodeficiencies, whereas SERPING1 is associated with hereditary angioedema. Further, eight novel mutations were taken from the PAH gene. In our previous study, we described six novel missense mutations. Three of them (transitions located on non-CpG sequences: p.Asp229Gly, p.Phe263Ser and Ile406Met) were included among the 15 de novo mutations (Table 4). Other five mutations in PAH were taken from the HGMD database from the beginning of the protein sequence (p.Met1Ile, p.Ser16Pro, p.Ser40Leu, p.Lys42Glu and p.Val45Ala). In the study, we reported two novel mutations in CLCN1 gene leading to transitions: missense mutation c.905A>G, p.(Tyr302Cys) and splicing mutation c.433 + 3A>G, the latter is located on the GTAGG motif, which represents known indel hotspot derived from GTAAGT. Since GTAGG motif is associated with double base pair substitutions (22%), we analysed only the p.(Tyr302Cys) mutation. The last mutation was taken from CYP4F22 gene where we recently reported two missense mutations (c.844C>T, p.R282W and c.1085G>A, p.R362Q). However, because both of them are positioned on the CpG dinucleotide, we selected the most recent reported mutation (c.667C>T, p.Q223Term) in this gene in HGMD.
Table 4

Bending free energies of motifs with 15 de novo mutations calculated for G/T and A+/C mismatches

GeneMutation at protein level, DNA level and transcript numberMotif 5’→3’/5’→3’Bending free energy for G/T (kcal/mol)Bending free energy for A+/C (kcal/mol) P-value from logistic regression
ADA p.Val130Met, c.388G>A, NM_000022TGGTG/CACCA14.214.20.93
IL2RG p.Gly232Glu, c.695G>A, NM_000206.2TGGAA/TTCCAa14.414.50.95
SERPING1 p.Leu349Pro, c.1046T>C, NM_000062.2GCTCT/AGAGC13.213.50.56
STAT3 p.Lys658Glu, c.1972A>G, NM_139276.2ATAAG/CTTAT14.813.30.98
WAS His115Arg, c.344A>G, NM_000377CCACA/TGTGG12.413.10.18
PAH p.Asp229Gly, c.686A>G, NM_000277.2AGACG/CGTCT13.313.40.62
PAH p.Phe263Ser, c.788T>C, NM_000277.2CTTCC/GGAAG13.113.80.51
PAH Ile406Met, c.1218A>G, NM_000277.2ATACC/GGTAT13.813.30.83
PAH p.Met1Ile, c.3G>A, NM_000277.2ATGTC/GACAT14.113.60.91
PAH p.Ser16Pro, c.46T>C, NM_000277.2TCTCT/AGAGA12.912.90.38
PAH p.Ser40Leu, c.119C>T, NM_000277.2CTCAC/GTGAG13.912.70.86
PAH p.Lys42Glu, c.124A>G, NM_000277.2TCAAA/TTTGA14.113.10.92
PAH p.Val45Ala, c.134T>C, NM_000277.2AGTTG/CAACT13.214.20.58
CLCN1 p.Tyr302Cys, c.905A>G, NM_000083.2CTACT/AGTAG13.112.90.51
CYP4F22 p.Gln223Term, c.667C>T, NM_173483.3GCCAA/TTGGC13.612.80.78
Average values13.613.4

Bending free energies smaller than 13.0 kcal/mol are in grey fields.

This motif is already included in Table 1 as H02.

Bending free energies of motifs with 15 de novo mutations calculated for G/T and A+/C mismatches Bending free energies smaller than 13.0 kcal/mol are in grey fields. This motif is already included in Table 1 as H02. Calculations showed that bending free energies of these motifs are somewhat higher with average value 13.6 kcal/mol for G/T and 13.4 kcal/mol for A+/C (Table 4) which is comparable to hotspots in Table 1. Low bending free energies for both mismatches can be seen only for p.His115Arg in WAS gene, p.Ser16Pro in PAH gene and p.Tyr302Cys in CLCN1 gene. Other motifs have bending free energies for at least one mismatch around or above average hotspot value. Logistic regression revealed that except for the three mentioned mutations (p.His115Arg, p.Ser16Pro and p.Tyr302Cys) P-values are higher than 0.52 (seven motifs have P-value even higher than 0.8) (Table 4). Although we do not have large statistical datasets for studied groups, we can see apparent trends in bending free energies calculated for G/T and A+/C pairs. In the case of G/T pair, average bending free energies of coldspots, CpG/CpHpG motifs and motifs with population hotspots are similar and lower compared with the bending free energies of hotspots and motifs with de novo mutations. This trend is less evident for A+/C mismatch where CpG/CpHpG motifs show lowest average bending free energies, and where a smaller difference between coldspots and hotspots exists (this correlates with the fact that bending free energies for G/T were taken as the best classifier in logistic regression).

3.6. Melting experiments for selected coldspots and hotspots with G/T

Previous experimental study showed that A-RNA has a higher melting temperature (Tm) than B-DNA, which correlates with computational studies, where A-RNA was found to be more stable than B-DNA. Moreover, B-DNA exhibited greater deformability of its backbone than A-RNA, and it was more easily deformed in the global parameters roll and tilt, which were used for calculation of global bending. We measured and analysed melting curves of coldspots and hotspots using thermal denaturation experiments to determine if there is a correlation with calculated bending free energies. First eight coldspots (C01–C08) and first eight hotspots (H01–H08) were analysed with the central G/T mismatch. Since C05-AAAAA after introducing G/T pair changes sequence to AAGAA, which is identical to C01-AAGAA, we analysed only C01 motif in this experiment. As expected, positions of peaks and derived Tm were affected by the number of G = C pairs included in the analysed motifs (Supplementary Fig. S6). Therefore, we separated melting curves according to the number of G = C pairs and compared coldspots and hotspots with 0 G = C, 1 G = C and 2 G = C pairs. For system H07 (GGGCA), which contains 3 G = C pairs, we did not have coldspot to compare it. Therefore this system was not considered. In the plots, we observed a trend where derivative melting curves of rigid motifs decrease slower compare to systems with increased flexibility. This can be seen in the range of 50–70°C. Although this trend is not 100% accurate, it is seen for many motifs in 0 G = C, 1 G = C and 2 G = C groups. For instance, in the case of motifs with 0 G = C (Fig. 3A), H08 motif with highest bending free energy (13.7 kcal/mol) decreases slower than other two systems C01 and C04 with bending free energies 12.4 and 13.0 kcal/mol, respectively. Considering systems with 1 G = C pair (Fig. 3B), the melting curves of motifs decrease almost perfectly according to decreasing bending free energy. The order of decrease is following: H02 with highest bending free energy (14.4 kcal/mol), H01 (bending free energy 13.7 kcal/mol), C06 (bending free energy 13.2 kcal/mol), C03 (bending free energy 11.6 kcal/mol) and C07 (bending free energy 12.2 kcal/mol). For motifs with 2 G = C pairs (Fig. 3C), we also observed the trend except for H03, which decreased slowly as other coldspots although its bending is 14.2 kcal/mol. Further, to compare melting curves quantitatively, each one was fitted by the exponential function in the range of 52–90°C (Fig. 3D). Calculated coefficients b, n and u were derived. The coefficient b represents intercept-like coefficient, and its value is around 0 for all systems while coefficients n and u describe the curvature of the melting curves. Relative comparison of n and u within the groups (0 G = C, 1 G = C and 2 G = C) shows that systems with slow decrease have smaller n but higher u (Supplementary Table S4).

3.7. Analysis of local bend, twist and three translational base pair step parameters for 10 coldspots and 10 hotspots with G/T in unbiased 150 ns simulations

Analysis of local bend revealed no significant differences in the average local bend on central Steps 7 and 8 (containing mismatch base pair) between coldspots and hotspots. However, local bend values of the central Step 7 correlated with the calculated bending free energies. In particular, coldspots C04, C06, C09 and C10 with higher bending free energies (13.0, 13.4, 13.2 and 13.2 kcal/mol) showed lower local bend on this step (8.2, 7.9, 7.8 and 7.7°, respectively) than other coldspots (Supplementary Table S5). Similarly, hotspots H06 and H10 with low bending free energies (12.6 and 12.5 kcal/mol) exhibited higher local bend on this step (10 and 10.8°, respectively) than other hotspots which are stiffer (Supplementary Table S6). Exception is H08, of which bending energy is 13.7 kcal/mol and has also larger local bend (10.7°). In the case of twist, central Steps 7 and 8 in coldspots with average values 25.8 and 41.1° deviate more from standard twist value (ca. 36°) than corresponding steps in hotspots where central Steps 7 and 8 exhibit values 28.9° and 37.8° (Supplementary Tables S7 and S8). Analysis of three translational base pair step parameters (shift, slide and rise) revealed no significant differences between coldspots and hotspots and no correlation with the bending free energies (Supplementary Tables S9–S14). It should be noted that biased ABF simulations exhibit two different modes of bending, see graphs of mean force in Figure S4 left. In particular, there are two linear regions, each representing a harmonic mode of bending with different force constants. One is observed in the range from 5.45 to ca 2.5 Å and the second more profound is in the range from 2.5 to 1.5 Å. The bending in the second region has major impact on discrimination between coldspots and hotspots. Basically, the unbiased simulations can reveal information only about the first mode. The second mode, which seems to be more important, is inaccessible in unbiased simulations and it can be expected that important features cannot be deduced from analysis of such MD trajectories.

4. Conclusion

We analysed the bending properties of damaged DNA in different sequence contexts employing biased MDs simulations. We tested sequences, which are frequently associated with germline mutations (hotspots) and rarely associated with germline mutations (coldspots). Main attention was paid to motifs with G/T, and A+/C mismatches as these pairs are the result of transitions and have base pair geometry well defined in the duplex structure. Despite the limitation of the employed model, our results revealed the relation between mutability of DNA sequences and their bending properties, particularly, coldspots were found to be more flexible than hotspots (Table 1). This is in agreement with the suggested model of the MMR process, in which more flexible segments are more effectively repaired. Our attempt to analyse other mismatch types including A/A, G/G and C+/C failed because biased simulations were not stable and thus derived free energy profiles were affected by other than bending movements. In the case of C/T, A/G and T/T mismatches, the trend in bending free energies between coldspots and hotspots was not convincing. Coldspots with C/T and A/G were found to be slightly more flexible than hotspots, while the opposite trend was detected for T/T mismatch (Supplementary Table S3). This might be solved by employing more collective variables, which would also describe mismatch rearrangement during bending. However, it can be expected that such simulations would be very computationally demanding. Apart from DNA elasticity, detection of mismatches and subsequent repair process in the cell is most likely influenced by other factors, which we did not consider in our model. Thus, we cannot expect a perfect correlation between DNA mutability and elasticity. Moreover, mismatch recognition is fine-tuned by interaction with amino acids phenylalanine 432 and glutamine 434 from MutS protein. Our model explaining the emergence of mutations in DNA is therefore simplistic, but it suggests that DNA elasticity plays an important role. Bending analysis of motifs with CpG and CpHpG revealed that bending free energies of these systems are low (Table 2), which indicates that their mutability is more likely related to spontaneous deamination of mC leading to C > T mutations, which are targeted by BER pathway. Similarly, the bending free energies of the Czech population hotspots were lower (Table 3). These motifs often contained CpG and CpHpG sequences. In contrast, motifs with de novo mutations (intentionally selected non-CpG and non-CpHpG sites) exhibited higher bending free energies (Table 4) that were comparable with bending free energies of hotspots (Table 1). This observation supports our hypothesis, which suggests that mutations appear more likely on the rigid DNA sites. Our computational results are further complemented by thermal denaturation experiments where we measured the stability of selected coldspots and hotspots with the G/T mismatch. Obtained results showed that melting curves of hotspots decreased slower than melting curves of coldspots in the range of 50–70°C, so the hotspots exhibit higher stability that could be associated with increased rigidity detected in our calculations. Click here for additional data file.
  2 in total

1.  Unusual mammalian usage of TGA stop codons reveals that sequence conservation need not imply purifying selection.

Authors:  Alexander Thomas Ho; Laurence Daniel Hurst
Journal:  PLoS Biol       Date:  2022-05-12       Impact factor: 9.593

2.  Importance of base-pair opening for mismatch recognition.

Authors:  Tomáš Bouchal; Ivo Durník; Viktor Illík; Kamila Réblová; Petr Kulhánek
Journal:  Nucleic Acids Res       Date:  2020-11-18       Impact factor: 16.971

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.