Pairing statistics and melting of random DNA oligomers: Finding your partner in superdiverse environments.

Simone Di Leo¹, Stefano Marni¹, Carlos A Plata², Tommaso P Fraccia³, Gregory P Smith⁴, Amos Maritan⁵, Samir Suweis⁵, Tommaso Bellini¹.

Abstract

Understanding the pairing statistics in solutions populated by a large number of distinct solute species with mutual interactions is a challenging topic, relevant in modeling the complexity of real biological systems. Here we describe, both experimentally and theoretically, the formation of duplexes in a solution of random-sequence DNA (rsDNA) oligomers of length L = 8, 12, 20 nucleotides. rsDNA solutions are formed by 4^L distinct molecular species, leading to a variety of pairing motifs that depend on sequence complementarity and range from strongly bound, fully paired defectless helices to weakly interacting mismatched duplexes. Experiments and theory consistently reveal hybridization statistics characterized by a prevalence of partially defected duplexes, with a distribution of type and number of pairing errors that depends on temperature. We find that, despite the enormous multitude of inter-strand interactions, defectless duplexes are formed, involving a fraction of up to 15% of the rsDNA chains at the lowest temperatures. Experiments and theory are limited here to equilibrium conditions.

Year: 2022 PMID: 35404933 PMCID: PMC9022813 DOI: 10.1371/journal.pcbi.1010051

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

One of the defining features of biomolecules is the specificity and selectivity of their mutual interactions. Selectivity is at the heart of virtually all biological processes, including cell signalling, immune response, genetic transmission and regulation of gene expression. These processes rely on partner molecules that succeed in finding and docking to each other after negotiating their way through a huge number of collisions and interactions with other molecules, some of which exert attraction [1]. Indeed, in all actual cases, specific biomolecular interactions take place in “superdiverse” environments, i.e. in contexts with an enormous variety of molecular species. Success in pairing with the target thus depends not only on the binding strength between partner molecules, but also on the whole network of pair interactions between competing molecular species, possible cooperativity, concentration and degeneracy. While the complexity of interactions in the contexts of biomolecular crowding is universally acknowledged [2, 3], the thermodynamics and statistical physics of large pools of distinct interacting molecules have been discussed only in the frame of phase transitions [4, 5], and molecular models of such systems have not been presented yet.

DNA oligonucleotide mixtures are one of the pre-eminent systems to experimentally recreate the conditions described above in a controlled way, given the natural selectivity of base pairing [6] and the possibility of artificially synthesizing large ensembles of distinct DNA sequences with controlled distributions [7]. Herein, we consider systems formed by aqueous solutions of DNA oligonucleotides of length L in which the four bases (Cytosine, C; Guanine, G; Adenine, A; Thymine, T) are present with equal probability at each position in the sequence, as sketched in Fig 1a.
Thus, in each of these random-sequence DNA oligomer (rsDNA) solutions, 4^L distinct molecules can be found with approximately the same probability. We consider solutions of rsDNA oligomers with L = 8, 12 and 20 (8N, 12N and 20N), corresponding to mixtures of ≈ 7 ⋅ 10^4, 2 ⋅ 10^7 and 10^12 distinct sequences, respectively; in these systems we study interaction selectivity, defect distribution and equilibration properties as a function of the temperature.
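As a quick check of the species counts quoted above, the number of distinct sequences of length L over a four-letter alphabet is 4^L. The short sketch below simply evaluates this count for the three lengths considered; the function name is illustrative and not part of the original work.

```python
def n_species(length: int) -> int:
    """Number of distinct sequences of a given length over the 4-letter DNA alphabet."""
    return 4 ** length

# The three oligomer lengths studied in the text (8N, 12N, 20N).
for L in (8, 12, 20):
    print(f"L = {L:2d}: {n_species(L):.2e} distinct sequences")
```

The values are of order 10^4, 10^7 and 10^12, in line with the rounded figures given in the text.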

Fig 1

Description of the system.

(a): Solutions of random-sequence DNA (rsDNA) oligomers of length L are mixtures made of 4^L distinct molecules, obtained by all the combinations of the four nucleobases, which are present at any position in the sequence with equal probability. (b): Each rsDNA oligomer can interact with 4^L different rsDNA oligomers, leading to a 4^L × 4^L interaction matrix. Each dot in the matrix represents the most energetically favorable pairing between the two selected rsDNA oligomers, among all the possible mutual shifts. (c): Each position in the interaction matrix corresponds to a specific duplex motif, characterized by pairing errors, which are described here by the parameters of the pairing errors vector. The most probable duplex in the matrix is highly defected, like the last example in the panel.

Because of its relevance, the hybridization of nucleic acids has been a topic of continuous investigation since their identification, which has led to well-established tools and models to calculate free energy, melting temperature and secondary structure directly from the DNA (or RNA) sequences involved [8-11]. In particular, the thermodynamics of duplex formation is commonly evaluated in the frame of the so-called “Nearest Neighbor” (NN) model, in which the hybridization free energy is obtained by the summation of elemental contributions. These are extracted from databases of melting temperatures of DNA oligomers and effectively take into account both Watson-Crick (WC) and non-canonical pairings [8]. However, available models are accurate only for pools with a small number of distinct sequences [12], and cannot predict the behavior of complex systems such as the rsDNA solutions considered here.

In an rsDNA solution, upon colliding, pairs of rsDNA molecules bind to each other with a strength and resultant stability that is mainly determined by the level of complementarity of the base sequences according to the WC pairing rules.
This results in an interaction matrix, sketched in Fig 1b, where each position represents the most stable duplex conformation for a given pair. The most probable outcome of a random collision between two rsDNA molecules is a quite unstable pair, with a small number of consecutive paired bases (3 in the case of L = 12, as in the bottom sketch of Fig 1c). However, even if less probable, more stable structures are formed, ranging from full complementarity, a condition that yields defectless helical duplexes (Fig 1c, top sketch), to duplexes with pairing mismatches of various types, listed in Fig 1c. To each duplex, with any pattern of defects, corresponds a binding free energy that can be computed from the sequences involved with standard tools. At the opposite end of the spectrum, there are rsDNA strands with no complementarity at all, as between an oligo made of only Ts and one made of only Cs. In rsDNA at fixed L, the variety of binding possibilities increases with the number of allowed mismatches, while the binding energy decreases. As shown here, these two factors nearly compensate, leading to non-trivial competition between interaction strength and degeneracy.

In a previous study of rsDNA [13], it was observed that, when L > 12, rsDNA solutions self-organize into liquid crystal phases. Given the mechanism by which these phases are formed in solutions of DNA oligomers [14, 15], this finding suggests that hybridization of rsDNA leads to duplexes with fairly well-paired terminals. This feature of rsDNA solutions remained speculative, with no experimental or statistical support. Besides its value as a platform for exploring hybridization in crowded nucleic acid environments and for describing the network of interactions in superdiverse mixtures, the study of rsDNA is also relevant to the evaluation of scenarios for the origin of life.
Indeed, if the RNA world hypothesis is correct, such a state had to be preceded by a condition in which RNA oligomers were abiotically synthesized with a large degree of randomness, from which ribozyme sequences could have been subsequently selected [16, 17]. Whether such molecular mixtures could form WC pairs, or whether hybridization was instead prevented by the large variety of species, is a piece of information that can shape the RNA world model itself, clarifying the role of complementarity and duplex formation in the prebiotic environment [18].

In this paper, we study rsDNA solutions by a combination of three complementary strategies: (i) measurement of the overall degree of hybridization by UV absorbance as a function of temperature (T); (ii) measurements of the degree of hybridization and thermal stability of pairs of mutually complementary sequences mixed with rsDNA by fluorescence Contact-Quenching (CQ); (iii) development of a theoretical framework, based on a re-parametrization of the NN thermodynamic parameters, enabling quantitative predictions on rsDNA hybridization and pairing error statistics.

Materials and methods

Description of the system

8N, 12N and 20N rsDNA oligomers were synthesized on solid phase with an Äkta Oligopilot. The products were purified via dialysis against a 25mM NaCl solution and lyophilized. The samples were characterized by HPLC (see S2 Text). An analogous previous synthesis was characterized by MALDI-TOF mass spectrometry [13]. Stock aqueous solutions were prepared at c_DNA ≈ 50g/l, from which the final sample concentrations were obtained by dilution, ranging from c_DNA = 0.04g/l to c_DNA = 25g/l, with ionic strengths of c_NaCl = 0.15M, 0.45M and 1.0M.

The statistics of the pairing quality were explored by mixing rsDNA with a tagged pair of complementary DNA strands. Specifically, we used the 8-base-long couple 8A* and 8B* and the 12-base-long couple 12A* and 12B*, modified with 5’-Texas Red (A* oligomers) and 3’-6-FAM (Fluorescein) (B* oligomers) moieties, so that upon hybridization the two fluorophores come into contact (see Fig G in S2 Text). The specific sequences are as follows. 8A*: TexasRed-5’-ACAGTCCT-3’. 8B*: 5’-AGGACTGT-3’-FAM. 12A*: TexasRed-5’-ACGACAGTCCTG-3’. 12B*: 5’-CAGGACTGTCGT-3’-FAM. 8A*, 8B*, 12A* and 12B* were purchased from IDT.

Using UV hyperchromicity to detect ensemble rsDNA melting

The overall degree of hybridization in rsDNA was evaluated by measuring the absorbance A at the wavelength λ = 258nm. A is obtained by averaging over an interval Δλ = 3nm. Experiments were performed with the Evolution 300 UV-Vis spectrophotometer from Thermo Scientific, customized with a Quantum Northwest Peltier hot/cold stage with a hold temperature accuracy of ±0.05°C. Experiments were performed with a 1°C/min heating and cooling rate. The cell holder was capable of hosting two different types of cell: standard quartz cuvettes with optical path length ℓ = 1cm, adequate to investigate DNA solutions with c_DNA ≈ 0.02 − 0.04g/l; and microfluidic cells with ℓ = 10μm from Starna Scientific Ltd, to investigate low volumes (< 50μl) of more concentrated DNA solutions (c_DNA ≈ 20 − 35g/l) (see S2 Text). Absorbance data as a function of temperature, A(T), were treated according to standard protocols [19] (see S2 Text) to extract the “ensemble” melting curve θ(T), i.e. the fraction of rsDNA oligomers forming duplexes (of any quality) at temperature T. The melting temperature T_m is defined by θ(T_m) = 1/2.
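The definition of the melting temperature as the point where θ = 1/2 can be applied to a sampled melting curve by interpolation. The sketch below is a minimal illustration under stated assumptions: it uses linear interpolation (the actual data treatment is described in S2 Text and may differ), it assumes θ decreases monotonically with T, and the curve used to exercise it is synthetic, not measured data.

```python
import numpy as np

def melting_temperature(T, theta):
    """Return the temperature at which theta crosses 1/2 (linear interpolation).

    Assumes theta decreases monotonically with T, as in a melting curve.
    """
    T = np.asarray(T, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # np.interp needs increasing x-values, so both arrays are reversed.
    return float(np.interp(0.5, theta[::-1], T[::-1]))

# Hypothetical, synthetic sigmoidal curve centred at 40 C (illustration only).
T = np.linspace(10.0, 70.0, 61)
theta = 1.0 / (1.0 + np.exp((T - 40.0) / 5.0))
print(melting_temperature(T, theta))
```

For noisy experimental curves, a smoothing or fitting step before the interpolation would likely be needed.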

Contact-quenching detects the pairing of specific sequences

Fluorescence-based measurements were used to detect how frequently a specific sequence is able to find its exact complementary strand in the midst of the rsDNA solution. This was done by mixing 8A* and 8B*, in equal amounts, in an 8N solution, and similarly 12A* and 12B* in 12N. Although the pair of fluorophores Texas Red and FAM was originally chosen to obtain a FRET signal, we found that the dominant effect signaling their interaction is the so-called “Contact Quenching” (CQ), i.e. the drop in fluorescence quantum yield of both fluorophores when they are in close proximity [20] (see S2 Text). In the case considered here, the quenching is deep (about 80% reduction for Texas Red and 50% for FAM) and can be easily exploited to extract the fraction θ of A* oligomers that form a defectless duplex with their complementary partner B*. To extract θ(T) we monitored the quenching of the Texas Red emission, which is known to have a small T dependence [21]. Fluorescence emission vs. T was measured using the Applied Biosystems QuantStudio 5, a Real-Time PCR Instrument by Thermo Fisher Scientific. Calibration and normalization procedures to extract θ from raw data are provided in S2 Text. They also include a thermodynamic characterization of the 8A*-8B* and 12A*-12B* duplexes, since their binding free energy ΔG is slightly modified with respect to their untagged analogs because of the stabilizing effect of the fluorophores at the terminals.

CQ experiments were performed by mixing A* and B*, both at a fixed concentration of 100nM, with rsDNA solutions prepared at 0.04g/l < c_DNA < 25g/l, and ionic strengths c_NaCl = 0.15M and c_NaCl = 1.0M. Measurements were performed in 15mM TRIS HCl at a pH of 7.4, to minimize fluorescence drift due to the pH sensitivity of FAM [21]. In rsDNA solutions, each specific sequence is present at a concentration c_DNA/4^L.
The stoichiometric ratio between each added fluorescent sequence (A* or B*) and the same non-fluorescent sequence already present in the rsDNA solution is ϕ, i.e. the concentration of the labeled sequence divided by the concentration of its unlabeled counterpart (the total rsDNA molar concentration divided by 4^L). When ϕ = 1, i.e. the amount of fluorescently labeled sequence equals that of the same sequence without tag, the measured fraction θ of paired A*B* offers an approximate evaluation of the degree to which errorless pairing is present within rsDNA solutions. At the same time, the measurement of the ϕ dependence of θ enables a detailed comparison with theoretical predictions.
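To make the meaning of ϕ concrete, the sketch below evaluates it as the ratio between the labeled-probe concentration and the concentration of the same unlabeled sequence in the rsDNA pool. This reading of ϕ is an interpretation based on the statement that ϕ = 1 when the two amounts are equal. The average nucleotide mass of 309 g/mol used to convert g/l into molar units is an assumed round value, not a number taken from the text, and the function name is hypothetical.

```python
def stoichiometric_ratio(c_probe_molar, c_rsdna_g_per_l, length, nt_mass=309.0):
    """phi = labeled-probe concentration / concentration of the same unlabeled sequence.

    nt_mass (g/mol per nucleotide) is an assumed average value, not from the paper.
    """
    c_rsdna_molar = c_rsdna_g_per_l / (nt_mass * length)  # total strand molarity
    c_per_sequence = c_rsdna_molar / 4 ** length           # each of the 4^L species
    return c_probe_molar / c_per_sequence

# 100 nM probes in an 8N solution at 16 g/l.
phi = stoichiometric_ratio(100e-9, 16.0, 8)
print(f"phi = {phi:.2f}")
```

With these inputs ϕ comes out close to 1, consistent with the text's remark that ϕ = 1 corresponds to about 16g/l for 8N.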

Experimental results

Ensemble melting of rsDNA

Hyperchromicity in the ultraviolet gives access to the overall degree of hybridization in rsDNA solutions. Fig 2, green symbols, shows θ(T) for 12N at c_DNA = 0.04g/l and c_NaCl = 1M, which we compare, as a reference, with the melting curves of binary mixtures of complementary DNA 12mers computed with standard approaches at two concentrations: c_DNA = 0.02g/l (blue dashed line) and c_DNA = 0.04/4^12 g/l (red dashed line), both at c_NaCl = 1M. As can be seen, θ exhibits a behavior intermediate between the two. rsDNA duplexes are far less stable (by ≈ 30°C) than duplexes of complementary strands at equal total concentration, a clear manifestation of the selectivity of rsDNA pairing. At the same time, rsDNA duplexes appear about 5°C more thermally stable than the 12mer complementary duplexes when solubilized at the same concentration at which they are present in the rsDNA solution, an indication that the formation of defected pairing is a relevant feature of the hybridization of rsDNA, as also suggested by the milder slope of θ(T).
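The effect of diluting a complementary pair by a factor 4^12 can be illustrated with the textbook two-state model for A + B ⇌ AB at equal strand concentration. This is a hedged sketch, not the procedure used for the reference curves in Fig 2: the enthalpy and entropy below are placeholder values chosen only for illustration, whereas the paper's reference curves rely on NN parameters and salt corrections.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def duplex_fraction(T_celsius, c0_molar, dH, dS):
    """Two-state fraction of paired strands for A + B <-> AB, each strand at c0.

    dH in kcal/mol, dS in kcal/(mol K); both are placeholder inputs here.
    """
    T = T_celsius + 273.15
    zeta = c0_molar * math.exp(-(dH - T * dS) / (R * T))  # dimensionless c0 * K
    return (1.0 + 2.0 * zeta - math.sqrt(1.0 + 4.0 * zeta)) / (2.0 * zeta)

def melting_temperature(c0_molar, dH, dS):
    """Temperature (Celsius) at which duplex_fraction equals 1/2, i.e. zeta = 2."""
    return dH / (dS + R * math.log(c0_molar / 2.0)) - 273.15

# Hypothetical thermodynamic parameters, for illustration only.
dH, dS = -90.0, -0.250
for c0 in (5e-6, 5e-6 / 4 ** 12):
    print(f"c0 = {c0:.1e} M -> Tm = {melting_temperature(c0, dH, dS):.1f} C")
```

In this simple model the melting temperature drops substantially when each strand is diluted by 4^12, which is the qualitative gap between the two dashed reference curves described above.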

Fig 2

Double strand vs. random sequence DNA melting.

Ensemble melting curve as a function of temperature. Green dots: measured ensemble melting of 12N at c_DNA = 0.04 g/l. Shading marks the experimental uncertainty resulting from the average over 8 experimental replicas. Dashed lines: theoretical melting predicted for equimolar solutions of two complementary 12mers at c_DNA = 0.02 g/l each (dashed blue line) and c_DNA = 0.04/4^12 ≈ 2.4 ⋅ 10^−9 g/l each (dashed red line). The 12N θ(T) exhibits a behavior intermediate between the two. Dashed lines are obtained by averaging many melting curves of complementary 12mers. c_NaCl = 1M in all curves.

In Fig 3a, θ(T) measured for 12N and 20N at c_DNA ≈ 0.04g/l are shown for three different salt concentrations c_NaCl. Since T_m decreases as L decreases but increases with c_DNA, to obtain reliable θ(T) for 8N we performed melting experiments at a larger concentration, c_DNA ≈ 25g/l, shown in Fig 3b. We find that in all conditions θ(T) depends on T more mildly than in typical melting curves of binary solutions of complementary strands. Fig 3 also shows that T_m of rsDNA grows with c_NaCl, with L and with c_DNA, as can be seen by comparing panels (a) and (b), in agreement with DNA melting in less complex systems [8, 22].

Fig 3

Ensemble melting of rsDNA.

Measured ensemble melting curves of rsDNA at c_DNA = 0.04 g/l for 12N (panel a, open circles) and 20N (panel a, full diamonds), and at c_DNA = 25 g/l for 8N (panel b, open squares), at various salt concentrations: c_NaCl = 0.15M (blue), 0.45M (green) and 1M (red). Shading marks the experimental uncertainty resulting from the average over 6–8 experimental replicas for 8N and 12N, whereas, for 20N, just one experiment is shown, as described in the text. Dashed lines, with the same color code, are the theoretical predictions of Eq (7).

Experimental data shown here were taken at equilibrium. Attaining this condition is not trivial, since the lifetime of DNA duplexes depends dramatically on the length of the oligomers and on the concentration of the solutions [23, 24]. To approach equilibrium of 20N in dilute conditions, we considered only θ(T) measured upon heating after a long equilibration time at low T. Similar attention had to be given to the behavior of concentrated 12N solutions, as mentioned in the next Section. Further information on equilibrium conditions is provided in the S1 Text. The non-equilibrium behavior of rsDNA will be the topic of a future work.

Probability of defectless duplexes

The study of the ensemble θ(T) suggests that hybridization in rsDNA combines selectivity with a certain degree of defects in duplex formation. However, the ensemble θ(T) does not offer much insight into how probable it is to find duplexes with a certain pairing quality. To obtain this kind of information, CQ experiments in solutions of 8N and 12N were performed. We did not perform analogous measurements for 20N, both because of the artifacts due to non-equilibrium pairing and because of the small accessible range of ϕ (4^20 is a large number!). Fig 4 shows the fraction of hybridized 8A* and 8B* in 8N as a function of T, for various ϕ. In the case of 8N we could reach ϕ = 1, corresponding to c_DNA = 16g/l. At ϕ = 1 (red dots) the fraction of paired A*B* reaches about 30% at low T. As ϕ increases, the fraction of A*B* duplexes increases, as expected. The dependence of this fraction on ϕ is a useful tool to test our theoretical model, as discussed below.

Fig 4

Probability of defectless duplexes in rsDNA.

Fraction of paired 8A*8B* in 8N measured with CQ experiments, at c_NaCl = 0.15M. Colors correspond to different values of the stoichiometric ratio ϕ between the probes A* / B* and rsDNA strands. Black data show the melting curve of a neat A* B* solution (without rsDNA). In all experiments the probe concentration is 100nM. Each data point is obtained as an average over 5–10 replicates of the experiment. The corresponding standard deviation is reported as shaded regions.

Theoretical framework

Although the combination of the measured ensemble melting curve and of the measured fraction of paired A*B* offers important insight into the quality of pairing within the rsDNA system, a deeper understanding of the driving mechanisms in this superdiverse environment requires a statistical model able to take into account the balance between binding energy and degeneracy. Indeed, the strongest binding energy is achieved in defectless duplexes, which are only formed with a fraction 1/4^L of the total number of sequences. On the contrary, defected duplexes are more weakly bound, but they can be assembled with a larger variety of sequences, i.e. the weaker the binding energy, generally, the larger its degeneracy. The hybridization in simple systems formed by a limited number of sequences is well described by the current thermodynamic approaches, such as the NN model. Despite their accuracy, there is as yet no theoretical framework to apply this knowledge to systems with high complexity such as rsDNA, where more than 4^L × 4^L interactions are involved. Explicitly computing the free energy for all pairs involved would be too computationally expensive. We thus develop a mean-field-type theoretical approach that uses averages of the thermodynamic parameters of the NN model and their re-parametrization on a simple counting of defects. With this approach we can compute both the ensemble melting curve and the fraction of paired A*B* at equilibrium with no free parameters. Hence, a direct comparison between theory and experiments is achieved.

We consider an rsDNA mixture containing N chains in a volume V (with a total concentration c = N/V). We assume the mixture to be perfectly balanced (see S2 Text), that is, each sequence is present in N/4^L copies. We also assume on-off hybridization, with no intermediate state between unbound and paired, which is justified given the limited length (L ≤ 20) of the oligomers considered here [11].
Thus, a given couple (i and j) interacts through a set of 2L − 1 distinct binding free energies ΔG, depending on their mutual alignment: we use the shift parameter α, with −(L − 1) ≤ α ≤ L − 1, to express such alignment. Specifically, α = 0 stands for the condition of perfect alignment, i.e. the 3’ terminal base of strand i is aligned with the 5’ terminal base of strand j and vice versa. Positive or negative α indicates an overhang on the 3’ or 5’ terminal, respectively. For each pair and alignment we define the Boltzmann factor ζ = [c] exp(−βΔG), where the brackets around the concentration denote that it is measured in mol/L ([c] = c/(mol/L)), and β = (kT)^−1, with k being the Boltzmann constant. In real mixtures, some duplexes are very unlikely, with large ΔG and thus small ζ.

In this setting, we can formally define, using the canonical distribution, the probability of any given hybridization state and the related partition function, which contains all the relevant statistical information of the system (see S3 Text). However, the partition function of the rsDNA system in the thermodynamic limit (N → ∞) cannot be written in closed form. Consequently, an exact expression for the fraction of paired oligomers θ in that limit is not simple to obtain. This is thoroughly discussed in the S3 Text, where two opposite limits (high and low temperatures) are worked out, allowing the derivation of analytical expressions that bypass this obstacle. Furthermore, a Unification Ansatz that matches the two approximations in their ranges of validity has been worked out. This theoretical approach allows us to compute the approximated melting curve for the oligomers with sequence i in the midst of all the 4^L species in the rsDNA solution (Eq (3); see S3 Text for details on the derivation). This analytical expression has the same structure as the melting curve of a solution of self-complementary sequences (Eq 18a in Ref. [25]), with two relevant differences: the factor 1/4^L normalizing the concentration (the concentration of any specific sequence is c/4^L), and the double summation of the pairing weight of i, over all possible partner sequences j and over all possible shifts.

Since the explicit computation of all (2L − 1) × 4^L × 4^L binding energies is prohibitive, and since our aim is to provide statistical insight into the quality of the double helices, we introduce a parametrization of the duplex quality by defining the pairing errors vector, which describes the number and type of pairing errors in the rsDNA duplexes. Besides the shift parameter α, the vector includes the number of consecutive base mismatches at each of the two duplex terminals and the number of mismatches inside the duplex, as sketched in Fig 1c. This parametrization is useful to compute the degeneracy g, i.e., the number of sequences of length L that can form a duplex characterized by a given pairing errors vector with a given reference sequence (Eq (5)). In that expression, the exponential factors with base 4 and 3 count the number of different nucleobases that can occupy a site in the dangling end of the shift or lead to an internal or external mismatch, respectively, whereas the binomial coefficient accounts for all possible arrangements of the internal mismatches within the inner region of the duplex.

The same set of parameters is used to evaluate the free energy of the pairings. The hybridization free energy is commonly computed on the basis of the “Nearest Neighbor” approach [26], by which the binding enthalpy and entropy are obtained as a summation over quartets of neighboring nucleobases, whose values depend on the specific bases that form them [8]. We propose a parametrization of the NN thermodynamics, obtained by averaging over the quartets, that yields an approximate value of the binding free energy based not on detailed knowledge of the sequences involved, but only on the pairing errors vector and on the fraction f of C or G bases in sequence i.
The reader can find in the S3 Text the computation of the average energetic parameters from the literature values [8] and the salt correction [22]. An estimate of the error introduced by this simplification is shown in Fig 5, which shows the melting curve of a solution of two 8mers with perfect complementarity. Therein, the melting curve computed using our averaged free energy with f = 0.5 and no pairing errors (green line) is compared to the family of melting curves, computed with the traditional NN protocol, corresponding to a set of specific sequences with f = 0.5 (dashed pink line and shading). Our energetic description approximates the average melting curve obtained with the standard SantaLucia protocol with a low discrepancy, ΔT_m < 1°C.
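The scan over the 2L − 1 mutual alignments of two strands, which underlies each entry of the interaction matrix of Fig 1b, can be sketched as follows. This is an illustration under explicit assumptions: the quality of an alignment is scored by the longest run of consecutive Watson-Crick pairs, a crude stand-in for the binding free energy used in the paper, and the sign convention chosen for the shift is arbitrary. Function names are hypothetical.

```python
WC = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def longest_paired_run(seq_i: str, seq_j: str, shift: int) -> int:
    """Longest run of consecutive Watson-Crick pairs at a given shift.

    Both sequences are written 5'->3'; seq_j is reversed so that the strands
    face each other antiparallel. shift = 0 is the fully overlapped alignment.
    """
    rev_j = seq_j[::-1]
    best = run = 0
    for k, base in enumerate(seq_i):
        m = k - shift  # facing position on the reversed partner
        if 0 <= m < len(rev_j) and (base, rev_j[m]) in WC:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

def best_alignment(seq_i: str, seq_j: str):
    """Scan all 2L - 1 shifts and return (shift, run length) of the best one."""
    L = len(seq_i)
    shifts = range(-(L - 1), L)
    return max(((s, longest_paired_run(seq_i, seq_j, s)) for s in shifts),
               key=lambda pair: pair[1])

# The 8A* / 8B* probe sequences quoted in Materials and methods.
print(best_alignment("ACAGTCCT", "AGGACTGT"))
```

For the 8A*/8B* pair the best alignment is the unshifted one with all 8 bases paired, while an all-T strand against an all-C strand gives no paired bases at any shift, matching the two extremes described in the Introduction.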

Fig 5

Theoretical predictions of melting curves, computed for DNA 8mers at 25g/L and 1M NaCl.

The green solid line stands for the melting curve predicted for a pair of complementary strands when using the energy obtained from our parametrization, with no pairing errors and f = 0.5. The purple dashed line and related shading are, respectively, the mean value and the standard deviation computed using the NN model over a set of 40 melting curves of distinct DNA sequences with fixed f = 0.5. The dotted lines are the melting curves of 8N for different values of f (Eq (6)), as specified in the colorbar. The solid dark red line is the ensemble melting curve, obtained by averaging over the possible values of f (as given by Eq (7)).

By inserting these parametrizations in Eq (3), the melting curve can be generalized to the melting curve of all the sequences in the rsDNA with a given CG content f (Eq (6)), where the product appearing in the summation expresses the statistical weight of the pairings of a sequence of CG fraction f with all the sequences leading to the formation of a duplex with a given pairing errors vector. Note that the summation runs over all the possible pairs yielding a given pairing errors vector. Fig 5 shows the set of these melting curves with f ranging from 0 to 1 in the case of 8N. The ensemble melting curve is obtained by averaging them over the possible values of f, weighted with the probability p(f) of having a sequence with CG fraction f in the rsDNA (Eq (7)). The summation is over all possible fractions, that is, the product f ⋅ L sweeps the integers from 0 to L.

Fig 5 shows the ensemble melting curve (red line) for 8N. Two features are clearly noticeable: the ensemble T_m predicted for rsDNA is much lower than that of complementary strands at the same concentration and ionic strength (dashed line), in agreement with experimental observations; and the T dependence of θ is milder, reflecting the variety of energies involved in the various duplexes that can be formed in rsDNA solutions. The choice of splitting the statistical summations into two steps highlights the relevance of both averaging the free energy at fixed f and averaging the melting curves over f.
While the latter is apparent upon inspecting Fig 5, further insight into the former can be obtained from Fig A in S3 Text, where the melting curve inclusive of all summations for f = 0.5 is compared with simplified approaches, which are found to differ both in T_m and in the shape of the melting curve.
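The average over the CG fraction f can be sketched numerically. Since the four bases are equiprobable at each position, each position is C or G with probability 1/2, so the number of C or G bases in a random sequence follows a binomial distribution; this form of p(f) follows from that assumption rather than being quoted from the paper. The per-composition curve passed to the averaging routine below is a placeholder, not the model's actual melting curve at fixed f.

```python
from math import comb

def p_cg(n_cg: int, length: int) -> float:
    """Probability that a random sequence has exactly n_cg bases that are C or G.

    Each position is C or G with probability 1/2 when the four bases are equiprobable.
    """
    return comb(length, n_cg) / 2 ** length

def ensemble_average(theta_f, length: int) -> float:
    """Average a per-composition quantity theta_f(f) over f = 0, 1/L, ..., 1."""
    return sum(p_cg(n, length) * theta_f(n / length) for n in range(length + 1))

L = 8
print(sum(p_cg(n, L) for n in range(L + 1)))   # normalization of p(f)
print(ensemble_average(lambda f: f, L))        # placeholder theta_f: mean CG fraction
```

Replacing the placeholder with a temperature-dependent melting curve at fixed f would give the corresponding ensemble curve at that temperature.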

Probability of defectless and defected duplexes

Neat rsDNA solutions

By using the parametrization introduced above, it is possible to evaluate the fraction of strands involved in duplexes having any specific type of pairing, i.e. a specific pairing errors vector, for sequences with a given f. This is done by weighting the total fraction of paired oligomers with the statistical weight of the specific defect class. The fraction of duplexes having a given pairing errors vector in the whole rsDNA solution is then obtained by averaging over f, weighted with the probability p(f) (Eq (9)). Having access to this fraction, we can rank the defects based on their probability. This is shown in Fig 6a, where we plot the six most frequent pairing motifs in 12N as a function of T. Perfect (defectless) duplexes are not dominant, but also not negligible, exceeding 10% at the lowest T considered. The most frequent errors are located at the terminals, as a result of their reduced energy cost compared to internal mismatches.

Fig 6

Pairing statistics in rsDNA.

Theoretical predictions of the fraction of perfect and defected rsDNA duplexes (as given by Eq (9)), parametrized by the pairing errors vector. (a): the six most probable duplex motifs in 12N. (b): fraction of duplexes with a given total number of unpaired bases, in 8N (dotted lines), 12N (continuous lines) and 20N (dashed lines). c_NaCl = 1M and c_DNA = 25g/l.

By suitable summation of these fractions, it is possible to determine the fraction of duplexes having a given total number of defects, defined as the sum of the absolute shift |α|, the numbers of mismatches at the two terminals and the number of internal mismatches. The resulting fractions are shown in Fig 6b. We find the fraction of duplexes with a single defect to be dominant, while the fraction of duplexes with more than 2 errors becomes negligible at low T. It is also interesting to notice that the fractions computed for the three considered values of L converge to the same value at low T. This can be understood as a consequence of two main reasons. First, the degeneracy g does not depend on L when only dangling ends and external mismatches, the dominant forms of pairing errors, are present (see Eq (5)). Second, internal mismatches are nearly negligible at low temperatures in this range of L (Fig 6a). An analytic derivation of this low-T limit is reported in the S3 Text.

Modeling the contact quenching experiments

Fig 6 shows the fraction of duplexes with a certain quality among all the possible kinds of duplexes forming the rsDNA mixture. However, in CQ experiments, the rsDNA solution is enriched by the presence of the labelled strands A* and B* at a concentration expressed by the ratio ϕ. The fraction of A*B* pairs at a given ϕ can be predicted through an adequate extension of the model. In the resulting expression, ζ = [c] exp(−βΔG) is the Boltzmann weight of the A*B* couple, f is the fraction of C or G bases in the A*B* sequences, and the remaining term is the fraction of paired A* or B*, either to each other or to any other oligomer of the rsDNA solution, which can be computed by generalizing Eq (6). Further details on the pairing energies are given in the S2 Text. It can be easily seen that, in the limit ϕ → ∞, the predicted fraction correctly converges to the melting curve of a couple of complementary strands at equal concentration.

Discussion

The model we developed here provides theoretical predictions directly comparable to experimental observations for both the ensemble melting and the formation of defectless duplexes in rsDNA solutions. In Fig 3, the theoretically computed θ(T) (dashed lines) are compared to the measured θ(T) (symbols). The agreement is very good, especially if we take into account that the model has no free parameters. Fig 7 compares the experimental melting temperatures, T_m, to their theoretical predictions as a function of the parameters considered here: L, c_DNA and c_NaCl. The average deviation between predicted and observed T_m is within 1°C, or within 0.5°C if we exclude the data point of 12N at c_NaCl = 0.15M, in which T_m is the lowest and thus more difficult to determine because of the narrow T interval available to define the low-T baseline (see blue open circles in Fig 3a). The difference between computed and measured T_m is comparable with the typical errors in predicting T_m for any given specific sequence with the standard thermodynamic approach [22].

Fig 7

Melting temperatures for rsDNA: Theory vs experiment.

Comparison of measured and predicted T_m for rsDNA solutions. Symbols: experimental T_m obtained from the melting curves of 8N, 12N and 20N (Fig 3), as a function of the salt concentration c_NaCl. Conditions are specified in the legend. Dashed lines: theoretical predictions of Eq (7).

The key result of the model is the non-trivial T dependence of the statistics for pairing quality in rsDNA solutions, which is encoded in the fraction of duplexes with a given pairing errors vector, as given by Eq (9) and shown in Fig 6. The ensemble melting curve θ(T) reflects the complexity of the interactions in such a superdiverse environment, but only as an ensemble feature. A much more stringent test is provided by the CQ experiments, i.e. by the comparison between the measured and predicted fraction of paired A*B* as a function of T and ϕ. In Fig 8 we compare the CQ data already shown in Fig 4 for 8N at c_NaCl = 0.15M with the predicted fraction for several values of ϕ. The agreement is good, compatible with the range of uncertainty of both experiment and model, the latter being a consequence of the uncertainty on the free energy associated with the 8A*8B* duplex formation.

Fig 8

Probability of defectless duplexes in rsDNA: Theory vs experiment.


Fraction of paired 8A*8B* in 8N, θ, as determined via CQ experiments (dots and light shading, as in Fig 4) and from the model (dashed lines and dark shading), at c = 0.15M and for several ϕ values (colors, see legend). Black data, lines and shadings: melting in 8A*+8B* solutions, in the absence of rsDNA. Shaded regions of the theoretical predictions are obtained from the experimental uncertainty on the pairing energy between A* and B* (see S2 Text).

In Fig 9 we show measured and predicted θ as a function of ϕ for 8N at T ≈ 15°C at two ionic strengths, c = 0.15M and c = 1M. Notably, the ionic strength has little effect on θ(T, ϕ), much less than on the melting temperature. This is because the salt contribution to the statistical weights is similar for the most probable pairings. Since at low T the pairing probability is approximately the ratio between statistical weights, the salt contribution effectively cancels (see S3 Text).
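The cancellation can be made explicit with a short worked relation. This is only a sketch under an idealized assumption that is ours, not the paper's: salt multiplies the statistical weight of every relevant pairing by one common factor s(c). The symbols w_k (salt-free weight of pairing k), s(c) and p_k are hypothetical notation introduced for illustration; the full treatment is in S3 Text.

```latex
p_k \;\simeq\; \frac{s(c)\,w_k}{\sum_j s(c)\,w_j} \;=\; \frac{w_k}{\sum_j w_j}
```

In this idealization the low-T pairing probability does not depend on c at all. The melting temperature, by contrast, reflects the balance between paired and unpaired strands, and the weight of the unpaired state does not carry the factor s(c), so no such cancellation is expected there.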

Fig 9

Probability of defectless duplexes in rsDNA: Salt dependence.


Fraction of paired 8A*8B* as a function of ϕ, expressing their dilution in 8N, at T = 15°C for c = 0.15M (blue dots) and c = 1.0M (red dots). Dashed lines: theoretical predictions, with the shaded regions obtained from the experimental uncertainty on the A*B* pairing energy (see S2 Text).

The success of our model, which has no free parameters, in describing rsDNA solutions might be surprising, given the level of approximation introduced in the energy parametrization. In fact, the molecular diversity of rsDNA makes its behavior intrinsically an average over all possible pairing patterns, which renders our parametrization, built on averages, particularly adequate. We also point out that the validity of the model is limited to equilibrium conditions, as documented above and in S1 Text. In addition, the model assumes unlimited molecular availability and does not include the effects of competitive binding, which could arise from constrained stoichiometric ratios in limited pools of molecules. The successful comparison with experiments indicates that rsDNA with short enough chain length satisfies all these requirements. In systems with longer molecules, out-of-equilibrium conditions and hybridization states with more than one helical region could become relevant [11].

The agreement with observations also validates the predicted pairing distributions shown in Fig 6, which are worth discussing further. These distributions indicate that, in a superdiverse environment of random-sequence oligonucleotides, the selectivity afforded by the free energy of base pairing is “marginal”. Specifically, the resulting pairing is good, but not perfect: the majority of duplexes are defected, but with fewer than two pairing errors. Defectless pairing involves at most a fraction of ≈ 14% of the rsDNA strands.
The pairing statistics of rsDNA largely depend on the compensation between two opposing factors: (i) the degeneracy, g, which grows nearly exponentially with the total number of pairing errors (see Eq (5)), thus favoring the formation of defected duplexes; and (ii) the binding free energy, ΔG, which increases approximately linearly with the number of defects (see S3 Text), yielding a Boltzmann factor that gives an exponential advantage to duplexes with fewer defects (see Eq (2)). In DNA duplex formation, these two factors nearly balance, with a partial dominance of the energetic component, a condition that leads to the smooth α and T dependence of the probability distributions. The result would change if the degeneracy grew faster or slower than exponentially with the binding energy. An example of the latter condition is given by the selectivity of PCR primers within the genome, in which the set of competing bindings is limited with respect to the random situation, thus enabling a strong dominance of defectless primer binding.

An obvious question is how critical this marginal condition is, and whether modifications in the nucleobase structure, and thus in pairing and stacking energy, could significantly improve selectivity. To answer this question we computed θ(T) by assuming that all pairing energies equal that of CG (which is roughly double that of AT). We find a significant, but not dramatic, increase in θ, which at low T reaches 0.3 (see Fig D in S1 Text), indicating that the basic features of θ(T) are stable within the range of energies involved in natural and artificial nucleobase binding [7]. As a further test of the pairing efficiency in random systems, we computed θ(T) for random systems formed by only 2 bases, instead of the four natural ones, using energetic parameters intermediate between AT and CG (see Fig E in S1 Text).
Even in this condition, where the degeneracy of the defected duplexes is strongly reduced, the fraction of perfect pairs is larger but still well under 0.4. When, on the contrary, the number of bases is increased (still assuming a WC-type pairing rule), θ markedly decreases. These observations strengthen the notion that, in the range of pairing energies of nucleic acids, randomness appears to be the dominant factor in determining the quality of the pairs.

These observations set a reference for the selectivity in contexts of strong randomness and heterogeneity of sequences, such as those that likely characterized the origin of life and the RNA world. Whatever the mechanisms of chain amplification and lengthening, and whatever base pair variants were available at the time, they could not have relied on levels of selectivity much better than those reported here. We previously proposed that one such mechanism leading to the formation of long nucleic acid chains could exploit the symmetry breaking and the formation of molecular columns due to liquid crystal ordering [27, 28]. Indeed, rsDNA can form columnar liquid crystals under suitable conditions [13]. How the distribution of pairing quality discussed here can be compatible with liquid crystal formation is a subtle matter that will be the topic of a forthcoming work.
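The competition between degeneracy and Boltzmann weight discussed above can be illustrated with a deliberately crude toy calculation. It is not the model of Eqs (2) and (5): it assumes that pairing errors are simple base mismatches in a fully overlapping duplex, that the number of candidate partners with k errors is C(L, k)(a − 1)^k for an alphabet of a bases, and that every error costs the same free energy. The function names, the per-error `penalty` and the numerical values are hypothetical placeholders, not parameters taken from the paper.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def degeneracy(length, errors, alphabet=4):
    """Count strands that differ from the perfect complement of a given
    strand at exactly `errors` positions (toy stand-in for the paper's g)."""
    return math.comb(length, errors) * (alphabet - 1) ** errors


def mismatch_distribution(length, temp_c, penalty, alphabet=4):
    """Toy probability that a paired strand sits in a duplex with k errors,
    for k = 0..length.

    penalty: ASSUMED free-energy cost per pairing error, in kcal/mol.
    The free energy of the perfect duplex is common to all terms and
    cancels on normalization, so it does not appear here.
    """
    rt = R_KCAL * (temp_c + 273.15)
    # log-weights: ln(degeneracy) minus a Boltzmann penalty linear in k
    log_w = [math.log(degeneracy(length, k, alphabet)) - k * penalty / rt
             for k in range(length + 1)]
    top = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Illustrative placeholder values only, not parameters from the paper.
    for n_bases in (2, 4):
        probs = mismatch_distribution(8, 15.0, penalty=2.0, alphabet=n_bases)
        print(n_bases, [round(p, 3) for p in probs[:4]])
```

Within this toy the distribution is binomial, so the defectless fraction is (1 + (a − 1)x)^(−L) with x = exp(−penalty/RT). This makes the qualitative trend explicit: reducing the alphabet or raising the per-error cost increases the defectless fraction, while unpaired strands, shifted overlaps and sequence-dependent energies, all of which the actual model accounts for, are ignored here.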

Conclusion

We introduced rsDNA solutions as a model system of superdiverse mixtures, enabling the study of interactions and pair formation in the midst of a huge number of competing molecular species, a condition that offers a conceptual paradigm for the molecular variety and selectivity of biological environments. In the analysis of rsDNA solutions, we could take advantage of the limited polymer heterogeneity given by the four nucleobases, of the rather simple and well-characterized pairing rule, of the availability of solid-state synthesis and of the variety of experimental tools. This combination of factors enabled us to characterize experimentally, and describe theoretically, the selectivity of pairing, and to find that the majority of rsDNA duplexes contain pairing errors, but limited to one or two per duplex, a condition that still grants reliable stability to the structures. Building on the success in describing rsDNA, we applied our approach to selective pairing in the context of PCR technology and miRNA-based gene regulation, which is the topic of a forthcoming publication. Extending this statistical approach to superdiverse biomolecular systems closer to cell environments, characterized by a less dramatic diversity but by more complex and less well-defined interactions, will be the challenging development of this work.

Further results.

Evidence of Out-of-Equilibrium Conditions; Perfect pairing probability with different energies and number of nucleobase types.
Fig A: Evidence of out-of-equilibrium behavior for 12N. T dependence of the fraction of duplexed strands θ measured while heating and cooling at 1°C/min.
Fig B: Evidence of out-of-equilibrium behavior for 20N. Absorbance vs. T measured upon heating after one month equilibration at 4°C.
Fig C: Perfect pairing probability determined from the model and via CQ experiments with different cooling rates.
Fig D: Perfect pairing probability computed with different values of f in 8N.
Fig E: Perfect pairing probability computed with different numbers of nucleobase types.
(PDF)

Materials and methods.

Characterization of rsDNA synthesis; Measurement of rsDNA concentration; UV Absorbance: Experimental Setup; Analysis of UV Absorbance Data; Characterization of A* and B* Fluorescence; Contact-Quenching Data Analysis; Free Energy of A*B* Duplex.
Fig A: HPLC traces of 12N compared with two different 12mers. HPLC traces of 8N, 12N and 20N.
Fig B: Quartz microfluidic. Quantum Northwest Peltier.
Fig C: Temperature calibration: temperature measured by a thermistor in contact with the microfluidic cell vs. the internal control T.
Fig D: Steps in the analysis of absorbance data used to extract the melting curves θ.
Fig E: Fluorescence emission spectra of labeled DNA systems: A*, B*, A*B*, A*B and AB*.
Fig F: Absorbance spectra of labeled DNA systems: A*, B*, A*B*, A*B and AB*.
Fig G: Simplified representation of the 12A*B* duplex, showing the relative size of DNA duplex, linker and fluorescent moieties FAM and TexasRed.
Fig H: Fluorescence intensity of TexasRed in a solution of 12A* + 12B*.
Fig I: Normalized fluorescence intensity of TexasRed in a solution of 12A*, 12B* and 12B.
Fig J: Average normalized fluorescence of 8A*+8B*. The fit enables determining the linear drift at low temperatures, corresponding to the signal of fully paired A*B*.
Fig K.
(PDF)

Comprehensive Description of the theoretical model.

Partition Function; Melting Curve Approximations for rsDNA solution; Energetic Parametrization; Pairing Statistics with α = 0; Effects of the Ionic Strength on the Pairing Statistics.
Fig A: Melting Curves of rsDNA, according to the Low T Approximation, High T Approximation and Unification Ansatz.
(PDF)

14 Feb 2022

Dear prof Bellini,

Thank you very much for submitting your manuscript "Pairing Statistics and Melting of Random DNA Oligomers: finding your Partner in Superdiverse Environments" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments. Thank you again for your submission to our journal.
We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,
Eugene I. Shakhnovich, Guest Editor, PLOS Computational Biology
Nir Ben-Tal, Deputy Editor, PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: The authors present a combined experimental and theoretical thermodynamic study of a solution consisting of all possible L-nucleotide-long DNA oligomers (L being 8, 12 and 20) generated by the random synthesis from a mixture of equal proportion of four nucleotide precursors. Authors’ main conclusion is that, despite the presence of a number of chains that can form mismatched duplexes with the given chain, which could prevent it from finding the complementary partner, the yield of fully matched duplexes at low enough temperatures is pretty high.

Major comments

1. The authors failed to convince this reviewer that a pretty artificial system they consider deserves so much attention. In Introduction, they claim that the system they consider is relevant to the pre-biological evolution consideration within the framework of the RNA World hypothesis. But this claim looks too far-fetched to me.

2. There is no doubt that our current understanding of thermodynamic parameters of the DNA double helix and mismatched pairs is pretty solid, so the fact that the authors obtained a good agreement between theory and experiment for their artificial system is not surprising and adds very little, if anything, to our understanding of DNA thermodynamics.

Technical comment

Although the authors originally formulate the problem under consideration in very general terms, to perform a theoretical treatment they make a number of more or less arbitrary assumptions, which allow them to solve the problem under consideration. For example, to calculate melting curves of fully matched duplexes they start with the nearest neighbor approximation, which takes into account the parameters of heterogeneous stacking obtained from melting experiments with long DNA duplexes, rather than using a simplified version of the melting theory, which considers the duplex stability as a function of only the duplex GC-content. But in the end, they de facto arrive at the simplified version anyway. They would make their reasoning more convincing and straightforward if they relied on a finding by Vologodskii and Frank-Kamenetskii (Phys Life Rev 25, 1-21, 2018) that theoretical models, which take into account heterogeneous stacking, do not yield better predictions of oligonucleotide duplex melting temperatures than the simplest model, which relies only on the GC-content.

Reviewer #2: This is a very well-done paper dealing with the ability of complementary DNA strands to find each other among a plethora of other strands, from completely incompatible to partially compatible, and taking into account also imperfect structural pairing. Experiments and theory nicely go hand-in-hand, and there is much to say, all in all. Nonetheless, I want to probe the authors in two directions:

1) The theory works too well, given many approximations. It would be good if the authors could add a section for limitations of the analytical model, and where it could go wrong. This would provide a better view for potential users of the framework about when and where this treatment is appropriate and when they should instead be careful.

2) Could the authors think about specific mixtures of DNA sequences designed in such a way to maximise complementarity? For example, what about ensuring a certain minimal Hamming distance (or the like) between sequences to reduce mismatches?

I think these would be two valuable additions both in the direction of acknowledging the limitations of the model, and in terms of proposing how to make the model "useful" beyond the present experiments.

Reviewer #3: The paper by Simone Di Leo and co-workers presents an analysis of the melting of random DNA oligomers in "Superdiverse Environments". The article is rather technical for the journal; however, it includes detailed supplementary information. It is well done and shows exceptional agreement between the experimental observations and the theoretical predictions. However, a deeper discussion of the results in the context of modern DNA in living cells, implications of their findings and possible limitation or strength due to the oligomer lengths would have been more commendable for a larger scientific audience. For instance, a more extensive discussion on the PCR primer selectivity and possible errors in the amplification would be interesting in different applications.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: No: From my reading, I did not see any claim of public data/codes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

4 Mar 2022

Submitted filename: response to reviewers PLOS CB.pdf

22 Mar 2022

Dear prof Bellini,

We are pleased to inform you that your manuscript 'Pairing Statistics and Melting of Random DNA Oligomers: finding your Partner in Superdiverse Environments' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,
Eugene I. Shakhnovich, Guest Editor, PLOS Computational Biology
Nir Ben-Tal, Deputy Editor, PLOS Computational Biology

***********************************************************

7 Apr 2022

PCOMPBIOL-D-22-00089R1
Pairing Statistics and Melting of Random DNA Oligomers: finding your Partner in Superdiverse Environments

Dear Dr Bellini,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,
Livia Horvath
PLOS Computational Biology

26 in total

Review 1. The thermodynamics of DNA structural motifs.

Authors: John SantaLucia; Donald Hicks
Journal: Annu Rev Biophys Biomol Struct Date: 2004

2. Liquid crystal self-assembly of random-sequence DNA oligomers.

Authors: Tommaso Bellini; Giuliano Zanchetta; Tommaso P Fraccia; Roberto Cerbino; Ethan Tsai; Gregory P Smith; Mark J Moran; David M Walba; Noel A Clark
Journal: Proc Natl Acad Sci U S A Date: 2012-01-10 Impact factor: 11.205

3. End-to-end stacking and liquid crystal condensation of 6 to 20 base pair DNA duplexes.

Authors: Michi Nakata; Giuliano Zanchetta; Brandon D Chapman; Christopher D Jones; Julie O Cross; Ronald Pindak; Tommaso Bellini; Noel A Clark
Journal: Science Date: 2007-11-23 Impact factor: 47.728

4. Predicting sequence-dependent melting stability of short duplex DNA oligomers.

Authors: R Owczarzy; P M Vallone; F J Gallo; T M Paner; M J Lane; A S Benight
Journal: Biopolymers Date: 1997 Impact factor: 2.505

Review 5. Revisiting a dogma: the effect of volume exclusion in molecular crowding.

Authors: Anastasia Politou; Piero Andrea Temussi
Journal: Curr Opin Struct Biol Date: 2014-11-19 Impact factor: 6.809

6. Liquid Crystal Ordering and Isotropic Gelation in Solutions of Four-Base-Long DNA Oligomers.

Authors: Tommaso P Fraccia; Gregory P Smith; Lucas Bethge; Giuliano Zanchetta; Giovanni Nava; Sven Klussmann; Noel A Clark; Tommaso Bellini
Journal: ACS Nano Date: 2016-09-01 Impact factor: 15.881

Review 7. DNA melting and energetics of the double helix.

Authors: Alexander Vologodskii; Maxim D Frank-Kamenetskii
Journal: Phys Life Rev Date: 2017-11-14 Impact factor: 11.025

8. Validation of the nearest-neighbor model for Watson-Crick self-complementary DNA duplexes in molecular crowding condition.

Authors: Saptarshi Ghosh; Shuntaro Takahashi; Tamaki Endoh; Hisae Tateishi-Karimata; Soumitra Hazra; Naoki Sugimoto
Journal: Nucleic Acids Res Date: 2019-04-23 Impact factor: 16.971

9. Measuring thermodynamic details of DNA hybridization using fluorescence.

Authors: Yong You; Andrey V Tataurov; Richard Owczarzy
Journal: Biopolymers Date: 2011-03-07 Impact factor: 2.505

10. Abiotic ligation of DNA oligomers templated by their liquid crystal ordering.

Authors: Tommaso P Fraccia; Gregory P Smith; Giuliano Zanchetta; Elvezia Paraboschi; Youngwoo Yi; David M Walba; Giorgio Dieci; Noel A Clark; Tommaso Bellini
Journal: Nat Commun Date: 2015-03-10 Impact factor: 14.919