Zhi John Lu, Douglas H Turner, David H Mathews.
Abstract
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.
Year: 2006 PMID: 16982646 PMCID: PMC1635246 DOI: 10.1093/nar/gkl472
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
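The abstract states that the enthalpy parameters can be combined with the 37°C free energy parameters to predict folding at other temperatures. A minimal sketch of that extrapolation follows, assuming the standard relation ΔG°(T) = ΔH° − TΔS° with ΔH° and ΔS° treated as temperature independent. The function name and the example values are illustrative and are not taken from the paper.

```python
T37_K = 310.15  # 37 °C expressed in kelvin


def free_energy_at(delta_g37, delta_h, temp_c):
    """Extrapolate a folding free energy (kcal/mol) from 37 °C to temp_c.

    Assumption: enthalpy and entropy do not change with temperature,
    so the entropy follows from the 37 °C free energy and the enthalpy.
    """
    temp_k = temp_c + 273.15
    delta_s = (delta_h - delta_g37) / T37_K  # kcal/(mol K)
    return delta_h - temp_k * delta_s


if __name__ == "__main__":
    # Hypothetical nearest neighbor term: dG37 = -3.3, dH = -12.0 kcal/mol
    for temp in (10.0, 37.0, 60.0):
        print(temp, round(free_energy_at(-3.3, -12.0, temp), 2))
```

Under this assumption a stabilizing term with negative ΔH° becomes less favorable as the temperature rises, which is why structures predicted at an organism's growth temperature can differ from those predicted at 37°C.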
Hairpin loop enthalpy parametersa
| Parameter | Condition | ΔH° (kcal/mol) | SE (kcal/mol) |
|---|---|---|---|
| | | 1.3 | 1.79 |
| | 4 | 4.8 | 1.31 |
| | 5 | 3.6 | 1.61 |
| | 6 | −2.9 | 1.01 |
| | 7 | 1.3 | 1.73 |
| | 8 | −2.9 | 1.72 |
| | 9 | 5.0 | 2.16 |
| | >9 | 5.0 | — |
| | UU or GA first mismatch but not AG | −5.8 | 1.27 |
| | Special GU closure | −14.8 | 2.35 |
| | | 18.6 | 5.66 |
| A | | 3.4 | 1.48 |
| B | | 7.6 | 9.57 |
aHairpin loops of <3 nt are prohibited. ΔH°(first mismatch stacking) and terminal mismatch bonuses apply only to hairpin loops with >3 unpaired nucleotides.
Lookup table for unstable triloops and stable tetraloops and hexaloops
| Hairpin | Ref(s) | ΔH° (kcal/mol) |
|---|---|---|
| CaacG | A | 23.7 |
| GuuaC | A | 10.8 |
| CaacgG | B | 6.9 |
| CcaagG | B | −10.3 |
| CcacgG | B | −3.3 |
| CccagG | B | −8.9 |
| CcgagG | B | −6.6 |
| CcgcgG | B | −7.5 |
| CcuagG | B | −3.5 |
| CcucgG | B | −13.9 |
| CuaagG | B | −7.6 |
| CuacgG | C, D | −10.7 |
| CucagG | B | −6.6 |
| CuccgG | C | −12.9 |
| CugcgG | B | −10.7 |
| CuuagG | B | −6.2 |
| CuucgG | C, D | −15.3 |
| CuuugG | D | −6.8 |
| AcaguacU | E | −16.8 |
| AcagugcU | E,C | −12.8 |
| AcagugaU | C | −11.4 |
| AcaguguU | E | −15.4 |
Closing pairs are included and unpaired nucleotides are shown in lower case.
aFor extra stable hairpins measured in 0.1 M Na+ (A, B), placement was determined by assuming that the relative enthalpy of loops remains constant between 0.1 and 1 M Na+(30,34).
bA, Ref. (34); B, Ref. (35); C, Ref. (30); D, Ref. (31); E, Ref. (36).
Bulge loop initiation enthalpy parametersa
| Bulge length | ΔH° initiation (kcal/mol) | SE (kcal/mol) |
|---|---|---|
| 1 | 10.6 | 1.2 |
| 2 | 7.1 | 4.3 |
| 3 | 7.1 | 11.7 |
| | (7.1) | — |
aNote that the nearest neighbor parameter for stacking of adjacent base pairs is added for bulges with 1 nt. For bulges with >1 nt, calculation of the stabilities of adjacent helices includes the terminal AU penalty terms for AU or GU pairs adjacent to the bulge.
The periodic table of tandem mismatch (2 × 2 internal loop) enthalpya
| Closing BP | Mismatch | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| GA | AG | UU | GG | CA | CU | UC | CC | AC | AA | |
| AG | GA | UU | GG | AC | UC | CU | CC | CA | AA | |
| GC | −31.0c | −15.6c | −14.4b | −22.8 | −10.3b | −29.4b | (−14.7) | (−14.7) | −8.6b | −1.3b |
| −15.9c | −12.4b | |||||||||
| −28.4c | − | |||||||||
| − | ||||||||||
| CG | −8.9d | −12.7d | −17.5d | −22.8d | −10.8d | −0.6d | −2.8d | −1.8d | −1.7d | −4.2d |
| −16.5g | −5.0e | |||||||||
| − | − | |||||||||
| UA | −13.4c | −19.4 | −6.7b | 2.7 | 9.1b | 3.3b | 9.5b | (12.1) | (12.1) | 14.7b |
| AU | −17.9c | −10.8 | −12.2b | −1.0 | 7.2b | (7.4) | (7.4) | (7.4) | 7.5b | 13.4b |
| −11.0c | ||||||||||
| − | ||||||||||
| UG | −15.3c | −18.7f | (−8.5) | (−8.5) | (−8.5) | (−8.5) | (−8.5) | (−8.5) | (−8.5) | 1.7b |
| GU | −19.9c | −16.1f | (−19.6) | (−19.6) | (−19.6) | (−19.6) | (−19.6) | (−19.6) | (−19.6) | −23.2f |
aBoldface numbers are averages of multiple measurements on the same internal loops and numbers in parentheses are predicted by average of the nearest numbers to the left and right. The enthalpies of reference helices were taken from Ref. (21).
The enthalpies (kcal/mol) are drawn from b, Ref. (76); c, Ref. (52); d, Ref. (55,77); e, Ref. (78); f, Ref. (79); g, Ref. (80).
Approximations for internal loop enthalpy parameters at 37°C (in kcal/mol)
| Length (nt) | 2 | 3 | 4 | 5 | 6 | >6 | |
|---|---|---|---|---|---|---|---|
| −10.5 ± 1.4 | 0.3 ± 1.2 | −7.2 ± 1.1 | −6.8 ± 1.8 | −1.3 ± 1.2 | (−1.3) | ||
| 5.0 ± 0.7 | |||||||
| 3.2 ± 0.7 | |||||||
| Type of loop (first pair): | 5′RA 3′YG | 5′YA 3′RG | 5′RG 3′YA | 5′YG 3′RA | G G | U U | 5′RU 3′YU |
| 1 × 1 | NA | NA | NA | NA | −7.9 ± 3.7 | NA | −3.4±1.7 |
| 1 × 2 | 0 | −5.8 ± 1.5 | −5.8 ± 1.5 | −5.8 ± 1.5 | −5.8 ± 1.5 | −10.1 ± 1.7 | NA |
| 1 × ( | 0 | 0 | 0 | 0 | 0 | 0 | NA |
| 2 × 3 | 0 | −5.7 ± 3.8 | −10.9 ± 2.7 | −8.6 ± 1.9 | −9.0 ± 4.6 | −6.4 ± 2.5 | NA |
| Others (except 2 × 2) | −3.4 ± 1.3 | −3.4 ± 1.3 | −7.6 ± 1.0 | −7.6 ± 1.0 | 2.8 ± 2.4 | −5.8 ± 1.1 | NA |
The parameters were obtained from a set of linear regressions of experimental data for 1 × 1, 1 × 2, 1 × 3, 2 × 2, 2 × 3 and 3 × 3 loops. , , , (others: AG), (others: GA), (others:GG), (others: UU), (1 × 1: GG), (1 × 1: UU) are determined with linear regression of all the loops excluding the 2 × 2 and 1 × 2 loops. (2 × 3: GG) was specified and separated in the regression to make the standard errors of (others: GG) smaller. Some parameters of 1 × 2 and 2 × 3 were specified by refitting of 1 × 2 and 2 × 3 loops respectively, supposing that (5′RA/3′YG) was zero.
aWhen the internal loop is large (n > 6) the increase of free energy is assumed to be derived from entropy (57), so the initiation term, is the same as (6).
bNA, not applicable to that type of loop.
Updated internal loop free energy parameters at 37°C (in kcal/mol)
| Length (nt) | 2 | 3 | 4 | 5 | 6 | n>6 | |
|---|---|---|---|---|---|---|---|
| 0.5 ± 0.1 | 1.7 ± 0.1 | 1.1 ± 0.1 | 2.0 ± 0.1 | 2.0 ± 0.1 | 2.0 + 1.08 ln( | ||
| 0.7 ± 0.1 | |||||||
| 0.6 ± 0.1 | |||||||
| Type of loop (first pair): | 5′RA 3′YG | 5′YA 3′RG | 5′RG 3′YA | 5′YG 3′RA | G G | U U | 5′RU 3′YU |
| 1 × 1 | NA | NA | NA | NA | −2.6 ± 0.3 | NA | −0.4 ± 0.1 |
| 1 × 2 | 0 | −1.2 ± 0.2 | −1.2 ± 0.2 | −1.2 ± 0.2 | −1.2 ± 0.2 | −0.8 ± 0.2 | NA |
| 1 × ( | 0 | 0 | 0 | 0 | 0 | 0 | NA |
| 2 × 3 | 0 | −0.5 ± 0.2 | −1.2 ± 0.1 | −1.1 ± 0.1 | −0.7 ± 0.2 | −0.4 ± 0.1 | NA |
| Others (except 2 × 2) | −0.8 ± 0.1 | −0.8 ± 0.1 | −1.0 ± 0.1 | −1.0 ± 0.1 | −1.0 ± 0.2 | −0.6 ± 0.1 | NA |
The parameters were obtained from a set of linear regressions of the same experimental data as Mathews et al. (17), except for some updated data for 3 × 3 loops from Chen et al. (41).
aNA, not applicable to that type of loop.
Enthalpy parameters for multibranch loop initiation
| Parameter | Value (kcal/mol) | SE (kcal/mol) |
|---|---|---|
| 38.9 | 14.2 | |
| 12.9 | 2.9 | |
| −11.9 | 3.7 | |
| 27.1 | 6.8 |
aThe b term is excluded from the dynamic programming prediction of secondary structure, and the parameters a and c were optimized to a = 30.0 kcal/mol and c = −2.2 kcal/mol in the dynamic programming calculation to achieve the highest prediction sensitivity.
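The footnote gives the values of a and c used in the dynamic programming calculation and states that the b term is dropped. A minimal sketch of how such a term might be evaluated is shown below. It assumes the usual linear form, a plus c times the number of branching helices, which is not spelled out in this section; the function name is hypothetical.

```python
A_DP = 30.0   # kcal/mol, optimized offset quoted in the footnote
C_DP = -2.2   # kcal/mol, optimized coefficient quoted in the footnote


def multibranch_initiation_enthalpy(n_helices, a=A_DP, c=C_DP):
    """Enthalpy of multibranch loop initiation in kcal/mol.

    Assumption: linear model a + c * (number of branching helices),
    with the b term omitted as in the dynamic programming calculation.
    """
    if n_helices < 3:
        raise ValueError("a multibranch loop has at least three helices")
    return a + c * n_helices


if __name__ == "__main__":
    for helices in (3, 4, 5):
        print(helices, multibranch_initiation_enthalpy(helices))
```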
Figure 1. (A) Free energy difference of RNA duplex CCGGUp. ΔG° (dashed line) was derived from Equation 3, where enthalpy and entropy were averaged from the optical melting curve fits, assuming that they were independent of the temperature. (solid line) was calculated from Equations 1–3, where the heat capacity was accounted for. (B) Free energy difference is ΔΔG° = (62).
Free energy differences of RNA duplexes
| Sequence | Δ | ΔΔ | |||||
|---|---|---|---|---|---|---|---|
| 0°C | 10°C | 60°C | 75°C | 100°C | |||
| CCGG | −4.36 | −382 | 0.5 | 0.2 | 0.9 | 1.4 | 3.1 |
| CCGGAp | −6.58 | −263 | 0.9 | 0.5 | 0.2 | 0.4 | 1.3 |
| CCGGUp | −5.56 | −355 | 0.8 | 0.4 | 0.4 | 0.8 | 2.1 |
| ACCGGp | −5.39 | −393 | 0.8 | 0.4 | 0.5 | 0.9 | 2.5 |
| ACCGGUp | −8.17 | −434 | 1.7 | 1.0 | −0.1 | 0.2 | 1.3 |
aExperimental results of total free energy at 39°C.
bFree energy difference: , where ΔG° is derived from Equation 3, assuming that the enthalpy and entropy were independent of the temperature and is calculated from Equations 1–3, including the non-zero heat capacity (73).
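The footnote contrasts a free energy computed with temperature-independent enthalpy and entropy against one that includes a non-zero heat capacity. Equations 1–3 are not reproduced in this section, so the sketch below assumes the standard thermodynamic relations ΔH°(T) = ΔH°(T_ref) + ΔCp(T − T_ref) and ΔS°(T) = ΔS°(T_ref) + ΔCp ln(T/T_ref). The sign convention for ΔΔG° (heat capacity model minus constant model), the reference temperature, and all names are assumptions.

```python
import math


def delta_g_constant(dh_ref, ds_ref, temp_k):
    """Free energy with temperature-independent enthalpy and entropy."""
    return dh_ref - temp_k * ds_ref


def delta_g_with_cp(dh_ref, ds_ref, dcp, temp_k, t_ref):
    """Free energy when a constant heat capacity change dcp is included.

    Energies are in kcal/mol; ds_ref and dcp are in kcal/(mol K).
    """
    dh = dh_ref + dcp * (temp_k - t_ref)
    ds = ds_ref + dcp * math.log(temp_k / t_ref)
    return dh - temp_k * ds


def delta_delta_g(dcp, temp_k, t_ref):
    """Difference between the two models; dh_ref and ds_ref cancel out."""
    return dcp * ((temp_k - t_ref) - temp_k * math.log(temp_k / t_ref))


if __name__ == "__main__":
    # Hypothetical inputs: dCp = -0.382 kcal/(mol K), reference at 37 °C
    for temp_c in (0, 10, 60, 75, 100):
        print(temp_c, round(delta_delta_g(-0.382, temp_c + 273.15, 310.15), 2))
```

With a negative ΔCp the difference is zero at the reference temperature and grows in both directions away from it, so the constant-enthalpy approximation is best near the temperature at which the parameters were measured.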
Calculation time and memory size of dynamic programming for sequences of different length
| Sequence | Length (nt) | ||||||
|---|---|---|---|---|---|---|---|
| Time (h:min:s) | Memory (MB) | Time (h:min:s) | Memory (MB) | ||||
| 77 | 00:00:00.3 | (00:00:00.3) | 13 | 00:00:00.2 | (00:00:00.3) | 13 | |
| 268 | 00:00:21 | (00:00:03) | 13 | 00:00:04 | (00:00:03) | 14 | |
| 433 | 00:02:27 | (00:00:12) | 15 | 00:00:14 | (00:00:11) | 16 | |
| 631 | 00:11:59 | (00:00:35) | 16 | 00:00:46 | (00:00:34) | 19 | |
| 1542 | 06:09:03 | (00:06:47) | 31 | 00:13:42 | (00:07:41) | 45 | |
| 2904 | 67:00:43 | (00:47:00) | 73 | 01:41:00 | (01:03:54) | 121 | |
Calculation size and time on a computer with a 3.2 GHz Pentium 4 processor and 1 GB of RAM, using the gcc (version 3.2.3) compiler on Red Hat Enterprise Linux 3. The algorithm was improved from O(N⁴) to O(N³) in time complexity. In parentheses are the results with internal loop size limited to at most 30 unpaired nucleotides. The O(N³) algorithm is an implementation of the Lyngsø et al. (71) algorithm.
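The timings in parentheses reflect a cap on internal loop size. The sketch below only counts how many interior pairs (k, l) a naive search would examine for every closing pair (i, j), with and without such a cap. It is a toy illustration of the search space, not the Lyngsø et al. algorithm, and it ignores sequence, pairing rules and minimum hairpin size. The function name is hypothetical.

```python
def count_interior_pairs(n, max_unpaired=None):
    """Count (i, j, k, l) with i < k < l < j in a sequence of length n.

    If max_unpaired is given, only combinations whose loop has at most
    that many unpaired nucleotides, (k - i - 1) + (j - l - 1), are counted.
    """
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            span = j - i - 1  # positions strictly between i and j
            for left in range(span - 1):  # unpaired nucleotides on the 5' side
                # the 3' side may hold at most this many unpaired nucleotides
                max_right = span - 2 - left
                if max_unpaired is not None:
                    max_right = min(max_right, max_unpaired - left)
                if max_right < 0:
                    break  # larger values of left only exceed the cap further
                total += max_right + 1
    return total


if __name__ == "__main__":
    for length in (50, 100, 200):
        print(length, count_interior_pairs(length),
              count_interior_pairs(length, max_unpaired=30))
```

Without the cap the count grows with the fourth power of the length; with a fixed cap it grows roughly with the square, which is consistent with the much shorter times reported in parentheses for long sequences.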
Figure 2. Improvement of prediction at optimal growth temperatures. The sequences are those from mesophiles (optimal growth temperature from 10 to 60°C), excluding organisms with optimal growth at 37°C. The lowest free energy secondary structures were predicted at the organisms' optimal growth temperatures using two models. The previous model and parameters are those of Serra and Turner (24), which are widely used. The improved prediction uses the model and parameters presented in this work. The small and large subunits of rRNA sequences are divided into domains of <700 nt. The total sensitivity is the average of sensitivities of different types of RNA.
Prediction sensitivities of the lowest free energy structurea
| Organisms' optimal growth temperature (°C) | Nucleotides | Average sensitivity (%), prediction at 37°C | Average sensitivity (%), prediction at optimal growth temperature |
|---|---|---|---|
| ≤21 | 5536 | 79 ± 19.4 | 62.8 ± 28.4 |
| 22–26 | 7459 | 70.6 ± 13.4 | 71.0 ± 12.6 |
| 27–31 | 20 877 | 66.8 ± 10.4 | 67.7 ± 9.6 |
| 32–36 | 3124 | 64.9 ± 15.9 | 72.4 ± 21.8 |
| 38–42 | 1471 | 79.8 ± 2.2 | 79.8 ± 2.2 |
| 43–47 | 6268 | 78.3 ± 16.3 | 75.6 ± 20.2 |
| 48–52 | 1255 | 75.4 ± 14.8 | 71.3 ± 19.3 |
| 53–57 | 385 | 87.7 ± 8.6 | 90.8 ± 13.0 |
| 58–62 | 2937 | 84.5 ± 15.2 | 84.1 ± 15.7 |
| ≥63 | 12 395 | 76.9 ± 11.2 | 48.6 ± 11.3 |
aThe sequences are those from organisms with optimal growth temperature from 10 to 90°C, excluding 37°C.
bThe prediction at 37°C and at optimal growth temperature for organisms growing in different temperature ranges, using the current model in Materials and Methods.
cSensitivity equals the number of correctly predicted base pairs divided by the total number of known base pairs. The average sensitivity is the average of the sensitivities of the available types of RNA in each temperature range.
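The sensitivity defined in footnote c can be sketched as follows. Base pairs are represented as (i, j) index tuples and a predicted pair counts as correct only on an exact match, which is a simplifying assumption. The function names and the toy structures are illustrative.

```python
def sensitivity(predicted_pairs, known_pairs):
    """Fraction of known base pairs that appear in the prediction."""
    known = set(known_pairs)
    if not known:
        raise ValueError("the known structure has no base pairs")
    return len(set(predicted_pairs) & known) / len(known)


def average_sensitivity(values):
    """Unweighted mean over RNA types, as described in the footnote."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    known = {(1, 20), (2, 19), (3, 18), (4, 17)}
    predicted = {(1, 20), (2, 19), (6, 15)}
    print(sensitivity(predicted, known))
```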
Figure 3. PPV for optimal structure and base pairs with different pairing probabilities. PPV equals the number of predicted base pairs that are in the known structure divided by the total number of predicted base pairs. Pairs in the optimal structures are grouped by different thresholds of pairing probabilities. The pairing probabilities were calculated with a partition function calculation (22) at organisms' optimal growth temperatures, using the model and parameters presented in Materials and Methods. The small and large subunits of rRNA sequences are divided into domains of <700 nt. The sequences of different types of RNA are those from mesophiles (living from 10 to 60°C), excluding organisms living at 37°C.
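The grouping of predicted pairs by pairing probability can be sketched as below. The input is assumed to be a mapping from each base pair in the lowest free energy structure to its partition function probability; how those probabilities are computed is outside the scope of the sketch, and the names and numbers are illustrative.

```python
def ppv(predicted_pairs, known_pairs):
    """Fraction of predicted base pairs that are in the known structure."""
    predicted = set(predicted_pairs)
    if not predicted:
        raise ValueError("no predicted base pairs")
    return len(predicted & set(known_pairs)) / len(predicted)


def ppv_at_threshold(pair_probabilities, known_pairs, threshold):
    """PPV restricted to predicted pairs with probability >= threshold.

    Returns None when no predicted pair reaches the threshold.
    """
    confident = {pair for pair, prob in pair_probabilities.items()
                 if prob >= threshold}
    return ppv(confident, known_pairs) if confident else None


if __name__ == "__main__":
    probabilities = {(1, 20): 0.995, (2, 19): 0.99, (5, 12): 0.6}
    known = {(1, 20), (2, 19)}
    for cutoff in (0.5, 0.99):
        print(cutoff, ppv_at_threshold(probabilities, known, cutoff))
```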
Figure 4. Secondary structure prediction of Saccharomyces cerevisiae tRNA (RM4000) at optimal growth temperature (30°C) (B) and at 37°C (C) with the presented nearest neighbor parameters. Base pairs in the original structure (A) are derived from the comparative analysis database (42–49,58,59). Structures are also color annotated to indicate predicted base pair probabilities (Pbp) for each helix: red, Pbp ≥ 0.95; yellow, 0.95 > Pbp ≥ 0.7; green, 0.7 > Pbp ≥ 0.3; blue, 0.3 > Pbp. The structures were drawn with XRNA and Adobe Illustrator.
Figure 5. Experimental (Supplementary Data) (25–31) versus predicted (Tm = ΔH°/ΔS° − 273.15) melting temperatures of hairpin stem–loop structures. The line shows the ideal location of points, predicted Tm = measured Tm. The root mean squared deviation (r.m.s.d.) of prediction compared to experiment is 5.86°C. The new enthalpy parameters provide improved Tm prediction compared to the previous compilation of parameters (24), which have an r.m.s.d. of 7.58°C as compared to experiment for this dataset.
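The Tm expression and the r.m.s.d. in the caption can be sketched as below for a unimolecular hairpin. The units are an assumption: ΔH° in kcal/mol and ΔS° in cal/(mol K), hence the factor of 1000. The function names and example values are illustrative, not data from the paper.

```python
import math


def melting_temperature_c(delta_h_kcal, delta_s_cal_per_k):
    """Tm in °C from Tm = dH/dS - 273.15 for a unimolecular transition.

    Assumed units: delta_h_kcal in kcal/mol, delta_s_cal_per_k in cal/(mol K).
    """
    return 1000.0 * delta_h_kcal / delta_s_cal_per_k - 273.15


def rmsd(predicted, measured):
    """Root mean squared deviation between paired lists of values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical hairpin: dH = -40 kcal/mol, dS = -120 cal/(mol K)
    print(round(melting_temperature_c(-40.0, -120.0), 2))
    print(round(rmsd([60.0, 72.0], [58.0, 75.0]), 2))
```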
Figure 6. Relationships of melting temperatures, nucleotide contents and optimal growth temperatures of different types of RNA in different organisms with optimal growth temperature from 10 to 90°C: (A) Predicted melting temperature; (B) G–C pair content; (C) G content; and (D) U content versus optimal growth temperature. Melting temperatures are predicted for different types of RNA sequences from comparative analysis databases (42–49,58,59) with a two-state transition assumption.