Aron M Yoffe, Peter Prinsen, William M Gelbart, Avinoam Ben-Shaul.
Abstract
We show on general theoretical grounds that the two ends of single-stranded (ss) RNA molecules (consisting of roughly equal proportions of A, C, G and U) are necessarily close together, largely independent of their length and sequence. This is demonstrated to be a direct consequence of two generic properties of the equilibrium secondary structures, namely that the average proportion of bases in pairs is ∼60% and that the average duplex length is ∼4. Based on mfold and Vienna computations on large numbers of ssRNAs of various lengths (1000–10 000 nt) and sequences (both random and biological), we find that the 5′–3′ distance, defined as the sum of H-bond and covalent (ss) links separating the ends of the RNA chain, is small, averaging 15–20 for each set of viral sequences tested. For random sequences this distance is ∼12, consistent with the theory. We discuss the relevance of these results to evolved sequence complementarity and specific protein binding effects that are known to be important for keeping the two ends of viral and messenger RNAs in close proximity. Finally, we speculate on how our conclusions imply indistinguishability in size and shape of equilibrated forms of linear and covalently circularized ssRNA molecules.
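The two structural quantities named in the abstract, the proportion of bases in pairs and the average duplex length, can be read off a predicted secondary structure. The following is a minimal sketch, assuming the structure is available in dot-bracket notation (the text format written by the Vienna programs) and assuming a duplex is a maximal run of directly stacked base pairs; the function names are hypothetical, and the paper's exact counting convention is not given in this excerpt.

```python
def parse_pairs(structure):
    """Return the base pairs (i, j), i < j, of a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def paired_fraction_and_mean_duplex_length(structure):
    """Return (f, k): fraction of bases paired and mean duplex length.

    Assumption: a duplex is a maximal run of stacked pairs
    (i, j), (i+1, j-1), ...; any bulge or loop ends the duplex.
    """
    pairs = parse_pairs(structure)
    if not pairs:
        return 0.0, 0.0
    partner = dict(pairs)  # maps each 5' base to its 3' partner
    # A pair (i, j) starts a new duplex unless (i-1, j+1) is also a pair.
    n_duplexes = sum(1 for i, j in pairs if partner.get(i - 1) != j + 1)
    f = 2 * len(pairs) / len(structure)
    k = len(pairs) / n_duplexes
    return f, k


f, k = paired_fraction_and_mean_duplex_length("((((...))))..(((....)))")
print(f, k)
```

Averaging these per-structure values over many sequences would give numbers comparable in spirit to the ∼60% and ∼4 quoted above, although the paper's own averages may also be taken over an ensemble of structures per sequence.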
Year: 2010 PMID: 20810537 PMCID: PMC3017586 DOI: 10.1093/nar/gkq642
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Three different representations of the mfold-predicted minimum free energy secondary structure of a random 200 nt ssRNA of uniform composition (25% A, C, G, U). (A) Conventional schematic, drawn with mfold, showing base-paired regions (duplexes) and single-stranded loops. (B) jViz.Rna drawing (16), emphasizing the flexibility of single-stranded loops and scaled dimensions of duplexes. (C) Graph-theoretic mapping of this secondary structure, reducing duplexes to edges (bonds) and loops to vertices (filled circles); the single ‘exterior’ loop is depicted by an open circle.
Figure 2. Detailed view of an exterior loop consisting of covalent links and H-bonded links of nucleotides. The effective contour length of the loop is .
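The exterior-loop picture suggests a simple way to count the links separating the two ends of a single structure. The sketch below assumes a dot-bracket string as input and counts each covalent backbone step and each H-bond crossing as one link; the function name is hypothetical. The ensemble averaging of Equation (1) and any weighting used for the effective contour length are not reproduced here, since they are not given in this excerpt.

```python
def end_to_end_links(structure):
    """Count (covalent, hbond) links on the exterior-loop path from 5' to 3'.

    Walk from the first base to the last: an unpaired base advances one
    covalent link; a base that opens a stem jumps to its partner across
    one H-bond link, skipping everything enclosed by that stem.
    """
    # Build the partner table from the dot-bracket string.
    partner = [-1] * len(structure)
    stack = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            opener = stack.pop()
            partner[opener], partner[pos] = pos, opener
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])

    covalent = hbond = 0
    pos, last = 0, len(structure) - 1
    while pos < last:
        if partner[pos] > pos:
            hbond += 1            # cross the closing pair of an exterior stem
            pos = partner[pos]
        else:
            covalent += 1         # step along the backbone
            pos += 1
    return covalent, hbond


covalent, hbond = end_to_end_links("..((..))..((...)).")
print(covalent + hbond)
```

With this counting, an exterior loop containing u unpaired bases and s stems gives u + 2s - 1 links in total, of which s are H-bond links.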
Table 1. Composition dependence of the average percentage of bases paired (f), the average duplex length (k) and the average 5′–3′ distance (D), for different sets of random and yeast-derived sequences of length 3000 nt; each set consists of 500 sequences.
| Type of ssRNA | Folding program | G (%) | C (%) | A (%) | U (%) | f (%) | k | D | |
|---|---|---|---|---|---|---|---|---|---|
| Random, viral-like | RNAsubopt | 24 | 22 | 26 | 28 | 62 ± 1 | 4.0 ± 0.1 | 12 ± 4 | 11.6 |
| Random, uniform | RNAsubopt | 25 | 25 | 25 | 25 | 61 ± 1 | 3.9 ± 0.1 | 12 ± 5 | 12.6 |
| Yeast-derived | RNAsubopt | 19 | 19 | 31 | 31 | 58 ± 2 | 4.1 ± 0.1 | 14 ± 5 | 11.9 |
| Random, viral-like | mfold | 24 | 22 | 26 | 28 | 61 ± 1 | 4.5 ± 0.1 | 14 ± 7 | 12.8 |
Values following the ± symbols are standard deviations.
(a) The randomly-permuted ssRNAs of each type are of identical composition; for the yeast ssRNAs, the mean composition is listed.
(b) The yeast-derived sequences are ssRNA transcripts of successive 3000 bp sections of yeast (S. cerevisiae) chromosomes XI and XII.
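Sets of randomly-permuted sequences with identical composition, as described in the footnote above, can be generated by shuffling a fixed pool of bases. This is a minimal sketch under that assumption; the function name, the seed and the use of Python's `random` module are illustrative choices and not the authors' procedure.

```python
import random
from collections import Counter


def permuted_sequence(length, percent, rng):
    """Return a random RNA sequence with an exact base composition.

    `percent` maps each base to an integer percentage. Every sequence
    drawn with the same arguments has identical composition and differs
    only in the order of its bases.
    """
    bases = []
    for base, pct in percent.items():
        count, remainder = divmod(length * pct, 100)
        if remainder:
            raise ValueError("composition does not give whole base counts")
        bases.extend(base * count)
    if len(bases) != length:
        raise ValueError("percentages must sum to 100")
    rng.shuffle(bases)  # uniform random permutation of the fixed pool
    return "".join(bases)


# Viral-like composition from the table (G 24%, C 22%, A 26%, U 28%).
viral_like = {"G": 24, "C": 22, "A": 26, "U": 28}
rng = random.Random(2010)  # arbitrary seed, for reproducibility only
sequences = [permuted_sequence(3000, viral_like, rng) for _ in range(5)]
print(Counter(sequences[0]))
```

Each such sequence could then be passed to a folding program such as RNAsubopt or mfold; that step is omitted here because the invocation details are not part of this excerpt.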
Table 2. Values of f, k and D for viral ssRNAs, determined with RNAsubopt
| Viral taxon | No. of seq. | Host | Mean length (nt) | f (%) | k | D |
|---|---|---|---|---|---|---|
| Bromoviridae RNA3 | 8 | Plant | 2210 | 63 ± 1 | 4.2 ± 0.1 | 19 ± 6 |
| Bromoviridae RNA2 | 8 | Plant | 2891 | 63 ± 2 | 4.3 ± 0.1 | 18 ± 4 |
| Bromoviridae RNA1 | 8 | Plant | 3265 | 64 ± 2 | 4.3 ± 0.1 | 15 ± 3 |
| Leviviridae | 9 | Bacterium | 3780 | 68 ± 2 | 4.3 ± 0.1 | 15 ± 9 |
| Sobemovirus | 9 | Plant | 4199 | 66 ± 2 | 4.2 ± 0.2 | 17 ± 4 |
| Luteovirus | 17 | Plant | 5725 | 62 ± 1 | 4.2 ± 0.1 | 16 ± 7 |
| Tymovirus | 9 | Plant | 6300 | 45 ± 4 | 3.9 ± 0.1 | 26 ± 5 |
| Tobamovirus | 22 | Plant | 6425 | 64 ± 1 | 4.2 ± 0.1 | 19 ± 5 |
| Astroviridae | 6 | Animal | 6719 | 63 ± 1 | 4.3 ± 0.1 | 16 ± 8 |
| Caliciviridae | 18 | Animal | 7713 | 62 ± 1 | 4.1 ± 0.1 | 20 ± 19 |
Values following the ± symbols are standard deviations.
No. of seq.: number of sequences analyzed.
Figure 3. Mean ensemble-averaged 5′–3′ distances, from Equation (1), for random and viral sequences. Standard deviations are shown with vertical bars. The small black points represent the 10 groups of viral sequences listed in Table 2. The large gray points represent the 14 different lengths of randomly-permuted RNAs (50–8000 nt), of viral-like composition, described in the text. The line is a least-squares fit to the values for the random sequences. The asymptotic value for the random sequences is very close to the theoretically predicted one [see Equation (2)].