Guangyao Zhou, Jackson Loper, Stuart Geman.
Abstract
BACKGROUND: A folding RNA molecule encounters multiple opportunities to form non-native yet energetically favorable pairings of nucleotide sequences. Given this forbidding free-energy landscape, mechanisms have evolved that contribute to a directed and efficient folding process, including catalytic proteins and error-detecting chaperones. Among structural RNA molecules we distinguish between "bound" molecules, which are active as part of ribonucleoprotein (RNP) complexes, and "unbound" molecules, whose physiological functions are performed without necessarily being bound in RNP complexes. We hypothesized that unbound molecules, lacking the partnering structure of a protein, would be more vulnerable than bound molecules to kinetic traps that compete with native stem structures. We defined an "ambiguity index," a normalized function of the primary and secondary structure of an individual molecule that measures the number of kinetic traps available to nucleotide sequences that are paired in the native structure, and presumed that unbound molecules would have lower indexes. The ambiguity index depends on the purported secondary structure, and was computed under both the comparative ("gold standard") structure and an equilibrium-based prediction that approximates the minimum free energy (MFE) structure. Arguing that kinetically accessible metastable structures might be more biologically relevant than thermodynamic equilibrium structures, we also hypothesized that MFE-derived ambiguities would be less effective in separating bound and unbound molecules.
Keywords: Comparative secondary structure; Minimum free energy; Non-coding RNA; RNA folding kinetics; Ribonucleoproteins; Self-splicing introns; Thermodynamic equilibrium
Year: 2019 PMID: 31830902 PMCID: PMC6909616 DOI: 10.1186/s12859-019-3303-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Data Summary
| Family | Number of molecules | Min length (nt) | Max length (nt) | Median length (nt) |
|---|---|---|---|---|
| Group I Introns | 116 | 210 | 2630 | 451 |
| Group II Introns | 34 | 619 | 2729 | 990 |
| tmRNA | 404 | 102 | 437 | 363 |
| SRP RNA | 346 | 66 | 533 | 274 |
| RNase P | 407 | 189 | 486 | 330 |
| 16s rRNA | 279 | 612 | 2394 | 1512 |
| 23s rRNA | 59 | 953 | 4381 | 2913 |
The seven families of RNA used in the experiments. The table lists the number of molecules in each family, along with basic statistics on the number of nucleotides in the primary sequence of each molecule. Data were downloaded from the RNA STRAND database (http://www.rnasoft.ca/strand/).
Comparative Secondary Structures: calibrated ambiguity indexes, by RNA family
| Family | Number of molecules | Median length (nt) | Median α score |
|---|---|---|---|
| Group I Introns | 116 | 451 | 0.432 |
| Group II Introns | 34 | 990 | 0.181 |
| tmRNA | 404 | 363 | 0.926 |
| SRP RNA | 346 | 274 | 0.790 |
| RNase P | 407 | 330 | 0.925 |
| 16s rRNA | 279 | 1512 | 0.938 |
| 23s rRNA | 59 | 2913 | 1.000 |
The number of molecules, the median length (number of nucleotides), and the median α scores for the T-S ambiguity indexes (Eq. 11) for each of the seven RNA families studied. RNA molecules from the first two families (unbound) are active without necessarily forming ribonucleoprotein complexes; the remaining five are bound in ribonucleoproteins. Molecules from the unbound families have lower ambiguity indexes than molecules from the bound families.
MFE Secondary Structures: calibrated ambiguity indexes, by RNA family
| Family | Number of molecules | Median length (nt) | Median α score |
|---|---|---|---|
| Group I Introns | 116 | 451 | 0.833 |
| Group II Introns | 34 | 990 | 0.841 |
| tmRNA | 404 | 363 | 0.867 |
| SRP RNA | 346 | 274 | 0.803 |
| RNase P | 407 | 330 | 0.955 |
| 16s rRNA | 279 | 1512 | 0.982 |
| 23s rRNA | 59 | 2913 | 1.000 |
Identical to the preceding table of comparative secondary structures, except that the ambiguity indexes and their calibrations are calculated using the MFE secondary structures rather than comparative analyses. There is little evidence in the MFE secondary structures for lower ambiguity indexes among the unbound RNA molecules.
Fig. 1 Unbound or Bound? ROC performance of classifiers based on thresholding the T-S ambiguity index. Small values of $d_{\text{T-S}}(p,s)$ are taken as evidence that a molecule belongs to the unbound group rather than the bound group. In the left panel, the classifier uses the comparative secondary structure for $s$ to compute the ambiguity index; in the right panel, it uses the MFE structure. AUC: Area Under Curve (see text for interpretation). Additionally, for each of the two experiments, a p-value was calculated based only on the signs of the individual ambiguity indexes, under the null hypothesis that positive indexes are distributed randomly among molecules in all seven RNA families. Under the alternative, positive indexes are more typically found among the bound families than among the unbound families. Under the null hypothesis the test statistic is hypergeometric (see Eq. 14). Left panel: $p = 1.2 \times 10^{-34}$. Right panel: $p = 0.02$. In considering these p-values, it is worth re-emphasizing the points made about the interpretation of p-values in the paragraph following Eq. 14. The right panel illustrates the point: the ambiguity index based on the MFE secondary structure "significantly distinguishes the two categories ($p = 0.02$)" but clearly has no utility for classification. (These ROC curves and those in Fig. 2 were lightly smoothed by locally weighted scatterplot smoothing (LOWESS), e.g. with the Python command `Y = lowess(Y, X, 0.1, return_sorted=False)`, where `lowess` comes from `statsmodels.nonparametric.smoothers_lowess`.)
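The thresholding rule behind these ROC curves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `roc_curve` and `auc` and the six toy index values are hypothetical, not taken from the paper, and the LOWESS smoothing step mentioned in the caption is left out.

```python
from typing import List, Sequence, Tuple


def roc_curve(indexes: Sequence[float],
              is_unbound: Sequence[bool]) -> List[Tuple[float, float]]:
    """ROC points for the rule 'call a molecule unbound if index <= threshold'.

    Returns (false positive rate, true positive rate) pairs, where a
    'positive' is an unbound molecule.
    """
    n_pos = sum(1 for flag in is_unbound if flag)
    n_neg = len(is_unbound) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one unbound and one bound molecule")

    # Sweep the threshold upward through the sorted index values.
    pairs = sorted(zip(indexes, is_unbound))
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Molecules with tied indexes cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: three unbound and three bound molecules.
toy_indexes = [-0.4, -0.1, 0.2, 0.3, 0.5, 0.7]
toy_unbound = [True, True, False, True, False, False]
toy_auc = auc(roc_curve(toy_indexes, toy_unbound))
```

With real data, `indexes` would hold one ambiguity index per molecule and `is_unbound` would mark the Group I and Group II introns. An AUC near 0.5 would correspond to a classifier with no practical utility, which is the situation the caption describes for the MFE-based index.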
Fig. 2 Comparative or MFE? As in Fig. 1, each panel depicts the ROC performance of a classifier based on thresholding the T-S ambiguity index, with small values of $d_{\text{T-S}}(p,s)$ taken as evidence that $s$ was derived by comparative as opposed to MFE secondary structure analysis. Left panel: performance on molecules chosen from the unbound group. Right panel: performance on molecules chosen from the bound group. Conditional p-values were also calculated, using the hypergeometric distribution and based only on the signs of the indexes. In each case the null hypothesis is that comparative secondary structures are as likely to lead to positive ambiguity indexes as are MFE structures, whereas the alternative is that positive ambiguity indexes are more typical when derived from MFE structures. Left panel: $p = 5.4 \times 10^{-14}$. Right panel: $p = 0.07$.
Numbers of Positive Ambiguity Indexes, by family
| Family | #mol's | # $d_{\text{T-S}} > 0$ (comparative) | # $d_{\text{T-S}} > 0$ (MFE) |
|---|---|---|---|
| Group I Introns | 116 | 50 | 94 |
| Group II Introns | 34 | 8 | 27 |
| tmRNA | 404 | 368 | 358 |
| SRP RNA | 346 | 269 | 264 |
| RNase P | 407 | 379 | 377 |
| 16s rRNA | 279 | 210 | 254 |
| 23s rRNA | 59 | 53 | 54 |
#mol's: number of molecules. Third column: number of positive T-S ambiguity indexes ($d_{\text{T-S}} > 0$) when secondary structures are computed by comparative analysis. Fourth column: number of positive T-S ambiguity indexes when secondary structures are computed by minimum free energy.
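The sign-based hypergeometric p-values quoted in the captions of Figs. 1 and 2 can be approximated from the counts in this table. The sketch below rests on an assumption: Eq. 14 is not reproduced in this record, so the test is taken here to be a one-sided lower-tail hypergeometric probability (too few positive indexes in the group expected to have low ambiguity). The names `hypergeom_lower_tail`, `bound_vs_unbound`, and `comparative_vs_mfe` are hypothetical.

```python
from math import comb

# Counts copied from the table above:
# family -> (molecules, positives under comparative, positives under MFE)
COUNTS = {
    "Group I Introns": (116, 50, 94),
    "Group II Introns": (34, 8, 27),
    "tmRNA": (404, 368, 358),
    "SRP RNA": (346, 269, 264),
    "RNase P": (407, 379, 377),
    "16s rRNA": (279, 210, 254),
    "23s rRNA": (59, 53, 54),
}
UNBOUND = ("Group I Introns", "Group II Introns")
BOUND = tuple(f for f in COUNTS if f not in UNBOUND)


def hypergeom_lower_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X <= k) when `draws` items are sampled without replacement from a
    population of size `population` containing `successes` marked items."""
    lo = max(0, draws - (population - successes))
    hi = min(k, draws, successes)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lo, hi + 1)
    )
    # Exact integer arithmetic; the conversion to float happens only here.
    return numerator / comb(population, draws)


def bound_vs_unbound(column: int) -> float:
    """Fig. 1 style test (assumed form). column 1 = comparative, 2 = MFE."""
    total = sum(c[0] for c in COUNTS.values())
    positives = sum(c[column] for c in COUNTS.values())
    n_unbound = sum(COUNTS[f][0] for f in UNBOUND)
    positives_unbound = sum(COUNTS[f][column] for f in UNBOUND)
    return hypergeom_lower_tail(positives_unbound, total, positives, n_unbound)


def comparative_vs_mfe(families) -> float:
    """Fig. 2 style test (assumed form): pool the comparative and MFE
    structures of the chosen families and ask whether the comparative
    half holds too few of the positive indexes."""
    n = sum(COUNTS[f][0] for f in families)
    positives_comparative = sum(COUNTS[f][1] for f in families)
    positives_mfe = sum(COUNTS[f][2] for f in families)
    return hypergeom_lower_tail(
        positives_comparative, 2 * n, positives_comparative + positives_mfe, n
    )


p_fig1_left = bound_vs_unbound(1)
p_fig1_right = bound_vs_unbound(2)
p_fig2_left = comparative_vs_mfe(UNBOUND)
p_fig2_right = comparative_vs_mfe(BOUND)
```

The four resulting values can be compared against the p-values quoted in the two figure captions. Any discrepancy would suggest that the assumed form of the test differs from Eq. 14.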