Hiroki Takizawa, Junichi Iwakiri, Kiyoshi Asai.
Abstract
BACKGROUND: Analysis of secondary structures is essential for understanding the functions of RNAs. Because RNA molecules thermally fluctuate, it is necessary to analyze the probability distributions of their secondary structures. Existing methods, however, are not applicable to long RNAs owing to their high computational complexity. Additionally, previous research has suffered from two numerical difficulties: overflow and significant numerical errors.

RESULT: In this research, we reduced the computational complexity of calculating the landscape of the probability distribution of secondary structures by introducing a maximum-span constraint. In addition, we resolved numerical computation problems through two techniques: extended logsumexp and accuracy-guaranteed numerical computation. We analyzed the stability of the secondary structures of 16S ribosomal RNAs at various temperatures without overflow. The results obtained are consistent with previous research on thermophilic bacteria, suggesting that our method is applicable in thermal stability analysis. Furthermore, we quantitatively assessed numerical stability using our method.
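The overflow problem mentioned in the abstract can be illustrated with the standard logsumexp trick, which sums quantities stored as logarithms without ever forming the huge linear-space values. This is a minimal sketch of the ordinary technique only; the "extended logsumexp" used by the authors is not described in this record, and the function name and input values below are hypothetical.

```python
import math

def logsumexp(log_terms):
    """Return log(sum(exp(t) for t in log_terms)) without overflowing."""
    if not log_terms:
        return -math.inf  # an empty sum is zero in linear space
    m = max(log_terms)
    if m == -math.inf:
        return -math.inf  # every term is zero in linear space
    # Shift by the maximum so the largest exponent is exp(0) = 1.
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Hypothetical log-domain weights, chosen so that exp() would overflow a double.
log_weights = [1000.0, 999.0, 998.0]
total = logsumexp(log_weights)
```

A naive evaluation would fail here, since `math.exp(1000.0)` raises `OverflowError`, whereas the shifted form only evaluates exponentials of non-positive numbers.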
Keywords: Accuracy-guaranteed numerical computation; Dynamic programming; Interval arithmetic; RNA secondary structure; Ribosomal RNA.
Year: 2020 PMID: 32448174 PMCID: PMC7245837 DOI: 10.1186/s12859-020-3535-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Calculation time of the proposed method. The red solid line represents y = 0.0010101x^2. The purple dashed line represents y = 3.0163e−10x^5. Both lines were fit to the result using the least squares method
Fig. 2 Calculation time of the proposed method. Each data point is the calculation time of a single sequence from the S151Rfam dataset. The y-axis represents the logarithm of calculation time, and the x-axis represents the length of RNA
Fig. 3 Thermal-stability analysis for secondary structures of E. coli and T. thermophilus 16S rRNAs. The "initial reference" is the reference structure obtained using CentroidFold. The "refined reference" is the reference structure obtained using RintC and the base-pairing probability matrix (BPPM) (see the "Thermal stability of ribosomal RNA" section for details). The "natural reference" is the reference structure derived from the three-dimensional structure. The "Experimental procedure" section provides a detailed description of the "natural reference"
Fig. 4 The result of the numerical error experiment with RF00008B (54 nucleotides). The leftmost plot explains the convex hulls for the result values and their errors under each experimental condition. The three plots to the right are scatter plots of the raw data for the result values and errors under one experimental condition and the convex hulls for each. In this evaluation, the reference structure was obtained by CentroidFold
Fig. 5 The results of the numerical error experiment. (a) Numerical error comparison. (b) Relationship between sequence length and numerical error. The x- and y-axes have minima of (log10(median) + log10(width)) under the DFT and FFT methods. The reference structure was obtained using CentroidFold [17]. (γ = 1.0)
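The single-coefficient curves in the Fig. 1 caption (y = a·x^2 and y = a·x^5) can be obtained with a closed-form least-squares fit. The sketch below assumes ordinary least squares on untransformed times, which this record does not confirm; the function name is hypothetical and the data are synthetic, not the paper's measurements.

```python
def fit_monomial(xs, ys, k):
    """Least-squares coefficient a for the model y = a * x**k.

    Minimizing sum((y - a * x**k)**2) over a gives
    a = sum(x**k * y) / sum(x**(2k)).
    """
    num = sum((x ** k) * y for x, y in zip(xs, ys))
    den = sum((x ** k) ** 2 for x in xs)
    return num / den

# Synthetic timings (NOT the paper's data), generated from 0.001 * n**2.
lengths = [100, 200, 400, 800]
times = [0.001 * n ** 2 for n in lengths]

a2 = fit_monomial(lengths, times, 2)  # coefficient of the quadratic model
a5 = fit_monomial(lengths, times, 5)  # coefficient of the quintic model
```

Comparing the residuals of the two fitted models is one way to judge which growth rate describes the measured times better.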
Computational complexities of the existing and proposed methods
| | RintW, time | RintC (proposed), time |
|---|---|---|
| preprocessing | | |
| main calculation | | |
| postprocessing (DFT) | | |
| postprocessing (FFT) | | |
| total (DFT) | | |
| total (FFT) | | |

| | RintW, space | RintC (proposed), space |
|---|---|---|
| preprocessing | | |
| main calculation | | |
| postprocessing (DFT) | | |
| postprocessing (FFT) | | |
| total (DFT) | | |
| total (FFT) | | |
N = sequence length. W = maximum span. Note that H ≤ N and W ≤ N always hold. U = degree of parallelism