Anna Kirkpatrick¹, Kalen Patton¹,², Prasad Tetali¹,², Cassie Mitchell³.
Abstract
Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resulting values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization, independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and for examining how branching structure depends on energy model parameters. Full exposition of the mathematical techniques is provided, with the expectation that these techniques will prove useful in bioinformatics, computational biology, and other extended applications.
Keywords: Markov chain Monte Carlo; Markov chain convergence; RNA secondary structure; nearest neighbor thermodynamic model
Year: 2020; PMID: 35924027; PMCID: PMC9344895; DOI: 10.3390/mca25040067
Source DB: PubMed; Journal: Math Comput Appl; ISSN: 1300-686X
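The abstract defines a Gibbs distribution on plane trees from a tree energy. For small trees that distribution can be computed exactly by enumeration, which gives a reference to check a sampler against. The sketch below assumes the form π(t) ∝ exp(−E(t)/T) with a hypothetical temperature parameter T, represents trees as nested tuples (a convenience chosen here), and uses a toy energy (minus the number of hairpins) purely for illustration; it is not the NNTM energy from the paper.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def plane_trees(n):
    """Return all plane trees with n edges as nested tuples.

    A node is the tuple of its children, so a leaf is () and the
    single-edge tree is ((),).
    """
    if n == 0:
        return ((),)
    trees = []
    for k in range(n):  # k = number of edges strictly below the root's first child
        for first in plane_trees(k):
            for rest in plane_trees(n - 1 - k):
                trees.append((first,) + rest)
    return tuple(trees)


def hairpins(tree):
    """Count leaves below the root; each leaf plays the role of a hairpin loop."""
    return sum(1 if child == () else hairpins(child) for child in tree)


def gibbs_distribution(n, energy, temperature=1.0):
    """Exact probabilities pi(t) proportional to exp(-energy(t) / temperature)."""
    trees = plane_trees(n)
    weights = [math.exp(-energy(t) / temperature) for t in trees]
    total = sum(weights)
    return {t: w / total for t, w in zip(trees, weights)}
```

With `energy=lambda t: -hairpins(t)`, the most probable 4-edge tree is the star `((), (), (), ())`, the only 4-edge tree with four hairpins. The number of plane trees with n edges is the Catalan number C_n, so exact enumeration is only practical for small n; that growth is why a Markov chain is needed for long sequences.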
Figure 1. A ribonucleic acid (RNA) secondary structure for one of the combinatorial RNA sequences used in this work, and its corresponding plane tree. The ordering of the edges in the plane tree is derived from the 3’ to 5’ ordering of the RNA sequence. Note that the exterior loop corresponds to the root of the plane tree. The diagram in (a) was generated by ViennaRNA [19]. (a) A maximally paired secondary structure for A4(C5GA4CG5A4)4, which has 4 helices; (b) the corresponding plane tree, which has 4 edges and encodes the branching pattern seen in the secondary structure.
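Figure 1 describes how a secondary structure becomes a plane tree: each helix becomes an edge, each loop becomes a node, and the exterior loop becomes the root. Assuming the structure is given in dot-bracket notation (the format ViennaRNA prints), that contraction can be sketched as below. The function name and the `three_to_five` flag are hypothetical; the flag reverses child order to follow the 3’ to 5’ convention stated in the caption. Pseudoknots are not handled.

```python
def dot_bracket_to_tree(structure, three_to_five=False):
    """Contract each helix of a dot-bracket structure to one plane-tree edge.

    Loops become nodes (lists of children) and the exterior loop is the root.
    Children are listed 5'->3' unless three_to_five=True, which reverses them.
    """
    partner = {}
    stack = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            opening = stack.pop()
            partner[opening], partner[pos] = pos, opening
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")

    def loop(lo, hi):
        """Children of the loop spanning positions lo..hi (inclusive)."""
        children, k = [], lo
        while k <= hi:
            if structure[k] == "(":
                children.append(helix(k))
                k = partner[k] + 1  # skip over the whole helix and its contents
            else:
                k += 1
        return children[::-1] if three_to_five else children

    def helix(i):
        """Follow stacked pairs (i, j), (i+1, j-1), ... and return the closing loop."""
        j = partner[i]
        while structure[i + 1] == "(" and partner[i + 1] == j - 1:
            i, j = i + 1, j - 1
        return loop(i + 1, j - 1)

    return loop(0, len(structure) - 1)
```

For example, four consecutive hairpins such as `"(((....)))...." * 4` give a root with four leaf children, matching the 4-helix, 4-edge picture in panel (b). A helix interrupted by an interior loop, as in `"((..((....))..))"`, contributes two edges joined by a node with one child.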
Nearest Neighbor Thermodynamic Model (NNTM) parameters and resulting energy functions. Energy functions are of the form αd + βd₁ + γr.
| Y | Z | Turner | a | b | c | h | f | i | g | α | β | γ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C | G | 89 | 4.6 | 0.4 | 0.1 | −10.9 | 3.8 | 3.0 | −1.6 | −0.9 | −1.8 | −1.7 |
| G | C | 89 | 4.6 | 0.4 | 0.1 | −16.5 | 3.5 | 3.0 | −1.9 | −0.9 | −1.2 | −1.7 |
| C | G | 99 | 3.4 | 0 | 0.4 | −12.9 | 4.5 | 2.3 | −1.6 | 2.3 | 1.3 | −0.4 |
| G | C | 99 | 3.4 | 0 | 0.4 | −16.9 | 4.1 | 2.3 | −1.9 | 2.2 | 1.9 | −0.4 |
| C | G | 04 | 9.3 | 0 | −0.9 | −12.9 | 4.5 | 2.3 | −1.1 | −2.8 | −3.0 | 0.9 |
| G | C | 04 | 9.3 | 0 | −0.9 | −16.9 | 4.1 | 2.3 | −1.5 | −2.8 | −2.2 | 0.9 |
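Each energy function in the table is linear in three tree statistics, so evaluating it for a given parameter row is a single dot product. The sketch below transcribes the last three columns of each row as (α, β, γ); it assumes those columns are the coefficients of αd + βd₁ + γr, and it takes d, d₁ and r as plain numbers because their definitions are not reproduced in this excerpt. All names are hypothetical.

```python
# (Y, Z, Turner set) -> (alpha, beta, gamma), copied from the last three table columns
# under the assumption that those columns hold the energy-function coefficients.
ENERGY_COEFFS = {
    ("C", "G", "89"): (-0.9, -1.8, -1.7),
    ("G", "C", "89"): (-0.9, -1.2, -1.7),
    ("C", "G", "99"): (2.3, 1.3, -0.4),
    ("G", "C", "99"): (2.2, 1.9, -0.4),
    ("C", "G", "04"): (-2.8, -3.0, 0.9),
    ("G", "C", "04"): (-2.8, -2.2, 0.9),
}


def tree_energy(d, d1, r, y="C", z="G", turner="04"):
    """Evaluate alpha*d + beta*d1 + gamma*r for one parameter row.

    d, d1 and r are the tree statistics named in the table caption; they are
    passed in as numbers because their definitions are not given here.
    """
    alpha, beta, gamma = ENERGY_COEFFS[(y, z, turner)]
    return alpha * d + beta * d1 + gamma * r


if __name__ == "__main__":
    # Same hypothetical statistics, evaluated under every parameter row.
    for key in ENERGY_COEFFS:
        print(key, round(tree_energy(3, 1, 2, *key), 2))
```

Because E is linear, a Metropolis-type acceptance test only needs the changes in d, d₁ and r caused by a proposed move, which keeps each step cheap.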
Figure 2. A plane tree with edges labeled according to the bijection Φ, along with its corresponding 2-Motzkin path.
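The edge-labeling rule behind Φ is not recoverable from this excerpt. One standard bijection that also sends a plane tree with n edges to a 2-Motzkin path of length n − 1 goes through Dyck paths: walk the tree depth-first to get a Dyck path, drop its first and last steps, and read the rest in pairs (UU → U, DD → D, UD → L1, DU → L2). The sketch below implements that construction; it may not coincide with Φ, and the step names are our own.

```python
# Hypothetical step names: "U" (up), "D" (down), "L1"/"L2" (the two level steps).
PAIR_TO_STEP = {("U", "U"): "U", ("D", "D"): "D", ("U", "D"): "L1", ("D", "U"): "L2"}


def tree_to_dyck(tree):
    """Depth-first walk: record U when descending an edge and D when returning."""
    steps = []

    def visit(node):
        for child in node:
            steps.append("U")
            visit(child)
            steps.append("D")

    visit(tree)
    return steps


def tree_to_motzkin(tree):
    """Map a plane tree (nested lists or tuples) with n >= 1 edges to a 2-Motzkin path of length n - 1."""
    dyck = tree_to_dyck(tree)
    if not dyck:
        raise ValueError("the tree needs at least one edge")
    middle = dyck[1:-1]  # drop the first U and the last D
    return [PAIR_TO_STEP[(middle[k], middle[k + 1])] for k in range(0, len(middle), 2)]
```

Dropping the outer steps leaves a path from height 1 to height 1 that never reaches −1 at an even position, which is why every pair maps to a legal 2-Motzkin step and the height never goes negative.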
The main Markov chain algorithm. This pseudocode calculates X given X₀.
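The pseudocode for the main algorithm did not survive extraction. As an illustration only, the sketch below is a generic Metropolis chain on 2-Motzkin paths targeting π(x) ∝ exp(−E(x)/T). The move (re-draw a random adjacent pair of steps and reject anything that is not a 2-Motzkin path), the temperature T and all names are assumptions, not the authors' chain; its move set and decomposition-based mixing analysis are in the full paper. An energy defined on trees can be used by composing it with the inverse bijection.

```python
import math
import random

# Hypothetical step alphabet: up, down, and the two kinds of level step.
STEPS = ("U", "D", "L1", "L2")


def is_two_motzkin(path):
    """True if the path never drops below height 0 and ends at height 0."""
    height = 0
    for step in path:
        if step == "U":
            height += 1
        elif step == "D":
            height -= 1
            if height < 0:
                return False
    return height == 0


def metropolis_step(path, energy, temperature, rng):
    """One Metropolis update targeting pi(x) proportional to exp(-energy(x) / temperature)."""
    i = rng.randrange(len(path) - 1)  # pick the adjacent pair of positions (i, i + 1)
    proposal = list(path)
    proposal[i] = rng.choice(STEPS)  # re-draw both steps uniformly; the proposal is symmetric
    proposal[i + 1] = rng.choice(STEPS)
    if not is_two_motzkin(proposal):
        return path  # proposals outside the state space are rejected
    delta = energy(proposal) - energy(path)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return proposal
    return path


def run_chain(x0, energy, temperature=1.0, num_steps=10_000, seed=None):
    """Run the chain from x0 (a 2-Motzkin path of length >= 2) and return the final state."""
    if len(x0) < 2 or not is_two_motzkin(x0):
        raise ValueError("x0 must be a 2-Motzkin path of length at least 2")
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(num_steps):
        x = metropolis_step(x, energy, temperature, rng)
    return x
```

Because the proposal is symmetric, the acceptance probability min(1, exp(−ΔE/T)) makes π stationary, and rejected moves give the chain self-loops, so it is aperiodic. Each step changes only two positions, so a practical version would update the energy incrementally instead of recomputing it.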
Algorithm to convert a sampled 2-Motzkin path to a plane tree. The pseudocode calculates φ⁻¹(x).
| root ← new Node() |
| // the stack will keep track of previous values of … |
| stack = new Stack() |
| root.children.append(…) |
| node ← new Node() |
| stack.push(…) |
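The surviving fragments (a new root node, a stack of previous values, appending children, pushing onto the stack) fit the usual stack-based way of rebuilding a tree from a path. The sketch below shows that technique under an assumed correspondence in which the 2-Motzkin steps U, D, L1, L2 expand to the Dyck step pairs UU, DD, UD, DU, wrapped in an outer U…D; it may differ in detail from the authors' φ⁻¹, and the names are hypothetical.

```python
# Assumed correspondence between 2-Motzkin steps and pairs of Dyck steps.
STEP_TO_PAIR = {"U": "UU", "D": "DD", "L1": "UD", "L2": "DU"}


def motzkin_to_dyck(path):
    """Expand a 2-Motzkin path of length n into a Dyck path of length 2n + 2."""
    steps = ["U"]
    for step in path:
        steps.extend(STEP_TO_PAIR[step])  # "UD" extends the list with "U", "D"
    steps.append("D")
    return steps


def motzkin_to_tree(path):
    """Rebuild a plane tree (nested lists of children) using an explicit stack."""
    root = []
    stack = []  # ancestors of the node currently being filled in
    current = root
    for step in motzkin_to_dyck(path):
        if step == "U":  # descend a new edge: attach a child, remember where we were
            child = []
            current.append(child)
            stack.append(current)
            current = child
        else:  # return along the edge to the parent
            if not stack:
                raise ValueError("not a valid 2-Motzkin path")
            current = stack.pop()
    if stack:
        raise ValueError("not a valid 2-Motzkin path")
    return root
```

A path of length n yields a tree with n + 1 edges; for example, three L2 steps rebuild the star with four leaf children, the shape shown in Figure 1(b).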
Figure 3. The four-level decomposition of … (left), and the projection chains corresponding to each decomposition (right).