Amber R Davis, Charles C Kirkpatrick, Brent M Znosko.
Abstract
RNA is known to be involved in several cellular processes; however, it is only active when it is folded into its correct 3D conformation. The folding, bending and twisting of an RNA molecule depend upon the multitude of canonical and non-canonical secondary structure motifs. These motifs contribute to the structural complexity of RNA and also serve important biological functions, such as acting as recognition and binding sites for other biomolecules or small ligands. One of the most prevalent types of RNA secondary structure motif is the single mismatch, which occurs when two canonical pairs are separated by a single non-canonical pair. To determine sequence-structure relationships and to identify structural patterns, we have systematically located, annotated and compared all available occurrences of the 30 most frequently occurring single mismatch-nearest neighbor sequence combinations found in experimentally determined 3D structures of RNA-containing molecules deposited into the Protein Data Bank. Hydrogen bonding, stacking and interaction of nucleotide edges for the mismatched and nearest neighbor base pairs are described and compared, allowing for the identification of several structural patterns. Such a database and comparison will allow researchers to gain insight into the structural features of unstudied sequences and to quickly look up studied sequences.
Year: 2010 PMID: 20876693 PMCID: PMC3035445 DOI: 10.1093/nar/gkq793
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Single mismatch graph (top) and MC-Search input descriptor (bottom). The nucleotides are numbered A1 to A3 and B1 to B3 in the 5′ to 3′ direction. The ‘A’ and ‘B’ letter designations specify opposing RNA strands. The letter ‘N’ represents any nucleotide. The input descriptor identifies the canonical nearest neighbors by limiting the allowed pairing interactions to the canonical pairs defined by the Roman (85–87) and Arabic (88,89) numerals. Not all possible numerals for A–U, U–A, G–C, C–G, G–U and U–G pairs are shown here due to space limitations. The input descriptor identifies the mismatched nucleotides by allowing an interaction defined by no hydrogen bonds, while also prohibiting the canonical pairing interactions defined by the Roman and Arabic numerals.
Summary of the structural orientation and interaction of the 30 frequently occurring single mismatches
(a) Not all possible orientations and hydrogen bonding patterns are shown for each single mismatch-nearest neighbor combination. Only those representing at least 5% of total occurrences are included.
(b) For each sequence, the top strand is written 5′–3′, and the bottom strand is written 3′–5′. Duplexes are written in alphabetical order by the loop nucleotide (A over G, not G over A). If the loop nucleotides are identical, then duplexes are written in alphabetical order by the nearest neighbors (CUG over GUU, not GUU over CUG).
(c) Frequency of occurrence in the database (84).
(d) Number of times each single mismatch-nearest neighbor sequence combination was located in the three-dimensional RNA structure database compiled from structures deposited into the PDB.
(e) Number of occurrences in each subclass. Subclasses are determined within each sequence combination from four parameters: the interacting edges of the single mismatch nucleotides, the interacting edges of the nearest neighbor base pairs, the hydrogen bond pattern of the single mismatch nucleotides and the hydrogen bond patterns of the nearest neighbor base pairs.
(f) Annotated orientations and hydrogen bonding patterns of the single mismatch and 5′- and 3′-nearest neighbor nucleotides, which are described in the ‘Materials and Methods’ section.
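The subclass counts and the 5% cutoff described in the table footnotes can be illustrated with a minimal Python sketch. It is not the authors' code: the `Subclass` record layout, the `tally_subclasses` function and the toy annotation strings are all hypothetical. It assumes that each occurrence of a sequence combination has already been annotated with the four parameters, and that two occurrences fall into the same subclass when all four agree.

```python
from collections import Counter
from typing import NamedTuple


class Subclass(NamedTuple):
    """Hypothetical record: the four annotated parameters for one occurrence."""
    mismatch_edges: str    # interacting edges of the mismatched nucleotides
    mismatch_hbonds: str   # hydrogen bond pattern of the mismatch
    neighbor_edges: str    # interacting edges of both nearest neighbor pairs
    neighbor_hbonds: str   # hydrogen bond patterns of both nearest neighbor pairs


def tally_subclasses(occurrences, min_fraction=0.05):
    """Count occurrences per subclass; keep subclasses at or above min_fraction of the total."""
    counts = Counter(occurrences)  # NamedTuples hash by value, so equal annotations group together
    total = sum(counts.values())
    return {sub: n for sub, n in counts.most_common() if n / total >= min_fraction}


# Toy annotations, made up for illustration and not taken from the study.
common = Subclass("H/S", "XI", "W/W;W/W", "pattern-1")
minor = Subclass("W/W", "one_hbond", "W/W;W/W", "pattern-1")
rare = Subclass("none", "none", "W/W;W/W", "pattern-1")
occurrences = [common] * 30 + [minor] * 9 + [rare] * 1

kept = tally_subclasses(occurrences)
for sub, n in kept.items():
    print(n, sub)
```

With these toy numbers the `rare` subclass accounts for 1 of 40 occurrences (2.5%) and is dropped, while the other two are kept.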
Figure 2. Representation of an A·G mismatch in the 5′(A)H/3′(G)S pairing, antiparallel, trans orientation with XI hydrogen bonding pattern (PDB ID 1C04), which is the most common orientation and interaction determined for the most frequently occurring A·G mismatch-nearest neighbor combinations (84) that were also represented in the PDB.
Figure 3. Representation of a U·U mismatch in the 5′(U)W/3′(U)W pairing, antiparallel, cis orientation with XVI hydrogen bonding pattern (PDB ID 1FJG), which is the most common orientation and interaction determined for the most frequently occurring U·U mismatch-nearest neighbor combinations (84) that were also represented in the PDB.
Figure 4. Representation of in the hydrogen-bonded, stacked orientation (PDB ID 1O9M) (a) and in the non-hydrogen-bonded, unstacked orientation (PDB ID 1O9M) (b).
Figure 5. Representation of an A·C mismatch in the 5′(A)H/3′(C)W pairing, antiparallel, trans orientation with XXV hydrogen bonding pattern (PDB ID 1FJG), which is the most common orientation and interaction determined for the A·C mismatch-nearest neighbor combination of . This mismatch-nearest neighbor sequence combination is found in the 30 most frequently occurring single mismatches (84) and accounts for 80% of the total A·C mismatches found in this study.
Figure 6. Representation of a C·U mismatch in the 5′(C)W/3′(U)W pairing, antiparallel, cis orientation with one_hbond hydrogen bonding pattern (PDB ID 1FJG), which is the most common orientation and interaction determined for the most frequently occurring C·U mismatch-nearest neighbor combinations (84) that were also represented in the PDB.
Figure 7. Representation of a G·G mismatch annotated as having no interaction (PDB ID 2QAM), which is the most common orientation and interaction determined for the most frequently occurring G·G mismatch-nearest neighbor combination, (84) that was also represented in the PDB.