Xiao Li, Hilal Kazan, Howard D Lipshitz, Quaid D Morris.
Abstract
RNA-protein interactions differ from DNA-protein interactions because of the central role of RNA secondary structure. Some RNA-binding domains (RBDs) recognize their target sites mainly by their shape and geometry, and others are sequence-specific but are sensitive to secondary structure context. A number of small- and large-scale experimental approaches have been developed to measure RNAs associated in vitro and in vivo with RNA-binding proteins (RBPs). Generalizing outside of the experimental conditions tested by these assays requires computational motif finding. Often RBP motif finding is done by adapting DNA motif finding methods, but modeling secondary structure context leads to better recovery of RBP-binding preferences. Genome-wide assessment of mRNA secondary structure has recently become possible, but these data must be combined with computational predictions of secondary structure before they add value in predicting in vivo binding. There are two main approaches to incorporating structural information into motif models: supplementing primary sequence motif models with preferred secondary structure contexts (e.g., MEMERIS and RNAcontext) and directly modeling secondary structure recognized by the RBP using stochastic context-free grammars (e.g., CMfinder and RNApromo). The former better reconstruct known binding preferences for sequence-specific RBPs but are not suitable for modeling RBPs that recognize shape and geometry of RNAs. Future work in RBP motif finding should incorporate interactions between multiple RBDs and multiple RBPs in binding to RNA.
Year: 2013 PMID: 24217996 PMCID: PMC4253089 DOI: 10.1002/wrna.1201
Source DB: PubMed Journal: Wiley Interdiscip Rev RNA ISSN: 1757-7004 Impact factor: 9.957
Figure 1. Three-dimensional structures of RNA-binding domain (RBD)–RNA complexes. (a) Solution structure of polypyrimidine tract binding (PTB) protein RBD1 in complex with CUCUCU RNA [Protein Data Bank (PDB): 2AD9]. PTB RBD1 binds a YCU site (Y indicating pyrimidine) through β4, β1, and β2, respectively. (b) Co-crystal structure of the PUM-homology domain (PUM-HD) of human Pum1 complexed with a 10-nucleotide single-stranded RNA, 5′-AUUGUACAUA, where the last eight nucleotides (UGUACAUA) are individually recognized by three conserved amino acids in Puf repeats 8 to 1, respectively (Ref 14; PDB: 1M8Y). (c) Solution structure of the Vts1p sterile-α motif (SAM) domain in complex with a 5′-CUGGC-3′ pentaloop as part of a 19-nt hairpin (PDB: 2ESE). The specific interaction between the Vts1p SAM domain and the target RNA is stabilized both by the direct interaction with the third guanosine base in the RNA pentaloop and by contacts to the unique backbone structure (Refs 16–18). (d) Solution structure of the dsRBD of yeast Rnt1p in complex with the 5′ terminal AGNN tetraloop of snR47 precursor RNA (PDB: 1T4L). Neither A nor G is recognized by specific hydrogen bonds; instead, the N-terminal helix of the Rnt1p dsRBD interacts with the backbone and the two nonconserved tetraloop bases by fitting snugly into the minor groove side of the RNA tetraloop and extending into the minor groove at the top of the stem (Ref 19).
Web Resources for RBP Binding Sites
| Database | Collection | Properties (Features) | Availability | References |
|---|---|---|---|---|
| ARESITE | AU-rich elements (ARE) in vertebrate mRNA UTR sequences | Input gene sequence is searched for enrichment of eight predefined consensus ARE. For each detected motif, conservation patterns and predicted accessibility values are displayed. | ||
| CisBP-RNA | RBP motifs identified by RNAcompete and RBPDB | Users can browse or bulk download motifs for all eukaryotic RBPs, including directly measured motifs for more than 200 RBPs from RNAcompete or RBPDB, as well as thousands more motifs inferred by homology. Also scans input RNA sequences for hits to directly measured motifs. | ||
| CLIPZ | Binding sites from CLIP experiments, including Quaking, Pumilio, Argonautes 1–4, TNRC6 A-C, IGF2BP 1–3 | Users can browse the clusters of genome- or transcript-based reads. Clusters from different experiments can be compared. The transcripts associated with a gene name can be searched for binding sites. There is also a motif enrichment tool that identifies overrepresented | ||
| doRiNA | RBP and miRNA binding sites identified by CLIP experiments | CLIP-derived peaks for RBPs and miRNAs from human, mouse, fly, and worm are available. Users can also search for overlapping sites between multiple RBPs or between RBPs and miRNAs. | ||
| RBPDB | Experiments and observations about RBP binding sites in metazoan genomes | All experiments with binding data related to metazoan RBPs can be retrieved by entering the associated gene name. Input sequences can be scanned for matches with RBP binding sites. Includes motif models for more than 70 RBPs. | ||
| Rfam | Non-coding RNA genes, structured | Each entry includes multiple sequence alignment, a secondary structure, and related references. Please see associated reference for a complete description of available features. | ||
| UTRSite | Regulatory elements in 5′ and 3′ UTRs | Each entry summarizes the current knowledge on a regulatory element: location (e.g., 3′UTR), Rfam cross-reference, binding proteins and interactor(s) of binding protein(s), and related references. Tools for searching and scanning are available. | ||
Motif Finding Methods
| Software/Method | Input | Summary | Availability | References |
|---|---|---|---|---|
| AMADEUS | DNA or RNA sequences | A method for finding short sequence motifs overrepresented in the promoters or 3′UTRs of a given set of genes | Software package: | |
| Aptamotif | RNA sequences identified by SELEX | A method for finding sequence-structure motifs in SELEX-derived aptamers. RNA secondary structure is predicted with ensemble-based methods. | Software package: available upon request | |
| CMfinder | RNA sequences | Extension of CM models to search for RNA motifs in a set of unaligned sequences with long flanking regions | Software package and web server: | |
| cERMIT | DNA or RNA sequences and associated expression or affinity measures | A rank-order-based method that searches for sequence motifs best supported by the observed experimental evidence (i.e., semiquantitative genome-wide binding data). It uses the complete dataset and does not require a cutoff to define the positive set. | Software package: | |
| COVE | RNA sequences and alignment (optional) | Implementation of CMs for (1) secondary structure-based multiple sequence alignment; (2) consensus secondary structure prediction; and (3) secondary structure-based database scanning. | Software package: | |
| FIRE | DNA or RNA sequences | A method to detect DNA or RNA motifs that model the mutual information between sequences and gene expression measurements. | Software package: | |
| MatrixREDUCE | DNA or RNA sequences and associated expression or affinity measures | A biophysical model to discover sequence-specific binding affinity of the factor of interest (TF or RBP). | Software package: | |
| MEME | DNA or RNA sequences | A generative model for finding motifs in DNA or protein sequences. Can be used for finding sequence motifs in RNA sequences. | Software package and web server: | |
| MEMERIS | RNA sequences and predicted structures | Extension of MEME for finding RNA motifs. It uses RNA structure information as a prior to guide the motif search toward single-stranded regions. | Software package (includes scripts for structure prediction): | |
| REFINE | DNA or RNA sequences | Extension of MEME that filters out regions of target sequences that are relatively devoid of discriminatory hexamers and then applies the MEME motif-finding algorithm. | Software package: | |
| RNAalifold | RNA alignment | A method for detecting conserved RNA secondary structures in a family of related RNA sequences. | Software package: | |
| RNAcontext | RNA sequences, associated affinity measures and predicted structures | A discriminatory approach for finding RNA motifs that represent the sequence and structure preferences of RBPs. RNAcontext can model a wide range of structure features using a flexible alphabet. | Software package (includes scripts for structure prediction): | |
| RNApromo | RNA sequences | CM-based model for finding RNA motifs. | Software package and web server: | |
Figure 2. Target site accessibility predicts in vivo binding for a diverse range of RNA-binding proteins (RBPs). Comparison of accuracy in predicting bound transcripts based on a given consensus, using either #ATS (i.e., the expected number of accessible target sites, y-axis) or #TS (i.e., the number of target sites, x-axis). Each dot represents the results of an RBP coupled with its previously defined consensus sequence. If there are multiple reported consensus sequences for a protein, the result for each is shown and is distinguished from others by a superscript. Cartoons indicate the species of origin (yeast, fly, or human). RBPs in bold have significantly improved AUROC for #ATS versus #TS (P < 0.05, DeLong-DeLong-Clarke-Pearson test). The RBDs housed in the RBPs (using SMART domains) are summarized in the pie graph.
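The two quantities compared in this figure can be illustrated with a minimal Python sketch. It assumes that #TS is the count of (possibly overlapping) matches to an IUPAC consensus and that #ATS is the sum, over those matches, of the probability that each site is accessible. The function names and the `site_accessibility` input (one probability per candidate start position, for example taken from a structure-prediction tool) are hypothetical and are not the authors' code.

```python
import re

# IUPAC nucleotide codes expanded to RNA character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus (e.g. 'YCU') into a regex.

    A zero-width lookahead is used so overlapping sites are all found.
    """
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")


def target_site_starts(seq, consensus):
    """Return 0-based start positions of every consensus match in seq."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [m.start() for m in consensus_to_regex(consensus).finditer(rna)]


def count_ts(seq, consensus):
    """#TS: number of target sites (assumed definition: match count)."""
    return len(target_site_starts(seq, consensus))


def count_ats(seq, consensus, site_accessibility):
    """#ATS: expected number of accessible target sites.

    site_accessibility[i] is assumed to be the probability that the site
    starting at position i is accessible (hypothetical input format).
    """
    return sum(site_accessibility[i] for i in target_site_starts(seq, consensus))


if __name__ == "__main__":
    seq = "CUCUCU"
    access = [0.9, 0.5, 0.1, 0.25]  # made-up probabilities for illustration
    print(count_ts(seq, "YCU"), count_ats(seq, "YCU", access))
```

Under these assumptions, a transcript with many consensus matches buried in stable structure receives a high #TS but a low #ATS, which is the distinction the figure evaluates.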
Figure 3. Structural context of target sites improves prediction of target mRNAs bound in vivo by RNA-binding proteins (RBPs). Bar graphs compare the accuracy of different methods that use the structural context of motif matches to predict in vivo binding of RBPs. The inset describes the different bars within the graph.
Web Resource for Predicting mRNA Secondary Structure
| Software/Method | Input | Summary | Availability | References |
|---|---|---|---|---|
| Mfold | RNA sequence | It predicts the suboptimal structures within a free energy increment from the minimum free energy. | Software package and web server: | |
| RNAshapes | RNA sequence | It calculates shapes and their probabilities by analyzing the full ensemble, and predicts the complete set of suboptimal structures and their probabilities. | Software package and web server: | |
| RNAstructure | RNA sequence | It includes algorithms for RNA secondary structure prediction and calculation of base-pair probabilities. | Software package with GUI: | |
| SFOLD | RNA sequence | It computes base-pair probabilities from a representative sample of the full ensemble. | Software package and web server: | |
| Vienna package | RNA sequence | RNAfold: predicts the minimum free energy (MFE) structure and base-pair probabilities. | Software package: | |
Figure 4. Comparison of prediction accuracy for in vivo binding of nine yeast RNA-binding proteins (RBPs) using parallel analysis of RNA structure (PARS) and RNAplfold to estimate the secondary structure of bound versus unbound transcripts. The results using PARS are shown on the y-axis, those using RNAplfold on the x-axis. (a) The analysis was performed on all consensus sites containing at least one nucleotide with a nonzero PARS score. (b) The analysis was performed only considering nucleotides with nonzero PARS score. (c) As for (b) but with the additional constraint that the transcript load (i.e., reads/nucleotide) was at least five. P-values were calculated using the two-tailed sign test.
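For readers unfamiliar with the two-tailed sign test mentioned in this caption, the following Python sketch shows the standard exact form of the test applied to paired scores. It is an illustration only: the function name is hypothetical, ties are dropped (a common convention that the source does not specify), and the example values are invented rather than taken from the figure.

```python
from math import comb


def sign_test_two_tailed(xs, ys):
    """Exact two-tailed sign test for paired observations.

    Under the null hypothesis, each non-tied pair is equally likely to
    favor xs or ys, so the number of pairs favoring the less frequent
    side follows Binomial(n, 0.5). Tied pairs are discarded.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0  # no informative pairs
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # One-sided tail P(X <= k), doubled and capped at 1 for two tails.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Invented accuracies for nine proteins; not data from the paper.
    method_a = [0.61, 0.58, 0.70, 0.66, 0.55, 0.63, 0.59, 0.72, 0.60]
    method_b = [0.65, 0.62, 0.71, 0.69, 0.60, 0.64, 0.63, 0.75, 0.66]
    print(sign_test_two_tailed(method_a, method_b))
```

With nine pairs, the smallest attainable two-tailed P-value is 2/512, reached only when one method wins every comparison, which is worth keeping in mind when interpreting results on a panel this small.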
Figure 5. RNAcontext-predicted motifs. The figure shows motifs and their structural contexts predicted by RNAcontext using RNAcompete binding data (Ref 9). (Reprinted with permission from Ref 53. Copyright 2010, PLoS Computational Biology Creative Commons Attribution License.)
Figure 6. Three-dimensional structures of multiple RNA-binding domains (RBDs) in complex with RNA. (a) Solution structure of polypyrimidine tract binding (PTB) protein RBD3 and RBD4 in complex with CUCUCU RNA [Protein Data Bank (PDB): 2ADC]. RBD3 and RBD4 have different binding specificities: RBD3 binds YCUNN and RBD4 binds YCN (Y, pyrimidine; N, any nucleotide). RBD3 and RBD4 interact extensively, resulting in an antiparallel orientation of their bound RNAs, suggesting that the only way for these two RBDs to bind a single RNA is for their sites to be separated by a linker sequence (Ref 137). (b) Solution structure of ADAR2 dsRBD1 and dsRBD2 in complex with GluR-2 R/G RNA (PDB: 2L3J). The dsRBDs recognize their targets by the shape and by the primary sequence in the minor groove. Sequence-specific recognition is achieved through a hydrogen bond to the amino group of G (in the GG mismatch for dsRBD1; in the GC pair for dsRBD2) via a β1-β2 loop and via a hydrophobic contact to adenine H2 (in the AU pair for dsRBD1; in the AC mismatch for dsRBD2) via helix α1. The two dsRBDs bind one face of the RNA and cover about 120° of the turn of the RNA helix (Ref 15).