Kate B Cook, Timothy R Hughes, Quaid D Morris.
Abstract
RNA-binding proteins (RBPs) are important regulators of eukaryotic gene expression. Genomes typically encode dozens to hundreds of proteins containing RNA-binding domains, which collectively recognize diverse RNA sequences and structures. Recent advances in high-throughput methods for assaying the targets of RBPs in vitro and in vivo allow large-scale derivation of RNA-binding motifs as well as determination of RNA-protein interactions in living cells. In parallel, many computational methods have been developed to analyze and interpret these data. The interplay between RNA secondary structure and RBP binding has also been a growing theme. Integrating RNA-protein interaction data with observations of post-transcriptional regulation will enhance our understanding of the roles of these important proteins.
Keywords: RBP target identification; RNA secondary structure; RNA-binding proteins; high-throughput sequencing
Year: 2014 PMID: 25504152 PMCID: PMC4303715 DOI: 10.1093/bfgp/elu047
Source DB: PubMed Journal: Brief Funct Genomics ISSN: 2041-2649 Impact factor: 4.241
Figure 1: RNA-binding domains use a variety of strategies for binding RNA. (A–C) Different arrangements of two RRM domains. (A) RRMs 1–2 of PABP1 are arranged to form a flat RNA-binding surface (PDB ID: 1CVJ). (B) RRMs 1–2 of SXL form an RNA-binding cleft (1B7F). (C) RRMs 3–4 of PTB are arranged back to back (2ADC). (D–F) Examples of other RNA-binding domains. (D) KH domain 1 of PCBP2 forms an RNA-binding cleft (2PY9). (E) The Puf repeats of the FBF-2 PUM-HD form a concave RNA-binding surface (3K62). (F) The two CCCH zinc fingers of TIS11D/ZFP36L2 (1RGO). (G–I) RBPs binding to structured RNA. (G) Hairpin loop recognition by RRMs 1–2 of Nucleolin (1RKJ). (H) Bulge loop recognition by RRM 1 of U1A/SNRPA (1AUD). (I) dsRNA binding by the dsRBD domain of ADAR2 (2L2K).
Figure 2: In vitro methods for determining RBP targets. (A) SELEX consists of several rounds of binding and amplification of RNA molecules. SEQRS modifies traditional SELEX by sequencing the bound pool of RNA at each round. (B) RNAcompete queries a designed RNA pool under competitive conditions and assays the bound RNAs using a microarray. (C) RNA Bind-n-Seq assays RNA binding by incubating RNA and various amounts of protein and sequencing the bound RNAs.
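One simple way to summarize a sequencing-based in vitro assay such as those in Figure 2 is to compare k-mer frequencies in the bound pool with those in the input pool. The sketch below is only an illustration under stated assumptions: the function names, the pseudocount, and the toy reads are hypothetical, and it is not the published analysis pipeline of SEQRS, RNAcompete, or RNA Bind-n-Seq.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Return the relative frequency of every k-mer observed in `reads`."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_enrichment(bound_reads, input_reads, k, pseudo=1e-6):
    """Ratio of k-mer frequency in the bound pool to that in the input pool.

    `pseudo` (an assumed value) avoids division by zero for k-mers that are
    absent from the input pool.
    """
    bound = kmer_frequencies(bound_reads, k)
    background = kmer_frequencies(input_reads, k)
    return {
        kmer: freq / (background.get(kmer, 0.0) + pseudo)
        for kmer, freq in bound.items()
    }


# Toy data (hypothetical): the bound pool is enriched for UGUA.
bound_pool = ["UGUAUA", "UGUAAA"]
input_pool = ["UGUAUA", "ACGUAC", "GGGCCC", "AAAAUU"]
enrichment = kmer_enrichment(bound_pool, input_pool, k=4)
```

K-mers with a ratio well above 1 are candidates for the preferred binding sequence; with real data, read depth and pool complexity would need to be taken into account before interpreting the ratios.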
Figure 3: In vivo methods for determining RBP targets. (A) RIP-chip and RIP-seq determine bound RNAs by analyzing immunoprecipitated RNPs by microarrays or high-throughput sequencing. (B) UV cross-linking and immunoprecipitation (CLIP) allows more stringent washing and RNase treatment of bound RNAs. iCLIP identifies binding sites more precisely by taking advantage of the fact that the amino acid tag left by proteinase K treatment terminates reverse transcription. (C) PAR-CLIP is another modification of CLIP-seq that first treats the cell with a modified nucleoside (4SU or 6SG), which is incorporated into transcribed RNA. The modified nucleotide can be cross-linked using longer wavelength UV radiation.
Motif-finding algorithms used for analyzing RBP-RNA interaction data
| Algorithm | Input | Type of motif generated | Considers secondary structure? | Reference |
|---|---|---|---|---|
| MEME | Positive (and optionally, negative) sequences | PWM | No | [ |
| PhyloGibbs | Positive (and optionally, negative) sequences | PWM | No | [ |
| REFINE | Positive sequences | N/A; a filtering procedure that considers only sequences containing three enriched hexamers. Filtered sequences are then submitted to another motif-finding algorithm | No | [ |
| cERMIT | Rank ordered sequences | PWM | No | [ |
| DRIMUST | Rank ordered sequences | IUPAC motif, possibly gapped | No | [ |
| StructuRED | Positive and negative sequences | PWM in a hairpin loop | Yes, considers possible hairpin loops up to 7 bases with at least 3 paired bases | [ |
| TEISER | Sequences and scores (e.g., stability scores) | PWM in a hairpin loop | Yes, considers possible hairpin loops with stems 4-7 bases long and loop sizes of 4-9 bases | [ |
| RNAcontext | Sequences and affinity scores | PWM with structural context scores | Yes, learns the preferred structural context of each base in a motif | [ |
| GraphProt | Positive and negative sequences | Graph-based sequence and structure motifs, can be visualized with logos | Yes, models RNA structure using a graph-based encoding | [ |
| CMfinder | Positive sequences | Structured sequence | Yes, SCFG-based, examines the most stable structures in the input | [ |
| RNApromo | Positive sequences | Structured sequence | Yes, SCFG-based, optimizes a motif from an initial set of substructures generated from the input | [ |
| #ATS | Positive and negative sequences | IUPAC | Yes, scores candidate binding sites by accessibility | [ |
| MEMERIS | Positive and negative sequences | PWM | Yes, uses accessibility as prior knowledge to guide motif finding toward single-stranded regions | [ |
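Most of the algorithms in the table above report a position weight matrix (PWM). As a rough illustration of how such a motif can be used, the sketch below converts a PWM to log-odds scores and scans a sequence for the best-scoring window. The PWM values, the uniform background, the pseudocount, and the function names are all assumptions made for this example; none of them come from the listed tools.

```python
import math

BASES = "ACGU"


def pwm_to_log_odds(pwm, background=0.25, pseudo=0.01):
    """Convert per-position base probabilities to log2 odds against a
    uniform background (assumed). `pseudo` guards against log(0)."""
    log_odds = []
    for column in pwm:
        total = sum(column.get(b, 0.0) + pseudo for b in BASES)
        log_odds.append({
            b: math.log2(((column.get(b, 0.0) + pseudo) / total) / background)
            for b in BASES
        })
    return log_odds


def scan_sequence(sequence, log_odds):
    """Score every window of the motif width; returns (start, score) pairs."""
    width = len(log_odds)
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(log_odds[i][base] for i, base in enumerate(window))
        scores.append((start, score))
    return scores


# Hypothetical 4-position PWM with consensus UGUA.
consensus = "UGUA"
toy_pwm = [{b: (0.85 if b == c else 0.05) for b in BASES} for c in consensus]
toy_log_odds = pwm_to_log_odds(toy_pwm)
hits = scan_sequence("ACCUGUAGG", toy_log_odds)
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

A scan like this ignores secondary structure, which is the limitation that the structure-aware entries in the table are designed to address.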
Databases that collect RNA-protein interactions
| Database | URL | Features | Reference |
|---|---|---|---|
| RBPDB | | Direct observations of protein-RNA interactions in metazoans, both low- and high-throughput | [ |
| CISBP-RNA | | Directly observed and predicted (by homology with known proteins) motifs. Tools for scanning sequences and comparing motifs | [ |
| starBase | | RBP-RNA and miRNA-RNA interactions from CLIP data | [ |
| doRiNa | | mRNA-centric or RBP-centric search of CLIP data including combinatorial search | [ |
| CLIPz | | Storage and analysis (mapping reads, extracting clusters, mapping T→C conversions) of CLIP data | [ |
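The T→C conversions mentioned for CLIPz can be illustrated with a minimal sketch that counts, per reference position, how many aligned reads show a C where the reference has a T. It assumes ungapped alignments given as (start, read) pairs on the reference strand; the function names and toy sequences are hypothetical, and this is not the CLIPz implementation.

```python
from collections import Counter


def t_to_c_positions(reference_segment, read, offset=0):
    """Reference positions where the reference has T and the read has C."""
    if len(reference_segment) != len(read):
        raise ValueError("read and reference segment must have equal length")
    pairs = zip(reference_segment.upper(), read.upper())
    return [
        offset + i
        for i, (ref_base, read_base) in enumerate(pairs)
        if ref_base == "T" and read_base == "C"
    ]


def conversion_counts(reference, aligned_reads):
    """Count T->C mismatches per reference position.

    `aligned_reads` is an iterable of (start, read_sequence) tuples that
    describe ungapped alignments (an assumption of this sketch).
    """
    counts = Counter()
    for start, read in aligned_reads:
        segment = reference[start:start + len(read)]
        for position in t_to_c_positions(segment, read, offset=start):
            counts[position] += 1
    return counts


# Toy example (hypothetical sequences).
reference = "GATTACATTG"
reads = [(1, "ATCAC"), (2, "TCACA"), (6, "ACTG")]
conversions = conversion_counts(reference, reads)
```

Positions supported by conversions in several independent reads are more plausible cross-link sites than positions seen once, since single mismatches may also reflect sequencing errors or polymorphisms.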