Maya Polishchuk, Inbal Paz, Zohar Yakhini, Yael Mandel-Gutfreund.
Abstract
Gene expression regulation is highly dependent on the binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both the RNA primary sequence and its local secondary structure play a role in specific protein-RNA recognition and binding. Despite great advances in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data generated from in vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy to interpret visually. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated with different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Year: 2018 PMID: 29800452 PMCID: PMC6030986 DOI: 10.1093/nar/gky453
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A visualized summary of the SMARTIV methodology. (A) The input for SMARTIV is a list of sequences ranked in descending order according to their binding scores. As a first step, we employ secondary structure predictions, defining each nucleotide in the sequence as either paired or unpaired, and integrate the sequence and structural information into a new eight-letter alphabet (A,G,C,U for unpaired nucleotides and a,g,c,u for paired nucleotides). (B) We extract k-mers that are significantly enriched at the top of the ranked list compared to the bottom of the list, using the mHG statistics. (C) We cluster and align the k-mers. We then build a Position Weight Matrix (PWM) for each cluster and, using the mmHG statistics, assign it a P-value that reflects its correspondence to the original ranking of the sequences by experimental binding score.
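Steps (A) and (B) of the caption above can be illustrated with a minimal Python sketch. It is not the SMARTIV implementation: the function names are hypothetical, the secondary structure is assumed to arrive as a dot-bracket string (for example from a folding tool) in which "." marks an unpaired nucleotide, and the enrichment function computes only the raw minimum hypergeometric (mHG) score, the smallest hypergeometric tail probability over all cutoffs of the ranked list. That score is not itself a P-value, since it is a minimum over many cutoffs and would still need a correction.

```python
from math import comb

UNPAIRED = "."  # assumption: dot-bracket notation, "." = unpaired


def encode_eight_letter(seq: str, dot_bracket: str) -> str:
    """Merge sequence and pairing state: uppercase = unpaired, lowercase = paired."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    rna = seq.upper().replace("T", "U")
    return "".join(
        base if state == UNPAIRED else base.lower()
        for base, state in zip(rna, dot_bracket)
    )


def hypergeom_tail(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) for X ~ Hypergeometric(population N, successes B, draws n)."""
    upper = min(n, B)
    favourable = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return favourable / comb(N, n)


def mhg_score(occurrence: list[int]) -> float:
    """Raw mHG score of a 0/1 vector ordered from best- to worst-ranked sequence.

    occurrence[i] is 1 if the k-mer occurs in the sequence at rank i, else 0.
    """
    N = len(occurrence)
    B = sum(occurrence)
    best = 1.0
    hits = 0
    for n, hit in enumerate(occurrence, start=1):
        hits += hit
        best = min(best, hypergeom_tail(N, B, n, hits))
    return best


if __name__ == "__main__":
    # Toy hairpin: three paired bases, a three-base loop, three paired bases.
    print(encode_eight_letter("GGCAUUGCC", "(((...)))"))
    # A k-mer found only in the two top-ranked sequences out of four.
    print(mhg_score([1, 1, 0, 0]))
```

A k-mer concentrated at the top of the list gets a small score, while one concentrated at the bottom gets a score near 1. The brute-force loop is quadratic in the list length, which is adequate for illustration but not for large inputs.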
Figure 2. SMARTIV results for extracting the combined sequence and structure motifs for SLBP. Ranked-list sequences from an eCLIP experiment conducted for human SLBP in K562 cells were provided as input to the SMARTIV webserver. Parameters were set to k-mer range: 6–9, and folding method: MFE RNAfold. Shown are the four most significant motifs in the eight-letter alphabet. On the right is a cartoon representing the secondary structure of the known SLBP binding motif, which was solved by X-ray crystallography in complex with the SLBP protein. As shown, all four of the most significant motifs predicted by SMARTIV fit exactly to the known stem–loop binding site of SLBP on the histone mRNA, at both the sequence and structural level. For illustration, a SMARTIV motif is mapped to the known stem–loop structure, using the SMARTIV standard color-coding.
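To make the notion of a motif in the eight-letter alphabet concrete, the sketch below builds a simple per-position frequency matrix from k-mers that are assumed to be already clustered, aligned, and of equal length. The function names, the optional pseudocount, and the toy k-mers are all hypothetical; the k-mers are invented for illustration and are not SMARTIV output or the SLBP motif. How SMARTIV itself clusters, aligns, and weights k-mers is not specified here.

```python
from collections import Counter

ALPHABET = "ACGUacgu"  # uppercase = unpaired, lowercase = paired


def build_pwm(kmers: list[str], pseudocount: float = 0.0) -> list[dict[str, float]]:
    """Return one {symbol: frequency} column per aligned position."""
    if not kmers:
        raise ValueError("need at least one k-mer")
    length = len(kmers[0])
    if any(len(k) != length for k in kmers):
        raise ValueError("k-mers must be pre-aligned to the same length")
    total = len(kmers) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(k[pos] for k in kmers)
        pwm.append({s: (counts[s] + pseudocount) / total for s in ALPHABET})
    return pwm


def consensus(pwm: list[dict[str, float]]) -> str:
    """Most frequent symbol at each position."""
    return "".join(max(col, key=col.get) for col in pwm)


if __name__ == "__main__":
    toy_kmers = ["ggcAUU", "ggcAUU", "ggcACU", "gguAUU"]  # invented example
    pwm = build_pwm(toy_kmers)
    print(consensus(pwm))
    print(pwm[4]["U"])
```

Because case carries the pairing state, a column dominated by lowercase symbols reads as a paired (stem) position and one dominated by uppercase symbols as an unpaired (loop) position, which is the information a combined logo conveys.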