Inbal Paz¹, Idit Kosti¹, Manuel Ares², Melissa Cline³, Yael Mandel-Gutfreund⁴.
Abstract
In many cases, gene expression is regulated by RNA-binding proteins (RBPs) that bind to mRNAs as well as to non-coding RNAs. RBPs recognize their RNA targets via specific binding sites on the RNA. Predicting the binding sites of RBPs is known to be a major challenge. We present a new webserver, RBPmap, freely accessible at http://rbpmap.technion.ac.il/, for accurate prediction and mapping of RBP binding sites. RBPmap has been developed specifically for mapping RBPs in the human, mouse and Drosophila melanogaster genomes, though it supports other organisms too. RBPmap enables users to select motifs from a large database of experimentally defined motifs. In addition, users can provide any motif of interest, given either as a consensus or as a position-specific scoring matrix (PSSM). The algorithm for mapping the motifs is based on a Weighted-Rank approach, which considers the clustering propensity of the binding sites and the overall tendency of regulatory regions to be conserved. RBPmap also incorporates a position-specific background model, designed separately for different genomic regions, such as splice sites, 5' and 3' UTRs, non-coding RNA and intergenic regions. RBPmap was tested on high-throughput RNA-binding experiments and proved to be highly accurate.
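The abstract notes that a user-supplied motif may be given either as a consensus or as a PSSM. One common way to bring a consensus into PSSM form is sketched below; it is an assumption for illustration, not RBPmap's documented conversion. The `consensus_to_pssm` name, the RNA-only IUPAC table and the 0.01 pseudocount are hypothetical choices.

```python
# IUPAC ambiguity codes for RNA nucleotides (U used in place of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
BASES = "ACGU"


def consensus_to_pssm(consensus, pseudocount=0.01):
    """Hypothetical helper: expand an IUPAC consensus into probability columns.

    Bases allowed by a code share the probability mass equally; a small
    pseudocount keeps disallowed bases above zero so log-odds stay finite.
    """
    pssm = []
    for code in consensus.upper().replace("T", "U"):
        allowed = IUPAC[code]
        raw = {b: (1.0 / len(allowed) if b in allowed else 0.0) + pseudocount
               for b in BASES}
        total = sum(raw.values())
        # Normalize so each column is a probability distribution over A, C, G, U.
        pssm.append({b: v / total for b, v in raw.items()})
    return pssm


if __name__ == "__main__":
    for column in consensus_to_pssm("UGCAUGY"):
        print({b: round(p, 3) for b, p in column.items()})
```

The pseudocount is a design choice: without it, a single mismatch against an unambiguous position would give a log-odds score of minus infinity.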
Year: 2014; PMID: 24829458; PMCID: PMC4086114; DOI: 10.1093/nar/gku406
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Figure 1. A pipeline summarizing the RBPmap algorithm. (A) The mandatory input parameters for an RBPmap run: a query sequence and a motif of interest to be mapped to the sequence. (B) A match score to the motif is calculated for each position in the query sequence, using overlapping windows of the motif length. (C) The match scores are compared to the average match score calculated for each motif in a background of randomly chosen regulatory regions. This step uses two thresholds: a significance threshold for the anchor site (default P-value < 0.005) and a suboptimal threshold for the secondary sites (default P-value < 0.01); the secondary sites are used to evaluate clustering propensity. (D) A Weighted-Rank (WR) score is calculated for a 50-nt window around each significant site. This score reflects the propensity of suboptimal sites to cluster around the significant site, weighted by their match scores to the motif of interest. (E) To reduce false-positive predictions, the WR scores are compared to a region-specific background model that is generated independently for each motif and for each genomic region, and non-significant results (P-value ≥ 0.05) are removed. The figure illustrates the procedure for a query sequence spanning three different genomic regions (mid-intron, an intronic region flanking a splice site, and an internal exon). (F) Finally, a conservation-based filtering step is applied only to sites mapped to mid-intron or intergenic regions, filtering out sites that fall in non-conserved regions (those below the average conservation level calculated for intronic regulatory regions).
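Panels (B) to (D) describe a scan, threshold and cluster sequence. The sketch below follows those steps under stated assumptions: the background is a uniformly random sequence rather than RBPmap's randomly chosen regulatory regions, P-values are empirical (add-one smoothed) fractions of background windows scoring at least as high, and the per-anchor score is a hypothetical stand-in (anchor score plus the scores of suboptimal sites within ±25 nt), because the caption does not give the Weighted-Rank formula. Steps (E) and (F) are omitted, and all function names are illustrative.

```python
import bisect
import math
import random

BASES = "ACGU"


def simple_pssm(consensus, p_match=0.85):
    # Hypothetical motif model: the consensus base gets p_match, others share the rest.
    other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else other) for b in BASES} for c in consensus]


def log_odds(window, pssm, background=0.25):
    # Log2 odds of the window under the motif versus a uniform background.
    return sum(math.log2(col[b] / background) for col, b in zip(pssm, window))


def window_scores(seq, pssm):
    # Step B: score every overlapping window whose length equals the motif length.
    k = len(pssm)
    return [log_odds(seq[i:i + k], pssm) for i in range(len(seq) - k + 1)]


def empirical_p(score, sorted_bg):
    # Step C: fraction of background windows scoring >= score (add-one smoothed).
    n_ge = len(sorted_bg) - bisect.bisect_left(sorted_bg, score)
    return (n_ge + 1) / (len(sorted_bg) + 1)


def cluster_scores(seq, pssm, bg_seq, anchor_p=0.005, sub_p=0.01, window=50):
    bg = sorted(window_scores(bg_seq, pssm))
    scores = window_scores(seq, pssm)
    pvals = [empirical_p(s, bg) for s in scores]
    anchors = [i for i, p in enumerate(pvals) if p < anchor_p]
    subopt = [i for i, p in enumerate(pvals) if p < sub_p]
    half = window // 2
    result = {}
    for a in anchors:
        # Step D (hypothetical stand-in for the WR score): anchor score plus the
        # match scores of other suboptimal sites within +/- half the window.
        near = [j for j in subopt if j != a and abs(j - a) <= half]
        result[a] = scores[a] + sum(scores[j] for j in near)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    background = "".join(rng.choice(BASES) for _ in range(20000))
    query = "A" * 20 + "UGCAUG" + "A" * 10 + "UGCAUG" + "A" * 20
    print(cluster_scores(query, simple_pssm("UGCAUG"), background))
```

With the default thresholds, the two planted `UGCAUG` copies (an arbitrary example motif, 16 nt apart) are both reported as anchors, and each anchor's score includes the other copy as a clustered secondary site.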
Figure 2. A view of the RBPmap input and output pages. (A) An example of the RBPmap home page showing the mandatory input parameters needed for an RBPmap run. (B) Clicking the link ‘Click here to select motifs from RBPmap full list’ opens a sorted list of all motifs in the RBPmap database, from which the user is prompted to select the proteins/motifs of interest. (C) An example of the RBPmap output page. In this example, the job includes more than one query sequence. The results for each sequence are shown, followed by a link to a text file summarizing the binding-site predictions for all input sequences. (D) An example of the web-based summary of all predicted binding sites within one query sequence. Results are provided for each protein selected by the user, with all occurrences of motifs belonging to the same protein listed together. (E) A visual presentation of the predicted binding sites as custom tracks in the UCSC Genome Browser.
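Panel (E) mentions custom tracks in the UCSC Genome Browser. The caption does not describe the file RBPmap actually produces, so the following is only a sketch, assuming predicted sites are available as 1-based inclusive coordinates, of how such sites could be written as a BED-based custom track (a `track` line followed by BED6 rows). The `Site` record, the `to_custom_track` function and the demo coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical record for one predicted binding site (1-based, inclusive)."""
    chrom: str
    start: int
    end: int
    protein: str
    motif: str
    strand: str


def to_custom_track(sites, name="RBPmap_sites"):
    """Render sites as a UCSC custom track: one track line, then BED6 rows.

    BED coordinates are 0-based and half-open, so a 1-based inclusive start
    becomes start - 1 while the end value stays the same.
    """
    lines = [f'track name="{name}" description="Predicted RBP binding sites" visibility=pack']
    for s in sites:
        label = f"{s.protein}:{s.motif}"  # BED name field; must not contain spaces
        lines.append("\t".join([s.chrom, str(s.start - 1), str(s.end), label, "0", s.strand]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up demo site, not an RBPmap prediction.
    demo = [Site("chr1", 1001, 1006, "PROT1", "UGCAUG", "+")]
    print(to_custom_track(demo), end="")
```

The coordinate shift is the usual pitfall here: forgetting it moves every site one base to the right in the browser.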