Boqin Hu, Yu-Cheng T Yang, Yiming Huang, Yumin Zhu, Zhi John Lu.
Abstract
We present POSTAR (http://POSTAR.ncrnalab.org), a resource of POST-trAnscriptional Regulation coordinated by RNA-binding proteins (RBPs). Precise characterization of post-transcriptional regulatory maps has accelerated dramatically in the past few years. Based on new studies and resources, POSTAR supplies the largest collection of experimentally probed (∼23 million) and computationally predicted (∼117 million) RBP binding sites in the human and mouse transcriptomes. POSTAR annotates every transcript and its RBP binding sites using extensive information regarding various molecular regulatory events (e.g., splicing, editing, and modification), RNA secondary structures, disease-associated variants, and gene expression and function. Moreover, POSTAR provides a user-friendly, multi-mode, integrated search interface, which helps users connect multiple RBP binding sites with post-transcriptional regulatory events, phenotypes, and diseases. Based on our platform, we were able to obtain novel insights into post-transcriptional regulation, such as the putative association between CPSF6 binding, RNA structural domains, and Li-Fraumeni syndrome SNPs. In summary, POSTAR represents an early effort to systematically annotate post-transcriptional regulatory maps and explore the putative roles of RBPs in human diseases.
Year: 2016 PMID: 28053162 PMCID: PMC5210617 DOI: 10.1093/nar/gkw888
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Multiple data modules in POSTAR can be used to annotate and interpret RBP binding sites at various levels. Experimentally probed and computationally predicted RBP binding sites were annotated with different genomic elements. The annotations and functions of RBPs and genes, as well as the predicted sequence motifs and structural preferences of RBPs, were provided (data module I). The RBP binding sites were annotated using extensive information at several levels, including molecular regulatory events (data module II), genomic variants (data module III), gene-function associations (data module IV), and RNA secondary structures (data module V).
Overview of data curated in POSTAR
| Data type | Human | Mouse | Source / notes |
| --- | --- | --- | --- |
| RBP binding sites from experiments | 1 752 329 | 1 003 984 | |
| | 39 201 | 78 922 | |
| | 7 731 846 | 96 346 | |
| | 4 598 307 | 1 013 008 | |
| | 6 703 559 | NA | eCLIP-seq peaks called by ENCODE (human: 56 RBPs) [c] |
| | 439 817 | NA | PIP-seq peaks called by PMID24393486 (human: global RBPs) |
| RBP binding sites from predictions | 25 623 567 | 18 540 386 | |
| | 19 447 967 | 24 621 203 | |
| | 16 586 127 | 11 905 150 | |
| RBPs | 132 | 104 | Ensembl, PMID25365966 |
| Sequence motifs | 726 | 180 | |
| Structural preferences | 720 | 179 | |
| Gene Ontologies | 15 677 | 13 849 | GOBP, GOMF, GOCC [f] |
| Biological pathways | 186 | 105 | KEGG |
| Gene expression | 34 cell/tissue types | 18 cell/tissue types | |
| Alternative splicing (skipped exon) | 34 cell/tissue types | 18 cell/tissue types | |
| miRNA binding sites from experiments | 3 906 955 | 1 588 861 | |
| miRNA binding sites from predictions | 70 516 087 | 38 336 372 | |
| RNA modification sites | 177 049 | 91 930 | RMBase, PMID26863196 |
| RNA editing sites | 2 583 302 | 8846 | RADAR, DARNED |
| Splicing elements | 1 995 574 | 1 152 186 | Annotated in GENCODE human v19, mouse vM7 |
| Conserved structural regions | 725 | 691 | EvoFam |
| SNPs | 149 398 310 | 77 785 586 | dbSNP v146 |
| Tissue-specific eQTL | 19 530 607 | NA | GTEx |
| GWAS SNPs | 278 473 | NA | GWASdb2 |
| Clinically important SNPs | 131 919 | NA | ClinVar |
| Cancer TCGA whole-exome SNVs | 828 119 | NA | PMID24390350 |
| Cancer TCGA whole-genome SNVs | 4 745 891 | NA | PMID23945592 |
| Cancer COSMIC SNVs | 2 371 219 | NA | COSMIC v76 |
| Tissue-specific genes | 21 549 | NA | TiGER, SpeCond |
| Gene-Disease associations | 419 906 | NA | OMIM, DisGeNET |
| Gene-Cancer associations | 4485 | NA | |
| Gene-Drug associations | 35 201 | NA | DGIdb 2.0 |
| Predicted local structures | 82 242 543 | 57 095 233 | |
[a] Results and data first generated by POSTAR are in bold font.
[b] We provide all CLIP-seq peaks called by Piranha with P < 0.01. For CIMS, CITS and PARalyzer, we provide peaks with default significance cutoffs.
[c] See Supplementary File 2 for the full list of eCLIP-seq data. The peaks were called by ENCODE.
[d] See Supplementary File 5 for the RBPs and motifs used for prediction.
[e] See Supplementary File 6 for the RBPs in the DeepBind model.
[f] BP, Biological Process; MF, Molecular Function; CC, Cellular Component.
[g] See Supplementary File 4 for the full list of 230 RNA-seq data sets in human and mouse.
[h] We used all AGO CLIP-seq peaks called by Piranha (P < 0.01). The targeting miRNAs of the peaks were identified using miRanda with default parameters.
[i] We used RNAfold to calculate the changes in minimum free energy of local RNA secondary structures that are induced by the mutations.
[j] See Supplementary File 3 for the full list of manually curated cancer genes.
[k] See Supplementary File 7 for the experimental structural probing datasets. We predicted one local structure centered on each RBP binding site (window size: 150 nt).
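As a quick consistency check, the binding-site counts in the table can be summed and compared with the roughly 23 million experimentally probed and 117 million predicted sites quoted in the abstract. The sketch below copies the human and mouse counts from the experimental and predicted RBP binding-site rows. The variable and function names are illustrative only, and treating NA entries as contributing nothing is an assumption about how the abstract's totals were formed.

```python
# Counts copied from the table as (human, mouse) pairs; None stands for "NA".
# Names below are illustrative and not part of POSTAR itself.
EXPERIMENTAL_ROWS = [
    (1_752_329, 1_003_984),
    (39_201, 78_922),
    (7_731_846, 96_346),
    (4_598_307, 1_013_008),
    (6_703_559, None),  # eCLIP-seq peaks, human only
    (439_817, None),    # PIP-seq peaks, human only
]

PREDICTED_ROWS = [
    (25_623_567, 18_540_386),
    (19_447_967, 24_621_203),
    (16_586_127, 11_905_150),
]


def total_sites(rows):
    """Sum human and mouse counts over all rows, skipping NA (None) cells."""
    return sum(count for row in rows for count in row if count is not None)


experimental_total = total_sites(EXPERIMENTAL_ROWS)
predicted_total = total_sites(PREDICTED_ROWS)

print(f"experimental: {experimental_total:,} sites")
print(f"predicted:    {predicted_total:,} sites")
```

Under these assumptions the experimental rows sum to about 23.5 million sites and the predicted rows to about 116.7 million, in line with the figures given in the abstract.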
Figure 2. Input and output search interface of POSTAR: multiple search modes and multiple result viewers. (A) POSTAR provides six usage modes: (i) ‘POSTAR’ search, (ii) ‘RBP’ search, (iii) ‘Structure’ visualization, (iv) ‘Variation’ search, (v) ‘Functional gene’ search, and (vi) ‘Predict’ server. (B) POSTAR presents the search results in multiple ways. A table layout is the basic output format (1). In the ‘POSTAR’ search mode, the interactions between the target gene and multiple RBPs are visualized in a network (2). The expression levels of the target gene and splicing scores of skipped exons across multiple cell and tissue types are shown in a bar chart (3). Clicking on the genomic positions will direct the user to the UCSC Genome Browser, which will display any associated binding sites and regulatory events (4). In ‘Structure’ visualization mode, we provide RNA structural profiling data (5) and predicted RNA secondary structures based on these data (6). In ‘RBP’ search mode, we provide the sequence motifs (7) and structural preferences (8) of the RBP.
Figure 3. ‘POSTAR’ search enables integrative viewing of multiple RBP binding sites and their potential to post-transcriptionally regulate a target gene (TP53 as an example). (A) In the PAR-CLIP Piranha data module, users may select ‘interaction network’, ‘all binding sites’, and multiple regulatory elements, including ‘miRNA binding (pred.)’, ‘RNA modification’, and ‘ClinVar SNPs’, to obtain detailed information in one page. (B) By clicking on the ‘Visualize in browser’ button (green), a user can select four RBPs among all bound RBPs to simultaneously visualize their binding sites (red tracks) and regulatory events (blue tracks) in an integrative manner via the UCSC Genome Browser.
Figure 4. Local structures of RBP binding sites on TP53. (A) Users can search for RBP binding sites on TP53 that are associated with disease SNPs by searching for the disease name (‘Li-Fraumeni syndrome’ as an example here) in the table on the server. (B) Predicted local secondary structure centered on a CPSF6 binding site. The local secondary structure around a Li-Fraumeni syndrome SNP (from the ClinVar database), which is a G-to-A mutation on the TP53 transcript (minus strand), is magnified; the mutation disrupts a G-C base pair in the hairpin's stem. Note that ClinVar annotates the mutation on the plus strand, where it is a C-to-T mutation (boxed in (A)). (C) Another predicted local secondary structure, centered on an ELAVL1 binding site, which contains a GWAS SNP that is an A-to-U mutation.
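The strand note in Figure 4(B) follows from base complementarity: a single-nucleotide change reported on the plus strand corresponds to the complementary change on the minus strand, so plus-strand C-to-T is minus-strand G-to-A. A minimal sketch of that conversion is shown below; the function names are hypothetical and are not taken from POSTAR or ClinVar.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_opposite_strand(ref, alt):
    """Express a single-nucleotide change on the opposite DNA strand.

    For a single base no reversal is needed; only the complement changes.
    """
    return COMPLEMENT[ref.upper()], COMPLEMENT[alt.upper()]


def as_rna(base):
    """Write a DNA base in RNA notation (T becomes U)."""
    return "U" if base.upper() == "T" else base.upper()


# ClinVar reports the Li-Fraumeni SNP on the plus strand as C-to-T.
minus_ref, minus_alt = to_opposite_strand("C", "T")
print(f"minus strand: {minus_ref}-to-{minus_alt}")
```

The same helper, combined with `as_rna`, gives the RNA-level notation used in panel (C), where a change involving T on the transcript strand is written with U.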