Yumin Zhu, Gang Xu, Yucheng T Yang, Zhiyu Xu, Xinduo Chen, Binbin Shi, Daoxin Xie, Zhi John Lu, Pengyuan Wang.
Abstract
Post-transcriptional regulation of RNAs is critical to a diverse range of cellular processes. The volume of functional genomic data on post-transcriptional regulation has continued to grow in recent years. In the current database version, POSTAR2 (http://lulab.life.tsinghua.edu.cn/postar), we included the following new features and data: ∼500 newly added CLIP-seq datasets (∼1200 CLIP-seq datasets in total) from six species, including human, mouse, fly, worm, Arabidopsis and yeast; a new module, 'Translatome', which is derived from Ribo-seq datasets and contains ∼36 million open reading frames (ORFs) in the genomes of the six species; and updated and unified post-transcriptional regulation and variation data. We also improved the web interfaces for searching and visualizing protein-RNA interactions with multi-layer information, and merged our CLIPdb database into POSTAR2. POSTAR2 will help researchers investigate the post-transcriptional regulatory logic coordinated by RNA-binding proteins and the translational landscape of cellular RNAs.
Year: 2019 PMID: 30239819 PMCID: PMC6323971 DOI: 10.1093/nar/gky830
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Framework for constructing the POSTAR2 database. (A) POSTAR2 covers six species: human, mouse, fly, worm, Arabidopsis and yeast. (B) POSTAR2 provides three modules: (i) the ‘RBP’ module, which provides annotations and functions of RBPs, as well as RBP-binding sites; (ii) the ‘RNA’ module, consisting of several sub-modules including ‘Binding sites’, ‘Crosstalk’, ‘Variation’ and ‘Disease’, which annotates the RBP-binding sites using various regulatory events and genomic variants; (iii) the ‘Translatome’ module, which is for exploring the translation landscape of genes across different tissues and cell lines. (C) POSTAR2 provides a user-friendly interface for searching and visualization, with table views, network views, histograms and heatmaps.
Overview of data curated in POSTAR2
| Category | Data type | Human | Mouse | Fly | Worm | Arabidopsis | Yeast | Notes |
|---|---|---|---|---|---|---|---|---|
| RBP-binding sites | RBP-binding sites from experiments | 3 759 076 | 1 193 757 | 97 322 | 35 652 | 31 183 | 324 641 | All CLIP-seq peaks called by Piranha |
| | | 75 734 | 110 876 | 1717 | 46 | 568 | 5800 | HITS-CLIP peaks called by CIMS |
| | | 15 788 784 | 226 458 | 417 150 | 29 784 | NA | 4 575 287 | PAR-CLIP peaks called by PARalyzer |
| | | 9 131 076 | 1 067 309 | 87 049 | 406 571 | 119 754 | NA | iCLIP peaks called by CITS |
| | | 2 436 040 | NA | NA | NA | NA | NA | eCLIP peaks called by ENCODE |
| | | 439 817 | NA | NA | NA | NA | NA | PIP-seq peaks called by PMID24393486 |
| RBP | RBPs | 171 | 39 | 5 | 5 | 2 | 62 | Ensembl, PMID25365966 |
| | Sequence motifs | 1218 | 252 | 30 | 30 | 12 | 366 | MEME, HOMER |
| | Structural preferences | 1169 | 245 | 30 | 30 | 11 | 352 | RNApromo, RNAcontext |
| | Gene Ontologies | 108 787 | 41 501 | 2976 | 2145 | 1238 | 26 013 | GOBP, GOMF, GOCC |
| RNA | Gene expression | 12 cell/tissue types | 10 cell/tissue types | 30 developmental stages | 35 developmental stages | 4 cell/tissue types | 3 conditions | GEO database, Expression Atlas |
| | miRNA-binding sites from experiments | 3 906 955 | 1 588 861 | NA | NA | NA | NA | AGO CLIP-seq peaks called by Piranha; targeting miRNAs identified by miRanda |
| | miRNA-binding sites from predictions | 12 196 959 | 7 563 080 | 1 099 046 | 671 012 | 2524 | NA | miRanda, RNAhybrid, psRobot, psRNAtarget |
| | RNA modification sites | 489 629 | 495 232 | 6819 | NA | 20 331 | 71 466 | RMBase2, PMID26863196 |
| | RNA editing sites | 2 583 302 | 8846 | 5037 | 111 134 | NA | NA | RADAR, DARNED, PMID25373143 |
| | SNVs | 323 138 224 | 81 432 271 | 5 618 672 | 189 322 | 13 412 332 | 486 302 | dbSNP, PMID21079745 |
| | GWAS SNPs | 278 473 | NA | NA | NA | NA | NA | GWASdb2 |
| | Clinically important SNPs | 131 919 | NA | NA | NA | NA | NA | ClinVar |
| | Cancer TCGA whole-exome SNVs | 3 427 854 | NA | NA | NA | NA | NA | PMID29596782 |
| | Cancer TCGA whole-genome SNVs | 4 745 891 | NA | NA | NA | NA | NA | PMID23945592 |
| | Cancer COSMIC SNVs | 2 371 219 | NA | NA | NA | NA | NA | COSMIC |
| Translatome | Condition | 17 cell/tissue types | 6 cell/tissue types | 5 stages/cell types | 3 cell types | 8 conditions | 6 conditions | GEO database |
| | Annotated ORF | 65 319 | 38 686 | 30 357 | 20 108 | 26 916 | 6498 | ORFs annotated by reference |
| | Truncated ORF | 2 922 855 | 2 072 685 | 1 993 300 | 556 378 | 749 484 | 193 126 | ORFs with the same stop codon as aORF but a downstream start codon |
| | Extended ORF | 102 866 | 3128 | 29 440 | 7840 | 11 673 | 0 | ORFs with the same stop codon as aORF but an upstream start codon |
| | Internal overlapped ORF | 2 828 307 | 1 973 410 | 1 490 433 | 704 924 | 983 723 | 193 982 | Off-frame ORFs that overlap with aORF |
| | uORF | 413 508 | 226 589 | 273 310 | 14 460 | 48 784 | 0 | ORFs located upstream of aORF |
| | dORF | 3 266 469 | 1 921 443 | 551 813 | 51 642 | 141 319 | 0 | ORFs located downstream of aORF |
| | Unannotated ORF | 5 815 149 | 3 836 094 | 155 945 | 953 269 | 1 210 658 | 11 461 | ORFs with no annotation |
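
The ORF counts in the 'Translatome' rows can be cross-checked against the ∼36 million ORFs quoted in the abstract. The following is a minimal sketch of that arithmetic, not part of POSTAR2 itself: the counts are copied by hand from the table, and the names `SPECIES`, `ORF_ROWS`, `parse_count` and `orf_totals` are illustrative assumptions.

```python
# Cross-check of the ORF totals; all names here are hypothetical helpers.
SPECIES = ["human", "mouse", "fly", "worm", "Arabidopsis", "yeast"]

# Counts copied from the 'Translatome' rows of the table, in SPECIES order.
ORF_ROWS = {
    "Annotated ORF": "65 319 | 38 686 | 30 357 | 20 108 | 26 916 | 6498",
    "Truncated ORF": "2 922 855 | 2 072 685 | 1 993 300 | 556 378 | 749 484 | 193 126",
    "Extended ORF": "102 866 | 3128 | 29 440 | 7840 | 11 673 | 0",
    "Internal overlapped ORF": "2 828 307 | 1 973 410 | 1 490 433 | 704 924 | 983 723 | 193 982",
    "uORF": "413 508 | 226 589 | 273 310 | 14 460 | 48 784 | 0",
    "dORF": "3 266 469 | 1 921 443 | 551 813 | 51 642 | 141 319 | 0",
    "Unannotated ORF": "5 815 149 | 3 836 094 | 155 945 | 953 269 | 1 210 658 | 11 461",
}


def parse_count(cell: str) -> int:
    """Turn a space-grouped count such as '2 922 855' into an integer."""
    return int("".join(cell.split()))


def orf_totals(rows: dict) -> dict:
    """Sum the ORF counts of every category for each species."""
    totals = dict.fromkeys(SPECIES, 0)
    for cells in rows.values():
        for species, cell in zip(SPECIES, cells.split("|")):
            totals[species] += parse_count(cell)
    return totals


totals = orf_totals(ORF_ROWS)
grand_total = sum(totals.values())
for species, count in totals.items():
    print(f"{species}: {count:,}")
print(f"all species: {grand_total:,}")
```

Summed this way, the seven categories give about 35.9 million ORFs across the six species, consistent with the rounded figure in the abstract.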
Figure 2. Statistics of the POSTAR2 database. (A) Number of RBPs in human, mouse, worm, fly, Arabidopsis and yeast. (B) Distribution of human RBP-binding sites across chromosomes. HNRNPC, HNRNPA1 and U2AF2 have the largest numbers of binding sites among the 171 human RBPs. (C) Genomic distribution of RBP-binding sites identified using Piranha in the six species. (D) Summary of CLIP-seq and Ribo-seq datasets. (E) Diagram of the ORF categories. (i) Annotated ORFs (aORFs): ORFs present in the reference annotation, colored black in the diagram. (ii and iii) Truncated and extended ORFs: ORFs that share a stop codon with an aORF but have a different translation initiation site. (iv) Internal ORFs: ORFs that are located within or partially overlap an aORF. (v and vi) Upstream and downstream ORFs: ORFs located upstream or downstream of an aORF. (vii) Unannotated ORFs: ORFs defined from transcripts without any reference annotation. (F) Number of ORFs in each category across the six species.
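
The ORF categories in panel (E) can be expressed as a small decision rule. The sketch below is only an illustration of those definitions and is not the pipeline used to build POSTAR2. It assumes transcript coordinates (0-based, half-open, 5' to 3') and a single annotated ORF per transcript; the `Orf` class and `classify_orf` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Orf:
    """An ORF on a transcript, 5'->3' (assumed coordinate convention)."""
    start: int  # first base of the start codon (0-based)
    end: int    # position just past the stop codon (half-open)


def classify_orf(orf: Orf, annotated: Optional[Orf]) -> str:
    """Assign an ORF to one of the seven categories relative to the aORF."""
    if annotated is None:
        # Transcript without any reference annotation.
        return "unannotated"
    if orf.end == annotated.end:
        # Same stop codon: only the translation initiation site can differ.
        if orf.start == annotated.start:
            return "annotated"
        return "extended" if orf.start < annotated.start else "truncated"
    if orf.end <= annotated.start:
        return "uORF"   # entirely upstream of the aORF
    if orf.start >= annotated.end:
        return "dORF"   # entirely downstream of the aORF
    # Overlaps the aORF but ends at a different stop codon; such an ORF
    # cannot be in frame with the aORF, so it is treated as internal.
    return "internal"


aorf = Orf(start=100, end=400)
for name, candidate in {
    "same as aORF": Orf(100, 400),
    "later start": Orf(160, 400),
    "earlier start": Orf(40, 400),
    "off-frame overlap": Orf(151, 250),
    "5' leader": Orf(10, 70),
    "3' trailer": Orf(420, 480),
}.items():
    print(name, "->", classify_orf(candidate, aorf))
```

Checking the shared stop codon first keeps the truncated and extended cases separate from internal overlaps, which mirrors the order in which the caption introduces the categories.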
Figure 3. Integrative view of the translation activity of a target gene (ADAM17) and its post-transcriptional regulation events. (A) In the ‘Translatome’ module, all ORFs in ADAM17 are summarized by category (i). Users can inspect each ORF by clicking on its name (ii). For ADAM17, the estimated translation efficiency (iii) and the signal track (iv) suggest translational up-regulation in tumor samples compared with normal samples. (B) In the ‘RBP’ module, a search for ADAM17 returns the interaction network of the ADAM17 gene and various RBPs (v). The number of RBPs binding along the transcript (vi) and the genomic context of the binding sites (vii) can be visualized and searched. Finally, the impact of SNVs in RBP-binding sites in both the TCGA (viii) and COSMIC (ix) datasets further supports the association between ADAM17 and tumorigenesis.