EuRBPDB: a comprehensive resource for annotation, functional and oncological investigation of eukaryotic RNA binding proteins (RBPs).

Jian-You Liao1,2, Bing Yang1,2, Yu-Chan Zhang3, Xiao-Juan Wang1,2, Yushan Ye1,4, Jing-Wen Peng1,2, Zhi-Zhi Yang1,2, Jie-Hua He1,2, Yin Zhang1,2, KaiShun Hu1,2, De-Chen Lin5, Dong Yin1,2.   

Abstract

RNA binding proteins (RBPs) are a large protein family that plays important roles at almost all levels of gene regulation by interacting with RNAs, and that contributes to numerous biological processes. However, a complete list of eukaryotic RBPs, including those of human, is still unavailable. Here, we systematically identified RBPs in 162 eukaryotic species based on both computational analysis of RNA binding domains (RBDs) and large-scale RNA binding proteomic data, and established a comprehensive eukaryotic RBP database, EuRBPDB (http://EuRBPDB.syshospital.org). We identified a total of 311 571 RBPs with RBDs (corresponding to 6368 ortholog groups) and 3651 non-canonical RBPs without known RBDs. EuRBPDB provides detailed annotations for each RBP, including basic information and functional annotation. Moreover, we systematically investigated RBPs in the context of cancer biology based on the published literature, PPI networks and large-scale omics data. To facilitate the exploration of the clinical relevance of RBPs, we additionally designed a cancer web interface to systematically and interactively display the biological features of RBPs in various types of cancers. EuRBPDB has a user-friendly web interface with browse and search functions, as well as a data download function. We expect that EuRBPDB will be a widely used resource and platform for both the RNA biology and cancer biology communities.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2020        PMID: 31598693      PMCID: PMC6943034          DOI: 10.1093/nar/gkz823

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

RNA binding proteins (RBPs) regulate the metabolism, transport, translation and function of both coding and non-coding RNAs through direct RNA-protein interaction (1). RBPs ensure the smooth flow of genetic information from DNA to RNA, and ultimately to proteins, making them essential for all physiological and pathological processes (1). Numerous diseases, including cancer, metabolic disorders and neuropathies, are caused by aberrant expression or function of RBPs (2–4). Comprehensive identification and annotation of all RBPs are primary and crucial steps toward characterizing their functions. To date, several RBP databases exist for a few eukaryotes, but these databases collect only a small number of well-characterized RBPs from one or a few species. For example, RBPDB is a database focusing on the collection of experimentally validated RBPs and RNA binding domains (RBDs), and it contains only 1171 RBPs from human, mouse, fly and worm (5). ATtRACT is a manually curated database that compiles information for only 370 well-characterized RBPs from 39 species (6). Clearly, the RBP repertoire collected by these existing databases is far from complete for any species, human included. RBPs bind to RNA via structurally well-defined RBDs, such as the DEAD box helicase domain and the RNA recognition motif (RRM) (7,8). Here, we annotated proteins containing an RBD as canonical RBPs. Additionally, many studies have suggested the existence of complex protein-RNA interactions that do not require canonical RBDs (9,10) and instead occur through other structures such as intrinsically disordered regions (IDRs) (11). It is thus challenging to identify non-canonical RBPs without known RBDs in a high-throughput and unbiased manner.
Recent advances in RNA binding proteome (RBPome) technology have significantly facilitated the large-scale identification of non-canonical RBPs (12–18), including the capture of the polyadenylated RNA interactome (11,16–21), click chemistry-based capture of the RNA interactome (13), and orthogonal organic phase separation (OOPS) of RBPs (14,15,19). These methods crosslink RBPs to RNA using UV, then apply different strategies to extract total RBPs from cells or tissues. The purified total RBPs are then analyzed by mass spectrometry (MS) to define the RBPome. These RBPome technologies have been applied to many eukaryotes, including human (11,15,16,19–21), mouse (12) and fly (18), and have identified a large number of novel canonical and non-canonical RBPs. It should be noted that, as experimental methods, none of the RBPome technologies is capable of capturing the complete set of RBPs, owing to the limitations of total RBP purification strategies and MS technology (12–19). Moreover, most present RBPome studies applied stringent filtering to control false positives, at the cost of a high false-negative rate and low sensitivity. Given the rapid progress of the RNA biology field (1), there is a great need for a comprehensive eukaryotic RBP database to explore the annotation, expression and function of RBPs. To address this, we collected a full list of RBDs from Pfam (22), together with published RBPome datasets from six eukaryotes (human, mouse, zebrafish, yeast, fly and worm) (Supplementary Table S1). In parallel, we predicted RBPs based on RBDs using HMMER (23) from the genomes of 162 eukaryotes. Upon integration, we established what is currently the most comprehensive database of eukaryotic RBPs, EuRBPDB (Figure 1). EuRBPDB contains a total of 315 222 RBPs, with detailed annotations for each RBP.
Moreover, given the crucial role of RBPs in cancer biology, and to help users explore the clinical relevance of RBPs, we separately built a Cancer web interface to display integrated cancer-associated omics datasets. The database has a user-friendly interface to interactively display and search the detailed annotations. EuRBPDB should therefore facilitate the investigation and understanding of RNA biology.
Figure 1.

A system-level overview of the EuRBPDB core framework. A total of 315 222 RBPs, including 311 571 canonical RBPs and 3651 non-canonical RBPs, were identified by combining computational RBP searching with RBPome profiling. All RBPs were annotated with information retrieved from public databases such as NCBI, Ensembl, STRING, KEGG and GeneCards. Cancer-relevant RBPs were identified by literature mining and systematic TCGA data analysis. All results generated by EuRBPDB were deposited in MySQL relational databases and are displayed in the web pages. All species photos were downloaded from the Ensembl database (24).

MATERIALS AND METHODS

Identification and annotation of RBPs

All protein sequences of 162 eukaryotes were downloaded from the Ensembl database (24) (release 96, http://www.ensembl.org/). Proteins were annotated as canonical RBPs if they contain one or more domains known to directly interact with RNA. RBPs were identified by searching proteins for sequence homologs of known RBDs using probabilistic models known as profile hidden Markov models [4]. The present RBD list was curated from the comprehensive RBD list established by Gerstberger et al. (25). After careful examination, we found that eight RBDs (RRM6, KH_3, MRL1, Ribosomal_S3_N, Lactamase_B2, tRNA_synt_2b, RnaseH, tRNA_anti) have been removed from Pfam, and thus they were eliminated from our list. Finally, we obtained a total of 791 RBDs (available for download from http://EuRBPDB.syshospital.org/data/download/791_RBDs.PFam.gz). We extracted RBD HMM profiles from the Protein families (Pfam) database (Pfam HMM profiles, release v32) (22), and applied the hmmsearch program of the HMMER (v3.2.1) package (23) to search all of the eukaryotic protein sequences against the RBD HMM profiles to identify RBPs. Proteins with an E-value less than 0.0001 were considered bona fide canonical RBPs. In total, we identified 311 571 canonical RBPs from 162 eukaryotic species. In parallel, we manually collected large-scale RBPome datasets of human, mouse, zebrafish, yeast, fly and worm from 21 published works (Supplementary Table S1). Human non-canonical RBPs were required to be detected in at least two RBPome datasets. For other species, RBPs detected by any RBPome dataset were included in EuRBPDB. As a result, we obtained 3651 non-canonical RBPs from six species. In total, EuRBPDB collected 315 222 RBPs, representing the largest eukaryotic RBP database currently available.
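The two inclusion rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used to build EuRBPDB. It assumes the per-target table written by `hmmsearch --tblout` and assumes that the 0.0001 cutoff is applied to the full-sequence E-value (the text does not say whether the full-sequence or the per-domain E-value was used). The function names and the example rows are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-4  # "E-value less than 0.0001" from the text


def canonical_rbps(tblout_lines, cutoff=E_VALUE_CUTOFF):
    """Return {protein_id: set of RBD profile names} for hits below the cutoff.

    Assumes the table written by `hmmsearch --tblout`: whitespace-separated
    fields with the target (protein) name first, the query (profile) name
    third and the full-sequence E-value fifth. Lines starting with '#' are
    comments and are skipped.
    """
    hits = defaultdict(set)
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        protein, profile, evalue = fields[0], fields[2], float(fields[4])
        if evalue < cutoff:
            hits[protein].add(profile)
    return dict(hits)


def keep_non_canonical(species, n_rbpome_datasets):
    """Inclusion rule from the text: human needs >= 2 datasets, others >= 1."""
    required = 2 if species == "human" else 1
    return n_rbpome_datasets >= required


# Made-up example rows (protein IDs, versions and scores are illustrative).
EXAMPLE = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "protA  -  RRM_1  PF00076.22  1.2e-20  70.1  0.0",
    "protA  -  KH_1   PF00013.29  3.0e-08  31.5  0.1",
    "protB  -  RRM_1  PF00076.22  0.02     8.3   0.0",
]
```

With the example rows, `canonical_rbps(EXAMPLE)` keeps `protA` with both of its domains and drops `protB`, whose only hit is above the cutoff.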
EuRBPDB records four lines of evidence of RNA binding for each RBP, namely (i) literature supporting RNA-binding capacity, (ii) RNA-binding domain, (iii) RBPome and (iv) RNA-binding sites detected by CLIP-Seq. We graded RBPs supported by only one of the four lines of evidence as ‘putative’ in the Description section on the Basic Information subpage. The basic information, GO and phenotype annotation of RBPs were obtained from the NCBI, GeneCards and Ensembl databases. Protein–protein interaction (PPI) information was parsed from the STRING database (26). Pathway annotation was obtained from the KEGG database (27). Expression data were obtained from GTEx (28) and SRA.
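The evidence grading can be illustrated with a short sketch. Only the ‘putative’ label comes from the text; the label returned for RBPs with two or more lines of evidence ("supported") and the dictionary keys are hypothetical names chosen for this example.

```python
# The four lines of evidence listed in the text (key names are assumptions).
EVIDENCE_TYPES = ("literature", "rbd", "rbpome", "clip_seq")


def grade_rbp(evidence):
    """Grade an RBP from a {evidence_type: bool} mapping.

    Returns 'putative' when exactly one line of evidence is present, and the
    hypothetical label 'supported' when two or more are present.
    """
    present = [name for name in EVIDENCE_TYPES if evidence.get(name, False)]
    if not present:
        raise ValueError("an RBP entry needs at least one line of evidence")
    return "putative" if len(present) == 1 else "supported"
```

For example, a protein found only by RBD matching would be graded `grade_rbp({"rbd": True})`, that is, ‘putative’.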

Classification of eukaryotic RBP family

We characterized and classified canonical RBPs by their sequence-specific RBDs. An RBP family was named after its RBD if its RBPs contain only one type of RBD. If an RBP contains multiple types of RBDs, it was assigned to each of the corresponding families. All non-canonical RBPs were classified into a single non-canonical RBP family. In total, we obtained 686 RBP families.
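A minimal sketch of this classification rule, assuming each canonical RBP is already mapped to the set of RBD types it contains. The function name and the example identifiers are hypothetical; the family label `non-canonical_RBP` follows the name used in the Discussion.

```python
from collections import defaultdict

NON_CANONICAL_FAMILY = "non-canonical_RBP"


def assign_families(protein_to_rbds, non_canonical=()):
    """Group RBPs into families named after their RBDs.

    A protein with several RBD types is placed in every matching family;
    all non-canonical RBPs share one family.
    """
    families = defaultdict(set)
    for protein, rbds in protein_to_rbds.items():
        for rbd in rbds:
            families[rbd].add(protein)
    for protein in non_canonical:
        families[NON_CANONICAL_FAMILY].add(protein)
    return dict(families)


def order_by_size(families):
    """Family names sorted by size, largest first (ties broken by name)."""
    return sorted(families, key=lambda name: (-len(families[name]), name))
```

Sorting by size mirrors the descending order in which the ‘Family’ page is described as listing families.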

Orthologs and paralogs

The reciprocal best hit (RBH) method (29) was used to predict putative orthologs of RBPs among different species. We performed an all-against-all BLASTP (v2.7.1+) search between the proteins of two genomes with strict cutoffs (E-value ≤ 1e–6, coverage ≥ 50%, identity ≥ 30%) and annotated the reciprocal best hit pairs as orthologs. Paralogs were predicted by the BLAST score ratio (BSR) (30) approach. A BLASTP search was conducted within each genome with the same parameters as in the ortholog search. The BSR value cutoff was set to 0.4.
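The following sketch shows one way the two procedures could be implemented on already-parsed BLASTP hits. Several details are assumptions not stated in the text: the best hit is chosen by highest bit score, the BSR is computed as the bit score of a pair divided by the query's self-hit bit score, and pairs with BSR at or above 0.4 are kept. The `Hit` record and function names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    coverage: float  # percent of the query covered
    identity: float  # percent identity
    bitscore: float


def passes_cutoffs(hit):
    """Cutoffs quoted in the text: E <= 1e-6, coverage >= 50%, identity >= 30%."""
    return hit.evalue <= 1e-6 and hit.coverage >= 50 and hit.identity >= 30


def best_hits(hits):
    """Map each query to its best-scoring subject among hits passing the cutoffs."""
    best = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Ortholog pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def bsr_paralogs(within_genome_hits, cutoff=0.4):
    """Paralog pairs whose BSR (pair score / query self score) reaches the cutoff."""
    self_score = {h.query: h.bitscore for h in within_genome_hits
                  if h.query == h.subject}
    pairs = set()
    for hit in within_genome_hits:
        if hit.query == hit.subject or not passes_cutoffs(hit):
            continue
        reference = self_score.get(hit.query)
        if reference and hit.bitscore / reference >= cutoff:
            pairs.add(tuple(sorted((hit.query, hit.subject))))
    return sorted(pairs)
```

Normalizing by the self-hit score makes the ratio comparable across proteins of different lengths, which is the usual motivation for BSR over a raw score threshold.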

Differential expression, copy number variation (CNV), mutation and survival analysis of RBPs

RNA-Seq, whole-exome sequencing and clinical data were retrieved from the TCGA database using the R/Bioconductor package TCGAbiolinks (v2.8.4) (31). Differential expression analysis was performed using the R package edgeR (v3.22.5) (32) [false discovery rate (FDR) ≤ 1e–5, log2 fold change (log2FC) ≥ 1]. Kaplan–Meier survival analysis was performed with the R package survival (v2.43-3). Significantly amplified and deleted genomic regions in cancer samples were downloaded from the Broad GDAC Firehose website (https://gdac.broadinstitute.org/).
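To make the thresholds and the survival estimate concrete, here is a small pure-Python sketch. It does not reproduce edgeR or the R survival package. It assumes the log2FC cutoff is applied to the absolute value (so that both up- and down-regulated genes are reported), which the text does not state explicitly, and it implements the standard Kaplan–Meier product-limit estimator. Function names and input layout are hypothetical.

```python
def classify_de(results, fdr_cutoff=1e-5, lfc_cutoff=1.0):
    """Label genes as 'up' or 'down' from {gene: (log2fc, fdr)}.

    Assumption: the fold-change cutoff is applied to |log2FC|.
    """
    calls = {}
    for gene, (log2fc, fdr) in results.items():
        if fdr <= fdr_cutoff and abs(log2fc) >= lfc_cutoff:
            calls[gene] = "up" if log2fc > 0 else "down"
    return calls


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `events[i]` is 1 if the event was observed at `times[i]`, 0 if censored.
    Returns [(time, survival_probability)] at each distinct event time.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = removed = 0
        # Gather all subjects sharing this time point.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Censored subjects leave the risk set without lowering the curve, which is why the estimate only steps down at observed event times.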

Cellular effects of drugs to RBP expression

Two L1000 assay level-5 datasets (GSE92742 and GSE70138) (33) generated by the Library of Integrated Network-based Cellular Signatures (LINCS) project were downloaded from GEO. These datasets contain over 1 600 000 subdatasets measuring the effects of 30 744 drugs on the RNA profiles of 44 cell lines. The L1000 assay datasets were parsed and displayed with the cmapR (v1.0.1) and ggplot2 (v3.1.0) R packages, as suggested by the LINCS project. Expression of RBPs is displayed as a z-score.

RNA binding sites of RBPs

A total of 227 eCLIP isogenic replicated datasets generated from the K562 (120 RBPs) and HepG2 (103 RBPs) cell lines and from human adrenal gland tissue (two RBPs) were retrieved from the ENCODE database (https://www.encodeproject.org/). Peak and BAM files of each dataset were downloaded. We used intersectBed from the bedtools package (v2.27.1) (34) to annotate each peak, and coverageBed from bedtools to retrieve the RPM value of each peak.
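The RPM (reads per million) value of a peak is a simple normalization of the read count returned by a coverage tool. The sketch below assumes the denominator is the total number of mapped reads in the library; the text does not specify which read total was used, and the function name is hypothetical.

```python
def reads_per_million(reads_in_peak, total_mapped_reads):
    """Normalize a peak's read count to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peak * 1_000_000 / total_mapped_reads
```

For instance, a peak covered by 50 reads in a library of 20 million mapped reads has an RPM of 2.5.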

Literature analysis of RBP

Literature mining was conducted in GenCLiP 3 (http://ci.smu.edu.cn/genclip3/). In brief, the Entrez IDs of all RBPs were submitted to GenCLiP 3. The keywords of the function model of GenCLiP 3 were set to ‘cancer or tumor’ to search for cancer-associated literature, and to ‘RNA binding or RNA-binding’ to search for literature on RNA binding. GenCLiP 3 was run in GeneRIF mode to search for cancer-associated literature, and in MEDLINE mode to search for literature on RNA binding. The search returns the PubMed IDs of all publications that study the RBPs in cancer or their RNA-binding capacity. Information on all publications was retrieved from PubMed based on these PubMed IDs. RBPs reported in at least three cancer-relevant studies were considered to be cancer-associated.
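The final filtering step can be sketched as a count of distinct PubMed IDs per RBP. This is only an illustration of the thresholding, not of the GenCLiP 3 search itself; the function name, the input layout and the example PubMed IDs are hypothetical, and the minimum of three studies follows the threshold given in the Cancer web interface section.

```python
def cancer_associated(rbp_to_pmids, min_studies=3):
    """Return RBPs supported by at least `min_studies` distinct PubMed IDs."""
    return sorted(
        rbp for rbp, pmids in rbp_to_pmids.items()
        if len(set(pmids)) >= min_studies
    )
```

Counting distinct IDs avoids inflating the tally when the same publication is returned more than once for a gene.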

DATABASE CONTENT AND WEB INTERFACE

The web-based exploration of RBPs

EuRBPDB provides genome-wide identification of RBPs in a large number of eukaryotic species based on HMMER search results combined with analyses of RBPome datasets. In total, 315 222 RBPs, comprising 311 571 canonical RBPs (corresponding to 6368 ortholog groups) and 3651 non-canonical RBPs, were identified in 162 eukaryotic species. Building on the systematic annotation of these RBPs, we designed a user-friendly web interface that lets users query the database conveniently and interactively. Users can either browse the entire RBP list of any of the 162 eukaryotes collected in the database, or search for any RBP in any eukaryote of interest. EuRBPDB provides two ways to browse the data: by species, or by family as defined by RBDs. On the ‘Species’ page, the 162 species are classified into 12 categories according to Ensembl taxonomy. To browse the RBP list of a species, users click the image of the species of interest and retrieve the detailed RBP information through the following steps: families → family gene list → single gene annotation. On the ‘Family’ page, EuRBPDB lists all 686 RBP families from the 162 eukaryotes, ordered by family size in descending order. Clicking a family name lists all RBPs in that family, grouped by species. Users can then obtain the detailed information of an RBP through the following steps: species → gene list → single gene annotation. Users can search for a specific RBP of interest using the quick search box at the top right corner of the navigation bar on any page; the search returns all RBPs in any species that match the search criteria. To browse the detailed information of a specific RBP, users can specify both the species and the RBP name/ID on the ‘Search’ page. Both the search and browse functions direct users to the detailed information page of the selected RBP.
This page comprises two subpages, namely the ‘Basic Information’ subpage and the ‘Cancer Related Information’ subpage (currently only for human RBPs). Both subpages consist of a number of information sections constructed from data collected from other published databases. New sections can readily be added to these subpages, which makes it easy and convenient to update EuRBPDB regularly. In the Basic Information subpage, EuRBPDB provides basic information including gene structure (Gene Model section), evidence for RNA binding (RBDs, RBPome, RPI and Literatures sections), expression (Expression section), and functional annotation (PPI, Pathway and Gene Ontology sections etc.). The ‘Cancer Related Information’ subpage is introduced in the following sections.

Cancer web interface

RBPs contribute extensively and significantly to numerous processes in cancer biology. To facilitate RBP research in cancer, EuRBPDB provides cancer-associated annotation of RBPs in the Cancer web interface. Through systematic literature mining using GenCLiP 3 (http://ci.smu.edu.cn/genclip3/), we found that a total of 727 RBPs have been reported to be associated with human cancers (reported by at least three publications). Among them, 144 RBPs were frequently investigated (reported by >20 publications). Moreover, we conducted differential expression, somatic mutation, CNV and survival analyses based on TCGA data to comprehensively reveal the alterations of RBPs in human cancers. As a result, we identified 1361 RBPs showing aberrant expression in at least one cancer type, 2900 RBPs harboring nonsense and/or missense mutations (1761 of them mutated in RBD regions), 2851 RBPs having genomic deletions or amplifications, and 2897 RBPs exhibiting significant survival correlation. Mutational analysis of RBDs showed that certain cancer types, such as Pheochromocytoma and Paraganglioma (PCPG) and pancreatic adenocarcinoma (PAAD), have a higher rate of mutations targeting RBD regions than others (Supplementary Figure S1A). This result is congruent with the findings that the expression and functions of RBPs have cell-type specificity (12,18). On the other hand, certain RBDs, such as the MMR_HSR1 and RRM_1 domains, have higher mutation rates across human cancers (Supplementary Figure S1B). Notably, mutations in the RRM_1 domain of RBM10 have been suggested to play an important role in the development and progression of lung adenocarcinomas (35–37), highlighting that our analysis is capable of identifying functional mutations in cancer-associated RBPs. Together, these results call for further investigation of the functional significance of candidate RBPs and RBDs in cancer biology. It is conceivable that a larger number of mutated genes in a given RBP PPI network will result in a higher degree of network dysregulation.
A bar plot showing the number of aberrant RBP PPI networks (defined as networks in which >30% of genes are mutated) in each cancer is provided in the Cancer interface. To help users explore the number of mutated genes in each RBP PPI network in each cancer type, we added a bar plot under the PPI network figure in the Basic Information subpage of each RBP. Most RBPs with cancer-associated alterations have hitherto not been reported to be associated with any cancer, providing a valuable and novel resource for cancer researchers. EuRBPDB provides an overview of the cancer-associated RBPs on the ‘Cancer’ page, as well as the list of published and novel cancer-associated RBPs deposited in EuRBPDB. By clicking the ‘Details’ link of an RBP, users are redirected to the detailed information page of that RBP with the Cancer Related Information subpage. There are six sections in this subpage, showing the literature investigating the selected RBP (Literatures), a differential expression boxplot (Differential Expression), mutations in the RBP (mutation), copy number variation (CNV), survival analysis (survival), and the expression changes across 44 different cell lines under treatment with ∼2000 drugs (33).
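The definition of an aberrant PPI network (more than 30% of its genes mutated) can be written down directly. The sketch below assumes a network is given as a collection of gene identifiers and that mutation calls for one cancer type are given as a set; both function names and the input layout are hypothetical.

```python
def mutated_fraction(network_genes, mutated_genes):
    """Fraction of the network's genes that carry a mutation."""
    genes = set(network_genes)
    if not genes:
        raise ValueError("the network must contain at least one gene")
    return len(genes & set(mutated_genes)) / len(genes)


def is_aberrant(network_genes, mutated_genes, threshold=0.30):
    """True when strictly more than `threshold` of the network is mutated."""
    return mutated_fraction(network_genes, mutated_genes) > threshold


def count_aberrant_networks(networks, mutated_genes, threshold=0.30):
    """Number of aberrant networks in {rbp: network_genes} for one cancer type."""
    return sum(
        is_aberrant(genes, mutated_genes, threshold)
        for genes in networks.values()
    )
```

Applying `count_aberrant_networks` once per cancer type would give the heights of a bar plot like the one described above.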

RBPredictor web-server for the annotation of eukaryotic RBPs

A web-based tool, RBPredictor, was further developed to help users determine whether a protein of interest (from any eukaryote) is a putative canonical RBP. The prediction is based on the RBD set used in this study, and we run the hmmsearch program of the HMMER (v3.2.1) package to determine whether a submitted protein sequence is a putative RBP (25). On the ‘RBPredictor’ page, users only need to input one or more protein sequences in FASTA format, or submit a FASTA file of protein sequences. If an input protein is identified as a putative RBP, RBPredictor also lists all potential RBDs that the protein harbors.

DISCUSSION AND CONCLUSIONS

In this study, we systematically identified eukaryotic RBPs by integrating large-scale RBPome experimental data with computational RBD identification data. We identified a total of 311 571 high-confidence canonical RBPs corresponding to 6368 ortholog groups in 162 eukaryotes, and 3651 non-canonical RBPs without known RBDs in six eukaryotes (human, mouse, zebrafish, fly, worm and yeast). Currently, all non-canonical RBPs are grouped into the non-canonical_RBP protein family. The 311 571 canonical RBPs form 686 protein families. Except for some large RBP families, such as RRM_1 (33 193 RBPs, 597 ortholog groups), zf-met (20 101 RBPs, 589 ortholog groups), zf-C2H2 (16 879 RBPs, 507 ortholog groups) and MMR_HSR1 (22 986 RBPs, 467 ortholog groups), most RBP families contain a small number of ortholog groups (median: 4) (Supplementary Figure S2). In human, 2961 RBPs were identified with high confidence, including 1836 canonical RBPs and 1135 non-canonical RBPs, significantly expanding the human RBP repertoire. Moreover, most human RBPs were found to have cancer-related alterations. We systematically annotated all eukaryotic RBPs in this study, and constructed the most comprehensive eukaryotic RBP database, EuRBPDB. Through the integration of various large-scale omics data (such as CLIP-Seq, RNA-Seq and L1000 assay data), EuRBPDB provides a comprehensive platform to explore the function and cancer relevance of RBPs. Users can readily obtain basic, functional and cancer-relevant information on any RBP of interest from EuRBPDB (Figure 2). EuRBPDB also provides the RBPredictor web server, which enables users to easily and rapidly determine whether a eukaryotic protein not included in EuRBPDB is an RBP. EuRBPDB thus provides a framework to systematically identify eukaryotic RBPs based on RBD searching and RBPome data.
Figure 2.

Illustration of RBP exploration in EuRBPDB: SRSF1 as an example. Through EuRBPDB searching, (A) users can first easily obtain the detailed basic information of SRSF1. (B) Users can then obtain multiple lines of evidence for the RNA binding of SRSF1, including the canonical RNA binding domain it contains, all RBPome datasets that detected SRSF1, and the literature that reported the RNA binding capacity of SRSF1. (C) Next, users can find the conservation status of SRSF1 by looking over the list of its paralogs and orthologs. (D) Users can further explore the function of SRSF1 by systematically investigating the protein-RNA interactions/RNA binding sites of the RBP, its protein-protein interaction network, and the pathway and gene ontology (GO) information provided by EuRBPDB. (E) Finally, users can systematically retrieve the aberrations of SRSF1 across cancers, including differential expression, mutations, copy number variations and survival correlation, as well as the literature that reported the regulatory role of SRSF1 in cancer.

Identification of RBPs through RBD matching is a highly effective and accurate approach (25). However, recent RBPome studies showed that a large number of proteins without canonical RBDs also bind RNA, and many of them bind RNA through IDRs (11). Clearly, therefore, identifying RBPs merely by RBD searching is insufficient. On the other hand, RBPome methods are likewise incapable of detecting all RBPs because of (i) the context-dependent RNA binding capacity of many RBPs (1); (ii) the restricted expression pattern of RBPs, since RBPome profiling has been performed in only a few cell types; (iii) the technical limitations of total RBP purification strategies (14,15,19); and (iv) the low sensitivity of MS technology. We also found that only about half of human canonical RBPs can be detected by the different RBPome methods (Supplementary Figure S3, Supplementary Table S1).
Thus, at present, a comprehensive way to acquire a more complete RBP repertoire is to combine computational RBP searching with RBPome profiling. To verify the reliability of the RBP dataset we generated, we cross-checked it against all current RBP databases. The results showed that EuRBPDB contains the vast majority of the RBPs (ranging from 90.1% to 100%) collected by other databases across different species (Figure 3), supporting the accuracy and consistency of our work. Furthermore, we used GO annotation to evaluate the robustness and accuracy of our human RBP list. We found that 95.3% of canonical RBPs and 73.8% of non-canonical RBPs were annotated with RNA-related GO terms, such as ‘RNA-binding’, ‘RNA modification’ and ‘endoribonuclease activity’. Together, these results highlight that our RBP identification approach has high accuracy and robust performance.
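The cross-check against other databases amounts to computing, for each reference set, the share of its RBPs that also appear in EuRBPDB. A minimal sketch follows, assuming both resources are available as sets of comparable gene identifiers; the function names are hypothetical, and the only real numbers used are counts quoted in the Figure 3 legend.

```python
def percent(shared, total):
    """Percentage of `total` represented by `shared`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * shared / total


def coverage(reference_rbps, eurbpdb_rbps):
    """Return (shared, total, percent) for one reference database."""
    reference = set(reference_rbps)
    shared = len(reference & set(eurbpdb_rbps))
    return shared, len(reference), percent(shared, len(reference))
```

As a check on the arithmetic, `round(percent(1389, 1542), 1)` gives 90.1, matching the figure reported for the Gerstberger et al. human RBP set.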
Figure 3.

EuRBPDB contains most of the RBPs deposited in other databases. 100% (157/157), 98.3% (168/171), 94.7% (36/38), 93.0% (385/414), 96.9% (154/159), 97.0% (65/67) and 90.1% (1389/1542) of human RBPs from the ENCODE database, POSTAR2, starBase v2.0, RBPDB, ATtRACT, SpliceAid-F and the Gerstberger et al. RBP set (25), respectively, are contained in EuRBPDB. 92.3% (36/39), 100% (14/14), 91.7% (373/407) and 92.6% (25/27) of mouse RBPs from POSTAR2, starBase v2.0, RBPDB and ATtRACT, respectively, are included in EuRBPDB. 100% (3/3), 92.2% (226/245) and 96.2% (51/53) of Drosophila melanogaster RBPs from POSTAR2, RBPDB and ATtRACT, respectively, are included in EuRBPDB. 100% (5/5), 100% (2/2), 90.4% (208/230) and 100% (20/20) of Caenorhabditis elegans RBPs from POSTAR2, starBase v2.0, RBPDB and ATtRACT, respectively, are included in EuRBPDB.

Many databases have been established to aid research in RNA biology (5,6,38). However, no comprehensive RBP database is currently available for all species. All existing RBP databases, such as RBPDB (5), ATtRACT (6), SpliceAid-F (38), POSTAR2 (39) and starBase (40), focus on the collection and integration of the structure, RBDs, RBP binding sites or disease correlation of a small number of well-characterized RBPs in a limited range of eukaryotes. Compared with these RBP databases, EuRBPDB provides the largest eukaryotic RBP repertoire (315 222 RBPs, with the canonical RBPs forming 6368 ortholog groups), the most comprehensive functional and cancer-associated annotation, and an intuitive and easy-to-use web interface. Therefore, EuRBPDB provides a powerful platform to decode RBP function and regulatory mechanisms.

FUTURE DIRECTIONS

EuRBPDB is a comprehensive eukaryotic RBP database, characterizing the RBPs of 162 eukaryotes genome-wide. With the ever-increasing amount of RBPome and eukaryotic genome data, we will continue to update and maintain the RBP repertoire and annotation regularly. We will also integrate additional omics datasets (e.g. CLIP-Seq, RNA-Seq) from public databases such as the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA) to further improve our understanding of the function and regulatory mechanisms of RBPs.

DATA AVAILABILITY

EuRBPDB database is freely available at http://EuRBPDB.syshospital.org.
  40 in total

1.  The KEGG resource for deciphering the genome.

Authors:  Minoru Kanehisa; Susumu Goto; Shuichi Kawashima; Yasushi Okuno; Masahiro Hattori
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

Review 2.  The Spliceosome: The Ultimate RNA Chaperone and Sculptor.

Authors:  Panagiotis Papasaikas; Juan Valcárcel
Journal:  Trends Biochem Sci       Date:  2015-12-09       Impact factor: 13.807

Review 3.  RNA recognition motifs: boring? Not quite.

Authors:  Antoine Cléry; Markus Blatter; Frédéric H-T Allain
Journal:  Curr Opin Struct Biol       Date:  2008-06       Impact factor: 6.809

4.  Capturing the interactome of newly transcribed RNA.

Authors:  Xichen Bao; Xiangpeng Guo; Menghui Yin; Muqddas Tariq; Yiwei Lai; Shahzina Kanwal; Jiajian Zhou; Na Li; Yuan Lv; Carlos Pulido-Quetglas; Xiwei Wang; Lu Ji; Muhammad J Khan; Xihua Zhu; Zhiwei Luo; Changwei Shao; Do-Hwan Lim; Xiao Liu; Nan Li; Wei Wang; Minghui He; Yu-Lin Liu; Carl Ward; Tong Wang; Gong Zhang; Dongye Wang; Jianhua Yang; Yiwen Chen; Chaolin Zhang; Ralf Jauch; Yun-Gui Yang; Yangming Wang; Baoming Qin; Minna-Liisa Anko; Andrew P Hutchins; Hao Sun; Huating Wang; Xiang-Dong Fu; Biliang Zhang; Miguel A Esteban
Journal:  Nat Methods       Date:  2018-02-12       Impact factor: 28.547

5.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

6.  Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing.

Authors:  Marcin Imielinski; Alice H Berger; Peter S Hammerman; Bryan Hernandez; Trevor J Pugh; Eran Hodis; Jeonghee Cho; James Suh; Marzia Capelletti; Andrey Sivachenko; Carrie Sougnez; Daniel Auclair; Michael S Lawrence; Petar Stojanov; Kristian Cibulskis; Kyusam Choi; Luc de Waal; Tanaz Sharifnia; Angela Brooks; Heidi Greulich; Shantanu Banerji; Thomas Zander; Danila Seidel; Frauke Leenders; Sascha Ansén; Corinna Ludwig; Walburga Engel-Riedel; Erich Stoelben; Jürgen Wolf; Chandra Goparju; Kristin Thompson; Wendy Winckler; David Kwiatkowski; Bruce E Johnson; Pasi A Jänne; Vincent A Miller; William Pao; William D Travis; Harvey I Pass; Stacey B Gabriel; Eric S Lander; Roman K Thomas; Levi A Garraway; Gad Getz; Matthew Meyerson
Journal:  Cell       Date:  2012-09-14       Impact factor: 41.582

7.  The RNA-binding protein repertoire of embryonic stem cells.

Authors:  S Chul Kwon; Hyerim Yi; Katrin Eichelbaum; Sophia Föhr; Bernd Fischer; Kwon Tae You; Alfredo Castello; Jeroen Krijgsveld; Matthias W Hentze; V Narry Kim
Journal:  Nat Struct Mol Biol       Date:  2013-08-04       Impact factor: 15.369

8.  starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data.

Authors:  Jun-Hao Li; Shun Liu; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

9.  Large-scale analysis of genome and transcriptome alterations in multiple tumors unveils novel cancer-relevant splicing networks.

Authors:  Endre Sebestyén; Babita Singh; Belén Miñana; Amadís Pagès; Francesca Mateo; Miguel Angel Pujana; Juan Valcárcel; Eduardo Eyras
Journal:  Genome Res       Date:  2016-04-13       Impact factor: 9.043

10.  The Pfam protein families database in 2019.

Authors:  Sara El-Gebali; Jaina Mistry; Alex Bateman; Sean R Eddy; Aurélien Luciani; Simon C Potter; Matloob Qureshi; Lorna J Richardson; Gustavo A Salazar; Alfredo Smart; Erik L L Sonnhammer; Layla Hirsh; Lisanna Paladin; Damiano Piovesan; Silvio C E Tosatto; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971
