Literature DB >> 26210853

Database of RNA binding protein expression and disease dynamics (READ DB).

Seyedsasan Hashemikhabir1, Yaseswini Neelamraju1, Sarath Chandra Janga2.   

Abstract

RNA Binding Protein (RBP) Expression and Disease Dynamics database (READ DB) is a non-redundant, curated database of human RBPs. RBPs curated from different experimental studies are reported with their annotation, tissue-wide RNA and protein expression levels, evolutionary conservation, disease associations, protein-protein interactions, microRNA predictions, their known RNA recognition sequence motifs as well as predicted binding targets and associated functional themes, providing a one stop portal for understanding the expression, evolutionary trajectories and disease dynamics of RBPs in the context of post-transcriptional regulatory networks.
© The Author(s) 2015. Published by Oxford University Press.

Entities:  

Mesh:

Substances:

Year:  2015        PMID: 26210853      PMCID: PMC4515031          DOI: 10.1093/database/bav072

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


Introduction

RNA Binding Proteins (RBPs) have a primary role in the post-transcriptional regulation of genes by adding an extra level of plasticity in controlling gene expression (1). They form dynamic Ribonucleoprotein (RNP) complexes and control various stages in the metabolism of RNA. The process of binding to RNA is mediated by RNA binding domains such as the RNA Recognition Motif (RRM), K homology domain (KH) domain etc. (2). Individual RBPs contain multiple domains that can independently bind to RNA. In recent years, several methods like SELEX, CLIP, PAR-CLIP, iCLIP, RNA compete have been developed to identify RNA bound proteome which lead to an addition of novel RBPs to those previously known (3–5). Here, we present READ DB which is a unified resource of ∼1350 RBPs in humans curated from recent experimental studies. We report multiple properties of RBPs including gene summary information from NCBI, tissue-wide transcript and protein expression levels from the Human Body Map (6) and Human Proteome (7) Projects respectively, evolutionary conservation, physical high-confidence protein interactions from BioGRID (8), disease associations from MalaCards (9), miRNA target predictions for RBPs, RNA recognition sequence motifs and their predicted binding targets as well as enriched processes and pathways in the human genome, in the form of an easy to navigate and accessible resource for researchers working on specific groups of RBPs.

Materials and methods

Data collection

A non-redundant and curated list of 1344 RBPs in the human genome was constructed from various experimental sources (10) as described later (See Figure 1): Each RBP in the database can be queried for or is associated with the following levels of information as summarized in Figure 2
Figure 1.

Schematic showing the number of RBPs collected from various experimental studies.

Figure 2.

Integration of data from multiple sources in READ DB.

mRNA interactome of HeLa cell (3), mRNA bound proteome identified from photoreactive nucleoside-enhanced UV crosslinking and oligo(dT) affinity purification approach (4), Human orthologs of proteins identified in mouse embryonic stem cells to be bound to RNA (5), RBPs screened in a RNA compete study (11), RBPs reported in RBPDB (12). Schematic showing the number of RBPs collected from various experimental studies. Integration of data from multiple sources in READ DB.

(A) Summary

This section provides for each RBP, its synonyms from HGNC annotations, Ensembl accession ID and a description from the NCBI Refseq. We have also included subcellular localization from Uniprot database (13).

(B) Domain Information

We extracted domain information for every RBP from Pfam database (14) and present a visualization of its domain architecture using Prosite MyDomains (15).

(C) Evolutionary Conservation

The orthologs of human RBPs in 62 different species are extracted from Ensembl (v73) compara (16). These species come from different taxonomic groups namely—Fungi, Chordates, Reptiles, Aves, Amphibians, Mammals, Insects, Rodents and Primates. We represent the extent of conservation as a heatmap as well as a phylogenetic gene tree for each queried RBP. Phylogenetic trees are imported from Ensembl and users can also access the gene tree and alignment files for the specific RBP being queried.

(D) Protein Expression

The expression levels of all the protein isoforms corresponding to each RBP gene are extracted from the human proteome map for 24 different tissues (7).

(E) Transcript Expression

Transcripts annotated as ‘protein_coding’ in Ensembl are extracted and their expression levels across 16 different tissues are obtained from the Human Body Map (6).

(F) Disease Association

Diseases associated with each RBP along with their relevance scores are obtained from the Malacards database (9).

(G) Protein–Protein Interactions

Protein interactions for RBPs are retrieved from BioGRID database (8). We report the interacting partner of each RBP as well as their synonyms, evidence code for the reported protein interaction along with the supporting publications.

(H) MicroRNA Predictions

MicroRNAs (miRNAs) targeting the 3′UTR of each protein-coding transcript of a given RBP are predicted using Miranda (17) and TargetScan (18) algorithms. Miranda predicts microRNA targets based on a three phase method that incorporates sequence-matching to estimate the complementarity between a miRNA and a potential target gene to give a score (S). In contrast, TargetScan searches for the presence of conserved 8mer and 7mer sites that match the seed region of each miRNA. The prediction efficacy is calculated as context +score of the sites which is the sum of site-type contribution, 3′ pairing contribution, local AU contribution and position contribution. To include all potential factors that can facilitate a stronger bond between miRNA and its target RBP transcript, we included only those miRNAs that are predicted at a defined recommended threshold by both Miranda [Threshold: score(S) > 145 and free energy (ΔG) < −22 kcal/mol] and TargetScan (Threshold: context + score ≤ −0.5). Although miRNA target sites and their binding affinity may be different in different tissues, we plotted the relative precision/recall values of Miranda and Target scan results for different thresholds compared with the binding sites reported based on CLASH method (19) to see whether these thresholds would produce high quality predictions (Supplementary Figure S1a and b). Our analysis revealed that the chosen thresholds compared few other tested thresholds, are appropriate to identify relatively high precision and recall values for miRNA predictions and hence they were considered for reporting the predictions. These predictions are represented as downloadable tracks generated using UCSC genome browser in our database.

(I) Binding Motifs

RNA recognition capabilities of RBPs can be represented as sequence logos (20) and Position-specific Weight Matrices (PWMs) from the RNA compete experiments conducted by Ray et al (11) were obtained to generate the sequence logos for all the annotated motifs of an RBP. In addition, CLIP-seq data for 50 human RBPs was obtained from CLIPdb (21) and RBP binding peaks reported as significant by the authors were used to identify binding motifs using HOMER (22). This set was augmented by curating literature for cross-linking and SELEX data to identify additional RBPs with available motifs. All the identified binding motifs were used as PWMs to seqLogo (23) to generate sequence logos for motif representation. Each of the PWMs associated with an RBP were also scanned using Find Individual Motif Occurrences (FIMO) (24) from MEME suite (25), on all the annotated human gene sequences obtained from ENSEMBL (v73), to identify the potential binding target regions and corresponding genes. These predictions are made available as a browsable table for each motif along with the significance level of each motif instance. Only the top 4000 most significant motif occurrences from FIMO runs were included. Genes predicted to contain at least one occurrence of these significant motif instances were considered for functional enrichment analysis using g:Profiler (26). Enriched functional processes and pathways are also made available as a browsable table for each motif. For RBPs with multiple annotated motifs, each motif and its associated gene set as well as enriched processes are all accessible as a component so that users can scroll over to see the associated content for each motif.

Results

When searched for an RBP, the following information can be identified/visualized in addition to the description, synonyms for the RBP searched (Figure 2). Species with orthologs: This section illustrates the list of species (categorized into different taxonomic groups) in which an orthologous gene was identified to be present. Protein Expression: This section shows the list of protein isoforms (denoted by RefSeq ids) encoded by the gene and their expression levels in 24 different tissues. Additionally, the expression levels of each isoform can be visualized as barplots. Transcript Expression: For each protein-coding transcript encoded by the gene, we present the expression levels in 16 tissues. These expression levels measured as Reads Per Kilobase per Million mapped reads are also visualized as bar plots. Disease Association: For each RBP, we provide the list of disease terms associated with it and the corresponding relevance score. Protein–Protein Interactions: The physical interactions of a given RBP with other proteins is provided under this section. In addition to the gene symbol of the interactor, its corresponding synonyms, evidence code for the physical interaction which describes the experimental method used to identify the interaction and the reference to the identified interaction is also provided. miRNA predictions: The miRNAs predicted to be targeting the 3′UTRs of the protein coding transcripts of RBPs are represented as genomic tracks. Each track image shows the transcript (ENSEMBL ID) and the binding locations of different miRNAs on its 3′UTR. MicroRNA predictions associated with each transcript of an RBP where available, are represented as downloadable tracks generated using UCSC genome browser in our database. Binding Motifs: The sequence motif to which the RBP binds on its target genes is depicted as sequence logo and a list of target genes in the human genome which contain the top 4000 most significant occurrences of the motif are made available as a viewable table. Enriched processes and pathways corresponding to the gene list for each motif are also made available as a table for the end user.

Future directions

We anticipate including tissue-specific protein interactions, increasing the number of RBPs with experimentally available sequence motifs as well as providing expression of RBPs in disease contexts such as in cancer samples (27) in future releases of READ DB. We believe such a unique resource on expression and disease for RBPs would not only provide a one stop portal for understanding post-transcriptional regulatory network dynamics in general but also help experimentalists working on specific RBPs to target and prioritize tissues for furthering our understanding of diverse post-transcriptional regulatory mechanisms ranging from RNA splicing to editing. We plan to update READ DB at least once per year to incorporate newly available information and to expand our repertoire of experimental conditions and tissues. As more motif data becomes available, we will build and make available more comprehensive RBP-RNA networks in a RBP-centric manner.

Supplementary Data

Supplementary data are available at Database Online.

Funding

This work was supported by School of Informatics and Computing at Indiana University Purdue University Indianapolis (IUPUI) in the form of start-up funds for SCJ. Funding for open access charge: SCJ from IUPUI. Conflict of interest. None declared.
  26 in total

1.  Sequence logos: a new way to display consensus sequences.

Authors:  T D Schneider; R M Stephens
Journal:  Nucleic Acids Res       Date:  1990-10-25       Impact factor: 16.971

2.  Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities.

Authors:  Sven Heinz; Christopher Benner; Nathanael Spann; Eric Bertolino; Yin C Lin; Peter Laslo; Jason X Cheng; Cornelis Murre; Harinder Singh; Christopher K Glass
Journal:  Mol Cell       Date:  2010-05-28       Impact factor: 17.970

3.  Insights into RNA biology from an atlas of mammalian mRNA-binding proteins.

Authors:  Alfredo Castello; Bernd Fischer; Katrin Eichelbaum; Rastislav Horos; Benedikt M Beckmann; Claudia Strein; Norman E Davey; David T Humphreys; Thomas Preiss; Lars M Steinmetz; Jeroen Krijgsveld; Matthias W Hentze
Journal:  Cell       Date:  2012-05-31       Impact factor: 41.582

4.  FIMO: scanning for occurrences of a given motif.

Authors:  Charles E Grant; Timothy L Bailey; William Stafford Noble
Journal:  Bioinformatics       Date:  2011-02-16       Impact factor: 6.937

5.  Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs.

Authors:  David M Garcia; Daehyun Baek; Chanseok Shin; George W Bell; Andrew Grimson; David P Bartel
Journal:  Nat Struct Mol Biol       Date:  2011-09-11       Impact factor: 15.369

6.  RBPDB: a database of RNA-binding specificities.

Authors:  Kate B Cook; Hilal Kazan; Khalid Zuberi; Quaid Morris; Timothy R Hughes
Journal:  Nucleic Acids Res       Date:  2010-10-29       Impact factor: 16.971

7.  g:Profiler--a web server for functional interpretation of gene lists (2011 update).

Authors:  Jüri Reimand; Tambet Arak; Jaak Vilo
Journal:  Nucleic Acids Res       Date:  2011-06-06       Impact factor: 16.971

8.  Human MicroRNA targets.

Authors:  Bino John; Anton J Enright; Alexei Aravin; Thomas Tuschl; Chris Sander; Debora S Marks
Journal:  PLoS Biol       Date:  2004-10-05       Impact factor: 8.029

9.  MEME SUITE: tools for motif discovery and searching.

Authors:  Timothy L Bailey; Mikael Boden; Fabian A Buske; Martin Frith; Charles E Grant; Luca Clementi; Jingyuan Ren; Wilfred W Li; William S Noble
Journal:  Nucleic Acids Res       Date:  2009-05-20       Impact factor: 16.971

Review 10.  RNA-binding proteins and post-transcriptional gene regulation.

Authors:  Tina Glisovic; Jennifer L Bachorik; Jeongsik Yong; Gideon Dreyfuss
Journal:  FEBS Lett       Date:  2008-03-13       Impact factor: 4.124

View more
  7 in total

1.  Sequence-Based Prediction of RNA-Binding Residues in Proteins.

Authors:  Rasna R Walia; Yasser El-Manzalawy; Vasant G Honavar; Drena Dobbs
Journal:  Methods Mol Biol       Date:  2017

Review 2.  RNA-binding proteins in eye development and disease: implication of conserved RNA granule components.

Authors:  Soma Dash; Archana D Siddam; Carrie E Barnum; Sarath Chandra Janga; Salil A Lachke
Journal:  Wiley Interdiscip Rev RNA       Date:  2016-05-01       Impact factor: 9.957

Review 3.  Systems biology of lens development: A paradigm for disease gene discovery in the eye.

Authors:  Deepti Anand; Salil A Lachke
Journal:  Exp Eye Res       Date:  2016-03-16       Impact factor: 3.467

Review 4.  High throughput approaches to study RNA-protein interactions in vitro.

Authors:  Xuan Ye; Eckhard Jankowsky
Journal:  Methods       Date:  2019-09-05       Impact factor: 3.608

Review 5.  MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search.

Authors:  Noa Rappaport; Michal Twik; Inbar Plaschkes; Ron Nudel; Tsippi Iny Stein; Jacob Levitt; Moran Gershoni; C Paul Morrey; Marilyn Safran; Doron Lancet
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

6.  Mutational landscape of RNA-binding proteins in human cancers.

Authors:  Yaseswini Neelamraju; Abel Gonzalez-Perez; Poornima Bhat-Nakshatri; Harikrishna Nakshatri; Sarath Chandra Janga
Journal:  RNA Biol       Date:  2017-11-14       Impact factor: 4.652

7.  Role of SARS-CoV-2 in Altering the RNA-Binding Protein and miRNA-Directed Post-Transcriptional Regulatory Networks in Humans.

Authors:  Rajneesh Srivastava; Swapna Vidhur Daulatabad; Mansi Srivastava; Sarath Chandra Janga
Journal:  Int J Mol Sci       Date:  2020-09-25       Impact factor: 5.923

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.