Jin Li, Su-Ping Deng, Jacob Vieira, James Thomas, Valerio Costa, Ching-San Tseng, Franjo Ivankovic, Alfredo Ciccodicola, Peng Yu.
Abstract
RNA-binding proteins (RBPs) may play a critical role in gene regulation in various diseases and biological processes by binding RNA molecules and controlling post-transcriptional events such as polyadenylation, splicing and mRNA stabilization. Owing to the importance of RBPs in gene regulation, a great number of studies have been conducted, producing a large number of RNA-Seq datasets. However, the metadata of these datasets are usually not organized in a structured way, which limits their potential for wide reuse. To bridge this gap, the metadata of a comprehensive set of publicly available mouse RNA-Seq datasets with perturbed RBPs were collected and integrated into a database called RBPMetaDB. This database contains 292 mouse RNA-Seq datasets covering 187 RBPs. These RBPs account for only ∼10% of all known RBPs annotated in Gene Ontology, indicating that most RBPs have not yet been studied with high-throughput sequencing. This negative information provides a large pool of candidate RBPs for future experimental studies. In addition, we found that DNA-binding activities are significantly enriched among RBPs in RBPMetaDB, suggesting that prior studies of these DNA- and RNA-binding factors focused more on their DNA-binding activities than on their RNA-binding activities. This result reveals an opportunity to reuse these data efficiently to investigate the roles of their RNA-binding activities. A web application has also been implemented to enable easy access to and wide use of RBPMetaDB. We expect RBPMetaDB to be a valuable resource for improving understanding of the biological roles of RBPs.

Database URL: http://rbpmetadb.yubiolab.org
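The abstract reports that DNA-binding activity is significantly enriched among RBPs in RBPMetaDB but does not state the test used. A common choice for this kind of annotation-enrichment question is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test; the function name and the toy inputs are hypothetical and are not the paper's data.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size (e.g. all GO-annotated RBPs)
    K: population members carrying the annotation (e.g. DNA binding)
    n: sample size (e.g. RBPs that have datasets in the database)
    k: annotated members observed in the sample
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    # Sum the probability mass from k up to the largest feasible overlap.
    # math.comb returns 0 when asked for more items than exist.
    upper = min(K, n)
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


# Toy numbers only: all 5 sampled items are annotated, out of 5 of 10 overall.
p = hypergeom_upper_tail(k=5, N=10, K=5, n=5)  # 1/252, about 0.00397
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms; for realistic population sizes the division still yields an ordinary float because the tail probability is at most 1.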
Year: 2018 PMID: 29931156 PMCID: PMC6009576 DOI: 10.1093/database/bay054
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. The rapid growth of papers related to RBPs in PubMed. Approximately 10 000 papers related to RBPs were indexed in PubMed for the query “RNA binding protein”[tiab] OR “RNA binding proteins”[tiab] at the time of writing. Since 2012, the number of papers published per year has increased more rapidly than before. In 2017 alone, over 1000 papers were published.
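Per-year counts like those in Figure 1 can be obtained from NCBI's E-utilities ESearch endpoint, which accepts `rettype=count` and publication-date limits. The caption does not say how the authors collected their counts, so this is only a sketch under that assumption; the function names are hypothetical, and counts retrieved today will differ from those at the time of writing.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
QUERY = '"RNA binding protein"[tiab] OR "RNA binding proteins"[tiab]'


def count_url(year: int, term: str = QUERY) -> str:
    """Build an ESearch URL that returns only the hit count for one publication year."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(body: str) -> int:
    """Extract the hit count from an ESearch JSON response body."""
    return int(json.loads(body)["esearchresult"]["count"])
```

To fetch a value, pass `count_url(2017)` to `urllib.request.urlopen` and feed the decoded body to `parse_count`. Without an API key NCBI limits clients to about three requests per second, so a loop over years should pause between calls.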
Figure 3. The number of RBPs containing a domain from each Pfam family with RNA-binding activity. Blue bars indicate the number of RBPs containing a domain from the family among all RBPs, and red bars indicate the number among RBPs with associated RNA-Seq datasets. Only families with a blue bar with ≥RBPs are shown.
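Figure 3 compares two counts per Pfam family. A minimal sketch of that tally, assuming a precomputed mapping from each RBP to the Pfam families of its domains; the function name, the inputs and the `min_count` parameter are hypothetical, and the caption's actual threshold is not recoverable here.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Set, Tuple


def family_counts(
    domains: Mapping[str, Iterable[str]],
    with_data: Set[str],
    min_count: int,
) -> Dict[str, Tuple[int, int]]:
    """Return {family: (n_all_rbps, n_rbps_with_data)} for families in >= min_count RBPs.

    domains: hypothetical mapping RBP symbol -> Pfam families of its domains
    with_data: RBPs that have at least one associated RNA-Seq dataset
    min_count: threshold applied to the all-RBP count (blue bar)
    """
    all_counts: Counter = Counter()
    data_counts: Counter = Counter()
    for rbp, fams in domains.items():
        # Count each family at most once per RBP, even if the domain repeats.
        for fam in set(fams):
            all_counts[fam] += 1
            if rbp in with_data:
                data_counts[fam] += 1
    # most_common() orders families from largest to smallest blue bar.
    return {
        fam: (n, data_counts[fam])
        for fam, n in all_counts.most_common()
        if n >= min_count
    }
```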
Figure 2. Statistics of curated RNA-Seq datasets for RBPs. (a) The distribution of perturbation types among all curated datasets: knock-out (KO), knock-down (KD), overexpression (OE), knock-in (KI) and other (e.g. point mutations of RBPs or treatment with RBP inhibitors). Percentages are shown in parentheses. Knock-out experiments are the most common. (b) The curated datasets were generated by research labs worldwide. The US is the dominant contributor, with 60.1% of all datasets. (c) The number of associated publications for the datasets generally increased from 2010 to 2017. The slower increase in 2016 and the drop in 2017 are likely due to missing PMID annotations for a subset of the recently released datasets in GEO.
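The percentages in panel (a) are proportions of the curated datasets per perturbation type. A minimal sketch of that computation, assuming each dataset carries exactly one perturbation label (datasets with several perturbed RBPs would need a different convention); the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def perturbation_breakdown(types: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each perturbation label (e.g. KO, KD, OE, KI, other) to (count, percent).

    Percentages are rounded to one decimal place and ordered from most to
    least common label.
    """
    counts = Counter(types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (n, round(100 * n / total, 1)) for t, n in counts.most_common()}
```

The same function applies unchanged to the country breakdown in panel (b) if each dataset is assigned one contributing country.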
Figure 4. Web interface of RBPMetaDB. The RBPMetaDB website presents information about mouse RNA-Seq datasets with perturbed RBPs. Label A refers to the setting for the maximum number of entries shown per page. Label B marks the information shown for each RNA-Seq dataset: GEO accession number, dataset title in GEO, number of samples, official gene symbols from Mouse Genome Informatics (MGI), perturbation types of the RBPs associated with the dataset, and PMIDs of related papers. Label C marks the field-specific search boxes.
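Field-specific search (Label C) is what lets a curated table separate datasets in which an RBP is perturbed from datasets whose text merely mentions it, as in the Figure 5 comparison with GEO. A minimal in-memory sketch, assuming the record fields listed under Label B; the class, field and function names are hypothetical, and this is not the site's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """One table row as described under Label B (field names are hypothetical)."""
    gse: str                        # GEO accession
    title: str                      # dataset title in GEO
    n_samples: int
    genes: Tuple[str, ...]          # MGI symbols of the perturbed RBPs
    perturbation: Tuple[str, ...]   # perturbation types recorded for the dataset
    pmids: Tuple[str, ...]


def field_search(
    records: List[DatasetRecord],
    gene: Optional[str] = None,
    perturbation: Optional[str] = None,
) -> List[DatasetRecord]:
    """Keep records whose gene and perturbation fields match exactly (case-insensitive).

    Fields left as None are not filtered. Unlike a free-text search over titles,
    the gene filter returns only datasets annotated with that RBP as perturbed.
    """
    out = []
    for r in records:
        if gene is not None and gene.casefold() not in {g.casefold() for g in r.genes}:
            continue
        if perturbation is not None and perturbation.casefold() not in {
            p.casefold() for p in r.perturbation
        }:
            continue
        out.append(r)
    return out
```

With placeholder records such as one titled "Mettl3 KO" (genes `("Mettl3",)`) and another titled "m6A profiling without Mettl3 perturbation" (genes `("Ythdf2",)`), a substring search for "Mettl3" over titles returns both, while `field_search(records, gene="mettl3")` returns only the first.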
Figure 5. A use case of RBPMetaDB for the mouse RBP METTL3. (a) The RBP METTL3 is used to demonstrate the advantage of RBPMetaDB over GEO. For the keyword ‘Mettl3’, RBPMetaDB accurately returns six mouse RNA-Seq datasets in which Mettl3 is perturbed. (b) In contrast, GEO returns 35 mouse RNA-Seq datasets without identifying which of them come from experiments in which Mettl3 is perturbed.