
RNALocate v2.0: an updated resource for RNA subcellular localization with increased coverage and annotation.

Tianyu Cui1, Yiying Dou1, Puwen Tan1, Zhen Ni2, Tianyuan Liu1, DuoLin Wang3, Yan Huang4, Kaican Cai2, Xiaoyang Zhao5, Dong Xu3, Hao Lin6, Dong Wang1,7.   

Abstract

Resolving the spatial distribution of the transcriptome at a subcellular level can increase our understanding of biology and diseases. To facilitate studies of biological functions and molecular mechanisms in the transcriptome, we updated RNALocate, a resource for RNA subcellular localization analysis that is freely accessible at http://www.rnalocate.org/ or http://www.rna-society.org/rnalocate/. Compared to RNALocate v1.0, the new features in version 2.0 include (i) expansion of the data sources and the coverage of species; (ii) incorporation and integration of RNA-seq datasets containing information about subcellular localization; (iii) addition and reorganization of RNA information (RNA subcellular localization conditions and descriptive figures for method, RNA homology information, RNA interaction and ncRNA disease information) and (iv) three additional prediction tools: DM3Loc, iLoc-lncRNA and iLoc-mRNA. Overall, RNALocate v2.0 provides a comprehensive RNA subcellular localization resource for researchers to deconvolute the highly complex architecture of the cell.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2022        PMID: 34551440      PMCID: PMC8728251          DOI: 10.1093/nar/gkab825

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The subcellular localization of RNA plays an important role in cell growth and development, cell differentiation and inflammation, cell signal transduction and transcriptional regulation (1,2). At the cellular level, where an RNA is located likely determines whether it will be stored, processed, translated or degraded (3–5). Although the importance of RNA subcellular localization has been widely recognized, the related bioinformatics resources are limited compared to those available for protein localization. For example, a subcellular map of the human proteome, the Human Protein Atlas (HPA), records detailed information about protein subcellular localization (6). Many other resources also provide information about the subcellular localization of proteins, such as UniProt, PSORTdb, SubCellBarCode, MiCroKiTS 4.0 and SUBA4 (7–12). Corresponding protein subcellular localization technologies and prediction tools include FLIRT, SubCons, BUSCA and DeepMito (13–18). Some databases already collect RNA subcellular localization information at the transcriptome-wide level. For example, lncSLdb (19) stores the subcellular localization of long noncoding RNAs (lncRNAs) obtained by literature mining, LncATLAS (20) collects the subcellular localization of lncRNAs from RNA-seq data, and EVmiRNA (21) covers microRNAs (miRNAs) in extracellular vesicles. Several computational prediction tools, including DM3Loc (22), mRNALoc (23), lncLocator (24) and iLoc-lncRNA (25), were developed based on the first version of RNALocate (26). Of note, many experimental techniques for detecting RNA subcellular localization have been developed in recent years, including APEX-Seq (27), proximity RNA-seq (28), MERFISH (29) and subRNA-seq (30), and these have generated extensive new data.
In view of these developments, we have updated our database to RNALocate v2.0 (http://www.rnalocate.org/ or http://www.rna-society.org/rnalocate/), a collection of RNA subcellular localization data from the literature, other databases and RNA-seq datasets. RNALocate v2.0 is a repository of experimentally validated information on RNA subcellular localization, integrated through manual curation of the literature and five other resources, along with analyses of 35 datasets from the Gene Expression Omnibus (GEO) (31) under a common framework (Figure 1). It also supports three RNA subcellular localization prediction tools: DM3Loc, iLoc-lncRNA and iLoc-mRNA (32). In total, RNALocate v2.0 integrates more than 213 000 RNA subcellular localization entries at 171 locations across 104 species. It should be a valuable resource for better understanding the subcellular localization of the transcriptome.
Figure 1.

Overview of the RNALocate v2.0 database.


MATERIALS AND METHODS

Data collection

RNALocate v2.0 integrates RNA subcellular localization data from the literature, five databases and 35 RNA-seq datasets. Publications in PubMed (mainly from 2016 to 2021) were screened with the following keyword combination: (localization name) AND (RNA molecule). ‘Localization name’ represents the subcellular localization name, and ‘RNA molecule’ represents RNA symbols or RNA category names. In total, we reviewed over 35 000 published studies, which yielded 38 508 RNA subcellular localization entries. The other 174 752 entries were integrated from five other databases: CSCD, EVmiRNA, exoRBase, PomBase and TAIR (21,33–36). RNA-seq datasets from GEO were screened with the following criteria: species (Homo sapiens or Mus musculus), sequencing type (bulk RNA-seq or small RNA-seq), condition (samples with unknown conditions were removed), replicates (≥2) and publication date (after 2016). RNALocate v2.0 adds over 200 new samples from 35 RNA-seq datasets with subcellular localization information. To help elucidate the role of RNA localization at the subcellular level, more annotation information was collected, including RNA subcellular localization conditions, methods and corresponding figures from the literature, RNA homology information from NCBI Gene (37), RNA interactions from RNAInter (38), and RNA-related diseases from MNDR v3.0 (39). Transcript sequences from RefSeq (40) and miRBase (41) were also included. For RNA-seq datasets, the GEO accession, localization, sample condition and PubMed ID are provided. In addition, the RNA expression and Gene Ontology (GO) enrichment results of the top 50 RNAs for each sample were also incorporated (42).
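The GEO screening criteria listed above can be expressed as a simple predicate. The following is only a minimal sketch: the `GeoDataset` record, its field names and the example accession are hypothetical, and the exact cutoff behind "after 2016" is not stated in the text, so it is exposed as a parameter with an assumed default.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Criteria taken from the text; the string spellings are assumptions.
ALLOWED_SPECIES = {"Homo sapiens", "Mus musculus"}
ALLOWED_SEQ_TYPES = {"bulk RNA-seq", "small RNA-seq"}


@dataclass
class GeoDataset:
    """Hypothetical record describing one candidate GEO dataset."""
    accession: str
    species: str
    seq_type: str
    condition: Optional[str]
    replicates: int
    published: date


def passes_screen(ds: GeoDataset, after: date = date(2016, 12, 31)) -> bool:
    """Return True if the dataset meets all five screening criteria.

    `after` is an assumed reading of "publication date (after 2016)";
    adjust it if a different cutoff is intended.
    """
    has_condition = ds.condition is not None and ds.condition.strip().lower() not in ("", "unknown")
    return (
        ds.species in ALLOWED_SPECIES
        and ds.seq_type in ALLOWED_SEQ_TYPES
        and has_condition
        and ds.replicates >= 2
        and ds.published > after
    )


# Usage with made-up records (not real GEO accessions).
candidates = [
    GeoDataset("GSE_EXAMPLE_1", "Homo sapiens", "bulk RNA-seq", "untreated", 2, date(2019, 5, 1)),
    GeoDataset("GSE_EXAMPLE_2", "Danio rerio", "bulk RNA-seq", "untreated", 3, date(2019, 5, 1)),
    GeoDataset("GSE_EXAMPLE_3", "Mus musculus", "small RNA-seq", "unknown", 2, date(2020, 1, 1)),
]
selected = [ds.accession for ds in candidates if passes_screen(ds)]
```

Here only the first record passes: the second fails the species criterion and the third has an unknown condition.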

Data processing

Integrating multisource data requires unifying them into common reference databases to annotate various RNAs. Four major sources of RNA symbols were used: (i) miRNA symbols from the miRBase database, (ii) messenger RNA (mRNA), small nucleolar RNA (snoRNA), small nuclear RNA (snRNA) and lncRNA symbols from the NCBI Gene database, (iii) ribosomal RNA (rRNA) and Piwi-interacting RNA (piRNA) symbols from the RNAcentral (43) database and (iv) circular RNA (circRNA) symbols from the circBase (44) and exoRBase databases. We then reconstructed a hierarchical structure for all of the localizations according to the cellular component annotation curated in Gene Ontology. Additionally, the miRBase accession, NCBI Gene ID, Ensembl Gene stable ID, RNAcentral identifier, circBase ID and exoRBase ID are provided together with their external links, which helps users efficiently retrieve a substantial amount of RNA-associated information from external resources. For the convenience of users, the RNA-associated information also contains RNA names from the literature, aliases and sequences, among others. In particular, we screened and processed 203 samples from 35 RNA-seq datasets that had subcellular location labels, covering 26 conditions and 13 cell lines. Together, the datasets covered 15 subcellular locations (Figure 2B, Supplementary Table S1). The raw data were downloaded and converted with the NCBI SRA Toolkit v2.10.5, and adaptor contaminants and low-quality bases were then removed using Trimmomatic v0.39 (45). The processed clean reads were aligned to the human and mouse reference genomes (GRCh38 and GRCm38 from GENCODE) with gene annotations (Release 34 and M25 from GENCODE), and the gene expression of each sample was estimated using HISAT2 v2.1.0, SAMtools v1.4 and featureCounts v2.0.1 (46,47). The RNA expression levels were normalized as transcripts per million (TPM).
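For readers unfamiliar with TPM, the normalization step can be illustrated as follows. This is a sketch of the standard TPM definition, not the authors' script: the function name and the toy counts are hypothetical, and the gene length is assumed to be the per-gene length in base pairs (for example, the `Length` column that featureCounts reports).

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM).

    Step 1: divide each count by the gene length in kilobases (reads per kb).
    Step 2: scale so that the values of one sample sum to one million.
    """
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(per_kb.values())
    if total == 0:
        # No reads at all in this sample: every gene gets TPM 0.
        return {gene: 0.0 for gene in counts}
    return {gene: value / total * 1e6 for gene, value in per_kb.items()}


# Toy example with made-up numbers: gene B has twice the reads of gene A
# but is also twice as long, so both end up with the same TPM.
example = counts_to_tpm({"A": 10, "B": 20}, {"A": 1000, "B": 2000})
```

Because TPM values sum to the same total in every sample, they are more directly comparable across samples than raw counts.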
All the data consisted of two independent biological replicates per sample (except for samples from APEX-seq, which have at least two replicates). For further analysis, we standardized the genes in each dataset using an approach similar to that of LncATLAS. Genes with TPM >0 in all replicates of at least one sample were retained (genes expressed in some replicates but not in others were excluded). Genes with a greater than twofold difference between replicates were also removed.
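The replicate filter described above can be sketched as follows. The text does not specify how the two rules combine across samples, so this sketch makes an explicit assumption: a gene is dropped if any sample shows inconsistent expression (some replicates zero, others not) or a more than twofold spread between replicates, and it is kept only if at least one sample expresses it in all replicates. The function and variable names are hypothetical.

```python
from typing import Dict, List


def keep_gene(tpm_by_sample: Dict[str, List[float]], max_fold: float = 2.0) -> bool:
    """Decide whether a gene passes the replicate-consistency filter.

    `tpm_by_sample` maps a sample name to the gene's TPM in each replicate.
    Assumed reading of the rules in the text (not the authors' code):
      * a sample with TPM 0 in every replicate is ignored;
      * a sample with TPM > 0 in only some replicates rejects the gene;
      * a sample whose max/min replicate TPM exceeds `max_fold` rejects the gene;
      * the gene must be expressed in all replicates of at least one sample.
    """
    expressed_somewhere = False
    for replicates in tpm_by_sample.values():
        n_expressed = sum(1 for value in replicates if value > 0)
        if n_expressed == 0:
            continue
        if n_expressed < len(replicates):
            return False
        if max(replicates) / min(replicates) > max_fold:
            return False
        expressed_somewhere = True
    return expressed_somewhere


# Toy values for illustration only.
consistent = keep_gene({"nucleus": [4.0, 6.0], "cytoplasm": [0.0, 0.0]})
dropout = keep_gene({"nucleus": [4.0, 0.0]})
divergent = keep_gene({"nucleus": [1.0, 3.0]})
```

Under this reading, a gene with replicate TPMs of exactly 1.0 and 2.0 is kept, because only differences greater than twofold are removed.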
Figure 2.

Summary of the RNA-seq datasets. (A) The workflow of RNA-seq datasets (left). Top 20 gene expression levels and top 50 gene functional annotations for each sample (right). (B) The proportion of samples and the number of datasets in 15 subcellular localizations.


RESULTS

New data and annotations

To improve the accuracy of our database, we carefully re-checked all of the data in the first release and deleted 6739 entries that represented protein subcellular localizations or unclear localizations. In addition, we merged 2897 entries in which all of the information was the same except for the cell line or tissue type. In summary, RNALocate v2.0 contains 213 260 experimentally validated RNA subcellular localization entries, including 38 508 manually curated entries from the literature and 174 752 entries from databases. These entries involve 112 304 nonredundant RNAs and 16 newly added RNA types (such as circRNA, lincRNA, mtRNA, scRNA, scaRNA and Y RNA). In addition, 129 new subcellular locations (such as chromatin, insoluble cytoplasm, mitochondrial cloud and plasma membrane) were added. The distribution of the subcellular localizations among different RNA types is shown in Figure 3A and Supplementary Table S2. The number of species in RNALocate v2.0 increased from 65 in the first version to 104. The species cover seven categories: apicomplexa, euglenozoa, fungi, metazoa, rhodophyta, viridiplantae and viruses. The top three species are Homo sapiens, Mus musculus and Saccharomyces cerevisiae, as shown in Figure 3B. Other model species, such as Drosophila melanogaster, Rattus norvegicus and zebrafish (Danio rerio), are also documented in RNALocate v2.0. Of note, some RNA subcellular localizations that occur only under certain conditions are also recorded in our database.
Figure 3.

Statistics on RNALocate v2.0. (A) The distribution of 25 RNA categories in 171 subcellular localizations. (B) Number of entries in the top 10 species.


Features and utilities of RNALocate v2.0

RNALocate v2.0 provides a user-friendly platform for searching, browsing and profiling RNA subcellular localization data. To improve its search capability, RNALocate v2.0 provides separate search functions for data from the literature and data from RNA-seq datasets. The literature search page supports optimized queries through new fuzzy and batch search functions. ‘Fuzzy Search’ helps users search entries using nonstandardized RNA names and subcellular localizations. ‘Batch search’ supports queries by a list of official symbols/IDs or by file upload to obtain the associated entries. Apart from basic annotations, such as RNA information, localization information, other subcellular localizations and ncRNA disease information, we revised the corresponding homology and interaction data in detail. The ‘RNA-RNA interaction’ is presented only when both RNAs have subcellular localization information. Similarly, ‘homology information’ shows homologous RNA-associated entries instead of homologous genes. All of this information links to the corresponding databases. In addition, we added the method used to determine RNA subcellular localization in the literature, together with the corresponding conditions and figures. To illustrate the different subcellular localizations of each RNA from RNA-seq datasets, the detail page of ‘Search From RNA-seq Dataset’ shows the basic information and the subcellular localization in each RNA-seq dataset. The basic information includes the gene symbol, Ensembl gene stable ID, genome location and gene type. The subcellular localization information includes: (i) subcellular localization (chromatin, cytoplasm and nucleoplasm) in different conditions (only in Mus musculus); (ii) different subcellular localizations in an individual cell type; (iii) subcellular localizations revealed by APEX-Seq; (iv) a single subcellular localization in different conditions and (v) subcellular localizations of cytoplasm and nucleus in different cell lines (Supplementary Figure S1).
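The text does not describe how ‘Fuzzy Search’ and ‘batch search’ are implemented. The following sketch only illustrates the general idea using approximate string matching from the Python standard library (`difflib`); the toy index, function names and similarity cutoff are all hypothetical and are not taken from RNALocate.

```python
import difflib
from typing import Dict, List, Tuple

# Toy index for illustration only (official symbol -> localizations);
# it is not an extract of the RNALocate database.
TOY_INDEX: Dict[str, List[str]] = {
    "MALAT1": ["Nucleus", "Nuclear speckle"],
    "NEAT1": ["Nucleus", "Paraspeckle"],
    "ACTB": ["Cytoplasm"],
}


def fuzzy_search(query: str, max_hits: int = 3, cutoff: float = 0.6) -> List[str]:
    """Return official symbols whose spelling is close to a nonstandard query."""
    return difflib.get_close_matches(query.upper(), list(TOY_INDEX), n=max_hits, cutoff=cutoff)


def batch_search(symbols: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Look up a list of official symbols; return (hits, symbols not found)."""
    hits = {s: TOY_INDEX[s] for s in symbols if s in TOY_INDEX}
    missing = [s for s in symbols if s not in TOY_INDEX]
    return hits, missing


fuzzy_hits = fuzzy_search("malat-1")
batch_hits, batch_missing = batch_search(["NEAT1", "XYZ"])
```

In this sketch the fuzzy search tolerates spelling variants such as an extra hyphen, while the batch search requires exact official symbols and reports any that are not found.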
Users can switch between the literature search result page and the RNA-seq dataset detail page. ‘Browse By RNA-seq dataset’ shows the sample information, gene expression and gene GO enrichment analysis for each dataset on the ‘Browse’ page. GEO accessions, locations, sampling conditions and other information are included on the detail page, which also provides a histogram of the top 20 expressed RNAs and the functional annotation results of the top 50 RNAs for each sample (Figure 2A). In addition, all of the RNA expression data in each dataset can be downloaded. In response to the diverse needs of users, RNALocate v2.0 incorporates three prediction tools: DM3Loc, iLoc-lncRNA and iLoc-mRNA (all three were trained on RNALocate v1.0). They are used to predict the subcellular localizations of lncRNAs (iLoc-lncRNA) or mRNAs (DM3Loc and iLoc-mRNA).

CONCLUSION AND FUTURE PERSPECTIVES

Here, we present RNALocate v2.0, a resource of RNA subcellular localization information compiled from the literature, databases and RNA-seq datasets. It contains more than 213 000 RNA subcellular localization entries to guide and support further studies. RNALocate v2.0 integrates RNA-seq data with subcellular localization to quantify the expression of RNAs at the subcellular level. In addition, RNALocate v2.0 incorporates three prediction tools for the various needs of users. The biological functions of RNAs are usually influenced by their localizations. The fact that RNAs are located at multiple subcellular localizations also increases the complexity of the cell. Analysis of the protein-protein interaction network at the subcellular level has been shown to reveal effects distinct from those seen at the cellular level, and corresponding methods have also emerged, such as CellWhere and ComPPI (48–50). We therefore expect that continuing to expand and improve RNALocate v2.0 will also help explore the RNA-RNA interaction network at the subcellular level in the future. Thus, RNALocate is the most comprehensive map of the subcellular localization of the transcriptome and can satisfy different requirements.
REFERENCES (10 of 50 shown)

1.  The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier.

Authors:  Zhen Cao; Xiaoyong Pan; Yang Yang; Yan Huang; Hong-Bin Shen
Journal:  Bioinformatics       Date:  2018-07-01       Impact factor: 6.937

2.  The Arabidopsis information resource: Making and mining the "gold standard" annotated reference plant genome.

Authors:  Tanya Z Berardini; Leonore Reiser; Donghui Li; Yarik Mezheritsky; Robert Muller; Emily Strait; Eva Huala
Journal:  Genesis       Date:  2015-08-04       Impact factor: 2.487

3.  SnapShot: Subcellular mRNA Localization.

Authors:  Mohammad Mofatteh; Simon L Bullock
Journal:  Cell       Date:  2017-03-23       Impact factor: 41.582

4.  NCBI GEO: archive for functional genomics data sets--update.

Authors:  Tanya Barrett; Stephen E Wilhite; Pierre Ledoux; Carlos Evangelista; Irene F Kim; Maxim Tomashevsky; Kimberly A Marshall; Katherine H Phillippy; Patti M Sherman; Michelle Holko; Andrey Yefanov; Hyeseung Lee; Naigong Zhang; Cynthia L Robertson; Nadezhda Serova; Sean Davis; Alexandra Soboleva
Journal:  Nucleic Acids Res       Date:  2012-11-27       Impact factor: 16.971

5.  Alternative 3' UTRs act as scaffolds to regulate membrane protein localization.

Authors:  Binyamin D Berkovits; Christine Mayr
Journal:  Nature       Date:  2015-04-20       Impact factor: 49.962

6.  miRBase: from microRNA sequences to function.

Authors:  Ana Kozomara; Maria Birgaoanu; Sam Griffiths-Jones
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

7.  DeepMito: accurate prediction of protein sub-mitochondrial localization using convolutional neural networks.

Authors:  Castrense Savojardo; Niccolò Bruciaferri; Giacomo Tartari; Pier Luigi Martelli; Rita Casadio
Journal:  Bioinformatics       Date:  2020-01-01       Impact factor: 6.937

8.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937

9.  RefSeq: an update on prokaryotic genome annotation and curation.

Authors:  Daniel H Haft; Michael DiCuccio; Azat Badretdin; Vyacheslav Brover; Vyacheslav Chetvernin; Kathleen O'Neill; Wenjun Li; Farideh Chitsaz; Myra K Derbyshire; Noreen R Gonzales; Marc Gwadz; Fu Lu; Gabriele H Marchler; James S Song; Narmada Thanki; Roxanne A Yamashita; Chanjuan Zheng; Françoise Thibaud-Nissen; Lewis Y Geer; Aron Marchler-Bauer; Kim D Pruitt
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

10.  RNAInter in 2020: RNA interactome repository with increased coverage and annotation.

Authors:  Yunqing Lin; Tianyuan Liu; Tianyu Cui; Zhao Wang; Yuncong Zhang; Puwen Tan; Yan Huang; Jia Yu; Dong Wang
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

