RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data.

Wen-Ju Sun, Jun-Hao Li, Shun Liu, Jie Wu, Hui Zhou, Liang-Hu Qu, Jian-Hua Yang.

Abstract

Although more than 100 different types of RNA modifications have been characterized across all living organisms, surprisingly little is known about the modified positions and their functions. Recently, various high-throughput modification sequencing methods have been developed to identify diverse post-transcriptional modifications of RNA molecules. In this study, we developed a novel resource, RMBase (RNA Modification Base, http://mirlab.sysu.edu.cn/rmbase/), to decode the genome-wide landscape of RNA modifications identified from high-throughput modification data generated by 18 independent studies. The current release of RMBase includes ∼9500 pseudouridine (Ψ) modifications generated from Pseudo-seq and CeU-seq sequencing data, ∼1000 5-methylcytosines (m5C) predicted from Aza-IP data, ∼124 200 N6-methyladenosine (m6A) modifications discovered from m6A-seq and ∼1210 2′-O-methylations (2′-O-Me) identified from RiboMeth-seq data and public resources. Moreover, RMBase provides a comprehensive listing of other experimentally supported types of RNA modifications by integrating various resources. It provides web interfaces that show thousands of relationships between RNA modification sites and microRNA target sites, and it can be used to identify disease-related SNPs residing in modification sites or regions. RMBase also provides a genome browser and a web-based modTool to query, annotate and visualize various RNA modifications. This database will help expand our understanding of the potential functions of RNA modifications.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2015        PMID: 26464443      PMCID: PMC4702777          DOI: 10.1093/nar/gkv1036

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Post-transcriptional modification of RNA molecules occurs in all living organisms and is one of the most evolutionarily conserved properties of RNAs (1–5). Modifications can affect the activity, localization and stability of RNAs, and have been linked with human diseases (1–5). Although more than 100 types of RNA modifications have been described so far, most of them were thought to be abundant in tRNAs, rRNAs and snRNAs but rare in mRNAs and regulatory non-coding RNAs (ncRNAs). To determine the transcriptome-wide landscape of RNA modifications, many recent studies have developed high-throughput modification sequencing methods to identify diverse post-transcriptional modifications of RNA molecules (1–5). Applying these methods has identified various modifications (e.g. pseudouridine, m6A, m5C and 2′-O-Me) within coding and non-coding sequences at single-nucleotide or near single-nucleotide resolution (6–17). With the increasing amount of modification sequencing data available, there is a great need to integrate these large-scale data sets to explore the prevalence, mechanisms and functions of various modifications. Functional experiments in recent years have also revealed many novel roles of RNA modifications. For example, m6A has been proposed to affect protein translation and localization (1–5), mRNA stability (18) and stem cell pluripotency (19,20). Pseudouridylation of nonsense codons suppresses translation termination both in vitro and in vivo, suggesting that RNA modification may provide a new way to expand the genetic code (21). Importantly, many modification enzymes are dysregulated or mutated in a range of diseases (1). For example, mutations in pseudouridine synthases cause mitochondrial myopathy and sideroblastic anemia (MLASA) (22) and dyskeratosis congenita (23).
However, the relationships between genetic variants identified by genome-wide association studies (GWAS) and the modification sites identified by the high-throughput methods above have remained unexplored. In this study, we developed RMBase to facilitate the annotation, visualization, analysis and discovery of RNA modification sites from large-scale modification sequencing data (Figure 1). In RMBase, we performed a large-scale integration of public RNA modification sites generated by high-throughput sequencing technologies and provide RNA epigenetic maps for the cell types for which data are presently available (Table 1). RMBase provides web interfaces that show the relationships between miRNA target sites and RNA modifications. Furthermore, by integrating GWAS data, RMBase can be used to identify clinically relevant RNA modification sites. By integrating more than 100 types of RNA modifications, RMBase is expected to help researchers investigate the potential functions and mechanisms of RNA modifications.
Figure 1.

System overview of the RMBase core framework. We integrated a large set of RNA modification sites generated by 18 independent studies to profile the comprehensive genome-wide landscape of more than 100 types of RNA modifications. Integrative analysis of these sites revealed extensive post-transcriptional modification of RNA. Combined analysis of RNA modification data with GWAS and miRNA target data identified thousands of miRNA target sites and disease-related SNPs residing in modification sites. High-throughput modification sequencing data were mapped to genomes and displayed in the genome browser. All results generated by RMBase are stored in MySQL relational databases and displayed in the visual browser and web pages.

Table 1.

The data statistics in RMBase

Species   Ψ      m5C   m6A      2′-O-Me   Other types
Human     4128   680   94 895   901       617
Mouse     3247   97    28 002   66        497
Yeast     2122   211   1306     242       2014

Numbers of sites of each modification type in the three organisms (human, mouse and yeast). Ψ, pseudouridine; m5C, 5-methylcytosine; m6A, N6-methyladenosine; 2′-O-Me, 2′-O-methylation. Rarer modification types are grouped under ‘Other types’.


MATERIALS AND METHODS

Integration of public high-throughput modification sequencing data sets

High-throughput Pseudo-seq, CeU-seq, Aza-IP, m6A-seq, MeRIP-Seq and RiboMeth-seq data were retrieved from the Gene Expression Omnibus (GEO) and from the supplementary data of the original references (6–17). Barcodes or 3′-adapters were clipped from the raw modification sequencing reads using the FASTX-toolkit (version 0.0.13). All unique adapter-free reads in each sample were mapped to the genomes using Bowtie 1.1.2 (24). The mapped reads were converted into BAM format and displayed in the genome browser. Known modifications in rRNAs, snRNAs and tRNAs were extracted from snoRNABase (25), MODOMICS (26), the Yeast snoRNA Database (27) and other literature sources (28–30), and were then mapped to the genomes using Bowtie (24) to determine their genomic coordinates and construct the genome-wide landscape of RNA modifications.
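The adapter-clipping and read-collapsing step can be illustrated with a short Python sketch. It is not the FASTX-toolkit implementation: it assumes exact (mismatch-free) matching, and the example adapter, the 5-nt minimum partial overlap and the 15-nt minimum insert length are illustrative assumptions rather than parameters reported here.

```python
from collections import Counter
from typing import Iterable

# Illustrative adapter prefix (an assumption, not the adapter used by these studies).
EXAMPLE_ADAPTER = "TGGAATTCTCGG"


def clip_3prime_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove a full or partial 3' adapter from a read (exact matches only)."""
    # Full adapter anywhere in the read: cut at the start of the adapter.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Partial adapter running off the 3' end: try the longest prefix first.
    for k in range(min(len(adapter) - 1, len(read)), max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter detected: return the read unchanged.
    return read


def collapse_unique(reads: Iterable[str], adapter: str, min_length: int = 15) -> Counter:
    """Clip adapters, drop short inserts and count identical (unique) reads."""
    clipped = (clip_3prime_adapter(r, adapter) for r in reads)
    return Counter(r for r in clipped if len(r) >= min_length)
```

Collapsing identical reads before alignment means each distinct sequence is mapped once, with its count kept alongside.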

Annotation of modification sites

All gene annotations were downloaded from the UCSC Genome Bioinformatics website (31) and Ensembl (32). Human (UCSC hg19), mouse (UCSC mm10, NCBI Build 38) and yeast (sacCer3) genome sequences were also downloaded from UCSC (31). All modification sites were annotated using these annotation data sets. Modification sites were classified by gene type (tRNAs, rRNAs, Mt-tRNAs, scRNAs, snRNAs, snoRNAs, miRNAs, lincRNAs, misc_RNAs, protein-coding genes, processed_transcripts, pseudogenes, etc.) and by genomic region (CDS, 3′-UTR, 5′-UTR, intron, exon and intergenic).
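A minimal sketch of how a site could be assigned to one of the genomic regions listed above, given simple transcript models. The `Transcript` class, its 0-based half-open coordinates and the tie-breaking order used when a site overlaps several transcripts are hypothetical; the text does not state how overlapping annotations were resolved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Transcript:
    """Hypothetical minimal transcript model; all coordinates 0-based, half-open."""
    strand: str                             # "+" or "-"
    exons: List[Tuple[int, int]]            # genomic (start, end) of each exon
    cds: Optional[Tuple[int, int]] = None   # genomic CDS span; None for ncRNAs


def region_in_transcript(pos: int, tx: Transcript) -> Optional[str]:
    """Classify one site against one transcript, or return None if outside it."""
    tx_start = min(s for s, _ in tx.exons)
    tx_end = max(e for _, e in tx.exons)
    if not tx_start <= pos < tx_end:
        return None
    if not any(s <= pos < e for s, e in tx.exons):
        return "intron"
    if tx.cds is None:
        return "exon"  # exonic position in a non-coding transcript
    cds_start, cds_end = tx.cds
    if cds_start <= pos < cds_end:
        return "CDS"
    # Upstream of the CDS in transcript orientation means 5' UTR.
    upstream = pos < cds_start if tx.strand == "+" else pos >= cds_end
    return "5'UTR" if upstream else "3'UTR"


# Hypothetical tie-breaking order when a site overlaps several transcripts.
PRIORITY = ("CDS", "3'UTR", "5'UTR", "exon", "intron")


def classify_site(pos: int, transcripts: List[Transcript]) -> str:
    """Return the highest-priority region over all transcripts, else intergenic."""
    hits = {region_in_transcript(pos, tx) for tx in transcripts}
    for region in PRIORITY:
        if region in hits:
            return region
    return "intergenic"
```

The strand check matters for UTRs: on a minus-strand gene, positions beyond the genomic CDS end are transcriptionally upstream and therefore 5′ UTR.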

Identification and annotation of m6A modification sites

To obtain high-resolution m6A modification sites, we predicted exact m6A positions within MeRIP-Seq or m6A-seq peaks by searching for the consensus DRACH motif (where D denotes A, G or U; R denotes A or G; and H denotes A, C or U), as described previously (17,33). All predicted m6A positions were annotated as described above.
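The DRACH search can be sketched with a regular expression. This sketch assumes a plus-strand peak, so the genomic coordinate of the central A is simply the peak start plus its offset; minus-strand peaks would need to be reverse-complemented and their coordinates mapped back, which is omitted here. The function name is hypothetical.

```python
import re
from typing import List

# DRACH in the DNA alphabet (U -> T): D = A/G/T, R = A/G, then A, C, H = A/C/T.
# The zero-width lookahead lets finditer report overlapping motifs.
DRACH = re.compile(r"(?=[AGT][AG]AC[ACT])")


def drach_m6a_positions(peak_seq: str, peak_start: int) -> List[int]:
    """0-based genomic positions of the central A of every DRACH motif in a
    plus-strand peak whose first base lies at peak_start."""
    seq = peak_seq.upper().replace("U", "T")
    # The candidate m6A is the third base of the 5-mer (offset 2).
    return [peak_start + m.start() + 2 for m in DRACH.finditer(seq)]
```

For example, `drach_m6a_positions("GGACAGACT", 0)` returns `[2, 6]`: the two motifs share a base, and a non-lookahead pattern would miss the second one.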

Identification of disease-related SNPs in modification sites

As described in our previous study (34), disease- or phenotype-associated SNPs were curated from published GWAS data provided by the NHGRI GWAS Catalog (35), Johnson and O'Donnell (36), dbGaP (37) and GAD (38). Additional SNPs in linkage disequilibrium (LD) with reported disease-related loci were selected by requiring an r2 value over 0.5 in at least one of the four populations (CEU, CHB, JPT and YRI) in the genotype data of the HapMap project (release 28) (39). For each SNP, the rs ID was lifted to dbSNP build 141 using the ‘RsMergeArch.bcp’ and ‘SNPHistory.bcp’ tables from dbSNP, and genomic coordinates were lifted to the hg19 assembly using the UCSC LiftOver tool. All disease-related SNPs and LD SNPs were then intersected with modification regions, defined as each modification site extended by 10 nt in both the 5′ and 3′ directions. This region size was chosen according to the binding length of modification enzymes (1,4); for example, fibrillarin (FBL, the methyltransferase) binds to complementary regions of at least 10 nt (40).
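The intersection of SNPs with modification regions (each site ±10 nt) can be sketched with binary search over sorted SNP positions. The function and argument names are hypothetical. Coordinates are assumed to be 0-based on a single assembly (hg19), and the window is treated as inclusive at both ends (21 nt in total), which is one reading of "extended by 10 nt in both directions".

```python
import bisect
from typing import Dict, List, Tuple

FLANK = 10  # nt added on each side of a modification site (value from the text)


def snps_in_mod_regions(
    sites: List[Tuple[str, int]],     # (chrom, 0-based site position)
    snps: Dict[str, List[int]],       # chrom -> SNP positions (0-based)
    flank: int = FLANK,
) -> List[Tuple[str, int, int]]:
    """Return (chrom, site, snp) for every SNP within [site - flank, site + flank]."""
    sorted_snps = {c: sorted(p) for c, p in snps.items()}
    hits = []
    for chrom, site in sites:
        positions = sorted_snps.get(chrom, [])
        # Indices of the first SNP >= window start and the first SNP > window end.
        lo = bisect.bisect_left(positions, site - flank)
        hi = bisect.bisect_right(positions, site + flank)
        hits.extend((chrom, site, p) for p in positions[lo:hi])
    return hits
```

Sorting once per chromosome keeps each lookup logarithmic, which matters when hundreds of thousands of SNPs are screened against every site.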

Association analysis of miRNA targets with RNA modification sites

All miRNA-target interactions for human and mouse were downloaded from our starBase platform (41,42). All miRNA target sites were intersected with RNA modification sites to identify modifications that may influence miRNA-target interaction.
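A sketch of the miRNA target intersection. It assumes that target sites are half-open genomic intervals (as in BED) and that a modification counts only when it lies on the same strand as the target site. Both of these are assumptions, as are the `TargetSite` record and the example modification IDs used below.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class TargetSite(NamedTuple):
    """Hypothetical record for a CLIP-supported miRNA target site."""
    mirna: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str


def mods_in_target_sites(
    mod_sites: Iterable[Tuple[str, int, str, str]],  # (chrom, pos, strand, mod_id)
    targets: Iterable[TargetSite],
) -> List[Tuple[str, str]]:
    """Return (miRNA, mod_id) pairs for modifications inside target sites."""
    grouped: Dict[Tuple[str, str], List[Tuple[int, str]]] = defaultdict(list)
    for chrom, pos, strand, mod_id in mod_sites:
        grouped[(chrom, strand)].append((pos, mod_id))
    # Sort each (chrom, strand) group and split it into parallel lists.
    positions, ids = {}, {}
    for key, sites in grouped.items():
        sites.sort()
        positions[key] = [p for p, _ in sites]
        ids[key] = [m for _, m in sites]
    hits = []
    for t in targets:
        key = (t.chrom, t.strand)
        pos = positions.get(key, [])
        lo = bisect.bisect_left(pos, t.start)
        hi = bisect.bisect_left(pos, t.end)  # end is exclusive
        hits.extend((t.mirna, ids[key][i]) for i in range(lo, hi))
    return hits
```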

Database and web interface implementation

All data sets were processed and stored in a MySQL database management system. The database query and user interfaces were developed using PHP and JavaScript. The query result tables are built with jQuery UI and DataTables, a highly flexible tool for sorting and filtering search results.
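The sketch below illustrates one way site records might be stored and queried for the per-type interfaces. It uses Python's built-in `sqlite3` as a stand-in for MySQL. The table name, the ten columns and the query are hypothetical and do not represent RMBase's actual schema. The 200-row default limit simply mirrors the display limit mentioned for modTool.

```python
import sqlite3

# Hypothetical schema; not RMBase's actual MySQL tables.
SCHEMA = """
CREATE TABLE mod_site (
    mod_id     TEXT PRIMARY KEY,   -- e.g. 'PseudoU_site_871'
    species    TEXT NOT NULL,      -- 'human', 'mouse' or 'yeast'
    mod_type   TEXT NOT NULL,      -- 'Psi', 'm5C', 'm6A', '2-O-Me' or 'other'
    chrom      TEXT NOT NULL,
    position   INTEGER NOT NULL,   -- 0-based genomic coordinate
    strand     TEXT NOT NULL,
    gene_name  TEXT,
    gene_type  TEXT,
    region     TEXT,
    n_support  INTEGER NOT NULL    -- number of supporting experiments
);
CREATE INDEX idx_type_species ON mod_site (mod_type, species);
"""


def sites_by_type(conn: sqlite3.Connection, mod_type: str, species: str, limit: int = 200):
    """Return sites of one type in one species, best-supported first."""
    cur = conn.execute(
        "SELECT mod_id, chrom, position, gene_name, n_support FROM mod_site "
        "WHERE mod_type = ? AND species = ? "
        "ORDER BY n_support DESC, chrom, position LIMIT ?",
        (mod_type, species, limit),
    )
    return cur.fetchall()
```

The composite index on `(mod_type, species)` matches the way the interfaces are organized, with one page per modification type and a species selector.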

RMBase genome browser

We constructed the RMBase Genome Browser to provide an integrated view of reference sequences, modification sequencing data, aligned sequencing reads, RNA modification sites, protein-coding genes, ncRNA genes and transcripts. The RMBase Browser is built on JBrowse (43), a fast genome browser with smooth scrolling and zooming.

DATABASE CONTENT AND WEB INTERFACE

The genome-wide landscape of various RNA modification types

We integrated 139 025 RNA modification sites generated by 18 independent studies to profile the genome-wide landscape of more than 100 types of RNA modifications (Table 1). To provide more useful information, we generated extensive annotations and analyses for all RNA modification sites. RMBase therefore shows that the number of sites per modification type ranges from a few to many thousands, and that different modification types have distinct distributions across genomic contexts.
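As a consistency check on the counts, the rows of Table 1 can be read as five columns per species (human: 4128, 680, 94 895, 901, 617; mouse: 3247, 97, 28 002, 66, 497; yeast: 2122, 211, 1306, 242, 2014). This column split is an interpretation of the flattened table rows. Summing the values in Python reproduces the 139 025 total stated here.

```python
# Table 1 counts, with the column split described above.
TABLE1 = {
    "human": {"Psi": 4128, "m5C": 680, "m6A": 94_895, "2'-O-Me": 901, "other": 617},
    "mouse": {"Psi": 3247, "m5C": 97, "m6A": 28_002, "2'-O-Me": 66, "other": 497},
    "yeast": {"Psi": 2122, "m5C": 211, "m6A": 1306, "2'-O-Me": 242, "other": 2014},
}


def column_totals(table):
    """Sum each modification type over all species."""
    totals = {}
    for counts in table.values():
        for mod_type, n in counts.items():
            totals[mod_type] = totals.get(mod_type, 0) + n
    return totals


totals = column_totals(TABLE1)
grand_total = sum(totals.values())
```

The column totals come to 9497 Ψ, 988 m5C, 124 203 m6A, 1209 2′-O-Me and 3128 other sites, with a grand total of 139 025. These figures are consistent with the approximate counts in the Abstract (∼9500, ∼1000, ∼124 200 and ∼1210).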

Annotating the association between RNA modifications and miRNA target sites

To help users investigate the association between RNA modifications and miRNA target sites, we collected all CLIP-Seq-supported miRNA target sites from the starBase database (41) and associated them with all RNA modification sites in RMBase. RMBase allows users to retrieve all RNA modification sites located within the miRNA binding sites reported so far.

Predicting GWAS-associated modification sites

Although GWAS have revealed a large number of genetic variants related to diseases or phenotypes, a considerable portion of these loci remain functionally unexplained (44). To help users explore whether some modifications may underlie diseases or phenotypes, we collected a total of 87 677 unique disease-related SNPs from four public GWAS data sources. In addition, we performed LD analysis to extract SNPs in strong LD with disease-related SNPs, using a threshold of r2 > 0.5 in at least one population from the HapMap CEU, CHB, JPT and YRI genotype data, which yielded a total of 895 968 disease-related or LD SNPs (34). By comparing the genomic coordinates of these SNPs with all modification sites in human, RMBase can be used to identify disease-related SNPs that map to modification sites.
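The LD expansion step can be sketched as a filter over pairwise LD records. The `(population, snp_a, snp_b, r2)` record layout is a hypothetical input format; HapMap LD data would first need to be parsed into it. The rule itself follows the text: a SNP is selected if it has r2 > 0.5 with a disease SNP in at least one of CEU, CHB, JPT or YRI. Only direct LD partners of disease SNPs are added, with no transitive expansion.

```python
from typing import Iterable, Set, Tuple

POPULATIONS = ("CEU", "CHB", "JPT", "YRI")


def select_ld_snps(
    disease_snps: Set[str],
    ld_pairs: Iterable[Tuple[str, str, str, float]],  # (population, snp_a, snp_b, r2)
    r2_min: float = 0.5,
) -> Set[str]:
    """Disease SNPs plus every SNP with r2 > r2_min to one of them in >= 1 population."""
    selected = set(disease_snps)
    for pop, a, b, r2 in ld_pairs:
        if pop not in POPULATIONS or r2 <= r2_min:
            continue
        # Add the partner only if the other SNP is an original disease SNP.
        if a in disease_snps:
            selected.add(b)
        elif b in disease_snps:
            selected.add(a)
    return selected
```

Note that the threshold is strict (r2 > 0.5), so a pair with r2 of exactly 0.5 is not selected.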

The web-based exploration of different types of RNA modification sites

We provide five web interfaces (Pseudouridine/Ψ, m6A, m5C, 2′-O-Me and otherType) that display RNA modification sites of the various modification types. For each modification type, users can select a species on the query page. On the result page, the basic information on modification sites is displayed in a data table with 10 fields describing each site. For each interface, the number of RNA modification sites is shown in the bottom-left corner of the table. Users can click a column header to sort the sites by features such as chromosome, genomic position, number of supporting experiments, modId, gene name or gene type, and can enter a keyword in the search box to filter the results. Clicking a modId in the table opens a detail page with further information about that modification site. This information includes a description of the site, the list of supporting experiments and the sequence extended by 20 nt in both the 5′ and 3′ directions. The ‘PubMed ID’ section links to the primary articles that reported the modification data; clicking an ID opens the corresponding NCBI PubMed page. The modSNP and modMirTar interfaces are organized in the same way and additionally show disease-related SNP and miRNA-target interaction information. Users can therefore explore the relationships between modification sites and SNPs or miRNA target sites in a similar way.
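The ±20 nt sequence shown on each detail page could be extracted as sketched below. The function name and the choice to return DNA letters are assumptions. For a minus-strand site, the window is reverse-complemented so that it reads 5′→3′ on the RNA. Windows are clipped at chromosome ends.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def flanking_sequence(chrom_seq: str, pos: int, strand: str, flank: int = 20) -> str:
    """Return the site at 0-based pos plus `flank` nt on each side, on the site's strand."""
    start = max(0, pos - flank)                   # clip at the chromosome start
    end = min(len(chrom_seq), pos + flank + 1)    # clip at the chromosome end
    window = chrom_seq[start:end].upper()
    if strand == "-":
        # Reverse complement so the window reads 5'->3' on the minus-strand RNA.
        window = window.translate(COMPLEMENT)[::-1]
    return window
```

With the default flank the window is 41 nt and the modified base sits at its center, unless the window has been clipped at a chromosome end.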

Visualization of various modification sequencing data using the RMBase genome browser

To facilitate visualization of the various modification sequencing data sets and exploration of RNA modification sites, we provide the RMBase genome browser, built on JBrowse (43) (Figure 2). On the query page of the browser, users can enter a genomic region or gene name of interest in the ‘search term’ box and select the corresponding genome assembly. This gives an integrated view of various genomic features, including RNA modification sites, reads aligned from modification sequencing data and gene annotations from Ensembl. Figure 2 illustrates the genomic context of the ‘PseudoU_site_871’ modification site within the MALAT1 lncRNA, as displayed in the RMBase Browser. Users can click the ‘+’ or ‘−’ button at the top to zoom in or out around the center of the annotation track window. Clicking the ‘Select Tracks’ button in the upper-left corner opens the track selection panel, where users can choose modification data sets derived from different cell lines or treatments. To explore RNA modification sites on a particular gene, users can type its gene symbol in the position textbox and click the ‘GO’ button. The display then updates to show which modification sites are located within the gene.
Figure 2.

Illustrative screenshots from the RMBase genome browser, which provides an integrated view of modification sequencing data, aligned sequencing reads, RNA modification sites, protein-coding genes and ncRNA genes.


Associating other data with modification sites using web-based modTool server

We provide the web-based modTool, which offers a simple, user-friendly interface for annotating modification sites within genomic regions uploaded by the user. The user selects the organism of interest and uploads genomic regions in the browser extensible data (BED) format. After submission, a typical modTool run takes from a few seconds to a few minutes. The output is mainly a data table with 10 fields describing each hit. These fields include the query name, the genomic position of the modification, the modification type, the number of supporting experiments or studies, the gene name, the gene type (e.g. protein-coding or ncRNA) and the region within the gene (CDS, 3′ UTR, exon, 5′ UTR, intron or intergenic). Users can reorder the columns of the result table to view and compare the data in a customized layout, and can narrow down the results with a keyword search. Only 200 hits are displayed in the table; users can obtain all results in text format by clicking the ‘export’ button.
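modTool's core operation, intersecting user-supplied BED regions with stored modification sites, can be sketched as follows. BED coordinates are 0-based with an exclusive end, and header lines starting with `track` or `browser` are skipped. The function names, the in-memory `sites` index and the fallback query name are hypothetical. The real tool also reports gene-level annotations, which are omitted here.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


def parse_bed(text: str) -> List[Tuple[str, int, int, str]]:
    """Parse tab-delimited BED lines into (chrom, start, end, name)."""
    regions = []
    for line_no, line in enumerate(text.splitlines(), 1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: BED needs at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use the optional 4th column as the query name, else a coordinate label.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        regions.append((chrom, start, end, name))
    return regions


def annotate_regions(
    regions: List[Tuple[str, int, int, str]],
    sites: Dict[str, List[Tuple[int, str]]],   # chrom -> sorted (position, mod_type)
) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (query name, chrom, position, mod_type) for sites inside each region."""
    for chrom, start, end, name in regions:
        chrom_sites = sites.get(chrom, [])
        # (start,) sorts before any (start, mod_type), giving the first pos >= start.
        lo = bisect.bisect_left(chrom_sites, (start,))
        for pos, mod_type in chrom_sites[lo:]:
            if pos >= end:
                break
            yield name, chrom, pos, mod_type
```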

DISCUSSION AND CONCLUSIONS

By integrating a large set of RNA modification sites derived from all available high-throughput modification sequencing methods (Pseudo-seq, CeU-seq, Aza-IP, MeRIP-Seq, m6A-seq and RiboMeth-seq) and from public resources, RMBase reveals extensive post-transcriptional modification of RNA in mammals and yeast. Other databases related to RNA modifications, such as MODOMICS (26), RNAMDB (45) and MeT-DB (46), either collect modification sites identified by traditional experimental methods or contain only one modification type. Compared with these databases, RMBase offers the following advances: (i) RMBase provides annotation and analysis of public modification sequencing data generated by Pseudo-seq, CeU-seq, Aza-IP, m6A-seq and RiboMeth-seq, the newest high-throughput technologies for the transcriptome-wide identification of RNA modification sites in both animals and plants. (ii) RMBase provides the genome-wide landscape of pseudouridine (Ψ), m5C and 2′-O-Me modifications. (iii) RMBase provides the genomic coordinates of all modification sites, which makes it easy for computational and experimental biologists to correlate their results with the modification sites deposited in RMBase. (iv) RMBase allows combined analysis of RNA modification data and GWAS data, which identified hundreds of disease-related SNPs residing in modification sites. These results may help reveal the causes and mechanisms of diseases or phenotypes identified in GWAS. (v) RMBase illustrates the relationships between RNA modification sites and miRNA target sites. (vi) The RMBase genome browser gives a quick overview of a particular genomic region and allows various types of features to be correlated visually (Figure 2). It provides an integrated view of modification sequencing data, RNA modification sites, protein-coding genes and ncRNA genes (Figure 2).
(vii) RMBase provides comprehensive annotation of various types of RNA modifications (Figure 1) and a new web-based tool, modTool, to annotate modification sites in genomic regions uploaded by users. (viii) RMBase provides a variety of interfaces and graphic visualizations to facilitate analysis of massive and heterogeneous modification data from normal tissues and cancer cells. Overall, RMBase allows integrative analysis of diverse high-throughput modification data and provides insights into the epigenetic regulation of the transcriptome. As genome-wide high-throughput sequencing data for RNA modifications become increasingly available, RMBase will help researchers further investigate these data and discover the potential functional roles of RNA modifications hidden in them.

FUTURE DIRECTIONS

With the development of new high-throughput modification sequencing methods, increasing amounts of single-nucleotide-resolution modification sequencing data will become available. We have built an automatic pipeline that runs on our high-performance computing servers. It annotates, analyzes and merges all high-throughput modification data sets and then imports them into our local MySQL database. We will maintain the database and update it every two months or whenever new high-throughput modification data sets are released in public databases. We will also continue to expand storage capacity and improve server performance for storing and analyzing new data, and will develop or integrate new tools to decode the landscape of RNA modifications.

AVAILABILITY

RMBase is freely available at http://mirlab.sysu.edu.cn/rmbase/. The RMBase data files can be downloaded and used in accordance with the GNU General Public License and the licenses of the primary data sources.
REFERENCES (46 in total)

1.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors:  Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-27       Impact factor: 11.205

2.  High-resolution N6-methyladenosine (m6A) map using photo-crosslinking-assisted m6A sequencing.

Authors:  Kai Chen; Zhike Lu; Xiao Wang; Ye Fu; Guan-Zheng Luo; Nian Liu; Dali Han; Dan Dominissini; Qing Dai; Tao Pan; Chuan He
Journal:  Angew Chem Int Ed Engl       Date:  2014-12-09       Impact factor: 15.336

3.  starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data.

Authors:  Jian-Hua Yang; Jun-Hao Li; Peng Shao; Hui Zhou; Yue-Qin Chen; Liang-Hu Qu
Journal:  Nucleic Acids Res       Date:  2010-10-30       Impact factor: 16.971

4.  The RNA Modification Database, RNAMDB: 2011 update.

Authors:  William A Cantara; Pamela F Crain; Jef Rozenski; James A McCloskey; Kimberly A Harris; Xiaonong Zhang; Franck A P Vendeix; Daniele Fabris; Paul F Agris
Journal:  Nucleic Acids Res       Date:  2010-11-10       Impact factor: 16.971

5.  snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome.

Authors:  Jian-Hua Yang; Xiao-Chen Zhang; Zhan-Peng Huang; Hui Zhou; Mian-Bo Huang; Shu Zhang; Yue-Qin Chen; Liang-Hu Qu
Journal:  Nucleic Acids Res       Date:  2006-09-20       Impact factor: 16.971

6.  Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome.

Authors:  Bastian Linder; Anya V Grozhik; Anthony O Olarerin-George; Cem Meydan; Christopher E Mason; Samie R Jaffrey
Journal:  Nat Methods       Date:  2015-06-29       Impact factor: 28.547

7.  Mapping recently identified nucleotide variants in the genome and transcriptome.

Authors:  Chun-Xiao Song; Chengqi Yi; Chuan He
Journal:  Nat Biotechnol       Date:  2012-11       Impact factor: 54.908

8.  starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data.

Authors:  Jun-Hao Li; Shun Liu; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

9.  Discovery of Protein-lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets.

Authors:  Jun-Hao Li; Shun Liu; Ling-Ling Zheng; Jie Wu; Wen-Ju Sun; Ze-Lin Wang; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal:  Front Bioeng Biotechnol       Date:  2015-01-14

10.  Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq.

Authors:  Dan Dominissini; Sharon Moshitch-Moshkovitz; Schraga Schwartz; Mali Salmon-Divon; Lior Ungar; Sivan Osenberg; Karen Cesarkas; Jasmine Jacob-Hirsch; Ninette Amariglio; Martin Kupiec; Rotem Sorek; Gideon Rechavi
Journal:  Nature       Date:  2012-04-29       Impact factor: 49.962
