Recent Advances in Identification of RNA Modifications.

Abstract

RNA modifications are involved in a broad spectrum of biological and physiological processes. To reveal the functions of RNA modifications, it is important to identify their positions accurately. Although high-throughput experimental techniques have been proposed, they are costly. As complements to experiments, many computational methods have been proposed in recent years to predict RNA modification sites. In this review, we summarize the existing computational approaches for predicting RNA modification sites. We also discuss the challenges and future perspectives in developing reliable methods for predicting RNA modification sites.

Keywords: N1-methyladenosine; N6-methyladenosine; RNA modification; pseudouridine

Year: 2016 PMID: 29657273 PMCID: PMC5831996 DOI: 10.3390/ncrna3010001

Source DB: PubMed Journal: Noncoding RNA ISSN: 2311-553X

1. Introduction

Since the first RNA modification was discovered 60 years ago [1], more than 100 kinds of RNA modifications have been reported in different RNA species [2]. RNA modifications participate in various biological activities [3,4,5,6,7,8,9,10,11]. Knowledge of their precise positions in the transcriptome is therefore important for understanding the mechanisms and functions of these post-transcriptional modifications. Owing to the lack of effective detection methods, studies on RNA modifications were hindered for a long time. Recent advances in next-generation sequencing technology have opened the door to detecting RNA modifications. Using high-throughput sequencing methods [12,13,14,15,16,17,18], RNA modifications have been detected in various RNA species. The details of these experimental techniques have been summarized in a recent review [19]. These experimental methods have played key roles in advancing research on the biological functions of RNA modifications. However, because experiments are labor-intensive, the gap between the number of transcriptomes from different cell lines or organisms and the number of known modification sites is widening rapidly. Hence, the biological community urgently needs computational methods that accurately predict post-transcriptional modification sites from sequence information. A high-quality dataset is the primary requirement for developing machine learning models. Although various high-throughput experimental techniques have been developed to detect diverse RNA modifications, the generated data remained scattered and inconvenient for the scientific community to use until RMBase appeared [20].
By collecting and integrating experimental data from high-throughput modification sequencing methods, RMBase provides information on N6-methyladenosine (m6A), pseudouridine (ψ), 5-methylcytosine (m5C), and other types of RNA modifications [20], which is invaluable for developing computational models. Based on high-throughput experimental data and RMBase, a host of computational approaches for identifying RNA modifications have been developed in the past four years [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36]. This review summarizes the representative computational approaches for identifying the RNA modifications that have been mapped transcriptome-wide, i.e., m6A, ψ, and N1-methyladenosine (m1A). Current challenges facing the computational prediction of RNA modifications and future perspectives are also discussed.

2. Computational Models for N6-methyladenosine (m6A)

m6A is methylation at the N6 position of adenosine within the consensus sequence motif RRACH (R is a purine, A or G; H is A, C, or U) [37,38]. As a dynamic chemical modification, m6A is catalyzed by a multicomponent methyltransferase complex comprising methyltransferase-like 3 (METTL3), methyltransferase-like 14 (METTL14), and Wilms' tumor 1-associating protein (WTAP), and is reversed by the demethylases fat mass and obesity-associated protein (FTO) and alkylation repair homolog 5 (ALKBH5) [39,40]. Although a broad set of its biological functions has been revealed, how m6A controls these processes remains a major open question. Knowledge of the positions of m6A sites will therefore help in understanding its mechanisms and functions. Based on m6A-seq data, Schwartz et al. proposed the first computational model to predict m6A sites in the yeast transcriptome [38]. Although this work advanced the computational prediction of the m6A distribution, no web server was provided for the method. In 2014, Chen et al. established two efficient computational models to predict m6A sites in yeast, namely iRNA-Methyl [23] and m6Apred [29]. m6Apred considers not only the accumulated frequency of nucleotides surrounding m6A sites but also nucleotide chemical properties. For the convenience of biologists, a web server for m6Apred is provided at http://lin.uestc.edu.cn/server/m6Apred.php; it was the first free online tool for predicting m6A sites. Later, iRNA-Methyl, which encodes RNA sequences using pseudo nucleotide composition [41,42], was developed and is freely accessible at http://lin.uestc.edu.cn/server/iRNA-Methyl. Both methods have served as useful starting points for developing computational tools for predicting m6A sites.
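
Because m6A occurs within the RRACH consensus, candidate adenosines are often enumerated before any classifier scores them. The following is a minimal Python sketch under that assumption; the function name `find_rrach_sites` is hypothetical and this is not the candidate-selection procedure of any specific tool described here. It reports the zero-based position of the central A of every RRACH match, using a regex lookahead so that overlapping motifs are not missed.

```python
import re

# RRACH with IUPAC codes expanded: R = A or G, H = A, C or U.
# The zero-width lookahead lets finditer report overlapping motifs.
RRACH = re.compile(r"(?=[AG][AG]AC[ACU])")


def find_rrach_sites(rna: str) -> list[int]:
    """Return zero-based positions of the central (candidate m6A) adenosine
    of every RRACH motif in `rna`. T is read as U so DNA-style input works."""
    rna = rna.upper().replace("T", "U")
    # The modified A is the third letter of the motif, i.e. offset 2.
    return [m.start() + 2 for m in RRACH.finditer(rna)]
```

For example, `find_rrach_sites("GGACAGACU")` yields `[2, 6]`: the two motifs share the A at position 4, and a plain (non-lookahead) pattern would report only the first.
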
Inspired by Chen et al.'s work [23,29], Zhou and colleagues developed a random forest-based method called SRAMP (http://www.cuilab.cn/sramp) to predict m6A sites [31], in which both sequence information and RNA secondary structure were used to encode RNA sequences. The overall performance of SRAMP is comparable to that of iRNA-Methyl and m6Apred. In addition, SRAMP can predict m6A sites not only in yeast but also in human and mouse transcriptomes, an advantage over both m6Apred and iRNA-Methyl. Taking advantage of RMBase [20], Chen et al. proposed MethyRNA (http://lin.uestc.edu.cn/server/methyrna) to predict m6A sites in both human and mouse transcriptomes [28], in which RNA sequences are encoded using nucleotide accumulated frequency and chemical properties. MethyRNA achieved accuracies of 90.38% and 88.39% for human and mouse, respectively [28]. Because no computational tool for predicting m6A sites in plants was available until 2016, an online tool called M6ATH, using the same encoding scheme as MethyRNA [28], was developed to predict m6A sites in the Arabidopsis thaliana transcriptome [26]. The M6ATH web server is available at http://lin.uestc.edu.cn/server/M6ATH.
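
The encoding shared by m6Apred, MethyRNA, and M6ATH combines nucleotide chemical properties with the accumulated frequency of each nucleotide. A minimal Python sketch follows, assuming three binary chemical properties per nucleotide (purine ring, amino group, weak hydrogen bonding, each set to 1) and a density defined as the count of the current nucleotide in the prefix ending at that position divided by the prefix length. Both the 0/1 assignments and the name `encode_chem_density` are assumptions for illustration; consult the original papers for the exact definitions.

```python
# Assumed chemical-property bits per nucleotide:
# (purine ring = 1, amino group = 1, weak hydrogen bonding = 1)
CHEM = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_chem_density(rna: str) -> list[float]:
    """Encode an RNA window as 4 numbers per position: the three assumed
    chemical-property bits, then the accumulated frequency (density) of
    that nucleotide in the prefix ending at the position."""
    rna = rna.upper().replace("T", "U")
    counts = {nt: 0 for nt in CHEM}
    features: list[float] = []
    for i, nt in enumerate(rna, start=1):
        if nt not in CHEM:
            raise ValueError(f"unexpected symbol {nt!r} at position {i}")
        counts[nt] += 1
        features.extend(CHEM[nt])
        features.append(counts[nt] / i)  # density = occurrences in rna[:i] / i
    return features
```

A window of L nucleotides therefore yields a 4L-dimensional feature vector; for "AGA" the third position contributes (1, 1, 1, 2/3), because A has occurred twice in the first three nucleotides.
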

3. Computational Models for Pseudouridine (ψ)

ψ is an isomer of uridine [43] that has been found in rRNAs, tRNAs, snoRNAs, and mRNAs [11]. Unlike m6A, ψ is not reversible [43]. In 2015, Li and colleagues developed the first ψ site predictor, PPUS [32], which is available at http://lyh.pkmu.cn/ppus/. In PPUS, each RNA sequence sample was formulated using the orthogonal binary coding scheme [32]: the four canonical nucleotides A, C, G, and U and the dummy nucleotide X were encoded as {1,0,0,0,0}, {0,1,0,0,0}, {0,0,1,0,0}, {0,0,0,1,0}, and {0,0,0,0,1}, respectively. PPUS achieved considerable accuracy for predicting ψ sites in fivefold cross-validation tests on both human and yeast transcriptomes. However, its accuracy still needed to be improved. To predict ψ sites with higher success rates and to cover more species, Chen et al. developed a more powerful predictor called iRNA-PseU [27], which is available at http://lin.uestc.edu.cn/server/iRNA-PseU. iRNA-PseU is trained on data collected from RMBase [20], and it also encodes RNA sequences using nucleotide accumulated frequency and chemical properties. iRNA-PseU can predict ψ sites not only in human and yeast but also in the mouse transcriptome. In addition, iRNA-PseU outperformed PPUS on the same independent dataset, suggesting that iRNA-PseU can complement the existing PPUS for predicting ψ sites.

4. Computational Model for N1-Methyladenosine (m1A)

m1A was first discovered in RNA 50 years ago [44]. However, research on m1A has lagged behind owing to the lack of effective methods for detecting m1A sites. Recently, two high-throughput experimental techniques, MeRIP-seq [12] and m1A-ID-seq [13], have been developed. These methods have advanced research on m1A, but their resolution is not fully satisfactory, as they cannot pinpoint which adenosine residue is modified. Therefore, new methods for studying the distribution of m1A sites are needed. The resulting experimental data nonetheless provide an unprecedented opportunity for building such methods. Based on data from the MeRIP-seq and m1A-ID-seq experiments, Chen et al. proposed the RAMPred server for predicting m1A sites [22], the first computational tool for this task, which is freely accessible at http://lin.uestc.edu.cn/server/RAMPred. RAMPred can predict m1A sites in human, mouse, and yeast transcriptomes [22]. RAMPred achieved promising performance in rigorous jackknife tests and cross-cell-line tests, suggesting that it has strong potential as a useful tool for predicting m1A sites.
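
The jackknife test mentioned above is leave-one-out validation: each sample is predicted by a model trained on all the others, and the predictions are summarized with sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC). The Python sketch below assumes binary labels (1 = modified, 0 = unmodified) and numeric feature vectors from any sequence encoding. The nearest-centroid classifier is a hypothetical placeholder chosen to keep the example self-contained; it is not the model used by RAMPred.

```python
import math


def nearest_centroid(X, y):
    """Hypothetical stand-in classifier: predict the class whose mean feature
    vector is closest in squared Euclidean distance. Assumes both classes
    are present in the training data."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c))
            for label, c in centroids.items()
        }
        return min(dist, key=dist.get)

    return predict


def jackknife(X, y, fit=nearest_centroid):
    """Leave-one-out (jackknife) test: every sample is predicted by a model
    trained on all remaining samples. Returns counts (TP, TN, FP, FN)."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred, true = predict(X[i]), y[i]
        if pred == 1 and true == 1:
            tp += 1
        elif pred == 0 and true == 0:
            tn += 1
        elif pred == 1:
            fp += 1  # predicted modified, actually unmodified
        else:
            fn += 1  # predicted unmodified, actually modified
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

Any other fitting function with the same `fit(X, y) -> predict` shape can be passed in. Because the model is retrained once per sample, the jackknife is deterministic but becomes expensive on large datasets, which is one reason k-fold cross-validation (such as the fivefold tests reported for PPUS) is also common.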

5. Current Challenges and Future Directions

As a new layer of gene regulation, RNA modifications participate in diverse biological processes, ranging from regulating RNA splicing and inducing RNA decay to altering the genetic code. Although high-throughput sequencing methods have been proposed to detect RNA modification sites, most of them are still costly and cannot precisely pinpoint which nucleotide is chemically modified. In the past four years, computational approaches have provided useful strategies for efficiently identifying RNA modification sites. Although computational methods have made impressive progress, some challenges remain to be addressed in future work. Because most existing approaches are based on sequence context, they cannot accurately predict changes in the position and stoichiometry of RNA modifications. As an epigenetic modification, RNA methylation is also a complicated process, and besides sequence context and nucleotide chemical properties, other factors may help identify RNA modification sites. Recently, Patil et al. reported that binding sites of RNA-binding motif protein 15 (RBM15) and its paralogue RBM15B lie near methylated m6A sites within DRACH consensus motifs (D is A, G, or U), whereas the neighborhoods of non-methylated DRACH motifs are poor in RBM15/15B binding sites [5]. These results suggest that the consensus motifs surrounding m6A sites may be important for identifying m6A sites. This was supported by our recent study [45], in which integrating the consensus motif into the computational model improved the predictive performance for identifying m6A sites. We believe that consensus motifs will also be helpful for identifying other kinds of RNA modifications. Therefore, to improve the identification of RNA modification sites, future models should combine all of the above-mentioned factors.
As the accuracy and predictive power of computational approaches for identifying RNA modification sites improve, they are expected to provide valuable avenues for understanding the biological functions of RNA modifications. By analogy with the regulation of gene expression by histone modifications, it is also possible that a combination of different types of RNA modifications mediates biological functions [46]. Thus, a platform is needed that can handle systems simultaneously containing several different types of RNA modifications.

47 in total

1. The emerging epitranscriptomics of long noncoding RNAs.

Authors: Andrew Shafik; Ulrike Schumann; Maurits Evers; Tennille Sibbritt; Thomas Preiss
Journal: Biochim Biophys Acta Date: 2015-11-02

2. Transcriptome-wide mapping reveals reversible and dynamic N1-methyladenosine methylome.

Authors: Xiaoyu Li; Xushen Xiong; Kun Wang; Lixia Wang; Xiaoting Shu; Shiqing Ma; Chengqi Yi
Journal: Nat Chem Biol Date: 2016-02-10

3. Identifying N6-methyladenosine sites in the Arabidopsis thaliana transcriptome.

Authors: Wei Chen; Pengmian Feng; Hui Ding; Hao Lin
Journal: Mol Genet Genomics Date: 2016-09-02

4. High-resolution N6-methyladenosine (m6A) map using photo-crosslinking-assisted m6A sequencing.

Authors: Kai Chen; Zhike Lu; Xiao Wang; Ye Fu; Guan-Zheng Luo; Nian Liu; Dali Han; Dan Dominissini; Qing Dai; Tao Pan; Chuan He
Journal: Angew Chem Int Ed Engl Date: 2014-12-09

5. PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition.

Authors: Wei Chen; Tian-Yu Lei; Dian-Chuan Jin; Hao Lin; Kuo-Chen Chou
Journal: Anal Biochem Date: 2014-04-13

6. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome.

Authors: Bastian Linder; Anya V Grozhik; Anthony O Olarerin-George; Cem Meydan; Christopher E Mason; Samie R Jaffrey
Journal: Nat Methods Date: 2015-06-29

7. Prediction of uridine modifications in tRNA sequences.

Authors: Bharat Panwar; Gajendra P S Raghava
Journal: BMC Bioinformatics Date: 2014-10-02

8. iRNA-PseU: Identifying RNA pseudouridine sites.

Authors: Wei Chen; Hua Tang; Jing Ye; Hao Lin; Kuo-Chen Chou
Journal: Mol Ther Nucleic Acids Date: 2016

9. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences.

Authors: Wei Chen; Pengmian Feng; Hui Yang; Hui Ding; Hao Lin; Kuo-Chen Chou
Journal: Oncotarget Date: 2017-01-17

10. High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis.

Authors: Schraga Schwartz; Sudeep D Agarwala; Maxwell R Mumbach; Marko Jovanovic; Philipp Mertins; Alexander Shishkin; Yuval Tabach; Tarjei S Mikkelsen; Rahul Satija; Gary Ruvkun; Steven A Carr; Eric S Lander; Gerald R Fink; Aviv Regev
Journal: Cell Date: 2013-11-21
