Rsite: a computational method to identify the functional sites of noncoding RNAs.

Pan Zeng¹, Jianwei Li², Wei Ma¹, Qinghua Cui¹

Abstract

There is an increasing demand for identifying the functional sites of noncoding RNAs (ncRNAs). Here we introduce Rsite, a computational approach based on tertiary structure. Rsite first calculates the Euclidean distances between each nucleotide and all other nucleotides in an RNA molecule, and then identifies the nucleotides at the extreme points of the resulting distance curve as functional sites. We demonstrated the efficiency of Rsite by analyzing two ncRNAs, tRNA (Lys) and the Diels-Alder ribozyme. Rsite recognized all of the known functional sites of both ncRNAs, suggesting that it could be a useful tool for discovering the functional sites of ncRNAs. The source code and data sets of Rsite are available at http://www.cuilab.cn/rsite.

Year: 2015; PMID: 25776805; PMCID: PMC4361870; DOI: 10.1038/srep09179

Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379


RNA molecules are critical for many cellular processes. Besides protein-coding RNAs (mRNAs), high-throughput transcriptome technologies have recently revealed large sets of noncoding RNAs (ncRNAs) [1,2], which do not code for proteins but function directly at the RNA level. Increasing evidence shows that ncRNAs have important and diverse functions [3]. It is therefore not surprising that ncRNA dysfunction is associated with a wide spectrum of diseases [4–6]. According to the human microRNA (miRNA) disease database (HMDD, http://www.cuilab.cn/hmdd) and the lncRNA disease database (LncRNADisease, http://www.cuilab.cn/lncrnadisease), ~400 and ~200 diseases have already been reported to be associated with miRNAs and long ncRNAs (lncRNAs), respectively. ncRNAs are thus becoming a large class of novel molecules for disease diagnosis and therapy. Given the rapidly growing number of known ncRNAs, identifying their functional sites has become an important and urgent task in ncRNA research, yet methods for doing so remain scarce. Several pioneering experimental methods have recently been developed for this purpose, such as SHAPE-MaP (SHAPE and mutational profiling) [7], RNase footprinting [8,9], and in-line probing [10,11]. Although these methods have greatly helped RNA research, they have inherent limitations. First, the experiments are generally complex, and thus time-consuming and costly. Second, they work at the level of the RNA backbone or RNA domains, which makes it difficult to identify functional sites at single-base resolution. Computational methods would therefore be an important complement to these experimental methods. However, no computational method for this purpose has been developed so far.
For proteins, which are far better studied than ncRNAs, a number of computational methods already exist to predict functional sites, including methods based on machine learning [12], sequence conservation [13], and tertiary structure [14]. These methods provide clues for predicting ncRNA functional sites. Machine learning based and sequence conservation based methods, however, require prior knowledge of functional sites. Because ncRNAs are a new research field, few ncRNA functional sites are known. Moreover, ncRNAs normally show low sequence conservation. In addition, conserved bases are not necessarily functional, and divergent bases are not necessarily non-functional. For example, bases of an RNA that diverge across species could be functional sites that reflect the functional divergence of that RNA among species. These methods are therefore currently not feasible for identifying ncRNA functional sites. Tertiary structure based methods fall into two main categories: those that predict functional sites from structural similarity to proteins with known functional sites, and those that predict them from structural features such as geometry or electrostatics [15]. The first category also requires datasets of known functional sites, and is thus impracticable for ncRNAs because no resource of ncRNA functional sites is yet available. In geometry based methods, residues with high closeness centrality are normally considered to be involved in function [14]; for example, catalytic residues have been reported to lie close to the molecular centroid [16]. Notably, geometry based methods do not depend on known functional sites. Given how few ncRNA functional sites are known, geometry based methods therefore appear feasible for predicting them.
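For comparison with the geometry based protein methods mentioned above, closeness centrality in a residue contact network can be computed as follows. This is a minimal sketch, not the procedure used in [14]: it assumes one coordinate per residue, links residues closer than a hypothetical 7 Å cutoff, and defines closeness as (reachable residues - 1) divided by the sum of shortest-path lengths, restricted to the reachable part of the graph. The function names are illustrative.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(coords: Sequence[Coord], cutoff: float = 7.0) -> List[List[int]]:
    """Adjacency lists: residues i and j are linked if within `cutoff` (hypothetical 7 A)."""
    n = len(coords)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness(adj: List[List[int]]) -> List[float]:
    """Closeness centrality from breadth-first search (unweighted shortest paths)."""
    scores = []
    for source in range(len(adj)):
        hops = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in hops:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        total = sum(hops.values())
        # Isolated residues score 0; otherwise (reachable - 1) / total path length.
        scores.append((len(hops) - 1) / total if total else 0.0)
    return scores
```

Rsite's summed distance D(i), defined in Methods, plays a related role without building a graph: nucleotides near the centre of the molecule have a small D(i), that is, high geometric closeness.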
Moreover, in addition to central residues, surface-exposed residues can also be functional sites of molecules [17]. Taken together, we hypothesized that both the most connected and the least connected nucleotides in an ncRNA molecule tend to be functionally important, that is, that they are putative functional sites. Based on these observations, we present Rsite, a computational method that identifies ncRNA functional sites from ncRNA geometry. The results show that Rsite achieves reliable accuracy, suggesting that it could be a useful tool for identifying ncRNA functional sites.

Results

To validate the accuracy and efficiency of Rsite, we applied it to two well-studied ncRNAs with known functional sites: tRNA (Lys) and the Diels-Alder ribozyme.

The functional sites of the tRNA (Lys)

We first applied Rsite to predict the functional sites of tRNA (Lys). Rsite calculated the distance curve of tRNA (Lys) (Figure 1) and then identified its putative functional sites. All 7 known functional sites were successfully predicted (Table 1; Figure 2), giving a sensitivity of 100% (7/7). Of the 11 predicted functional sites, 4 had not been reported as functional sites, suggesting that Rsite has a low false positive rate.
Figure 1

The raw nucleotide distance curve (A) and the smoothed nucleotide distance curve (B) of tRNA (Lys).

Table 1

Known functional sites (FSs) and Rsite predictions for tRNA (Lys,3) (PDB ID: 1FIR)

Site No | Known FS | Predicted FS | Description
1 | 1–4 | 1, 2 | Within acceptor stem (5' end); interacting with Ran; contacting Xpot; involved in recognition by RNase Z and RNase P; recognized by aminoacyl-tRNA synthetase
2 | 13–22 | 13, 17, 19, 20, 22 | DHU loop; interacting with the mRNA-ribosome complex; contacting Xpot; involved in recognition by RNase P
3 | 34–36 | 35 | Anticodon; decoding mRNA codon; recognized by aminoacyl-tRNA synthetase; interacting with the mRNA-ribosome complex
4 | 49–51 | 49 | Within TψC stem; binding site of elongation factor
5 | 53–61 | 56, 60 | TψC loop; interacting with the mRNA-ribosome complex; contacting Xpot; involved in recognition by RNase Z and RNase P; processed by a tRNA ψ55 pseudouridine synthase; affects 3' end processing and tRNA structure
6 | 63–65 | 64 | Within TψC stem; binding site of elongation factor; interacting with Ran; involved in recognition by RNase P
7 | 72–76 | 76 | Aminoacylation site (3' end); recognized by aminoacyl-tRNA synthetase; interacting with the mRNA-ribosome complex (7); contacting Xpot (1); involved in recognition by RNase Z; processed by a CCA-adding enzyme

Figure 2

Graphical representation of the tRNA (Lys) functional sites identified by Rsite.

Figure 2A and 2B show the secondary and tertiary structures of tRNA (Lys), respectively. Nucleotides in red are predicted functional sites that coincide with known functional sites; nucleotides in blue are predicted sites that have not been reported as functional sites.

The functional sites of the Diels-Alder ribozyme

We also applied Rsite to predict the functional sites of the Diels-Alder ribozyme. Rsite calculated the distance curve of the ribozyme (Figure 3) and then identified its putative functional sites. Rsite successfully predicted all 3 known functional sites (Table 2; Figure 4), again achieving a sensitivity of 100%. Of the 6 predicted functional sites, 3 had not been reported as functional sites.
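The counts reported for the two test RNAs give both the sensitivity and, conversely, the fraction of predictions that fall on known functional sites. The second ratio is our own derived quantity, not one reported by the authors. It is a lower bound on precision, because predicted sites that are "not reported" may be unannotated rather than non-functional.

```latex
\[
\begin{aligned}
\text{Sensitivity} &= \frac{\#\,\text{known sites recovered}}{\#\,\text{known sites}}:
  \quad \text{tRNA (Lys)}\ \tfrac{7}{7} = 100\%, \quad \text{Diels-Alder}\ \tfrac{3}{3} = 100\% \\
\text{Hit fraction} &= \frac{\#\,\text{predicted} - \#\,\text{not reported}}{\#\,\text{predicted}}:
  \quad \text{tRNA (Lys)}\ \tfrac{11-4}{11} \approx 63.6\%, \quad \text{Diels-Alder}\ \tfrac{6-3}{6} = 50\%
\end{aligned}
\]
```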
Figure 3

The raw nucleotide distance curve (A) and the smoothed nucleotide distance curve (B) of the Diels-Alder ribozyme.

Table 2

Known functional sites (FSs) and Rsite predictions for the artificial Diels-Alder ribozyme (PDB ID: 1YKV)

Site No | Known FS | Predicted FS | Description
1 | 1–4 | 1, 2, 3 | Part of the catalytic pocket
2 | 23–25 | 25 | Part of the catalytic pocket
3 | 42–45 | 43 | Part of the catalytic pocket

Figure 4

Graphical representation of the Diels-Alder ribozyme functional sites identified by Rsite.

Figure 4A and 4B show the secondary and tertiary structures of the Diels-Alder ribozyme, respectively. Nucleotides in red are predicted functional sites that coincide with known functional sites; nucleotides in blue are predicted sites that have not been reported as functional sites.

Discussion

As the above analyses demonstrate, Rsite identifies ncRNA functional sites with reliable accuracy when the ncRNAs' tertiary structures are available, which could shed light on ncRNA research. Currently, Rsite is designed only to predict the positions of functional sites; it cannot predict their exact functional roles. Despite this limitation, we think it is still useful because it provides molecular biologists with candidate targets for further experiments, and in combination with molecular biology experiments it should be feasible to investigate the exact functional roles of the identified sites. Because the tertiary structures of some ncRNAs can be predicted by computational tools, Rsite can in principle be applied to many ncRNAs using predicted structures. However, current RNA tertiary structure prediction algorithms perform well only for small RNAs (e.g. RNAs shorter than 100 nt) and cannot handle larger ones, which limits Rsite to a small fraction of ncRNAs. Applying Rsite to predicted structures therefore requires great care, because the prediction accuracy of RNA tertiary structures is poor, especially for large RNA molecules. Moreover, we believe that combining Rsite with other features such as sequence conservation could further improve it. In addition, validation on more structures will help establish the utility of Rsite. The Nucleic Acid Database (NDB, http://ndbserver.rutgers.edu) collects a number of RNA tertiary structures [18], but these RNAs currently lack detailed annotation of their functional sites. It will thus be necessary to revisit the NDB to further validate Rsite on more RNAs once their functional sites become available. Another limitation of Rsite is that it has difficulty predicting the exact functional domains of an RNA molecule, because it finds local extreme points but misses the points neighboring them. As a result, it often identifies one functional site within a functional domain but misses the other functional sites in that domain. In the future, integrating the neighboring points of each extreme point could be one solution to this problem. Finally, despite these limitations, we envision that Rsite represents a potentially useful tool for biologists working on ncRNA research.
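One way the neighbor-integration idea could be realized is sketched below. It is a hypothetical extension, not part of Rsite: starting from an extreme point of the smoothed distance curve, it grows a region to the left and right while the curve stays within a relative tolerance (`rel_tol`, an arbitrary 10% here) of the value at that extreme point. The function name `grow_region` is illustrative, and indices are 0-based.

```python
from typing import Sequence, Tuple


def grow_region(curve: Sequence[float], peak: int, rel_tol: float = 0.1) -> Tuple[int, int]:
    """Extend a 0-based extreme point into an inclusive (start, end) region.

    Neighbors are absorbed while |curve[j] - curve[peak]| <= rel_tol * |curve[peak]|.
    """
    ref = curve[peak]
    tol = rel_tol * abs(ref)
    start = end = peak
    # Walk left until the curve departs from the extreme value by more than tol.
    while start > 0 and abs(curve[start - 1] - ref) <= tol:
        start -= 1
    # Walk right with the same criterion.
    while end < len(curve) - 1 and abs(curve[end + 1] - ref) <= tol:
        end += 1
    return start, end
```

Applied to every extreme point, this would report a candidate domain for each one rather than a single nucleotide; the tolerance trades domain size against specificity and would need tuning on annotated structures.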

Methods

The tertiary structure data of two ncRNA molecules

To validate the accuracy of Rsite, we applied it to two ncRNA molecules, tRNA (Lys) and the Diels-Alder ribozyme. Their tertiary structure data were obtained from the PDB database (http://www.rcsb.org/pdb/home/home.do) (PDB IDs: 1FIR for tRNA (Lys) and 1YKV for the Diels-Alder ribozyme) and are also available at http://www.cuilab.cn/rsite.
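The Methods do not state which atom defines a nucleotide's position. The sketch below assumes plain PDB-format files and parses the fixed-column ATOM/HETATM records with the standard library. For each residue it returns either the centroid of all its atoms or a single chosen atom such as C1'. The function name `nucleotide_coords` and the default set of skipped residue names (waters and common ions) are hypothetical choices. HETATM records are kept because the PDB format stores non-standard residues, including modified nucleotides, as HETATM. Alternate locations are not handled, and bound ligands would need to be added to the skip set.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Coord = Tuple[float, float, float]

# Hypothetical default: residue names to ignore (waters and common ions).
SKIP_RESNAMES = frozenset({"HOH", "WAT", "MG", "NA", "K", "CL", "MN", "ZN"})


def nucleotide_coords(pdb_text: str, chain: Optional[str] = None,
                      atom_name: Optional[str] = None,
                      skip_resnames: FrozenSet[str] = SKIP_RESNAMES) -> List[Coord]:
    """One coordinate per residue, in file order, from PDB ATOM/HETATM records.

    If atom_name is given (e.g. "C1'" or "P"), that atom's position is used;
    otherwise the centroid of all of the residue's atoms is returned.
    """
    residues: Dict[Tuple[str, int, str], List[Coord]] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model of multi-model files
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20].strip() in skip_resnames:
            continue
        if chain is not None and line[21] != chain:
            continue
        if atom_name is not None and line[12:16].strip() != atom_name:
            continue
        key = (line[21], int(line[22:26]), line[26])  # chain, residue number, insertion code
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, []).append(xyz)
    # dicts preserve insertion order, so residues stay in file (sequence) order
    return [tuple(sum(a[k] for a in atoms) / len(atoms) for k in range(3))
            for atoms in residues.values()]
```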

Algorithms in Rsite

For a given ncRNA, Rsite first calculates the Euclidean distance between every pair of nucleotides using their coordinates from the ncRNA's structure data. Then, for each nucleotide, Rsite sums the distances between that nucleotide and all other nucleotides. For an ncRNA with n nucleotides, this yields a distance curve D of length n, where D(i) is the summed distance between the ith nucleotide and all other nucleotides. Next, to reduce noise, the distance curve is smoothed with a Gaussian filter; here we used a window size of 2 for both RNAs. The optimal window size is difficult to determine and may strongly influence the prediction results; for longer RNAs, a larger window size could be better. The local maxima and local minima of the smoothed distance curve are then identified, and these extreme points are reported as the functional sites of the ncRNA. The start and end points are considered functional sites if they show a relatively high deviation (50%) from the average distance. Finally, because most functional sites identified by Rsite are single nucleotides, multiple functional sites are merged into one if their sequence positions are close to each other (≤ 2 nucleotides apart).
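The procedure above can be sketched in Python as a minimal, stdlib-only illustration. It is not the released Rsite code (available at http://www.cuilab.cn/rsite), and it rests on three stated assumptions. First, the reported "window size of 2" is read as the standard deviation `sigma` of a truncated Gaussian kernel. Second, the 50% endpoint rule is applied to the smoothed curve. Third, "close" sites are merged by chaining any sites at most 2 positions apart. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_curve(coords: Sequence[Coord]) -> List[float]:
    """D(i): summed Euclidean distance from nucleotide i to every other nucleotide."""
    n = len(coords)
    return [sum(math.dist(coords[i], coords[j]) for j in range(n) if j != i)
            for i in range(n)]


def gaussian_smooth(curve: Sequence[float], sigma: float = 2.0) -> List[float]:
    """Gaussian smoothing (sigma > 0), kernel truncated at 3*sigma.

    Near the ends the kernel is renormalised over in-range positions (an assumption).
    """
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(curve)
    smoothed = []
    for i in range(n):
        acc = norm = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= i + k < n:
                acc += w * curve[i + k]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def interior_extrema(curve: Sequence[float]) -> List[int]:
    """0-based indices of strict local maxima and minima (flat plateaus are skipped)."""
    return [i for i in range(1, len(curve) - 1)
            if (curve[i] > curve[i - 1] and curve[i] > curve[i + 1])
            or (curve[i] < curve[i - 1] and curve[i] < curve[i + 1])]


def endpoint_sites(curve: Sequence[float], rel_dev: float = 0.5) -> List[int]:
    """Keep the first/last index if it deviates from the mean by >= rel_dev * mean."""
    mean = sum(curve) / len(curve)
    ends = sorted({0, len(curve) - 1})
    return [i for i in ends if abs(curve[i] - mean) >= rel_dev * mean]


def merge_close(sites: Sequence[int], max_gap: int = 2) -> List[List[int]]:
    """Group sorted sites whose consecutive positions differ by <= max_gap."""
    groups: List[List[int]] = []
    for s in sorted(set(sites)):
        if groups and s - groups[-1][-1] <= max_gap:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups


def rsite_sketch(coords: Sequence[Coord], sigma: float = 2.0,
                 rel_dev: float = 0.5, max_gap: int = 2) -> List[List[int]]:
    """Groups of putative functional sites as 1-based nucleotide positions."""
    smooth = gaussian_smooth(distance_curve(coords), sigma)
    sites = interior_extrema(smooth) + endpoint_sites(smooth, rel_dev)
    return [[p + 1 for p in group] for group in merge_close(sites, max_gap)]
```

Given per-nucleotide coordinates `coords` (a list of (x, y, z) tuples in sequence order), `rsite_sketch(coords)` returns groups of 1-based positions. A larger `sigma` smooths more aggressively, which fits the suggestion that longer RNAs may need a larger window.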

Author Contributions

Q.C. designed this study. P.Z. performed the study. J.L. and W.M. contributed to the coding of Rsite. Q.C. and P.Z. wrote the manuscript.
References (18 in total; partial list)

Gil Amitai, Arye Shemesh, Einat Sitbon, Maxim Shklar, Dvir Netanely, Ilya Venger, Shmuel Pietrokovski. Network analysis of protein structures identifies functional residues. J Mol Biol, 2004-12-03.

Zheng Rong Yang, Lipo Wang, Natasha Young, Dave Trudgian, Kuo-Chen Chou. Pattern recognition methods for protein functional site prediction (review). Curr Protein Pept Sci, 2005-10.

Avraham Ben-Shimon, Miriam Eisenstein. Looking at enzymes from the inside out: the proximity of catalytic residues to the molecular centroid can be used for detection of active sites and enzyme-ligand interfaces. J Mol Biol, 2005-08-12.

Ali Nahvi, Rachel Green. Structural analysis of RNA backbone using in-line probing. Methods Enzymol, 2013.

Paul Bertone, Viktor Stolc, Thomas E Royce, Joel S Rozowsky, Alexander E Urban, Xiaowei Zhu, John L Rinn, Waraporn Tongprasit, Manoj Samanta, Sherman Weissman, Mark Gerstein, Michael Snyder. Global identification of human transcribed sequences with genome tiling arrays. Science, 2004-11-11.

Ganqiang Liu, Timothy R Mercer, Anne-Marie J Shearwood, Stefan J Siira, Moira E Hibbs, John S Mattick, Oliver Rackham, Aleksandra Filipovska. Mapping of mitochondrial RNA-protein interactions by digital RNase footprinting. Cell Rep, 2013-10-31.

Edouard de Castro, Christian J A Sigrist, Alexandre Gattiker, Virginie Bulliard, Petra S Langendijk-Genevaux, Elisabeth Gasteiger, Amos Bairoch, Nicolas Hulo. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res, 2006-07-01.

Nathan A Siegfried, Steven Busan, Greggory M Rice, Julie A E Nelson, Kevin M Weeks. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods, 2014-07-13.

Ming Lu, Qipeng Zhang, Min Deng, Jing Miao, Yanhong Guo, Wei Gao, Qinghua Cui. An analysis of human microRNA and disease associations. PLoS One, 2008-10-15.

Wang Nguitragool, Kempaiah Rayavara, Sanjay A Desai. Proteolysis at a specific extracellular residue implicates integral membrane CLAG3 in malaria parasite nutrient channels. PLoS One, 2014-04-03.