iRNA-PseColl: Identifying the Occurrence Sites of Different RNA Modifications by Incorporating Collective Effects of Nucleotides into PseKNC.

Pengmian Feng¹, Hui Ding², Hui Yang², Wei Chen³, Hao Lin⁴, Kuo-Chen Chou⁵.

Abstract

There are many different types of RNA modifications, which are essential for numerous biological processes. Knowing where modifications occur in an RNA sequence is key to an in-depth understanding of their biological functions and mechanisms. Unfortunately, determining these sites by experiments alone is both time-consuming and laborious. Although some computational methods have been developed for this purpose, each can deal with only one type of modification. To our knowledge, no method has thus far been developed that can identify the occurrence sites of several different types of RNA modifications with one seamless package or platform. To address this challenge, a novel platform called "iRNA-PseColl" has been developed. It was formed by incorporating both the individual and collective features of the sequence elements into the general pseudo K-tuple nucleotide composition (PseKNC) of RNA via the chemicophysical properties and density distribution of its constituent nucleotides. Rigorous cross-validations indicate that the anticipated success rates achieved by the proposed platform are quite high. For the convenience of most experimental biologists, the platform's web server is provided at http://lin.uestc.edu.cn/server/iRNA-PseColl along with a step-by-step user guide, so that users can obtain their desired results without going through the mathematical details involved in this paper.

Keywords: PseKNC; RNA modification; SVM; collective effect; nucleotide chemicophysical property

Year: 2017 PMID: 28624191 PMCID: PMC5415964 DOI: 10.1016/j.omtn.2017.03.006

Source DB: PubMed Journal: Mol Ther Nucleic Acids

Introduction

Since the first modified ribonucleic acid was found ∼60 years ago, ∼150 RNA modifications have been reported. Emerging evidence suggests that RNA modifications are critical components of the gene regulatory landscape and are involved in a variety of biological processes at the post-transcriptional level, such as protein translation and localization, mRNA splicing, ribosome biogenesis, antibiotic resistance, and stem cell pluripotency. However, many aspects of RNA modifications remain unknown, so detecting the positions of RNA modifications is essential for understanding their molecular mechanisms and functions. The advent of next-generation sequencing technologies has allowed investigation of RNA modifications on a genome-wide scale.9, 10, 11, 12, 13, 14, 15 For example, the N1-methyladenosine (m1A),9, 10 N6-methyladenosine (m6A), and 5-methylcytosine (m5C) maps are available for the human transcriptome. Although these experimental methods have played active roles in identifying RNA modifications and in advancing the understanding of their biological functions, they are still labor-intensive. As excellent complements to experimental techniques, some computational methods (based on the high-resolution experimental data) have been developed to identify RNA modifications.7, 16, 17, 18, 19, 20, 21 Reminiscent of the regulation of gene expression by histone modifications, different kinds of RNA modifications may also combine to mediate biological functions in a collective way. Unfortunately, to the best of our knowledge, no computational tool is available for dealing with a system that simultaneously contains several different kinds of RNA modifications, even though such multi-modification systems may well be worth exploring.
In view of this, the present study was initiated in an attempt to fill such a void by establishing a seamless package or platform that can be used to analyze a biological system that simultaneously contains the three well known types of RNA modifications: m1A, m6A, and m5C (Figure 1).

Figure 1

A Schematic Drawing to Show the Three Types of Modifications that May Simultaneously Occur in an RNA Sequence

Three types of modifications (m1A, m6A, and m5C) are shown.

Results and Discussion

By incorporating collective effects of nucleotides into PseKNC,22, 23 a seamless platform called “iRNA-PseColl” has been developed for identifying the occurrence sites of different RNA modifications. It has been observed by the most rigorous cross-validation, the jackknife test, that the success rates achieved by the new predictor are quite high for the three different types of RNA modification sites, respectively (Table 1).

Table 1

The Success Rates Obtained by the Proposed Model in Identifying Three Different Types of RNA Modification Sites

Modification Type	Sn (%)	Sp (%)	Acc (%)	MCC
(1) m¹A	98.38	99.89	99.13	0.98
(2) m⁶A	81.86	99.11	90.38	0.82
(3) m⁵C	75.83	79.17	77.50	0.55

The results were obtained by the jackknife tests on the three benchmark datasets given in Supplemental Materials and Methods, respectively. Acc, overall accuracy; MCC, Matthews correlation coefficient; Sn, sensitivity; Sp, specificity.

See Equation 13 and the relevant text for the definition of the metrics.

Because iRNA-PseColl is the first platform predictor developed for simultaneously identifying three different types of RNA modification sites from sequence information alone, its power cannot be demonstrated by comparison with counterparts: no counterpart yet exists for exactly the same purpose. Nevertheless, as Table 1 shows, the overall accuracy (Acc) ranges from 77.50% to 99.13% and the Matthews correlation coefficient (MCC) from 0.55 to 0.98 across the three modification types. Let us use graphic analysis to further demonstrate the proposed platform's quality. The graphical approach is a useful vehicle for studying complicated biological systems because it can provide intuitive insights, as demonstrated by a series of previous studies.25, 26, 27, 28, 29, 30, 31, 32, 33, 34 To give such an intuitive illustration for the current study, the receiver operating characteristic (ROC)35, 36 graph was adopted, as shown in Figure 2, where the ROC curves of the current method in identifying m1A, m6A, and m5C modifications are given, respectively. The best possible prediction method would yield a point at the coordinate (0, 1), representing 100% sensitivity and a false-positive rate of 0 (i.e., 100% specificity). The (0, 1) point is therefore also called a perfect classification. A completely random guess would give a point along the diagonal from (0, 0) to (1, 1).
The area under the ROC curve, also called AUROC, is used to indicate the performance quality of the classifier: an AUROC of 0.5 is equivalent to random prediction, while an AUROC of 1 represents a perfect one. The AUROC for the case of m1A, m6A, or m5C is 0.998, 0.849, or 0.911, respectively, indicating that the proposed platform is quite promising and has the potential to become a useful high-throughput tool for genome analyses.
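
The ROC construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `roc_points` and `auroc` and the toy scores are hypothetical, and the area is computed with the trapezoidal rule rather than with whatever procedure the authors actually used.

```python
def roc_points(scores, labels):
    """Return ROC points (false-positive rate, sensitivity) for binary labels.

    scores: higher value means "more likely positive"; labels: 1 or 0.
    Samples with tied scores are moved across the threshold together.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): a perfect ranking and an uninformative one.
perfect = auroc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
random_like = auroc(roc_points([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))
print(perfect, random_like)
```

In the perfect case the curve passes through (0, 1); when every sample receives the same score the curve collapses onto the diagonal.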

Figure 2

A Graphical Illustration to Show the Performances of iRNA-PseColl in Identifying m1A, m6A, and m5C Modification Sites, Respectively

The performances are illustrated by means of the ROC curves.35, 36 The area under the ROC curve is called AUROC. The greater the AUROC value is, the better the performance will be. See the text for further explanation.

As demonstrated in a series of recent publications,20, 21, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 a publicly accessible web server significantly enhances the impact of the paper that introduces it; this is particularly true for papers aimed at developing novel prediction methods. Accordingly, the web server for the current platform has been established. Moreover, for the convenience of the scientific community, a user guide is given in the Supplemental Materials and Methods.

Materials and Methods

According to Chou's five-step guidelines, which have been followed by many investigators in a series of recent publications,21, 39, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 56, 57, 58, 59, 60, 61, 62, 63 a new prediction method that can be easily used by most experimental scientists, and that can also inspire theoretical scientists to develop other relevant prediction methods, should make the following five procedures very clear: (1) how to construct or select a valid benchmark dataset to train and test the prediction model, (2) how to represent a biological sequence sample with a mathematical formulation or vector that is truly correlated with the target concerned, (3) how to introduce or develop a powerful engine (or algorithm) to run the prediction model, (4) how to properly perform the cross-validation tests to objectively evaluate the anticipated accuracy, and (5) how to design a user-friendly web server that makes it easy for people to get their desired results. Below, we elaborate on the five procedures in establishing the new predictor.

Benchmark Dataset

Owing to the fast development of high-throughput experimental techniques, experimentally confirmed m1A, m6A, and m5C modification data are available for the human genome.9, 10, 13, 15 By mapping the experimental data to the human genome, statistically significant sequence samples were obtained for the three kinds of RNA modification sites. To facilitate the formulation, let us use the following scheme to represent a sample that potentially contains an RNA modification site:

$$R_\xi(\mathbb{N}) = N_{-\xi} N_{-(\xi-1)} \cdots N_{-2} N_{-1}\, \mathbb{N}\, N_{+1} N_{+2} \cdots N_{+(\xi-1)} N_{+\xi} \tag{1}$$

where the symbol $\mathbb{N}$ denotes the single nucleic acid code A (adenine) or C (cytosine), the subscript $\xi$ is an integer, $N_{-\xi}$ represents the $\xi$-th upstream nucleotide from the center, $N_{+\xi}$ represents the $\xi$-th downstream nucleotide, and so forth. The $(2\xi+1)$-tuple RNA sample, $R_\xi(\mathbb{N})$, can be further classified into the following two categories:

$$R_\xi(\mathbb{N}) \in \begin{cases} R_\xi^{+}(\mathbb{N}), & \text{if its center is a modification site} \\ R_\xi^{-}(\mathbb{N}), & \text{otherwise} \end{cases} \tag{2}$$

where $R_\xi^{+}(\mathbb{N})$ denotes a true modification segment with A or C at its center, $R_\xi^{-}(\mathbb{N})$ denotes a false modification segment with A or C at its center, and the symbol $\in$ means "a member of" in set theory. In the literature, a benchmark dataset usually consists of a training dataset and a testing dataset: the former is used to train a model, the latter to test it. However, as elucidated in a comprehensive review, there is no need to artificially separate a benchmark dataset into these two parts if the prediction model is examined by the jackknife test or subsampling (K-fold) cross-validation, because the outcome thus obtained is actually a combination of many different independent dataset tests. Thus, the benchmark datasets for the current study can be further formulated as

$$\begin{cases} \mathbb{S}_\xi(\mathrm{m^1A}) = \mathbb{S}_\xi^{+}(\mathrm{m^1A}) \cup \mathbb{S}_\xi^{-}(\mathrm{m^1A}) \\ \mathbb{S}_\xi(\mathrm{m^6A}) = \mathbb{S}_\xi^{+}(\mathrm{m^6A}) \cup \mathbb{S}_\xi^{-}(\mathrm{m^6A}) \\ \mathbb{S}_\xi(\mathrm{m^5C}) = \mathbb{S}_\xi^{+}(\mathrm{m^5C}) \cup \mathbb{S}_\xi^{-}(\mathrm{m^5C}) \end{cases} \tag{3}$$

where the positive subset $\mathbb{S}_\xi^{+}(\mathrm{m^1A})$ only contains those RNA samples that can have the m1A modification, the negative subset $\mathbb{S}_\xi^{-}(\mathrm{m^1A})$ only contains those RNA samples that cannot have it, $\cup$ denotes the "union" in set theory, and so forth.
The benchmark datasets were derived from the RNA sequences in the human genome that have experimentally confirmed m1A, m6A, and m5C modification sites.9, 10, 13, 15 The detailed procedures to construct the benchmark datasets are as follows. First, as done in Chou, by sliding the $(2\xi+1)$-tuple nucleotide window (Figure 3) along each of the aforementioned RNA sequences, only those RNA segments with A or C at the center were collected. Second, if the center position in an RNA sequence was less than $\xi$ or greater than $L-\xi$, where $L$ is the length of the RNA sequence concerned, so that fewer than $\xi$ nucleotides existed upstream or downstream, the lacking code was filled with the same code as its nearest neighbor. Third, the RNA segment samples thus obtained were put into the positive subset $\mathbb{S}_\xi^{+}(\mathrm{m^1A})$, $\mathbb{S}_\xi^{+}(\mathrm{m^6A})$, or $\mathbb{S}_\xi^{+}(\mathrm{m^5C})$ if their centers were experimentally annotated as m1A, m6A, or m5C sites; otherwise, they were put into the corresponding negative subset $\mathbb{S}_\xi^{-}(\mathrm{m^1A})$, $\mathbb{S}_\xi^{-}(\mathrm{m^6A})$, or $\mathbb{S}_\xi^{-}(\mathrm{m^5C})$. Fourth, to reduce redundancy and bias, none of the included RNA segments had pairwise sequence identity above the chosen cutoff with any other in the same subset. By strictly following the above procedures, we obtained an array of benchmark datasets with different $\xi$ values, each formed by RNA samples of $(2\xi+1)$ nt (see Equation 1). It was observed via preliminary tests as well as many reports19, 43, 66 that when $\xi = 20$ (i.e., the RNA samples formed by 41 nucleotides [nt]), the corresponding results were most promising. Accordingly, hereafter we only consider the 41-nt RNA sequences. By doing so, we obtained 6,366, 1,130, and 120 sequence samples for the positive subsets $\mathbb{S}^{+}(\mathrm{m^1A})$, $\mathbb{S}^{+}(\mathrm{m^6A})$, and $\mathbb{S}^{+}(\mathrm{m^5C})$, respectively. The numbers of samples thus obtained in the corresponding negative subsets are much greater, and hence the benchmark datasets would be very imbalanced.
Using such highly skewed benchmark datasets to train predictors would cause many positive cases to be mispredicted as negative ones.42, 44, 56, 67 To balance the sizes of the positive and negative subsets, we randomly picked 6,366, 1,130, and 120 of the corresponding negative samples to form the negative subsets $\mathbb{S}^{-}(\mathrm{m^1A})$, $\mathbb{S}^{-}(\mathrm{m^6A})$, and $\mathbb{S}^{-}(\mathrm{m^5C})$, respectively, as done in Chen et al. and Feng et al.
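
The window-sliding and padding steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extract_samples` is hypothetical, and "filled with the same code as its nearest neighbor" is interpreted here as repeating the first or last nucleotide of the sequence for positions that fall outside it.

```python
def extract_samples(rna, xi, centers=("A", "C")):
    """Collect (2*xi + 1)-nt windows centered on A or C.

    Positions that fall outside the sequence are filled with the nearest
    existing nucleotide, i.e. the index is clamped to [0, len(rna) - 1].
    Returns a list of (center_index, window) pairs with 0-based indices.
    """
    last = len(rna) - 1
    samples = []
    for i, nt in enumerate(rna):
        if nt not in centers:
            continue
        window = "".join(rna[min(max(j, 0), last)]
                         for j in range(i - xi, i + xi + 1))
        samples.append((i, window))
    return samples


# Toy example with xi = 2 (5-nt windows) on the 6-nt sequence "CACGUC".
toy = extract_samples("CACGUC", xi=2)
for index, window in toy:
    print(index, window)
```

With xi = 20 the same function yields the 41-nt samples used in the study; assigning each window to a positive or negative subset would then depend on the experimental annotation of its center.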

Figure 3

An Illustration to Show the Process of Collecting the RNA Samples by Sliding the (2ξ + 1) -nt Scaled Window along an RNA Sequence

Adapted from Chou with permission. See the text for further explanation.

Finally, the detailed RNA sequence samples thus obtained for the three benchmark datasets (m1A, m6A, and m5C) are given in the Supplemental Materials and Methods and can also be downloaded directly from http://lin.uestc.edu.cn/server/iRNA-PseColl/dataset.htm.

Formulating RNA Sequence Samples

One of the most challenging problems in computational biology today is how to formulate a biological sequence as a vector that reflects the key patterns important for the function or mechanism concerned. The challenge matters because nearly all existing machine-learning algorithms were developed to handle vectors rather than sequence samples, as elucidated in a review article. Unfortunately, a vector defined in a discrete model may lose many important sequence pattern features. To deal with this problem for protein/peptide sequences, the pseudo amino acid composition (PseAAC)68, 69, 70, 71, 72 was developed. Ever since it was introduced, the concept of PseAAC has penetrated into nearly all areas of computational proteomics (see the long list of references cited in two review papers55, 73). Inspired by the concept of PseAAC and encouraged by its successes, the pseudo K-tuple nucleotide composition (PseKNC)22, 74, 75, 76 was proposed and has been increasingly used in various fields of genome analysis.20, 21, 23, 37, 39, 40, 42, 43, 51, 52, 53, 58, 59, 60, 77, 78, 79, 80, 81, 82, 83, 84, 85 With both PseAAC and PseKNC being widely used, a seamless package that can generate various modes of PseAAC and PseKNC according to users' needs, for protein/peptide and DNA/RNA sequences, respectively, became highly desirable. This was the driving force behind the web server called Pse-in-One. The general form of PseKNC for an RNA sequence sample is given by

$$\mathbf{R} = [\phi_1\ \phi_2\ \cdots\ \phi_u\ \cdots\ \phi_Z]^{\mathbf{T}}$$

where $\mathbf{T}$ is the transpose operator and the subscript $Z$ is an integer; its value, as well as the components $\phi_u$ ($u = 1, 2, \cdots, Z$), depends on how the desired features are extracted from the RNA sequence sample. To make this general form reflect both the local features of the individual constituent nucleotides and their collective effect, let us define its components using the following two approaches.

Local Features of Individual Nucleotides

RNA consists of four types of nucleotides: A (adenosine), C (cytidine), G (guanosine), and U (uridine). They can be classified into three different categories (Table 2): (1) in terms of ring number, A and G have two rings, whereas C and U have only one; (2) in terms of chemical functionality, A and C belong to the amino group, while G and U belong to the keto group; and (3) in terms of hydrogen bonding, C and G pair with each other through three hydrogen bonds, but A and U through only two (Figure 4). All these properties have different impacts on RNA's low-frequency internal motion87, 88 and its biological function89, 90, 91.

Figure 4

Illustration to Show the Structure of Paired Nucleic Acid Residues

Left: A-U pair bonded to each other with two hydrogen bonds. Right: G-C pair with three hydrogen bonds. Adapted from Chou with permission.

To reflect the aforementioned features, let us denote the i-th nucleotide of Equation 1 by92, 93

$$N_i = (x_i, y_i, z_i)$$

where $x_i$, $y_i$, and $z_i$ refer to the attributes of (1) ring structure, (2) functional group, and (3) hydrogen bonding in Table 2, respectively. Accordingly, the nucleotide A can be formulated as (1, 1, 1), C as (0, 1, 0), G as (1, 0, 0), and U as (0, 0, 1); or generally we have

$$x_i = \begin{cases} 1, & \text{if } N_i \in \{\mathrm{A}, \mathrm{G}\} \\ 0, & \text{if } N_i \in \{\mathrm{C}, \mathrm{U}\} \end{cases}, \quad y_i = \begin{cases} 1, & \text{if } N_i \in \{\mathrm{A}, \mathrm{C}\} \\ 0, & \text{if } N_i \in \{\mathrm{G}, \mathrm{U}\} \end{cases}, \quad z_i = \begin{cases} 1, & \text{if } N_i \in \{\mathrm{A}, \mathrm{U}\} \\ 0, & \text{if } N_i \in \{\mathrm{C}, \mathrm{G}\} \end{cases}$$

Table 2

Classification of Nucleotides

Angle of View	Attribute	Nucleotides
(1) Ring structure	purine	A, G
(1) Ring structure	pyrimidine	C, U
(2) Functional group	amino	A, C
(2) Functional group	keto	G, U
(3) Hydrogen bonding	stronger	C, G
(3) Hydrogen bonding	weaker	A, U

See Local Features of Individual Nucleotides for further explanation.

Collective Features of the Constituent Nucleotides

There are several methods to reflect the coupling within a biological sequence or the collective effect of its constituent elements, such as the conditional probability approach, the degenerate Kmer strategy, and the g-gap dipeptide mode. In this study, we use a different approach: we consider the occurrence frequency of a nucleotide not only at its local site but also in its distribution along the sequence of an RNA sample, as defined by the following equation:

$$D_i = \frac{1}{|S_i|} \sum_{j=1}^{|S_i|} f(N_j)$$

where $D_i$ is the density of the nucleotide $N_i$ at site $i$ of an RNA sequence, $|S_i|$ is the length of the sliding substring concerned, $j$ denotes each of the site locations counted in the substring, and

$$f(N_j) = \begin{cases} 1, & \text{if } N_j \text{ is the nucleotide concerned} \\ 0, & \text{otherwise} \end{cases}$$

For instance, suppose an RNA sequence is "CACGUC." The substring at site $i$ is the first $i$ nucleotides, so the density of "A" at sequence position 1, 2, 3, 4, 5, or 6 is 0, 1/2, 1/3, 1/4, 1/5, or 1/6, respectively; that of "C" is 1, 1/2, 2/3, 1/2, 2/5, or 1/2, respectively; and so forth. By combining the three chemical attributes with the density, the i-th nucleotide of Equation 1 can be uniquely defined by a set of four variables; i.e.,

$$N_i = (x_i, y_i, z_i, D_i)$$

For example, the RNA sequence "CACGUC" can be expressed by the following six sets of numbers: (0, 1, 0, 1), (1, 1, 1, 0.5), (0, 1, 0, 0.66), (1, 0, 0, 0.25), (0, 0, 1, 0.2), and (0, 1, 0, 0.5). Substituting these numbers into the general form of PseKNC, we have

$$\mathbf{R} = [\,0\ \ 1\ \ 0\ \ 1\ \ 1\ \ 1\ \ 1\ \ 0.5\ \ 0\ \ 1\ \ 0\ \ 0.66\ \ 1\ \ 0\ \ 0\ \ 0.25\ \ 0\ \ 0\ \ 1\ \ 0.2\ \ 0\ \ 1\ \ 0\ \ 0.5\,]^{\mathbf{T}}$$

meaning that the 6-nt example can be defined by a $(6 \times 4) = 24$-D (dimensional) PseKNC vector. Accordingly, all the 41-nt samples in the current benchmark datasets (Supplemental Materials and Methods) can be formulated with a $(41 \times 4) = 164$-D vector.
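
The four-variable encoding can be reproduced with a short Python sketch. The names `PROPERTIES`, `encode`, and `to_vector` are hypothetical and chosen for illustration; the density at site i is taken to be the count of that nucleotide among the first i nucleotides divided by i, which is the reading that matches the "CACGUC" example in the text.

```python
# (ring structure, functional group, hydrogen bonding) codes from the text:
# A = (1, 1, 1), C = (0, 1, 0), G = (1, 0, 0), U = (0, 0, 1).
PROPERTIES = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(rna):
    """Return one (x, y, z, density) tuple per nucleotide."""
    counts = {}
    encoded = []
    for position, nt in enumerate(rna, start=1):
        counts[nt] = counts.get(nt, 0) + 1
        density = counts[nt] / position  # occurrences so far / prefix length
        encoded.append(PROPERTIES[nt] + (density,))
    return encoded


def to_vector(rna):
    """Flatten the per-nucleotide tuples into a single feature vector."""
    return [value for quad in encode(rna) for value in quad]


example = encode("CACGUC")
for quad in example:
    print(quad)
print(len(to_vector("CACGUC")))
```

The third tuple carries 2/3 as its density, which the text reports truncated to 0.66. A 41-nt sample passed to `to_vector` gives the 164 components mentioned above.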

Operation Engine

The prediction was operated by SVM (support vector machine), which has been widely used in various areas of bioinformatics and computational biology.20, 40, 42, 59, 67, 77, 78, 79, 80, 81, 95, 96, 97, 98, 99, 100, 101, 102, 103 Its basic idea has been elaborated in the aforementioned papers, and there is no need to repeat it here. In the current study, the LibSVM package 3.18 was used to implement SVM; it can be downloaded for free from http://www.csie.ntu.edu.tw/~cjlin/libsvm/. The SVM algorithm contains two uncertain quantities: the regularization parameter $C$ and the kernel width parameter $\gamma$. They were optimized using the grid search approach, in which $\Delta C$ and $\Delta\gamma$ represent the step gaps for $C$ and $\gamma$, respectively. Readers interested in knowing more about SVM are referred to Chou and Cai and Cai et al. for a brief introduction, or to a monograph for a detailed description. The platform predictor obtained via the aforementioned procedures is called "iRNA-PseColl," where "i" stands for "identify," "Pse" for "pseudo component approach," and "Coll" for "collective effects of nucleotides."
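
A grid search of this kind can be sketched generically in Python. Everything specific here is an assumption made for illustration: the function `grid_search`, the exponent ranges, and the toy scoring function are hypothetical, and the search ranges actually used in the study are not recoverable from the text. In practice the scoring function would train an SVM with the given $C$ and $\gamma$ and return its cross-validated accuracy.

```python
from itertools import product
from math import log2


def grid_search(score_fn, log2_c_values, log2_gamma_values):
    """Try every (C, gamma) pair on a log2 grid and keep the best score.

    score_fn(c, gamma) must return a number where larger is better.
    Returns (best_score, best_c, best_gamma).
    """
    best = None
    for exp_c, exp_g in product(log2_c_values, log2_gamma_values):
        c, gamma = 2.0 ** exp_c, 2.0 ** exp_g
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


def toy_score(c, gamma):
    """Stand-in for cross-validated accuracy; peaks at C = 2^3, gamma = 2^-7."""
    return -((log2(c) - 3) ** 2 + (log2(gamma) + 7) ** 2)


# Hypothetical grid: exponents of C from -5 to 15 and of gamma from -15 to -5,
# both with a step of 2 (these ranges are assumptions, not the paper's values).
best_score, best_c, best_gamma = grid_search(
    toy_score, range(-5, 16, 2), range(-15, -4, 2))
print(best_c, best_gamma)
```

Searching on exponents rather than raw values is a common choice because useful values of $C$ and $\gamma$ typically span several orders of magnitude.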

Quality Control or Examination

Quality control is a very important process in industries; it is even more important for a predictor. To deal with this problem, we need to address the following two issues: (1) what standard or metrics should we adopt to measure the predictor’s quality, and (2) what test process or method we should take to calculate the metrics. Below, we address the two problems.

A Set of Four Intuitive Metrics

The current prediction task belongs to the category of "binary classification," which is common in genome analyses. Evaluating such a predictor raises two issues: which metrics should be used to reflect its success rates, and which test method should be adopted to derive them. To quantitatively evaluate the quality of a binary classification predictor, a set of four metrics is usually used in the literature: (1) sensitivity (Sn), (2) specificity (Sp), (3) overall accuracy (Acc), and (4) the Matthews correlation coefficient (MCC), which reflects the predictor's stability. Unfortunately, the conventional formulations of these metrics were taken directly from the mathematical literature and are not intuitive; most biologists have difficulty understanding them, particularly MCC. Fortunately, using the symbols and derivation introduced by Chou in studying signal peptides, the conventional metrics were converted into an equivalent set of four intuitive equations, as derived by Xu et al. and Chen et al. and elaborated in Yu et al.:

$$\begin{cases} \mathrm{Sn} = 1 - \dfrac{N^{+}_{-}}{N^{+}} \\[2ex] \mathrm{Sp} = 1 - \dfrac{N^{-}_{+}}{N^{-}} \\[2ex] \mathrm{Acc} = 1 - \dfrac{N^{+}_{-} + N^{-}_{+}}{N^{+} + N^{-}} \\[2ex] \mathrm{MCC} = \dfrac{1 - \left(\dfrac{N^{+}_{-}}{N^{+}} + \dfrac{N^{-}_{+}}{N^{-}}\right)}{\sqrt{\left(1 + \dfrac{N^{-}_{+} - N^{+}_{-}}{N^{+}}\right)\left(1 + \dfrac{N^{+}_{-} - N^{-}_{+}}{N^{-}}\right)}} \end{cases} \tag{13}$$

where $N^{+}$ represents the total number of positive samples investigated, $N^{+}_{-}$ is the number of positive samples incorrectly predicted to be negative, $N^{-}$ is the total number of negative samples investigated, and $N^{-}_{+}$ is the number of negative samples incorrectly predicted to be positive.
With the metrics of Equation 13, the meanings of Sn, Sp, Acc, and MCC become much clearer, as discussed and used in a series of follow-up studies in many different areas.20, 21, 38, 40, 42, 44, 45, 46, 47, 48, 49, 56, 57, 61, 67, 80, 82, 84, 97, 99, 112, 113, 114, 115 It is instructive to point out that multi-label sequence samples have been emerging in systems biology and medicine.49, 116, 117, 118, 119 To deal with this kind of multi-label system, a much more sophisticated set of metrics is needed, as elaborated in Chou.
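
The four metrics of Equation 13 are simple to compute once the two error counts are known. The sketch below uses a hypothetical function name, `chou_metrics`. The sample sizes (120 positive and 120 negative m5C samples) come from the text, whereas the error counts of 29 and 25 are not stated in the paper; they are inferred here from the Sn and Sp values reported in Table 1 and should be read as an assumption.

```python
from math import sqrt


def chou_metrics(n_pos, pos_missed, n_neg, neg_missed):
    """Compute (Sn, Sp, Acc, MCC) from the counts used in Equation 13.

    n_pos:      total number of positive samples (N+)
    pos_missed: positives incorrectly predicted as negative (N+_-)
    n_neg:      total number of negative samples (N-)
    neg_missed: negatives incorrectly predicted as positive (N-_+)
    """
    sn = 1 - pos_missed / n_pos
    sp = 1 - neg_missed / n_neg
    acc = 1 - (pos_missed + neg_missed) / (n_pos + n_neg)
    mcc = (1 - (pos_missed / n_pos + neg_missed / n_neg)) / sqrt(
        (1 + (neg_missed - pos_missed) / n_pos)
        * (1 + (pos_missed - neg_missed) / n_neg)
    )
    return sn, sp, acc, mcc


# m5C benchmark: 120 positives and 120 negatives (from the text);
# 29 and 25 misclassifications are inferred from Table 1, not reported.
sn, sp, acc, mcc = chou_metrics(120, 29, 120, 25)
print(round(sn * 100, 2), round(sp * 100, 2), round(acc * 100, 2), round(mcc, 2))
```

With these inferred counts the function reproduces the m5C row of Table 1 after rounding, and with no errors at all every metric equals 1.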

Jackknife Validation

Three cross-validation methods are often adopted in the literature: (1) the independent dataset test, (2) the subsampling (or K-fold cross-validation) test, and (3) the jackknife test. As elucidated by Chou, among these three the jackknife test is the least arbitrary, because it always yields a unique outcome for a given benchmark dataset. The jackknife test has therefore been widely recognized and increasingly adopted by researchers to analyze the quality of various predictors.83, 103, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131 In view of this, we also used the jackknife test to examine the quality of the current prediction method. The jackknife test excludes the "memory" effect because the training and testing datasets in a jackknife system are both open: each sample is, in turn, moved from one to the other, so that it is never used to train the model that predicts it. The arbitrariness intrinsic to the independent dataset and subsampling tests no longer exists, because the outcome derived via the jackknife test for a predictor is always the same on a given benchmark dataset.
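
The jackknife (leave-one-out) procedure can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM used in the study; that substitution, the function names, and the toy data are all assumptions made for illustration.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def jackknife(xs, ys, predict):
    """Leave each sample out in turn, train on the rest, predict the one left out."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(predict(train_x, train_y, xs[i]))
    return predictions


# Toy two-dimensional data (hypothetical): label 0 near (0, 0), label 1 near (1, 1).
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = [0, 0, 0, 1, 1, 1]
predictions = jackknife(xs, ys, nearest_centroid_predict)
print(predictions)
```

Because no random split is involved, repeated runs on the same benchmark dataset give identical predictions, which is the property emphasized in the text.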

Author Contributions

W.C., H.L., and K.-C.C. conceived and designed the study. P.F. and H.D. conducted the experiments. P.F., H.D., and W.C. implemented the algorithms. H.Y. established the web server. W.C., H.L., and K.-C.C. performed the analysis and wrote the paper. All authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

124 in total

1. Some insights into protein structural class prediction.

Authors: G P Zhou; N Assa-Munt
Journal: Proteins Date: 2001-07-01

2. Subcellular location prediction of apoptosis proteins.

Authors: Guo-Ping Zhou; Kutbuddin Doctor
Journal: Proteins Date: 2003-01-01

3. pRNAm-PC: Predicting N(6)-methyladenosine sites in RNA sequences via physical-chemical properties.

Authors: Zi Liu; Xuan Xiao; Dong-Jun Yu; Jianhua Jia; Wang-Ren Qiu; Kuo-Chen Chou
Journal: Anal Biochem Date: 2015-12-31 Impact factor: 3.365

4. Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition.

Authors: Jianhua Jia; Zi Liu; Xuan Xiao; Bingxiang Liu; Kuo-Chen Chou
Journal: J Biomol Struct Dyn Date: 2015-10-29

5. Prediction of β-lactamase and its class by Chou's pseudo-amino acid composition and support vector machine.

Authors: Ravindra Kumar; Abhishikha Srivastava; Bandana Kumari; Manish Kumar
Journal: J Theor Biol Date: 2014-10-22 Impact factor: 2.691

6. Site-specific methylation of 16S rRNA caused by pct, a pactamycin resistance determinant from the producing organism, Streptomyces pactum.

Authors: J P Ballesta; E Cundliffe
Journal: J Bacteriol Date: 1991-11 Impact factor: 3.490

7. Analysis and comparison of lignin peroxidases between fungi and bacteria using three different modes of Chou's general pseudo amino acid composition.

Authors: Mandana Behbahani; Hassan Mohabatkar; Mokhtar Nosrati
Journal: J Theor Biol Date: 2016-09-08 Impact factor: 2.691

8. iPhos-PseEn: identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier.

Authors: Wang-Ren Qiu; Xuan Xiao; Zhao-Chun Xu; Kuo-Chen Chou
Journal: Oncotarget Date: 2016-08-09

9. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences.

Authors: Wei Chen; Pengmian Feng; Hui Yang; Hui Ding; Hao Lin; Kuo-Chen Chou
Journal: Oncotarget Date: 2017-01-17

10. Some remarks on protein attribute prediction and pseudo amino acid composition.

Authors: Kuo-Chen Chou
Journal: J Theor Biol Date: 2010-12-17 Impact factor: 2.691
