
iPseU-CNN: Identifying RNA Pseudouridine Sites Using Convolutional Neural Networks.

Muhammad Tahir¹, Hilal Tayara², Kil To Chong³.

Abstract

Pseudouridine is the most prevalent RNA modification and has been found in both eukaryotes and prokaryotes. It has so far been detected in several kinds of RNA, such as small nuclear RNA, rRNA, tRNA, mRNA, and small nucleolar RNA, which makes it significant for both academic research and drug development. Biochemical experiments have identified pseudouridine sites successfully, but these laboratory methods are expensive and time-consuming. Therefore, it is important to introduce efficient methods for identifying pseudouridine sites. In this study, we developed an intelligent method for identifying pseudouridine sites using a deep-learning approach. The proposed prediction model is called iPseU-CNN (identifying pseudouridine by convolutional neural networks). Existing methods used handcrafted features and machine-learning approaches to identify pseudouridine sites, whereas the proposed predictor extracts the features of pseudouridine sites automatically using a convolutional neural network model. The iPseU-CNN model achieves higher accuracy and Matthews correlation coefficient than the current state-of-the-art models on all benchmark datasets. It is therefore expected that iPseU-CNN will become a helpful tool for academic research on RNA pseudouridine site prediction, as well as for drug discovery.


Keywords: RNA; convolution neural network; deep learning; iPseU-CNN; pseudouridine sites

Year: 2019 PMID: 31048185 PMCID: PMC6488737 DOI: 10.1016/j.omtn.2019.03.010

Source DB: PubMed Journal: Mol Ther Nucleic Acids

Introduction

Pseudouridine (Ψ) is a common RNA modification that has been found in both eukaryotes and prokaryotes and has been detected in various categories of RNA. Ψ, an isomer of uridine, is produced by Ψ synthase enzymes, which detach the base of a uridine residue from its sugar, rotate it 180° around the N3–C6 axis, and then reattach it by linking the base's 5-carbon to the 1′-carbon of the sugar, as shown in Figure 1. Ψ modification is considered important in several molecular mechanisms, including stabilization of the tRNA structure,3, 4 and in the gene regulation machinery, e.g., the spliceosome. The presence of Ψ modifications in regions involved in RNA-protein or RNA-RNA interactions enhances the reaction and assembly of the spliceosome, which is responsible for producing functional mRNA, e.g., in AU/AC intron splicing. Furthermore, incorporation of Ψ into mRNA may inhibit the RNA-elicited innate immune response and enhance the translation efficiency of that mRNA. Although many researchers have uncovered the role of Ψ modification in most RNA systems, its biological functions and mechanisms of action have yet to be fully identified. Therefore, it is important to identify the Ψ modification sites in the transcriptome in order to understand the related biological principles.

Figure 1

Illustration of the Pseudouridine Modification

Although several laboratory techniques have been introduced to identify Ψ sites, they are costly and labor intensive.7, 8, 9 Because of the increasing availability of genomic and proteomic samples produced in the post-genomic era, it is necessary to develop robust, fast, low-cost computational models to predict Ψ sites in RNA sequences. In previous work, several machine-learning-based or statistical-learning computational methods have been introduced to identify Ψ sites.10, 11, 12 Li et al. introduced a computational method, PPUS, for identifying Ψ-synthase (PUS)-specific Ψ sites in Saccharomyces cerevisiae and Homo sapiens. This method used a support vector machine (SVM) for classification and the nucleotides around Ψ as features. Similarly, Chen et al. introduced the iRNA-PseU (identifying RNA Ψ) method for identifying Ψ sites in Mus musculus, S. cerevisiae, and H. sapiens. This method combines the occurrence frequency density distributions of the nucleotides and their chemical properties into pseudo K-tuple nucleotide composition (PseKNC). Most recently, He et al. developed the PseUI (Ψ identification) model for identifying Ψ sites in RNA samples from M. musculus, S. cerevisiae, and H. sapiens. This model used five feature-extraction techniques: dinucleotide composition (DC), nucleotide composition (NC), position-specific dinucleotide propensity (PSDP), position-specific nucleotide propensity (PSNP), and pseudodinucleotide composition (PseDNC).
A sequential forward feature-selection strategy was then used to select a relevant feature combination, with an SVM as the classifier.16, 17 More recently, PseKNC has been widely and effectively used to predict several RNA/DNA regulatory elements, such as nucleosome-positioning sequences,18, 19 RNA modification sites,20, 21, 22 DNA recombination spots,23, 24 translation initiation sites, promoters, and origins of replication.27, 28 Although these studies have shown that PseKNC is one of the most often used feature-extraction techniques for formulating RNA/DNA sequences, all of them used type-I PseKNC, which mixes various physicochemical properties. Because different properties may play different roles, type-II PseKNC can handle these differences and improve the description of sequences. Recently, type-II PseKNC was used to identify various DNA elements and achieved good results.29, 30 In contrast, our work focuses on using a deep-learning technique to extract important features automatically and directly from the sequence itself for classification. The performance of the above predictors can be further improved by more robust machine-learning or deep-learning methods. The existing methods use hand-designed input features based on domain knowledge, whereas the proposed system learns features automatically from RNA sequences using deep learning. Deep learning has achieved strong results in natural language processing, information retrieval, speech recognition, and image recognition.34, 35, 36 Recently, many genomics methods have been built on deep-learning mechanisms, for example, CNNclust, BiRen, iDeepS, RNA branch point prediction, alternative splicing site prediction, and iRNA-PseKNC(2methyl). We introduce an efficient computational architecture for predicting Ψ sites using both machine-learning and deep-learning approaches.
For the machine-learning approach, two simple feature-extraction techniques, n-gram and multivariate mutual information (MMI), were used as baselines, with an SVM as the classifier. For the deep-learning approach, we used a convolutional neural network (CNN) model. As shown in the Results and Discussion section, the deep-learning method produced better outcomes than the machine-learning ones. The proposed prediction model, iPseU-CNN (identifying Ψ by convolutional neural networks), is based on a CNN. It is an efficient and simple architecture for Ψ site prediction and is evaluated on three training benchmark datasets and two independent testing benchmark datasets. The proposed model achieves better overall performance than the current state-of-the-art methods recently published in the literature. To the best of our knowledge, iPseU-CNN is the first model to automatically capture important features from RNA sequences using a CNN for identifying Ψ sites.

Results and Discussion

In recent studies, four statistical parameters, the Matthews correlation coefficient (MCC), sensitivity (Sen), specificity (Sp), and accuracy (Acc), have been used to evaluate the performance of computational methods.43, 44, 45, 46, 47 These parameters are expressed as:

$$\mathrm{Sen} = \frac{TP}{TP + FN}$$

$$\mathrm{Sp} = \frac{TN}{TN + FP}$$

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

In this work, we implemented two simple machine-learning baselines, which use n-gram or MMI features with an SVM classifier. The n-gram and MMI feature-extraction techniques are simple and widely used in many applications. Table 1 shows the success rates of n-gram, MMI, and the proposed iPseU-CNN. The n-gram-based method outperformed the MMI-based one in accuracy and MCC on the H. sapiens (H)_990, S. cerevisiae (S)_628, and M. musculus (M)_944 datasets. However, the CNN-based method clearly outperformed both machine-learning-based techniques. Relative to the best baseline, iPseU-CNN improved accuracy by 6.68 percentage points, sensitivity by 13.49 percentage points, and MCC by 0.14 on the H_990 dataset. On the S_628 dataset, it improved accuracy, specificity, and MCC by 5.42 percentage points, 9.63 percentage points, and 0.12, respectively. On the M_944 dataset, it improved accuracy, sensitivity, and specificity by 9.1, 9.75, and 8.73 percentage points, respectively, and MCC by 0.19. Thus, the proposed iPseU-CNN predictor outperforms the baseline machine-learning methods.
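The four metrics named above can be computed directly from confusion-matrix counts. The following Python sketch is illustrative only: the function name `evaluate` and the counts in the example are hypothetical, not taken from the study.

```python
import math


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return Sen, Sp, Acc (as percentages) and MCC from confusion-matrix counts."""
    sen = tp / (tp + fn)  # fraction of true Ψ sites that are recovered
    sp = tn / (tn + fp)   # fraction of non-Ψ sites that are correctly rejected
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"Sen": 100 * sen, "Sp": 100 * sp, "Acc": 100 * acc, "MCC": mcc}


# Hypothetical counts for a balanced 200-sample test set (100 positives, 100 negatives).
print(evaluate(tp=70, tn=75, fp=25, fn=30))
```

MCC is reported as a raw coefficient in [-1, 1], while the other three metrics are percentages, matching the convention of Tables 1 to 4.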

Table 1

The Success Rates of iPseU-CNN and the Baseline Methods with the Training Datasets

Training Dataset	Methods	Accuracy (%)	Sensitivity (%)	Specificity (%)	MCC
H_990	n-gram	60.00	51.51	68.48	0.20
	MMI	58.78	47.47	70.10	0.18
	CNN	66.68	65.00	68.78	0.34
S_628	n-gram	62.73	64.64	60.82	0.25
	MMI	60.19	67.51	52.86	0.20
	CNN	68.15	66.36	70.45	0.37
M_944	n-gram	62.71	65.04	60.38	0.25
	MMI	58.26	63.13	53.38	0.16
	CNN	71.81	74.79	69.11	0.44

The prediction performance of the iPseU-CNN model was also measured on two independent datasets, S_200 and H_200, as shown in Table 2. iPseU-CNN, which is based on deep learning, again achieved better overall results than the machine-learning baselines. More specifically, relative to the best baseline, iPseU-CNN improved accuracy, sensitivity, and MCC on the H_200 dataset by 2 percentage points, 19.72 percentage points, and 0.05, respectively. On the S_200 dataset, it improved accuracy, specificity, and MCC by 3 percentage points, 6.82 percentage points, and 0.06, respectively.

Table 2

The Success Rates of iPseU-CNN and the Baseline Methods with Two Independent Testing Datasets

Testing Dataset	Methods	Accuracy (%)	Sensitivity (%)	Specificity (%)	MCC
H_200	n-gram	67.00	57.00	78.00	0.35
	MMI	63.50	58.00	69.00	0.27
	CNN	69.00	77.72	60.81	0.40
S_200	n-gram	70.50	70.00	71.00	0.41
	MMI	69.50	72.00	67.00	0.39
	CNN	73.50	68.76	77.82	0.47

As Tables 1 and 2 and Figure 2 show, the CNN-based approach outperforms the machine-learning-based approaches in accuracy and MCC on every dataset, although a baseline retains higher sensitivity or specificity in some cases (for example, specificity on H_200 and sensitivity on S_200).

Figure 2

The Success Rates of the iPseU-CNN and Baseline Methods

Finally, Table 3 compares the prediction performance of the iPseU-CNN model with that of existing methods, iRNA-PseU and PseUI. iRNA-PseU combines the occurrence frequency density distributions of the nucleotides and their chemical properties into PseKNC for feature extraction to identify Ψ sites, whereas PseUI uses five feature-extraction techniques.

Table 3

The Success Rates of iPseU-CNN and State-of-the-Art Methods with the Training Datasets

Training Dataset	Models	Accuracy (%)	Sensitivity (%)	Specificity (%)	MCC
H_990	iPseU-CNN	66.68	65.00	68.78	0.34
	PseUI	64.24	64.85	63.64	0.28
	iRNA-PseU	60.40	61.01	59.80	0.21
S_628	iPseU-CNN	68.15	66.36	70.45	0.37
	PseUI	65.13	62.74	67.52	0.30
	iRNA-PseU	64.49	64.65	64.33	0.29
M_944	iPseU-CNN	71.81	74.79	69.11	0.44
	PseUI	70.44	79.87	70.34	0.41
	iRNA-PseU	69.07	73.31	64.83	0.38

The results in Table 3 show that the iPseU-CNN model improved all evaluation metrics for the H_990 dataset, by 2.44, 0.15, and 5.14 percentage points in accuracy, sensitivity, and specificity, respectively, and by 0.06 in MCC. It also improved all evaluation metrics for the S_628 dataset, by 3.02, 1.71, and 2.93 percentage points in accuracy, sensitivity, and specificity, respectively, and by 0.07 in MCC, and it improved accuracy and MCC for the M_944 dataset by 1.37 percentage points and 0.03, respectively. Furthermore, Table 4 compares the performance of iPseU-CNN on the independent datasets with that of iRNA-PseU and PseUI. iPseU-CNN improved all evaluation metrics for the S_200 dataset, by 5, 3.76, and 5.82 percentage points in accuracy, sensitivity, and specificity, respectively, and by 0.1 in MCC, and it improved accuracy, sensitivity, and MCC for the H_200 dataset by 3.5 percentage points, 14.72 percentage points, and 0.09, respectively.
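The margins quoted above appear to be measured against the best competing value for each metric (for example, the S_628 sensitivity gain is relative to iRNA-PseU rather than PseUI); this is an inference from the table values, not something the text states. The following Python sketch reproduces the S_628 margins from the Table 3 values under that assumption; the function name is hypothetical.

```python
METRICS = ("Acc", "Sen", "Sp", "MCC")

# Table 3, S_628 rows: (Acc %, Sen %, Sp %, MCC)
s628 = {
    "iPseU-CNN": (68.15, 66.36, 70.45, 0.37),
    "PseUI": (65.13, 62.74, 67.52, 0.30),
    "iRNA-PseU": (64.49, 64.65, 64.33, 0.29),
}


def improvements(rows: dict, ours: str = "iPseU-CNN") -> dict:
    """Margin of `ours` over the best competing model, computed metric by metric."""
    others = [values for name, values in rows.items() if name != ours]
    return {
        m: round(rows[ours][i] - max(o[i] for o in others), 2)
        for i, m in enumerate(METRICS)
    }


print(improvements(s628))  # {'Acc': 3.02, 'Sen': 1.71, 'Sp': 2.93, 'MCC': 0.07}
```

Applying the same function to the other rows of Tables 1 to 4 yields the other margins quoted in this section; negative values mark the metrics where a competitor is ahead.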

Table 4

The Success Rates of the iPseU-CNN and State-of-the-Art Methods with Two Independent Testing Datasets

Testing Dataset	Models	Accuracy (%)	Sensitivity (%)	Specificity (%)	MCC
H_200	iPseU-CNN	69.00	77.72	60.81	0.40
	PseUI	65.50	63.00	68.00	0.31
	iRNA-PseU	61.50	58.00	65.00	0.23
S_200	iPseU-CNN	73.50	68.76	77.82	0.47
	PseUI	68.50	65.00	72.00	0.37
	iRNA-PseU	60.00	63.00	57.00	0.20

As Tables 3 and 4 and Figure 3 show, the CNN-based approach outperforms the existing predictors in accuracy and MCC on every dataset.

Figure 3

The Success Rates of the iPseU-CNN and State-of-the-Art Methods

A major recent direction in bioinformatics has been the construction of databases48, 49 and the development of efficient web servers.22, 50 Our future work will therefore focus on further improving performance and building a user-friendly web server for the developed tool. In conclusion, we developed iPseU-CNN, a deep-learning model for identifying Ψ sites in RNA samples. We compared machine-learning and deep-learning approaches, and the deep-learning approach outperformed the machine-learning ones. The machine-learning baselines used n-gram and MMI features with an SVM classifier, whereas the deep-learning approach used a CNN model. Unlike previous works that rely on handcrafted features, iPseU-CNN learns features automatically from RNA sequences. To the best of our knowledge, it is the first model to capture important features from RNA sequences fully automatically using CNNs for identifying Ψ sites. The results indicate that iPseU-CNN is more accurate than current methods, achieving the highest accuracy and MCC on all benchmark datasets. We expect that the iPseU-CNN prediction model will be helpful in academic research and in drug-related applications.

Materials and Methods

We introduce the proposed model and benchmark datasets used for training and testing.

The Proposed Model

We introduce an efficient computational architecture for predicting Ψ sites using both machine-learning and deep-learning approaches. First, in the machine-learning approach, we used two feature spaces, MMI and n-gram,51, 52 to extract numerical features from RNA samples, with an SVM as the classifier. Second, in the deep-learning approach, a CNN identifies Ψ sites directly from RNA/DNA samples, automatically capturing the key features of the input samples during training.

Machine-Learning Approach

We selected simple feature-extraction methods to work as baselines for comparison with the proposed deep-learning method.

n-gram

In this feature-extraction technique, an n-gram is expressed as (v_i, c_i), where v_i represents a feature (a particular combination of n residues) and c_i represents the number of occurrences of that feature in the protein or DNA/RNA sample. For instance, for 3-grams, v represents a three-nucleotide combination and c represents the number of times that combination occurs in the complete sequence. In this work, we constructed a feature vector containing all 1-grams to 3-grams. The n-gram feature set can be expressed as:

$$S = \{S_1, S_2, S_3\}$$

where S represents the combined list of nucleotide combinations, and S_1, S_2, and S_3 contain 4^1 = 4, 4^2 = 16, and 4^3 = 64 features, respectively, which together generate an 84-dimensional vector.
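A minimal Python sketch of the 84-dimensional n-gram vector, assuming overlapping (sliding-window) counts and a fixed A, C, G, U ordering of the combinations; the text does not specify either detail, and the function name and example sequence are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"


def ngram_features(seq: str, max_n: int = 3) -> list[int]:
    """Count every 1-, 2-, and 3-nucleotide combination (4 + 16 + 64 = 84 features)."""
    features = []
    for n in range(1, max_n + 1):
        # All 4**n ordered combinations in lexicographic order (A < C < G < U).
        vocab = ["".join(p) for p in product(NUCLEOTIDES, repeat=n)]
        counts = {v: 0 for v in vocab}
        # Overlapping occurrences across the complete sequence.
        for i in range(len(seq) - n + 1):
            gram = seq[i:i + n]
            if gram in counts:
                counts[gram] += 1
        features.extend(counts[v] for v in vocab)
    return features


vec = ngram_features("GCAUUAGCUAUCGGAUUCAGU")  # hypothetical 21-nt sample with a central U
print(len(vec))  # 84
```

With overlapping windows, a sequence of length L contributes L, L − 1, and L − 2 counts to the 1-, 2-, and 3-gram blocks, respectively.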

MMI

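In prior work,54, 55, 56, 57 MMI has been widely adopted to extract features from protein samples. In the same manner, RNA/DNA nucleotide samples can be represented using the MMI feature-extraction technique. In this method, the samples are represented by 2-tuples and 3-tuples as follows:

$$K_2 = \{AA, AC, AG, AU, CC, CG, CU, GG, GU, UU\}$$

$$K_3 = \{AAA, AAC, AAG, AAU, ACC, ACG, ACU, AGG, AGU, AUU, CCC, CCG, CCU, CGG, CGU, CUU, GGG, GGU, GUU, UUU\}$$

MMI ignores the order of nucleotides within a tuple (e.g., AC and CA are the same 2-tuple), so K_2 has 10 elements and K_3 has 20 elements. The 2-tuple mutual information (MI) for a nucleotide pair (a, b) in K_2 can be defined as:

$$I(a,b) = f(a,b)\,\ln\frac{f(a,b)}{f(a)\,f(b)}$$

The 3-tuple MI for a nucleotide triplet (a, b, c) in K_3 can be defined as:

$$I(a,b,c) = I(a,b) - I(a,b \mid c), \qquad I(a,b \mid c) = H(a \mid c) - H(a \mid b,c)$$

$$H(a \mid c) = -\frac{f(a,c)}{f(c)}\ln\frac{f(a,c)}{f(c)}, \qquad H(a \mid b,c) = -\frac{f(a,b,c)}{f(b,c)}\ln\frac{f(a,b,c)}{f(b,c)}$$

where f(a) is the fraction of each nucleotide in the sequence, and f(a,b) and f(a,b,c) are the occurrence frequencies of the 2-tuple and 3-tuple, respectively.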
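The following Python sketch computes 30 MMI values (10 two-tuple and 20 three-tuple) for one sequence, using the standard MMI formulation from the protein literature: I(a,b) = f(a,b) ln[f(a,b) / (f(a) f(b))] and I(a,b,c) = I(a,b) − [H(a|c) − H(a|b,c)]. It assumes that tuple frequencies are counted over overlapping windows of adjacent nucleotides and that terms of the form 0·ln 0 are taken as 0; neither detail is specified in the text, and all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

NUCLEOTIDES = "ACGU"
# Order-free tuples: 10 pairs (K2) and 20 triplets (K3).
K2 = ["".join(t) for t in combinations_with_replacement(NUCLEOTIDES, 2)]
K3 = ["".join(t) for t in combinations_with_replacement(NUCLEOTIDES, 3)]


def mmi_features(seq: str) -> list[float]:
    """Return the 10 two-tuple and 20 three-tuple MI values for one sequence."""
    L = len(seq)
    f1 = {a: seq.count(a) / L for a in NUCLEOTIDES}
    # Sorting each window makes tuples order-free ("CA" is counted as "AC").
    c2 = Counter("".join(sorted(seq[i:i + 2])) for i in range(L - 1))
    c3 = Counter("".join(sorted(seq[i:i + 3])) for i in range(L - 2))

    def f2(a, b):
        return c2["".join(sorted(a + b))] / (L - 1)

    def f3(a, b, c):
        return c3["".join(sorted(a + b + c))] / (L - 2)

    def mi2(a, b):
        # I(a,b) = f(a,b) * ln(f(a,b) / (f(a) f(b))), with 0 * ln 0 taken as 0.
        joint = f2(a, b)
        return joint * math.log(joint / (f1[a] * f1[b])) if joint > 0 else 0.0

    def cond_entropy(joint, marginal):
        # -(joint / marginal) * ln(joint / marginal), with 0 * ln 0 taken as 0.
        if joint == 0 or marginal == 0:
            return 0.0
        p = joint / marginal
        return -p * math.log(p)

    def mi3(a, b, c):
        # I(a,b,c) = I(a,b) - [H(a|c) - H(a|b,c)]
        h_a_given_c = cond_entropy(f2(a, c), f1[c])
        h_a_given_bc = cond_entropy(f3(a, b, c), f2(b, c))
        return mi2(a, b) - (h_a_given_c - h_a_given_bc)

    return [mi2(*t) for t in K2] + [mi3(*t) for t in K3]


features = mmi_features("GCAUUAGCUAUCGGAUUCAGU")  # hypothetical 21-nt sample
print(len(features))  # 30
```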

SVM

SVM is a learning method for regression, classification, and pattern recognition, and in several studies it has achieved better results than other machine-learning methods.47, 58, 59, 60 In the current study, the LIBSVM package was used to implement the SVM model, with the radial basis function (RBF) as the kernel. The RBF-kernel SVM has two parameters, the kernel parameter g (γ) and the penalty parameter c (C), which were set to 5.5 and 0.0035, respectively. These values were determined by a grid search on the benchmark dataset.61, 62, 63, 64, 65
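The SVM setup can be sketched with scikit-learn, whose `SVC` class is built on LIBSVM (the study used LIBSVM directly). The data below are random placeholders with the 84-dimensional n-gram size, and the search grid is a hypothetical log-spaced grid; the study does not report which grid it searched.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC  # wraps LIBSVM

# Placeholder data: 200 random 84-dimensional feature vectors with balanced labels.
rng = np.random.default_rng(0)
X = rng.random((200, 84))
y = np.array([0, 1] * 100)

# RBF-kernel SVM with the values reported in the text: g (gamma) = 5.5, c (C) = 0.0035.
reported = SVC(kernel="rbf", gamma=5.5, C=0.0035).fit(X, y)

# Grid search over a hypothetical grid, scored by accuracy with 5-fold cross-validation.
param_grid = {
    "C": [2.0 ** k for k in range(-9, 6, 2)],
    "gamma": [2.0 ** k for k in range(-7, 4, 2)],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
```

Because the placeholder features are random, the selected parameters carry no meaning here; on real n-gram or MMI features the same search would pick the (C, gamma) pair with the best cross-validated accuracy.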

Deep-Learning Approach

We used a CNN to predict Ψ sites from RNA/DNA samples; during training, the CNN automatically learns the key features of the input samples. The CNN model takes a single RNA sequence of length n as input (n = 21 for the M_944 and H_990 datasets and n = 31 for the S_628 dataset) and outputs a real value, the predicted probability of a Ψ site. The input is represented by one-hot vectors with four channels, A, C, G, and U, so the length of the input depends on n. Specifically, A is encoded as (1 0 0 0), C as (0 1 0 0), G as (0 0 1 0), and U as (0 0 0 1). Figure 4 illustrates the architecture of the proposed CNN model.
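A minimal NumPy sketch of the one-hot encoding described above; the function name and example sequence are hypothetical.

```python
import numpy as np

# Channel order A, C, G, U, as described in the text.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}


def one_hot(seq: str) -> np.ndarray:
    """Encode an RNA sequence as an (n, 4) matrix, one row per nucleotide."""
    return np.array([ONE_HOT[nt] for nt in seq.upper()], dtype=np.float32)


x = one_hot("GCAUUAGCUAUCGGAUUCAGU")  # hypothetical 21-nt sample
print(x.shape)  # (21, 4)
```

An S_628 sample would give a (31, 4) matrix in the same way.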

Figure 4

Illustration of the Architecture of the iPseU-CNN Model

Each processing step in a deep-learning model is represented by a layer, such as a convolution, pooling, normalization, ReLU, dropout, loss, or fully connected layer. A grid search was used to select the best-performing hyper-parameters: the number of convolution layers, the number of filters, the filter size, the stride, and the dropout probability. Table 5 lists the ranges of the tuned hyper-parameters for the proposed CNN model.

Table 5

The Ranges of the Tuned Hyper-Parameters

Hyper-Parameter	Range
Number of convolution layers	[1,2]
Number of filters	[5,7,9]
Filter size	[3,5,7]
Stride	[1,2]
Dropout	[0.25, 0.50]
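Table 5 defines 2 × 3 × 3 × 2 × 2 = 72 candidate configurations. The Python sketch below only enumerates them; training each configuration and keeping the one with the lowest validation loss is not shown, and the dictionary keys are hypothetical names.

```python
from itertools import product

# Search space from Table 5.
search_space = {
    "conv_layers": [1, 2],
    "filters": [5, 7, 9],
    "filter_size": [3, 5, 7],
    "stride": [1, 2],
    "dropout": [0.25, 0.50],
}


def grid(space: dict):
    """Yield every hyper-parameter combination as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid(search_space))
print(len(configs))  # 72
```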

The best parameters were selected based on validation loss. The convolution layer is computed as:

$$\mathrm{conv}(R)_{j,f} = \sum_{s=0}^{S-1} \sum_{n=0}^{N-1} W^{f}_{s,n}\, R_{j+s,\,n}$$

where R represents the input RNA sample, f denotes the index of the filter, and j denotes the index of the output position. Each filter W^f is an S × N weight matrix, where S is the filter size and N is the number of input channels. The rectified linear unit (ReLU) is expressed as:

$$\mathrm{ReLU}(x) = \max(0, x)$$

The output layer is transformed to [0, 1] by a sigmoid function, which gives the normalized class probability used for Ψ site prediction:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

In this study, the Keras framework was used to implement the iPseU-CNN model. The Adam optimizer was used with a learning rate of 0.001, the number of epochs was set to 50, and the batch size was set to 10.
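A NumPy sketch of the convolution, ReLU, and sigmoid operations described above, assuming stride 1, no padding, and no bias term. The flatten-then-dense output layer and all weights are random placeholders for illustration, not the trained iPseU-CNN architecture shown in Figure 4.

```python
import numpy as np


def conv1d(R: np.ndarray, W: np.ndarray) -> np.ndarray:
    """conv(R)[j, f] = sum_s sum_n W[f, s, n] * R[j + s, n] (stride 1, no padding, no bias).

    R: (L, N) one-hot input; W: (F, S, N) bank of F filters of size S over N channels.
    """
    L = R.shape[0]
    F, S, _ = W.shape
    out = np.zeros((L - S + 1, F))
    for j in range(L - S + 1):
        window = R[j:j + S]  # (S, N) slice starting at output position j
        out[j] = np.tensordot(W, window, axes=([1, 2], [0, 1]))
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)
R = np.eye(4)[rng.integers(0, 4, size=21)]  # random 21-nt one-hot sample
W = rng.normal(size=(7, 5, 4))              # 7 filters of size 5 (within the Table 5 ranges)
h = relu(conv1d(R, W))                      # (17, 7) feature map
w_out = rng.normal(size=h.size)             # placeholder dense output weights
p = sigmoid(h.ravel() @ w_out)              # score in [0, 1] for the central uridine
print(h.shape, float(p))
```

In Keras, the same operations would come from built-in convolution, activation, and dense layers; the explicit loop here only makes the indexing of the convolution formula visible.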

Benchmark Datasets

In this study, three benchmark datasets, M_944, S_628, and H_990, were used for training, where M, S, and H denote M. musculus, S. cerevisiae, and H. sapiens, respectively, and the datasets contain 944, 628, and 990 samples, respectively. These three benchmark datasets of pseudouridylation sites were taken from the supplementary materials of Chen et al., who also introduced two independent testing datasets for S. cerevisiae and H. sapiens, denoted S_200 and H_200, respectively. The H_990, M_944, and S_628 datasets contain 495, 472, and 314 positive samples, respectively, in each of which the uridine at the center position can be pseudouridylated. They also contain the same numbers of negative samples (495, 472, and 314), in each of which the central uridine cannot be pseudouridylated. Each RNA sample in these three datasets can be formulated as:

$$R_{\xi}(\mathrm{U}) = N_{-\xi}\,N_{-(\xi-1)}\cdots N_{-2}\,N_{-1}\,\mathrm{U}\,N_{+1}\,N_{+2}\cdots N_{+(\xi-1)}\,N_{+\xi}$$

where R_ξ(U) represents the RNA sample, the central U denotes the uridine, N_{−ξ} denotes the ξ-th nucleotide upstream of the central uridine, and N_{+ξ} denotes the ξ-th nucleotide downstream of it. In the H_990 and M_944 datasets, each RNA sample is 21 nt long, whereas in the S_628 dataset, each RNA sample is 31 nt long. Specifically, ξ = 15 and the sample length is 1 + 2 × 15 = 31 for the S_628 dataset, whereas ξ = 10 and the sample length is 1 + 2 × 10 = 21 for the M_944 and H_990 datasets.
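A hypothetical helper that cuts a sample with ξ nucleotides on each side of a central uridine out of a longer transcript. The function name and example transcript are illustrative only; the study used the benchmark samples as already prepared by Chen et al.

```python
def extract_sample(rna: str, center: int, xi: int) -> str:
    """Return N_-xi ... N_-1 U N_+1 ... N_+xi, i.e. 2*xi + 1 nucleotides centred on `center`."""
    if rna[center] != "U":
        raise ValueError("the central nucleotide must be a uridine")
    if center - xi < 0 or center + xi >= len(rna):
        raise ValueError("not enough flanking nucleotides for this xi")
    return rna[center - xi:center + xi + 1]


transcript = "AA" + "GCAUUAGCUAUCGGAUUCAGU" + "CC"  # hypothetical transcript
print(extract_sample(transcript, center=12, xi=10))  # GCAUUAGCUAUCGGAUUCAGU
```

With xi = 10 the result has the 21-nt shape of H_990 and M_944 samples; xi = 15 would give the 31-nt S_628 shape, which this short example transcript cannot supply.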

Cross-Validation

Both the machine- and deep-learning methods were evaluated by estimating the error rate of the prediction model with cross-validation, in which the dataset is divided into mutually exclusive folds. In this work, we used a k-fold cross-validation test, in which a dataset is divided into k folds.61, 62, 67, 68 In each round, one fold is reserved for testing, and the remaining k − 1 folds are used to train the model. The process is repeated k times so that every fold is tested exactly once.62, 69 We applied a 5-fold cross-validation test to measure the four performance parameters.
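A minimal NumPy sketch of the k-fold split, assuming shuffled, unstratified folds; the study does not state how samples were assigned to folds, and the function name is hypothetical. The four metrics would typically be computed on each held-out fold and then averaged.

```python
import numpy as np


def kfold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train, test) index arrays for k mutually exclusive folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test = folds[i]  # one fold held out for testing
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # remaining k - 1 folds
        yield train, test


for train, test in kfold_indices(990, k=5):  # e.g. the size of H_990
    print(len(train), len(test))  # 792 198
```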

Author Contributions

Conceptualization, M.T. and H.T.; Methodology, M.T. and H.T.; Investigation, M.T., H.T., and K.T.C.; Writing – Original Draft, M.T. and H.T.; Writing – Review & Editing, M.T., H.T., and K.T.C.; Visualization, M.T. and H.T.; Supervision, K.T.C.

Conflicts of Interest

The authors declare no competing interests.
