Computational identification of N6-methyladenosine sites in multiple tissues of mammals.

Fu-Ying Dao¹, Hao Lv¹, Yu-He Yang¹, Hasan Zulfiqar¹, Hui Gao¹, Hao Lin¹.

Abstract

N6-methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position. It is the most abundant RNA methylation modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by the physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical property. Subsequently, these features were optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on the independent datasets showed that the proposed method generalizes well. We also established a user-friendly webserver called iRNA-m6A, which is freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will make it more convenient for users to study m6A modification in different tissues.

Keywords: Feature extraction and selection; RNA modification; Support vector machine; Webserver; m6A

Year: 2020 PMID: 32435427 PMCID: PMC7229270 DOI: 10.1016/j.csbj.2020.04.015

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

RNA modification occurs in all living organisms and is one of the most evolutionarily conserved properties of RNAs [1]. It is a critical post-transcriptional regulator of gene expression and can affect the activity, localization and stability of RNAs. Studies have demonstrated that RNA modification correlates with a variety of diseases [2]. A recent noteworthy example is N6-methyladenosine (m6A), which can affect the translation and stability of the modified transcripts, thus providing a mechanism to coordinate the regulation of groups of transcripts during cell state maintenance and transition [3]. m6A refers to methylation of the adenosine base at the nitrogen-6 position. It is dynamically reversible and can be regulated in time and space by methyltransferases and demethylases. The distribution of m6A is nonrandom and asymmetric: the majority of m6A sites are highly enriched within the CDS, 3′ UTR, stop codon region and long introns [4], and m6A sites are also found in long non-coding RNAs [5]. m6A is one of the most common and abundant modifications on RNA molecules in eukaryotes [6]. It is regarded as the most prominent modification in terms of its range of regulatory functions in eukaryotic mRNA, which has motivated significant research efforts, particularly in recent years, with the invention and application of high-throughput sequencing [7], [8] as well as advances in modern molecular and genetic technologies. Correct recognition of m6A sites helps to elucidate the biological functions of m6A and the underlying mechanisms. However, the cost of experimental materials and the long duration of high-throughput sequencing and wet-lab experiments make it difficult to identify m6A sites at a whole-genome scale. Therefore, computational tools are required to accurately identify m6A modification sites and to help reduce the costs associated with high-throughput sequencing.
In recent years, with the development of bioinformatics and the accumulation of biological experimental data, a number of computational predictors have been developed to recognize m6A sites in eukaryotic organisms [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. More than 20 sequence-based computational approaches for identifying m6A sites have been summarized in a recent review [22], which covers model construction from a variety of aspects, including benchmark dataset construction, the features employed, and software availability and utility. Despite significant research efforts devoted to the development of computational methods for RNA-modification site prediction, to the best of our knowledge, few computational tools have been developed specifically for predicting m6A in different tissues. The present study is therefore devoted to developing a computational tool that can identify m6A modification sites in various tissues of human, mouse and rat. We first collected experimentally confirmed m6A sequences and non-m6A sequences to build the benchmark datasets based on the experimental results of Zhang et al. [23]. Subsequently, three kinds of sequence encoding algorithms were used to formulate the samples, and mRMR was used to optimize these features. Then, the optimal features were input into the SVM to discriminate m6A sequences from non-m6A sequences. Independent datasets were used to investigate the prediction capability of the proposed method. Finally, on the basis of the proposed method, we established an ensemble predictor called iRNA-m6A. The flowchart of this work is shown in Fig. 1.

Fig. 1

Overall framework of iRNA-m6A.

Materials and methods

Benchmark dataset

Constructing an objective and rigorous benchmark dataset is a key step in establishing a reliable and robust model for m6A site prediction [24]. Zhang et al. [23] developed the m6A-REF-seq protocol to identify modification sites in different tissues of human (brain, liver, and kidney), mouse (brain, liver, heart, testis, and kidney) and rat (brain, liver, and kidney). This method is an antibody-independent, high-throughput, single-base detection method based on an m6A-sensitive RNA endoribonuclease, which provides a new perspective for single-base m6A identification at the transcriptome level. Because of the high quality of these data, the benchmark dataset was constructed from them and downloaded from the paper of Zhang et al. [23]. To further improve the quality of the data, we selected as positive samples only 41-nt fragments with the m6A site at the center. To avoid redundancy and reduce homology bias, positive samples with more than 80% sequence similarity were removed using the CD-HIT program [25], [26]. The negative samples (non-m6A sites) for the above-mentioned tissues in the three genomes were collected by requiring 41-nt sequences with adenine at the center that had not been shown experimentally to be methylated. In this way, a large number of negative samples were obtained. If a model is trained on an unbalanced benchmark dataset, its performance will be biased [27]. Thus, for each tissue we randomly extracted as many negative samples as there were positive samples. To objectively evaluate the proposed models, we separated each dataset into two parts: one used to train the model, and the other used as an independent dataset for examining the performance of the proposed models. Details of these benchmark datasets are shown in Table 1.
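The balancing and splitting steps described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `balance_and_split`, the fixed random seed, and the choice of a random half-and-half split (suggested by the near-equal training and testing counts in Table 1) are all assumptions.

```python
import random

def balance_and_split(positives, negatives, seed=0):
    """Undersample negatives to a 1:1 ratio, then split each class in half.

    Assumption: the split is random and roughly 50/50; the source only
    reports the resulting counts, not the procedure.
    """
    rng = random.Random(seed)
    # Draw as many negatives as there are positives.
    sampled_neg = rng.sample(negatives, len(positives))
    pos = positives[:]
    rng.shuffle(pos)
    # When a class has an odd size, the training part gets the extra sample.
    half_pos = (len(pos) + 1) // 2
    half_neg = (len(sampled_neg) + 1) // 2
    train = [(s, 1) for s in pos[:half_pos]] + [(s, 0) for s in sampled_neg[:half_neg]]
    test = [(s, 1) for s in pos[half_pos:]] + [(s, 0) for s in sampled_neg[half_neg:]]
    return train, test
```

Each returned item is a `(sequence, label)` pair with label 1 for m6A and 0 for non-m6A samples.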

Table 1

The benchmark datasets for predicting RNA m6A sites.

Species	Tissues	Positive (training)	Positive (testing)	Negative (training)	Negative (testing)
Human	Brain	4605	4604	4605	4604
	Liver	2634	2634	2634	2634
	Kidney	4574	4573	4574	4573
Mouse	Brain	8025	8025	8025	8025
	Liver	4133	4133	4133	4133
	Kidney	3953	3952	3953	3952
	Heart	2201	2200	2201	2200
	Testis	4704	4706	4707	4706
Rat	Brain	2352	2351	2352	2351
	Liver	1762	1762	1762	1762
	Kidney	3433	3432	3433	3432

Sample formulation

Most machine learning methods can only handle input vectors of the same length [28], [29], [30], [31], [32]. Thus, we applied diverse feature extraction algorithms to encode the RNA m6A site sequences, described as follows.

Physical-chemical property matrix

The first feature extraction algorithm applied in this paper is the physical-chemical property matrix, which uses physicochemical properties of dinucleotides to characterize RNA sequences [33], [34], [35]. Suppose an RNA sequence of length L nt is written as

$$R = R_1 R_2 R_3 \cdots R_L \tag{1}$$

where $R_i$ is the nucleotide at position $i$. There are 4 × 4 = 16 different dimers in an RNA sequence, and each dimer has different physical-chemical (pc) properties. In this study, we considered six pc properties [36]: (1) pc1: rise; (2) pc2: roll; (3) pc3: shift; (4) pc4: slide; (5) pc5: tilt; (6) pc6: twist, whose values can be obtained from http://lin-group.cn/server/iRNA-m6A/download. An RNA sequence sample can then be transformed into the following PC matrix:

$$PC = \begin{bmatrix} pc_1(R_1R_2) & pc_1(R_2R_3) & \cdots & pc_1(R_{L-1}R_L) \\ pc_2(R_1R_2) & pc_2(R_2R_3) & \cdots & pc_2(R_{L-1}R_L) \\ \vdots & \vdots & \ddots & \vdots \\ pc_6(R_1R_2) & pc_6(R_2R_3) & \cdots & pc_6(R_{L-1}R_L) \end{bmatrix} \tag{2}$$

Based on Eq. (2), auto-covariance (AC) and cross-covariance (CC) [37] were used to transform the matrix into a fixed-length feature vector. In AC, λ is the number of dinucleotides separating two subsequences for the same pc property, and AC can be expressed as

$$AC(m,\lambda) = \frac{1}{L-1-\lambda}\sum_{i=1}^{L-1-\lambda}\left(pc_m(R_iR_{i+1}) - \overline{pc}_m\right)\left(pc_m(R_{i+\lambda}R_{i+\lambda+1}) - \overline{pc}_m\right) \tag{3}$$

where m = 1, 2, …, 6 indexes the pc property, λ is an integer between 0 and L − 1, and $\overline{pc}_m$ is the mean of the data along the mth row of the matrix in Eq. (2), given by

$$\overline{pc}_m = \frac{1}{L-1}\sum_{j=1}^{L-1} pc_m(R_jR_{j+1}) \tag{4}$$

As Eq. (3) shows, the auto-covariance approach generates 6 components per value of λ associated with the physical-chemical properties of an RNA sample in Eq. (1). In CC, the correlation between two subsequences, each belonging to a different pc property, can be formulated as

$$CC(m_1,m_2,\lambda) = \frac{1}{L-1-\lambda}\sum_{i=1}^{L-1-\lambda}\left(pc_{m_1}(R_iR_{i+1}) - \overline{pc}_{m_1}\right)\left(pc_{m_2}(R_{i+\lambda}R_{i+\lambda+1}) - \overline{pc}_{m_2}\right) \tag{5}$$

where $m_1$ = 1, 2, …, 6; $m_2$ = 1, 2, …, 6 and $m_1 \neq m_2$, so that there are 6 × 5 = 30 components per value of λ associated with the physical-chemical properties of an RNA sample in Eq. (1). Together, auto-covariance and cross-covariance turn an RNA sequence sample into a vector of (6 + 6 × 5) = 36 dimensions for each value of λ.
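The auto-covariance and cross-covariance transformation can be sketched in plain Python as below. Several points are assumptions rather than facts from the paper: the property values in `PC_TABLE` are random placeholders (the real values come from the download page mentioned above), the covariance is normalized by the number of summed terms, and features are accumulated over lags 1 to `max_lag`. Under that last assumption the vector has 36 × `max_lag` entries, and 36 × 2 + 164 + 164 = 400 would agree with the dimension reported in Table 2 for a lag setting of 2.

```python
import random
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]
PROPERTIES = ["rise", "roll", "shift", "slide", "tilt", "twist"]

# Placeholder values only (NOT the published property table); replace them
# with the real dinucleotide properties before any actual use.
_rng = random.Random(0)
PC_TABLE = {d: [_rng.uniform(-1, 1) for _ in PROPERTIES] for d in DINUCLEOTIDES}

def pc_matrix(seq):
    """Return a 6 x (L-1) matrix: one row per property, one column per dimer."""
    dimers = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [[PC_TABLE[d][m] for d in dimers] for m in range(len(PROPERTIES))]

def lagged_covariance(row_a, row_b, lag):
    """Covariance between row_a[i] and row_b[i + lag]; AC when both rows are the same."""
    n = len(row_a) - lag
    mean_a = sum(row_a) / len(row_a)
    mean_b = sum(row_b) / len(row_b)
    return sum((row_a[i] - mean_a) * (row_b[i + lag] - mean_b) for i in range(n)) / n

def ac_cc_features(seq, max_lag):
    """Concatenate 6 AC and 30 CC terms for every lag from 1 to max_lag."""
    rows = pc_matrix(seq)
    features = []
    for lag in range(1, max_lag + 1):
        features += [lagged_covariance(r, r, lag) for r in rows]
        features += [lagged_covariance(rows[a], rows[b], lag)
                     for a in range(6) for b in range(6) if a != b]
    return features
```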

Mono-nucleotide binary encoding

The second feature extraction technique converts each nucleotide into a string of 0s and 1s, formulated as

$$A = (1,0,0,0),\quad C = (0,1,0,0),\quad G = (0,0,1,0),\quad U = (0,0,0,1) \tag{6}$$

For example, the RNA sequence ‘GGAUUCGA’ can be expressed as [00100010 … 1000]^T. Therefore, an RNA sample of 41 nt in length is converted into a 164 (4 × 41) dimensional vector in this study.
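A short sketch of the binary encoding follows. The codes for G and A are fixed by the worked example in the text ('GGAUUCGA' starts with 0010 0010 and ends with 1000); the codes used here for C and U are an assumption that follows the alphabetical A, C, G, U order.

```python
# A and G are pinned down by the 'GGAUUCGA' example; C and U are assumed.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

def binary_encode(seq):
    """Concatenate the 4-bit code of every nucleotide in the sequence."""
    vector = []
    for nucleotide in seq.upper():
        vector.extend(ONE_HOT[nucleotide])
    return vector
```

For a 41-nt sample the result has 4 × 41 = 164 entries.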

Nucleotide chemical property

The third feature description strategy uses three coordinates (x, y, z) to represent the chemical properties of the four nucleotides, with each coordinate assigned a value of 0 or 1 [38]. The x coordinate stands for the ring structure, y for the hydrogen bond, and z for the chemical functionality. The ith nucleotide $n_i$ in an RNA sequence can be encoded by $(x_i, y_i, z_i)$, where

$$x_i = \begin{cases} 1, & n_i \in \{A, G\} \\ 0, & n_i \in \{C, U\} \end{cases} \qquad y_i = \begin{cases} 1, & n_i \in \{A, U\} \\ 0, & n_i \in \{C, G\} \end{cases} \qquad z_i = \begin{cases} 1, & n_i \in \{A, C\} \\ 0, & n_i \in \{G, U\} \end{cases} \tag{7}$$

Therefore, A, C, G and U can be represented by the coordinates (1, 1, 1), (0, 0, 1), (1, 0, 0) and (0, 1, 0), respectively. Furthermore, the nucleotide density, which captures the nucleotide composition surrounding the modification sites, was defined as

$$d_i = \frac{1}{|N_i|}\sum_{j=1}^{L} f(n_j), \qquad f(n_j) = \begin{cases} 1, & j \le i \text{ and } n_j = n_i \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where L is the sequence length and $|N_i|$ is the length of the ith prefix string $\{n_1, n_2, \ldots, n_i\}$ in the sequence. In summary, each nucleotide is represented by its chemical properties and its nucleotide frequency, that is, by a 4-dimensional vector. Accordingly, an RNA sample of L nt is encoded by a (4 × L)-dimensional vector.
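The chemical-property coordinates and the prefix density can be combined as in the sketch below. The coordinates are those stated in the text; the function name `ncp_nd_encode` and the ordering of the four values per nucleotide (x, y, z, density) are assumptions.

```python
# Coordinates (x, y, z) as stated in the text:
# x = ring structure, y = hydrogen bond, z = chemical functionality.
NCP = {"A": (1, 1, 1), "C": (0, 0, 1), "G": (1, 0, 0), "U": (0, 1, 0)}

def ncp_nd_encode(seq):
    """Encode each nucleotide as (x, y, z, density of that nucleotide in the prefix)."""
    vector = []
    counts = {}
    for i, nucleotide in enumerate(seq.upper(), start=1):
        counts[nucleotide] = counts.get(nucleotide, 0) + 1
        # Density: occurrences of this nucleotide in the first i positions, divided by i.
        density = counts[nucleotide] / i
        vector.extend(NCP[nucleotide] + (density,))
    return vector
```

For example, in 'GGA' the third position is the first A in a prefix of length 3, so its density is 1/3.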

Model training

Support vector machine (SVM) is a binary classification model and a supervised machine learning method based on statistical learning theory [39], [40], [41], [42], [43], which is widely employed in recent bioinformatics research [44], [45], [46], [47], [48], [49], [50], [51], [52]. The basic principle of SVM is to map the input vectors into a high-dimensional Hilbert space and find a hyperplane that separates samples of different categories. SVM rests on rigorous mathematical theory, which gives it an advantage over many other algorithms on problems with small samples and high-dimensional data [53], [54], [55], [56], [57], [58], [59]. In this study, the SVM was implemented with the open-source software library LIBSVM developed by Chang and Lin [60], which can be downloaded from www.csie.ntu.edu.tw/~cjlin/libsvm. We chose the radial basis function (RBF) kernel to obtain the classification hyperplane, and used the grid search method to optimize the regularization parameter c and the kernel parameter g based on the 5-fold cross-validation test.
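The grid search over c and g can be sketched independently of any SVM library. In the sketch below, `cv_accuracy` is a hypothetical callable that trains an RBF SVM with the given parameters and returns its 5-fold cross-validation accuracy; the exponent ranges are an assumption (the paper does not state the search ranges) and follow the common practice of stepping through powers of two.

```python
import math
from itertools import product

def grid_search(cv_accuracy, log2c=range(-5, 16, 2), log2g=range(-15, 4, 2)):
    """Return (best_accuracy, c, g) over an exponential grid of parameter values.

    cv_accuracy(c, g) is supplied by the caller (hypothetical here) and should
    return the cross-validation accuracy of an RBF SVM with those parameters.
    """
    best = None
    for exp_c, exp_g in product(log2c, log2g):
        c, g = 2.0 ** exp_c, 2.0 ** exp_g
        accuracy = cv_accuracy(c, g)
        if best is None or accuracy > best[0]:
            best = (accuracy, c, g)
    return best

def toy_accuracy(c, g):
    """Stand-in scoring function that peaks at c = 2**3 and g = 2**-5."""
    return -abs(math.log2(c) - 3) - abs(math.log2(g) + 5)

best_accuracy, best_c, best_g = grid_search(toy_accuracy)
```

With a real scoring function the loop is unchanged; only `toy_accuracy` would be replaced by a call into the SVM library of choice.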

Feature selection technique

A high-dimensional feature vector may lead to heavy computation, overfitting and low robustness of the proposed model [61], [62]. Consequently, feature selection is an indispensable step to exclude noise and improve the computational efficiency of the proposed models [63], [64], [65]. We applied the mRMR algorithm to acquire the optimal feature subset. mRMR is a filter-based feature selection method proposed by Peng et al. [66]; it is easy and efficient to run and can yield a robust model. Let p(x) and p(y) be the probability density functions of two random variables x and y, and let p(x, y) be their joint probability density. The mutual information between them is defined as

$$I(x;y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \tag{9}$$

Based on mutual information, the purpose of feature screening is to find a feature subset S with m optimal features that has the largest dependency on the target class c. The maximum relevance has the following form:

$$\max D(S,c), \qquad D = \frac{1}{|S|}\sum_{x_i \in S} I(x_i;c) \tag{10}$$

The minimum redundancy is defined as:

$$\min R(S), \qquad R = \frac{1}{|S|^2}\sum_{x_i,x_j \in S} I(x_i;x_j) \tag{11}$$

The final selection criterion is formulated as:

$$\max \Phi(D,R), \qquad \Phi = D - R \tag{12}$$

In essence, mRMR uses a single criterion (relevance minus redundancy) to rank features and obtain the most informative, least redundant feature subset.
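A greedy version of the relevance-minus-redundancy ranking can be sketched as follows. It is an illustration under stated assumptions: features are taken to be discrete (continuous features such as the covariance terms would need to be binned first), mutual information is estimated from simple counts, and the function names are hypothetical rather than taken from the mRMR software.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((count / n) * log2((count / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), count in pxy.items())

def mrmr_rank(columns, labels, k):
    """Greedily pick k feature indices maximizing relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(columns[j], columns[s])
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: feature 1 duplicates feature 0, feature 2 is independent of both.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
labels = [0, 1, 1, 1]  # label = a OR b
picked = mrmr_rank([a, a, b], labels, k=2)
```

In the toy data the duplicated feature is skipped because its redundancy with the already selected copy cancels its relevance.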

Evaluation metrics

The following indexes [67], [68], [69], [70]: sensitivity (Sn), specificity (Sp), overall accuracy (Acc), and Matthews correlation coefficient (MCC) [71], [72], [73] were used to objectively evaluate the performance of the proposed models, defined as

$$Sn = \frac{TP}{TP+FN}, \quad Sp = \frac{TN}{TN+FP}, \quad Acc = \frac{TP+TN}{TP+TN+FP+FN}, \quad MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{13}$$

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. In addition, the AUC (area under the receiver operating characteristic curve) was calculated to objectively evaluate the proposed model [74]. The AUC ranges from 0 to 1, and a higher AUC indicates better performance. Cross-validation is a statistical analysis method for assessing a classifier [75], [76], [77], [78], [79], [80], [81], [82], [83], [84], [85]. The basic idea of cross-validation is that the dataset is divided into several subsets, one of which is used as the testing set while the remaining subsets form the training set. The classifier is trained on the training set and the resulting model is tested on the testing set. This process is repeated until every subset has been used as the testing set. In this study, to save computational time and resources, we used 5-fold cross-validation to examine the anticipated success rates of the predictor on the training data. Once the model was established, the independent data were used to evaluate its performance.
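The four threshold-based metrics and the fold construction can be written down directly, as in the sketch below. The metric formulas are the standard confusion-matrix definitions; the function names and the interleaved fold assignment are assumptions for illustration (in practice the samples would be shuffled before folding).

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}

def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) so that every sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_indices in enumerate(folds):
        train_indices = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_indices, test_indices
```

For instance, 80 true positives, 20 false negatives, 70 true negatives and 30 false positives give Sn = 0.80, Sp = 0.70 and Acc = 0.75.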

Results and discussion

Sequence composition analysis

Analyzing the oligonucleotide distribution patterns of sequences around modification sites is an effective step toward understanding why a site is modified and revealing the biological functions of modifications [86]. In this work, the tool Two Sample Logos [87] (http://www.twosamplelogo.org/cgi-bin/tsl/tsl.cgi) was used to investigate the nucleotide distribution surrounding m6A sites. Fig. 2 shows the statistical difference in nucleotide occurrence between positive and negative samples, computed by Two Sample Logos for the different tissues of the three species. In each panel, the part above the x axis is for m6A site-containing sequences, whereas the part below the x axis is for non-m6A site-containing sequences. As shown in Fig. 2, the m6A sequences are significantly different (t test, p value < 0.05) from non-m6A samples in terms of nucleotide distribution. In addition, the flanking sequences of m6A sites in the different tissues of all three species show some bias toward GC-rich elements, whereas the flanking regions of non-m6A sites are AU-rich. Thus, it is reasonable to extract sequence information to construct m6A classification models.

Fig. 2

The nucleotide distribution surrounding m6A and non-m6A sites.

Classification models building

According to the data and features described in the materials and methods, we built models for m6A identification in three steps. First, we determined the optimal parameter λ in the physical-chemical property matrix. For each dataset, we calculated and compared the results obtained by varying λ from 1 to 5 using SVM in the 5-fold cross-validation test, and thereby determined the best λ value. Second, we built classification models based on the fused features described by the three feature extraction methods [88], [89], namely the physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical property. In this way, 11 classification models were constructed using SVM in the 5-fold cross-validation test. We noticed that the prediction accuracies of these models are mostly concentrated in the range of 70% to 80%, and the AUC values are between 0.75 and 0.90. Consequently, we sought to further improve the performance of the models through feature selection. Third, we selected the best features using mRMR. We used the mRMR algorithm to calculate the contribution value of each feature, and ranked the features by contribution value from largest to smallest. Based on the incremental feature selection (IFS) strategy, we obtained for each tissue the optimal feature subset that produced the maximum accuracy. The performance metrics of the final models obtained after feature screening are given in Table 2 and the corresponding ROC curves are plotted in Fig. 3. Compared with the original results, the prediction performance was not significantly improved for most of the new models. However, the dimension of the optimal feature subsets was greatly reduced, which achieves the goal of eliminating redundant features and reducing calculation time. Therefore, the 11 final prediction models were constructed after feature selection by mRMR.
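The incremental feature selection step can be sketched as a loop over growing prefixes of the ranked feature list. Here `evaluate` is a hypothetical callable standing in for "train an SVM on these features and return its 5-fold cross-validation accuracy"; keeping the smallest subset when accuracies tie is also an assumption.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Return the top-k prefix of ranked_features with the highest evaluate() score.

    ranked_features: feature indices ordered from most to least important.
    evaluate: caller-supplied function mapping a feature list to an accuracy.
    """
    best_k, best_accuracy = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        accuracy = evaluate(ranked_features[:k])
        # Strict '>' keeps the smallest subset when two sizes tie.
        if accuracy > best_accuracy:
            best_k, best_accuracy = k, accuracy
    return ranked_features[:best_k], best_accuracy
```

Evaluating every prefix costs one cross-validation run per feature, so coarser step sizes are a common shortcut when the ranked list is long.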

Table 2

The performance of models before and after feature selection.

Species	Tissues	λ	mRMR	Dimension	Acc (%)	Sn (%)	Sp (%)	MCC	AUC
Human	Brain	2	No	400	70.97	73.81	67.56	0.41	0.7789
	Brain	2	Yes	206	71.26	74.79	66.19	0.41	0.7756
	Liver	3	No	436	79.42	79.65	78.63	0.58	0.8683
	Liver	3	Yes	126	80.13	81.32	78.13	0.59	0.8738
	Kidney	2	No	400	78.50	80.72	76.83	0.58	0.8658
	Kidney	2	Yes	92	78.99	80.85	76.34	0.57	0.8634
Mouse	Brain	2	No	400	78.13	79.81	76.45	0.56	0.8612
	Brain	2	Yes	129	78.75	79.32	76.90	0.58	0.8701
	Liver	2	No	400	70.26	75.39	65.81	0.41	0.7781
	Liver	2	Yes	86	70.59	74.93	65.59	0.41	0.7743
	Kidney	2	No	400	79.70	81.18	77.84	0.59	0.8777
	Kidney	2	Yes	184	79.98	82.60	77.31	0.60	0.8726
	Heart	2	No	400	72.19	73.78	69.15	0.43	0.7896
	Heart	2	Yes	88	72.76	75.24	68.97	0.44	0.7948
	Testis	4	No	472	74.05	77.42	70.43	0.48	0.8190
	Testis	4	Yes	97	74.40	78.14	70.02	0.48	0.8156
Rat	Brain	2	No	400	75.06	76.06	72.79	0.49	0.8245
	Brain	2	Yes	72	75.96	77.00	73.47	0.50	0.8282
	Liver	3	No	436	80.05	82.92	77.30	0.60	0.8758
	Liver	3	Yes	109	80.90	83.09	76.33	0.60	0.8766
	Kidney	4	No	472	81.11	82.70	79.03	0.62	0.8839
	Kidney	4	Yes	124	81.78	82.46	80.05	0.63	0.8877

Fig. 3

The ROC curves for optimal feature subsets of 11 final models.

Performance evaluation on independent dataset

To further investigate the robustness and stability of the proposed models, we used the independent dataset established for each tissue, as shown in Table 1. If a model performs comparably on the independent dataset, overfitting is minimal. The results on the 11 independent datasets produced by the above models are listed in Table 3. Taking all evaluation metrics together, the accuracies on the independent datasets are similar to those on the training sets, indicating that our classification models are able to identify m6A sites in unseen sequences.

Table 3

The generalization performance of our model on independent dataset.

Species	Tissues	Acc (%)	Sn (%)	Sp (%)	MCC	AUC
Human	Brain	71.1	69.50	72.98	0.42	0.7845
	Liver	79.01	78.19	79.87	0.58	0.8681
	Kidney	77.76	77.13	78.42	0.56	0.8565
Mouse	Brain	78.26	77.20	79.41	0.57	0.8613
	Liver	68.79	67.82	69.86	0.38	0.762
	Kidney	79.31	78.37	80.32	0.59	0.8697
	Heart	71.3	70.52	72.13	0.43	0.7878
	Testis	73.54	72.19	75.08	0.47	0.8182
Rat	Brain	75.14	73.93	76.48	0.50	0.8265
	Liver	79.85	77.74	82.31	0.60	0.8761
	Kidney	81.42	80.18	82.77	0.63	0.8968

Cross-species/tissues validation

In this study, we collected 11 benchmark datasets of different tissues from three species. It is of interest to test whether a model trained with the data from one tissue can recognize the m6A sites in other tissues. Therefore, we applied the idea of transfer information [90] to study the relationships among tissues and designed the following experiment. The 11 tissue-specific models were first constructed from the training datasets of the 11 different tissues, respectively. Subsequently, for each model, the 11 tissues' training datasets were regarded as independent testing datasets to evaluate the performance of the model. A heat map was drawn in Fig. 4 to describe the prediction performance of cross-species/tissues validation based on the AUC values. The models in rows were tested on the datasets in columns. For ease of viewing, the different tissues of the same species are marked by black dashed boxes.

Fig. 4

The heat map showing the AUC values in cross-tissue prediction. Once a tissue-specific model was established on its own training dataset (rows), it was validated on the data from the same tissue as well as on the independent data from the other datasets (columns).

Overall, the distribution of m6A is well conserved at the sequence level in mammals, as all calculated AUC values in the heat map were greater than 0.7. In particular, the datasets of human (liver and kidney), mouse (brain and kidney) and rat (brain, liver and kidney) obtained superior results (AUCs > 0.8) with almost all models, which indicates that potential m6A sites in the sequences of these tissues can be identified by any of the models. However, when the 11 models were tested using human (brain) and mouse (liver, heart and testis) as independent datasets, most of the AUC values were below 0.8. These results may be due to differences in the orthologous genes modified by m6A in different tissues of the three species [23].
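The model-by-dataset AUC table behind such a heat map can be sketched as follows. The AUC is computed with the rank-based definition (the probability that a positive sample scores higher than a negative one, with ties counted as one half). The structure of `models` and `datasets` is hypothetical: each model is represented only by a scoring function.

```python
def auc(scores, labels):
    """Rank-based AUC: P(positive score > negative score), ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def cross_auc_matrix(models, datasets):
    """AUC of every model (rows) on every dataset (columns).

    models: name -> scoring function (sample -> decision value); hypothetical.
    datasets: name -> (samples, labels).
    """
    return {
        model_name: {
            data_name: auc([score(x) for x in samples], labels)
            for data_name, (samples, labels) in datasets.items()
        }
        for model_name, score in models.items()
    }
```

The pairwise comparison is quadratic in the number of samples, which is acceptable for datasets of a few thousand sequences; a sort-based implementation would scale better.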

Comparison with published method

It is necessary to compare our proposed method with other published methods to assess the pipeline used in this study. Considering the computing resources required, the human and mouse benchmark datasets from iRNA-3typeA [21] were chosen as the most suitable for this comparison. Following the workflow in Fig. 1, the fused features described by the three feature extraction methods were obtained first. Second, the best feature set was selected using mRMR. Third, the m6A classification models were built by SVM. Finally, we compared the results produced by iRNA-m6A with those obtained by iRNA-3typeA using the jackknife test on the same benchmark datasets. Table 4 shows the comparison. iRNA-m6A clearly outperforms iRNA-3typeA on the human dataset (Acc 97.12% versus 90.38%) and is slightly better on the mouse dataset (Acc 89.17% versus 88.39%). Therefore, the pipeline of this paper was further validated to be effective.

Table 4

Comparative results for identifying m6A on published database.

Species	Methods	Acc (%)	Sn (%)	Sp (%)	MCC
Human	iRNA-3typeA	90.38	81.68	99.11	0.82
Human	iRNA-m6A	97.12	94.34	99.91	0.94
Mouse	iRNA-3typeA	88.39	77.79	100.00	0.80
Mouse	iRNA-m6A	89.17	78.34	100.00	0.80

Web-server

Based on the 11 benchmark datasets shown in Table 1, a predictor called iRNA-m6A was established. A step-by-step guide to the web-server is provided as follows:
Step 1. Open the web-server at http://lin-group.cn/server/iRNA-m6A/service.html and you will see the webserver page. Click on the “Home” button to see a brief introduction to the server.
Step 2. Select “Species” and the corresponding “tissues” from the drop-down menu, then enter the query RNA sequences into the input box or directly upload a FASTA format file. Note that each sequence should be longer than 41 nt.
Step 3. Click the “Submit” button; the predicted results (Yes/No) will appear on a new page.
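Because the models operate on 41-nt windows centered on an adenine, a long query sequence presumably has to be cut into such windows before prediction. The sketch below shows one way to do this; it is an assumption about the preprocessing, not a description of the server's code, and both the function name and the T-to-U conversion are illustrative choices.

```python
def candidate_windows(seq, flank=20):
    """Yield (1-based position, window) for every A that has `flank` nt on both sides."""
    # Assumption: DNA-style input is accepted by converting T to U.
    seq = seq.upper().replace("T", "U")
    for i, nucleotide in enumerate(seq):
        if nucleotide == "A" and i >= flank and i + flank < len(seq):
            yield i + 1, seq[i - flank:i + flank + 1]
```

Adenines closer than 20 nt to either end of the query are skipped, since no full 41-nt window can be built around them.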

Conclusion

Because m6A plays crucial roles in many biological processes [1], [2], [3], [4], [6], accurate identification of m6A sites in the genome is essential for revealing its regulatory mechanisms and for providing key clues for drug development [91]. Efficient and reliable computational methods can deliver high-precision predictions and guide wet-lab researchers. In the present work, a new predictor called iRNA-m6A was developed to identify m6A sites in various tissues of different species; it comprises 11 m6A classification models based on SVM and the 5-fold cross-validation test. Moreover, the results of the independent dataset tests demonstrated that the proposed models are robust and reliable. Finally, we developed a webserver at http://lin-group.cn/server/iRNA-m6A, where users can submit RNA sequences in FASTA format and obtain the potential m6A sites within them. We anticipate that this computational m6A identification platform will help to reveal the functional mechanisms of m6A sites.

CRediT authorship contribution statement

Fu-Ying Dao: Methodology, Software, Visualization, Writing - original draft. Hao Lv: Conceptualization, Data curation, Methodology, Software. Yu-He Yang: Data curation, Methodology. Hasan Zulfiqar: Data curation. Hui Gao: Methodology, Writing - review & editing. Hao Lin: Conceptualization, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
