Fu-Ying Dao, Hao Lv, Yu-He Yang, Hasan Zulfiqar, Hui Gao, Hao Lin.
Abstract
N6-methyladenosine (m6A), the methylation of adenosine at the nitrogen-6 position, is the most abundant RNA methylation modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat, based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by a physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical properties. These features were then optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with 5-fold cross-validation. Prediction results on independent datasets showed that the proposed method generalizes well. We also established a user-friendly webserver for iRNA-m6A, freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will make it more convenient for users to study m6A modification in different tissues.
Keywords: Feature extraction and selection; RNA modification; Support vector machine; Webserver; m6A
Year: 2020 PMID: 32435427 PMCID: PMC7229270 DOI: 10.1016/j.csbj.2020.04.015
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Overall framework of iRNA-m6A.
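Two of the encodings named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the column order of the binary code (A, C, G, U) is an assumption, the nucleotide chemical property triple follows the commonly used (ring structure, functional group, hydrogen bonding) convention, and the physical-chemical property matrix is left out because its values are not given here.

```python
# Mono-nucleotide binary (one-hot) code; the A, C, G, U column order is an assumption.
BINARY = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}

# Nucleotide chemical property (NCP) code, commonly defined as
# (purine=1/pyrimidine=0, amino=1/keto=0, weak H-bond=1/strong=0).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def encode(seq, table):
    """Concatenate the per-base code of every position in seq."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    width = len(next(iter(table.values())))
    features = []
    for base in seq:
        # Unknown symbols (e.g. N) become an all-zero block.
        features.extend(table.get(base, (0,) * width))
    return features


def encode_sequence(seq):
    """Binary code followed by NCP code for the whole sequence."""
    return encode(seq, BINARY) + encode(seq, NCP)


vec = encode_sequence("GGACU")  # hypothetical 5-nt window
print(len(vec))  # 5 * 4 + 5 * 3 = 35
```

Each position contributes 4 binary and 3 chemical-property values, so the feature dimension grows linearly with the window length.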
The benchmark datasets for predicting RNA m6A sites.
| Species | Tissue | Positive (training) | Positive (testing) | Negative (training) | Negative (testing) |
|---|---|---|---|---|---|
| Human | Brain | 4605 | 4604 | 4605 | 4604 |
| Human | Liver | 2634 | 2634 | 2634 | 2634 |
| Human | Kidney | 4574 | 4573 | 4574 | 4573 |
| Mouse | Brain | 8025 | 8025 | 8025 | 8025 |
| Mouse | Liver | 4133 | 4133 | 4133 | 4133 |
| Mouse | Kidney | 3953 | 3952 | 3953 | 3952 |
| Mouse | Heart | 2201 | 2200 | 2201 | 2200 |
| Mouse | Testis | 4704 | 4706 | 4707 | 4706 |
| Rat | Brain | 2352 | 2351 | 2352 | 2351 |
| Rat | Liver | 1762 | 1762 | 1762 | 1762 |
| Rat | Kidney | 3433 | 3432 | 3433 | 3432 |
Fig. 2. The nucleotide distribution surrounding m6A and non-m6A sites.
The performance of models before and after feature selection.
| Species | Tissue | lambda | mRMR | Dimension | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Human | Brain | 2 | No | 400 | 70.97 | 73.81 | 67.56 | 0.41 | 0.7789 |
| Human | Brain | 2 | Yes | 206 | 71.26 | 74.79 | 66.19 | 0.41 | 0.7756 |
| Human | Liver | 3 | No | 436 | 79.42 | 79.65 | 78.63 | 0.58 | 0.8683 |
| Human | Liver | 3 | Yes | 126 | 80.13 | 81.32 | 78.13 | 0.59 | 0.8738 |
| Human | Kidney | 2 | No | 400 | 78.50 | 80.72 | 76.83 | 0.58 | 0.8658 |
| Human | Kidney | 2 | Yes | 92 | 78.99 | 80.85 | 76.34 | 0.57 | 0.8634 |
| Mouse | Brain | 2 | No | 400 | 78.13 | 79.81 | 76.45 | 0.56 | 0.8612 |
| Mouse | Brain | 2 | Yes | 129 | 78.75 | 79.32 | 76.90 | 0.58 | 0.8701 |
| Mouse | Liver | 2 | No | 400 | 70.26 | 75.39 | 65.81 | 0.41 | 0.7781 |
| Mouse | Liver | 2 | Yes | 86 | 70.59 | 74.93 | 65.59 | 0.41 | 0.7743 |
| Mouse | Kidney | 2 | No | 400 | 79.70 | 81.18 | 77.84 | 0.59 | 0.8777 |
| Mouse | Kidney | 2 | Yes | 184 | 79.98 | 82.60 | 77.31 | 0.60 | 0.8726 |
| Mouse | Heart | 2 | No | 400 | 72.19 | 73.78 | 69.15 | 0.43 | 0.7896 |
| Mouse | Heart | 2 | Yes | 88 | 72.76 | 75.24 | 68.97 | 0.44 | 0.7948 |
| Mouse | Testis | 4 | No | 472 | 74.05 | 77.42 | 70.43 | 0.48 | 0.8190 |
| Mouse | Testis | 4 | Yes | 97 | 74.40 | 78.14 | 70.02 | 0.48 | 0.8156 |
| Rat | Brain | 2 | No | 400 | 75.06 | 76.06 | 72.79 | 0.49 | 0.8245 |
| Rat | Brain | 2 | Yes | 72 | 75.96 | 77.00 | 73.47 | 0.50 | 0.8282 |
| Rat | Liver | 3 | No | 436 | 80.05 | 82.92 | 77.30 | 0.60 | 0.8758 |
| Rat | Liver | 3 | Yes | 109 | 80.90 | 83.09 | 76.33 | 0.60 | 0.8766 |
| Rat | Kidney | 4 | No | 472 | 81.11 | 82.70 | 79.03 | 0.62 | 0.8839 |
| Rat | Kidney | 4 | Yes | 124 | 81.78 | 82.46 | 80.05 | 0.63 | 0.8877 |
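The feature-selection step compared in the table above can be illustrated with a small greedy mRMR routine. This is a sketch under stated assumptions rather than the procedure used in the paper: features are treated as discrete, relevance and redundancy are both measured by empirical mutual information, the score is the simple difference form (relevance minus mean redundancy), and the toy columns and labels are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(columns, labels, k):
    """Greedily pick k feature indices: high relevance to labels, low redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    # Start from the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(columns)):
        def score(j):
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            )
            return relevance[j] - redundancy / len(selected)

        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical toy data: columns 0 and 1 are duplicates, column 2 is
# complementary to column 0, column 3 is uninformative.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
print(sorted(mrmr(columns, labels, 2)))
```

On this toy input the duplicate column is passed over in favour of the complementary one, which is the behaviour the redundancy term is meant to produce.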
Fig. 3. The ROC curves for the optimal feature subsets of the 11 final models.
The generalization performance of our models on the independent datasets.
| Species | Tissue | | | | | |
|---|---|---|---|---|---|---|
| Human | Brain | 71.1 | 69.50 | 72.98 | 0.42 | 0.7845 |
| Human | Liver | 79.01 | 78.19 | 79.87 | 0.58 | 0.8681 |
| Human | Kidney | 77.76 | 77.13 | 78.42 | 0.56 | 0.8565 |
| Mouse | Brain | 78.26 | 77.20 | 79.41 | 0.57 | 0.8613 |
| Mouse | Liver | 68.79 | 67.82 | 69.86 | 0.38 | 0.762 |
| Mouse | Kidney | 79.31 | 78.37 | 80.32 | 0.59 | 0.8697 |
| Mouse | Heart | 71.3 | 70.52 | 72.13 | 0.43 | 0.7878 |
| Mouse | Testis | 73.54 | 72.19 | 75.08 | 0.47 | 0.8182 |
| Rat | Brain | 75.14 | 73.93 | 76.48 | 0.50 | 0.8265 |
| Rat | Liver | 79.85 | 77.74 | 82.31 | 0.60 | 0.8761 |
| Rat | Kidney | 81.42 | 80.18 | 82.77 | 0.63 | 0.8968 |
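Binary site predictors of this kind are usually summarized by sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC). The following is a minimal sketch of how these measures are computed from confusion-matrix counts; the counts are hypothetical and not taken from the paper, and the function name is illustrative.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Threshold-based metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on true sites
    sp = tn / (tn + fp)                    # specificity: recall on non-sites
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical balanced test set: 100 positives and 100 negatives.
m = classification_metrics(tp=80, fn=20, tn=70, fp=30)
print({name: round(value, 4) for name, value in m.items()})
```

With equal numbers of positive and negative samples, as in the benchmark datasets, accuracy is the mean of sensitivity and specificity.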
Fig. 4. Heat map of AUC values in cross-tissue prediction. Each tissue-specific model, trained on its own training dataset (rows), was validated on data from the same tissue as well as on the independent data from the other datasets (columns).
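A cross-tissue AUC grid like the one in this heat map can be assembled by scoring every test set with every model. The sketch below uses the rank-based definition of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative). The two "models" are hypothetical stand-in scoring functions, not trained SVMs, and the data are invented.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_tissue_auc(models, test_sets):
    """Map (training tissue, test tissue) to AUC.

    models: tissue name -> scoring function; test_sets: tissue name -> (samples, labels).
    """
    return {
        (train, test): roc_auc(y, [score(x) for x in samples])
        for train, score in models.items()
        for test, (samples, y) in test_sets.items()
    }


# Hypothetical stand-ins: each "model" just reads one coordinate of the sample.
models = {"brain": lambda x: x[0], "liver": lambda x: x[1]}
test_sets = {
    "brain": ([(0.9, 0.2), (0.1, 0.8)], [1, 0]),
    "liver": ([(0.3, 0.7), (0.6, 0.4)], [1, 0]),
}
grid = cross_tissue_auc(models, test_sets)
for (train, test), auc in sorted(grid.items()):
    print(train, test, auc)
```

The pairwise comparison is quadratic in the number of samples, which is acceptable for a sketch; a sort-based rank sum would be preferable on datasets of the size listed above.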
Comparative results for identifying m6A sites on published data.
| Species | Method | | | | |
|---|---|---|---|---|---|
| Human | iRNA-3typeA | 90.38 | 81.68 | 99.11 | 0.82 |
| Human | iRNA-m6A | 97.12 | 94.34 | 99.91 | 0.94 |
| Mouse | iRNA-3typeA | 88.39 | 77.79 | 100.00 | 0.80 |
| Mouse | iRNA-m6A | 89.17 | 78.34 | 100.00 | 0.80 |