Kewei Liu, Lei Cao, Pufeng Du, Wei Chen.
Abstract
N6-methyladenosine (m6A) is the most abundant post-transcriptional modification and is involved in a series of important biological processes. Accurate detection of m6A sites is therefore important for revealing the modification's biological functions and its impact on diseases. Although both experimental and computational methods have been proposed for identifying m6A sites, few of them can detect m6A sites in different tissues. Given the spatial specificity of m6A modification, methods that can detect m6A sites in different tissues are needed. In this work, we propose a new method based on a convolutional neural network (CNN), called im6A-TS-CNN, that can identify m6A sites in the brain, liver, kidney, heart, and testis of Homo sapiens, Mus musculus, and Rattus norvegicus. In im6A-TS-CNN, the samples are encoded with the one-hot encoding scheme. The results of both a 5-fold cross-validation test and an independent dataset test demonstrate that im6A-TS-CNN performs better than the existing method for the same purpose. The command-line version of im6A-TS-CNN is available at https://github.com/liukeweiaway/DeepM6A_cnn.
Keywords: convolution neural network; m6A; one-hot encoding; spatial specificity of gene expression
Year: 2020 PMID: 32858457 PMCID: PMC7473875 DOI: 10.1016/j.omtn.2020.07.034
Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 8.886
Figure 1. The Framework of im6A-TS-CNN
The first step is to collect tissue-specific m6A data from the human, mouse, and rat. The second step is encoding the sequences by using the one-hot scheme. The third step is model construction.
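The encoding step can be illustrated with a minimal sketch. The channel order (A, C, G, U), the handling of unknown bases, and the toy filter below are assumptions made for illustration; the record does not describe the actual im6A-TS-CNN input length, filter sizes, or layer layout.

```python
# Assumed channel order: A, C, G, U. T is treated as U; any other symbol
# (for example N) becomes an all-zero row. These choices are illustrative.
NUCLEOTIDES = "ACGU"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide in the sequence."""
    rows = []
    for base in sequence.upper().replace("T", "U"):
        row = [0, 0, 0, 0]
        if base in NUCLEOTIDES:
            row[NUCLEOTIDES.index(base)] = 1
        rows.append(row)
    return rows


def conv1d_relu(encoded, kernel, bias=0.0):
    """Slide a single filter (width x 4 weights) along the sequence.

    Uses valid padding and a ReLU activation, which is how one filter of a
    1D convolutional layer scans a one-hot encoded sequence.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        outputs.append(max(0.0, total))
    return outputs


encoded = one_hot_encode("GGACU")
# Hypothetical filter that responds only to the dinucleotide "AC".
ac_filter = [[1, 0, 0, 0], [0, 1, 0, 0]]
activations = conv1d_relu(encoded, ac_filter)
print(encoded)
print(activations)
```

In a trained network the filter weights are learned rather than hand-written, and many filters run in parallel before pooling and dense layers.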
The Performance of im6A-TS-CNN for Identifying m6A Sites
| Dataset | 5-Fold CV Sn (%) | 5-Fold CV Sp (%) | 5-Fold CV Acc (%) | 5-Fold CV MCC | 5-Fold CV AUC | Independent Test Sn (%) | Independent Test Sp (%) | Independent Test Acc (%) | Independent Test MCC | Independent Test AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| h_b | 75.35 | 69.71 | 72.53 | 0.4523 | 0.8029 | 75.17 | 70.20 | 72.69 | 0.4543 | 0.8056 |
| h_k | 81.70 | 78.25 | 79.98 | 0.6006 | 0.8781 | 79.95 | 78.53 | 79.24 | 0.5848 | 0.8727 |
| h_l | 80.18 | 79.69 | 79.94 | 0.5992 | 0.8811 | 84.81 | 75.02 | 79.92 | 0.6012 | 0.8805 |
| m_b | 81.50 | 75.85 | 78.67 | 0.5749 | 0.8705 | 86.22 | 70.74 | 78.48 | 0.5765 | 0.8722 |
| m_h | 78.37 | 67.60 | 72.99 | 0.4633 | 0.8115 | 75.82 | 71.36 | 73.59 | 0.4723 | 0.8161 |
| m_k | 79.91 | 81.00 | 80.46 | 0.6094 | 0.8842 | 80.52 | 81.00 | 80.76 | 0.6151 | 0.8855 |
| m_l | 72.39 | 70.24 | 71.32 | 0.4288 | 0.7953 | 75.56 | 67.58 | 71.57 | 0.4328 | 0.7927 |
| m_t | 75.21 | 75.61 | 75.41 | 0.5090 | 0.8380 | 83.45 | 68.87 | 76.16 | 0.5288 | 0.8467 |
| r_b | 79.04 | 74.23 | 76.64 | 0.5379 | 0.8469 | 78.05 | 75.84 | 76.95 | 0.5391 | 0.8516 |
| r_k | 84.15 | 80.77 | 82.46 | 0.6500 | 0.9017 | 84.85 | 80.59 | 82.72 | 0.6550 | 0.9077 |
| r_l | 81.56 | 79.63 | 80.59 | 0.6126 | 0.8830 | 84.51 | 75.94 | 80.22 | 0.6067 | 0.8847 |
h, m, and r before the underscore stand for human, mouse, and rat, respectively; b, h, k, l, and t after the underscore stand for brain, heart, kidney, liver, and testis, respectively.
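The table reports sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC). The sketch below computes them from confusion-matrix counts using the standard definitions, which the paper presumably follows; the counts in the example are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # fraction of true m6A sites recovered
    sp = tn / (tn + fp)                      # fraction of non-m6A sites rejected
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is empty;
    # returning 0.0 in that case is a common convention, assumed here.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for a balanced set of 100 positives and 100 negatives.
metrics = classification_metrics(tp=80, fn=20, tn=70, fp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because every dataset here has equal numbers of positive and negative samples, Acc is the mean of Sn and Sp. For h_b under cross-validation, for instance, (75.35 + 69.71) / 2 = 72.53.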
Figure 2. The ROC Curves for Identifying m6A in Different Tissues in the Three Species under the 5-Fold Cross-Validation Test and Independent Dataset Test
The value of AUC is given in the right corner of each graph.
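The AUC summarizes a ROC curve as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition; the labels and scores are made up for illustration and are not model outputs from the study.

```python
def roc_auc(labels, scores):
    """AUC from the pairwise definition: positives ranked above negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: 1 = m6A site, 0 = non-m6A site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The double loop is quadratic in the number of samples, so for datasets of the size used here a rank-based implementation would be the practical choice.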
Comparative Results between im6A-TS-CNN and iRNA-m6A under the 5-Fold Cross-Validation Test and Independent Test
| Dataset | 5-Fold CV AUC: im6A-TS-CNN | 5-Fold CV AUC: iRNA-m6A | 5-Fold CV AUC: Difference | Independent Test AUC: im6A-TS-CNN | Independent Test AUC: iRNA-m6A | Independent Test AUC: Difference |
|---|---|---|---|---|---|---|
| h_b | 0.8029 | 0.7756 | 0.0273 | 0.8056 | 0.7845 | 0.0211 |
| h_k | 0.8781 | 0.8634 | 0.0147 | 0.8727 | 0.8565 | 0.0162 |
| h_l | 0.8811 | 0.8738 | 0.0073 | 0.8805 | 0.8681 | 0.0124 |
| m_b | 0.8705 | 0.8731 | −0.0026 | 0.8722 | 0.8613 | 0.0109 |
| m_h | 0.8115 | 0.7948 | 0.0167 | 0.8161 | 0.7878 | 0.0283 |
| m_k | 0.8842 | 0.8726 | 0.0116 | 0.8855 | 0.8697 | 0.0158 |
| m_l | 0.7953 | 0.7743 | 0.0210 | 0.7927 | 0.7620 | 0.0307 |
| m_t | 0.8380 | 0.8156 | 0.0224 | 0.8467 | 0.8182 | 0.0285 |
| r_b | 0.8469 | 0.8282 | 0.0187 | 0.8516 | 0.8968 | −0.0452 |
| r_k | 0.9017 | 0.8877 | 0.0140 | 0.9077 | 0.8761 | 0.0316 |
| r_l | 0.8830 | 0.8766 | 0.0064 | 0.8847 | 0.8265 | 0.0582 |
h, m, and r before the underscore stand for human, mouse, and rat, respectively; b, h, k, l, and t after the underscore stand for brain, heart, kidney, liver, and testis, respectively.
A positive difference indicates that im6A-TS-CNN performs better than iRNA-m6A for identifying m6A sites.
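As a quick check of the Difference columns, the sketch below recomputes each difference as the im6A-TS-CNN AUC minus the iRNA-m6A AUC and lists the datasets where iRNA-m6A scores higher. The AUC values are copied from the table; the function and variable names are illustrative.

```python
# AUC pairs copied from the comparison table: (im6A-TS-CNN, iRNA-m6A).
cv_auc = {
    "h_b": (0.8029, 0.7756), "h_k": (0.8781, 0.8634), "h_l": (0.8811, 0.8738),
    "m_b": (0.8705, 0.8731), "m_h": (0.8115, 0.7948), "m_k": (0.8842, 0.8726),
    "m_l": (0.7953, 0.7743), "m_t": (0.8380, 0.8156), "r_b": (0.8469, 0.8282),
    "r_k": (0.9017, 0.8877), "r_l": (0.8830, 0.8766),
}
independent_auc = {
    "h_b": (0.8056, 0.7845), "h_k": (0.8727, 0.8565), "h_l": (0.8805, 0.8681),
    "m_b": (0.8722, 0.8613), "m_h": (0.8161, 0.7878), "m_k": (0.8855, 0.8697),
    "m_l": (0.7927, 0.7620), "m_t": (0.8467, 0.8182), "r_b": (0.8516, 0.8968),
    "r_k": (0.9077, 0.8761), "r_l": (0.8847, 0.8265),
}


def summarize(auc_pairs):
    """Return per-dataset differences and the datasets where iRNA-m6A is ahead."""
    diffs = {
        name: round(ours - theirs, 4)
        for name, (ours, theirs) in auc_pairs.items()
    }
    behind = [name for name, diff in diffs.items() if diff < 0]
    return diffs, behind


cv_diffs, cv_behind = summarize(cv_auc)
ind_diffs, ind_behind = summarize(independent_auc)
print("5-fold CV, iRNA-m6A ahead on:", cv_behind)
print("Independent test, iRNA-m6A ahead on:", ind_behind)
```

On these numbers im6A-TS-CNN has the higher AUC on 10 of the 11 datasets in each test, with m_b (cross-validation) and r_b (independent test) as the exceptions.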
Figure 3. Heatmap Showing the AUC Values of Cross-Species and Cross-Tissue Validation
The abscissa represents the independent dataset, and the ordinate represents the model.
The Information of Benchmark Datasets for Predicting RNA m6A Sites
| Name | Training Positive | Training Negative | Testing Positive | Testing Negative |
|---|---|---|---|---|
| h_b | 4,605 | 4,605 | 4,604 | 4,604 |
| h_k | 2,634 | 2,634 | 2,634 | 2,634 |
| h_l | 4,574 | 4,574 | 4,573 | 4,573 |
| m_b | 8,025 | 8,025 | 8,025 | 8,025 |
| m_h | 4,133 | 4,133 | 4,133 | 4,133 |
| m_k | 3,953 | 3,953 | 3,952 | 3,952 |
| m_l | 2,201 | 2,201 | 2,200 | 2,200 |
| m_t | 4,707 | 4,707 | 4,706 | 4,706 |
| r_b | 2,352 | 2,352 | 2,351 | 2,351 |
| r_k | 1,762 | 1,762 | 1,762 | 1,762 |
| r_l | 3,433 | 3,433 | 3,432 | 3,432 |
h, m, and r before the underscore stand for human, mouse, and rat, respectively; b, h, k, l, and t after the underscore stand for brain, heart, kidney, liver, and testis, respectively.
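The 5-fold cross-validation test partitions each training set into five parts and holds out one part at a time. The sketch below shows one way to build such folds while keeping positives and negatives balanced in every fold. Whether the authors stratified their folds, and which random seed they used, is not stated in this record, so both are assumptions; the sample counts are those of the h_b training set.

```python
import random


def stratified_k_fold(n_pos, n_neg, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    Indices 0..n_pos-1 denote positive samples and the remaining indices
    denote negatives. Each class is shuffled and dealt round-robin into the
    folds so that every fold keeps the overall class balance.
    """
    rng = random.Random(seed)
    pos = list(range(n_pos))
    neg = list(range(n_pos, n_pos + n_neg))
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [pos[i::k] + neg[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# h_b training set: 4,605 positive and 4,605 negative samples.
splits = list(stratified_k_fold(4605, 4605))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(fold_number, len(train), len(validation))
```

Each sample appears in exactly one validation fold, so the cross-validation metrics are computed over the whole training set once all five folds have been evaluated.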