Xiao-Nan Fan, Shao-Wu Zhang, Song-Yao Zhang, Jin-Jie Ni.
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, experimental identification of lncRNAs is expensive and time-consuming. In this study, we presented an alignment-free multimodal deep learning framework, lncRNA_Mdeep, to distinguish lncRNAs from protein-coding transcripts. lncRNA_Mdeep incorporated three different input modalities and used a multimodal deep learning framework to learn high-level abstract representations and predict the probability that a transcript is an lncRNA. lncRNA_Mdeep achieved 98.73% prediction accuracy in a 10-fold cross-validation test on human data. In an independent test on human data, lncRNA_Mdeep achieved 93.12% prediction accuracy, which was 0.94 to 15.41 percentage points higher than that of eight other state-of-the-art methods. In addition, the results on 11 cross-species datasets showed that lncRNA_Mdeep is a powerful predictor of lncRNAs.
Keywords: alignment-free; deep learning; long noncoding RNA; multimodal learning
Year: 2020 PMID: 32718000 PMCID: PMC7432689 DOI: 10.3390/ijms21155222
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Overview of lncRNA_Mdeep.
Performance of lncRNA_Mdeep and other model architectures in the 10-fold cross-validation (10CV) test.
| Model | ACC (%) | Sn (%) | Sp (%) | MCC |
|---|---|---|---|---|
| OFH_DNN | 95.74 ± 1.70 | 94.44 ± 4.89 | 97.04 ± 2.15 | 0.9171 ± 0.0307 |
| k-mer_DNN | 96.53 ± 0.41 | 96.40 ± 1.11 | 96.66 ± 0.78 | 0.9307 ± 0.0082 |
| One-hot_CNN | 95.82 ± 0.33 | 97.01 ± 0.96 | 94.63 ± 1.19 | 0.9169 ± 0.0064 |
| OFH_DNN + k-mer_DNN | 95.97 ± 2.49 | 96.87 ± 1.05 | 95.06 ± 5.71 | 0.9211 ± 0.0449 |
| k-mer_DNN + One-hot_CNN | 98.36 ± 0.16 | 98.70 ± 0.42 | 98.03 ± 0.50 | 0.9674 ± 0.0033 |
| OFH_DNN + One-hot_CNN | 97.60 ± 1.26 | 97.78 ± 1.58 | 97.43 ± 2.33 | 0.9526 ± 0.0248 |
| Decision fusion | 98.42 ± 1.12 | 99.24 ± 0.45 | 97.60 ± 2.59 | 0.9689 ± 0.0212 |
| lncRNA_Mdeep | 98.73 ± 0.41 | 98.95 ± 0.54 | 98.52 ± 0.92 | 0.9748 ± 0.0080 |
OFH_DNN, the DNN model with the OFH feature as input. k-mer_DNN, the DNN model with the k-mer feature as input. One-hot_CNN, the CNN model with one-hot encoding as input. ACC, accuracy. Sn, sensitivity. Sp, specificity. MCC, Matthews correlation coefficient.
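The four measures in the legend can be computed from the counts of a binary confusion matrix. The following is a minimal sketch using the standard definitions; it assumes lncRNA is treated as the positive class, and the function name `binary_metrics` and the example counts are hypothetical, not taken from the paper.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (ACC %, sensitivity %, specificity %, MCC) from confusion counts.

    Assumption: the positive class is lncRNA, the negative class is
    protein-coding transcripts.
    """
    total = tp + fn + tn + fp
    acc = 100.0 * (tp + tn) / total      # fraction of all transcripts classified correctly
    sn = 100.0 * tp / (tp + fn)          # fraction of positives recovered
    sp = 100.0 * tn / (tn + fp)          # fraction of negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sn, sp, mcc

# Hypothetical counts for illustration only.
acc, sn, sp, mcc = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"ACC={acc:.2f}%  Sn={sn:.2f}%  Sp={sp:.2f}%  MCC={mcc:.4f}")
```

Unlike ACC, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the two error types differ in size.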
Figure 2. Results of k-mer_DNN and One-hot_CNN with different parameters. (A) Accuracy of k-mer_DNN with different k values. (B) Accuracy of One-hot_CNN with different maxlen values. k-mer_DNN, the DNN model with the k-mer feature as input. One-hot_CNN, the CNN model with one-hot encoding as input.
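To make the two parameters in the caption concrete, the sketch below shows one common way to build a k-mer frequency vector (parameter `k`) and a fixed-length one-hot matrix (parameter `maxlen`) from a transcript sequence. This is a generic illustration under stated assumptions: the normalization, the truncate-and-zero-pad scheme, the handling of ambiguous bases, and the function names are all assumptions, since this excerpt does not specify how lncRNA_Mdeep computes its inputs.

```python
from itertools import product

ALPHABET = "ACGT"

def kmer_frequencies(seq, k):
    """Normalized counts of all 4**k k-mers, using a sliding window of stride 1.

    Windows containing characters outside ACGT (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("U", "T")
    windows = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            windows += 1
    return [counts[w] / windows if windows else 0.0 for w in kmers]

def one_hot(seq, maxlen):
    """Return a maxlen x 4 matrix: truncate long sequences, zero-pad short ones.

    Bases outside ACGT map to an all-zero row.
    """
    seq = seq.upper().replace("U", "T")[:maxlen]
    rows = [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]
    rows += [[0] * len(ALPHABET) for _ in range(maxlen - len(seq))]
    return rows

freqs = kmer_frequencies("ACGTAC", k=2)  # vector of 16 frequencies
matrix = one_hot("ACGTAC", maxlen=8)     # 8 rows; the last two are padding
```

The k-mer vector grows as 4**k while the one-hot matrix grows linearly with `maxlen`, so both parameters trade input size against the amount of sequence information retained.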
Performance of lncRNA_Mdeep and the eight other methods on human data in an independent test.
| Methods | ACC (%) | Sn (%) | Sp (%) | MCC |
|---|---|---|---|---|
| CNCI | 86.40 | 97.42 | 75.38 | 0.7463 |
| CPAT | 87.98 | 95.22 | 80.73 | 0.7676 |
| PLEK | 77.71 | 97.22 | 58.20 | 0.6019 |
| lncRNA-MFDL | 85.47 | 93.43 | 77.50 | 0.7185 |
| CPC2 | 77.98 | 94.07 | 61.90 | 0.5911 |
| lncRNAnet | 92.18 | 96.63 | 87.73 | 0.8470 |
| lncFinder1 | 86.22 | 95.20 | 77.23 | 0.7363 |
| lncFinder2 | 86.88 | 95.98 | 77.77 | 0.7501 |
| lncRNA_Mdeep | 93.12 | 97.27 | 88.97 | 0.8653 |
CNCI, coding-non-coding index [14]. CPAT, coding-potential assessment tool [15]. PLEK, predictor of lncRNA and messenger RNAs based on an improved k-mer scheme [16]. lncRNA-MFDL, identification of lncRNA by fusing multiple features and using deep learning [17]. CPC2, coding potential calculator 2 [19]. lncRNAnet, lncRNA identification using deep learning [20]. lncFinder1, LncFinder without the secondary structure; lncFinder2, LncFinder with the secondary structure [21]. lncRNA_Mdeep, our method.
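The independent-test figures can be cross-checked with simple arithmetic. The sketch below copies the ACC value and the second and third numeric columns (read here as sensitivity and specificity) from the table above, then checks two things: whether ACC matches the mean of the two rates, and how far lncRNA_Mdeep's accuracy sits above each other method. The identity ACC = (Sn + Sp) / 2 holds only when the two classes are equally sized, so a match would be consistent with a balanced test set; that is an inference from the numbers, not a statement made in this excerpt.

```python
# (ACC, Sn, Sp) in percent, copied from the independent-test table.
table = {
    "CNCI": (86.40, 97.42, 75.38),
    "CPAT": (87.98, 95.22, 80.73),
    "PLEK": (77.71, 97.22, 58.20),
    "lncRNA-MFDL": (85.47, 93.43, 77.50),
    "CPC2": (77.98, 94.07, 61.90),
    "lncRNAnet": (92.18, 96.63, 87.73),
    "lncFinder1": (86.22, 95.20, 77.23),
    "lncFinder2": (86.88, 95.98, 77.77),
    "lncRNA_Mdeep": (93.12, 97.27, 88.97),
}

# Largest difference between the reported ACC and the mean of the two rates.
max_gap = max(abs(acc - (sn + sp) / 2) for acc, sn, sp in table.values())

# Accuracy margin of lncRNA_Mdeep over each other method, in percentage points.
best = table["lncRNA_Mdeep"][0]
margins = {
    name: round(best - acc, 2)
    for name, (acc, _, _) in table.items()
    if name != "lncRNA_Mdeep"
}

print(f"max |ACC - (Sn + Sp)/2| = {max_gap:.3f}")
print(f"margin range: {min(margins.values())} to {max(margins.values())}")
```

The gap stays within rounding error for every row, and the margins span 0.94 to 15.41 percentage points, matching the range quoted in the abstract.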
Accuracy (%) of lncRNA_Mdeep and the eight other methods on 11 cross-species datasets.
| Species | CNCI | CPAT | PLEK | lncRNA-MFDL | CPC2 | lncRNAnet | lncFinder1 | lncFinder2 | lncRNA_Mdeep |
|---|---|---|---|---|---|---|---|---|---|
| Mouse | 87.09 | 90.47 | 71.89 | 88.53 | 80.43 | 91.81 | 88.47 | 88.99 | |
| | 79.86 | 91.39 | 66.93 | | 93.36 | 94.60 | 92.45 | 93.77 | 95.73 |
| | 92.88 | 97.13 | 89.32 | 95.51 | 96.10 | 96.30 | 97.00 | 97.03 | |
| | 77.72 | 91.48 | 45.37 | 97.97 | 94.75 | 97.95 | 87.46 | 88.55 | |
| Chicken | 91.52 | | 83.95 | 96.87 | 95.22 | 95.56 | 96.82 | 96.64 | 96.06 |
| Chimpanzee | 89.84 | 96.18 | 88.99 | 94.26 | 95.48 | 94.78 | 96.05 | 96.21 | |
| Frog | 90.60 | 96.40 | 80.90 | 96.14 | 96.34 | 95.53 | 96.92 | | 96.80 |
| Fruit fly | 92.90 | 96.02 | 74.43 | | 94.28 | 95.21 | 95.33 | 95.50 | 96.10 |
| Gorilla | 89.37 | 94.99 | 86.75 | 95.12 | 94.12 | 94.31 | 94.72 | 94.87 | |
| Pig | 91.73 | 96.91 | 87.34 | | 95.86 | 95.56 | 96.88 | 96.82 | 96.87 |
| Zebrafish | 93.59 | 97.50 | 85.07 | 92.17 | 96.83 | 95.77 | 97.54 | | 96.76 |