Sen Yang1, Yan Wang1,2, Yu Lin2, Dan Shao1, Kai He1, Lan Huang1.
Abstract
Long non-coding RNA (lncRNA) and microRNA (miRNA) are both non-coding RNAs that play significant regulatory roles in many life processes. Accumulating evidence shows that the interaction patterns between lncRNAs and miRNAs are closely related to cancer development, gene regulation, cellular metabolic processes, etc. Meanwhile, with the rapid development of RNA sequencing technology, numerous novel lncRNAs and miRNAs have been found, which might help to explore novel regulatory patterns. However, the growing number of unknown interactions between lncRNAs and miRNAs may hinder the discovery of such patterns, and wet-lab experiments to identify potential interactions are costly and time-consuming. Furthermore, few computational tools are available for predicting lncRNA-miRNA interactions at the sequence level. In this paper, we propose a hybrid sequence feature-based model, LncMirNet (lncRNA-miRNA interactions network), to predict lncRNA-miRNA interactions via deep convolutional neural networks (CNN). First, four categories of sequence-based features are introduced to encode lncRNA/miRNA sequences: k-mer (k = 1, 2, 3, 4), composition transition distribution (CTD), doc2vec, and graph embedding features. Then, to fit the CNN learning pattern, a histogram-dd method is incorporated to fuse the multiple types of features into a matrix. Finally, under five-fold cross validation on a real dataset collected from lncRNASNP2, LncMirNet outperformed six other state-of-the-art methods. LncMirNet increased accuracy and area under the curve (AUC) each by more than 3% over the other tools, and improved the Matthews correlation coefficient (MCC) by more than 6%. These results show that LncMirNet can predict potential interactions between lncRNAs and miRNAs with high confidence.
Keywords: LncRNA–miRNA interactions; RNA sequence features; computational frame; deep learning
Year: 2020 PMID: 32977679 PMCID: PMC7583909 DOI: 10.3390/molecules25194372
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. The overall workflow of LncMirNet. (A) Sub-sequence of a lncRNA/miRNA starting with position 0, 1, 2 respectively; (B) process to construct k-mer, CTD, and doc2vec and graph embedding features; (C) process to convert the lncRNA/miRNA vectors into a matrix; (D) process to predict potential interaction between lncRNA and miRNA by a CNN model.
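
The k-mer features (k = 1, 2, 3, 4) named in the abstract can be illustrated with a minimal sketch. It assumes an `ACGU` alphabet, overlapping windows, and per-k normalization to frequencies; none of these choices is confirmed by the text above, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; sequences written with T need mapping first


def kmer_frequencies(seq, k):
    """Return normalized counts of all 4**k k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide an overlapping window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def kmer_feature_vector(seq, ks=(1, 2, 3, 4)):
    """Concatenate the k-mer frequency vectors for every k in ks."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq.upper(), k))
    return features


vec = kmer_feature_vector("AUGGCUAGCUAGGA")
print(len(vec))  # 4 + 16 + 64 + 256 = 340
```

With k = 1 to 4 the concatenated vector has 4 + 16 + 64 + 256 = 340 entries regardless of sequence length, which is what makes it usable as a fixed-size input.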
Figure 2. The training and inference pipeline of doc2vec. (A) 3-mer segmentation process; (B) training process of a doc2vec model; (C) inference process of doc2vec to encode an RNA sequence as a fixed-size vector.
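
One plausible reading of the 3-mer segmentation step, combined with the sub-sequences starting at positions 0, 1, and 2 from Figure 1(A), is sketched below. It assumes each shifted sub-sequence is cut into consecutive non-overlapping 3-mer "words" and that a trailing fragment shorter than k is dropped; both are assumptions, and the function names are hypothetical.

```python
def segment_kmers(seq, k=3, offset=0):
    """Split seq into consecutive non-overlapping k-mers starting at offset.

    A trailing fragment shorter than k is dropped (assumption).
    """
    return [seq[i:i + k] for i in range(offset, len(seq) - k + 1, k)]


def shifted_documents(seq, k=3):
    """Build one word list per starting position 0 .. k-1."""
    return [segment_kmers(seq, k, offset) for offset in range(k)]


docs = shifted_documents("AUGGCUAGCU")
for offset, words in enumerate(docs):
    print(offset, words)
```

Each word list could then be handed to a doc2vec implementation as one tokenized document; that step is omitted here because it depends on the library used.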
Effects of feature information in terms of prediction accuracy.
| | k-mer | k-mer, | k-mer, | k-mer, CTD, |
|---|---|---|---|---|
| Training | 0.8609 | 0.8802 | 0.9048 | 0.9140 |
| Test | 0.8004 | 0.8188 | 0.8321 | 0.8534 |
The results of the seven methods by five-fold cross validation on all data.
| | Sensitivity | Specificity | F1-Score | Accuracy | AUC | MCC |
|---|---|---|---|---|---|---|
| GEEL | 0.8040 | | 0.8187 | 0.8220 | 0.8982 | 0.6445 |
| PmliPred | 0.8800 | 0.7118 | 0.8117 | 0.7959 | 0.9030 | 0.6004 |
| BiLSTM | 0.8027 | 0.6263 | 0.7239 | 0.7145 | 0.7876 | 0.4359 |
| SEAL | 0.7650 | 0.8097 | 0.7825 | 0.7874 | 0.8658 | 0.5754 |
| SVD | 0.6548 | 0.6594 | 0.6595 | 0.6571 | 0.7156 | 0.3142 |
| Katz | 0.5969 | 0.5961 | 0.5953 | 0.5964 | 0.6459 | 0.1930 |
| LncMirNet | | 0.7910 | | | | |
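
The threshold-based columns of this table (sensitivity, specificity, F1-score, accuracy, MCC) follow from the confusion matrix by their standard definitions. The sketch below shows that arithmetic; the counts are hypothetical and not taken from the paper. AUC is left out because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=80, fp=20, tn=80, fn=20)
print(m)
```

Because MCC uses all four cells of the confusion matrix, it can separate methods whose accuracies are close, which is why it is reported alongside accuracy and AUC.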
Figure 3. Receiver operating characteristic curves of seven methods on all data by five-fold cross validation.
Different β in negative sample generation.
| β | Number of Positive Samples | Number of Negative Samples | AUC |
|---|---|---|---|
| 0.25 | 15,386 | 3846 | 0.8519 |
| 0.5 | 15,386 | 7693 | 0.8729 |
| 1.0 | 15,386 | 15,386 | 0.9381 |
| 2.0 | 15,386 | 30,772 | 0.9067 |
| 4.0 | 15,386 | 61,544 | 0.8834 |
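
The negative-sample counts in this table are consistent with taking β times the number of positive samples and truncating to an integer (for example, 0.25 × 15,386 = 3846.5, reported as 3846). The sketch below illustrates that relation. The sampling strategy, drawing random lncRNA-miRNA pairs that are not known positives, is an assumption for illustration and is not described in the text above; all names are hypothetical.

```python
import random


def sample_negatives(positives, lncrnas, mirnas, beta, seed=0):
    """Draw int(beta * n_pos) random pairs that are not known positives.

    Assumes lncrnas and mirnas hold unique IDs and that every positive
    pair is built from them.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    n_neg = int(beta * len(positive_set))  # truncation matches the table counts
    max_neg = len(lncrnas) * len(mirnas) - len(positive_set)
    if n_neg > max_neg:
        raise ValueError("not enough non-interacting pairs for this beta")
    negatives = set()
    while len(negatives) < n_neg:
        pair = (rng.choice(lncrnas), rng.choice(mirnas))
        if pair not in positive_set:
            negatives.add(pair)
    return sorted(negatives)


# Toy data, for illustration only.
lncrnas = [f"lnc{i}" for i in range(10)]
mirnas = [f"mir{j}" for j in range(10)]
positives = [(f"lnc{i}", f"mir{i}") for i in range(10)]
negatives = sample_negatives(positives, lncrnas, mirnas, beta=2.0)
print(len(positives), len(negatives))
```

In the table, AUC is highest at β = 1.0, where the positive and negative sets are the same size.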