Zhengfeng Wang, Xiujuan Lei, Fang-Xiang Wu.
Abstract
Circular RNAs (circRNAs) are extensively expressed in cells and tissues and play crucial roles in human diseases and biological processes. Recent studies have reported that circRNAs can function as RNA binding protein (RBP) sponges, while RBPs can also be involved in back-splicing. The interaction with RBPs is therefore considered an important factor in investigating the function of circRNAs. Hence, it is necessary to understand the interaction mechanisms of circRNAs and RBPs, especially in human cancers. Here, we present a novel deep learning method to identify cancer-specific circRNA-RBP binding sites (CSCRSites), using only nucleotide sequences as input. In CSCRSites, an architecture with multiple convolution layers detects features of the raw circRNA sequence fragments, and a fully connected layer with a softmax output then identifies the binding sites. The experimental results show that CSCRSites outperforms conventional machine learning classifiers and some representative deep learning methods on the benchmark data. In addition, the features learnt by CSCRSites are converted to sequence motifs, some of which match known human RNA motifs involved in human diseases, especially cancer. Therefore, as a deep learning-based tool, CSCRSites could contribute significantly to the functional analysis of cancer-associated circRNAs.
Keywords: RNA binding protein; cancer-specific; circRNA; convolutional neural network
Year: 2019 PMID: 31703384 PMCID: PMC6891306 DOI: 10.3390/molecules24224035
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
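The abstract describes a pipeline that takes raw nucleotide sequence fragments, passes them through convolution layers, and ends in a fully connected layer with a softmax output. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: it assumes one-hot encoding over A, C, G, U, uses a single convolution layer with global max pooling in place of the multi-layer architecture, and fills the weights with random numbers. The names `one_hot`, `conv1d_relu`, and `predict`, the kernel width of 8, and the example sequence are all illustrative assumptions.

```python
import math
import random

ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a list of 4-element one-hot rows (A, C, G, U)."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0.0] * 4
        if base in ALPHABET:  # unknown bases (e.g. N) stay all-zero
            row[ALPHABET.index(base)] = 1.0
        rows.append(row)
    return rows

def conv1d_relu(x, kernel):
    """Slide one kernel (width x 4) along the sequence and apply ReLU to each window score."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        score = sum(kernel[i][j] * x[start + i][j]
                    for i in range(width) for j in range(4))
        out.append(max(0.0, score))
    return out

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(seq, kernels, dense_w, dense_b):
    """Global-max-pool each kernel's activations, then apply a dense layer and softmax."""
    x = one_hot(seq)
    pooled = [max(conv1d_relu(x, k)) for k in kernels]
    logits = [sum(w * p for w, p in zip(ws, pooled)) + b
              for ws, b in zip(dense_w, dense_b)]
    return softmax(logits)

# Hypothetical, untrained weights: 3 kernels of width 8, and 2 output classes
# (binding site / non-binding site). A real model would learn these from data.
random.seed(0)
kernels = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
           for _ in range(3)]
dense_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
dense_b = [0.0, 0.0]

probs = predict("UAGGUAGGACACACAGUUUAUUUU", kernels, dense_w, dense_b)
print(probs)  # two class probabilities that sum to 1
```

Because the weights are random, the printed probabilities carry no biological meaning; the sketch only shows how a sequence fragment flows from encoding to a two-class softmax output.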
Figure 1. The distribution of AUCs across various kernels.
Figure 2. The distribution of AUCs across various parameters and structures.
Figure 3. Receiver operating characteristic (ROC) curves showing the superior performance of CSCRSites (cancer-specific circRNA–RBP binding sites) over multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF) on the test dataset. RBP, RNA binding protein.
Table 1. CSCRSites outperforms other deep learning-based models on the test dataset. Acc., accuracy; Prec., precision; AUC, area under the ROC curve.
| Method | Acc. | Prec. | AUC |
|---|---|---|---|
| CSCRSites | 0.74 | 0.76 | 0.83 |
| DeepBind | 0.68 | 0.68 | 0.75 |
| iDeepS | 0.61 | 0.64 | 0.65 |
| Zeng’s method | 0.59 | 0.59 | 0.62 |
Figure 4. ROC curves showing the superior performance of CSCRSites over other deep learning-based methods on the test dataset.
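The comparison above reports accuracy, precision, and AUC. As a reminder of what these three numbers measure, here is a small worked sketch in plain Python. The labels and scores are made up for illustration and are unrelated to the benchmark data; the 0.5 decision threshold is also an assumption. AUC is computed through its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
def accuracy(labels, preds):
    """Fraction of examples whose predicted class equals the true class."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)

def precision(labels, preds):
    """True positives divided by all predicted positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def auc(labels, scores):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative data only: 1 = binding site, 0 = non-binding site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(round(accuracy(labels, preds), 3))   # 0.667
print(round(precision(labels, preds), 3))  # 0.667
print(round(auc(labels, scores), 3))       # 0.889
```

Accuracy and precision depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a method can have similar accuracy and precision but a noticeably different AUC.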
Table 2. Some motifs learnt by CSCRSites, aligned with known motifs and their associated genes.
| Associated Genes | Known Motifs ID | Known Sequence | Learnt Motifs ID | Learnt Sequence | Overlap | E-Value |
|---|---|---|---|---|---|---|
| DAZAP1 | RNCMPT00013 | UAGGUAG | KER_29 | UAGGUAGG | 7 | 0.0031 |
| FMR1 | RNCMPT00016 | GGACAAG | KER_632 | GGCACAGG | 7 | 0.0290 |
| HNRNPK | RNCMPT00026 | CCAACCC | KER_959 | CAACCAGU | 6 | 0.0429 |
| HNRNPL | RNCMPT00027 | ACACACA | KER_793 | ACACACAG | 7 | 0.0019 |
| HNRPLL | RNCMPT00178 | ACACACA | KER_793 | ACACACAG | 7 | 0.0030 |
| HuR | RNCMPT00032 | UUAUUUU | KER_78 | UUUAUUUU | 7 | 0.0054 |
| | RNCMPT00112 | UUUGUUU | KER_900 | UUUCUUUC | 7 | 0.0098 |
| | RNCMPT00117 | UUUGUUU | KER_900 | UUUCUUUC | 7 | 0.0070 |
| | RNCMPT00136 | UUGGUUU | KER_395 | AUUGAUUU | 7 | 0.0202 |
| IGF2BP2 | RNCMPT00033 | ACAAACA | KER_512 | AAACACAG | 7 | 0.0401 |
| IGF2BP3 | RNCMPT00172 | ACAAACA | KER_793 | ACACACAG | 7 | 0.0110 |
| KHDRBS1 | RNCMPT00169 | AUAAAAG | KER_837 | UAUUAAAG | 7 | 0.0254 |
| MATR3 | RNCMPT00037 | AAUCUUG | KER_801 | GAAUCUUG | 7 | 0.0021 |
| PABPC5 | RNCMPT00171 | AGAAAAU | KER_113 | AGAAAGUG | 7 | 0.0060 |
| PABPN1 | RNCMPT00157 | AGAAGAC | KER_183 | AGAAAACA | 7 | 0.0109 |
| PCBP1 | RNCMPT00186 | CCUUUCC | KER_577 | CCUUCCCU | 7 | 0.0055 |
| PCBP2 | RNCMPT00044 | CCUUCCC | KER_577 | CCUUCCCU | 7 | 0.0021 |
| PTBP1 | RNCMPT00268 | CUUUUCU | KER_366 | UUUUCUUU | 6 | 0.0208 |
| | RNCMPT00269 | ACUUUCU | KER_269 | UACUUCCC | 7 | 0.0051 |
| RBM46 | RNCMPT00054 | AAUCAAU | KER_153 | GAAUCAAU | 7 | 0.0208 |
| SAMD4A | RNCMPT00063 | GCUGGAC | KER_608 | UGCUGGCC | 7 | 0.0347 |
| SNRNP70 | RNCMPT00070 | GAUCAAG | KER_197 | GAAUCAAG | 7 | 0.0065 |
| SRSF1 | RNCMPT00107 | GGAGGAA | KER_37 | GGGAGGAA | 7 | 0.0391 |
| SRSF10 | RNCMPT00019 | AGAGAAA | KER_824 | AGAGAAAA | 7 | 0.0373 |
| | RNCMPT00089 | AGAGAAA | KER_824 | AGAGAAAA | 7 | 0.0299 |
| TIA1 | RNCMPT00165 | UUUUUUC | KER_842 | UUCCUUCU | 7 | 0.0122 |
| U2AF2 | RNCMPT00079 | UUUUUUC | KER_842 | UUCCUUCU | 7 | 0.0036 |
| ZC3H14 | RNCMPT00086 | UUUGUUU | KER_900 | UUUCUUUC | 7 | 0.0111 |
Figure 5. Some sequence logos of matched motifs whose associated genes are involved in human cancer. In each plot, the motif learnt by CSCRSites (bottom) is aligned with the known motif (top) from the Homo sapiens database by TOMTOM. The gene name associated with each known motif is shown in Table 2.
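To give a feel for what aligning a learnt motif with a known one involves, the sketch below slides one consensus string along the other and keeps the ungapped offset with the most matching positions. This is a deliberately simplified stand-in, not TOMTOM: TOMTOM compares position-specific matrices column by column and reports statistical significance, whereas this code only compares consensus letters and produces no E-value. The function name `best_ungapped_alignment` is hypothetical; the example sequences are taken from the table of matched motifs.

```python
def best_ungapped_alignment(known, learnt):
    """Return (offset, overlap, matches) for the offset with the most matches.

    `offset` is the index in `learnt` at which known[0] is placed; it may be
    negative when the known motif overhangs the start of the learnt motif.
    """
    best = (0, 0, 0)
    for offset in range(-(len(known) - 1), len(learnt)):
        overlap = 0
        matches = 0
        for i, base in enumerate(known):
            j = offset + i
            if 0 <= j < len(learnt):
                overlap += 1
                matches += int(base == learnt[j])
        # Prefer more matches; break ties by the longer overlap.
        if (matches, overlap) > (best[2], best[1]):
            best = (offset, overlap, matches)
    return best

# Consensus sequences from the DAZAP1 and MATR3 rows of the table.
print(best_ungapped_alignment("UAGGUAG", "UAGGUAGG"))  # (0, 7, 7)
print(best_ungapped_alignment("AAUCUUG", "GAAUCUUG"))  # (1, 7, 7)
```

In the second example the known motif aligns best when shifted one position to the right, which is the kind of offset a sequence-logo comparison makes visible.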
Table 3. Applications, merits, and demerits of the compared methods.
| Methods | Application | Motifs | Merit | Demerit |
|---|---|---|---|---|
| CSCRSites | circRNA binding sites | YES | Discovers motifs of various lengths | Relatively slow rate of convergence |
| DeepBind | DNA/RNA binding sites | YES | Scales well to ChIP-seq and HT-SELEX data sets | Low prediction accuracy on circRNA data sets |
| iDeepS | RNA | YES | Integrates RNA secondary structure | Predicts binding targets only for a specific RBP |
| Zeng’s method | DNA | YES | Motif occupancy task | Motif length is fixed |
Figure 6. Schematic diagram of CSCRSites model construction.