Cong Shen, Yijie Ding, Jijun Tang, Fei Guo.
Abstract
Long non-coding RNAs (lncRNAs) constitute a large class of transcribed RNA molecules that are longer than 200 nucleotides and do not encode proteins. They play an important role in regulating gene expression by interacting with the homologous RNA-binding proteins. Because wet-lab experimental methods are laborious and time-consuming, computational approaches for predicting lncRNA-protein interactions (LPI) deserve more attention. An in-depth review of state-of-the-art in silico studies leads to the conclusion that there is still room to improve both accuracy and speed. This paper proposes a novel method for identifying LPI that employs Kernel Ridge Regression based on Fast Kernel Learning (LPI-FKLKRR). The approach uses four distinct similarity measures for the lncRNA space and for the protein space, respectively. Notably, we extract Gene Ontology (GO) annotations for proteins to improve the quality of information in the protein space. To integrate the heterogeneous kernels, Fast Kernel Learning (FastKL) is applied to optimize the kernel weights. The final predicted associations are then obtained with Kernel Ridge Regression (KRR). Experimental results show that LPI-FKLKRR compares favorably with existing LPI prediction schemes. On the benchmark dataset, the proposed LPI-FKLKRR model obtains the best Area Under the Precision-Recall Curve (AUPR) of 0.6950, outperforming the integrated LPLNP (AUPR: 0.4584), RWR (AUPR: 0.2827), CF (AUPR: 0.2357), LPIHN (AUPR: 0.2299), and LPBNI (AUPR: 0.3302). Combined with the experimental results of a case study on a novel dataset, these findings suggest that LPI-FKLKRR will be a useful tool for LPI prediction.
Keywords: fast kernel learning; gene ontology; kernel ridge regression; lncRNA-protein interactions; multiple kernel learning
Year: 2019 PMID: 30697228 PMCID: PMC6340980 DOI: 10.3389/fgene.2018.00716
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Technical flow chart of our LPI prediction model. (A) LncRNAs and proteins belong to two separate and independent spaces; (B) Fast kernel learning is applied to estimate the weight of each kernel in the corresponding space; (C) The Kronecker product is adopted to generate the final kernel matrix; (D) Kernel Ridge Regression (KRR) is applied for LPI prediction.
Figure 2. Schematic diagram of two-step Kernel Ridge Regression. (A) An intermediate prediction of LPI is made using an lncRNA KRR model. (B) A protein KRR model is trained on the output of the previous step to make predictions for new proteins.
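The two-step scheme in Figure 2 can be illustrated with a minimal NumPy sketch. It assumes a single precomputed kernel per space and scores all known lncRNA-protein pairs in-sample; the function name `two_step_krr` and the regularization parameters `lam_l` and `lam_p` are hypothetical and are not taken from the paper.

```python
import numpy as np

def two_step_krr(K_l, K_p, Y, lam_l=1.0, lam_p=1.0):
    """Two-step kernel ridge regression (illustrative sketch).

    K_l : (n, n) symmetric lncRNA kernel
    K_p : (m, m) symmetric protein kernel
    Y   : (n, m) binary interaction matrix (rows: lncRNAs, columns: proteins)
    Returns an (n, m) matrix of prediction scores.
    """
    n, m = Y.shape
    # Step A: lncRNA-side KRR. solve(K + lam*I, K) equals K (K + lam*I)^{-1}
    # because a symmetric K commutes with (K + lam*I)^{-1}.
    H_l = np.linalg.solve(K_l + lam_l * np.eye(n), K_l)
    intermediate = H_l @ Y
    # Step B: protein-side KRR applied to the intermediate predictions.
    H_p = np.linalg.solve(K_p + lam_p * np.eye(m), K_p)
    return intermediate @ H_p
```

To score a new lncRNA or protein, the square kernel on the left of each step would be replaced by the kernel rows between the new entity and the training entities; that extension is omitted here.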
Fast Kernel Learning based on Kernel Ridge Regression (LPI-FKLKRR).
| 1: Calculate |
| 2: Calculate |
| 3: Calculate the prediction value in matrix |
| 4: Adjust the parameters λ |
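As a rough illustration of the pipeline (weighted kernel combination, Kronecker product kernel, KRR), the sketch below combines several kernels with given weights and solves Kronecker-product KRR through eigendecompositions, so the full product kernel is never formed. The FastKL weight optimization is not reproduced: the weights are simply supplied by the caller, and the names `combine_kernels`, `kron_krr`, and `lam` are assumptions made for this example.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def kron_krr(K_l, K_p, Y, lam=1.0):
    """KRR with the Kronecker product kernel of K_l and K_p (sketch).

    Equivalent to K (K + lam*I)^{-1} vec(Y) with K = kron(K_l, K_p) and a
    row-major vec, computed from the eigendecompositions of K_l and K_p.
    """
    vals_l, U_l = np.linalg.eigh(K_l)
    vals_p, U_p = np.linalg.eigh(K_p)
    # Eigenvalues of the Kronecker product are all pairwise products.
    S = np.outer(vals_l, vals_p)
    # Rotate Y into the joint eigenbasis, shrink, and rotate back.
    Y_tilde = U_l.T @ Y @ U_p
    return U_l @ (Y_tilde * S / (S + lam)) @ U_p.T
```

Equal weights correspond to the "mean weighted" setting reported in the kernel comparison; learned weights would be passed in the same way.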
The AUPR and AUC of different kernels on the benchmark dataset.
| Kernel | AUPR | AUC |
| GIP kernel | 0.6429 | 0.8671 |
| Sequence feature kernel | 0.4885 | 0.8250 |
| Sequence similarity kernel | 0.5024 | 0.8342 |
| Gene expression & protein GO | 0.2663 | 0.6626 |
| Multiple kernels with mean weighted | 0.6433 | 0.8840 |
| Multiple kernels with FastKL weighted | | |
Bold values represent the best value in columns.
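For reference, a GIP kernel can be computed directly from the interaction matrix. The sketch below assumes that GIP denotes the Gaussian interaction profile kernel in its commonly used form, with the bandwidth normalized by the mean squared norm of the interaction profiles; the function name `gip_kernel` and the parameter `gamma_prime` are hypothetical, and the paper's exact settings may differ.

```python
import numpy as np

def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of Y (sketch).

    Y : (n, m) binary interaction matrix; each row is one interaction profile.
    Assumes at least one nonzero entry, otherwise the bandwidth is undefined.
    """
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)
    # Bandwidth normalized by the average squared profile norm.
    gamma = gamma_prime / sq_norms.mean()
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

Calling `gip_kernel(Y)` gives the lncRNA-side kernel when rows are lncRNAs, and `gip_kernel(Y.T)` gives the protein-side kernel.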
Figure 3. The ROC and PR curves of different models.
Figure 4. The kernel weights in the experiment of LPI-FKLKRR on the benchmark dataset.
Comparison with existing methods via 5-fold CV on the benchmark dataset.
| Method | AUPR | AUC |
| LPI-FKLKRR | 0.6950 | 0.9063 |
| Integrated LPLNP | 0.4584 | |
| RWR | 0.2827 | 0.8134 |
| CF | 0.2357 | 0.7686 |
| LPIHN | 0.2299 | 0.8451 |
| LPBNI | 0.3302 | 0.8569 |
Results are derived from Zhang et al.
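The two metrics reported in these tables can be computed from labels and prediction scores as sketched below. AUC uses the pairwise (Mann-Whitney) formulation, and AUPR is approximated by average precision, which may differ slightly from a trapezoidal area under the PR curve; the function names are hypothetical, and this is not necessarily the evaluation code used in the paper.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (ties not grouped)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true)[order]
    hits = np.cumsum(y)
    precision = hits / np.arange(1, len(y) + 1)
    return precision[y == 1].mean()
```

In a 5-fold CV setting, the held-out pairs of each fold would be scored by a model trained on the remaining pairs, and the metrics computed on the held-out labels.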
Figure 5. The ROC and PR curves by local LOOCV on the benchmark dataset.
The AUPR and AUC of different kernels by local LOOCV on the benchmark dataset.
| Kernel | AUPR | AUC |
| GIP kernel | 0.1690 | 0.5189 |
| Sequence feature kernel | 0.2814 | 0.6800 |
| Sequence similarity kernel | 0.3546 | 0.7333 |
| Gene expression & protein GO | 0.3101 | 0.7301 |
| Multiple kernels with mean weighted | 0.4956 | 0.7898 |
| Multiple kernels with FastKL weighted | | |
Bold values represent the best value in columns.
Top 20 ranked interactions for proteins ENSP00000309558 and ENSP00000401371.
| lncRNA | Protein | Rank | Status | lncRNA | Protein | Rank | Status |
| NONHSAT011652 | ENSP00000309558 | 1 | Confirmed | NONHSAT002344 | ENSP00000401371 | 1 | Confirmed |
| NONHSAT027070 | ENSP00000309558 | 2 | Confirmed | NONHSAT104639 | ENSP00000401371 | 2 | – |
| NONHSAT104991 | ENSP00000309558 | 3 | Confirmed | NONHSAT027070 | ENSP00000401371 | 3 | Confirmed |
| NONHSAT001511 | ENSP00000309558 | 4 | Confirmed | NONHSAT104991 | ENSP00000401371 | 4 | Confirmed |
| NONHSAT079374 | ENSP00000309558 | 5 | – | NONHSAT101154 | ENSP00000401371 | 5 | – |
| NONHSAT009703 | ENSP00000309558 | 6 | Confirmed | NONHSAT041921 | ENSP00000401371 | 6 | Confirmed |
| NONHSAT138142 | ENSP00000309558 | 7 | Confirmed | NONHSAT042032 | ENSP00000401371 | 7 | – |
| NONHSAT104639 | ENSP00000309558 | 8 | Confirmed | NONHSAT131038 | ENSP00000401371 | 8 | Confirmed |
| NONHSAT135796 | ENSP00000309558 | 9 | Confirmed | NONHSAT084827 | ENSP00000401371 | 9 | – |
| NONHSAT077129 | ENSP00000309558 | 10 | – | NONHSAT021830 | ENSP00000401371 | 10 | Confirmed |
| NONHSAT023404 | ENSP00000309558 | 11 | – | NONHSAT001953 | ENSP00000401371 | 11 | Confirmed |
| NONHSAT063901 | ENSP00000309558 | 12 | Confirmed | NONHSAT145923 | ENSP00000401371 | 12 | Confirmed |
| NONHSAT099046 | ENSP00000309558 | 13 | – | NONHSAT039675 | ENSP00000401371 | 13 | – |
| NONHSAT031489 | ENSP00000309558 | 14 | – | NONHSAT135796 | ENSP00000401371 | 14 | Confirmed |
| NONHSAT041921 | ENSP00000309558 | 15 | Confirmed | NONHSAT011652 | ENSP00000401371 | 15 | Confirmed |
| NONHSAT013639 | ENSP00000309558 | 16 | – | NONHSAT044002 | ENSP00000401371 | 16 | – |
| NONHSAT027206 | ENSP00000309558 | 17 | – | NONHSAT112849 | ENSP00000401371 | 17 | – |
| NONHSAT134595 | ENSP00000309558 | 18 | – | NONHSAT114444 | ENSP00000401371 | 18 | Confirmed |
| NONHSAT054716 | ENSP00000309558 | 19 | – | NONHSAT007429 | ENSP00000401371 | 19 | Confirmed |
| NONHSAT122291 | ENSP00000309558 | 20 | Confirmed | NONHSAT123220 | ENSP00000401371 | 20 | – |
Top 10 ranked interactions for lncRNAs NONHSAT145960 and NONHSAT031708.
| lncRNA | Protein | Rank | Status | lncRNA | Protein | Rank | Status |
| NONHSAT145960 | ENSP00000258962 | 1 | – | NONHSAT031708 | ENSP00000385269 | 1 | Confirmed |
| NONHSAT145960 | ENSP00000240185 | 2 | Confirmed | NONHSAT031708 | ENSP00000258962 | 2 | – |
| NONHSAT145960 | ENSP00000385269 | 3 | – | NONHSAT031708 | ENSP00000240185 | 3 | Confirmed |
| NONHSAT145960 | ENSP00000349428 | 4 | Confirmed | NONHSAT031708 | ENSP00000349428 | 4 | – |
| NONHSAT145960 | ENSP00000379144 | 5 | Confirmed | NONHSAT031708 | ENSP00000258729 | 5 | Confirmed |
| NONHSAT145960 | ENSP00000338371 | 6 | Confirmed | NONHSAT031708 | ENSP00000338371 | 6 | – |
| NONHSAT145960 | ENSP00000401371 | 7 | Confirmed | NONHSAT031708 | ENSP00000379144 | 7 | – |
| NONHSAT145960 | ENSP00000254108 | 8 | – | NONHSAT031708 | ENSP00000254108 | 8 | Confirmed |
| NONHSAT145960 | ENSP00000258729 | 9 | Confirmed | NONHSAT031708 | ENSP00000401371 | 9 | Confirmed |
| NONHSAT145960 | ENSP00000413035 | 10 | – | NONHSAT031708 | ENSP00000371634 | 10 | Confirmed |
Comparison of running time between LPI-FKLKRR and LPLNP over 10 runs.
| LPI-FKLKRR | | |
| LPLNP | 352.93 | 2.6656 |
The address of LPLNP is given by Zhang et al.
Information on the two datasets used in the experiments.
| Dataset | lncRNAs | Proteins | Interactions |
| benchmark dataset | 990 | 27 | 4,158 |
| novel dataset | 1,050 | 84 | 4,467 |
The benchmark dataset and the novel dataset come from the paper of Zhang et al.
The AUPR and AUC of different methods on the novel dataset.
| Method | AUPR | AUC |
| LPI-FKLKRR | | |
| PPSNs | – | 0.9098 |
| NRLMF | 0.4010 | 0.8287 |
| CF | 0.4267 | 0.8103 |
AUPR is not reported by Zheng et al.
Figure 6. The ROC and PR curves of different models on the novel dataset by 5-fold CV.
Figure 7. The ROC and PR curves by local LOOCV on the novel dataset.