Yingjun Ma, Tingting He, Xingpeng Jiang.
Abstract
Many long non-coding RNAs (lncRNAs) exert their effects by interacting with the corresponding RNA-binding proteins, so identifying the interactions between lncRNAs and proteins is important for understanding lncRNA function. Because experimental methods are time-consuming and laborious, a growing number of computational models have been proposed to predict lncRNA-protein interactions. However, few models effectively exploit the biological network topology of lncRNAs (proteins) in combination with their sequence structure features, and most cannot make predictions for new proteins (lncRNAs) that have no known interaction with any lncRNA (protein). In this study, we proposed a projection-based neighborhood non-negative matrix decomposition model (PMKDN) to predict potential lncRNA-protein interactions by integrating multiple biological features of lncRNAs (proteins). First, we extracted multiple features of lncRNAs (proteins) from lncRNA (protein) sequences and lncRNA expression profile data. Second, based on protein Gene Ontology (GO) annotations, lncRNA sequences, lncRNA (protein) feature information, and a modified lncRNA-protein interaction network, we calculated multiple similarities for lncRNAs (proteins) and fused them to obtain a more accurate lncRNA (protein) similarity network. Finally, combining the similarity and feature information of lncRNAs (proteins) with the modified interaction network, we proposed a projection-based neighborhood non-negative matrix decomposition algorithm to predict potential lncRNA-protein interactions. On two benchmark datasets, PMKDN outperformed other state-of-the-art methods in predicting new lncRNA-protein interactions, new lncRNAs, and new proteins. A case study further indicates that PMKDN can be used as an effective tool for lncRNA-protein interaction prediction.
Keywords: feature projection; graph non-negative matrix factorization; kernel neighborhood similarity; lncRNA-protein interaction; neighborhood completion
Year: 2019 PMID: 31824563 PMCID: PMC6880730 DOI: 10.3389/fgene.2019.01148
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
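The abstract describes factorizing the lncRNA-protein interaction matrix under similarity-based (neighborhood) regularization. As a rough illustration of that general idea only, the sketch below implements a plain graph-regularized non-negative matrix factorization with multiplicative updates. It is not the authors' model: the projection terms, the feature regularization, and the modified interaction network are all omitted, and the names `graph_nmf`, `S_l`, `S_p`, `rank`, and `lam` are assumptions made for this example.

```python
import numpy as np

def graph_nmf(Y, S_l, S_p, rank=5, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Generic graph-regularized NMF (illustrative, not the paper's algorithm).

    Minimizes ||Y - U V^T||_F^2 + lam*Tr(U^T L_l U) + lam*Tr(V^T L_p V),
    where L = D - S is the graph Laplacian of a similarity matrix S.
    Y: (n_lncRNA, n_protein) non-negative interaction matrix.
    S_l, S_p: non-negative lncRNA and protein similarity matrices.
    """
    rng = np.random.default_rng(seed)
    n_l, n_p = Y.shape
    U = rng.random((n_l, rank))          # latent lncRNA factors
    V = rng.random((n_p, rank))          # latent protein factors
    D_l = np.diag(S_l.sum(axis=1))       # degree matrices of the two graphs
    D_p = np.diag(S_p.sum(axis=1))
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= (Y @ V + lam * (S_l @ U)) / (U @ (V.T @ V) + lam * (D_l @ U) + eps)
        V *= (Y.T @ U + lam * (S_p @ V)) / (V @ (U.T @ U) + lam * (D_p @ V) + eps)
    return U, V

# Toy data (made up): 4 lncRNAs x 3 proteins, identity similarities.
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1]], dtype=float)
S_l, S_p = np.eye(4), np.eye(3)
U, V = graph_nmf(Y, S_l, S_p, rank=2)
scores = U @ V.T                         # predicted interaction scores
```

Entries of `scores` that are high where `Y` is zero would be read as candidate interactions. In practice the similarity matrices would come from fused lncRNA and protein similarities rather than identity matrices.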
Figure 1. Flow chart of lncRNA-protein interaction prediction by the PMDKN algorithm. As shown in the figure, we first calculated three features of lncRNAs and two features of proteins, and then calculated five similarities of lncRNAs and four similarities of proteins from the lncRNA sequences, the protein GO annotations, and these features.
Figure 2. The influence of parameters on the AUPR value of PMDKN. (A) shows the influence of the projection parameters µ and η. (B) shows the effect of the neighborhood Laplacian regularization parameter λ. (C) shows the effect of the feature regularization parameter γ. (D) shows the effect of the observation importance level parameter δ.
Comparison of prediction performance for new lncRNA-protein interactions on DATASET1 and DATASET2.
| DATA | Method | AUPR | AUC | F1 value |
|---|---|---|---|---|
| DATASET1 | LPBNI | 0.3296 | 0.8546 | 0.3881 |
| | LPLNP | 0.4576 | 0.9095 | 0.4520 |
| | LKSNF | 0.4754 | 0.9150 | 0.4629 |
| | SFPEL-LPI | 0.4675 | 0.9201 | 0.4657 |
| | PMDKN | | | |
| DATASET2 | LPBNI | 0.3418 | 0.9340 | 0.3977 |
| | LPLNP | 0.4693 | 0.9700 | 0.4606 |
| | LKSNF | 0.4528 | 0.9710 | 0.4637 |
| | SFPEL-LPI | 0.4215 | 0.9728 | 0.4448 |
| | PMDKN | | | |
In the table above, the best result for each metric on each data set is shown in bold.
Comparison of prediction performance for new lncRNAs and new proteins on DATASET1 and DATASET2.
| DATA | Method | AUPR (new lncRNAs) | AUC (new lncRNAs) | F1 value (new lncRNAs) | AUPR (new proteins) | AUC (new proteins) | F1 value (new proteins) |
|---|---|---|---|---|---|---|---|
| DATASET1 | SFPEL-LPI | 0.4813 | 0.8284 | 0.4931 | 0.3285 | 0.6666 | 0.3779 |
| | PMDKN | | | | | | |
| DATASET2 | SFPEL-LPI | 0.4756 | 0.9446 | | 0.1208 | 0.6546 | 0.1940 |
| | PMDKN | | | 0.4864 | | | |
In the table above, the best result for each metric on each data set is shown in bold.
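The tables report AUPR, AUC, and F1 for each method. For readers who want to compute comparable numbers on their own predictions, the following is a minimal sketch of the three metrics in NumPy. It rests on assumptions the record does not confirm: AUPR is estimated here by average precision, AUC by the pairwise ranking (Mann-Whitney) formulation, and F1 at a user-chosen score threshold. The source does not say how the F1 threshold was chosen. The function names and the toy labels and scores are hypothetical.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    # Ties between a positive and a negative count as half a correct pair.
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def average_precision(y_true, y_score):
    """One common AUPR estimate: mean precision at the rank of each positive.

    Assumes at least one positive label.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="stable")   # highest score first
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision_at_k[hits].mean()

def f1_at_threshold(y_true, y_score, threshold):
    """F1 when every score >= threshold is predicted as an interaction."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(y_score, dtype=float) >= threshold
    tp = np.sum(pred & y_true)
    fp = np.sum(pred & ~y_true)
    fn = np.sum(~pred & y_true)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy example (made-up labels and scores).
labels = [1, 0, 1, 0, 0, 1]
scores_demo = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = auc_score(labels, scores_demo)                 # 5 of 9 pairs correct
aupr = average_precision(labels, scores_demo)        # mean of 1, 2/3, 1/2
f1 = f1_at_threshold(labels, scores_demo, 0.5)       # tp=2, fp=1, fn=1
```

Because lncRNA-protein interaction matrices are sparse, AUPR is usually the more discriminating of the two curve-based metrics, which is consistent with the much wider spread of AUPR than AUC values across methods in the tables.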
Figure 3. Prediction performance of the models on the perturbed data sets. (A) shows the ROC curves and AUC values of the five methods after noise is added to DATASET1. (B) shows the P-R curves and AUPR values of the five methods after noise is added to DATASET1. (C) shows the ROC curves and AUC values of the five methods after noise is added to DATASET2. (D) shows the P-R curves and AUPR values of the five methods after noise is added to DATASET2.
Figure 4. Comparison of SFPEL-LPI and PMDKN prediction results for new proteins. The AUPR and AUC values in the figure represent the average AUPR and average AUC values obtained by PMDKN for 79 proteins. Top-10, Top-20, Top-50, and Top-100 hit rates represent the mean hit rates among the top 10, 20, 50, and 100 candidate lncRNAs, respectively.
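The Figure 4 caption refers to mean Top-k hit rates without giving the exact definition. The sketch below shows one plausible reading, stated here as an assumption: for each protein, the hit rate is the fraction of its known lncRNA partners that appear among its k highest-scoring candidate lncRNAs, and the reported value is the mean over proteins. The function names and the toy matrices are hypothetical.

```python
import numpy as np

def topk_hit_rate(y_true, y_score, k):
    """Fraction of true partners recovered among the k top-scoring candidates.

    Assumed definition; the source does not spell out how a hit rate is computed.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()
    if n_pos == 0:
        return float("nan")              # undefined for proteins without partners
    top = np.argsort(-y_score, kind="stable")[:k]
    return y_true[top].sum() / n_pos

def mean_topk_hit_rate(Y, scores, k):
    """Average the per-protein hit rate over all columns (proteins)."""
    rates = [topk_hit_rate(Y[:, j], scores[:, j], k) for j in range(Y.shape[1])]
    return float(np.nanmean(rates))

# Toy example (made up): 5 candidate lncRNAs (rows) x 2 proteins (columns).
Y_demo = np.array([[1, 0],
                   [0, 1],
                   [1, 0],
                   [0, 0],
                   [0, 1]])
S_demo = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.3, 0.4],
                   [0.5, 0.3],
                   [0.1, 0.7]])
rate_top2 = mean_topk_hit_rate(Y_demo, S_demo, k=2)
```

In this toy case the first protein recovers one of its two partners in the top 2 and the second recovers both, so the mean is the average of 0.5 and 1.0. If a hit rate is instead meant as the fraction of the top k candidates that are true partners, the denominator would be k rather than the number of known partners.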