Zhao-Hui Zhan, Zhu-Hong You, Li-Ping Li, Yong Zhou, Hai-Cheng Yi.
Abstract
Non-coding RNA (ncRNA) plays a crucial role in numerous biological processes, including gene expression and post-transcriptional gene regulation. The biological function of ncRNA is mostly realized by binding to related proteins. Therefore, an accurate understanding of the interactions between ncRNA and protein has a significant impact on current biological research. The major challenge at this stage is that traditional interaction prediction methods spend a great deal of time and computational resources on classification. Fortunately, an efficient classifier named LightGBM can reduce this time cost. In this study, we employed LightGBM as the integrated classifier and proposed a novel computational model for predicting ncRNA-protein interactions. More specifically, pseudo-Zernike moments and the singular value decomposition algorithm are employed to extract discriminative features from protein and ncRNA sequences. On four widely used datasets, RPI369, RPI488, RPI1807, and RPI2241, we evaluated the performance of LGBM and obtained superior performance, with AUCs of 0.799, 0.914, 0.989, and 0.762, respectively. The experimental results of 10-fold cross-validation showed that the proposed method performs much better than existing methods in predicting ncRNA-protein interaction patterns and could serve as a useful tool in proteomics research.
Keywords: LightGBM; PSSM; Pseudo-Zernike moments; k-mers; ncRNA-protein interactions
Year: 2018 PMID: 30349558 PMCID: PMC6186793 DOI: 10.3389/fgene.2018.00458
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
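The keywords list k-mers among the sequence features. The following is a minimal sketch of one way to turn an ncRNA sequence into a k-mer frequency vector. The choice of k = 3, the normalization by window count, and the function name `kmer_frequencies` are assumptions for illustration; the abstract does not state them.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper: k and the normalization are illustrative choices,
    not values taken from the paper.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / windows for kmer in kmers]


# Toy sequence, not taken from any of the RPI datasets.
features = kmer_frequencies("AUGGCUAUGGC")
print(len(features))  # 4**3 = 64 features for k = 3
```

A fixed-length vector like this can be fed to any classifier regardless of the original sequence length, which is the usual reason for using k-mer composition.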
The specific composition of four required datasets.
| Dataset | Interacting pairs | Non-interacting pairs | Proteins | RNAs |
| RPI369 | 369 | 369 | 338 | 332 |
| RPI488 | 243 | 245 | 247 | 25 |
| RPI1807 | 1,807 | 1,436 | 1,807 | 1,078 |
| RPI2241 | 2,241 | 943 | 2,043 | 842 |
Figure 1. Step-wise workflow for the proposed LGBM machine learning model.
Ten-fold cross-validation results on dataset RPI369.
| Fold | Acc (%) | Pre (%) | Sen (%) | Spe (%) | MCC (%) |
| 1 | 75.00 | 73.17 | 81.08 | 68.57 | 50.12 |
| 2 | 70.83 | 73.53 | 67.57 | 74.29 | 41.90 |
| 3 | 76.39 | 76.32 | 78.28 | 74.29 | 52.73 |
| 4 | 76.39 | 75.68 | 77.78 | 75.00 | 52.80 |
| 5 | 76.39 | 71.11 | 88.89 | 63.89 | 54.51 |
| 6 | 69.44 | 65.91 | 80.56 | 58.33 | 39.89 |
| 7 | 73.61 | 72.97 | 75.00 | 72.22 | 47.24 |
| 8 | 75.00 | 72.50 | 80.56 | 69.44 | 50.31 |
| 9 | 74.65 | 71.43 | 83.33 | 65.71 | 49.89 |
| 10 | 70.42 | 69.23 | 75.00 | 65.71 | 40.91 |
| Average | 73.81 | 72.18 | 78.81 | 68.75 | 48.03 |
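The per-fold values in this table appear consistent with accuracy, precision, sensitivity, specificity, and Matthews correlation coefficient (MCC) computed from a confusion matrix. The sketch below uses the standard definitions. The counts passed in are an assumption: they were chosen because they reproduce the fold 1 row, and they are not reported in the source.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Acc, Pre, Sen, Spe and MCC as percentages rounded to 2 places."""

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    acc = ratio(tp + tn, tp + fn + tn + fp)
    pre = ratio(tp, tp + fp)   # precision
    sen = ratio(tp, tp + fn)   # sensitivity (recall)
    spe = ratio(tn, tn + fp)   # specificity
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    values = {"Acc": acc, "Pre": pre, "Sen": sen, "Spe": spe, "MCC": mcc}
    return {name: round(100 * v, 2) for name, v in values.items()}


# Hypothetical counts for a 72-sample test fold (37 positives, 35 negatives).
fold1 = binary_metrics(tp=30, fn=7, tn=24, fp=11)
print(fold1)
# {'Acc': 75.0, 'Pre': 73.17, 'Sen': 81.08, 'Spe': 68.57, 'MCC': 50.12}
```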
Ten-fold cross-validation results on dataset RPI488.
| Fold | Acc (%) | Pre (%) | Sen (%) | Spe (%) | MCC (%) |
| 1 | 91.84 | 100.0 | 83.33 | 100.0 | 84.76 |
| 2 | 87.76 | 94.00 | 77.27 | 96.30 | 75.91 |
| 3 | 87.76 | 88.89 | 88.89 | 86.36 | 75.25 |
| 4 | 95.92 | 100.0 | 92.00 | 100.0 | 92.15 |
| 5 | 75.51 | 75.00 | 60.00 | 86.21 | 48.43 |
| 6 | 91.84 | 91.30 | 91.30 | 92.31 | 83.61 |
| 7 | 93.75 | 100.0 | 86.36 | 100.0 | 87.99 |
| 8 | 87.50 | 96.30 | 83.87 | 94.12 | 75.19 |
| 9 | 91.60 | 95.24 | 86.96 | 96.00 | 83.54 |
| 10 | 91.67 | 91.67 | 91.67 | 91.67 | 83.83 |
| Average | 89.52 | 93.28 | 84.17 | 94.30 | 79.02 |
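Ten-fold cross-validation, as used for the tables above, partitions the samples into ten test folds and trains on the remaining nine each time. The sketch below is one simple way to build such folds. Stratifying by class label and the function name `stratified_kfold` are assumptions; the source does not say how the folds were drawn. The label counts mirror the RPI488 row of the dataset table (243 and 245 pairs).

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k class-balanced folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class out round-robin

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for n, fold in enumerate(folds) if n != i for j in fold)
        yield train, test


# 243 positive and 245 negative pairs, as in the RPI488 row.
labels = [1] * 243 + [0] * 245
splits = list(stratified_kfold(labels, k=10))
print([len(test) for _, test in splits])
```

Each sample lands in exactly one test fold, so averaging a metric over the ten folds uses every pair once for evaluation.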
Performance evaluation on different classifiers.
| RPI369 | LGBM | 68.75 | ||||
| SVM | 71.60 | 71.70 | 72.51 | 43.62 | ||
| GBDT | 71.74 | 71.79 | 72.79 | 43.90 | ||
| RPI488 | LGBM | |||||
| SVM | 86.22 | 88.62 | 89.86 | 82.27 | 72.44 | |
| GBDT | 86.01 | 88.54 | 89.86 | 81.81 | 72.04 |
Bold values indicate the best performance for each measure among the compared methods.
Figure 2. The ROC curve of dataset RPI369 on three classifiers.
Figure 3. The ROC curve of dataset RPI488 on three classifiers.
Comparison between LGBM and other methods in RPI488, RPI1807, and RPI2241.
| RPI488 | LGBM | |||||
| RPISeq-RF | 88.00 | 93.20 | 92.60 | 82.20 | 76.20 | |
| lncPro | 87.00 | 91.00 | 90.00 | 82.70 | 74.00 | |
| RPI1807 | LGBM | 96.42 | 95.20 | 97.40 | 92.76 | |
| RPI-Pred | 93.00 | 94.00 | 95.00 | N/A | N/A | |
| lncPro | 95.50 | |||||
| RPI2241 | LGBM | 61.50 | ||||
| RPISeq-RF | 63.96 | 65.37 | 64.83 | 62.59 | 27.98 | |
| lncPro | 65.40 | 66.90 | 65.90 | 31.00 |
Bold values indicate the best performance for each measure among the compared methods.
Figure 4. The ROC curve of dataset RPI488 on 10-fold cross-validation.
Figure 6. The ROC curve of dataset RPI2241 on 10-fold cross-validation.
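The AUC values quoted in the abstract summarize ROC curves like these. As a minimal sketch, AUC can be computed as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy predictions: 8 of the 9 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 3))  # 0.889
```

The pairwise form is quadratic in the number of samples, which is fine for datasets of this size; a rank-based formulation gives the same value faster on larger sets.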