Hai-Cheng Yi, Zhu-Hong You, De-Shuang Huang, Xiao Li, Tong-Hai Jiang, Li-Ping Li
Abstract
The interactions between non-coding RNAs (ncRNAs) and proteins play an important role in many biological processes, and ncRNAs carry out their biological functions primarily by binding to a variety of proteins. High-throughput experimental techniques can identify the proteins bound to a specific ncRNA, but they are usually expensive and time-consuming. Deep learning offers a powerful way to predict RNA-protein interactions computationally. In this work, we propose RPI-SAN, a model that uses a deep stacked auto-encoder network to mine hidden high-level features from RNA and protein sequences and feeds them into a random forest (RF) classifier to predict ncRNA-binding proteins. Stacked assembling is further used to improve the accuracy of the proposed method. Four benchmark datasets, RPI2241, RPI488, RPI1807, and NPInter v2.0, were used to evaluate five prediction tools: RPI-Pred, IPMiner, RPISeq-RF, lncPro, and RPI-SAN. The experimental results show that RPI-SAN performs better than the other methods on most of these benchmarks, with accuracies of 90.77%, 89.7%, 96.1%, and 99.33%, respectively. RPI-SAN is expected to serve as an effective computational tool for predicting potential ncRNA-protein interaction pairs, providing reliable guidance for future biomedical research.
Keywords: PSSM; RNA-protein interactions; Zernike moment; deep learning; non-coding RNA; stacked auto-encoder
Year: 2018 PMID: 29858068 PMCID: PMC5992449 DOI: 10.1016/j.omtn.2018.03.001
Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 8.886
Figure 1. Prediction Performance Comparison Between SA-FT-RF, SA-RF, RPISeq-RF, Average Assembling, and Stacked Assembling on the ncRNA-Protein Dataset RPI2241
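Figure 1 contrasts average assembling with stacked assembling of several base predictors. The sketch below illustrates the difference under stated assumptions: each base model (for example SA-RF, SA-FT-RF, or RPISeq-RF) is assumed to output one interaction probability per RNA-protein pair; average assembling takes the mean of those probabilities, while stacked assembling trains a second-level model on them. Using logistic regression as that meta-learner, and the names `average_assembling`, `fit_stacker`, and `stacked_assembling`, are hypothetical choices for illustration, not RPI-SAN's published implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def average_assembling(probs: Matrix) -> List[float]:
    """Average assembling: mean of the base models' probabilities for each pair.

    probs[i][k] is the interaction probability that base model k assigns to pair i.
    """
    return [sum(row) / len(row) for row in probs]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacker(probs: Matrix, labels: Sequence[int],
                lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Stacked assembling with an assumed logistic-regression meta-learner,
    trained by batch gradient descent on the base-model probabilities."""
    n, k = len(probs), len(probs[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(probs, labels):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - y
            for j in range(k):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stacked_assembling(probs: Matrix, w: Sequence[float], b: float) -> List[float]:
    """Combine base-model probabilities using the fitted meta-learner weights."""
    return [_sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) for row in probs]


if __name__ == "__main__":
    # Toy data: three hypothetical base models scoring four RNA-protein pairs.
    base_probs = [[0.9, 0.6, 0.7], [0.8, 0.4, 0.6], [0.2, 0.5, 0.4], [0.1, 0.6, 0.3]]
    labels = [1, 1, 0, 0]
    print("average:", average_assembling(base_probs))
    w, b = fit_stacker(base_probs, labels)
    print("stacked:", stacked_assembling(base_probs, w, b))
```

Averaging weights every base model equally, whereas the stacker learns how much to trust each one. In practice the meta-learner should be fitted on out-of-fold base-model predictions (for example from cross-validation), so that it does not learn from probabilities the base models produced on their own training data.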
Comparing RPI-SAN with Other Methods on the RPI488, RPI1807, and RPI2241 Datasets
| Dataset | Method | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | MCC (%) | AUC |
|---|---|---|---|---|---|---|---|
| RPI488 | IPMiner | 89.1 | 93.9 | 83.1 | 94.5 | 78.4 | 0.914 |
| | RPISeq-RF | 88.0 | 92.6 | 82.2 | 93.2 | 76.2 | 0.903 |
| | lncPro | 87.0 | 90.0 | 82.7 | 91.0 | 74.0 | 0.901 |
| | RPI-SAN | 89.7ᵃ | 94.3ᵃ | 83.7ᵃ | 95.2ᵃ | 79.3ᵃ | 0.920ᵃ |
| RPI1807 | RPI-Pred | 93.0 | 95.0 | N/A | 94.0 | N/A | 0.97 |
| | IPMiner | 98.6ᵃ | 98.2ᵃ | 99.3 | 97.8ᵃ | 97.2ᵃ | 0.998 |
| | RPISeq-RF | 97.3 | 96.8 | 98.4 | 96.0 | 94.6 | 0.996 |
| | lncPro | 96.9 | 96.5 | 98.1 | 95.5 | 93.8 | 0.994 |
| | RPI-SAN | 96.1 | 93.6 | 99.9ᵃ | 91.4 | 92.4 | 0.999ᵃ |
| RPI2241 | RPI-Pred | 84.0 | 78.0 | N/A | 88.0ᵃ | N/A | 0.89 |
| | IPMiner | 82.4 | 83.3 | 81.2 | 83.6 | 65.0 | 0.906 |
| | RPISeq-RF | 63.96 | 64.83 | 62.59 | 65.37 | 27.98 | 0.690 |
| | lncPro | 65.4 | 65.9 | 64.0 | 66.9 | 31.0 | 0.722 |
| | RPI-SAN | 90.77ᵃ | 86.17ᵃ | 97.37ᵃ | 84.05 | 82.27ᵃ | 0.962ᵃ |

ᵃ This measure of performance is the best among the compared methods for the individual dataset.
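The table reports six standard binary-classification measures. As a reference for how they relate to a confusion matrix, the sketch below computes them from true/false positive and negative counts using the usual definitions (assumed here to match those used in the paper), with AUC estimated by the rank-based Mann-Whitney formulation. The function names are hypothetical. Note that the table expresses MCC as a percentage, i.e. the coefficient multiplied by 100.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard measures from confusion-matrix counts (as fractions, not %)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "mcc": mcc}


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney formulation; ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    m = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
    # Multiply by 100 to match the percentage columns of the table.
    print({name: round(100 * value, 2) for name, value in m.items()})
    print(roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))
```

Unlike accuracy, MCC stays informative when positives and negatives are unbalanced, which is one reason to report both.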
Prediction Performance of the RPI488-Trained Model on the NPInter v2.0 Dataset
| Organism | Number of Interaction Pairs | Number of Pairs Predicted as Interacting | Accuracy (%) |
|---|---|---|---|
| | 6,975 | 6,928 | 99.33 |
| | 36 | 29 | 80.56 |
| | 91 | 90 | 98.90 |
| | 910 | 897 | 98.56 |
| | 2,198 | 2,153 | 97.95 |
| | 202 | 177 | 87.62 |
| Total | 10,412 | 10,274 | 98.67 |
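The accuracy column is the number of pairs predicted as interacting divided by the number of known interaction pairs. Because every pair in this table is a known interaction, the figure is effectively the fraction of positives recovered. The sketch below recomputes the column from the counts in the table; the organism names are missing from this page, so rows are kept in table order, and the names `NPINTER_ROWS`, `accuracy_pct`, and `summarize` are hypothetical.

```python
from typing import List, Tuple

# (known interaction pairs, pairs predicted as interacting), one tuple per table row.
# Organism names are not recoverable from this page, so rows keep the table order.
NPINTER_ROWS: List[Tuple[int, int]] = [
    (6975, 6928), (36, 29), (91, 90), (910, 897), (2198, 2153), (202, 177),
]


def accuracy_pct(known: int, predicted: int) -> float:
    """Percentage of known interaction pairs that the model predicts as interacting."""
    return round(100.0 * predicted / known, 2)


def summarize(rows: List[Tuple[int, int]]) -> Tuple[List[float], int, int, float]:
    per_row = [accuracy_pct(k, p) for k, p in rows]
    total_known = sum(k for k, _ in rows)
    total_pred = sum(p for _, p in rows)
    # The overall figure pools all pairs; it is not the mean of the per-row percentages.
    return per_row, total_known, total_pred, accuracy_pct(total_known, total_pred)


if __name__ == "__main__":
    print(summarize(NPINTER_ROWS))
```

Recomputing the fourth row gives 98.57% (897/910), one unit in the last digit above the listed 98.56%; the other rows and the pooled total (10,274/10,412 = 98.67%) match the table.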
Confirmed RNA-Protein Interactions Ranked Highly in the Homo sapiens Dataset
| Protein ID | RNA ID | Probability |
|---|---|---|
| HNRNPA1 | EPB41 | 0.867 |
| TARDBP | CFTR | 0.866 |
| MBNL1 | DMPK | 0.863 |
| PTBP1 | CD40LG | 0.859 |
| SRP19 | RN7SL1 | 0.857 |
| SRSF1 | TNNT2 | 0.856 |
| ELAVL4 | MYCN | 0.853 |
| ELAVL2 | ID1 | 0.851 |
| HNRNPC | CSF2 | 0.848 |
| HNRNPD | ADRB1 | 0.847 |
| EIF5A | RNU6-1 | 0.845 |
| HNRNPD | AGTR1 | 0.842 |
| ELAVL3 | VEGFA | 0.838 |
| YBX1 | CSF2 | 0.833 |
| ZBP1 | ACTB | 0.831 |
Details of the ncRNA-Protein Interaction Datasets
| Dataset | Interaction Pairs | Number of Proteins | Number of RNAs |
|---|---|---|---|
| RPI488 | 243 | 25 | 247 |
| RPI1807 | 1,807 | 1,807 | 1,078 |
| RPI2241 | 2,241 | 2,043 | 332 |
| NPInter v2.0 | 10,412 | 449 | 4,636 |
RPI488 consists of lncRNA-protein interactions derived from structural complexes. RPI369, RPI2241, and RPI1807 consist of RNA-protein interactions, whereas NPInter v2.0 and RPI13254 contain ncRNA-protein interactions from non-structure-based sources.
Figure 2. The Construction of the Stacked Auto-Encoder Network
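A common way to build a stacked auto-encoder is greedy layer-wise pre-training: train one auto-encoder to reconstruct the input, replace the input with its hidden codes, train the next auto-encoder on those codes, and so on. The stacked encoders then map an input vector to a compact high-level feature vector, which in RPI-SAN is passed to a random forest. The pure-Python sketch below assumes sigmoid units, squared reconstruction error, and plain stochastic gradient descent, and omits any supervised fine-tuning stage; the layer sizes, learning rate, and the names `AutoEncoder`, `train_stacked`, and `stacked_features` are hypothetical, not taken from the paper.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _layer(W: List[Vector], b: Vector, x: Sequence[float]) -> Vector:
    """Fully connected sigmoid layer: sigmoid(W x + b)."""
    return [_sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


class AutoEncoder:
    """One sigmoid auto-encoder (encoder W1, b1; decoder W2, b2) trained with SGD on squared error."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.W1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x: Sequence[float]) -> Vector:
        return _layer(self.W1, self.b1, x)

    def loss(self, data: Sequence[Sequence[float]]) -> float:
        """Mean squared reconstruction error over the data set."""
        total = 0.0
        for x in data:
            r = _layer(self.W2, self.b2, self.encode(x))
            total += sum((ri - xi) ** 2 for ri, xi in zip(r, x))
        return total / len(data)

    def train(self, data: Sequence[Sequence[float]], lr: float = 0.5, epochs: int = 200) -> None:
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                r = _layer(self.W2, self.b2, h)
                # Backpropagation: output deltas, then hidden deltas (both computed before any update).
                d_out = [2 * (ri - xi) * ri * (1 - ri) for ri, xi in zip(r, x)]
                d_hid = [sum(d_out[i] * self.W2[i][j] for i in range(len(d_out))) * hj * (1 - hj)
                         for j, hj in enumerate(h)]
                for i, di in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.W2[i][j] -= lr * di * hj
                    self.b2[i] -= lr * di
                for j, dj in enumerate(d_hid):
                    for k, xk in enumerate(x):
                        self.W1[j][k] -= lr * dj * xk
                    self.b1[j] -= lr * dj


def train_stacked(data, layer_sizes, seed=0, **train_kwargs) -> List[AutoEncoder]:
    """Greedy layer-wise pre-training: each auto-encoder learns to reconstruct the hidden
    codes of the previous one; the decoders are only needed during pre-training."""
    rng = random.Random(seed)
    layers: List[AutoEncoder] = []
    codes = [list(x) for x in data]
    for n_hidden in layer_sizes:
        ae = AutoEncoder(len(codes[0]), n_hidden, rng)
        ae.train(codes, **train_kwargs)
        layers.append(ae)
        codes = [ae.encode(x) for x in codes]
    return layers


def stacked_features(layers: Sequence[AutoEncoder], x: Sequence[float]) -> Vector:
    """Pass an input vector through every trained encoder; the last code is the feature vector."""
    code = list(x)
    for ae in layers:
        code = ae.encode(code)
    return code


if __name__ == "__main__":
    # Hypothetical toy inputs standing in for encoded RNA-protein pair vectors.
    X = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
    stack = train_stacked(X, layer_sizes=[3, 2], epochs=300)
    print([stacked_features(stack, x) for x in X])
```

Pure Python keeps the sketch dependency-free; a practical model would use a numerical library, larger inputs, and typically a fine-tuning pass after pre-training.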
Figure 3. The Construction of the Sparse Auto-Encoder
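In the common textbook formulation of a sparse auto-encoder, sparsity is encouraged by penalizing the divergence between a small target activation ρ and each hidden unit's average activation ρ̂_j over the training set, KL(ρ ‖ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 - ρ) log((1 - ρ)/(1 - ρ̂_j)), summed over hidden units, weighted by β, and added to the reconstruction error. The sketch below computes that penalized cost. Whether RPI-SAN uses exactly this penalty, the values ρ = 0.05 and β = 3.0, and the function names are assumptions for illustration.

```python
import math
from typing import List, Sequence


def mean_activations(hidden: Sequence[Sequence[float]]) -> List[float]:
    """rho_hat_j: average activation of hidden unit j over all training samples."""
    n = len(hidden)
    return [sum(h[j] for h in hidden) / n for j in range(len(hidden[0]))]


def kl_sparsity(rho: float, rho_hat: Sequence[float]) -> float:
    """Sum over hidden units of KL(rho || rho_hat_j) between Bernoulli distributions."""
    return sum(rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
               for r in rho_hat)


def sparse_ae_cost(recon_error: float, hidden: Sequence[Sequence[float]],
                   rho: float = 0.05, beta: float = 3.0) -> float:
    """Reconstruction error plus the beta-weighted sparsity penalty (weight decay omitted).

    rho and beta are illustrative defaults, not values reported for RPI-SAN.
    """
    return recon_error + beta * kl_sparsity(rho, mean_activations(hidden))


if __name__ == "__main__":
    sparse_codes = [[0.04, 0.06], [0.05, 0.05], [0.06, 0.04]]  # average activations near rho
    dense_codes = [[0.6, 0.5], [0.4, 0.5], [0.5, 0.5]]         # average activations near 0.5
    print(sparse_ae_cost(0.1, sparse_codes), sparse_ae_cost(0.1, dense_codes))
```

The penalty is zero when every hidden unit's average activation equals ρ and grows as units fire more often, so minimizing the cost pushes most hidden units toward inactivity for any given input.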