Yue Ji, Lu Bai, Menglong Li.
Abstract
Understanding the mechanism of protein-RNA interaction can help us further explore various biological processes. Experimental techniques still have limitations, notably their high monetary and time costs, so computational prediction of protein-RNA-binding sites is a valuable research tool. Here, we developed a universal method for predicting protein-specific RNA-binding sites: for each protein, one general model was constructed on a fixed dataset that fuses data from different experimental techniques. Information theory was employed to characterize the sequence conservation of RNA-binding segments, and conservation difference profiles between binding and nonbinding segments were constructed using information entropy (IE), revealing a significant difference between the two. Finally, 19 protein-specific random forest (RF) models were built on the IE encoding. Performance on independent datasets demonstrates that our method achieves results competitive with the current best prediction model.
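The abstract does not spell out the IE formula. A common way to build such a conservation profile is per-position Shannon entropy over aligned, equal-length segments; the sketch below assumes that reading. The function names `positional_entropy` and `conservation_difference`, the toy segments, and the sign convention (nonbinding minus binding) are hypothetical and are not the authors' code.

```python
import math
from collections import Counter

def positional_entropy(segments):
    """Shannon entropy (bits) of the symbol distribution at each aligned position.

    Assumes equal-length segments; lower entropy means a more conserved position.
    """
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments must have the same length")
    profile = []
    n = len(segments)
    for i in range(length):
        counts = Counter(s[i] for s in segments)
        h = 0.0
        for c in counts.values():
            p = c / n
            h -= p * math.log2(p)
        profile.append(h)
    return profile

def conservation_difference(binding, nonbinding):
    """Per-position entropy of nonbinding minus binding segments (hypothetical sign convention).

    Positive values mark positions that are more conserved among binding segments.
    """
    h_bind = positional_entropy(binding)
    h_non = positional_entropy(nonbinding)
    return [hn - hb for hb, hn in zip(h_bind, h_non)]

# Toy segments for illustration only.
binding = ["ACGU", "ACGA", "ACGC", "ACGG"]
nonbinding = ["ACGU", "CCGU", "GCGU", "UCGU"]
print(positional_entropy(binding))                   # [0.0, 0.0, 0.0, 2.0]
print(conservation_difference(binding, nonbinding))  # [2.0, 0.0, 0.0, -2.0]
```

A fully conserved position has entropy 0 bits and a uniformly mixed one has 2 bits (the maximum for four bases), so the difference profile highlights where binding segments are more (or less) conserved than nonbinding ones.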
Year: 2022 PMID: 36225547 PMCID: PMC9550406 DOI: 10.1155/2022/8626628
Source DB: PubMed Journal: Comput Intell Neurosci
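Prediction performance of the models constructed with the single information entropy feature across the 31 original experimental datasets. Rows sharing a training-group letter belong to the same merged training dataset, so AUC_train is reported once per group.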
| Training groups | Protein | SE | SP | ACC | MCC | AUC_train | AUC_test |
|---|---|---|---|---|---|---|---|
| A | 1 Ago/EIF | 0.500 | 0.504 | 0.502 | 0.004 | 0.605 | 0.583 |
| | 3 Ago2-1 | 0.523 | 0.444 | 0.490 | −0.032 | | 0.623 |
| | 4 Ago2-2 | 0.515 | 0.548 | 0.528 | 0.062 | | 0.656 |
| | 5 Ago2 | 0.473 | 0.508 | 0.489 | −0.019 | | 0.554 |
| B | 6 eIF4AIII-1 | 0.745 | 0.845 | 0.795 | 0.593 | 0.851 | 0.858 |
| | 7 eIF4AIII-2 | 0.780 | 0.805 | 0.793 | 0.585 | | 0.873 |
| C | 8 ELAVL1-1 | 0.860 | 0.118 | 0.501 | −0.033 | 0.750 | 0.715 |
| | 10 ELAVL1A | 0.763 | 0.119 | 0.460 | −0.153 | | 0.626 |
| | 11 ELAVL1-2 | 0.869 | 0.090 | 0.501 | −0.065 | | 0.703 |
| D | 12 EWSR1 | 0.870 | 0.735 | 0.803 | 0.611 | 0.858 | 0.896 |
| E | 13 FUS | 0.885 | 0.855 | 0.870 | 0.740 | 0.916 | 0.914 |
| | 14 Mut FUS | 0.890 | 0.840 | 0.865 | 0.731 | | 0.912 |
| F | 15 IGF2BP1-3 | 0.690 | 0.675 | 0.683 | 0.365 | 0.628 | 0.771 |
| G | 16 hnRNPC-1 | 0.595 | 0.840 | 0.718 | 0.449 | 0.947 | 0.830 |
| | 17 hnRNPC-2 | 0.685 | 0.835 | 0.760 | 0.526 | | 0.844 |
| H | 18 hnRNPL-1 | 0.680 | 0.750 | 0.715 | 0.431 | 0.754 | 0.810 |
| | 19 hnRNPL-2 | 0.740 | 0.730 | 0.735 | 0.470 | | 0.805 |
| | 20 hnRNPL-like | 0.625 | 0.710 | 0.668 | 0.336 | | 0.732 |
| I | 21 MOV10 | 0.960 | 0.535 | 0.748 | 0.547 | 0.776 | 0.835 |
| J | 22 Nsun2 | 0.685 | 0.745 | 0.715 | 0.431 | 0.832 | 0.796 |
| K | 23 PUM2 | 0.920 | 0.795 | 0.858 | 0.721 | 0.887 | 0.915 |
| L | 24 QKI | 0.840 | 0.890 | 0.865 | 0.731 | 0.920 | 0.937 |
| M | 25 SRSF1 | 0.705 | 0.795 | 0.750 | 0.502 | 0.785 | 0.848 |
| N | 26 TAF15 | 0.885 | 0.855 | 0.870 | 0.740 | 0.906 | 0.930 |
| O | 27 TDP-43 | 0.815 | 0.845 | 0.830 | 0.660 | 0.819 | 0.889 |
| P | 28 TIA1 | 0.760 | 0.875 | 0.818 | 0.639 | 0.999 | 0.900 |
| | 29 TIAL1 | 0.645 | 0.840 | 0.743 | 0.494 | | 0.842 |
| Q | 30 U2AF2 | 0.780 | 0.855 | 0.818 | 0.637 | 0.901 | 0.903 |
| | 31 U2AF2 (KD) | 0.700 | 0.865 | 0.783 | 0.573 | | 0.874 |
| R | 2 Ago2-MNase | 0.239 | 0.659 | 0.451 | −0.113 | 0.605 | 0.405 |
| S | 9 ELAVL1-MNase | 0.719 | 0.245 | 0.491 | −0.040 | 0.605 | 0.499 |
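The SE, SP, ACC and MCC columns correspond to the standard confusion-matrix definitions (sensitivity, specificity, accuracy and Matthews correlation coefficient). A minimal sketch of those definitions follows. The helper name `binary_metrics` is hypothetical, and the counts in the usage line are invented, chosen only so that SE and SP match the FUS row; they are not the actual test-set sizes.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics for binding (positive) vs nonbinding (negative) segments."""
    se = tp / (tp + fn)                    # sensitivity: fraction of binding segments recovered
    sp = tn / (tn + fp)                    # specificity: fraction of nonbinding segments rejected
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy over all segments
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; 0.0 when a marginal is empty (common convention)
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}

# Hypothetical counts (200 positives, 200 negatives) chosen to match SE/SP of the FUS row.
m = binary_metrics(tp=177, tn=171, fp=29, fn=23)
print({k: round(v, 3) for k, v in m.items()})  # {'SE': 0.885, 'SP': 0.855, 'ACC': 0.87, 'MCC': 0.74}
```

With balanced classes, ACC is the mean of SE and SP, which is consistent with the ACC column in the table above.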
Figure 1. Conversion diagram of RNA secondary structure.
Figure 2. Two-sample logos showing position-specific differences in base and secondary-structure distributions between binding and nonbinding sequences: (a) base differences in the QKI dataset; (b) secondary-structure differences in the QKI dataset; (c) base differences in the MOV10 dataset; (d) secondary-structure differences in the MOV10 dataset.
Figure 3. Per-position conservation of binding versus nonbinding sequences, compared through information entropy values.
Figure 4. AUCs of the single information entropy feature across the 31 original experimental datasets.
Figure 5. AUCs of the single information entropy feature across the 19 merged experimental datasets.
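Figure 2 contrasts position-specific distributions between binding and nonbinding sequences. A simplified version of that comparison is the per-position symbol-frequency difference sketched below. The Two Sample Logo tool additionally tests each difference for statistical significance, which this sketch omits; the function name and the default `ACGU` alphabet are illustrative assumptions. For the secondary-structure panels, `alphabet` would be whatever symbol set the Figure 1 conversion produces, which is not specified here.

```python
from collections import Counter

def position_frequency_difference(positives, negatives, alphabet="ACGU"):
    """For each position, frequency of every symbol in `positives` minus its frequency in `negatives`.

    Assumes aligned, equal-length sequences; no significance test is applied.
    """
    length = len(positives[0])
    if any(len(s) != length for s in positives + negatives):
        raise ValueError("all sequences must have the same length")
    diffs = []
    for i in range(length):
        pos_counts = Counter(s[i] for s in positives)
        neg_counts = Counter(s[i] for s in negatives)
        diffs.append({
            sym: pos_counts[sym] / len(positives) - neg_counts[sym] / len(negatives)
            for sym in alphabet
        })
    return diffs

# Symbols enriched in binding segments at a position get positive values.
d = position_frequency_difference(["AC", "AG"], ["CC", "CG"])
print(d[0])  # {'A': 1.0, 'C': -1.0, 'G': 0.0, 'U': 0.0}
```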
Prediction performance of the models constructed with the single information entropy feature across the 19 merged experimental datasets.
| Training groups | SE | SP | ACC | MCC | AUC_train | AUC_test |
|---|---|---|---|---|---|---|
| A | 0.635 | 0.607 | 0.621 | 0.217 | 0.639 | 0.659 |
| B | 0.765 | 0.820 | 0.793 | 0.465 | 0.847 | 0.865 |
| C | 0.734 | 0.708 | 0.721 | 0.368 | 0.777 | 0.792 |
| D | 0.870 | 0.735 | 0.803 | 0.482 | 0.858 | 0.896 |
| E | 0.885 | 0.848 | 0.866 | 0.557 | 0.916 | 0.914 |
| F | 0.690 | 0.675 | 0.683 | 0.312 | 0.628 | 0.771 |
| G | 0.640 | 0.838 | 0.739 | 0.400 | 0.947 | 0.837 |
| H | 0.682 | 0.730 | 0.706 | 0.347 | 0.754 | 0.783 |
| I | 0.960 | 0.535 | 0.748 | 0.447 | 0.776 | 0.835 |
| J | 0.685 | 0.745 | 0.715 | 0.360 | 0.832 | 0.796 |
| K | 0.920 | 0.795 | 0.858 | 0.550 | 0.887 | 0.915 |
| L | 0.840 | 0.890 | 0.865 | 0.556 | 0.920 | 0.937 |
| M | 0.705 | 0.795 | 0.750 | 0.410 | 0.785 | 0.848 |
| N | 0.885 | 0.855 | 0.870 | 0.561 | 0.906 | 0.930 |
| O | 0.815 | 0.845 | 0.830 | 0.513 | 0.819 | 0.889 |
| P | 0.700 | 0.865 | 0.783 | 0.458 | 0.999 | 0.874 |
| Q | 0.743 | 0.860 | 0.801 | 0.479 | 0.901 | 0.888 |
| R | 0.239 | 0.659 | 0.451 | −0.113 | 0.605 | 0.405 |
| S | 0.719 | 0.245 | 0.491 | −0.040 | 0.605 | 0.499 |
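Every table reports AUC_train and AUC_test. The section does not say how the AUC was computed, so the following is only the textbook rank-based (Mann-Whitney) definition of ROC AUC: the probability that a randomly chosen binding segment scores higher than a randomly chosen nonbinding one, with ties counted as one half. `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5.

    O(P * N) pairwise comparison, suitable only for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

An AUC near or below 0.5, as for groups R and S, means the scores separate binding from nonbinding segments no better than chance.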
AUCs of six selected proteins using a single k-mer feature.
| Protein | AUC of 3-mer | AUC of 4-mer | AUC of 5-mer |
|---|---|---|---|
| QKI | 0.900 | 0.899 | 0.919 |
| U2AF2 | 0.871 | 0.887 | 0.886 |
| SRSF1 | 0.853 | 0.856 | 0.867 |
| Mut FUS | 0.829 | 0.836 | 0.841 |
| Nsun2 | 0.760 | 0.763 | 0.767 |
| IGF2BP1-3 | 0.657 | 0.675 | 0.686 |
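The table compares single k-mer features for k = 3, 4 and 5. The section does not define the encoding; a common choice, assumed here, is the normalized frequency of every overlapping k-mer, which yields vectors of length 4^k (64, 256 and 1024 for k = 3, 4, 5). `kmer_frequencies` is a hypothetical name; pass `alphabet="ACGT"` if the sequences are stored as DNA.

```python
from itertools import product

def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Normalized counts of every overlapping k-mer, ordered lexicographically over `alphabet`."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in index:  # skip windows containing symbols outside the alphabet (e.g. N)
            counts[index[window]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]

v = kmer_frequencies("ACGUA", 3)
print(len(v), v[6])  # 64 0.3333333333333333  ("ACG" is the 7th 3-mer)
```

Longer k gives a sparser but more specific vector, which is one plausible reason the 5-mer column is slightly higher for most proteins here; the table alone does not establish the cause.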
Prediction performance of the models constructed with the single secondary-structure feature across the 31 original experimental datasets.
| Training groups | Protein | SE | SP | ACC | MCC | AUC_train | AUC_test |
|---|---|---|---|---|---|---|---|
| A | 1 Ago/EIF | 0.505 | 0.500 | 0.503 | 0.005 | 0.537 | 0.513 |
| | 3 Ago2-1 | 0.420 | 0.555 | 0.488 | −0.025 | | 0.509 |
| | 4 Ago2-2 | 0.440 | 0.580 | 0.510 | 0.020 | | 0.510 |
| | 5 Ago2 | 0.530 | 0.420 | 0.475 | −0.050 | | 0.464 |
| B | 6 eIF4AIII-1 | 0.035 | 0.965 | 0.500 | 0.000 | 0.560 | 0.547 |
| | 7 eIF4AIII-2 | 0.350 | 0.745 | 0.548 | 0.103 | | 0.600 |
| C | 8 ELAVL1-1 | 0.895 | 0.090 | 0.493 | −0.025 | 0.632 | 0.498 |
| | 10 ELAVL1A | 0.635 | 0.410 | 0.523 | 0.046 | | 0.533 |
| | 11 ELAVL1-2 | 0.620 | 0.620 | 0.620 | 0.240 | | 0.662 |
| D | 12 EWSR1 | 0.460 | 0.640 | 0.550 | 0.102 | 0.582 | 0.555 |
| E | 13 FUS | 0.565 | 0.550 | 0.558 | 0.115 | 0.630 | 0.578 |
| | 14 Mut FUS | 0.565 | 0.710 | 0.638 | 0.278 | | 0.642 |
| F | 15 IGF2BP1-3 | 0.565 | 0.710 | 0.638 | 0.278 | 0.523 | 0.642 |
| G | 16 hnRNPC-1 | 0.500 | 0.745 | 0.623 | 0.253 | 0.679 | 0.661 |
| | 17 hnRNPC-2 | 0.335 | 0.915 | 0.625 | 0.307 | | 0.670 |
| H | 18 hnRNPL-1 | 0.540 | 0.500 | 0.520 | 0.040 | 0.573 | 0.551 |
| | 19 hnRNPL-2 | 0.505 | 0.600 | 0.553 | 0.105 | | 0.564 |
| | 20 hnRNPL-like | 0.465 | 0.510 | 0.488 | −0.025 | | 0.470 |
| I | 21 MOV10 | 0.440 | 0.580 | 0.510 | 0.020 | 0.521 | 0.520 |
| J | 22 Nsun2 | 0.615 | 0.530 | 0.573 | 0.146 | 0.610 | 0.595 |
| K | 23 PUM2 | 0.600 | 0.590 | 0.595 | 0.190 | 0.565 | 0.625 |
| L | 24 QKI | 0.600 | 0.590 | 0.595 | 0.190 | 0.724 | 0.625 |
| M | 25 SRSF1 | 0.085 | 0.910 | 0.498 | −0.009 | 0.522 | 0.464 |
| N | 26 TAF15 | 0.580 | 0.625 | 0.603 | 0.205 | 0.609 | 0.618 |
| O | 27 TDP-43 | 0.580 | 0.625 | 0.603 | 0.205 | 0.553 | 0.618 |
| P | 28 TIA1 | 0.155 | 0.955 | 0.555 | 0.183 | 1.000 | 0.636 |
| | 29 TIAL1 | 0.120 | 0.950 | 0.535 | 0.126 | | 0.567 |
| Q | 30 U2AF2 | 0.120 | 0.950 | 0.535 | 0.126 | 0.602 | 0.567 |
| | 31 U2AF2 (KD) | 0.330 | 0.705 | 0.518 | 0.038 | | 0.548 |
| R | 2 Ago2-MNase | 0.545 | 0.470 | 0.508 | 0.015 | 0.499 | 0.516 |
| S | 9 ELAVL1-MNase | 0.485 | 0.550 | 0.518 | 0.035 | 0.519 | 0.506 |
Comparison of test AUCs among the single information entropy model, the information entropy + 4-mer model, and iDeepS.
| Protein | Information entropy | Information entropy + 4-mer | iDeepS |
|---|---|---|---|
| 1 Ago/EIF | 0.583 | 0.708 | 0.773 |
| 3 Ago2-1 | 0.623 | 0.832 | 0.865 |
| 4 Ago2-2 | 0.656 | 0.839 | 0.868 |
| 5 Ago2 | 0.554 | 0.592 | 0.634 |
| 6 eIF4AIII-1 | 0.858 | 0.932 | 0.950 |
| 7 eIF4AIII-2 | 0.873 | 0.934 | 0.953 |
| 8 ELAVL1-1 | 0.715 | 0.921 | 0.932 |
| 10 ELAVL1A | 0.626 | 0.875 | 0.893 |
| 11 ELAVL1-2 | 0.703 | 0.907 | 0.919 |
| 12 EWSR1 | 0.896 | 0.904 | 0.917 |
| 13 FUS | 0.914 | 0.936 | 0.934 |
| 14 Mut FUS | 0.912 | 0.920 | 0.958 |
| 15 IGF2BP1-3 | 0.771 | 0.709 | 0.717 |
| 16 hnRNPC-1 | 0.830 | 0.929 | 0.960 |
| 17 hnRNPC-2 | 0.844 | 0.966 | 0.975 |
| 18 hnRNPL-1 | 0.810 | 0.827 | 0.756 |
| 19 hnRNPL-2 | 0.805 | 0.802 | 0.769 |
| 20 hnRNPL-like | 0.732 | 0.746 | 0.711 |
| 21 MOV10 | 0.835 | 0.839 | 0.813 |
| 22 Nsun2 | 0.796 | 0.811 | 0.835 |
| 23 PUM2 | 0.915 | 0.963 | 0.962 |
| 24 QKI | 0.937 | 0.945 | 0.966 |
| 25 SRSF1 | 0.848 | 0.873 | 0.887 |
| 26 TAF15 | 0.930 | 0.934 | 0.964 |
| 27 TDP-43 | 0.889 | 0.913 | 0.930 |
| 28 TIA1 | 0.900 | 0.911 | 0.930 |
| 29 TIAL1 | 0.842 | 0.856 | 0.893 |
| 30 U2AF2 | 0.903 | 0.921 | 0.953 |
| 31 U2AF2 (KD) | 0.874 | 0.891 | 0.931 |
| 2 Ago2-MNase | 0.405 | 0.615 | 0.591 |
| 9 ELAVL1-MNase | 0.499 | 0.566 | 0.613 |
Prediction performance of the models constructed with the information entropy and 4-mer features across the 31 original experimental datasets.
| Training groups | Protein | SE | SP | ACC | MCC | AUC_train | AUC_test |
|---|---|---|---|---|---|---|---|
| A | 1 Ago/EIF | 0.530 | 0.730 | 0.630 | 0.265 | 0.750 | 0.708 |
| | 3 Ago2-1 | 0.815 | 0.680 | 0.748 | 0.500 | 0.750 | 0.832 |
| | 4 Ago2-2 | 0.805 | 0.700 | 0.753 | 0.508 | 0.750 | 0.839 |
| | 5 Ago2 | 0.365 | 0.720 | 0.543 | 0.091 | 0.750 | 0.592 |
| B | 6 eIF4AIII-1 | 0.810 | 0.875 | 0.843 | 0.686 | 0.924 | 0.932 |
| | 7 eIF4AIII-2 | 0.870 | 0.835 | 0.853 | 0.705 | 0.924 | 0.934 |
| C | 8 ELAVL1-1 | 0.925 | 0.755 | 0.840 | 0.690 | 0.897 | 0.921 |
| | 10 ELAVL1A | 0.850 | 0.755 | 0.803 | 0.608 | 0.897 | 0.875 |
| | 11 ELAVL1-2 | 0.890 | 0.720 | 0.805 | 0.619 | 0.897 | 0.907 |
| D | 12 EWSR1 | 0.850 | 0.800 | 0.825 | 0.651 | 0.845 | 0.904 |
| E | 13 FUS | 0.870 | 0.875 | 0.873 | 0.745 | 0.895 | 0.936 |
| | 14 Mut FUS | 0.830 | 0.885 | 0.858 | 0.716 | 0.895 | 0.920 |
| F | 15 IGF2BP1-3 | 0.690 | 0.560 | 0.625 | 0.252 | 0.695 | 0.709 |
| G | 16 hnRNPC-1 | 0.865 | 0.850 | 0.858 | 0.715 | 0.952 | 0.929 |
| | 17 hnRNPC-2 | 0.960 | 0.870 | 0.915 | 0.833 | 0.952 | 0.966 |
| H | 18 hnRNPL-1 | 0.775 | 0.725 | 0.750 | 0.501 | 0.765 | 0.827 |
| | 19 hnRNPL-2 | 0.780 | 0.715 | 0.748 | 0.496 | 0.765 | 0.802 |
| | 20 hnRNPL-like | 0.645 | 0.735 | 0.690 | 0.382 | 0.765 | 0.746 |
| I | 21 MOV10 | 0.875 | 0.655 | 0.765 | 0.543 | 0.793 | 0.839 |
| J | 22 Nsun2 | 0.725 | 0.735 | 0.730 | 0.460 | 0.832 | 0.811 |
| K | 23 PUM2 | 0.920 | 0.870 | 0.895 | 0.791 | 0.927 | 0.963 |
| L | 24 QKI | 0.895 | 0.855 | 0.875 | 0.751 | 0.941 | 0.945 |
| M | 25 SRSF1 | 0.785 | 0.790 | 0.788 | 0.575 | 0.858 | 0.873 |
| N | 26 TAF15 | 0.870 | 0.890 | 0.880 | 0.760 | 0.897 | 0.934 |
| O | 27 TDP-43 | 0.750 | 0.875 | 0.813 | 0.630 | 0.888 | 0.913 |
| P | 28 TIA1 | 0.775 | 0.900 | 0.838 | 0.680 | 0.998 | 0.911 |
| | 29 TIAL1 | 0.645 | 0.875 | 0.760 | 0.534 | 0.998 | 0.856 |
| Q | 30 U2AF2 | 0.865 | 0.810 | 0.838 | 0.676 | 0.895 | 0.921 |
| | 31 U2AF2 (KD) | 0.855 | 0.760 | 0.808 | 0.618 | 0.895 | 0.891 |
| R | 2 Ago2-MNase | 0.545 | 0.625 | 0.585 | 0.171 | 0.605 | 0.615 |
| S | 9 ELAVL1-MNase | 0.280 | 0.755 | 0.518 | 0.040 | 0.593 | 0.566 |