Dmitry Karasev, Boris Sobolev, Alexey Lagunin, Dmitry Filimonov, Vladimir Poroikov.
Abstract
Computational prediction of protein–ligand interactions follows three main directions: searching for new target proteins for known ligands, searching for new ligands for known targets, and predicting interactions between new proteins and new ligands. To address the last and most complicated case, we proposed an approach that provides a fuzzy classification of protein sequences based on ligand structural features. We tested the approach on five protein groups that represent promising targets for drug-like ligands and differ in their functional characteristics. The training sets were built with an original procedure that overcomes data ambiguity. The prediction of new targets for ligands reached an average accuracy of 0.96. The prediction of new ligands for targets reached an average accuracy of 0.95; these estimates were close to our previous results and were comparable to, or exceeded, those of other methods. Using fuzzy coefficients that reflect target-to-ligand specificity, we predicted interactions between new proteins and new ligands; the obtained accuracy values, from 0.89 to 0.99, were acceptable for such a difficult task. The protein kinase case demonstrated the ability to account for the subtle features of proteins and ligands that determine the specificity of protein–ligand interaction.
Keywords: local sequence similarity; prediction of target proteins; protein–ligand interactions; proteochemometrics
Year: 2020 PMID: 33142754 PMCID: PMC7663273 DOI: 10.3390/ijms21218152
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Table 1. Characteristics of the training sets representing the protein–ligand interactions.
| Protein Group | Parameter | Cutoff (µmol) | Number of Target Proteins | Number of Ligands |
|---|---|---|---|---|
| GPCR * | IC50 | 1 | 126 | 546 |
| GPCR * | IC50 | 10 | 130 | 839 |
| GPCR * | Kd | 1 | 30 | 8 |
| GPCR * | Kd | 10 | 32 | 15 |
| GPCR * | Ki | 1 | 110 | 4754 |
| GPCR * | Ki | 10 | 112 | 6411 |
| Protein kinases | IC50 | 1 | 200 | 3014 |
| Protein kinases | IC50 | 10 | 215 | 3883 |
| Protein kinases | Kd | 1 | 233 | 111 |
| Protein kinases | Kd | 10 | 307 | 120 |
| Protein kinases | Ki | 1 | 72 | 277 |
| Protein kinases | Ki | 10 | 77 | 339 |
| Ion channel (ligand-gated) | Ki | 1 | 16 | 15 |
| Ion channel (ligand-gated) | Ki | 10 | 16 | 20 |
| Ion channel (voltage-gated) | IC50 | 1 | 29 | 75 |
| Ion channel (voltage-gated) | IC50 | 10 | 35 | 163 |
| Nuclear receptors | IC50 | 1 | 26 | 121 |
| Nuclear receptors | IC50 | 10 | 28 | 340 |
| Nuclear receptors | Kd | 1 | 13 | 15 |
| Nuclear receptors | Kd | 10 | 13 | 19 |
| Nuclear receptors | Ki | 1 | 22 | 55 |
| Nuclear receptors | Ki | 10 | 23 | 89 |
* G protein-coupled receptors.
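The two cutoffs in the training-set table can be read as activity thresholds. The sketch below is only an illustration under an assumption not stated in the table: a protein–ligand pair is counted as interacting when its measured value (IC50, Kd, or Ki) is at or below the cutoff. The records and the function name `label_interactions` are hypothetical and do not come from the study.

```python
def label_interactions(records, cutoff_umol):
    """Return the (protein, ligand) pairs whose measured value is <= cutoff.

    Assumption: 'at or below the cutoff' means 'interacting'. The table gives
    only the cutoff values, not the exact rule used to build the sets.
    """
    return {(protein, ligand)
            for protein, ligand, value in records
            if value <= cutoff_umol}


# Hypothetical measurements: (protein, ligand, value in the cutoff's units).
records = [
    ("P1", "L1", 0.5),
    ("P1", "L2", 4.0),
    ("P2", "L1", 12.0),
]

strict = label_interactions(records, 1)    # 1 µmol cutoff
loose = label_interactions(records, 10)    # 10 µmol cutoff
```

Under this reading, the set built at the stricter cutoff is always contained in the set built at the looser one, which is consistent with every 10 µmol row of the table listing at least as many proteins and ligands as the matching 1 µmol row.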
Figure 1. Scenarios simulating the prediction of new ligands for known proteins (a), new targets for known ligands (b), and interactions between new ligands and new targets (c).
Table 2. Predictive accuracy for ligands’ affinities to proteins (first scenario).
| Protein Group | Parameter | Cutoff (µmol) | AUC * |
|---|---|---|---|
| GPCR | IC50 | 1 | 0.986 |
| GPCR | IC50 | 10 | 0.981 |
| GPCR | Kd | 1 | 0.979 |
| GPCR | Kd | 10 | 0.981 |
| GPCR | Ki | 1 | 0.987 |
| GPCR | Ki | 10 | 0.984 |
| Protein kinases | IC50 | 1 | 0.963 |
| Protein kinases | IC50 | 10 | 0.954 |
| Protein kinases | Kd | 1 | 0.808 |
| Protein kinases | Kd | 10 | 0.803 |
| Protein kinases | Ki | 1 | 0.980 |
| Protein kinases | Ki | 10 | 0.981 |
| Ion channel (ligand-gated) | Ki | 1 | 0.986 |
| Ion channel (ligand-gated) | Ki | 10 | 0.991 |
| Ion channel (voltage-gated) | IC50 | 1 | 0.986 |
| Ion channel (voltage-gated) | IC50 | 10 | 0.968 |
| Nuclear receptors | IC50 | 1 | 0.984 |
| Nuclear receptors | IC50 | 10 | 0.981 |
| Nuclear receptors | Kd | 1 | 0.969 |
| Nuclear receptors | Kd | 10 | 0.972 |
| Nuclear receptors | Ki | 1 | 0.993 |
| Nuclear receptors | Ki | 10 | 0.996 |
* Area Under ROC Curve.
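For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen interacting pair receives a higher prediction score than a randomly chosen non-interacting pair, with ties counted as one half. The following minimal sketch computes it directly from that definition. It is a generic illustration, not the evaluation code used in the study; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: prediction scores, higher means 'more likely to interact'.
    labels: 1 for an interacting pair, 0 for a non-interacting pair.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: five of the six positive/negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0])
```

An AUC of 1.0 means every interacting pair is scored above every non-interacting pair, and 0.5 corresponds to random ranking. The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value for large sets.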
Table 3. Predictive accuracy (AUC) for target affinities to ligands (second scenario) and for interactions in which both partners are uncharacterized (third scenario).
| Protein Group | Parameter | Cutoff (µmol) | Second Scenario, Frame = 7 | Second Scenario, Frame = 30 | Third Scenario, Frame = 7 | Third Scenario, Frame = 30 |
|---|---|---|---|---|---|---|
| GPCR | IC50 | 1 | 0.959 | 0.968 | 0.885 | 0.904 |
| GPCR | IC50 | 10 | 0.944 | 0.953 | 0.890 | 0.910 |
| GPCR | Kd | 1 | 0.918 | 0.875 | 0.806 | 0.805 |
| GPCR | Kd | 10 | 0.967 | 0.962 | 0.874 | 0.882 |
| GPCR | Ki | 1 | 0.963 | 0.976 | 0.881 | 0.901 |
| GPCR | Ki | 10 | 0.966 | 0.977 | 0.899 | 0.918 |
| Protein kinases | IC50 | 1 | 0.896 | 0.924 | 0.824 | 0.857 |
| Protein kinases | IC50 | 10 | 0.866 | 0.902 | 0.769 | 0.838 |
| Protein kinases | Kd | 1 | 0.686 | 0.790 | 0.642 | 0.655 |
| Protein kinases | Kd | 10 | 0.710 | 0.797 | 0.647 | 0.650 |
| Protein kinases | Ki | 1 | 0.930 | 0.956 | 0.869 | 0.893 |
| Protein kinases | Ki | 10 | 0.907 | 0.937 | 0.866 | 0.856 |
| Ion channel (ligand-gated) | Ki | 1 | 0.979 | 0.979 | 0.948 | 0.957 |
| Ion channel (ligand-gated) | Ki | 10 | 0.970 | 0.973 | 0.958 | 0.963 |
| Ion channel (voltage-gated) | IC50 | 1 | 0.985 | 0.985 | 0.913 | 0.932 |
| Ion channel (voltage-gated) | IC50 | 10 | 0.886 | 0.898 | 0.787 | 0.839 |
| Nuclear receptors | IC50 | 1 | 0.966 | 0.973 | 0.817 | 0.852 |
| Nuclear receptors | IC50 | 10 | 0.984 | 0.988 | 0.924 | 0.943 |
| Nuclear receptors | Kd | 1 | 0.995 | 0.995 | 0.973 | 0.961 |
| Nuclear receptors | Kd | 10 | 0.995 | 0.995 | 0.982 | 0.972 |
| Nuclear receptors | Ki | 1 | 0.994 | 0.988 | 0.986 | 0.976 |
| Nuclear receptors | Ki | 10 | 0.993 | 0.987 | 0.992 | 0.989 |
Figure 2. Prediction accuracy obtained for the datasets of the five protein groups. Each point corresponds to an AUC value from Table 2 or Table 3. The blue, orange, and gray curves correspond to the first, second, and third scenarios, respectively. Each plotted AUC value is the maximum of those calculated at frames of 7 and 30.
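The rule stated in the caption, taking the larger of the two frame-specific AUC values, can be illustrated on a few rows copied from the table for the second and third scenarios. This is a minimal sketch: the dictionary layout and the function name `plotted_auc` are hypothetical, and only three rows are included.

```python
# (protein group, parameter, cutoff in µmol) -> AUC at (frame 7, frame 30)
# Values are copied from the second/third scenario table; only three rows shown.
table3 = {
    ("GPCR", "IC50", 1): {"second": (0.959, 0.968), "third": (0.885, 0.904)},
    ("Protein kinases", "Kd", 1): {"second": (0.686, 0.790), "third": (0.642, 0.655)},
    ("Nuclear receptors", "Ki", 10): {"second": (0.993, 0.987), "third": (0.992, 0.989)},
}


def plotted_auc(row):
    """Return, per scenario, the larger of the frame-7 and frame-30 AUC values."""
    return {scenario: max(values) for scenario, values in row.items()}


points = {key: plotted_auc(row) for key, row in table3.items()}
```

For the nuclear receptor Ki set at the 10 µmol cutoff, the frame of 7 gives the larger value in both scenarios, whereas for the protein kinase Kd set the frame of 30 does, so the better frame is not the same for every dataset.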