Dmitry Karasev, Boris Sobolev, Alexey Lagunin, Dmitry Filimonov, Vladimir Poroikov.
Abstract
The affinity of different drug-like ligands to multiple protein targets reflects general chemical-biological interactions. Computational methods that estimate such interactions analyze the available information about the structure of the targets, the ligands, or both. Prediction of protein-ligand interactions based on pairwise sequence alignment provides reasonable accuracy if the ligands' specificity coincides well with the phylogenetic taxonomy of the proteins. Methods using multiple alignment require an accurate match of functionally significant residues. These conditions may not be met in the case of diverged protein families. To overcome these limitations, we propose an approach based on the analysis of local sequence similarity within the set of analyzed proteins. The positional scores, calculated by sequence fragment comparisons, are used as input data for a Bayesian classifier. Our approach provides a prediction accuracy comparable to or exceeding that of other methods. This was demonstrated on the popular Gold Standard test sets, which differ in sequence heterogeneity and range from a group that includes different protein families to more specific groups. A reasonable prediction accuracy was also found for protein kinases, which display weak relationships between sequence phylogeny and inhibitor specificity. Thus, our method can be applied to the broad area of protein-ligand interactions.
Keywords: local sequence similarity; prediction of protein targets; protein–ligand interaction
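The abstract describes positional scores derived from sequence fragment comparisons but does not give the scoring scheme. The following is only a minimal sketch of the general idea, under stated assumptions: the function name `positional_scores`, the `window` parameter, and the use of a simple identity count as the fragment similarity are all hypothetical stand-ins, not the authors' method. Whether `window` corresponds to the F values reported in the tables is not stated here.

```python
def positional_scores(query, references, window):
    """Score each window start in `query` by its best match among reference fragments.

    Assumption: similarity between two equal-length fragments is the number of
    identical residues at the same offset. The real method may use a
    substitution matrix or another measure.
    """
    # Collect every fragment of length `window` from every reference sequence.
    ref_fragments = {
        ref[j:j + window]
        for ref in references
        for j in range(len(ref) - window + 1)
    }

    scores = []
    for i in range(len(query) - window + 1):
        frag = query[i:i + window]
        # Best identity count against any reference fragment (0 if there are none).
        best = max(
            (sum(a == b for a, b in zip(frag, other)) for other in ref_fragments),
            default=0,
        )
        scores.append(best)
    return scores


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(positional_scores("ACDEF", ["ACDXF"], window=3))
```

In a pipeline of the kind the abstract outlines, a vector like this would be one possible input to a classifier trained per ligand; that step is omitted because the source gives no details about it.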
Year: 2019 PMID: 31861473 PMCID: PMC6981593 DOI: 10.3390/ijms21010024
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Evaluation of our approach on the datasets obtained from Gold Standard and PASS Targets [40]. AUC (Area Under the Curve) values were calculated at two F values.
| Data Source | Dataset | AUC (F = 7) | AUC (F = 30) |
|---|---|---|---|
| Gold Standard [ | Ion Channel | 0.796 | 0.921 |
| | GPCR | 0.897 | 0.939 |
| | Nuclear receptors | 0.989 | 0.991 |
| | Enzymes | 0.874 | 0.941 |
| PASS Targets [ | GPCR | 0.879 | 0.902 |
| | Ion Channel | 0.729 | 0.81 |
| | Nuclear receptors | 0.963 | 0.966 |
Figure 1. Results of ROC (Receiver Operating Characteristic) analysis obtained on the Gold Standard datasets. The x- and y-axes correspond to the False Positive and True Positive Rates, respectively. The green solid and red dashed lines depict the results obtained at F values of 30 and 7, respectively. The results for Nuclear Receptors are shown as a single line because the two curves are practically identical.
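For readers unfamiliar with how an AUC value relates to a ROC curve of this kind, the following is a minimal sketch: it sweeps a decision threshold over prediction scores to obtain (False Positive Rate, True Positive Rate) points and integrates them with the trapezoidal rule. The function names and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over `scores`.

    `labels` are 1 (positive) or 0 (negative). At least one positive and
    one negative are required, otherwise the rates are undefined.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so ties share one threshold.
        if prev_score is not None and score != prev_score:
            points.append((fp / neg, tp / pos))
        if label:
            tp += 1
        else:
            fp += 1
        prev_score = score
    points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Toy example: three interacting and two non-interacting pairs.
    labels = [1, 1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    print(auc(roc_points(labels, scores)))
```

An AUC of 1.0 corresponds to perfect ranking of interacting over non-interacting pairs, and 0.5 to random ranking.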
Figure 2. Results of ROC analysis obtained on the protein kinase set from Karaman and coworkers [42] at the 0.1 µM cutoff. The solid and dashed lines depict the results obtained at F values of 30 and 7, respectively.
AUC values obtained in testing our approach on protein kinases.
| Dataset | AUC (F = 7) | AUC (F = 30) |
|---|---|---|
| Karaman et al., 2008 [ | 0.815 | 0.835 |
| Karaman et al., 2008 [ | 0.790 | 0.801 |
| PASS Targets dataset [ | 0.694 | 0.779 |
| Gao et al., 2013 [ | 0.753 | 0.765 |
Figure 3. Protein targets of an inhibitor (VX-680) on the phylogenetic tree of human protein kinases. Triangles and circles depict the inhibited and non-inhibited kinases, respectively. The data on inhibition at the 1.0 µM threshold were retrieved from [42]. The tree is reproduced from Cell Signaling Technology Inc. (www.cellsignal.com).
The AUC values calculated by our approach and other methods on the Gold Standard datasets.
| Dataset | NetLapRLS [ | WNN-GIP [ | RLScore [ | KBMF2K [ | CMF [ | NRLMF [ | TMF [ | Our Approach * |
|---|---|---|---|---|---|---|---|---|
| Enzymes | 0.905 | 0.947 | 0.931 | 0.876 | 0.915 | 0.966 | 0.976 | 0.941 |
| Ion Channels | 0.914 | 0.950 | 0.937 | 0.938 | 0.905 | 0.964 | 0.972 | 0.921 |
| GPCR | 0.770 | 0.926 | 0.853 | 0.882 | 0.837 | 0.930 | 0.959 | 0.939 |
| Nuclear Receptors | 0.655 | 0.935 | 0.736 | 0.668 | 0.680 | 0.851 | 0.929 | 0.991 |
* The better of our results for each dataset is presented.
The numbers of sequences and ligands used for testing our method.
| Data Sources | Target Type | Sequences | Ligands |
|---|---|---|---|
| Gold Standard [ | Enzymes | 581 | 97 |
| | Ion channels | 201 | 90 |
| | GPCR | 59 | 35 |
| | Nuclear receptors | 13 | 4 |
| PASS Targets dataset [ | Protein kinases | 374 | 527 |
| | Ion channels | 14 | 3 |
| | GPCR | 76 | 197 |
| | Nuclear receptors | 15 | 14 |
| Karaman M., 0.1 μM [ | Protein kinases | 220 | 26 |
| Karaman M., 1 μM [ | Protein kinases | 261 | 30 |
| Gao Y. [ | Protein kinases | 123 | 74 |