Jannis Born, Yoel Shoshan, Tien Huynh, Wendy D Cornell, Eric J Martin, Matteo Manica.
Abstract
Recent work showed that using active-site rather than full-protein-sequence information improves predictive performance in kinase-ligand binding affinity prediction. To refine the notion of an "active site", we here propose and compare multiple definitions. We report significant evidence that our novel definition is superior to previous definitions and better models ATP-noncompetitive inhibitors. Moreover, we leverage the discontiguity of the active site sequence to motivate novel protein-sequence augmentation strategies and find that combining them further improves performance.
Year: 2022 PMID: 36098536 PMCID: PMC9516689 DOI: 10.1021/acs.jcim.2c00840
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 6.162
Figure 1. Overview of active site definitions and representations. A) Visualization of cAMP-dependent protein kinase catalytic subunit alpha (P17612). Residues unique to the active site definitions of refs (8) and (10) are colored in orange and green, respectively. Residues contained in both definitions are shown in red. B) Partial amino acid sequence (residues 48–62) of the same kinase. The upper gray panel displays the four kinase sequence representations examined in this work. The lower gray panel visualizes three kinase augmentation strategies, exemplified on the “combined” active site definition: flipping (i.e., reversing) the entire sequence, flipping contiguous subsequences, and swapping neighboring subsequences. Residues affected by the augmentation are encircled in black.
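The three augmentation strategies named in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0-based `positions` input, the per-segment probability `p`, and the toy sequence are all hypothetical, and the record does not say how often each augmentation is applied.

```python
import random
from typing import List, Sequence


def active_site_segments(sequence: str, positions: Sequence[int]) -> List[str]:
    """Split active-site residues into runs that are contiguous in the full sequence.

    `positions` are 0-based indices into `sequence` (a hypothetical input format).
    """
    segments: List[str] = []
    previous = None
    for pos in sorted(positions):
        if previous is not None and pos == previous + 1:
            segments[-1] += sequence[pos]   # extend the current run
        else:
            segments.append(sequence[pos])  # gap in the sequence: start a new run
        previous = pos
    return segments


def flip(segments: List[str]) -> List[str]:
    """Flip (reverse) the entire active-site sequence."""
    return [segment[::-1] for segment in reversed(segments)]


def flip_subsequences(segments: List[str], p: float, rng: random.Random) -> List[str]:
    """Reverse each contiguous subsequence independently with probability p (assumed)."""
    return [s[::-1] if rng.random() < p else s for s in segments]


def swap_subsequences(segments: List[str], p: float, rng: random.Random) -> List[str]:
    """Swap neighboring subsequences with probability p (assumed), left to right."""
    out = list(segments)
    i = 0
    while i < len(out) - 1:
        if rng.random() < p:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip ahead so a segment moves at most once
        else:
            i += 1
    return out


toy_sequence = "ABCDEFGHIJKLMNOP"       # toy letters, not a real kinase sequence
toy_positions = [1, 2, 3, 6, 7, 10]     # hypothetical active-site indices
segments = active_site_segments(toy_sequence, toy_positions)
rng = random.Random(0)

print(segments)                                         # ['BCD', 'GH', 'K']
print("".join(flip(segments)))                          # KHGDCB
print("".join(flip_subsequences(segments, 1.0, rng)))   # DCBHGK
print("".join(swap_subsequences(segments, 1.0, rng)))   # GHBCDK
```

Because every function takes and returns a list of segments, the strategies can be chained to mimic combined settings; the order of chaining is likewise an assumption.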
Results on Validation and Test Data (Ligand Split) [a]
| data | config | RMSE (BiMCA) | RMSE (BiMCA-pre) | Pearson (BiMCA) | Pearson (BiMCA-pre) |
|---|---|---|---|---|---|
| val. | full sequence | 0.908±0.01 | 0.848±0.01 | 0.748±0.00 | 0.782±0.01 |
| val. | AS (Sheridan) | 0.829±0.01 | 0.821±0.01 | 0.794±0.00 | 0.797±0.01 |
| val. | AS (Martin) | 0.839±0.01 | 0.813±0.01 | 0.791±0.00 | |
| val. | AS (combined) | | | | |
| test | full sequence | 0.912±0.01 | 0.863±0.01 | 0.744±0.00 | 0.774±0.01 |
| test | AS (Sheridan) | | 0.826±0.01 | 0.792±0.01 | 0.795±0.01 |
| test | AS (Martin) | 0.842±0.01 | 0.818±0.01 | 0.789±0.01 | 0.801±0.01 |
| test | AS (combined) | | | | |
[a] 10-fold cross-validation results on kinase data from BindingDB. For each model and data partition, we show mean and standard deviation across 10 folds and mark the best representation in bold.
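The table entries have the form mean ± standard deviation of a per-fold metric. The sketch below shows one way such entries could be computed with the Python standard library. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the record does not state which estimator was used.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between measured and predicted affinities."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mean_t, mean_p = mean(y_true), mean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def summarize(fold_scores: List[float]) -> str:
    """Format per-fold scores as 'mean±std' (sample std, an assumption)."""
    return f"{mean(fold_scores):.3f}±{stdev(fold_scores):.2f}"


# Toy per-fold RMSE values, not taken from the paper.
print(summarize([0.90, 0.92, 0.91]))  # 0.910±0.01
```

Lower RMSE and higher Pearson correlation indicate better predictions, so the two metric groups in the table are read in opposite directions.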
Figure 2. RMSE in affinity prediction for the kinase split on validation and test data. 10-fold cross-validation results on kinase data from BindingDB. Performance on validation (A) and test data (B) is shown. Statistically significant differences between the three active site configurations are marked with a star.
Results of Sequence Augmentation (Kinase Split) [a]
| data | augmentation | RMSE (BiMCA) | RMSE (BiMCA-pre) | Pearson (BiMCA) | Pearson (BiMCA-pre) |
|---|---|---|---|---|---|
| val. | none | 1.32±0.16 | 1.20±0.12 | 0.438±0.08 | 0.489±0.09 |
| val. | flip (F) | 1.25±0.13 | 1.19±0.13 | 0.463±0.08 | 0.502±0.08 |
| val. | flip subseq (FS) | 1.28±0.12 | | 0.431±0.11 | |
| val. | swap subseq (SS) | 1.28±0.17 | | 0.443±0.11 | 0.511±0.09 |
| val. | FS + SS | 1.27±0.11 | | 0.444±0.09 | 0.508±0.09 |
| val. | F + FS + SS | | | | 0.505±0.09 |
| test | none | 1.33±0.08 | 1.23±0.08 | 0.431±0.06 | 0.505±0.07 |
| test | flip (F) | 1.28±0.05 | 1.23±0.07 | 0.478±0.04 | 0.515±0.06 |
| test | flip subseq (FS) | 1.32±0.09 | 1.22±0.04 | 0.444±0.08 | 0.516±0.04 |
| test | swap subseq (SS) | 1.28±0.04 | 1.23±0.03 | | 0.506±0.06 |
| test | FS + SS | 1.29±0.06 | 1.22±0.07 | 0.469±0.04 | 0.526±0.05 |
| test | F + FS + SS | | | | |
[a] All models used the combined active site definition.