Hongyan Du, Dejun Jiang, Junbo Gao, Xujun Zhang, Lingxiao Jiang, Yundian Zeng, Zhenxing Wu, Chao Shen, Lei Xu, Dongsheng Cao, Tingjun Hou, Peichen Pan.
Abstract
Covalent ligands have attracted increasing attention because of their unique advantages, such as long residence time, high selectivity, and strong binding affinity. They also show promise for targets where previous efforts to identify noncovalent small-molecule inhibitors have failed. However, limited knowledge of covalent binding sites has hindered the discovery of novel ligands, so in silico methods for identifying such sites are highly desirable. Here, we propose DeepCoSI, the first structure-based deep graph learning model to identify ligandable covalent sites in proteins. By integrating a characterization of the binding pocket with the interactions between each cysteine and its surrounding environment, DeepCoSI achieves state-of-the-art predictive performance. Validation on two external test sets that mimic real application scenarios shows that DeepCoSI has a strong ability to distinguish ligandable sites from the others. Finally, we profiled the entire set of protein structures in the RCSB Protein Data Bank (PDB) with DeepCoSI to evaluate the ligandability of each cysteine for covalent ligand design, and made the predicted data publicly available on a website.
Year: 2022 PMID: 35958111 PMCID: PMC9343084 DOI: 10.34133/2022/9873564
Source DB: PubMed Journal: Research (Wash D C) ISSN: 2639-5274
Figure 1. The workflow of DeepCoSI. (a) The PocketGNNLayer performs message passing and atom state updates; it is the same as in PriDeepCoSI. (b) Another graph G is constructed to encode the noncovalent interactions between the thiol group and other atoms in the pocket. V and E denote the sets of nodes (atoms) and edges (bonds) in G, respectively. The CysInteractLayer accepts the final node features from the PocketGNNLayer and aggregates the interaction information. (c) The readout from the PocketGNNLayer, which represents the pocket outline, and the readout from the CysInteractLayer, which represents cysteine reactivity, are combined to predict cysteine ligandability (the ability of the cysteine to be targeted by a covalent ligand, represented by a probability output by the model).
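To make panel (b) more concrete, the sketch below shows one plausible way to pick the atoms that would be connected to the thiol sulfur in an interaction graph: keep every pocket atom within a distance cutoff of the cysteine SG atom. This is an illustration only, not the authors' implementation. The function name `thiol_interaction_edges`, the coordinates, and the 5.0 Å cutoff are all assumptions introduced here.

```python
import math

def thiol_interaction_edges(sg_coord, pocket_atoms, cutoff=5.0):
    """Return (atom_index, distance) pairs for pocket atoms within `cutoff` of the thiol sulfur.

    `cutoff` (in angstroms) is an illustrative value, not the threshold used by DeepCoSI.
    """
    edges = []
    for idx, (_name, coord) in enumerate(pocket_atoms):
        d = math.dist(sg_coord, coord)  # Euclidean distance, Python 3.8+
        if d <= cutoff:
            edges.append((idx, round(d, 3)))
    return edges

# Hypothetical coordinates (angstroms) for a cysteine SG atom and three pocket atoms.
sg = (0.0, 0.0, 0.0)
pocket = [
    ("NZ", (3.0, 0.0, 0.0)),
    ("OD1", (0.0, 3.0, 3.0)),
    ("CA", (6.0, 6.0, 6.0)),
]
edges = thiol_interaction_edges(sg, pocket, cutoff=5.0)
print(edges)
```

Varying `cutoff` changes how many atoms are treated as interaction partners, which is the kind of setting an interaction threshold would control.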
Figure 2. (a) Performance comparison between DeepCoSI and PriDeepCoSI. (b) Performance of DeepCoSI with different interaction thresholds.
Figure 3. (a) Performance comparison between DeepCoSI and the SVM model. (b) Distribution of the probabilities predicted by the DeepCoSI and SVM models.
Figure 4. Changes in the predicted value after structure modification. (a) Deprotonation of cysteines before covalent linking with ligands. (b) Structure modification of PDB 6QHO to decrease the electrostatic repulsion between Cys147 and Asp277. (c) Structure modification of PDB 6QHO to decrease the electrostatic attraction between Cys147 and Lys165. (d) Structure modification of PDB 6I0X to change the orientation of the cysteine from facing the pocket cavity to facing the pocket edge. (e) Statistical analysis of the model's response to changes in electrostatic interactions. The distance between charge centers represents the strength of the interactions.
Figure 5. The performance of DeepCoSI in real application scenarios. (a) External test set 1: distribution of the normalized ranking according to the probability predicted by DeepCoSI. (b) External test set 1: cumulative curve of the success rate under different criteria. (c) External test set 2: distribution of the normalized ranking according to the probability predicted by DeepCoSI. (d) External test set 2: cumulative curve of the success rate under different criteria.
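The quantities plotted in Figure 5 can be illustrated with a small sketch. It assumes that the normalized ranking is the rank of the known ligandable cysteine among all cysteines in a structure (1 = highest predicted probability) divided by the number of cysteines, and that the success rate at a given criterion is the fraction of structures whose normalized ranking is at or below it. Both definitions are assumptions made here for illustration, and the probabilities and residue labels are hypothetical.

```python
def normalized_rank(probabilities, target):
    """Rank of `target` among all cysteines (1 = highest probability), divided by the count."""
    ordered = sorted(probabilities, key=probabilities.get, reverse=True)
    return (ordered.index(target) + 1) / len(ordered)

def cumulative_success_rate(norm_ranks, criteria):
    """Fraction of structures whose normalized rank is at or below each criterion."""
    return [sum(r <= c for r in norm_ranks) / len(norm_ranks) for c in criteria]

# Hypothetical predicted probabilities per cysteine, paired with the known ligandable site.
structures = [
    ({"C12": 0.91, "C45": 0.20, "C88": 0.05, "C130": 0.40}, "C12"),
    ({"C7": 0.30, "C19": 0.75, "C64": 0.10, "C101": 0.55}, "C101"),
]
ranks = [normalized_rank(probs, site) for probs, site in structures]
curve = cumulative_success_rate(ranks, criteria=[0.25, 0.5, 1.0])
print(ranks, curve)
```

Under these assumed definitions, a curve that rises quickly at small criteria means the known site tends to sit near the top of each structure's ranking.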
Results from the database profiled by DeepCoSI.
| Protein | PDB | Cys | Ranking | Num_Cys (a) | Reference |
|---|---|---|---|---|---|
| O43318 | 7NTH | A-174 | 1 | 9 | Ref. [ |
| P14900 | 2Y67 | A-413 | 1 | 5 | Ref. [ |
| P16455 | 1QNT | A-145 | 1 | 4 | Ref. [ |
| P20582 | 3H76 | A-112 | 1 | 5 | Ref. [ |
| P29350 | 4HJP | A-453 | 1 | 5 | Ref. [ |
| P35968 | 2P2H | A-1045 | 1 | 8 | Ref. [ |
| P61077 | 1X23 | A-85 | 1 | 4 | Ref. [ |
| Q9BY41 | 5THV | A-153 | 1 | 9 | Ref. [ |
| P10828 | 6KKB | X-309 | 3 | 7 | Ref. [ |
| P41182 | 6TOK | A-53 | 2 | 5 | Ref. [ |
| Q15118 | 2Q8G | A-240 | 4 | 4 | Ref. [ |
(a) Total number of flexible cysteines in the structure.
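As a quick way to read the table, the sketch below counts how many of the listed cysteines fall within the top k of their structure's ranking. The rows are copied from the table above. The helper name `top_k_count` is introduced here for illustration and is not part of DeepCoSI.

```python
# (PDB ID, ranking, number of flexible cysteines), copied from the table above.
rows = [
    ("7NTH", 1, 9), ("2Y67", 1, 5), ("1QNT", 1, 4), ("3H76", 1, 5),
    ("4HJP", 1, 5), ("2P2H", 1, 8), ("1X23", 1, 4), ("5THV", 1, 9),
    ("6KKB", 3, 7), ("6TOK", 2, 5), ("2Q8G", 4, 4),
]

def top_k_count(rows, k):
    """Number of entries whose cysteine is ranked at position k or better."""
    return sum(1 for _pdb, rank, _n_cys in rows if rank <= k)

top1 = top_k_count(rows, 1)  # entries ranked first in their structure
top3 = top_k_count(rows, 3)  # entries ranked within the top three
print(f"top-1: {top1}/{len(rows)}, top-3: {top3}/{len(rows)}")
```

For the eleven entries listed, eight cysteines are ranked first and ten fall within the top three.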