Warning: Undefined array key "mm" in /www/wwwroot/www.ai-bt.com/si.php on line 10 Deprecated: trim(): Passing null to parameter #1 ($string) of type string is deprecated in /www/wwwroot/www.ai-bt.com/si.php on line 10 Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model.

Literature DB >> 34905358

Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model.

Jannis Born^1,2, Tien Huynh³, Astrid Stroobants⁴, Wendy D Cornell³, Matteo Manica¹.

Abstract

Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 discontiguous residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but consistently yields significantly higher performance than the full primary structure. This trend persists across different models, data sets, and performance metrics and holds true when predicting pIC50 for both unseen ligands and kinases. Our interpretability analysis reveals a potential explanation for the superiority of the active site models: whereas only mild statistical effects about the extraction of three-dimensional (3D) interaction sites take place in the full sequence models, the active site models are equipped with an implicit but strong inductive bias about the 3D structure stemming from the discontiguity of the active sites. Moreover, in direct comparisons, our models perform similarly or better than previous state-of-the-art approaches in affinity prediction. We then investigate a de novo molecular design task and find that the active site provides benefits in the computational efficiency, but otherwise, both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases.

Entities: Chemical

Mesh：

Substances：
Ligands
Proteins

Year: 2021 PMID： 34905358 DOI： 10.1021/acs.jcim.1c00889

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 4.956

Keyword Cloud
Cited

2 in total

1. XLPFE: A Simple and Effective Machine Learning Scoring Function for Protein-Ligand Scoring and Ranking.

Authors: Lina Dong; Xiaoyang Qu; Binju Wang
Journal: ACS Omega Date: 2022-06-13

2. On the Choice of Active Site Sequences for Kinase-Ligand Affinity Prediction.

Authors: Jannis Born; Yoel Shoshan; Tien Huynh; Wendy D Cornell; Eric J Martin; Matteo Manica
Journal: J Chem Inf Model Date: 2022-09-13 Impact factor: 6.162

2 in total