
MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization.

Tian Cai, Hansaim Lim, Kyra Alyssa Abbu, Yue Qiu, Ruth Nussinov, Lei Xie.

Abstract

Small molecules play a critical role in modulating biological systems. Knowledge of chemical-protein interactions helps address fundamental and practical questions in biology and medicine. However, with the rapid emergence of newly sequenced genes, the endogenous or surrogate ligands of a vast number of proteins remain unknown. Homology modeling and machine learning are the two major methods for assigning new ligands to a protein, but both mostly fail when the sequence homology between an unannotated protein and those with known functions or structures is low. In this study, we develop a new deep learning framework to predict chemical binding to evolutionarily divergent unannotated proteins whose ligands cannot be reliably predicted by existing methods. By incorporating evolutionary information into self-supervised learning of unlabeled protein sequences, we develop a novel method, distilled sequence alignment embedding (DISAE), for protein sequence representation. DISAE can utilize all protein sequences and their multiple sequence alignments (MSAs) to capture functional relationships between proteins without knowledge of their structure or function. Following DISAE pretraining, we devise a module-based fine-tuning strategy for the supervised learning of chemical-protein interactions. In benchmark studies, DISAE significantly improves the generalizability of machine learning models and outperforms state-of-the-art methods by a large margin. Comprehensive ablation studies suggest that MSA use, sequence distillation, and triplet pretraining each contribute critically to the success of DISAE. Interpretability analysis of DISAE suggests that it learns biologically meaningful information. We further use DISAE to assign ligands to human orphan G-protein-coupled receptors (GPCRs) and to cluster the human GPCRome by integrating their phylogenetic and ligand relationships. The promising results of DISAE open an avenue for exploring the chemical landscape of entire sequenced genomes.
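The abstract names MSA use, sequence distillation, and triplet pretraining as key ingredients of DISAE but does not spell out how distillation or triplet tokenization work. The following is a minimal sketch under explicitly stated assumptions, not the authors' method: it assumes that "distillation" keeps the most conserved alignment columns (lowest Shannon entropy, with gaps counted as a symbol and gap-heavy columns skipped) and that "triplets" are non-overlapping 3-residue tokens. The functions `column_entropy`, `distill`, and `triplets` and the `max_gap` threshold are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap character in the alignment


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gaps count as a symbol."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def distill(msa: list[str], query: int, length: int, max_gap: float = 0.5) -> str:
    """Hypothetical distillation: return the query's residues at the `length`
    lowest-entropy columns, skipping columns whose gap fraction exceeds
    `max_gap`; the kept columns stay in alignment order."""
    n_cols = len(msa[0])
    columns = ["".join(seq[j] for seq in msa) for j in range(n_cols)]
    candidates = [j for j in range(n_cols)
                  if columns[j].count(GAP) / len(msa) <= max_gap]
    most_conserved = sorted(candidates, key=lambda j: column_entropy(columns[j]))[:length]
    return "".join(msa[query][j] for j in sorted(most_conserved))


def triplets(seq: str) -> list[str]:
    """Split a sequence into non-overlapping 3-residue tokens, dropping any remainder."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


# Toy three-sequence alignment (made up for illustration only).
msa = ["MKV-LAG", "MKILLAG", "MRV-LGG"]
print(distill(msa, 0, 3))            # the three fully conserved columns
print(triplets(distill(msa, 0, 6)))  # tokens a language model could be pretrained on
```

In this toy alignment the fourth column is mostly gaps and is skipped, so the query's distilled sequence omits the gap position. Real DISAE inputs would use large MSAs and a fixed distilled length; the selection criterion here is only one plausible way to keep evolutionarily informative positions.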


Year:  2021        PMID: 33757283     DOI: 10.1021/acs.jcim.0c01285

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  3 in total

1.  Sequence-based prediction of protein binding regions and drug-target interactions.

Authors:  Ingoo Lee; Hojung Nam
Journal:  J Cheminform       Date:  2022-02-08       Impact factor: 5.514

2.  GeneralizedDTA: combining pre-training and multi-task learning to predict drug-target binding affinity for unknown drug discovery.

Authors:  Shaofu Lin; Chengyu Shi; Jianhui Chen
Journal:  BMC Bioinformatics       Date:  2022-09-07       Impact factor: 3.307

3.  DeepREAL: A Deep Learning Powered Multi-scale Modeling Framework for Predicting Out-of-distribution Ligand-induced GPCR Activity.

Authors:  Tian Cai; Kyra Alyssa Abbu; Yang Liu; Lei Xie
Journal:  Bioinformatics       Date:  2022-03-11       Impact factor: 6.931

