Cross-modality and self-supervised protein embedding for compound-protein affinity and contact prediction.

Abstract

MOTIVATION: Computational methods for compound-protein affinity and contact (CPAC) prediction aim at facilitating rational drug discovery by simultaneous prediction of the strength and the pattern of compound-protein interactions. Although the desired outputs are highly structure-dependent, the lack of protein structures often makes structure-free methods rely on protein sequence inputs alone. The scarcity of compound-protein pairs with affinity and contact labels further limits the accuracy and the generalizability of CPAC models.
RESULTS: To overcome the aforementioned challenges of structure naivety and labeled-data scarcity, we introduce cross-modality and self-supervised learning, respectively, for structure-aware and task-relevant protein embedding. Specifically, protein data are available in two modalities, 1D amino-acid sequences and predicted 2D contact maps, which are separately embedded with recurrent and graph neural networks, respectively, as well as jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pre-trained under various self-supervised learning strategies by leveraging large amounts of unlabeled protein data. Our results indicate that individual protein modalities differ in their strengths in predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins.
AVAILABILITY AND IMPLEMENTATION: Data and source code are available at https://github.com/Shen-Lab/CPAC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
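
ILLUSTRATIVE SKETCH: The two-modality embedding described under RESULTS can be pictured with a minimal, untrained example in plain Python. It is not the authors' implementation (see the repository above for that). The function names, the hidden size, the random weights, and the use of simple concatenation as the joint embedding are all assumptions made here for illustration; the abstract does not specify the two cross-modality schemes. A basic recurrent cell stands in for the sequence encoder and a single mean-aggregation graph layer stands in for the contact-map encoder.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue):
    """One-hot feature vector for a residue (all zeros if unknown)."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_embed(seq, W_in, W_h):
    """1D modality: simple recurrent pass, one hidden state per residue."""
    h = [0.0] * len(W_h)
    states = []
    for residue in seq:
        a = matvec(W_in, one_hot(residue))
        b = matvec(W_h, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        states.append(h)
    return states


def gcn_embed(seq, contacts, W):
    """2D modality: one graph layer over the contact map.

    Each residue averages the features of itself and its contacts,
    then applies a linear map followed by ReLU.
    """
    n = len(seq)
    feats = [one_hot(r) for r in seq]
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if i == j or contacts[i][j]]
        mean = [sum(feats[j][k] for j in nbrs) / len(nbrs)
                for k in range(len(AMINO_ACIDS))]
        out.append([max(0.0, v) for v in matvec(W, mean)])
    return out


def embed_protein(seq, contacts, hidden=4, seed=0):
    """Joint per-residue embedding by concatenation (an assumed scheme)."""
    rng = random.Random(seed)
    n_aa = len(AMINO_ACIDS)
    W_in = random_matrix(hidden, n_aa, rng)
    W_h = random_matrix(hidden, hidden, rng)
    W_g = random_matrix(hidden, n_aa, rng)
    seq_states = rnn_embed(seq, W_in, W_h)
    graph_states = gcn_embed(seq, contacts, W_g)
    return [s + g for s, g in zip(seq_states, graph_states)]
```

Calling `embed_protein("MKTAY", contacts)` with a symmetric 5x5 binary contact map returns five vectors of length `2 * hidden`, the first half from the sequence encoder and the second half from the contact-map encoder. In a CPAC-style model such per-residue embeddings would then be combined with a compound representation to predict affinity and contacts; that step is omitted here.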

Entities: Chemical

Substances: Proteins

Year: 2022 PMID: 36124802 PMCID: PMC9486597 DOI: 10.1093/bioinformatics/btac470

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
