
Cross-modality and self-supervised protein embedding for compound-protein affinity and contact prediction.

Yuning You, Yang Shen.

Abstract

MOTIVATION: Computational methods for compound-protein affinity and contact (CPAC) prediction aim to facilitate rational drug discovery by simultaneously predicting the strength and the pattern of compound-protein interactions. Although the desired outputs are highly structure-dependent, protein structures are often unavailable, so structure-free methods rely on protein sequence inputs alone. The scarcity of compound-protein pairs with affinity and contact labels further limits the accuracy and generalizability of CPAC models.
RESULTS: To overcome these challenges of structure naivety and labeled-data scarcity, we introduce cross-modality learning for structure-aware protein embedding and self-supervised learning for task-relevant protein embedding. Specifically, protein data are available in two modalities: 1D amino-acid sequences, embedded with recurrent neural networks, and predicted 2D contact maps, embedded with graph neural networks. The two modalities are also jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pre-trained under various self-supervised learning strategies that leverage massive amounts of unlabeled protein data. Our results indicate that the individual protein modalities differ in their strengths at predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins.
AVAILABILITY AND IMPLEMENTATION: Data and source code are available at https://github.com/Shen-Lab/CPAC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
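
The two-modality embedding described under RESULTS can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a plain tanh recurrent pass for the 1D sequence, a single graph-convolution layer for the 2D contact map, and simple per-residue concatenation as the joint embedding. The abstract does not specify the two cross-modality schemes, so concatenation is only a stand-in. All function names, weights, the toy sequence, and the toy contact map are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def one_hot(seq):
    """Encode a protein sequence as an (L, 20) one-hot matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, index[aa]] = 1.0
    return x


def rnn_embed(x, w_in, w_rec):
    """Sequence modality: plain tanh recurrence, returns (L, H) per-residue states."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)


def gcn_embed(x, contacts, w):
    """Contact-map modality: one graph-convolution layer, returns (L, H)."""
    a = contacts + np.eye(contacts.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))     # symmetric degree normalization
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)        # ReLU


def joint_embed(seq, contacts, params):
    """Assumed joint scheme: concatenate the two per-residue embeddings."""
    x = one_hot(seq)
    h_seq = rnn_embed(x, params["w_in"], params["w_rec"])
    h_graph = gcn_embed(x, contacts, params["w_gcn"])
    return np.concatenate([h_seq, h_graph], axis=1)


# Toy inputs: random weights, a short made-up sequence, and a made-up contact map.
rng = np.random.default_rng(0)
hidden = 8
params = {
    "w_in": rng.normal(scale=0.1, size=(20, hidden)),
    "w_rec": rng.normal(scale=0.1, size=(hidden, hidden)),
    "w_gcn": rng.normal(scale=0.1, size=(20, hidden)),
}
seq = "MKTAYIAK"
n = len(seq)
contacts = np.zeros((n, n))
for i in range(n - 1):                            # backbone neighbors are in contact
    contacts[i, i + 1] = contacts[i + 1, i] = 1.0
contacts[0, 5] = contacts[5, 0] = 1.0             # one hypothetical long-range contact

emb = joint_embed(seq, contacts, params)
print(emb.shape)  # (8, 16): one row per residue, sequence half then graph half
```

In a full CPAC model, per-residue embeddings of this kind would feed the affinity and contact prediction heads together with a compound embedding. Those parts, and the self-supervised pre-training, are omitted here.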

Year: 2022; PMID: 36124802; PMCID: PMC9486597; DOI: 10.1093/bioinformatics/btac470

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.931


