Samuel Sledzieski, Rohit Singh, Lenore Cowen, Bonnie Berger.
Abstract
We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model, which predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species. We show that a D-SCRIPT model trained on 38,345 human PPIs enables significantly improved functional characterization of fly proteins compared with the state-of-the-art approach. Evaluating the same D-SCRIPT model on protein complexes with known 3D structure, we find that the inter-protein contact map output by D-SCRIPT has significant overlap with the ground truth. We apply D-SCRIPT to screen for PPIs in cow (Bos taurus) at a genome-wide scale and, focusing on rumen physiology, identify functional gene modules related to metabolism and immune response. The predicted interactions can then be leveraged for function prediction at scale, addressing the genome-to-phenome challenge, especially in species where little data are available.
Keywords: cow rumen; deep learning; embedding; function prediction; genome to phenome; interpretability; language models; metabolism; module detection; protein-protein interaction
Year: 2021; PMID: 34536380; PMCID: PMC8586911; DOI: 10.1016/j.cels.2021.08.010
Source DB: PubMed; Journal: Cell Syst; ISSN: 2405-4712; Impact factor: 11.091