Jeffrey N Law (1), Kyle Akers (1), Nure Tasnina (2), Catherine M Della Santina (3), Shay Deutsch (4), Meghana Kshirsagar (5), Judith Klein-Seetharaman (6), Mark Crovella (7), Padmavathy Rajagopalan (8), Simon Kasif (3), T M Murali (2).
1. Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA.
2. Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA.
3. Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.
4. Department of Mathematics, University of California, Los Angeles, CA 90095, USA.
5. AI for Good Lab, Microsoft, Redmond, WA 98052, USA.
6. Department of Chemistry, Colorado School of Mines, 1500 Illinois St, Golden, CO 80401, USA.
7. Department of Computer Science, Boston University, Boston, MA 02215, USA.
8. Department of Chemical Engineering, Virginia Tech, Blacksburg, VA 24061, USA.
Abstract
BACKGROUND: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction.

RESULTS: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents.

CONCLUSIONS: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
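To make the idea of provenance tracing concrete, the following is a minimal sketch, not necessarily the formulation used in this work. It assumes a random walk with restart on a small toy graph, where the score vector is a linear function of the seed vector, so each prediction's score splits exactly into one term per seed. The function names (`propagation_matrix`, `trace_provenance`), the restart weighting `alpha`, and the four-node path graph are all illustrative assumptions.

```python
import numpy as np

def propagation_matrix(adj, alpha=0.5):
    """Closed-form random walk with restart (an assumed, illustrative choice).

    Solves s = alpha * W @ s + (1 - alpha) * p for all p at once, giving
    s = M @ p with M = (1 - alpha) * inv(I - alpha * W).
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    W = adj / col_sums                     # column-normalized adjacency matrix
    n = adj.shape[0]
    return (1 - alpha) * np.linalg.inv(np.eye(n) - alpha * W)

def trace_provenance(adj, seeds, alpha=0.5):
    """Return (scores, contributions), where contributions[i, j] is the part
    of node i's score that originates from node j as a seed."""
    n = len(adj)
    p = np.zeros(n)
    p[list(seeds)] = 1.0 / len(seeds)      # uniform weight on the seed nodes
    M = propagation_matrix(adj, alpha)
    contributions = M * p                  # broadcasting scales column j by p[j]
    scores = contributions.sum(axis=1)     # equals M @ p
    return scores, contributions

# Hypothetical toy network: a path 0 - 1 - 2 - 3, with nodes 0 and 3 as seeds
# (standing in for experimentally determined virus interactors).
toy_adj = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
scores, contributions = trace_provenance(toy_adj, seeds=[0, 3])
for node in (1, 2):
    top_seed = int(np.argmax(contributions[node]))
    print(f"node {node}: score={scores[node]:.3f}, top contributing seed={top_seed}")
```

In this toy setting each non-seed node receives its largest contribution from the seed adjacent to it, which mirrors, in miniature only, the kind of per-prediction breakdown described in the abstract. Other linear propagation schemes admit the same decomposition by swapping in their own propagation matrix.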