Ahmad Borzou, Rovshan G Sadygov. Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX 77555, USA.
Abstract
MOTIVATION: Inferring the direct relationships between biomolecules from omics datasets is essential for understanding biological and disease mechanisms. The Gaussian graphical model (GGM) provides a fairly simple and accurate representation of these interactions. However, estimating the associated interaction matrix from data is challenging because the number of measured molecules is high and the number of samples is low.
RESULTS: In this article, we use the thermodynamic entropy of the non-equilibrium system of molecules, together with the data-driven constraints among their expressions, to derive an analytic formula for the interaction matrix of Gaussian models. Through a data simulation, we show that our method returns an improved estimate of the interaction matrix. Using the developed method, we also estimate the interaction matrix associated with the plasma proteome, construct the corresponding GGM, and show that known NAFLD-related proteins such as ADIPOQ, APOC, APOE, DPP4, CAT, GC, HP, CETP, SERPINA1, COLA1, PIGR, IGHD, SAA1 and FCGBP are among the top 15% most interacting proteins of the dataset.
AVAILABILITY AND IMPLEMENTATION: The supplementary materials can be found at the following URL: http://dynamic-proteome.utmb.edu/PrecisionMatrixEstimater/PrecisionMatrixEstimater.aspx.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
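To make the estimation difficulty concrete, the following is a minimal sketch, not the entropy-based formula derived in the article. It assumes simulated Gaussian data with more molecules than samples, shows that the sample covariance is then rank-deficient (so it cannot be inverted directly), and uses a generic ridge-regularized inverse as a stand-in estimator. The function names `ridge_precision` and `partial_correlations` and the `ridge` value are hypothetical choices made for illustration only.

```python
import numpy as np


def ridge_precision(samples, ridge=0.1):
    """Generic ridge-regularized inverse covariance.

    Assumption: this is a stand-in estimator for illustration,
    NOT the analytic entropy-based formula of the article.
    """
    cov = np.cov(samples, rowvar=False)
    n_vars = cov.shape[0]
    # Adding ridge * I makes the matrix positive definite and invertible.
    return np.linalg.inv(cov + ridge * np.eye(n_vars))


def partial_correlations(precision):
    """Convert a precision (interaction) matrix into partial correlations.

    In a GGM, the partial correlation between variables i and j is
    -P_ij / sqrt(P_ii * P_jj); a zero entry means no direct edge.
    """
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated data: fewer samples than measured molecules (hypothetical sizes).
rng = np.random.default_rng(0)
n_samples, n_molecules = 10, 30
samples = rng.normal(size=(n_samples, n_molecules))

# The sample covariance has rank at most n_samples - 1, so it is singular.
sample_cov = np.cov(samples, rowvar=False)
rank = np.linalg.matrix_rank(sample_cov)

# A regularized estimate is invertible and yields GGM edge weights.
precision = ridge_precision(samples, ridge=0.1)
pcor = partial_correlations(precision)
```

In this setting `rank` is smaller than `n_molecules`, which is why some form of regularization or additional constraint is needed before a GGM can be built from the data.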