Andrea De Martino, Daniele De Martino.
Abstract
A cornerstone of statistical inference, the maximum entropy framework is increasingly applied to construct descriptive and predictive models of biological systems, especially complex biological networks, from large experimental data sets. Both its broad applicability and the success it has achieved in different contexts hinge upon its conceptual simplicity and mathematical soundness. Here we concisely review the basic elements of the maximum entropy principle, starting from the notion of 'entropy', and describe its usefulness for the analysis of biological systems. As examples, we focus specifically on the problem of reconstructing gene interaction networks from expression data and on recent work attempting to expand our system-level understanding of bacterial metabolism. Finally, we highlight some extensions and potential limitations of the maximum entropy approach, and point to more recent developments that are likely to play a key role in the upcoming challenges of extracting structures and information from increasingly rich, high-throughput biological data.
Keywords: Bioinformatics; Computational biology; Mathematical bioscience; Molecular biology; Systems biology
Year: 2018 PMID: 29862358 PMCID: PMC5968179 DOI: 10.1016/j.heliyon.2018.e00596
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. Sketch of the two examples of applications of the maximum entropy principle to biological data analysis discussed in the text. (A) Inference of gene interaction networks from empirical expression data (see Sec. 3 for details). (B) Inference of genome-scale metabolic flux patterns from empirical growth rate distributions in bacteria (see Sec. 4 for details). For each case, we describe schematically the empirical input (left column), the formulation of the maximum entropy inference problem (middle column), and an example of the inferred biological insight (right column).
Figure 2. Maximum entropy modeling of growth rate distributions describing E. coli growth at different temperatures. Data are taken from [76]. (A) Empirical distributions (markers) are shown together with the MaxEnt distributions obtained by fitting β to match the corresponding means (continuous lines), for three different temperatures. For comparison, the dashed lines and the corresponding shaded areas describe the growth rate distributions corresponding to uniform samplings of the solution spaces of the metabolic model, Eq. (12), in the three cases. In each case, such distributions are described by Eq. (14), with a = 0, b = 22 and different values of λmax. (B) and (C): inferred distributions of the ATP synthase flux (B) and of the flux through phosphofructokinase (PFK) (C) at different temperatures.
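As an illustration of what "fitting β to match the corresponding means" can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes that the maximum entropy distribution under a mean constraint takes the exponentially tilted form p(λ) ∝ q(λ) exp(βλ), where q is the distribution obtained under uniform sampling. The grid, the baseline weights and the function names `tilted_mean` and `fit_beta` are hypothetical placeholders; in particular, the baseline used here is not Eq. (14).

```python
import math


def tilted_mean(values, weights, beta):
    """Mean of the discrete distribution p_i proportional to w_i * exp(beta * x_i)."""
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(beta * x for x in values)
    tilted = [w * math.exp(beta * x - shift) for x, w in zip(values, weights)]
    norm = sum(tilted)
    return sum(x * t for x, t in zip(values, tilted)) / norm


def fit_beta(values, weights, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Find beta by bisection so that the tilted mean equals target_mean.

    Bisection works because the derivative of the tilted mean with respect
    to beta is the variance of the tilted distribution, which is non-negative.
    The bracket [lo, hi] is an arbitrary assumption of this sketch.
    """
    if not (tilted_mean(values, weights, lo) < target_mean
            < tilted_mean(values, weights, hi)):
        raise ValueError("target mean is not reachable for beta in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(values, weights, mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical growth-rate grid on [0, 1] and an illustrative baseline
    # that puts most of its weight at low growth rates.
    grid = [i / 100 for i in range(101)]
    baseline = [(1.0 - x) ** 3 for x in grid]

    beta = fit_beta(grid, baseline, target_mean=0.4)
    print("fitted beta:", beta)
    print("resulting mean:", tilted_mean(grid, baseline, beta))
```

In this sketch a positive β shifts probability toward faster growth relative to the baseline, while β = 0 recovers the baseline itself.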