
Metabolic modeling with Big Data and the gut microbiome.

Jaeyun Sung¹, Vanessa Hale², Annette C Merkel³, Pan-Jun Kim⁴, Nicholas Chia⁵.

Abstract

The recent advances in high-throughput omics technologies have enabled researchers to explore the intricacies of the human microbiome. On the clinical front, the gut microbial community has been the focus of many biomarker-discovery studies. While the recent deluge of high-throughput data in microbiome research has been vastly informative and groundbreaking, we have yet to capture the full potential of omics-based approaches. Realizing the promise of multi-omics data will require integration of disparate omics data, as well as a biologically relevant, mechanistic framework - or metabolic model - on which to overlay these data. Also, a new paradigm for metabolic model evaluation is necessary. Herein, we outline the need for multi-omics data integration, as well as the accompanying challenges. Furthermore, we present a framework for characterizing the ecology of the gut microbiome based on metabolic network modeling.


Keywords: Big Data; COMM, community-scale metabolic modeling; Data integration; GEMs, genome scale metabolic models; Gut microbiome; HMP, Human Microbiome Project; Metabolic modeling; Microbial community

Year: 2016 PMID: 27668170 PMCID: PMC5025471 DOI: 10.1016/j.atg.2016.02.001

Source DB: PubMed Journal: Appl Transl Genom ISSN: 2212-0661

Introduction

The promise of the Big Data revolution has yielded an ever-increasing array of data and data types in many fields. In the medical field, the sequencing of the human genome in 2003 opened the door to truly individualized medicine, tailored to our genetic predispositions and risk factors (Collins et al., 2003). The first manifestations of the Big Data promise in medicine were necessarily surveys to identify biological markers of disease risk. While this resulted in databases upon databases of genetic events that explained risk behind hundreds of diseases, we quickly learned that genetics alone was not able to provide a full understanding of many health conditions (Lander, 2011). Researchers began to examine other factors, including the role of environmental influences such as the microbiome (Bultman, 2013, Zackular et al., 2013). In 2008, the Human Microbiome Project (HMP) was established to characterize the role of human-associated microbial communities in human health and disease (Methé et al., 2012, The Human Microbiome Project Consortium, 2012). Efforts led by the HMP consortium thus far have yielded numerous insights regarding the microbial composition of the human body and the ecological structure and function of the human microbiome. However, a shift from this “profiling” paradigm to one of mechanistic examination is now both warranted and feasible through the integration of multi-omics data onto a framework based on biomolecular pathways and networks. The gut microbial community is increasingly well-characterized by various omics technologies – metagenomics, metatranscriptomics, metabolomics, metaproteomics – and offers much promise for data integration within a mechanistic framework (Erickson et al., 2012, Haiser et al., 2013a, Haiser et al., 2013b, Weir et al., 2013). Gut microbes act as chemical transformers, converting host-acquired or host-produced nutrients into a milieu of metabolites (Lee and Hase, 2014).
At the same time, the structure and function of the microbial community respond to changes in host diet or physiology (David et al., 2014, Kashyap et al., 2013, Liou et al., 2013), making microbes both modulators and reflections of the gut environment. The gut microbiome contains over 3 million genes, or approximately 150-fold more than the human genome (Qin et al., 2010); thus, it becomes virtually impossible to obtain more independent samples than there are measured values within one individual's microbiome. The large data sets generated by the most recent omics technologies call for new methods of analysis. No longer can we afford to use a paradigm of statistical power where our insight dwindles with the amount of data we collect. Instead, we should rely on the fact that these variables are not independent of one another and therefore establish a more practical model for assessing the role of the microbiome. A systems approach that utilizes metabolic networks may offer a potential solution. Network reconstruction is one such means of creating a scaffold for synthesizing multiple data types (Feist and Palsson, 2008, Lee et al., 2012, Reed et al., 2006, Töpfer et al., 2015). Metabolic models are composed of a collection of individual chemical reactions that are governed by the fundamental laws of mass conservation and thermodynamics. These models represent large-scale complex cellular dynamics and imply a network whose mechanistic chain of events can be computed to produce an outcome. Models are capable of converting large amounts of data – genetic, metabolic, biochemical – into phenotypes and interactions. The value of metabolic modeling for understanding the complex environment of the gut microbiome is in resolving biochemical relationships within and between microbial species and potentially predicting the effect of ecosystem-wide perturbations, such as antibiotics or pathogen invasion. 
There have been many recent efforts to model metabolic processes within microbial communities (Heinken and Thiele, 2015, Henry et al., 2009). However, the wealth of data available through multiple omics technologies remains underutilized in these models. In this review, we discuss the promises and limitations offered by current mathematical paradigms for integrating disparate, yet complementary omics data, while pointing out the challenges that remain to be resolved. Finally, we offer our viewpoint on the need for an updated network-aware mathematical framework for statistical power — one that synthesizes multiple channels of information into a biological picture.

The Big Data paradox

The mathematical formalization of our knowledge is one of the most important aspects of any scientific study or clinical trial. As a practical tool, math is a means for taking pattern recognition and systematizing it. It is also a way for us to provide some form of communication and standard for comparing the results of different studies, and in the case of statistical significance, is meant to provide a measure of certainty against a null hypothesis. Historically, clinical trials were developed around randomized treatment arms that were designed to answer the question, “Which treatment (A, B, or C) is better?” By selecting a straightforward metric, such as survival outcomes, statisticians could compare the efficacy of different treatments (Marubini and Valsecchi, 2004); however, this precluded our ability to ask what would happen if we combined treatment A and B, or B and C, or all 3 treatments, except by running yet another clinical trial. At the center of these often long and laborious trials was the notion of statistical power (Lachin, 1981). Just how many cases and controls does one need to ensure we can achieve significance? It is a simple question, but an important one that has been the subject of many sophisticated refinements. Here, there is a fundamental clash between Big Data science and classic clinical trial statistics. Paradoxically, the more data we collect on each subject, the more we decrease our likelihood of identifying statistically significant parameters as a result of multiple hypothesis correction. This is a fundamental flaw in the way that current statistical power calculations deal with large datasets. Approaches to obtaining information from Big Data are different. Big Data is characterized by high volume, variety, and velocity of data generation (Costa, 2013). The strength of multi-omics is not merely the observation of many data points, but the discovery of biological mechanism through observation.
Multi-omic Big Data grants us the power to examine disease in a human biological context, rather than extensively relying on murine models, which are limited in relevance to the human gut microbiome (Nguyen et al., 2015). In order to succeed, the Big Data movement in individualized medicine will require a holistic merger between large-scale data and biological mechanism.
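The paradox described above can be made concrete with a rough calculation. The sketch below assumes a Bonferroni correction and a two-sided, two-sample z-test approximation; the effect size of 0.5 and the function name are illustrative assumptions, not values from the text. It shows how the number of subjects needed per group grows as the number of measured features approaches the roughly 3 million genes of the gut microbiome.

```python
import math
from statistics import NormalDist


def per_group_sample_size(effect_size, n_tests, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided, two-sample z-test
    when the significance threshold is Bonferroni-corrected for n_tests."""
    standard_normal = NormalDist()
    alpha_per_test = alpha / n_tests  # Bonferroni-adjusted threshold
    z_alpha = standard_normal.inv_cdf(1 - alpha_per_test / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardized effect size of 0.5 for every feature.
for n_tests in (1, 1_000, 3_000_000):
    print(n_tests, per_group_sample_size(0.5, n_tests))
```

Under these assumptions the required cohort grows with every additional feature tested, even though each subject contributes more information.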

Metabolic models for Big Data synthesis

To identify specific biological markers of disease, many studies utilize statistical correlations, which fall short of identifying underlying mechanisms (de Vos and de Vos, 2012). In the past, using Big Data to elucidate a biological mechanism involved generating a limited set of hypotheses that were then tested in the lab. While this approach has great value, it becomes less tenable as the number of measurements grows. The massive data sets generated from high-throughput omics technologies guarantee us more correlations arising purely from random chance. In the gut microbiome, this is especially problematic. The number of potential correlations increases with the hundreds of species and thousands of genes. Furthermore, the number of identified factors contributing to microbial composition, including diet (David et al., 2014, De Filippo et al., 2010, Turnbaugh et al., 2009), sex (Chen et al., 2016), and even the preservation method of the sample (Sinha et al., 2015), continues to grow, making it more difficult to differentiate the confounding from the causal. A metabolic network provides a global picture of how metabolites and biochemical reactions are interconnected within a particular organism (Thiele and Palsson, 2010). Flux balance analysis on genome-scale metabolic models (GEMs) can be used to simulate microbial growth or to predict the production rate of a particular metabolite (Palsson, 2015). The power of this approach is not only that it recapitulates the mechanistic chemical flow through an entire organism, but also that it has the potential to integrate multiple data types. As the example shown in Fig. 1 indicates, reactions can be linked to genes, which are informed by DNA or RNA sequencing. RNA expression informs the amount of flux a reaction can carry, and metabolomics is a direct measurement of the metabolites. This makes metabolic models an ideal platform for organismal and community-scale data synthesis.
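For readers unfamiliar with flux balance analysis, its usual textbook form is a linear program. The notation below is an assumption introduced here for illustration and does not appear in the original text: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, and c selects the objective, for example a biomass reaction or the export of a metabolite of interest.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \text{for each reaction } j .
\end{aligned}
```

The equality constraint encodes steady-state mass conservation. One plausible way to bring in transcript data, in the spirit of the paragraph above, is to tighten the upper bound on a reaction's flux when its enzyme is weakly expressed.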

Fig. 1

Subset of a microbial metabolic network with integrated genome, metabolomics, and RNA data. This network is one portion of a cysteine/methionine metabolic network for one bacterial species. The model is constructed based on the bacterial genome. Each box represents a reaction. The numbers within the boxes are KEGG Enzyme Commission (EC) numbers, which code for the specific enzymes present in each reaction. Gray boxes represent reactions that occur in this bacterium, as predicted by its genome. Red boxes denote reactions that are not predicted by the genome. Circles represent metabolites consumed and produced within the reaction network. Arrows represent reaction pathways that do (green) or do not (red) occur in this bacterium, as predicted by the model. Black dashed arrows indicate input or output from or to other metabolic networks. Synthesis of omics data is used to inform and improve the model. For example, RNA transcriptomic data reveal what enzymes are being transcribed. In pathways that contain 2 possible enzymes that carry out the same reaction, RNA transcripts help us distinguish which of the enzyme(s) are active. In pathways catalyzed by more than 1 enzyme, yellow boxes indicate reactions/enzymes supported by RNA data. RNA data also quantifies flux which allows us to weight the reaction pathways accordingly: in this model, pathways with the greatest flux have the thickest arrows. Metabolomic data is also used to inform the model. Blue circles represent metabolites present and quantified through metabolomics. Red circles indicate metabolites that were not present or quantifiable. Peach circles represent metabolites that cannot currently be identified using metabolomics.

Increasing evidence suggests that integrating disparate, but complementary, data types can increase the power of one's analysis. Examples of this include the use of whole genome sequencing as a scaffold for RNA data (Wang et al., 2013a, Wang et al., 2013b) and the use of phosphorylation data to understand changes in metabolite concentration (Yugi et al., 2014). Within the microbiome field, 16S rRNA data is combined with metagenomics to identify representative genomes and genome characteristics (PICRUSt: Langille et al., 2013; HUMAnN: Abubucker et al., 2012). Recent microbiome studies have also combined metagenome and metatranscriptome data to enable comparison between functional potential (metagenomic abundance/gene copy number) and functional activity (transcriptome level) (Franzosa et al., 2015). This provides insights regarding host–microbial dynamics (Franzosa et al., 2014), highlights functional changes in the microbial community in response to diet (McNulty et al., 2011), and suggests potential disease mechanisms, as in the case with periodontitis (Wang et al., 2013a, Wang et al., 2013b). Metagenome or metatranscriptome data alone would not yield these insights. Other examples of omic data integration include the combination of microbiome and metabolome data in the study of colon cancer (Weir et al., 2013), proteome and metagenome data related to Crohn's disease (Erickson et al., 2012), and metabolome, metagenome, and metatranscriptome data to examine the relationship between the gut microbiome and the xenobiotic metabolism of digoxin (Haiser et al., 2013a, Haiser et al., 2013b). The use of network-based approaches offers a promising avenue to extend beyond the omics integration strategies discussed above. In particular, overlaying high-throughput data onto a mechanistic framework, notably a metabolic model, can serve as a platform for making data integration more biologically meaningful.
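The comparison between functional potential and functional activity mentioned above can be illustrated with a simple ratio. The sketch below is an assumption-laden toy and not the procedure of the cited studies: gene-family names and counts are hypothetical, counts are converted to relative abundances, and a small pseudocount avoids division by zero.

```python
import math


def relative_abundance(counts):
    """Convert raw counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


def activity_vs_potential(dna_counts, rna_counts, pseudocount=1e-6):
    """log2(RNA fraction / DNA fraction) per gene family.
    Positive values suggest transcription above what gene copy number
    alone would predict; negative values suggest the opposite."""
    dna = relative_abundance(dna_counts)
    rna = relative_abundance(rna_counts)
    return {
        gene: math.log2((rna.get(gene, 0.0) + pseudocount) / (dna[gene] + pseudocount))
        for gene in dna
    }


# Hypothetical metagenomic (DNA) and metatranscriptomic (RNA) read counts.
dna_counts = {"geneA": 50, "geneB": 50, "geneC": 100}
rna_counts = {"geneA": 200, "geneB": 50, "geneC": 150}
ratios = activity_vs_potential(dna_counts, rna_counts)
for gene, value in sorted(ratios.items()):
    print(gene, round(value, 2))
```

A real analysis would also need to account for sequencing depth, gene length, and taxon-level abundance, which this toy ignores.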
To fully understand the global picture of how organisms modulate the biochemical environment within our gut, especially in health or disease conditions, it is essential to obtain a clear evaluation of the complex interactions characterizing microbial ecology. Recently, there has been growing interest in using metagenomic information for characterizing community-wide metabolic interactions (Levy et al., 2015, Zelezniak et al., 2015). Fig. 2 provides an overview of such modeling approaches. In the past, assembly of microbial genomes relied exclusively on culture of individual microbes. While high-throughput culture methods are rapidly developing, the majority of gut microbial species remain uncultured (Lagier et al., 2015). Deep metagenomic sequencing allows assembly of full microbial genomes without the need for culture (Jeraldo et al., 2015). Notably, not all cultured microbes have fully-assembled genomes, and not all fully-assembled genomes are completely annotated. However, culture-free metagenomic assembly provides the most complete picture of gastrointestinal microbial genomes currently available. These assemblies allow us to infer metabolic functions of particular organisms, enabling us to model the metabolic activity of an individual microbe. Genome scale models (GEMs) can then be used as the building blocks for community-scale metabolic modeling (COMM) to determine the microbial interactions between members of a community. COMMs provide both a map of the microbial interactions and the global community dynamics, allowing us to examine ecologically relevant traits such as robustness and stability (Proulx et al., 2005). Metabolic models have already demonstrated great potential for modeling the metabolism of the gut microbes (Bauer et al., 2015). However, current COMM reconstruction is based on pre-existing GEMs and not on data from specific microbial communities. Efforts are needed to incorporate new data onto these models as they become available.
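To illustrate how species-level models might serve as building blocks for a community-level picture, the sketch below derives a small interaction map from what each organism can take up and secrete. The species names and metabolite sets are hypothetical, and this set-overlap approach is only one simple way to propose candidate interactions; it is not the method of any cited study and says nothing about flux magnitudes.

```python
# Hypothetical per-species summaries, e.g. read off the exchange
# reactions of each genome-scale model.
species = {
    "species_A": {"consumes": {"glucose"}, "produces": {"acetate", "lactate"}},
    "species_B": {"consumes": {"lactate"}, "produces": {"butyrate"}},
    "species_C": {"consumes": {"glucose", "acetate"}, "produces": {"butyrate"}},
}


def interaction_edges(species):
    """Return candidate cross-feeding and competition relationships."""
    names = sorted(species)
    cross_feeding = []  # (producer, consumer, metabolite)
    competition = []    # (species_1, species_2, shared nutrient)
    for producer in names:
        for consumer in names:
            if producer == consumer:
                continue
            shared = species[producer]["produces"] & species[consumer]["consumes"]
            for metabolite in sorted(shared):
                cross_feeding.append((producer, consumer, metabolite))
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = species[first]["consumes"] & species[second]["consumes"]
            for metabolite in sorted(shared):
                competition.append((first, second, metabolite))
    return cross_feeding, competition


cross_feeding, competition = interaction_edges(species)
print("cross-feeding:", cross_feeding)
print("competition:", competition)
```

Edges found this way are hypotheses about possible interactions; whether they carry flux in a given gut environment is what community-scale simulation would need to assess.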

Fig. 2

The who, what, and how of microbial community metabolism. 16S rRNA deep microbial community profiling lets us rapidly and cheaply survey which microbes are present and in what abundances. Metagenome sequencing and genome assembly tells us the biological functions each microbial species can potentially perform. Metabolomics and metabolic network reconstructions allow us to understand the biochemical mechanisms of each microbe, and to make quantitative predictions regarding its metabolic activity. Finally, species interaction networks can be used to identify relationships between microbes within the same community.

The challenges of Big Data

While the value of Big Data synthesis is readily evident, implementation is not simple. Issues with data synthesis generally fall into one of three categories: identifying entities to include in the model, integrating the myriad databases, and statistically assessing the final networks (Fondi and Liò, 2015, Imam et al., 2015, Samal and Martin, 2011). Entity identification is a seemingly straightforward process whereby one compares data to what is already known. The problem is that all sequence alignment and pattern-matching algorithms will always produce a result, even if the match is poor. How does one know a metabolite/reaction/organism truly belongs in a GEM or COMM? False inclusions result in errors that will potentially propagate and lead to false results while false exclusions may leave a network reconstruction incomplete and therefore unusable from a computational modeling perspective. These sorts of errors are unavoidable, especially in the gut microbiome, where there are millions of signals from hundreds of potential sources. Database integration requires that the entities from one database can be related to the entities in another through a set of meaningful relationships: gene to protein, protein to metabolite, regulon to gene. The centrality of this to any Big Data synthesis can be seen in the growing number of calls for data standardization in the omics sciences (Alivisatos et al., 2015, Dräger and Palsson, 2014, Dubilier et al., 2015). The lack of centralized storage and management of multi-omics data has led to increasingly large hurdles and analytical bottlenecks as studies, now capable of measuring DNA, RNA, proteomics, and metabolomics, must struggle within individual labs for their own individual solutions.
The recent scientific call for a more worldwide view of data gathering has been gaining traction in the microbiome field; this is also a call for more global data management and data unity for future integration (Dubilier et al., 2015). Lastly, one of the biggest shifts will be in how we identify statistically significant results when reconstructing or simulating a GEM or COMM. In other words, how do we determine if a network is statistically different from expected? Most methods utilize graph randomizations to generate networks for statistical comparison. This process randomly exchanges edges within a network, without regard to biochemical structure; thus, network significance is often grossly overestimated (Samal and Martin, 2011). Network significance must be assessed from a set of plausible or at least possible structures if we are to be able to assess the true significance of the results from a network model. While these three hurdles may be unavoidable, they can be favorably embraced. For example, uncertainty can be incorporated through measures of statistical significance or likelihood (Benedict et al., 2014, Chia and Bundschuh, 2006). Instead of asking which reactions are to be included, the better question may be what is the likelihood of a reaction being present in a GEM? By ranking inclusion based on certainty we allow for alternate inclusions when supporting data, from the rest of the potential pathways, is weak. Similarly, errors in translating between the numerous data sources can be tolerated as long as data types can be correctly merged into the model. Finally, a statistical test of network accuracy or the resulting predictions needs to be measured against a set of realistic “random” networks or predictions. One potential way to do this is to limit “random” networks to real biochemical reactions that functionally produce biomass components necessary for cell growth (Samal and Martin, 2011). 
This eliminates “random” networks that violate mass, charge, or atomic element balance (Samal and Martin, 2011). Assessing the likelihood of the metabolic network predictions versus “random” network predictions could allow us to assess the reliability of our results.
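One of the filters mentioned above, atomic element balance, is easy to sketch. The code below is a minimal illustration and not the randomization algorithm of the cited work: it handles only simple formulas without parentheses or charges, and the function names are assumptions introduced here. A candidate reaction drawn for a “random” network could be rejected when this check fails.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'
    (no parentheses, hydrates, or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_element_balanced(reaction, formulas):
    """reaction maps metabolite -> stoichiometric coefficient
    (negative for substrates, positive for products)."""
    net = Counter()
    for metabolite, coefficient in reaction.items():
        for element, n_atoms in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * n_atoms
    return all(value == 0 for value in net.values())


formulas = {"glucose": "C6H12O6", "lactate": "C3H6O3"}
balanced = {"glucose": -1, "lactate": 2}    # glucose -> 2 lactate
unbalanced = {"glucose": -1, "lactate": 1}  # loses three carbons

print(is_element_balanced(balanced, formulas))
print(is_element_balanced(unbalanced, formulas))
```

Charge balance and the requirement that the network can still produce biomass components would need separate checks.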

Future direction

By synthesizing multiple data types onto metabolic networks, we can better capture and elucidate the emergent, macro-level complexity within a microbial world. In Fig. 1, we described absolute values for the sake of simplicity. The complex reality is that each type of omics technology not only measures different entities, but also propagates some amount of error or ambiguity. To circumvent this, a likelihood-based approach would allow us to incorporate a measure of certainty as weighting in a network reconstruction (Benedict et al., 2014). These could, in principle, come from a variety of data types and be combined so as to produce the maximum-likelihood metabolic model. Such a framework would allow us to prioritize consistency between different data types and the overall network structure. This type of platform is already capable of improved gene annotation (Benedict et al., 2014) and gap-filling of metabolic networks (Benedict et al., 2014). In the future, one could use such a platform for improved metabolic modeling. Data synthesis allows us to maximize information to gain mechanistic insight into microbial community and host–microbe dynamics. A multi-omics modeling approach has the potential for elucidating the intricate relationships between host and microbe. Mechanistic models at this front are critical to understanding how therapeutics like diet modulation or probiotics could impact diseases such as inflammatory bowel disease (Nell et al., 2010) or autism (Kang et al., 2013), or diseases linked with long-term environmental/microbial exposures such as colon or pancreatic cancer (Louis et al., 2014, Zambirinis et al., 2014). Moreover, a holistic and multi-omics microbial metabolic model provides the ideal scaffold for the addition of other host systems, such as the immune system or the nervous system. 
Metabolic modeling of microbial communities is a rapidly emerging research field with a wide range of approaches (Abubucker et al., 2012, Borenstein, 2012, Heinken et al., 2013, Klitgord and Segrè, 2010) that would all benefit from multi-omics synthesis.
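As a rough illustration of the likelihood-based weighting discussed in this section, the sketch below combines evidence from several data types into a single probability that a reaction belongs in a model. It assumes the evidence sources are independent (a naive Bayes simplification), and the prior, likelihood ratios, and reaction names are all hypothetical; it is not the method of Benedict et al.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence on the log-odds scale.
    Each likelihood ratio is P(data | reaction present) / P(data | absent)."""
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical evidence per reaction: [genome annotation, transcript, metabolite].
evidence = {
    "reaction_1": [4.0, 2.0, 0.5],
    "reaction_2": [0.5, 0.5, 1.0],
    "reaction_3": [10.0, 3.0, 2.0],
}

ranked = sorted(
    ((posterior_probability(0.5, ratios), name) for name, ratios in evidence.items()),
    reverse=True,
)
for probability, name in ranked:
    print(name, round(probability, 3))
```

Ranking reactions this way keeps weakly supported candidates available as alternates instead of forcing a yes-or-no inclusion decision.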

56 in total

1. A practical approach to significance assessment in alignment with gaps.

Authors: Nicholas Chia; Ralf Bundschuh
Journal: J Comput Biol Date: 2006-03 Impact factor: 1.479

Review 2. Pancreatic cancer, inflammation, and microbiome.

Authors: Constantinos P Zambirinis; Smruti Pushalkar; Deepak Saxena; George Miller
Journal: Cancer J Date: 2014 May-Jun Impact factor: 3.360

Review 3. The gut microbiota, bacterial metabolites and colorectal cancer.

Authors: Petra Louis; Georgina L Hold; Harry J Flint
Journal: Nat Rev Microbiol Date: 2014-09-08 Impact factor: 60.633

4. Relating the metatranscriptome and metagenome of the human gut.

Authors: Eric A Franzosa; Xochitl C Morgan; Nicola Segata; Levi Waldron; Joshua Reyes; Ashlee M Earl; Georgia Giannoukos; Matthew R Boylan; Dawn Ciulla; Dirk Gevers; Jacques Izard; Wendy S Garrett; Andrew T Chan; Curtis Huttenhower
Journal: Proc Natl Acad Sci U S A Date: 2014-05-19 Impact factor: 11.205

5. Improving metabolic flux predictions using absolute gene expression data.

Authors: Dave Lee; Kieran Smallbone; Warwick B Dunn; Ettore Murabito; Catherine L Winder; Douglas B Kell; Pedro Mendes; Neil Swainston
Journal: BMC Syst Biol Date: 2012-06-19

6. Randomizing genome-scale metabolic networks.

Authors: Areejit Samal; Olivier C Martin
Journal: PLoS One Date: 2011-07-14 Impact factor: 3.240

7. Draft genome sequences of 24 microbial strains assembled from direct sequencing from 4 stool samples.

Authors: Patricio Jeraldo; Álvaro Hernández; Bryan A White; Daniel O'Brien; David Ahlquist; Lisa Boardman; Nicholas Chia
Journal: Genome Announc Date: 2015-05-28

Review 8. Improving collaboration by standardization efforts in systems biology.

Authors: Andreas Dräger; Bernhard Ø Palsson
Journal: Front Bioeng Biotechnol Date: 2014-12-08

9. Phenotypic differentiation of gastrointestinal microbes is reflected in their encoded metabolic repertoires.

Authors: Eugen Bauer; Cedric Christian Laczny; Stefania Magnusdottir; Paul Wilmes; Ines Thiele
Journal: Microbiome Date: 2015-11-30 Impact factor: 14.650

10. Reduced incidence of Prevotella and other fermenters in intestinal microflora of autistic children.

Authors: Dae-Wook Kang; Jin Gyoon Park; Zehra Esra Ilhan; Garrick Wallstrom; Joshua Labaer; James B Adams; Rosa Krajmalnik-Brown
Journal: PLoS One Date: 2013-07-03 Impact factor: 3.240
