
Mathematical models to study the biology of pathogens and the infectious diseases they cause.

Joao B Xavier¹, Jonathan M Monk², Saugat Poudel², Charles J Norsigian², Anand V Sastry², Chen Liao¹, Jose Bento³, Marc A Suchard⁴, Mario L Arrieta-Ortiz⁵, Eliza J R Peterson⁵, Nitin S Baliga⁵, Thomas Stoeger^6,7, Felicia Ruffin⁸, Reese A K Richardson^6,7, Catherine A Gao^7,9, Thomas D Horvath^10,11, Anthony M Haag^10,11, Qinglong Wu^10,11, Tor Savidge^10,11, Michael R Yeaman¹².

Abstract

Mathematical models have many applications in infectious diseases: epidemiologists use them to forecast outbreaks and design containment strategies; systems biologists use them to study complex processes sustaining pathogens, from the metabolic networks empowering microbial cells to ecological networks in the microbiome that protects its host. Here, we (1) review important models relevant to infectious diseases and (2) draw parallels among models ranging widely in scale. We end by discussing a minimal set of information for a model to promote its use by others and to enable predictions that help us better fight pathogens and the diseases they cause.

Entities: Chemical

Keywords: Computer modeling; Infection control in health technology; Microbiology

Year: 2022 PMID: 35359802 PMCID: PMC8961237 DOI: 10.1016/j.isci.2022.104079

Source DB: PubMed Journal: iScience ISSN: 2589-0042

Introduction

How does an outbreak spread in a population of people that interact frequently? How does a pathogen grow in a new host using the nutrients available in the infected tissue to make the energy, the proteins, and all the building blocks it needs to replicate? How does the gut microbiome protect its host against invasion by an enteric pathogen? Why does a drug that kills a pathogen in vitro fail to cure an infected patient? These questions seem widely different, but they share commonalities: Answering them is key to fighting infectious diseases. Infection results from a combination of processes that range widely in scale: from the biological molecules involved in the intracellular metabolism of pathogens to the interactions between individual people who may host and spread a virus across a population (Figure 1). Despite the wide range of scales, in each of these processes the behavior of the system results from the interactions between many elements. Understanding how each element works in isolation is important. But that knowledge alone can be insufficient if the behavior of the full system is an emergent property of the interactions among its elements (Casadevall et al., 2011). Mathematical modeling provides a way to quantify interactions among elements and dissect their relative contribution to infection dynamics.

Figure 1

Mathematical models in infectious diseases study the interaction between elements defined at a range of scales, from the molecules inside a pathogen’s cell all the way to the people infected by the pathogen as the disease spreads across a population.

Mathematical models can offer insights by switching the focus away from the elements (metabolites, bacteria, viruses, people, etc.) to the system of interacting elements. In many cases, the elements become nodes in a network, and their interactions become edges. We can then ask how the network structure, that is, how the nodes connect to each other to create positive and negative feedback, determines how the system functions. The structure of the network can explain functions of the system, such as robustness to changes in individual elements (Bhalla and Iyengar, 1999), which are emergent properties. This is key to identifying vulnerabilities that could be targeted by novel therapies, such as an enzyme essential for the pathogen that could be precisely targeted to stop its replication.

To illustrate how network structure can determine function, consider the bacterial chemotaxis system. This system is essential for the fitness of bacterial pathogens because it helps the pathogen cell move toward nutrients. But the chemotaxis system of a bacterium like Escherichia coli is difficult to understand by investigating in isolation each protein involved in its signal transduction. A key function of the chemotaxis system is its robustness: its ability to adapt. When a bacterium swims toward a nutrient source, the nutrient concentration increases. The signal transduction system must adapt so that it continues sensing relative changes in the nutrient concentration even though the absolute levels have increased. A system lacking robustness would be overwhelmed by higher levels and the pathogen cell would stop swimming toward the nutrient source.
Mathematical modeling played a key role in understanding how robustness emerges from the network of interacting proteins: the scientists used parallel computer simulations to determine how small changes in individual proteins, such as alterations in enzyme kinetics caused by point mutations, altered the system’s ability to adapt to higher nutrient levels. The simulations showed that rather than being a function of any single protein, robustness is an emergent property of the system that results from the network’s connectivity. The ability of E. coli’s chemotaxis system to adapt gradually to higher nutrient concentrations is robust to small changes in any of the network’s elements (Barkai and Leibler, 1997).

Fighting infectious diseases often means finding weak spots in a system that supports pathogen life and causing its collapse. Examples include drugs designed to block key elements in a biochemical network, such as a key protein in the bacterial chemotaxis network, or population lockdowns imposed to reduce person-to-person transmission and mitigate a pandemic (Schlosser et al., 2020). Any perturbation should consider the system’s properties; otherwise, it can have unwanted results. Designing a combination of antibiotics to treat a bacterial infection, for example, must consider unwanted consequences of simultaneously impacting two distinct processes in bacterial physiology. The antibiotic spiramycin inhibits protein synthesis, whereas the antibiotic trimethoprim inhibits DNA synthesis. Combining the two would seem like a good idea: it would impact two key bacterial processes simultaneously. In reality, combining the two drugs has an unwanted consequence: they suppress each other, and the pathogen grows better than when the antibiotics are administered separately (Chait et al., 2007). Understanding the system governing bacterial growth helps make sense of this unexpected consequence. Bacterial growth requires a certain balance.
Inhibiting DNA synthesis with trimethoprim creates an imbalance (protein synthesis outpaces DNA synthesis) and impacts growth. Spiramycin slows protein synthesis, which creates the reverse imbalance. Combining the two corrects the imbalance: the drug combination, by restoring system balance, is antagonistic (Bollenbach et al., 2009). This principle of system balance, exemplified by DNA versus protein synthesis, can be grasped qualitatively. But mathematical modeling can lead to quantitative insights and enable precision interventions that would otherwise be inaccessible. For example, computer models allow in silico systematic analysis of synergistic and antagonistic interactions between drugs (Yeh et al., 2009). The models can be used to study drug combinations in situations where treatment efficacy must be balanced with a high risk of resistance (Torella et al., 2010). Beyond designing drug interventions, mathematical models can help explain counterintuitive observations at different scales, such as why probiotics impair the recovery of the gut microbiota after antibiotic treatment (Suez et al., 2018), or why some public health interventions limit the spread of a pandemic better than others (Brauner et al., 2021).

It is hard to find common features among all the diverse models used for infectious diseases. Even so, several models use mass-action kinetics as their central assumption, despite their widely different scales. We conclude by suggesting practices, acquired from our experience of publishing mathematical models, that facilitate their use by other researchers and further our understanding of infectious diseases.

Models relevant to infectious diseases

The SIR model: modeling disease spread

The SIR model is perhaps the first that comes to mind when thinking about infectious disease. This compartmental model describes how a disease spreads across a population. In its deterministic form, the SIR model is a system of three coupled ordinary differential equations (Kermack and McKendrick, 1927). Each equation describes individuals in a different state: susceptible (S), infected (I), and resistant (R) (Figure 2). On their right-hand side, the equations have one or two terms of two kinds: sources (positive terms) and sinks (negative terms).
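Written out, the classic deterministic system is commonly given in the form below. The symbols are the conventional ones and are an assumption here: β is the infection rate constant, γ is the recovery rate constant, and S, I, and R are the numbers (or fractions) of individuals in each state.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I \\
\frac{dI}{dt} &= \beta S I - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
```

The equation for S has a single sink, the equation for R has a single source, and the equation for I has one of each. The three right-hand sides sum to zero, so the total population S + I + R is conserved.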

Figure 2

The SIR model is a simple mathematical model of epidemics

(A) In its classic form, the SIR model considers the individuals in three states: susceptible, infectious, and recovered.

(B) Individuals transition between the states and those state transitions model the processes of infection and recovery. Their dynamics assume mass-action kinetics.

(C) Despite its simplicity, the model can be used to study real-world scenarios such as outbreaks, and compare the outcome of interventions to slow an epidemic such as social distancing. A variation of the model that includes an additional state (“immunized”) and a new process of immunization that converts individuals to a new state could be used to predict how a vaccination campaign can contribute to stopping the spread.

(D) The classical assumption of mass action assumes that encounters are random in the population.

(E) Expansions of the SIR model include spatial structure and other refinements to increase the realism of the model and the accuracy of the predictions.

The SIR model assumes mass-action kinetics. In this case, the elements are individual hosts, and the kinetics quantifies host transition between states. The terms for the infection rate and the recovery rate are assumed to follow kinetics similar to those used to model chemical reactions, where the reaction rate is proportional to the concentrations of its reagents. This means that susceptible individuals become infected at a rate proportional to the number of infected individuals, and that infected individuals recover at a constant per-capita rate. The mass-action assumption is valid for reactions that occur when reagent molecules collide randomly because the rate of random collisions increases proportionally to the concentrations of the reagents. People do not interact randomly in a population as molecules do in a chemical flask, but the SIR model is still useful despite the simplification.
SIR can compute infection curves during periods when an epidemic spreads through a population undisturbed, as well as compare scenarios when the rate of encounters changes (Schlosser et al., 2020). The assumptions may break down when measures imposed for mitigation or seasonal changes in population activity affect the likelihood of certain encounters. For example, a lockdown during a pandemic could decrease the number of encounters between the population at large, but increase the rate of select encounters, such as between healthcare and other essential workers or between individuals confined in the same house. The model assumptions might need to change to forecast the success of proposed confinement strategies accurately. Some versions of the SIR expand on the classic version (Equation 1) to account for social networks, geography, and other constraints (Keeling and Danon, 2009). Some additions include stochasticity (Tornatore et al., 2005), spatial structure (Keeling, 1999), heterogeneity within the population such as different age groups (Franceschetti and Pugliese, 2008), and vaccination strategies (Shulgin et al., 1998). To forecast demand for healthcare resources, the SIR model may also be expanded to account for symptomatology, disease surveillance, hospitalization, and mortality (Armstrong et al., 2021; Wong et al., 2020). Agent-based simulations, which describe the behaviors of individuals moving geographically and contacting each other using rules that mimic the behaviors of people in the real world, can be used to study more complex scenarios (Carley et al., 2006).
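As an illustration of comparing scenarios in which the rate of encounters changes, the following minimal sketch integrates the classic SIR equations with a forward Euler step. The function names (`simulate_sir`, `peak_infected`) and all parameter values are assumptions chosen for illustration; they are not fitted to any outbreak and do not come from the cited studies.

```python
def simulate_sir(beta, gamma, s0=0.999, i0=0.001, r0=0.0, days=200, dt=0.1):
    """Integrate the classic SIR model with forward Euler.

    S, I and R are population fractions, so S + I + R stays equal to 1.
    beta: infection rate constant (per day), gamma: recovery rate (per day).
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = s0, i0, r0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt  # mass-action infection term
        new_recoveries = gamma * i * dt     # constant per-capita recovery
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


def peak_infected(trajectory):
    """Largest infected fraction reached during the simulation."""
    return max(i for _, _, i, _ in trajectory)


# Illustrative values only: halving beta stands in for reduced encounters.
baseline = simulate_sir(beta=0.30, gamma=0.10)
distanced = simulate_sir(beta=0.15, gamma=0.10)

print(f"peak infected fraction, baseline:  {peak_infected(baseline):.3f}")
print(f"peak infected fraction, distanced: {peak_infected(distanced):.3f}")
```

Forward Euler is used only to keep the sketch dependency-free; a production analysis would normally use an adaptive ODE solver and fit the rate constants to data.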

Modeling intracellular metabolism to study pathogen virulence

The virulence of a pathogen, its ability to thrive in a living host and cause infection, depends on the metabolic abilities of the pathogen (Brown et al., 2008). The host tissues provide the growth medium, but it is the system of metabolic reactions encoded by each pathogen’s genome that determines whether the pathogenic cell can grow on that medium, by making all the building blocks needed for new copies of itself (Fuchs et al., 2012). Genome-scale network reconstructions unify knowledge about a specific organism from a wide array of sources (its genome sequence, assays that determine which metabolites it can grow on, its transcriptome measured in different conditions, and more) to show how the molecules in the network work together to enable pathogen proliferation (Sertbas and Ulgen, 2020).

Genome-scale metabolic networks can be represented as mathematical expressions and then used to compute metabolic fluxes (Figure 3). One way to make these computations is by flux balance analysis (FBA), a simplification that assumes the system is in balanced growth (Orth et al., 2010). FBA simplifies a system of coupled ordinary differential equations, where each differential equation describes the concentration of one metabolite, into a system of algebraic equations. But the balanced-growth system is still underdetermined: it has more unknown fluxes than equations, so it has no unique solution. Additional assumptions can be added as upper or lower bound constraints on reactions, such as rates of nutrient consumption measured experimentally. FBA tends to use an objective function that maximizes the rate of biomass production. This assumption is valid, for example, when the bacteria were selected to grow as fast as possible.
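In matrix form, the FBA problem described above is commonly posed as a linear program. The notation below is the conventional one and is assumed here: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (for example, the biomass reaction), and v^lb and v^ub are the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

The equality constraint is the balanced-growth assumption: the metabolite concentrations x obey dx/dt = S v, and setting the time derivative to zero leaves a linear system in the fluxes. Experimentally measured uptake rates enter through the bounds.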

Figure 3

Genome-scale models provide a way to study how a pathogen can thrive in different environments, for example by using the nutrients available in the various organs of its human host

(A) A model describes the intracellular metabolism of the pathogen as a network of metabolites connected by biochemical reactions.

(B) Each reaction has a stoichiometry and a rate.

(C) Flux balance analysis (FBA) assumes that the pathogen is in balanced growth, which means that the concentration of each intracellular metabolite is in a steady state and its time derivative equals 0. Balanced growth is a simplifying assumption that turns a system of differential equations into a system of algebraic linear equations. The system is then solved using FBA by adding experimental constraints and an objective function, such as the biomass equation.

(D) A high-quality genome-scale network can describe a reference strain with a well annotated genome, and be used to predict its growth phenotypes.

(E) The model of the reference strain may also be customized to model other closely related strains. A set of network models can be used to compare the strains' abilities to use different nutrients.

A high-quality model requires a well-annotated genome. Recent advances in the field take any given bacterial genome sequence and produce a draft genome-scale network reconstruction for that organism automatically (Mendoza et al., 2019). A network produced automatically is often incomplete, even when the genome is well annotated, and even more so when the genome belongs to a poorly studied organism with many genes of unknown function. Some methods make additional assumptions to gap-fill incomplete pathways. Manual curation and validation can take weeks or months of work by trained researchers (Norsigian et al., 2020b). Fortunately, the process tends to speed up exponentially as the number of high-quality genome-scale networks increases.
The BiGG Models knowledge base centralizes high-quality genome-scale metabolic models. The website allows users to browse and search models. This makes adapting a high-quality reference network to a variant that differs in a few metabolic genes easier than starting a model from scratch for a poorly studied organism that is only distantly related to existing models. The approach of adapting a high-quality reference model has been used to quickly generate multi-strain models and predict metabolic differences among a range of new variants. A recent update to the BiGG database includes multi-strain models (Norsigian et al., 2020c).

A key part of creating a high-quality genome-scale network model is its validation and refinement with new experiments. This strategy was used recently for Clostridioides difficile, an intestinal pathogen resilient to many common antibiotics, which causes severe intestinal infection in humans and can lead to medical emergencies such as pseudomembranous colitis, toxic megacolon, and perforation (Czepiel et al., 2019; Farooq et al., 2015). C. difficile metabolism is crucial to understanding its outbreaks. Two epidemic strains of this pathogen (RT027 and RT078) acquired distinct ways to grow on low concentrations of trehalose. The two epidemic lineages emerged after trehalose entered the human diet, which could have selected for the ability to grow on trehalose and may have helped their emergence (Collins et al., 2018). A genome-scale network model of C. difficile built on two previous versions was further refined and curated with experimental data (Norsigian et al., 2020a). The new data included BIOLOG phenotype microarrays, which are multi-well plates pre-loaded with many nutrients such as different sugars. BIOLOG data are commonly reported as a binary table: the bacteria either grow or do not grow on a certain nutrient.
The genome-scale model can predict whether the bacteria grow on that nutrient, and this binary prediction is compared to the BIOLOG data. When a prediction disagrees with the data, the model can be refined. For example, a false negative (the model predicts no growth but the data show the pathogen can indeed grow) could mean that a gene encoding a transporter of that nutrient is mis-annotated. A literature search and an investigation of transporters in related organisms may reveal the transporter that had been annotated incorrectly. Non-targeted proteomics can be combined with BIOLOG phenotype microarray experiments to detail the metabolic pathways necessary for a gut bacterium such as Bifidobacterium dentium to metabolize sugars typical of the human diet (Engevik et al., 2021a).
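The comparison between binary model predictions and a binary growth table can be sketched as follows. This is a minimal illustration, not the curation pipeline of the cited work: the function name `compare_growth_calls` is hypothetical, and the nutrient calls are invented values rather than real BIOLOG or model results.

```python
def compare_growth_calls(predicted, observed):
    """Sort nutrients into agreement classes between model and plate assay.

    Both arguments map nutrient name -> bool (True means growth).
    Only nutrients present in both tables are compared.
    """
    classes = {
        "true_positive": [],
        "true_negative": [],
        "false_positive": [],  # model predicts growth, plate shows none
        "false_negative": [],  # model predicts no growth, plate shows growth
    }
    for nutrient in sorted(set(predicted) & set(observed)):
        p, o = predicted[nutrient], observed[nutrient]
        if p and o:
            key = "true_positive"
        elif not p and not o:
            key = "true_negative"
        elif p:
            key = "false_positive"
        else:
            key = "false_negative"
        classes[key].append(nutrient)
    return classes


# Invented example values, not real BIOLOG or model results.
model_calls = {"glucose": True, "trehalose": False, "mannitol": True, "sorbitol": False}
plate_calls = {"glucose": True, "trehalose": True, "mannitol": False, "sorbitol": False}

result = compare_growth_calls(model_calls, plate_calls)
for nutrient in result["false_negative"]:
    # A false negative is a cue to re-check transporter and pathway annotations.
    print(f"{nutrient}: model predicts no growth, plate shows growth")
```

Each disagreement class suggests a different kind of refinement: false negatives point to missing or mis-annotated functions, while false positives point to reactions the model includes but the organism may not use under the assay conditions.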

Modeling gene regulation networks to understand pathogen adaptive strategies

Each microbe has only a fixed number of genes and a limited number of regulators. But the connections between those genes and their regulators form gene regulatory networks. Gene regulation networks enable microbes to turn genes on and off in practically infinite ways. Natural selection acts on network variations introduced by horizontally acquired genes, gene duplications, gene deletions, and gene mutations, and favors regulatory circuitry that controls gene expression to optimize fitness (Brooks et al., 2011; Yan et al., 2017). Gene regulation networks enable pathogens to respond to different stimuli with dynamic behaviors, e.g., sporulation (de Hoon et al., 2010) and dormancy (Peterson et al., 2020), while still using a relatively small number of components.

Mathematical modeling requires transcriptional data acquired across multiple conditions and perturbations, such as different growth media or isogenic mutants. The models describe the influence of transcription factors (TFs) on individual target genes or clusters of co-regulated genes and use machine learning methods such as linear regression (Bonneau et al., 2006) and decision tree ensembles (Huynh-Thu et al., 2010), among others. In the case of gene clusters, the network reconstruction is achieved in essentially two steps. The first step clusters the genes into sets of co-regulated genes; the clustering algorithm is often semi-supervised and can integrate prior knowledge from the literature (Reiss et al., 2006). The second step models the expression of the co-regulated gene sets as a combination of TFs and environmental factors that influence the cluster (Bonneau et al., 2006). Interactions between TFs and gene clusters are inferred based on the mRNA profiles of gene clusters and TFs.
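The second step can be illustrated with a deliberately small example: ordinary least squares that explains the mean expression of one gene cluster as a weighted sum of two TF profiles. This is a simplified stand-in for the regression idea, not the method of the cited tools, which add regularization and model selection. The function name `fit_two_tf_model` and the expression values are hypothetical, and the target profile is synthetic.

```python
def fit_two_tf_model(tf_a, tf_b, target):
    """Least-squares fit of target ~ w_a * tf_a + w_b * tf_b (no intercept).

    Solves the 2x2 normal equations directly. Inputs are equal-length lists
    of (centered) expression values, one entry per condition.
    """
    saa = sum(a * a for a in tf_a)
    sbb = sum(b * b for b in tf_b)
    sab = sum(a * b for a, b in zip(tf_a, tf_b))
    sat = sum(a * t for a, t in zip(tf_a, target))
    sbt = sum(b * t for b, t in zip(tf_b, target))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("TF profiles are collinear; weights are not identifiable")
    w_a = (sat * sbb - sbt * sab) / det
    w_b = (sbt * saa - sat * sab) / det
    return w_a, w_b


# Hypothetical centered expression of two TFs across six conditions.
tf_a = [1.0, 0.5, -0.5, -1.0, 0.0, 1.5]
tf_b = [0.2, -1.0, 0.8, 0.3, -0.6, -0.4]
# Synthetic cluster profile: activated by TF A, repressed by TF B.
cluster_mean = [2.0 * a - 1.0 * b for a, b in zip(tf_a, tf_b)]

w_a, w_b = fit_two_tf_model(tf_a, tf_b, cluster_mean)
print(f"TF A weight: {w_a:+.2f}, TF B weight: {w_b:+.2f}")
```

In this reading, a positive weight suggests activation and a negative weight suggests repression. With real data the fit would include noise, many candidate TFs, and a penalty that keeps most weights at zero.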
Recent work has shown the advantages of estimating the regulatory activity of TFs and post-transcriptional regulators by leveraging the expression profile of known and putative targets when reconstructing gene regulation networks (Arrieta-Ortiz et al., 2015; Arrieta-Ortiz et al., 2020, 2021; Fu et al., 2011). A recent improvement organizes gene regulatory elements (the DNA sequences within each gene promoter that TFs specifically bind to) of every gene using their distributions across the entire genome to explain what TFs bind the sequences, in what contexts they are bound, and the consequence of TF binding on activating or repressing transcription of downstream genes (Brooks et al., 2014). This approach reveals environmental contexts in which genes from different regulons can be co-regulated and special situations where genes of the same regulon, or even the same operon, can be conditionally and differentially expressed. These approaches have been recently applied to epidemiologically important pathogens such as Mycobacterium tuberculosis (Peterson et al., 2021) and Clostridioides difficile (Arrieta-Ortiz et al., 2021).

An alternative approach to infer gene regulatory networks is by decomposition methods such as independent component analysis (ICA) (Sastry et al., 2019). ICA extracts regulatory signals underlying transcriptomic data and yields both the composition of the transcriptional regulatory network (TRN) (termed “iModulons”) and the activity of the iModulons in the samples used to build the model. Each iModulon represents a set of genes whose expression levels vary with each other but are independent of all other genes in the genome. Their activities represent the collective expression level of the iModulon genes in a given sample. Therefore, both the static structure and the condition-specific dynamics of the transcriptional regulation network can be inferred simultaneously. A number of iModulons for E. coli, Bacillus subtilis, and Staphylococcus aureus have already been inferred, and methods now exist to expand these to other bacteria by leveraging the rapidly growing transcriptomic profiles in public repositories (Poudel et al., 2020; Rychel et al., 2020, 2021; Sastry et al., 2021).

Models of gene regulation networks have been expanded to include the networks of small non-coding RNAs (sRNAs) and discover new sRNA targets and interactions between sRNAs and TFs that were then experimentally validated (Modi et al., 2011). Other models integrate gene regulation networks with metabolic networks to show how information flows between the different types of networks (Chandrasekaran and Price, 2010; Covert et al., 2001; Immanuel et al., 2021). Gene regulatory network models integrated with metabolism are immensely useful in interpreting the causal, mechanistic, and physiological drivers of pathogen response to host-relevant stresses and drug treatment. In so doing, the models are useful to uncover mechanisms of drug action at a systems level and also discover novel vulnerabilities that can be exploited as new drug targets, or means to identify synergistic drug combinations (Immanuel et al., 2021; Peterson et al., 2016, 2020).

Gene regulation networks, and even integrated regulatory-metabolic models, can be expanded to model the coordinated behavior of more than one organism, for example, host-pathogen interactions to predict the outcomes of infection. Those models benefit from experimental resources such as transcriptional regulator deletion mutants from the fungal pathogen Candida glabrata, which revealed genetic and epigenetic pathways involved in stress resistance and virulence (Filler et al., 2021). RNA sequencing can also be used to study the transcriptome of the infected hosts and the pathogens simultaneously (Peterson et al., 2019; Westermann et al., 2017).
This type of experimental work can provide new data for network models that combine host and pathogen gene regulation networks to understand interactions between host and pathogen (Schulze et al., 2016). Similarly, integrated regulatory and metabolic models are also useful to investigate complex host and microbiome influences on the success of a pathogen in infecting and colonizing its host. The models can be used to formulate nutritional and probiotic interventions to prevent and treat infections by pathogens such as C. difficile (Arrieta-Ortiz et al., 2021; Girinathan et al., 2020). Gene regulation models can also explain phenotypic heterogeneity, a strategy that helps pathogens escape host attack and drug treatments. The models can compute growth characteristics, such as the growth rate and the carrying capacity, as well as drug susceptibility of different phenotypic states that emerge within the same monoclonal population. Integrating models of regulatory and metabolic processes with quantitative systems pharmacology models can predict the clearance rate of a heterogeneous pathogen population in complex contexts such as the granulomas formed in tuberculosis (Azer et al., 2021).

Bringing models to a clinical setting

Predicting the susceptibility of a pathogen to a drug is essential to guide clinical intervention. This is traditionally done using in vitro assays in artificial media. But the sensitivity of these assays depends on the realism of the artificial conditions used: susceptibility in situ is often quite different, and a drug that works in vitro can easily fail to cure a patient. Mathematical models can help fill gaps between laboratory experiments and clinical intervention. Antibiotic-persistent bacteremia caused by methicillin-resistant Staphylococcus aureus (MRSA) exemplifies this problem well. Often, the antibiotic minimum inhibitory or bactericidal concentrations determined in the laboratory indicate that an S. aureus infection is susceptible in vitro, but then the infection persists despite treatment in vivo (Lewis, 2020; Li et al., 2019). There are several reasons for the disagreement, including the emergence of new antimicrobial resistance, complex phenomena such as heteroresistance, tolerance, and persistence (Balaban et al., 2019). Despite distinct mechanisms, they have the same impact: each lowers the efficacy of the antimicrobial therapy and its ability to treat the infection. Physicochemical parameters such as pH, salt, temperature, oxygen, carbon dioxide, and other co-factors such as ionic strength of the biofluid or tissue compartment can also impact the susceptibility of microbes to antibiotic therapy (Goode et al., 2021). Most in vitro assays lack host immune effectors such as kinocidins, defensins, or others that influence outcomes in situ (Sakoulas et al., 2014; Yeaman et al., 2002; Yount et al., 2011). Cells of the host immune system, such as neutrophils, macrophages, and other professional defense cells, interact with and can be influenced by interactions with antimicrobial agents (Algorri and Wong-Beringer, 2020). Not all immune system interactions with antimicrobial agents are additive or synergistic (Yang et al., 2017). 
Some antibiotics can have off-target effects on host immune cells and impair their trafficking, opsonophagocytic, or intracellular killing functions. Also, how the immune system defends against infection varies among anatomical locations in the body. For example, the mechanisms of host defense are very different on relatively inert epithelial or cutaneous surfaces such as skin compared with the central nervous system (Lee et al., 2021). The human body represents a diverse array of anatomic, physiologic, and microbiologic ecosystems that influence the impact of the antimicrobial agent on the target pathogen. These different habitats may influence the pharmacodynamics and pharmacokinetics of antimicrobial agents, including access and activity. Over and above these factors, infections are dynamic systems. The contexts in which microbes and antimicrobial agents interact change over the course of the infection as the pathogen and the host act and counteract in relation to one another and complicate the use of models to predict the outcome of a drug intervention. Predicting the outcome of antimicrobial therapy in vivo resembles solving a multidimensional equation involving the target pathogen, antimicrobial agent, immune effectors, the context of the host (including pharmacology), and time. Mathematical modeling to study the outcome of anti-infective therapy in vivo has proven challenging, but recent work has led to important progress in the field. A recent computational tool integrates host immunity and antibiotic dynamics to investigate treatment outcomes in tuberculosis (Pienaar et al., 2015). These findings suggested that unforeseen antibiotic gradients and hypoxic microenvironments exist within granulomas that enable mycobacterial subpopulations to evade susceptibility. Pharmacodynamic modeling has also emerged as a tool to understand static versus dynamic aspects of infection. 
A model of bacterial phenotypes compared susceptible vegetative cells, resting cells, constitutively resistant cells, and adaptively resistant cells (Jacobs et al., 2016). Importantly, the model indicated that the significant differences between static and dynamic contexts can explain uncertainties in therapy outcomes. A “P-system” modeling paradigm was used to study the real-time evolution of antimicrobial resistance (Campos et al., 2019). This work pointed to the complex and embedded parameters that intersect to determine net susceptibility or non-susceptibility of a pathogen to antimicrobial therapy across orders of magnitude—from resistance plasmid to microbial community. Another recent mathematical model explored the dynamics of MRSA persistence in the face of host immunity and typical antibiotic regimens (Mikkaichi et al., 2019). Using an ensemble approach, computational models integrating selected parameter sets were generated under conditions simulating vancomycin therapy as used in clinical practice. Models distinguishing persistent versus resolving MRSA bacteremia outcomes were identified. Next, machine learning was implemented to identify specific parameters most contributing to either persistence or resolution. Parameters that correlated with persistent outcomes included bacterial growth rate, e.g. low-energy, slow growth rate associated with small colony variant phenotypes (Bates et al., 2003; Manuse et al., 2021), and immune clearance, e.g. reduced molecular and cellular immune effector efficacy. Lastly, specific pharmacological strategies were modeled to investigate interventions that may avoid persistence outcomes in MRSA bacteremia. Results predicted that microbicidal agents effective against persistent cells, but not agents that prevent the emergence of persistent phenotypes, were more likely to have efficacy in persistent MRSA bacteremia. These examples illustrate the recent efforts to model antimicrobial therapy in vivo. 
These approaches are becoming more sophisticated, integrating multiple variables and dynamic processes relevant to the in vivo setting that were traditionally missing in conventional antimicrobial susceptibility assays in vitro. Challenges remain, but models of complex interactions that accurately describe and predict efficacy versus failure of antimicrobial regimens in clinical infection would result in a quantum leap toward improving therapeutic success, reducing antimicrobial resistance, and identifying novel anti-infective agents and strategies (Talebi Bezmin Abadi et al., 2019).

Modeling ecology and evolution to understand virulence

Pathogens rarely exist in isolation. In their environmental reservoirs, they encounter many types of resident microbes and when they colonize the host they encounter the host’s microbiome. Pathogens are constantly interacting with other microbes within an ecosystem: this ecology is another crucial aspect of infectious disease. The human microbiome is a collection of microbes that naturally colonize the human body. Some of these microorganisms make up the first line of our defense against pathogen invasion (Becattini et al., 2017; Buffie et al., 2012). In the gut microbiome, the densest of our microbiomes, bacteria naturally keep pathogens out by competing for nutrients (Oliveira et al., 2020), producing molecules that inhibit or kill pathogen cells (Kim et al., 2019), or by training and stimulating the human immune system (Schluter et al., 2020). High-throughput sequencing can help study which microbes are present and how abundant they are (Jovel et al., 2016). Longitudinal microbiome sequencing of samples taken from the same person over time reveals how the community responds to perturbations such as changes in diet (David et al., 2014), antibiotic therapy (Dethlefsen and Relman, 2011), and immune therapies (Schluter et al., 2020). The community response allows us to parameterize mathematical models of the ecosystem network (Bucci et al., 2016; Buffie et al., 2015; Stein et al., 2013). Understanding the ecology governing the gut microbiome, and how it contributes to our defense against pathogens, can benefit from mathematical modeling. The gut microbiome can be described by systems of coupled differential equations that make mass-action assumptions. The Lotka-Volterra model was originally derived to describe predator-prey dynamics in animal ecosystems (Figure 4). This simple set of equations can be expanded to describe any number of microbial species interacting in a microbiome. 
Each differential equation in this system describes the dynamics of a microbe as a sum of terms. The model can be written in matrix form as $\frac{dX}{dt} = X \odot (\mu_{max} + M X + \varepsilon)$, where $\odot$ denotes element-wise multiplication, X is a vector of absolute abundances of all species at a certain time, μmax is a vector containing the maximum specific growth rates of each species, M is a matrix of interactions that represents the network, and ε is a vector containing the impact of a perturbation (for example, an antibiotic) on each species. The matrix can be parameterized from timeseries data using constraints derived from machine learning (Bucci et al., 2016; Stein et al., 2013).
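As an illustration of how such a system behaves, the following is a minimal sketch that assumes the generalized Lotka-Volterra form dx_i/dt = x_i (μ_i + Σ_j M_ij x_j + ε_i). The two-species community, all parameter values, and the function names `glv_rhs` and `simulate` are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
def glv_rhs(x, mu_max, M, eps):
    """Per-species rate of change: x_i * (mu_i + sum_j M_ij * x_j + eps_i)."""
    n = len(x)
    return [
        x[i] * (mu_max[i] + sum(M[i][j] * x[j] for j in range(n)) + eps[i])
        for i in range(n)
    ]


def simulate(x0, mu_max, M, eps, dt=0.01, steps=5000):
    """Integrate the system with the classical fourth-order Runge-Kutta scheme."""
    x = list(x0)
    for _ in range(steps):
        k1 = glv_rhs(x, mu_max, M, eps)
        k2 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], mu_max, M, eps)
        k3 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], mu_max, M, eps)
        k4 = glv_rhs([xi + dt * k for xi, k in zip(x, k3)], mu_max, M, eps)
        x = [
            xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
    return x


# Hypothetical two-species community: index 0 = commensal, index 1 = pathogen.
MU_MAX = [1.0, 0.8]
M = [[-1.0, 0.0],    # the commensal limits itself
     [-1.5, -1.0]]   # the commensal inhibits the pathogen; the pathogen limits itself
X0 = [0.5, 0.01]

untreated = simulate(X0, MU_MAX, M, eps=[0.0, 0.0])
treated = simulate(X0, MU_MAX, M, eps=[-1.2, 0.0])  # perturbation hits the commensal only
```

With these illustrative values, the commensal excludes the pathogen in the untreated run, whereas a perturbation that removes the commensal lets the pathogen establish near its own carrying capacity. Inferring M and ε from real timeseries data is a separate regression problem that this sketch does not address.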

Figure 4

Lotka-Volterra equations model ecosystem dynamics in a multispecies system like the gut microbiome

(A) The dynamics of predator and prey species in animal ecosystems inspired the Lotka-Volterra model, a system of two coupled ordinary differential equations that assumes that the law of mass action applies. The right-hand side of each differential equation has terms that represent the gains (sources) or losses (sinks) of each species.

(B) The Lotka-Volterra system can be generalized to any number of species (n).

(C) Network models inferred from timeseries data can be used to study the relationship between network structure and function in the gut microbiome. One application is in finding gut bacteria such as Clostridium scindens that inhibit the growth of pathogens like Clostridioides difficile.

This approach to model networks in microbiome ecology has applications in infectious diseases. A network model parameterized from mouse experiments identified a gut bacterium, Clostridium scindens, that is a key player in resisting colonization by the pathogen C. difficile (Buffie et al., 2015). The same approach can also include the fungi in the gut microbiome to describe multi-kingdom ecological interactions (Rao et al., 2021), and to include the host white blood cells to quantify how gut bacteria impact systemic immunity (Schluter et al., 2020). Identifying the molecular mechanisms of those ecological interactions requires approaches that go beyond population profiling. There are considerable efforts underway to develop metabolomic approaches that are capable of measuring microbially derived metabolites (i.e. bile acids/salts, amino acids and biogenic amines, short-chain fatty acids, and neurotransmitters) in a number of in vitro and in vivo models. 
Metabolomics measurements can be made on growth media collected from in vitro or ex vivo culture systems used to grow microbes or intestinal organoids (Engevik et al., 2021a, 2021b, 2021c; Horvath et al., 2020; Ihekweazu et al., 2021; Luck et al., 2021). There is an increasing number of protocols for metabolomics using extracts of cecal content and fecal samples, and homogenized gastrointestinal tract and brain tissues harvested from gnotobiotic mice (Engevik et al., 2021a, 2021b, 2021c; Horvath et al., 2020; Ihekweazu et al., 2021; Luck et al., 2021). Besides ecology, evolution is also key to understanding processes relevant to infectious diseases, such as the emergence of antibiotic resistance. High-throughput sequencing has inspired new models to describe the evolution of antibiotic response over the unobserved network that characterizes the relationships of pathogens to each other over time (Cybis et al., 2015; Mather et al., 2013). Modern phylogenetics and phylodynamics provide a framework to reconstruct these evolutionary relationships (Suchard et al., 2018). These approaches start with molecular sequence alignments from the pathogens of interest, assume continuous-time Markov chain models that describe how the sequences and antibiotic response change over the unobserved history, and solve the inverse problem of inferring this history and the changes in antibiotic response from the data. Recent examples in infectious diseases incorporate further pathogen phenotypes and epidemiological models of the host population (Dellicour et al., 2018).
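The continuous-time Markov chain assumption can be made concrete with a minimal sketch for a single binary trait (0 = susceptible, 1 = resistant) evolving along the branches of a tree. The closed-form transition probabilities for a two-state chain are standard; the gain and loss rates, the two-tip tree, and the function names are hypothetical. Real phylodynamic software infers the tree, the rates, and the ancestral states jointly from sequence data, which this sketch does not attempt.

```python
import math


def two_state_transition_probs(gain_rate, loss_rate, t):
    """Transition matrix P(t) of a two-state CTMC (0 = susceptible, 1 = resistant).

    gain_rate is the rate 0 -> 1 and loss_rate is the rate 1 -> 0.
    """
    total = gain_rate + loss_rate
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    decay = math.exp(-total * t)
    pi_resistant = gain_rate / total    # stationary probability of state 1
    pi_susceptible = loss_rate / total  # stationary probability of state 0
    return [
        [pi_susceptible + pi_resistant * decay, pi_resistant * (1.0 - decay)],
        [pi_susceptible * (1.0 - decay), pi_resistant + pi_susceptible * decay],
    ]


def cherry_likelihood(tip_a, tip_b, branch_a, branch_b, gain_rate, loss_rate):
    """Likelihood of two observed tip states that share an unobserved ancestor.

    The ancestral state is summed out, weighted by the stationary distribution.
    """
    total = gain_rate + loss_rate
    prior = [loss_rate / total, gain_rate / total]
    p_a = two_state_transition_probs(gain_rate, loss_rate, branch_a)
    p_b = two_state_transition_probs(gain_rate, loss_rate, branch_b)
    return sum(prior[s] * p_a[s][tip_a] * p_b[s][tip_b] for s in (0, 1))


# Hypothetical rates and branch lengths, for illustration only.
both_resistant = cherry_likelihood(1, 1, 0.5, 0.8, gain_rate=0.3, loss_rate=0.1)
```

Summing over the unobserved ancestral state is the smallest instance of the inverse problem described above: the same calculation, applied recursively over a full tree, yields the likelihood that inference methods maximize or sample from.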

Making comparisons and drawing parallels between models

Network representation to facilitate system analysis

The dynamics of epidemics, the biochemical reactions occurring inside pathogen cells, and the species-specific interactions in the microbiome are all examples of processes in infectious diseases that can be mathematically modeled as networks. In many problems, several networks fit the data comparably well. The need to compare network models arises in different areas of science and technology: social sciences, economics, biology, computer science, telecommunications, transportation, and others. Some methods for comparing networks have been reviewed elsewhere (Tantardini et al., 2019) and an in-depth review is outside our scope. Instead, here, we discuss different methods to represent gene expression networks with examples of when one network representation or another works better in a specific situation. Most network models of gene regulation discretize the interaction between regulated genes and regulating factors (Karlebach and Shamir, 2012). For small networks, discretized networks actually outperform more sophisticated models (Garcia-Alonso et al., 2019). The size of a network, for example, is a simple feature that distinguishes networks that can only regulate specific tasks, which tend to be small, from networks that can regulate cellular physiology more globally, which tend to be larger (Rocks et al., 2019). A discretized network model can infer whether a cellular system would be able to recover from an arbitrary perturbed state back to its initial state (Liu et al., 2011). Discretized networks could in principle be used to study whether regulators or regulated sites are under stronger evolutionary pressure, which would correlate with their functional relevance, similar to the way the covariation of amino acid residues across protein homologs informs of links that can reveal new protein structures (Ovchinnikov et al., 2017). 
Discretized networks can identify which elements of the network would be most vulnerable to perturbations (Greenwood et al., 2020). Gene regulatory networks that include the directionality and strength of gene-gene interaction can be used to study network motifs: patterns of wiring between multiple genes or regulators that are observed repeatedly across a network and whose structure encodes distinct functionality (Milo et al., 2002). Motifs have been used to compare transcriptional networks within individual organisms. Aside from network motifs, directionality also allows one to predict molecular processes that could become co-compartmentalized. A directed network that extends beyond gene regulators can be used to predict which microbial systems will contain distinct microbes that all specialize in different metabolic pathways (Perez-Garcia et al., 2016). The dynamic behavior of gene regulatory networks may also be compared. Without any information on the rate constants of the genes in the network, for example, we cannot predict whether negative feedback will lead to oscillations (Elowitz and Leibler, 2000). Therefore, networks can be compared through their temporal dynamics or the relative dynamics of different edges within each network. However, to accurately describe the function of small gene regulatory networks, we need more parameters. For instance, varying temperature, and therefore the kinetic constants of all proteins encoded by the genes, can render an essential gene redundant even if neither the gene nor the network has changed (Ben-Aroya et al., 2008; Cassidy et al., 2019). Many transcript molecules in mammals and microbes are present at low abundance. At low abundance, small absolute fluctuations in the copy number of molecules are large in relative terms and lead to noise (Paulsson, 2004). 
Depending on the internal conformations of individual promoters, the conformations of regulatory factors, and the transition probabilities between all of those states, cells with the same kinetic rates will show cell-to-cell variability (Elowitz et al., 2002). Notably, interferons and other host defense genes contain promoter states that seem to ensure high levels of variability between single cells (Shalek et al., 2013), which may allow hosts to reduce the adaptation of pathogens (Avraham et al., 2015).
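To make the idea of a discretized network concrete, the following is a minimal sketch of a three-gene Boolean network with synchronous updates. The wiring (C represses A, A activates B, and C is on only when B is on and A is off) is hypothetical and is not taken from any of the cited studies. The sketch asks whether the network returns to its steady state after each single-gene perturbation, which is one simple way to flag vulnerable elements.

```python
def step(state):
    """One synchronous update of the hypothetical network (states are 0 or 1)."""
    a, b, c = state
    return (1 - c, a, b & (1 - a))


def attractor(state):
    """Iterate until a state repeats; return the set of states in the final cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])


def recovers_after_flip(fixed_point, gene_index):
    """Flip one gene in the steady state and test whether the network returns to it."""
    perturbed = list(fixed_point)
    perturbed[gene_index] ^= 1
    return attractor(tuple(perturbed)) == frozenset({fixed_point})


STEADY_STATE = (1, 1, 0)  # A on, B on, C off is a fixed point of step()
recovery = {gene: recovers_after_flip(STEADY_STATE, i) for i, gene in enumerate("ABC")}
```

In this toy network, a transient change in B is absorbed, whereas flipping A or C pushes the system into a two-state oscillation between (0, 1, 0) and (1, 0, 1). No rate constants are needed for this analysis, which is the appeal of discretized models, and also why they cannot predict quantitative dynamics such as oscillation periods.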

Minimum information to publish a mathematical model

We discussed diverse mathematical models applied to infectious diseases. These models illustrate the wide range of scales that mathematics can model: from molecular processes happening inside the cells of bacterial pathogens to the social interactions between people that can fuel an outbreak across a population. They also demonstrate the challenge of drawing parallels across those models. To conclude this overview, we turn to a common challenge that many of us have encountered as practitioners of mathematical modeling: how to share mathematical models with others in the scientific community. Models of the same class can define strict sets of rules. For example, whole-genome networks can adhere to the format of the BiGG framework. Adopting a common file format in the BiGG database ensures that every model can run in the same computational framework, such as the COBRA toolbox for MATLAB or the CobraPy implementation in Python (Heirendt et al., 2019). The authors of this paper were encouraged by the US NIAID to find a way to share diverse models of infectious diseases among us. From our experience, we propose these steps:

1. Prepare a slide deck or a white paper with a short introduction to the model.
2. Set up a code repository, for example on GitHub, with instructions on how to install and run the code, including any third-party packages needed.
3. Include a “Hello World!” type of walk-through demonstration, typically a Jupyter notebook or similar, to enable others to run a simple application of the code.
4. Provide a table of the relevant parameters used in the model, their definitions, and the source or other justification for their set values (e.g., if they are based on previous characterizations).
5. Define a format for reporting the results, or provide routines to convert the output into a uniform format. Omitted analysis files and variations in output format are major challenges in reusing published mathematical models.
From our shared experiences, these steps provide a set of minimum information needed to reproduce a model. These steps are abstract, but they are also general enough that they can in principle be applied to any type of model in infectious diseases and other problems in systems biology. The code repository and especially the walk-through code can go a long way in making sure others can understand and reuse the code. To illustrate these principles, we provide examples of publishing models adhering to this format (Table 1).
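The parameter table and the uniform output format can be sketched in a few lines. The example below is one possible layout, not a format prescribed by this review: the column names, the example parameters, and the function names `parameters_to_csv` and `to_long_format` are all hypothetical.

```python
import csv
import io

# Hypothetical parameter table: each value carries units, a definition, and a source.
PARAMETERS = [
    {"name": "mu_max", "value": 1.0, "units": "1/h",
     "definition": "maximum specific growth rate", "source": "assumed for illustration"},
    {"name": "death_rate", "value": 0.2, "units": "1/h",
     "definition": "first-order death rate", "source": "assumed for illustration"},
]


def parameters_to_csv(parameters):
    """Serialize the parameter table to CSV text with a fixed column order."""
    columns = ["name", "value", "units", "definition", "source"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(parameters)
    return buffer.getvalue()


def to_long_format(times, trajectories):
    """Convert model output {variable: [values]} into (time, variable, value) rows."""
    rows = []
    for variable, values in sorted(trajectories.items()):
        if len(values) != len(times):
            raise ValueError(f"{variable}: expected {len(times)} values, got {len(values)}")
        rows.extend((t, variable, v) for t, v in zip(times, values))
    return rows


table_text = parameters_to_csv(PARAMETERS)
rows = to_long_format([0, 1], {"pathogen": [0.1, 0.2], "commensal": [1.0, 0.9]})
```

A long table with one row per time point and variable is only one choice, but it has the advantage that outputs from models of very different scales can be stored and plotted with the same routines.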

Table 1

Examples of models made available with code, parameters, and a guided example to facilitate reuse by other researchers

Date	Presenter	Center	Topic, slides, and notebooks
5/26/2020	Xiang Ji and Marc Suchard	CViSB	Viral Systems Biology Analysis using BEAST
3/24/2020	Anand Sastry and Saugat Poudel	UCSD	Independent Component Analysis
6/25/2019	Slim Fourati	FluOMICS	FluOmics Modeling
4/23/2019	Joao Xavier/Chen Liao	MSKCC	Inference of ecological networks from time series
3/26/2019	Mario Arrieta-Ortiz	ISB	Discovering drug targets via network analysis (notebook)
2/26/2019	Tsuyoshi Mikkaichi/Michael Yeaman/Alexander Hoffmann	UCLA	Ensemble modeling and cell heterogeneity
1/22/2019	Jose Bento	BC	Comparing and aligning different networks
9/12/2018	Jonathan Monk/Charles Norsigian	UCSD	U01 Genome-Scale Modeling Workshop
4/22/2018	Jonathan Monk/Charles Norsigian	UCSD	Omics data integration with Genome-scale Models
3/27/2018	Jose Bento	BC	Phylogenetic tree inference methods
9/26/2017	Jonathan Monk/Yara Seif/Charles Norsigian	UCSD	Basics of Flux Balance Analysis


Conclusion

Mathematical models can be useful to study diverse facets of infectious diseases. In this review, we have covered many topics to showcase this wide diversity: epidemiology and dynamics, intracellular metabolic networks, dynamic interplay of pathogens and immune cells, transcriptional regulation, microbial communities, and more. We illustrated each theme with examples, but in many cases we covered a theme only briefly and could not give it the depth that it deserves. Still, we hope that showing the wide diversity of scales makes obvious a common challenge of mathematical modeling of infectious disease. It is a common adage of modeling that a model is only as good as its assumptions. The steps listed here to publish a model should help modelers make their assumptions transparent by providing their code, listing the parameters used, and demonstrating the use of their models with simple examples. Adopting these practices can enable others to reuse our models toward gaining new insights on infectious diseases that would otherwise be inaccessible.

107 in total

1. Controllability of complex networks.

Authors: Yang-Yu Liu; Jean-Jacques Slotine; Albert-László Barabási
Journal: Nature Date: 2011-05-12 Impact factor: 49.962

2. Repressive Gene Regulation Synchronizes Development with Cellular Metabolism.

Authors: Justin J Cassidy; Sebastian M Bernasek; Rachael Bakker; Ritika Giri; Nicolás Peláez; Bryan Eder; Anna Bobrowska; Neda Bagheri; Luis A Nunes Amaral; Richard W Carthew
Journal: Cell Date: 2019-07-25 Impact factor: 41.582

3. Diet rapidly and reproducibly alters the human gut microbiome.

Authors: Lawrence A David; Corinne F Maurice; Rachel N Carmody; David B Gootenberg; Julie E Button; Benjamin E Wolfe; Alisha V Ling; A Sloan Devlin; Yug Varma; Michael A Fischbach; Sudha B Biddinger; Rachel J Dutton; Peter J Turnbaugh
Journal: Nature Date: 2013-12-11 Impact factor: 49.962

4. Dietary trehalose enhances virulence of epidemic Clostridium difficile.

Authors: J Collins; C Robinson; H Danhof; C W Knetsch; H C van Leeuwen; T D Lawley; J M Auchtung; R A Britton
Journal: Nature Date: 2018-01-03 Impact factor: 49.962

5. Path-seq identifies an essential mycolate remodeling program for mycobacterial host adaptation.

Authors: Eliza J R Peterson; Rebeca Bailo; Alissa C Rothchild; Mario L Arrieta-Ortiz; Amardeep Kaur; Min Pan; Dat Mai; Abrar A Abidi; Charlotte Cooper; Alan Aderem; Apoorva Bhatt; Nitin S Baliga
Journal: Mol Syst Biol Date: 2019-03-04 Impact factor: 11.429

6. Identifying the measurements required to estimate rates of COVID-19 transmission, infection, and detection, using variational data assimilation.

Authors: Eve Armstrong; Manuela Runge; Jaline Gerardin
Journal: Infect Dis Model Date: 2020-11-02

7. Bacterial persisters are a stochastically formed subpopulation of low-energy cells.

Authors: Sylvie Manuse; Yue Shan; Silvia J Canas-Duarte; Somenath Bakshi; Wei-Sheng Sun; Hirotada Mori; Johan Paulsson; Kim Lewis
Journal: PLoS Biol Date: 2021-04-19 Impact factor: 8.029

8. An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network.

Authors: Mario L Arrieta-Ortiz; Christoph Hafemeister; Ashley Rose Bate; Timothy Chu; Alex Greenfield; Bentley Shuster; Samantha N Barry; Matthew Gallitto; Brian Liu; Thadeous Kacmarczyk; Francis Santoriello; Jie Chen; Christopher D A Rodrigues; Tsutomu Sato; David Z Rudner; Adam Driks; Richard Bonneau; Patrick Eichenberger
Journal: Mol Syst Biol Date: 2015-11-17 Impact factor: 11.429

9. Systems biology analysis of the Clostridioides difficile core-genome contextualizes microenvironmental evolutionary pressures leading to genotypic and phenotypic divergence.

Authors: Charles J Norsigian; Heather A Danhof; Colleen K Brand; Numan Oezguen; Firas S Midani; Bernhard O Palsson; Tor C Savidge; Robert A Britton; Jennifer K Spinler; Jonathan M Monk
Journal: NPJ Syst Biol Appl Date: 2020-10-20

10. The gut microbiota is associated with immune cell dynamics in humans.

Authors: Jonas Schluter; Jonathan U Peled; Bradford P Taylor; Kate A Markey; Melody Smith; Ying Taur; Rene Niehus; Anna Staffas; Anqi Dai; Emily Fontana; Luigi A Amoretti; Roberta J Wright; Sejal Morjaria; Maly Fenelus; Melissa S Pessin; Nelson J Chao; Meagan Lew; Lauren Bohannon; Amy Bush; Anthony D Sung; Tobias M Hohl; Miguel-Angel Perales; Marcel R M van den Brink; Joao B Xavier
Journal: Nature Date: 2020-11-25 Impact factor: 69.504