Joon-Yong Lee1, Natalie C Sadler1, Robert G Egbert1, Christopher R Anderton2, Kirsten S Hofmockel2,3, Janet K Jansson1, Hyun-Seob Song1,4,5. 1. Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA. 2. Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA. 3. Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA, USA. 4. Department of Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA. 5. Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, NE, USA.
Abstract
Microbial communities organize into spatial patterns that are largely governed by interspecies interactions. This phenomenon is an important metric for understanding community functional dynamics, yet the use of spatial patterns for predicting microbial interactions is currently lacking. Here we propose supervised deep learning as a new tool for network inference. An agent-based model was used to simulate the spatiotemporal evolution of two interacting organisms under diverse growth and interaction scenarios, the data of which was subsequently used to train deep neural networks. For small-size domains (100 µm × 100 µm) over which interaction coefficients are assumed to be invariant, we obtained fairly accurate predictions, as indicated by an average R2 value of 0.84. In application to relatively larger domains (450 µm × 450 µm) where interaction coefficients are varying in space, deep learning models correctly predicted spatial distributions of interaction coefficients without any additional training. Lastly, we evaluated our model against real biological data obtained using Pseudomonas fluorescens and Escherichia coli co-cultures treated with polymeric chitin or N-acetylglucosamine, the hydrolysis product of chitin. While P. fluorescens can utilize both substrates for growth, E. coli lacked the ability to degrade chitin. Consistent with our expectations, our model predicted context-dependent interactions across two substrates, i.e., degrader-cheater relationship on chitin polymers and competition on monomers. The combined use of the agent-based model and machine learning algorithm successfully demonstrates how to infer microbial interactions from spatially distributed data, presenting itself as a useful tool for the analysis of more complex microbial community interactions.
Microbial communities organize into spatial patterns that are largely governed by interspecies interactions. This phenomenon is an important metric for understanding community functional dynamics, yet the use of spatial patterns for predicting microbial interactions is currently lacking. Here we propose supervised deep learning as a new tool for network inference. An agent-based model was used to simulate the spatiotemporal evolution of two interacting organisms under diverse growth and interaction scenarios, the data of which was subsequently used to train deep neural networks. For small-size domains (100 µm × 100 µm) over which interaction coefficients are assumed to be invariant, we obtained fairly accurate predictions, as indicated by an average R2 value of 0.84. In application to relatively larger domains (450 µm × 450 µm) where interaction coefficients are varying in space, deep learning models correctly predicted spatial distributions of interaction coefficients without any additional training. Lastly, we evaluated our model against real biological data obtained using Pseudomonas fluorescens and Escherichia coli co-cultures treated with polymeric chitin or N-acetylglucosamine, the hydrolysis product of chitin. While P. fluorescens can utilize both substrates for growth, E. coli lacked the ability to degrade chitin. Consistent with our expectations, our model predicted context-dependent interactions across two substrates, i.e., degrader-cheater relationship on chitin polymers and competition on monomers. The combined use of the agent-based model and machine learning algorithm successfully demonstrates how to infer microbial interactions from spatially distributed data, presenting itself as a useful tool for the analysis of more complex microbial community interactions.
Communities of soil microorganisms play a key role in controlling the global biogeochemical cycle. Soil microbial activities in soil include cycling of organic carbon compounds, which exert a significant impact on atmospheric CO2 concentrations [1], [2]. A mechanistic understanding of microbial interactions underpinning cycling of C and other nutrients remains a grand challenge due to the overwhelming compositional and functional complexity of the soil microbiome. Soil microbial communities dynamically shift membership, interactions, and functions across both spatial and temporal scales [3]. Thus, gaining a predictive understanding of microbial interactions in soil is critical for future efforts to control community dynamics and ecosystem function. However, the ability to predict microbial interactions in such a highly complex multi-dimensional space is currently lacking.Soil microorganisms self-assemble into specific spatial patterns that are governed both by the metabolic capacity of individual members (e.g. quorum sensing and antibiotic production), and by myriads of environmental variables [4], [5], [6]. Microbial spatial patterns formed at microscales are thus a result of the dynamic interplay among populations and can consequently serve as a key input for predicting interspecies interactions. Rapid advancement in experimental and instrumental technologies is enabling the generation of high-resolution and high-throughput images that enable visualization of the spatial distribution of microorganisms at the microscopic scale [7], [8], [9]. For example, image data from fluorescence microscopy provides clues to interspecies interactions by quantifying cell-to-cell spatial distances and cellular aggregation patterns [5]. In a previous study, imaging techniques were used to determine that inter-cell distances were shorter in surface soil and lengthened with depth [10], indicating relatively closer microbial interactions near the soil surface. Despite these advancements, spatial data per se do not provide direct information on microbial interactions.One promising approach to overcome this gap is computational network inference. Conventionally, this approach uses species abundance data as a primary input to predict interspecies interactions based on similarity metrics, through data-model fitting, or by setting up certain interaction rules [11], [12]. These methods have proven effective in inferring how species are interacting at a population level (e.g., [13], [14]) but have yet to be extended to incorporate spatial organizations of microorganisms. Here, we propose supervised deep learning as a new network inference tool. As a subfamily of machine learning, supervised learning trains an empirical model based on the known relationships between inputs and outputs and provides the most likely predictions of outputs for new inputs [15]. Artificial neural networks are frequently used for this purpose [16]. Neural networks are composed of hidden layers between input and output layers and get ‘deeper’ as the number of layers increases. Implementation of deep network models requires significant computational power, as well as a sufficiently large set of labeled data for training the network [15]. Due to these challenges, the use of a neural network for machine learning has previously been limited to structurally simple nets. However, various computational advances and breakthroughs over the last decade are now poised to enable overcoming the limitations associated with training, as exemplified by the use of deep learning that has come into the spotlight with unprecedented achievements in solving many historically challenging problems in diverse scientific fields [17], [18], [19], [20], [21]. In a recent study, the utility of deep learning models has also been extended to analyze bacterial colony structures for classification [22].Training deep neural networks directly from experimental microscopy image data to infer microbial interactions is currently infeasible due to unknown input–output relationships (i.e., inappropriate for supervision) and insufficient data size (i.e., scanty for deeper networks). To avoid such limitations, agent-based models can be used as a tool for data generation under diverse growth and interaction contexts. In contrast with traditional population-level ecological models that neglect the variance of growth properties in individual cells, agent-based models account for phenotypic heterogeneity in a population by viewing individual cells as autonomous entities [23]. Consequently agent-based models enable predicting the formation of microbial assembly patterns in space as a community’s higher-order property emerging from the interaction of individual cells with each other and with the environment. This high-fidelity simulation tool has been used for various applications, e.g., to examine how microbial growth on soil surfaces is localized in dry compared to wet conditions [24], [25].In this work, we present a new pipeline of network inference that synergistically combines agent-based models and deep learning techniques to predict microbial interactions from both in silico and actual image data (Fig. 1). Through case studies of two interacting organisms temporally co-evolving in a two-dimensional space, we demonstrate that our deep learning networks not only provide accurate predictions of microbial interactions, but also enable extracting local variations of interaction parameters in heterogeneous environments. In the subsequent application to the analysis of real microscopy images, deep learning models that were trained using in silico image data correctly predicted shifts in microbial interactions across different culture conditions. This approach has potential to significantly improve our understanding of how microorganisms colonize habitats and interact with each other in spatially heterogeneous environments such as soils.
Fig. 1
Deep learning pipeline to infer microbial interactions from microbial assembly patterns: (a) training deep learning networks using in silico images generated from agent-based models to predict interactions from new test (or unseen) datasets. aGR and aRB represent the effect of R (red) on the growth of G (green) and the effect of G on the growth of R, respectively; (b) prediction of spatial variation of interaction coefficients in real images using a sliding window method (left panel), final prediction determined by taking averages from an ensemble of best-performing deep learning models (middle panel), and interaction heatmaps generated by integrating neighboring sliding window estimations (right panel). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Deep learning pipeline to infer microbial interactions from microbial assembly patterns: (a) training deep learning networks using in silico images generated from agent-based models to predict interactions from new test (or unseen) datasets. aGR and aRB represent the effect of R (red) on the growth of G (green) and the effect of G on the growth of R, respectively; (b) prediction of spatial variation of interaction coefficients in real images using a sliding window method (left panel), final prediction determined by taking averages from an ensemble of best-performing deep learning models (middle panel), and interaction heatmaps generated by integrating neighboring sliding window estimations (right panel). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Materials and methods
Agent-based models for generating spatiotemporal patterns of interaction species
Growth rules
Agent-based models were designed to simulate two interacting microbial species (denoted by ‘G’ and ‘R’ hereafter) that were co-growing in a two-dimensional space. Simulations were performed in two different sizes of computational domains (i.e., 100 × 100 and 450 × 450 grids). We set the size of each grid to represent the physical dimension of 1 µm × 1 µm and we assumed that it can be occupied by only one microbial cell. Thus, each grid can take three states: occupancy by ‘G’, occupancy by ‘R’, or vacancy.As an initial condition, we randomly distributed ‘G’ and ‘R’ cells (50/50 ratio) to fill about 5% of the computational domain. The growth of individual cells was modeled based on the following equation [26], [27]:where r denotes the actual growth rate, r is the basal growth rate (i.e., the growth rate without the influence of neighbor species or in the absence of space limitation), φ and φ are fractional occupancies of species i and j in a defined domain of interaction, DOI (i.e., the area where species influence each other), represents the effect of species j on the growth of species i, where is the interaction coefficient (i.e., the effect of species j on the growth of species i), and the term accounts for the increase and decrease of the effect of the partner with the abundance of species j and species i. Therefore, ‘’ reflects the decrease of r due to increased intra- and inter-specific competition when the total occupancy in the interaction domain increases.
Simulation of assembly patterns using a single set of parameters
We performed 5000 Monte-Carlo simulations using the agent-based model in the computational domain with 100 × 100 grids. We assumed that the parameter values governing the growth and interaction dynamics of two species are homogeneously distributed in this small space. Thus, in each simulation, the model was assigned with a single set of parameters that were randomly sampled over the following ranges: 0 ≤ r ≤ 0.1, −1 ≤ a ≤ 1, 0 ≤ χ ≤ 1, and 3 × 3 grids ≤ DOI (the size of interaction domain) ≤ 11 × 11 grids. These parameter ranges reflect the conditions commonly considered in the literature [27]. We applied periodic boundary conditions on the four sides of the computational domain. The datasets obtained from this setting are used as the primary source to develop neural network models (Section 3.2).
Simulation of assembly patterns using a composite set of parameters
We extended the agent-based model simulation to the computational domain with 450 × 450 grids. For this relatively larger domain, we released the previous assumption of homogeneous parameter distribution. As illustrated in Supplementary Fig. S1, we divided the computational domain into five subdomains, each of which is governed by a distinct set of parameters. The way we assign parameter values in each subdomain is the same as that described in the previous section. These datasets were used for further testing neural network models in heterogeneous environments (Section 3.3).
Dynamic stochastic simulation of microbial growth
Cellular growth in space is often described as a stochastic process. To reflect this nondeterministic aspect of growth, we ran dynamic simulations of the agent-based model using the Gillespie’s stochastic simulation algorithm (SSA) [28], [29]. In each time instance, the SSA algorithm independently determines the following two random variables: 1) the time until the next cell growth occurs (a continuous random variable), and 2) the index of the cell to grow (a discrete random variable). For the details of implementing the SSA, we refer the reader to a review article by Higham [30].
Development of deep artificial neural net models
Neural network architectures
We employed plain convolutional neural networks (CNNs) [31], [32] and residual networks (ResNets) [33], the effectiveness of which has been demonstrated in various fields, especially in image processing. Similar to Visual Geometry Group (VGG) networks [31], our plain CNN networks consist of a stack of convolution layers, followed by fully connected layers. Specifically, we considered 3 to 6 convolution layers and 2 to 3 fully connected layers so that we tested for eight plain CNN structures with different depths (Supplementary Table S1(a) and Supplementary Fig. S2(a)). We also tested the effect of the width of the convolution layers, which started from 8 or 16 in the first layer and then increased by a factor of 2 after each pooling layer, until it reached 32 or 64. For the pooling layer, we applied the 2D average pooling and max pooling in each network configuration.For deeper network architectures, we tested the ResNets, where the identity shortcut connections (or residual connections) are simply inserted in the plain networks (see Supplementary Table S1(b) and Supplementary Fig. S2(b)). We adapted the ResNet-18 and ResNet-34 structures [33] to fit our data. We tested the basic building blocks of two 3 × 3 convolution layers and the bottleneck blocks of three convolution layers of 1 × 1, 3 × 3, and 1 × 1 as well. In addition, we compared the original residual networks with the pre-activation residual networks introduced in [34], which improved accuracy and reduced overfitting by using identity shortcut connections and identity after-addition activation (Supplementary Fig. S2(c)). For convolutional layers, the rectified linear unit (ReLU) activation function [35] and batch normalization (BN) [36] were adopted in each convolution to improve the trainability and stability of deep neural networks. BN is prevalent for deeper models since it allows higher learning rates and less careful initialization by employing a way to normalize activations in intermediate layers. Details of the algorithms can be found in these papers [31], [33], [34].During training, the input to the convolution layers was set with a fixed-size (e.g., 100 × 100) ternary image with a single channel that is generated from the agent-based model. We used 1, −1, 0 to denote to denote occupancy by ‘G’, occupancy by ‘R’, and vacancy, respectively.To infer a set of interaction coefficient values, we put a linear layer with the same number of output nodes at the end layer. In our case study of two species, we set the number of nodes for the final linear layer to be two to predict asymmetric binary interaction coefficients.
Hyperparameters
We performed hyperparameter optimization to search for the best neural net models. We considered the following parameters as a set of hyperparameters to optimize: configurations of convolution layers and fully-connected layers (see Supplementary Table S1), mini-batch sizes, learning rates, optimization algorithms, dropout probabilities, and L2 regularization coefficients (see Supplementary Table S2 for details). Instead of performing a time-consuming exhaustive search in such a large parameter space, we pre-determined 300 sets of hyperparameters through random selection and performed random grid search.Out of 35,000 in silico images (=5,000 simulation conditions × 7 time points) obtained from agent-based model simulations, we set aside the 20% of the data for testing and used 80% of the data for developing deep neural network models, which were in turn split into five groups for k-fold cross validation [37]. In each training, the best hyperparameter set was determined based on the minimum average validation loss for the validation dataset.
Model training and early stopping
We trained deep neural networks following the standard supervised training procedure [31]. Using the commonly used mini-batch algorithms: mini-batch gradient descent (GD) with momentum and Adam [38], [39], we determined parameters in deep learning networks such that the following mean squared error (MSE) is minimized:where the and denote the known (or target) and predicted interaction coefficients, respectively, and n is the number of batch images.To avoid overfitting, we used the validation loss improvements for early stopping [40]. We set the patience to be 20 during 200 epochs. That is we stopped the training if the performance against the validation set did not show any improvement after twenty consecutive epochs in a maximum of 200 epochs. For initialization of the neural network weights, we used the same strategies as the one in [33], which helps to avoid learning stagnation, a common problem encountered in training deep networks.
Performance metric for model inference
To measure the model performance, we took the following couple of commonly used metrics: root mean squared error (RMSE) and R2 (coefficient of determination), i.e.,where denotes the average of the true s.
Sliding window estimator to predict interactions in larger and irregular sized images
In order to extend the pre-trained models to image data with larger size than the input data (used for training), we adopt the sliding window method, which has been widely used for object detection in images [41], [42]. A basic idea is to scan the larger size image using the smaller-size square window that slides with a fixed step size. The size of window is the same as that of the images used for training. While any integer value can be considered for the step size, we recommend setting it as a factor of the input size (i.e., window size) for convenient interpretation. The numbers of sliding windows across rows and columns ( and ) are as follows:where , w, h, and denote the step size, image width and height, and input size (=window size), respectively. The input size is divisible by the step size . The % represents the integer divide operator that divides two numbers and returns the integer part of the result. As the step size gets smaller, the computational cost to analyze a test image should get higher due to the increase of and . For simplicity, we discarded the residual part in the right and bottom edges in test images.As the sub-image in each window was set to fit the model input, prediction of interaction values for each window is straightforward. With the step size typically smaller than the window size, grid blocks will be generated by overlapping multiple windows. Thus, the values of interaction coefficients in each of the grid blocks can be estimated as the averaged interactions of associated neighboring windows, which overlays the grid block. In the same manner, we can also estimate the variability of the neighbor interactions, which allows to quantify the local variance of interactions. That is, if the predicted interactions of the neighboring windows of the overlapping block has large variance, it can be said that this local area is likely to be governed by a composite of different context-dependent relationships.To validate this approach, we defined the target interaction coefficients of each sliding window S that contains K subdomains governed by different interaction parameters as follows:where k indicates a subarea in a window that has a different interaction and represents the area ratio of the subarea k to the total area of the sliding window S.
Pre-processing of real images
To infer the interaction coefficients from the real images, we transformed the RGB images into ternary images (2D array of 1 s, −1 s, and 0 s) to be available for the trained models. For identifying red and green colors, we used the following color ranges in an HSV color space: (H:36, S:25, V:25) to (H:70, S:255, V:255) for green colors and (H:-20, S:25, V:25) to (H:20, S:255, V:255) for red colors. To get prediction for the sliding windows, split images are fed to the ensemble of trained deep learning models.
Deep learning implementation
In silico image data generated using the agent-based model were stored in the mat file format. For the implementation of deep learning methods, we used PyTorch 1.0.1 (Python 3.6), an open source deep learning software library for the Python programming language (https://pytorch.org/). We employed the Python packages such as pandas to convert the mat files to pickle format, scikit-learn to efficiently manage training/test data, and OpenCV to import and process the real images to be fit to the deep learning models.We used the GPU-equipped machine to train deep learning models, which were configured as follows: Intel(R) Xeon(R) CPU E5645 @ 2.40 GHz CPU, 96 GB RAM, and Nvidia GeForce GTX 1080 Ti with 12 GB VRAM. Using this hardware configuration, training a ResNet-18 model with pre-activation blocks took approximately 6.4 h on average. The use of GPU was critical - model inference for an input took about 1.62 ms on average in the GPU mode, but took about 1.05 s in the CPU mode.
Experimental design and microscopy
Construction of fluorescent bacterial strains
Red and green fluorescent bacterial populations for co-culture studies were prepared as follows. We modified a non-chitinolytic E. coli K-12BW25113 △chiA, a chitinase gene chiA deletion strain from the Keio Knockout Collection (Dharmacon) [43], to include a genomic insertion of constitutively expressed mScarlet-I red fluorescent protein (RFP). To do so, we used a pMRE-Tn5-155, a miniTn5 plasmid with mScarlet-I RFP cargo, gifted from Mitja Remus-Emsermann (Addgene plasmid # 118545; http://n2t.net/addgene:118545; RRID:Addgene_118545) [44].Conjugation with the Tn5 mScarlet-I RFP plasmid was performed as follows: E. coli △chiA recipient was grown on Luria-Bertani (LB) (Lennox; Sigma Aldrich) agar supplemented with 50 mg L−1 kanamycin media for 24 h. Donor strain E. coli S17-1 was streaked and cultured overnight on LB solidified with 1.5% agar (Difco, BD) supplemented with 100 mg L−1 Ampicillin (Sigma Aldrich) to maintain the pMRE-Tn5 plasmid. Freshly grown bacteria were harvested and resuspended in 1× phosphate buffered saline (PBS) to reach an OD600 nm of 1.0 by spectrophotometer (Genesys20, Thermo Scientific). Donor and recipient strains were mixed in a 1:1 ratio and concentrated by centrifugation (4000g, Room temperature, 5 min) to reach an estimated OD600 nm of 20. The bacterial mix was drop spotted on Luria-Bertani (LB) agar and incubated for 18 h at 37 °C. After incubation, the cells were harvested and resuspended in 1 mL 1× PBS. The E. coli S17 strain is an auxotroph for Proline and Arginine [44], so MOPS minimal media [45] agar supplemented with 0.2% glucose and 50 mg L−1 kanamycin (Sigma Aldrich) was used to select for mScarlet-I RFP transformed E. coli △chiA. This selection was carried out four additional times and colony fluorescence was assessed by fluorescent plate scans conducted in a FluorChemQ (ProteinSimple) fluorescent imaging cabinet. Highly fluorescent colonies were selected and cultured in LB with 50 mg L−1 kanamycin and prepared as 10% glycerolstocks for long term storage at −80 °C.We engineered reporter strain bRE007 by transposon insertion of a fluorescent protein reporter into strain Pseudomonas fluorescens SBW25. The reporter cassette consists of a superfolder GFP gene (sfGFP) translationally fused to a chlorampenicol actetyltransferase gene (cmR) expressed from a highly efficient constitutive promoter (apFAB46). Briefly, to generate the strain we transformed a nucleoprotein complex of hyperactive Tn5 transposase (EZ-Tn5, Lucigen) and a PCR-amplified sfGFP-cmR cassette that includes the inverted repeat sequences recognized by Tn5. We prepared electrocompetent cells by growing SBW25 to OD 0.4 and washing multiple times in pre-chilled 10% glycerol. The cells were electroporated (1800 V, 200 O, 25 µF) with 100 ng of the amplified cassette in a 1 mm gap cuvette, recovered in LB broth at 30 °C for 2 h, and plated on LB agar supplemented with 34 μg mL−1 chloramphenicol. Highly fluorescent colonies identified by blue-light transillumination were plated onto fresh LB agar plates supplemented with 34 μg mL−1 chloramphenicol to obtain isogenic colonies. To identify the sfGFP-cmR insertion site, the isolated genome (DNeasy, Qiagen) was subjected to Nextera fragmentation (Illumina) followed by paired end Illumina sequencing (SNPSaurus, Oregon, USA). De novo assembly of sequence reads that did not match the SBW25 genome via Geneious was used to verify the presence of the sfGFP-cmR cassette and the insertion location. The reporter cassette inserted within gene PFLU_RS22785 (locus 5127878) is in a reverse orientation.
Coculture experiments
Bacterial strains were cultivated for 24 h in 5 mL of LB broth without antibiotics for wild type strains and either 20 mg L−1 chloramphenicol or 50 mg L−1 kanamycin for P. fluorescens SBW25 mNeonGFP or E. coli △chiA mScarlet RFP, respectively. To prepare culture chambers, silicon 4 well micro-inserts (Ibidi) with well dimensions of 2.0 × 1.5 × 4.2 mm (w × l × h) were secured to 25 × 75 mm, #1 (~0.13–0.17 mm) float glass coverslips (Chemglass Life Sciences) and wrapped in foil before autoclaving for 30 min, 121 °C. Next, chitin beads were fluorescently stained and prepared for microcosm incubations as follows: 100 µL of chitin resin (New England Biosciences) in 20% ethanol suspension was transferred to a 1.7 mL Eppendorf tube and centrifuged at room temperature (RT) for 2 min at 4000g (5810 R, Eppendorf). The supernatant was removed, and the chitin resin was resuspended in 20% Calcofluor White (Sigma Aldrich) in 1× PBS. The resin was stained in the dark at RT for 30 min, buffer exchanged twice, and diluted with enough filter sterilized MOPS minimal media to achieve ~10 beads (each bead ~50–120 µm) per µL as assessed by 10× light microscopy (MicroStar, American Optical). For the microcosms receiving homogenous carbon substrates, either the chitin monomer N-acetyl glucosamine (Sigma Aldrich), or pentaacetyl-chitopentaose (Megazyme) which is a chitin oligomer composed of only 5 N-acetyl glucosamine monomers, were prepared as 0.2% (w/v) in MOPS minimal media and filter sterilized with Millex 0.22 um PES syringe filters (Millipore). To set up the C substrate explicit microcosms, 1 µL of Calcofluor White stained chitin beads in MOPS minimal media was dispensed into each well followed by as additional 7 µL MOPS media. For the homogenous substrate cultures, 8 µL of 0.2% pentaacetyl-chitopentaose or 0.2% n-acetyl glucosamine in MOPS minimal media was dispensed into wells. The microcosm chambers were then wrapped in Parafilm (Bemis Company, Inc) until inoculation. To prepare the inoculum, 10 or 2 µL P. fluorescens and E. coli cultured to of 0.40 OD600 nm (3.2 × 108 cell mL−1) were centrifuged (6000g at RT) to pellet cells. Spent media was removed and cells were resuspended in 3.2 mL of fresh MOPS minimal media. Cell suspensions were then mixed in a 1:1 ratio and microcosm were inoculated with 1 µL of cell suspension
Microscopy imaging
For confocal laser scanning microscopy, co-culture chips were placed onto a Leica DMI6000 microscope equipped with a CSU 10 Confocal scanning unit (Yokogawa Corporation of America, Sugar Land, TX). Approximately 10 random fields for each culture were selected for imaging as follows: Calcofluor White stained chitin beads (355 nm Ex; 460/50 m Em), GFP (488 nm Ex; 525/50 m Em) and RFP (561 nm Ex; 595/30 m Em) with a Leica Plan APO 20/0.7 objective using a Coolsnap HQ2 (Photometrics, Tucson, AZ) controlled by MetaMorph version 7.7.8.0 (Molecular Devices, Sunnyvale, CA) software. The images were further processed in MetaMorph.
Results and discussion
Spatiotemporal simulation of interaction-specific microbial assembly
Using the agent-based model described in the previous section, we simulated the dynamic growth of two interacting microorganisms in a two-dimensional space of 100 µm × 100 µm size, over which all parameters were assumed to be invariant. The datasets generated under this configuration were used as a main source for training deep learning networks (Section 3.2). This size range was selected because at the scale of 100 µm, microbial interactions are known to play a key role as a primary driver of population structure and dynamics [5].Previous studies have shown that cooperative organisms develop stronger intermixed spatial patterns while competitive relationships lead to spatial segregation [25], [27]. Consistent with these findings, we also obtained distinct spatial patterns for three representative interaction types (competition, cooperation, and exploitation) (Supplementary Fig. S3), the intermixing levels of which were in the following order: cooperation > exploitation > competition (Supplementary Fig. S4). These results suggest that specific spatial patterns can be considered as ecological signatures of microbial interactions, although we recognize they may also be affected to a degree by other parameters (e.g., basal growth rates) that are not directly associated with interactions. Not surprisingly, spatial patterns become more interaction-specific as the population density increased with time, as indicated by the fact that the differences in the intermixing levels among different interaction types were initially smaller but became progressively larger (Supplementary Fig. S4).
Microbial assembly governed by spatially homogeneous parameters
We evaluated the performance of deep learning models in inferring the interactions of two co-evolving organisms under a previously described simple configuration. For this purpose, we iteratively performed 5000 dynamic simulations using the agent-based model. In each simulation, we randomly determined the parameter set and collected multiple in silico images of two organisms with time such that the total population density (in collected images) ranged from 10 to 70%.The simulation results in the previous section indicated that spatial images with higher population densities will be more informative than those with lower densities and that deep learning networks could be best developed using the images with the maximum highest available density. This gave rise to the following fundamental questions in implementing deep learning for actual applications: 1) “What is a threshold of population density, above which a reasonably good performance of deep learning networks is ensured?”; 2) “Will the deep learning networks trained using high density data be generalizable to infer interactions from lower density data?”, and 3) “Alternatively, what is the best training strategy for handling data with varying levels of population densities?”. These questions are systematically examined in the following sections.
Accurate prediction of interaction coefficients by deep learning networks trained by snapshot images
We illustrate the procedures of developing and evaluating deep learning networks when they are trained and tested with one-time snapshot image data with a similar population size (i.e., 70%). Developing deep learning networks requires the pre-determination of hyperparameters (i.e., parameters to determine model architectures such as the topology and size of neural networks and other options associated with the selection of optimization algorithms and regularization methods). Depending on the choice of network architectures and other hyperparameters, we considered 28 cases (i.e., eight CNN and six ResNet models optimized using two alternative optimization algorithms: mini-batch GD vs. Adam). Following a typical practice, we iteratively split 5000 datasets into 4000 (80%) for training and 1000 (20%) for testing. Among all plain CNN models, CNN#6 showed the best performance regardless of the choice of optimizers (Supplementary Tables S3 and S4). Interestingly, the CNN#6 features are relatively less deep but have a wider network structure (i.e., with 16, 32, and 64 channels) compared to other models. This is understandable considering that plain CNNs with increased depth tend to show higher training errors due to the difficulty in finding optimal parameters [33]. ResNet models had smaller validation errors than CNN models on average (Supplementary Table S5). Overall, the performance of deep learning networks was shown to be fairly good: the average R2 values of top 5 plain CNNs and top 3 ResNets for test datasets were 0.83 and 0.84, respectively (Table 1).
Table 1
Test performance of the top 5 in plain CNN configurations and the top 3 in ResNet configurations.
Training Dataset
Validation Dataset
Test Dataset
Avg. MSE
Avg. MSE
Avg. MSE
Avg. R2
Plain CNN (Top 5)
3.066 × 10−4
0.057
0.059
0.828
ResNet (Top 3)
2.269 × 10−4
0.051
0.053
0.844
Test performance of the top 5 in plain CNN configurations and the top 3 in ResNet configurations.
Enhanced performance by incorporating temporally evolving spatial images
In the previous section, we trained and tested deep learning networks against datasets with the same size of populations, but in reality, the population sizes may vary among images. This means that the population size in new images from which one attempts to extract interactions may not match with the ones that were used for training deep learning networks. Therefore, we examined to what extent deep learning networks trained with a specific level of population size (i.e., 70%) can be generalizable to different population sizes, ranging from 10 to 70%. As shown in Fig. 2(a), the performance of deep learning networks (as measured in terms of average RMSE and R2 scores) was the highest when applied to the images with population densities of 70% and became progressively worse as the density level decreased. Notably, R2 values became negative for the images with ≤30% densities, implying that deep learning networks performed worse than the null model. This result indicates that deep learning networks trained with a high-density data may be extended to the analysis of lower density data to a certain level but cannot be generalizable beyond that.
Fig. 2
Effect of the cell density on the prediction performance of deep learning models: (a) test performance of the best-performing models (chosen through rigorous hyperparameter optimization) for different cell density levels when trained using the images with the 70% cell density; (b) performance improvement of the ResNet models when trained using the images with the entire range of cell densities. In both (a) and (b), box plots describe the performance results of the best models and the red dots represent the performance results of the ensemble model in each cell density level. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Effect of the cell density on the prediction performance of deep learning models: (a) test performance of the best-performing models (chosen through rigorous hyperparameter optimization) for different cell density levels when trained using the images with the 70% cell density; (b) performance improvement of the ResNet models when trained using the images with the entire range of cell densities. In both (a) and (b), box plots describe the performance results of the best models and the red dots represent the performance results of the ensemble model in each cell density level. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)To expand the range of population densities that can be analyzed for inference, we tested two approaches. First, we independently trained deep learning models for each of the density levels between 10 and 70% and applied the resulting density-specific models only to the datasets with the corresponding same size of population. In comparison to the previous case above, the use of density-specific models led to improved inference (Supplementary Fig. S5). While the performance was still poor for population densities <20%, this may be ascribed to an intrinsic limitation in handling sparse data because such low population data will not provide sufficient spatial contexts to infer interaction coefficients. Despite improved performance, the use of density-specific models gave rise to an increased computational burden and inconvenience. In the example here, we had to develop seven individual deep learning networks (i.e., for population densities of 10~70%), the number of which increases if the population densities in new images do not match with any of the previously considered levels. One may handle this case by devising an additional heuristic – for instance, if the density level in a new image was 45%, one could average out the inference results obtained from 40% and 50%-density models. It should be realized, however, that this requires additional efforts to optimize these heuristics, which is often not straightforward.In the second approach, we developed deep learning networks (using the top 3 ResNet models) by expanding the training dataset to include the entire range of population sizes from 10 to 70%. Interestingly, this new model (that was trained with a single set of datasets collecting all images across all population densities even though the size is seven times larger than the previous cases) achieved a comparable performance to the case with density-specific models (Fig. 2(b)). The performance was particularly high for the 40 ~ 60% range of density levels, while being poor again when the density was below 20%. A potential reason for this improved performance is that temporal evolution of spatial patterns allows deep learning networks to extract additional information on microbial interactions, which could not be deducible from stagnant snapshot images. We therefore further improved the performance by employing a randomization-based ensemble approach [46]. That is, we aggregated the outputs from the top three ResNet models and took their averages as the final estimations. The ensemble approach produced a better predictive performance than individual models, even the best one in each cell density (Fig. 2(b)). Due to its more robust and accurate predictions we applied this approach in combination with the ensemble estimation in all subsequent cases.
Microbial assembly governed by spatially heterogeneous parameters
We extended our analysis to a larger-size domain where the previous assumption that interaction and growth parameters are invariant in space may not hold. For this purpose, we considered a 450 µm × 450 µm size domain and additionally performed 50 dynamic growth simulations by accounting for spatial heterogeneity of the parameter set in the agent-based model. We divided the computational domain into five subdomains such that the growth and interaction dynamics in each subdomain is governed by independent parameter sets; in each simulation, we randomly assigned five sets of model parameters (see Section 2.1.3 and Supplementary Fig. S1). This configuration sets up simple proof-of-concept simulations for the extended testing of deep learning models. The heterogeneous agent-based model properly generated composite spatial patterns as configured using distinct parameter sets in subdomains (Fig. 3(a)). Depending on how the parameters (randomly determined) were partitioned, both population densities and spatial patterns across subdomains may or may not be distinctive.
Fig. 3
Illustration of (a) an example of the assembly patterns governed by a composite set of 5 different interaction parameters and (b) the sliding windows of image size = 450 ( = 100, =25). There are 225 sliding windows in 15 rows and 15 columns. (c) shows the heatmaps of true interaction distributions when the step size is 25. The orange box and green box in (a) represent the sliding window () and the grid block (), respectively. (c) shows the heatmap of averaged interactions (left) and standard deviations of neighboring interactions (right). aGR and aRB represent the effect of R(red) on G(green) and the effect of G on R, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Illustration of (a) an example of the assembly patterns governed by a composite set of 5 different interaction parameters and (b) the sliding windows of image size = 450 ( = 100, =25). There are 225 sliding windows in 15 rows and 15 columns. (c) shows the heatmaps of true interaction distributions when the step size is 25. The orange box and green box in (a) represent the sliding window () and the grid block (), respectively. (c) shows the heatmap of averaged interactions (left) and standard deviations of neighboring interactions (right). aGR and aRB represent the effect of R(red) on G(green) and the effect of G on R, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)Our goal was to infer spatial variation of interactions using the previously developed top three ResNets without any additional training. This requires a special approach because deep learning networks were trained using the data generated under a completely different configuration. Thus, we extracted local interactions from a predefined 100 µm × 100 µm-sized window that slides through the space to scan the entire domain (see Section 2.2.5). With the step size of 25 µm, the 450 µm × 450 µm domain is filled with 225 (=15 × 15) smaller windows (e.g., Fig. 3(b)). Due to overlapping among windows, each grid is assigned with multiple values of interaction parameters (inferred from deep learning networks). We determined the grid-specific interaction parameters by taking an average of a set of values obtained from neighbor windows.The sliding window approach successfully inferred interaction parameters from composite spatial patterns (averaged R2: 0.79) (Supplementary Fig. S6). The ability to predict the spatial variation of interaction parameters was remarkable considering that deep learning networks were never previously fed with those data. Comparison of interaction parameter distributions demonstrates the performance of our approach in a more dramatic way. The example shown in Fig. 4 includes four different types of interactions: amensalism (subdomains 1 and 5), parasitism (subdomain 2), commensalism (subdomain 3), and competition (subdomain 5), all of which were correctly inferred from the sliding window method. Our approach also enabled visually identifying the borderlines across subdomains that are governed by distinct sets of parameters because they are characterized by relatively higher variations of predicted interaction coefficients as shown in Fig. 4. The performance of deep learning networks was consistently verified through all other examples (Supplementary Fig. S7).
Fig. 4
An example of interaction heatmaps, systematically built by aggregating all prediction results from the sliding window estimation, which represented the interaction governing a locally assembled community at microscale (See 2.2.5). Each grid (an overlapping block) can represent (a) the averaged interactions of the neighboring windows and (b) their standard deviations. The original assembly pattern was shown in Supplementary Fig. S9 and the left panels of (a) show the boundaries of five subdomains governed by different interactions. This visualization facilitates intuitive inspection how similar the target and predicted interaction patterns are. The grid locations of the interaction heatmaps in (a) and (b) corresponds to the coordinate of the input image. Each grid in (a) and (b) respectively shows the average values and standard deviations of interaction coefficients of the neighboring sliding windows that overlie the corresponding grid.
An example of interaction heatmaps, systematically built by aggregating all prediction results from the sliding window estimation, which represented the interaction governing a locally assembled community at microscale (See 2.2.5). Each grid (an overlapping block) can represent (a) the averaged interactions of the neighboring windows and (b) their standard deviations. The original assembly pattern was shown in Supplementary Fig. S9 and the left panels of (a) show the boundaries of five subdomains governed by different interactions. This visualization facilitates intuitive inspection how similar the target and predicted interaction patterns are. The grid locations of the interaction heatmaps in (a) and (b) corresponds to the coordinate of the input image. Each grid in (a) and (b) respectively shows the average values and standard deviations of interaction coefficients of the neighboring sliding windows that overlie the corresponding grid.
The shifts in microbial interactions inferred from real microscopic images
To demonstrate the applicability of the deep learning method for real microscopy images, we co-cultured GFP tagged P. fluorescens and RFP tagged E. coli △chiA (chitinase gene chiA null) with either a chitin polymer or N-acetyl glucosamine and captured images after 36 h of incubation by fluorescent confocal microscopy (see Section 2.3). These substrates were chosen because chitin polymers must first be enzymatically hydrolyzed to oligomers and monomers before the sugar can be utilized for energy and growth and is thus ideally suited for studying metabolic co-dependencies between populations. Because E. coli △chiA is incapable of degrading chitin, E. coli will depend on the chitinolytic activity of P. fluorescens to acquire chitin hydrolysis products (e.g. monomers and oligomers of N-acetyl glucosamine) in the microcosms supplied with chitin as the sole supplied carbon source. Therefore, resource accessibility would be determined by the species spatial organization in a given environment (Supplementary Fig. S8(a)). On the other hand, both species will compete for resources in the microcosms fed with the readily accessible chitin monomer, N-acetyl glucosamine (Supplementary Fig. S8(b)).Using the sliding window method described in the previous section, we were able to extract the local variation of interaction parameters from real images (because it is difficult to assume that spatial distributions of nutrient concentrations and population densities are perfectly uniform), while the level of their variation was insignificant in our experimental settings (Fig. 5). Overall predictions were consistent with our expectations, i.e., the deep learning networks predicted a degrader-cheater relationship between P. fluorescens and E. coli △chiA when grown on chitin polymers (Fig. 5(a)), but a competitive relationship when grown on their hydrolysis products (Fig. 5(b)). The latter case was clearly indicated by negative values of interaction coefficients, both (the influence of E. coli on P. fluorescens) (right top panel of Fig. 5(b)) and (the influence of P. fluorescens on E. coli) (right bottom panel of Fig. 5(b). In the case of growth on chitin polymers, the influence of E. coli on P. fluorescens was predicted to qualitatively vary in space (i.e., positive in the upper area; negative in the bottom area) (right top panel of Fig. 5(a)), while the effect of P. fluorescens on E. coli was predicted to be positive throughout the entire domain (right bottom panel of Fig. 5(a)). While local, the prediction of the positive influence of E. coli on P. fluorescens
) in the upper area mismatched our expectation. Unlike Fig. 5(b), the image in Fig. 5(a) shows that P. fluorescens is predominant while the presence of E. coli is minor. Thus, two possibilities are indicated: 1) our deep learning network was not sufficiently trained against those situations, or 2) images (dominated by one species) pose an intrinsic challenge in accurately predicting the influence of the minor species on its partners. Despite this uncertainty, we stress that our deep learning model demonstrated unprecedented performance in inferring microbial interactions with acceptable accuracy, while predicting specific interspecies interactions from snapshot images is far from intuitive.
Fig. 5
Interaction heatmaps automatically predicted from species level assembly patterns (i.e., real microscopy images) influenced by various trophic dynamics. These images were taken from the co-culture experiments of P. fluorescens (Green fluorescent) and E. coli △chiA (Red fluorescent) in (a) chitin beads and (b) chitin monomer N-acetyl glucosamine (See 2.3). (a) Indicates that, when cultured with chitin, P. fluorescens positively affects E. coli △chiA growth since the heatmap in the bottom right panel shows a reddish color that stands for the strong positive interaction of the P. fluorescens on E. coli. In contrast, the top right panel of (a) shows E. coli has no significant effect on P. fluorescens growth (interaction coefficients reside between −0.2 to +0.2). In (b), the negative effects of P. fluorescens (a became a bit stronger than its counterpart in the left side. However, as it went to the right, the negative effects of E. coli (a did get much stronger than the other. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Interaction heatmaps automatically predicted from species level assembly patterns (i.e., real microscopy images) influenced by various trophic dynamics. These images were taken from the co-culture experiments of P. fluorescens (Green fluorescent) and E. coli △chiA (Red fluorescent) in (a) chitin beads and (b) chitin monomer N-acetyl glucosamine (See 2.3). (a) Indicates that, when cultured with chitin, P. fluorescens positively affects E. coli △chiA growth since the heatmap in the bottom right panel shows a reddish color that stands for the strong positive interaction of the P. fluorescens on E. coli. In contrast, the top right panel of (a) shows E. coli has no significant effect on P. fluorescens growth (interaction coefficients reside between −0.2 to +0.2). In (b), the negative effects of P. fluorescens (a became a bit stronger than its counterpart in the left side. However, as it went to the right, the negative effects of E. coli (a did get much stronger than the other. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Conclusions
Here, we proposed supervised deep learning as a new network inference method to predict interspecies interactions from nonconventional data, i.e., spatial patterns of microorganisms. Due to the unsuitability of real image data for training deep learning models, we employed agent-based models for data generation. Agent-based modeling is commonly used to simulate the self-organization of interacting microorganisms in space and time as governed by specific interaction and growth rules. Our use of the agent-based model was exactly the opposite, i.e., we used their simulation results to train deep learning network models for computational inference of interspecies interactions (i.e., for reverse modeling). Through case studies considered in this work, we demonstrated that: 1) agent-based modeling can serve as an effective source of in silico images (for training deep learning network models); 2) in combination with the sliding windows technique, the deep learning network trained on spatially homogeneous environments can extract local variances of interaction parameters in heterogeneous environments; and 3) in the application to real images, the resulting deep learning network can correctly differentiate context-dependent microbial interactions.Challenges could arise in extending the developed computational method to predict interspecies interactions in complex, natural microbial communities that contain diverse microbial species including microeukaryotes as well as bacteria, which are mostly unculturable. For example, the predictive power of deep neural networks is dependent on the accuracy of in silico data, which is an intrinsic issue with all machine learning methods that use model-generated data for training. Currently, reliable simulation of spatial patterns formed by multiple interacting species remains challenging, often due to the lack of fundamental knowledge of growth characteristics of individual species in environmental communities. However, this gap is expected to be minimized through rapid advancement in simulation methods [47], [48], as well as cultivation techniques [49], [50] that enable building structurally tractable synthetic consortia derived from complex communities to provide molecular omics data that can inform agent-based modeling for more accurate simulations.Our work provides a generalizable pipeline for developing deep learning models from image data, which can serve as a basic platform for further development by combining with strategic data generation through robust design of experiments and transfer learning and fine-tuning using pre-trained models [51] (e.g., to overcome lack of experimental image data [22]). Co-design of experimental and modeling studies following the procedures proposed in this work is our key contribution that can facilitate to reveal key but previously unknown interaction mechanisms in complex microbial communities that have been underexplored to date.
Author statement
J.-Y.L., N.C.S, and H.-S.S. designed the study and wrote the manuscript. J.-Y.L. trained and tested deep learning networks. H.-S.S. performed agent-based model simulations to generate in silico images. N.C.S. performed experiments to collect microscopic images. R.G.E. transformed P. fluorescens, and C.R.A., K.S.H., and J.K.J. contributed to the design of experiments. All authors read and approved the final manuscript.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Ferdi L Hellweger; Robert J Clegg; James R Clark; Caroline M Plugge; Jan-Ulrich Kreft Journal: Nat Rev Microbiol Date: 2016-06-06 Impact factor: 60.633
Authors: Jared L Wilmoth; Peter W Doak; Andrea Timm; Michelle Halsted; John D Anderson; Marta Ginovart; Clara Prats; Xavier Portell; Scott T Retterer; Miguel Fuentes-Cabrera Journal: Front Microbiol Date: 2018-02-06 Impact factor: 5.640
Authors: Rudolf O Schlechter; Hyunwoo Jun; Michał Bernach; Simisola Oso; Erica Boyd; Dian A Muñoz-Lintz; Renwick C J Dobson; Daniela M Remus; Mitja N P Remus-Emsermann Journal: Front Microbiol Date: 2018-12-12 Impact factor: 5.640
Authors: Hyun-Seob Song; Joon-Yong Lee; Shin Haruta; William C Nelson; Dong-Yup Lee; Stephen R Lindemann; Jim K Fredrickson; Hans C Bernstein Journal: Front Microbiol Date: 2019-06-11 Impact factor: 5.640
Authors: Joon-Yong Lee; Shin Haruta; Souichiro Kato; Hans C Bernstein; Stephen R Lindemann; Dong-Yup Lee; Jim K Fredrickson; Hyun-Seob Song Journal: Front Microbiol Date: 2020-01-21 Impact factor: 5.640
Authors: Aaron Yip; Julien Smith-Roberge; Sara Haghayegh Khorasani; Marc G Aucoin; Brian P Ingalls Journal: PLoS Comput Biol Date: 2022-10-13 Impact factor: 4.779
Authors: Vardan Andriasyan; Artur Yakimovich; Anthony Petkidis; Fanny Georgi; Robert Witte; Daniel Puntener; Urs F Greber Journal: iScience Date: 2021-05-15