Silvia Zaoli (1), Jacopo Grilli (1). (1) Quantitative Life Sciences section, The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy.
Abstract
The large taxonomic variability of microbial community composition results from the combination of environmental variability, mediated through ecological interactions, and stochasticity. Most analyses that aim to infer the biological factors determining these differences in community structure start by quantifying how similar communities are in their composition, through beta-diversity metrics. The central role that these metrics play in microbial ecology is not matched by a quantitative understanding of their relationships and statistical properties. In particular, we lack a framework that reproduces the empirical statistical properties of beta-diversity metrics. Here we take a macroecological approach and introduce a model that reproduces the statistical properties of community similarity. The model is based on the statistical properties of individual communities and on a single tunable parameter, the correlation of species' carrying capacities across communities, which sets how different two communities are. The model quantitatively reproduces the empirical values of several commonly used beta-diversity metrics, as well as the relationships between them. In particular, this modeling framework naturally reproduces the negative correlation between overlap and dissimilarity, which has been observed in both empirical and experimental communities and was previously related to the existence of universal features of community dynamics. In this framework, such a correlation emerges naturally from the effect of random sampling.
A surprisingly large number of microbial species is found in a spoonful of soil or a drop of water sampled at a single location and time [1]. These large values of alpha-diversity are paralleled by high beta-diversity: the taxonomic composition would be different if the sample were collected at a different time or in a different location [2].
A primary objective of microbial ecology is to link the observed variability of taxonomic composition with its mechanistic causes. To this end, from the first environmental assays of the late 1980s to today’s large sequencing efforts, one of the main goals of the analysis of microbial community data has been to disentangle the predictable, replicable variation of community composition (the “signal”) from contingent, non-replicable, uninformative variability (the “noise”). Methods to identify replicable temporal or spatial patterns in the change of community composition typically rely on some measure of dissimilarity between communities or, equivalently, on a beta-diversity metric. Such a measure defines a “distance” between communities, which can ultimately be used to compare and cluster them. For example, a commonly used model-free approach is Principal Coordinate Analysis [3], which takes as input a matrix of sample-to-sample distances and identifies the coordinates that explain most of the variation between samples. This method makes it possible to infer which variables are the most relevant sources of variation and to identify clusters of similar samples. For instance, at the global scale, the clusters identified by comparing samples of microbial communities from different environments around the world are well explained by environment type [1].
At a smaller scale, the composition of gut microbial communities is associated with host clinical markers and lifestyle factors [4].
Despite the centrality of similarity and beta-diversity metrics in the analysis pipelines of microbial ecology, we lack a mechanistic understanding of which aspects of community variability influence their values. Here we aim to formulate a quantitative phenomenological framework able to reproduce the observed statistical properties of community similarity. The dissimilarity between two communities is caused by both signal and noise, but we lack a modeling framework that can be used to assess the effect of each. The sampling nature of the data also has a strong effect on several beta-diversity metrics [5], as it adds a further source of noise and a bias in the observations, and should be considered explicitly.
Macroecology is a promising avenue for filling this gap. By characterising the statistical properties of community composition, macroecology provides access to quantities that are reproducible across systems. In perspective, a macroecological approach could make it possible to disentangle the statistical properties of contingent, non-reproducible noise from the reproducible statistical features of environmental variability.
Most efforts in macroecology have focused on describing and predicting alpha-diversity. For instance, the species abundance distribution (SAD) of empirical microbiomes is well characterized across ecosystems [6]. The abundance fluctuation distribution (AFD) is well described by a Gamma distribution, while the distribution of the mean abundance of species is typically well described by a lognormal [7].
One can extend the macroecological description to dynamics and characterise the variability of species abundance and diversity across timescales [8, 9].
One example of the study of beta-diversity from a macroecological perspective is the dissimilarity-overlap analysis (DOA) [10], in which beta-diversity metrics are used to infer the ecological mechanisms underlying the differences in composition between samples. The DOA is based on two beta-diversity measures, dissimilarity and overlap. The dissimilarity between two communities measures the differences between the relative abundances of the species present in both samples. The overlap measures the probability that an individual picked from one of the two samples belongs to a species that is present in both samples. These two metrics capture, in principle, two distinct aspects of community variability and should therefore vary independently. However, when they are plotted against each other for a set of samples, in what is termed the Dissimilarity-Overlap curve (DOC), both natural [10] and experimental [11] communities display a decreasing pattern: communities with high overlap tend to have low dissimilarity. The robustness of this DOC pattern suggests that it could be explained by a robust, general process. The leading interpretation is that a decreasing DOC is a consequence of the universality of the dynamics [10]: different communities are subject to the same ecological dynamics, characterised by the same parameters, and differ because they occupy different dynamical attractors. Other studies have shown that mechanisms other than universal dynamics might be responsible for a decreasing DOC [12]. One limitation of these observations and their interpretation is that they are mostly qualitative. For instance, both the empirical data and models based on environmental gradients [12] produce negative DOCs. But are the empirical and modeled DOCs in quantitative agreement?
It is not trivial to answer this question, because most of the available (null) models cannot be easily parameterized using the available data.
More generally, one could generalize the DOA by comparing the values of the several existing beta-diversity metrics. Empirical values of different beta-diversity measures are in fact correlated [13] but, again, we lack a quantitative understanding of their relationship.
Here, we introduce a model of community composition that quantitatively reproduces the empirical values of beta-diversity and the relationships between different beta-diversity metrics. The model includes two sources of variability for microbial communities. The first is temporal stochasticity, corresponding to the non-reproducible sources of variation. This variability was shown to be well described by the stochastic logistic model (SLM), a model that describes the temporal evolution of species abundances under stochastic environmental noise [7, 14]. According to this model, species abundances fluctuate in time around a constant typical abundance. The second source of variability concerns how this typical abundance differs across communities [9], and represents the reproducible part of the variability. The magnitude of this difference across communities, which in our model is controlled by a single parameter, can be larger or smaller. Both sources of variability are modelled phenomenologically, and several mechanisms could underlie them. For instance, ecological interactions contribute to both, as they may mediate rapid abundance fluctuations and be the origin of alternative stable states in community dynamics. Environmental factors also contribute to both, either in the form of rapid environmental fluctuations or of differences in the overall environmental conditions across communities.
Importantly, our model also explicitly incorporates the sampling process, allowing us to study the effect of sampling on the relationship between beta-diversity metrics and, in particular, on the DOC.
We compare the model predictions with empirical data on the gut microbiome of different human hosts (see Methods). By varying the parameter that measures the difference across communities, the model jointly reproduces several beta-diversity metrics both within and across hosts. As a consequence, it also quantitatively reproduces Dissimilarity-Overlap curves, uncovering how random sampling introduces a relationship between these two metrics, which are independent in principle but not in practice.
Results
A model for community composition with tunable similarities
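The model for community composition that we propose is based on the statistical properties of empirical microbial communities, which describe how they change across time and space. The stationary fluctuations in time of the abundance λ of an OTU i are well described by the stochastic logistic model (see [7] and Methods). Consequently, at stationarity the abundance is Gamma distributed,
$$P(\lambda \mid K, \sigma) = \frac{1}{\Gamma(2/\sigma - 1)} \left(\frac{2}{\sigma K}\right)^{2/\sigma - 1} \lambda^{2/\sigma - 2} \exp\left(-\frac{2\lambda}{\sigma K}\right), \qquad (1)$$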
where K is directly related to the carrying capacity of OTU i, and σ ∈ [0, 2) is related to the level of environmental variability (see Methods). Given the compositional nature of the data, the carrying capacity of OTUs cannot be estimated from the data, as the total abundance is not known. It is, however, possible to show that the values of K are proportional to the carrying capacity, up to an unknown proportionality constant that is sample-specific and common to all species in that sample [7, 9]. Although this constant is unknown, its value can effectively be ignored, as it does not affect the properties of the fluctuations of species abundance [7]. For simplicity, and with this important caveat in mind, we will still refer to K as the carrying capacity in the following. The values of K and σ for each OTU can be estimated from the time series of its abundance (see Methods). Our previous analyses [9] showed that the value of K for an OTU remains constant for long stretches of time. Over timescales of weeks, Eq 1, together with a set of values of K and σ, characterises the time variability of composition.
Comparing the composition across hosts, it is known that differences in K, together with random fluctuations of abundance, explain almost all of the dissimilarity between hosts [9]. Differences between communities in the values of σ of each OTU, instead, play a much smaller role in differentiating communities. Values of K estimated from the time series of two different hosts are correlated to various degrees (see Tables B, C, D, and E in S1 Text), but always at a significant level. This shows that the carrying capacity is to some extent characteristic of the OTU. As expected [9], however, the correlation is lower than the one obtained by comparing two segments of the time series of a given individual.
This indicates that different communities, and the same individual over time, have values of K that are different, although highly correlated.
The distribution of K is lognormal above a threshold (Fig 1A; dashed lines mark the threshold) [7]. The existence of a threshold is due to sampling: if N is the sampling depth, OTUs with values of K close to or below 1/N might not be sampled. While this is a probabilistic effect, we approximate it as a hard cutoff [7]. We therefore fit a truncated lognormal distribution (black line in Fig 1A) to all the values K > c, where the threshold c differs between environments. For all environments, the fitted lognormal describes the data well above the threshold. It is important to note that, since the values of K vary over time, the inferred distribution captures both the intrinsic variability across OTUs and the variability of each OTU over time. As shown in [9], both these sources of variability are lognormally distributed and combine multiplicatively, resulting in an overall lognormal distribution for the combined effect.
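The truncated lognormal fit described above can be illustrated with a minimal sketch. It assumes a hard cutoff c, works on log K (so the problem becomes a normal distribution truncated from below), and replaces a proper optimiser with a coarse grid search. The function names, the grids and the synthetic parameters are all hypothetical and are not the values fitted in the paper.

```python
import math
import random


def truncated_normal_nll(n, sum_x, sum_x2, mu, s, log_c):
    """Negative log-likelihood of n values of log K, all above log_c,
    under a normal(mu, s) truncated below at log_c.
    Only the sufficient statistics sum(x) and sum(x^2) are needed."""
    # Probability mass of the untruncated normal above the cutoff.
    tail = 0.5 * math.erfc((log_c - mu) / (s * math.sqrt(2.0)))
    if tail <= 0.0:
        return math.inf
    quad = (sum_x2 - 2.0 * mu * sum_x + n * mu * mu) / (2.0 * s * s)
    return quad + n * (math.log(s) + 0.5 * math.log(2.0 * math.pi) + math.log(tail))


def fit_truncated_lognormal(k_values, c, mu_grid, s_grid):
    """Grid-search maximum likelihood for (mu, s), using only values K > c."""
    log_c = math.log(c)
    xs = [math.log(k) for k in k_values if k > c]
    n, sum_x, sum_x2 = len(xs), sum(xs), sum(x * x for x in xs)
    best = min(
        (truncated_normal_nll(n, sum_x, sum_x2, mu, s, log_c), mu, s)
        for mu in mu_grid
        for s in s_grid
    )
    return best[1], best[2]


# Synthetic check with made-up parameters (an assumption of this sketch).
rng = random.Random(0)
true_mu, true_s, cutoff = 0.0, 1.0, math.exp(-0.5)
k_values = [math.exp(rng.gauss(true_mu, true_s)) for _ in range(5000)]
mu_grid = [-1.0 + 0.05 * i for i in range(41)]
s_grid = [0.5 + 0.05 * i for i in range(31)]
mu_hat, s_hat = fit_truncated_lognormal(k_values, cutoff, mu_grid, s_grid)
print(mu_hat, s_hat)
```

When the cutoff lies far above the mean of log K, as the parameters reported in Fig 1 suggest for the data, μ and s become only weakly identifiable along a ridge of the likelihood, so a real fit would likely need a finer search than this grid.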
Fig 1
Parametrization of the model.
A) Distributions of K for each individual. Colored points and lines are normalized histograms of the data. The black line is a maximum likelihood fit to a truncated lognormal distribution, with truncation at 10^(−4.5), of the values from all individuals (parameters of the fitted lognormal: μ = −19.85, s = 4.93). The dashed line marks the cutoff due to finite sampling: below this value the effect of sampling produces deviations from the lognormal distribution. B) Distribution of σ² for each individual (colored points and lines are normalized histograms of the data). The black line is a maximum likelihood fit to an exponential distribution of the values from all individuals (mean = 0.93). All individuals are characterized by the same distributions of carrying capacity K and variability σ.
We also explored the heterogeneity of environmental variability σ across species, which has been previously neglected. We find that the distribution of σ² is exponential with mean 0.93 (see Fig 1B).
To formulate our model, we start by observing that different communities are characterized by the same parameters of the lognormal distribution of carrying capacities and of the exponential distribution of the variability σ². See, for example, the distributions corresponding to the gut communities of different individuals in Fig 1A and 1B. Therefore, we assume that K and σ are identically distributed across communities. We further assume that the total number S of OTUs present in a community, including the ones that are present but undetected, is the same across communities.
The aim of our model is then to generate pairs of communities with different levels of similarity. Based on the previous observations on the variability of K and σ across communities [9], we assume that an OTU i has the same value of σ but different values of K in two different communities.
The correlation ρ between the logarithms of the carrying capacities, obtained by averaging across OTUs, tunes the level of similarity of the two communities.
To generate a pair of communities, we generate S pairs of values from a bivariate Gaussian distribution with mean and variance equal to those of log K and with correlation ρ. Each pair of values corresponds to an OTU i and represents the logarithm of its carrying capacity, log K, in the two communities. For each OTU i we also draw a value of σ², common to the two communities, from the corresponding exponential distribution. Given the set of K and σ values for a community, we draw the real abundance λ of each OTU i from the Gamma distribution in Eq (1) with parameters K and σ. Finally, we draw a finite sample of the community with a total number of reads N from a multinomial distribution with N trials.
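The generative procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the Gamma in Eq (1) is taken to have shape 2/σ − 1 and scale σK/2 (the usual stationary form of the SLM), σ is taken as the square root of the exponentially distributed σ², and draws with σ ≥ 2 are redrawn so that the Gamma shape stays positive. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def draw_sigma(mean_sigma2, rng):
    """Draw sigma with sigma^2 ~ Exponential(mean = mean_sigma2).
    Redrawing until 0 < sigma < 2 is an assumption of this sketch."""
    while True:
        sigma = math.sqrt(rng.expovariate(1.0 / mean_sigma2))
        if 0.0 < sigma < 2.0:
            return sigma


def sample_reads(abundances, n_reads, rng):
    """Multinomial sampling: n_reads draws, probabilities proportional to abundances."""
    counts = Counter(rng.choices(range(len(abundances)), weights=abundances, k=n_reads))
    return [counts[i] for i in range(len(abundances))]


def generate_pair(n_otus, rho, mu, s, mean_sigma2, n_reads, rng):
    """Generate read counts for two communities whose log K have correlation rho."""
    log_k_a, log_k_b, lam_a, lam_b = [], [], [], []
    for _ in range(n_otus):
        # Correlated Gaussian pair built from two independent standard normals.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xa = mu + s * z1
        xb = mu + s * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        # Same sigma in both communities, independent Gamma fluctuations.
        sigma = draw_sigma(mean_sigma2, rng)
        shape = 2.0 / sigma - 1.0
        lam_a.append(rng.gammavariate(shape, sigma * math.exp(xa) / 2.0))
        lam_b.append(rng.gammavariate(shape, sigma * math.exp(xb) / 2.0))
        log_k_a.append(xa)
        log_k_b.append(xb)
    return {
        "log_k_a": log_k_a,
        "log_k_b": log_k_b,
        "reads_a": sample_reads(lam_a, n_reads, rng),
        "reads_b": sample_reads(lam_b, n_reads, rng),
    }


# Small illustrative run; sizes are far below those used in the paper.
pair = generate_pair(n_otus=500, rho=0.8, mu=-19.0, s=5.0,
                     mean_sigma2=0.93, n_reads=2000, rng=random.Random(1))
same_k = generate_pair(n_otus=50, rho=1.0, mu=-19.0, s=5.0,
                       mean_sigma2=0.93, n_reads=500, rng=random.Random(2))
print(sum(1 for r in pair["reads_a"] if r > 0), "OTUs observed in sample A")
```

Setting rho to 1 gives two communities with identical K that still differ through the independent Gamma fluctuations and the sampling step, which is how pairs of samples from the same community at different times are mimicked in the text.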
By tuning the correlation of carrying capacity the model predicts a wide range of values of beta-diversity
We use the model formulated in the previous section to generate in-silico communities with a range of community similarities, obtained by varying the correlation ρ between carrying capacities. We use this ensemble of communities to study the values of different beta-diversity measures.
In particular, by tuning the value of ρ, we simulate pairs of samples from different communities and from the same community at different times. For each pair we compute six different beta-diversity measures: i) Jaccard similarity, ii) Sørensen similarity index, iii) Whittaker index, iv) Morisita-Horn similarity, v) Bray-Curtis dissimilarity, and vi) Horn similarity. These commonly used measures were chosen to cover different types of metric: those accounting for presence-absence, for abundance, or for both, and those more or less focused on common rather than rare species. These beta-diversity measures have correlated values in the synthetic communities (see Fig 2, where the measures are all plotted against the input value of ρ). The correlation can be positive or negative because some metrics measure similarity and others dissimilarity. Note that the maximal level of correlation, ρ = 1, does not correspond to the theoretical maximum level of similarity (e.g. 1 for Jaccard or 0 for Bray-Curtis). Each OTU is in fact also subject to the stochastic environmental fluctuations of the stochastic logistic model, which are by definition independent across communities. These fluctuations, together with the effect of finite sampling, contribute to a decreased level of similarity.
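For concreteness, the six measures can be computed from two vectors of read counts as in the sketch below. The formulas are common textbook definitions and are an assumption here, since the exact definitions used in the paper are given in its Methods and may differ in detail (for example, in whether Bray-Curtis is computed on counts or on relative abundances). The function names are hypothetical.

```python
import math


def relative(counts):
    """Convert read counts to relative abundances."""
    total = float(sum(counts))
    return [v / total for v in counts]


def xlogx(v):
    """v * ln(v), with the convention 0 * ln(0) = 0."""
    return v * math.log(v) if v > 0 else 0.0


def beta_diversity(x, y):
    """Six beta-diversity measures for two samples given as read counts."""
    a = {i for i, v in enumerate(x) if v > 0}  # OTUs observed in sample x
    b = {i for i, v in enumerate(y) if v > 0}  # OTUs observed in sample y
    shared, union = len(a & b), len(a | b)
    p, q = relative(x), relative(y)
    sum_pq = sum(pi * qi for pi, qi in zip(p, q))
    sum_p2 = sum(pi * pi for pi in p)
    sum_q2 = sum(qi * qi for qi in q)
    horn_num = sum(xlogx(pi + qi) - xlogx(pi) - xlogx(qi) for pi, qi in zip(p, q))
    return {
        # Presence-absence measures.
        "jaccard": shared / union,
        "sorensen": 2.0 * shared / (len(a) + len(b)),
        "whittaker": union / ((len(a) + len(b)) / 2.0) - 1.0,
        # Abundance-based measures, computed on relative abundances.
        "bray_curtis": 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q)),
        "morisita_horn": 2.0 * sum_pq / (sum_p2 + sum_q2),
        "horn": horn_num / (2.0 * math.log(2.0)),
    }


example = beta_diversity([10, 0, 5, 5], [5, 5, 0, 10])
identical = beta_diversity([3, 1, 0], [3, 1, 0])
print(example)
```

With these definitions and two samples, the Whittaker index equals one minus the Sørensen index, which is one reason the measures plotted in Fig 2 are expected to be strongly related.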
Fig 2
Relationships between dissimilarity measures for communities generated with the model.
The different dissimilarity measures (A: Jaccard similarity, B: Sørensen index, C: Whittaker index, D: Bray-Curtis dissimilarity, E: Horn similarity, F: Morisita-Horn similarity) are plotted against the carrying capacity correlation ρ. Grey circles represent the 200 pairs of communities generated with the model. Each community has S = 10^4 OTUs. The two communities of each pair have the same σ, extracted from an exponential distribution with mean 0.9. Values of K are extracted from a lognormal distribution with parameters μ = −19 and s = 5. 100 pairs have the same values of K, to mimic samples from the same community at different times. The remaining 100 pairs have correlated values of K, with ρ ranging between 0.5 and 1, obtained by exponentiating values extracted from a bivariate Gaussian distribution. For each community, abundances are extracted from a Gamma distribution with parameters K and σ. For the pairs with the same values of K, Gamma-distributed abundances have a correlation ranging from 0 to 0.5. Reads are obtained from the real abundances by simulating multinomial sampling with number of reads 3 × 10^4. Black lines are the binned average of the grey circles.
The model reproduces the empirical relationships between dissimilarity measures
We compare the empirical relationships between beta-diversity measures with the ones obtained from the model. A critical problem in making this comparison is that the values of K are not known for individual samples and, therefore, the value of ρ is unknown for any pair of samples.
To circumvent this problem, we used our model to infer the value of ρ for any pair of samples (see Methods). We notice that, in our in-silico data, there is a simple relationship between the Spearman correlation of the abundances of a pair of samples and their value of ρ (see S1 Fig). By characterizing this relationship, we are able to infer the value of ρ for a pair of samples.
Given the inferred value of ρ, we then generate a pair of in-silico samples. In addition to ρ, to fully specify the in-silico data we used the fitted distributions of K and σ, a number of reads equal to the average number of reads in the empirical samples, and a number of OTUs S equal to the estimated number (see SI section 2). The relationships predicted by the model follow the empirical patterns quantitatively (Fig 3).
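The inference step can be sketched as a two-stage procedure: compute the Spearman correlation of the read counts of a pair of samples, then map it to ρ through a calibration curve obtained from in-silico pairs. The sketch below is only an illustration. The calibration points are hypothetical placeholders, linear interpolation is an assumed way of characterising the relationship, and which OTUs enter the Spearman correlation (all, or only the observed ones) is left open because it is specified in the Methods.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def infer_rho(spearman_value, calibration):
    """Invert a monotone calibration curve given as (spearman, rho) points,
    using linear interpolation and clamping outside the calibrated range."""
    pts = sorted(calibration)
    if spearman_value <= pts[0][0]:
        return pts[0][1]
    if spearman_value >= pts[-1][0]:
        return pts[-1][1]
    for (s0, r0), (s1, r1) in zip(pts, pts[1:]):
        if s0 <= spearman_value <= s1:
            t = (spearman_value - s0) / (s1 - s0)
            return r0 + t * (r1 - r0)


# Hypothetical calibration points; real ones would come from simulated pairs.
calibration = [(0.2, 0.5), (0.4, 0.75), (0.6, 1.0)]
reads_a = [10, 0, 5, 5, 1]
reads_b = [5, 5, 0, 10, 2]
rho_hat = infer_rho(spearman(reads_a, reads_b), calibration)
print(rho_hat)
```

In practice the calibration curve would be built by simulating many pairs at known ρ with the same S, N and fitted distributions as the data, so that the effects of sampling on the Spearman correlation are the same in both.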
Fig 3
Comparison between the relationships between dissimilarity measures in empirical data and according to the model.
The different dissimilarity measures (A: Jaccard similarity, B: Sørensen index, C: Whittaker index, D: Bray-Curtis dissimilarity, E: Horn similarity, F: Morisita-Horn similarity) are plotted against the carrying capacity correlation ρ. Black circles correspond to pairs of empirical samples from the same host, while grey squares correspond to pairs of empirical samples from different hosts (but of the same dataset). The value of ρ for individual samples is inferred as specified in the Methods. Red dots are the binned average of the predictions of the model. The model is simulated with the distributions of K and σ fitted from the data, and with a number of species equal to the one estimated (see Section B in S1 Text). The number of reads is equal to the average number of reads for the empirical samples, 3 × 10^4.
This suggests that the ingredients of the model are sufficient to capture the statistical features of communities, and of the differences between communities, that determine the relationships between beta-diversity measures. In Table A in S1 Text, we report the proportion of the variation observed in the data that is explained by our model for each beta-diversity metric.
Overlap-dissimilarity negative relationship is expected under finite sampling
In this section we consider the two beta-diversity metrics, Dissimilarity and Overlap, introduced in [10] (see Methods). Our model shows that a decreasing Overlap-Dissimilarity curve can emerge purely because of sampling, and therefore might not reflect any ecological property of the communities.
For the communities generated with the model, prior to sampling, overlap and dissimilarity are completely independent. In fact, all pairs of communities have overlap equal to 1, as all species are always present. However, after simulating the sampling, a non-trivial Dissimilarity-Overlap curve emerges (Fig 4A). For pairs of communities with a medium to high correlation ρ between their (log) values of K, the Dissimilarity-Overlap curve has a decreasing pattern. The two insets of panel A clarify why this pattern emerges. The insets show the abundances of two pairs of communities, one with a high ρ and one with a low ρ. The dissimilarity of the pair of samples is computed on the OTUs that are sampled in both communities, i.e. the blue points. It is clear from the insets that the abundances of the blue OTUs are more dissimilar when ρ is low. The overlap, instead, depends on the abundance of the OTUs observed in both samples (blue) relative to that of all the observed ones (blue + orange). This quantity also has a clear pattern with ρ, because the proportion of orange OTUs diminishes when ρ increases. These two effects, caused by sampling, create the decreasing pattern in the Dissimilarity-Overlap curve.
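The two quantities discussed above can be computed from a pair of sampled communities as in the following sketch. It assumes the definitions commonly attributed to [10]: overlap as the average, over the two samples, of the total relative abundance of the shared OTUs, and dissimilarity as the root Jensen-Shannon divergence between the relative abundances renormalised over the shared OTUs. The paper's own definitions are in its Methods, and the use of natural logarithms and the function name are assumptions of this sketch.

```python
import math


def overlap_and_dissimilarity(x, y):
    """Overlap and dissimilarity of two samples given as read counts.
    Returns (overlap, dissimilarity); dissimilarity is None if no OTU is shared."""
    total_x, total_y = float(sum(x)), float(sum(y))
    p = [v / total_x for v in x]
    q = [v / total_y for v in y]
    # OTUs sampled in both communities (the "blue points" of Fig 4A).
    shared = [i for i in range(len(x)) if x[i] > 0 and y[i] > 0]
    if not shared:
        return 0.0, None
    overlap = sum((p[i] + q[i]) / 2.0 for i in shared)
    # Renormalise relative abundances over the shared OTUs only.
    p_shared = sum(p[i] for i in shared)
    q_shared = sum(q[i] for i in shared)
    p_hat = [p[i] / p_shared for i in shared]
    q_hat = [q[i] / q_shared for i in shared]
    # Jensen-Shannon divergence with natural logarithms.
    jsd = 0.0
    for pi, qi in zip(p_hat, q_hat):
        mi = (pi + qi) / 2.0
        jsd += 0.5 * pi * math.log(pi / mi) + 0.5 * qi * math.log(qi / mi)
    return overlap, math.sqrt(max(jsd, 0.0))


overlap, dissimilarity = overlap_and_dissimilarity([10, 0, 5, 5], [5, 5, 0, 10])
same_overlap, same_dissimilarity = overlap_and_dissimilarity([3, 1, 0], [3, 1, 0])
print(overlap, dissimilarity)
```

Before sampling every OTU has a positive abundance in both communities, so the shared set is the full set and the overlap is exactly 1; only the zeros introduced by finite sampling can lower it.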
Fig 4
Overlap-Dissimilarity curves in the model and in empirical data.
A) Relationship between Overlap and Dissimilarity for communities generated with the model. The color of the circles corresponds to the correlation ρ of the values of K of the community pair, ranging from 0.6 to 1. The two insets show scatter plots of the abundances in two pairs of communities, one with a high ρ and one with a low ρ. Blue circles represent OTUs sampled in both communities, orange circles OTUs sampled in only one community, and red circles OTUs sampled in neither. The dotted lines mark 1/N. B) Relationship between overlap and dissimilarity in empirical data (black circles: samples from the same hosts, grey squares: samples from different hosts) and according to the model (red circles, binned average of model prediction). The inset shows the same plot with a logarithmic scale on the x axis. For the main plot, the binned average of the model prediction is performed along the y axis, to better capture the pattern at high overlap values.
To verify whether the model can quantitatively explain the Dissimilarity-Overlap curve of empirical data, we compare the empirical pattern with that obtained from simulations of the model. The model reproduces the Dissimilarity-Overlap curve of the empirical data well (Fig 4B). This result indicates that the relationship created by sampling between dissimilarity and overlap is sufficient to explain the decreasing pattern seen in empirical data.
Discussion
In this paper, we have introduced a modeling framework to describe the variability of community composition in space and time. The model is based on the stochastic logistic model (SLM), which describes the time variability and identifies the parameters that characterize a community, namely the carrying capacity and the noise intensity. Our framework makes it possible to model pairs of communities with different levels of similarity by setting the correlation of their carrying capacities. The model quantitatively reproduces the values of several beta-diversity metrics, which weight in different ways the statistical properties of the variability of community composition. In particular, the model naturally reproduces the negative relationship between overlap and dissimilarity observed in empirical [10] and experimental [11] data.
Our framework is based on several assumptions. In particular, it assumes that the SLM describes well the dynamics and properties of communities, and that the difference in the abundance of species across communities can be captured by considering two SLMs with different carrying capacities. These two assumptions have been extensively studied in previous works. The SLM reproduces the dynamical properties of microbial communities as well as several macroecological patterns observed in empirical data [7, 14]. The variation of carrying capacities suffices to explain the typical difference in composition between communities [9].
A novel assumption, and result, of this work is that the differences in similarity observed across pairs of communities can be fully described by, and collapsed into, a single parameter: the correlation ρ between the (log) carrying capacities in the two communities. The fact that, by varying this parameter, one reproduces the relationships between different beta-diversity metrics is a direct test of this assumption. In principle, other properties could matter in differentiating communities.
For instance, the amplitude of abundance fluctuations σ could in principle differ across communities and be important to explain the observed beta-diversity. Our analysis complements the results of [9] by showing that the variability in σ is negligible from a macroecological perspective. Another element that could in principle determine beta-diversity is the set of species present in a community, which could differ across communities. However, our model shows that differences in carrying capacity, together with finite sampling, are sufficient to explain empirical beta-diversity values without the need to assume that different communities have different presence-absence patterns.
An interesting aspect of the variability σ that we unveil in this paper is that its values have a reproducible distribution across species. Specifically, σ² is exponentially distributed, with a similar scale parameter across individuals. This novel macroecological pattern adds to the list of reproducible statistical properties of microbial communities that a comprehensive theory should be able to reproduce.
Our model does not address the biological origin of the values of the parameters or of the correlation of carrying capacities across individuals. Their variation is potentially due to multiple biotic and abiotic factors, and to the interactions between species. Our framework clarifies that they are the effective dynamical parameters that suffice to explain the statistical properties of community composition.
Once the macroecological patterns are taken into account and the SLM is used to generate in-silico communities, one naturally reproduces the empirical values of beta-diversity metrics. While our model has proven flexible (it can reproduce a wide range of values of similarity) and accurate (it quantitatively reproduces the empirical values), we have not studied here the fluctuations of the beta-diversity metrics or whether their joint distribution is captured by our model.
Most likely, the several mechanisms that the SLM neglects contribute to the shape of these distributions. Characterizing these deviations from our model and linking them to the ecological forces that we did not consider in this work is the natural next step.
Of particular relevance is the ability of our model to reproduce the empirical Dissimilarity-Overlap curve (DOC), which raises questions about its interpretation. Bashan et al. [10] interpret the empirical DOC as a consequence of species dynamics being governed by the same equations and the same parameters across communities. In this view, two communities with a similar set of species (high overlap) would have similar stable states (low dissimilarity) and vice versa, producing a DOC with negative slope. Our analysis points to an alternative origin of the empirical DOCs. For communities described by non-universal parameters (correlated but different carrying capacities), a DOC with negative slope naturally emerges due to finite sampling. Consequently, the DOC would carry no implication about the underlying ecological mechanisms.
Beyond the interpretation of the DOCs, our results speak directly to the original assumptions of the dissimilarity-overlap analysis. The leading assumption behind it is that the two measures, dissimilarity and overlap, are independent. Such a statement about independence is only ever defined in light of a model for community composition: given a null statistical ensemble, the value of the dissimilarity is not correlated with that of the overlap. The model considered in [10] implicitly assumes that the typical abundance of a species and its occupancy (how likely the species is to be present) are independent, which does not hold in the data. Occupancy and average abundance in fact display a strong correlation [15], which is predicted by the SLM with finite sampling [7].
By explicitly including sampling noise, the SLM reproduces the relation between occupancy and abundance observed in the data [7] and therefore qualifies as a more reasonable model with which to test the independence between overlap and dissimilarity. Once the effect of sampling is included, and the non-independence between abundance and occupancy is thereby taken into account, a negative correlation between dissimilarity and overlap naturally emerges (as shown in Fig 4).
One additional interesting prediction of our framework is that, when samples with small enough overlap are included, the DOC should take the shape of an inverted U (Fig 4). Therefore, for small enough values of the overlap, our model predicts DOCs with positive slope. In empirical data, the range of overlap values is not large enough to observe this trend. However, in experimental data [11], considering in vitro communities grown in different nutrients makes it possible to reach smaller values of overlap. In those cases an inverted-U DOC is in fact observed, as predicted by our framework.
Our results add an important step to the quantitative understanding of the structure of microbial communities. By generating more and more realistic in-silico communities, we gain an understanding of which salient features of the data are the direct result of relevant biological and ecological processes, thereby making it possible to disentangle general processes from contingent factors, signal from noise.
Methods
Data
We analyze gut microbial communities. We consider time-series of 14 individuals from three different datasets: ten individuals from the BIO-ML dataset [16] (all those for which a dense long-term time-series is available), the two individuals M3 and F4 from the Moving Pictures dataset [17], and the two individuals A and B from [18]. The length of the time-series ranges from 6 months to 1.5 years, and the sampling frequency varies (daily in the densest series). Individuals A and B from [18] both undergo a period of strong disturbance to their gut flora due, respectively, to two diarrhoea episodes during travel abroad and to a Salmonella infection. We exclude these periods from the analysis and consider for each of these individuals two separate time-series, before and after the perturbation. Only samples with a number of reads N > 10^4 are used. Details on how the raw data were analyzed can be found in the Supplementary Information of [9].
Statistical properties of the fluctuations of abundances
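The stationary fluctuations of OTU abundance have been shown to follow a Gamma distribution [7]:
$$P(\lambda; K, \sigma) = \frac{1}{\Gamma(2/\sigma - 1)} \left(\frac{2}{\sigma K}\right)^{2/\sigma - 1} \lambda^{2/\sigma - 2} \exp\left(-\frac{2\lambda}{\sigma K}\right). \tag{1}$$
Additionally, dynamical properties of these fluctuations suggest that abundance dynamics can be described by a Stochastic Logistic Model with environmental noise [7, 14]:
$$\frac{d\lambda}{dt} = \frac{\lambda}{\tau}\left(1 - \frac{\lambda}{K}\right) + \sqrt{\frac{\sigma}{\tau}}\,\lambda\,\xi(t), \tag{2}$$
where ξ(t) is Gaussian white noise. This model has three parameters: τ has the dimension of a time and determines the time-scale of relaxation to stationarity, K would be the carrying capacity in the absence of noise, and σ measures the intensity of the environmental noise. If σ < 2, the model has a stationary distribution, which is the Gamma distribution in Eq (1). The mean of the stationary distribution is $\langle\lambda\rangle = K(1 - \sigma/2)$ and the variance is $\mathrm{var}(\lambda) = \frac{\sigma}{2 - \sigma}\langle\lambda\rangle^2$. The coefficient of variation, $\sqrt{\sigma/(2 - \sigma)}$, depends only on the parameter σ, which can thus be interpreted as the amplitude of the fluctuations. Grilli (2020) showed that Taylor's Law applies to abundance fluctuations with exponent 2, that is, $\mathrm{var}(\lambda) = C \cdot \langle\lambda\rangle^2$ with C a constant, which implies that σ is not correlated with K.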
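As a rough numerical check of the stationary properties described above, the sketch below integrates the SLM with a plain Euler-Maruyama scheme and compares the simulated mean and squared coefficient of variation with the values expected for the usual parameterization of the model, ⟨λ⟩ = K(1 − σ/2) and CV² = σ/(2 − σ). This is not the authors' code: the Itô reading of the noise term, the change of variables to log-abundance, the function name `simulate_slm`, and all numerical settings (time step, number of trajectories, seed) are assumptions made for illustration.

```python
import numpy as np


def simulate_slm(K, sigma, tau=1.0, dt=0.01, t_max=20.0, n_traj=4000, seed=0):
    """Euler-Maruyama integration of the SLM, written for y = log(lambda).

    Assumed model (Ito convention):
        d(lambda)/dt = (lambda / tau) * (1 - lambda / K)
                       + sqrt(sigma / tau) * lambda * xi(t)
    By Ito's lemma, y = log(lambda) obeys
        dy = (1 - sigma / 2 - exp(y) / K) / tau * dt + sqrt(sigma / tau) * dW.
    """
    rng = np.random.default_rng(seed)
    y = np.full(n_traj, np.log(K))  # every trajectory starts at lambda = K
    noise_scale = np.sqrt(sigma * dt / tau)
    for _ in range(int(round(t_max / dt))):
        drift = (1.0 - sigma / 2.0 - np.exp(y) / K) / tau
        y = y + drift * dt + noise_scale * rng.standard_normal(n_traj)
    return np.exp(y)  # one approximately stationary abundance per trajectory


K, sigma = 1.0, 0.5
lam = simulate_slm(K, sigma)
mean_theory = K * (1.0 - sigma / 2.0)  # expected stationary mean
cv2_theory = sigma / (2.0 - sigma)     # expected squared coefficient of variation
mean_sim = lam.mean()
cv2_sim = lam.var() / lam.mean() ** 2
print(f"mean: simulated {mean_sim:.3f}, expected {mean_theory:.3f}")
print(f"CV^2: simulated {cv2_sim:.3f}, expected {cv2_theory:.3f}")
```

Working in log-abundance keeps λ positive at every step, which a direct discretization of λ would not guarantee; agreement with the expected values is only approximate, since it is limited by the finite number of trajectories and by the time step.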
Estimation of K and σ
The parameters K and σ can be estimated from the mean and variance of the abundance time-series, inverting the expressions for the mean and variance of the stationary distribution. To estimate the variance of abundance from the sampled abundance, we need to use an expression corrected for the sampling bias. In fact, the variance of the sampled abundance is a result of the actual variance of abundance and of the variance due to the random sampling. We use the sampling-corrected estimate as done in [7]
We note that the variance estimated with this formula may turn out negative if many counts are 0 or 1. OTUs for which this happens are excluded from the analysis, as their parameters cannot be estimated.
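The estimation procedure can be sketched as follows. The exact sampling-corrected formula of [7] is not reproduced in this text, so the estimator used here is an assumption: under Poisson sampling, n(n − 1)/N² is an unbiased estimate of the squared relative abundance, which is one common way to remove the sampling contribution and which vanishes for counts of 0 or 1. The function name `estimate_K_sigma` and the toy counts are hypothetical; the inversion uses mean = K(1 − σ/2) and variance = σ/(2 − σ) · mean².

```python
import numpy as np


def estimate_K_sigma(counts, depths):
    """Estimate (K, sigma) of one OTU from its counts across T samples.

    counts: reads assigned to the OTU in each sample.
    depths: total number of reads in each sample.
    Returns None when the sampling-corrected variance is not positive.
    """
    n = np.asarray(counts, dtype=float)
    N = np.asarray(depths, dtype=float)
    mean = np.mean(n / N)
    # Assumption: Poisson sampling, for which E[n (n - 1)] = (N x)^2, so that
    # n (n - 1) / N^2 is an unbiased estimate of the squared abundance x^2.
    second_moment = np.mean(n * (n - 1.0) / N**2)
    var = second_moment - mean**2
    if var <= 0:
        return None  # e.g. many counts equal to 0 or 1: parameters not estimable
    # Invert mean = K (1 - sigma / 2) and var = sigma / (2 - sigma) * mean^2.
    sigma = 2.0 * var / (mean**2 + var)
    K = 2.0 * mean / (2.0 - sigma)
    return K, sigma


result = estimate_K_sigma(counts=[1, 9], depths=[10, 10])
excluded = estimate_K_sigma(counts=[0, 1, 0, 1], depths=[10, 10, 10, 10])
print(result, excluded)
```

Returning `None` instead of raising mirrors the exclusion rule in the text: such OTUs are simply dropped from the downstream analysis.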
Estimate of ρ between two samples
We estimate the value of ρ from the Spearman correlation of abundances between two samples. In our in-silico data, we found that the carrying capacity correlation ρ has an approximately quadratic relationship with the Spearman correlation coefficient (see S1 Fig). Using this relationship, calibrated on the in-silico dataset, we can infer the value of ρ for a pair of empirical samples.
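A pair of in-silico samples of the kind referred to above could be generated along the following lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: log carrying capacities are drawn from a bivariate normal with correlation ρ and equal variances, both communities share the same σ, abundances are drawn from the Gamma stationary distribution of the SLM, and reads are drawn by Poisson sampling. The function name `sample_pair` and every default value (number of species, spread of log K, sequencing depth) are hypothetical.

```python
import numpy as np


def sample_pair(rho, n_species=2000, log_k_sd=2.0, sigma=0.5, depth=30_000, rng=None):
    """Draw one pair of sampled communities with correlated log carrying capacities."""
    rng = np.random.default_rng() if rng is None else rng
    cov = log_k_sd**2 * np.array([[1.0, rho], [rho, 1.0]])
    # Column 0 is community A, column 1 is community B.
    log_k = rng.multivariate_normal([0.0, 0.0], cov, size=n_species)
    # Gamma stationary distribution of the SLM: shape 2/sigma - 1, scale sigma*K/2.
    shape = 2.0 / sigma - 1.0
    scale = sigma * np.exp(log_k) / 2.0
    lam = rng.gamma(shape, scale)
    rel = lam / lam.sum(axis=0)        # relative abundances within each community
    counts = rng.poisson(depth * rel)  # finite sampling (Poisson assumption)
    return log_k, counts


log_k, counts = sample_pair(0.8, rng=np.random.default_rng(1))
rho_hat = np.corrcoef(log_k[:, 0], log_k[:, 1])[0, 1]
print(f"target rho = 0.8, realized correlation of log K = {rho_hat:.3f}")
```

Repeating this for a grid of ρ values and computing the Spearman correlation between the two count columns would yield a calibration curve of the type described in the text.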
Beta-diversity measures
We consider six pairwise beta-diversity measures: i) Jaccard similarity, ii) Sørensen index, iii) Whittaker index, iv) Bray-Curtis dissimilarity, v) Horn similarity, vi) Morisita-Horn similarity. We also include the Dissimilarity and Overlap introduced in [10]. These eight measures were chosen to cover different types of measure, accounting for presence-absence, abundance, or both, and focusing to different degrees on common rather than rare species.
Let $n_i^A$ and $n_i^B$ be the counts of OTU $i$ in two samples A and B, with total numbers of reads $N^A$ and $N^B$, respectively. Let $x_i^A = n_i^A/N^A$ and $x_i^B = n_i^B/N^B$ be the sampled relative abundances. Let $S_A$ and $S_B$ be the sets of OTUs observed in samples A and B, respectively, $S_A \cup S_B$ the set of all OTUs observed, and $S_A \cap S_B$ the set of OTUs observed in both samples. Then, the beta-diversity measures are defined as follows:
Jaccard similarity index
$$J = \frac{|S_A \cap S_B|}{|S_A \cup S_B|},$$
where |⋅| denotes the number of elements of a set. In particular, $|S_A \cap S_B|$ is the number of species present in both samples, while $|S_A \cup S_B|$ is the total number of species (equivalent to γ-diversity). The Jaccard similarity index only accounts for presence-absence, disregarding abundance. As such, it is very sensitive to differences in rare OTUs, for which a small abundance difference could cause an OTU to go undetected in one sample but not in the other.
Sørensen similarity index
$$\beta_{\mathrm{Sor}} = \frac{2\,|S_A \cap S_B|}{|S_A| + |S_B|},$$
where |⋅| denotes the number of elements of a set. Like the Jaccard similarity index, the Sørensen index only accounts for presence-absence, disregarding abundance.
Whittaker similarity index
$$\beta_W = \frac{|S_A \cup S_B|}{\left(|S_A| + |S_B|\right)/2},$$
where |⋅| denotes the number of elements of a set. The Whittaker index corresponds to the ratio of γ-diversity to (mean) α-diversity.
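The three presence-absence indices above can be computed from two count vectors as in the following sketch. The function name `presence_absence_indices` and the toy counts are hypothetical, and the Whittaker index is taken here as γ-diversity divided by the mean richness of the two samples, which is an assumption about the exact normalization.

```python
import numpy as np


def presence_absence_indices(counts_a, counts_b):
    """Jaccard, Sørensen and Whittaker indices from two aligned count vectors."""
    a = np.asarray(counts_a) > 0  # OTUs observed in sample A
    b = np.asarray(counts_b) > 0  # OTUs observed in sample B
    shared = np.sum(a & b)        # |S_A ∩ S_B|
    union = np.sum(a | b)         # |S_A ∪ S_B|, i.e. gamma-diversity
    s_a, s_b = a.sum(), b.sum()
    return {
        "jaccard": shared / union,
        "sorensen": 2 * shared / (s_a + s_b),
        "whittaker": union / ((s_a + s_b) / 2),  # gamma over mean alpha (assumed form)
    }


indices = presence_absence_indices([5, 3, 0, 1, 0], [2, 0, 4, 1, 0])
print(indices)
```

With these definitions the three indices carry the same information: since |S_A ∪ S_B| = |S_A| + |S_B| − |S_A ∩ S_B|, the Whittaker index equals 2 minus the Sørensen index, and the Sørensen index equals 2J/(1 + J).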
Effective Whittaker similarity
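The Effective Whittaker index is defined as the ratio of γ- and α-diversity when both are estimated using the Shannon index:
$$\beta_W^{\mathrm{eff}} = \frac{\exp(H_\gamma)}{\exp(H_\alpha)},$$
where $H_\alpha = -\frac{1}{2}\sum_i \left(x_i^A \log x_i^A + x_i^B \log x_i^B\right)$ and $H_\gamma = -\sum_i \bar{x}_i \log \bar{x}_i$, with $\bar{x}_i = (x_i^A + x_i^B)/2$.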
Morisita-Horn similarity
$$MH = \frac{2\sum_i x_i^A x_i^B}{\sum_i \left(x_i^A\right)^2 + \sum_i \left(x_i^B\right)^2}.$$
This index also includes OTUs present in only one sample (they are counted in the denominator), but it is overly sensitive to common OTUs, due to the quadratic dependence on abundance.
Horn similarity
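$$H = 1 - \frac{1}{2\log 2}\sum_i \left[x_i^A \log\frac{x_i^A}{m_i} + x_i^B \log\frac{x_i^B}{m_i}\right],$$
where $m_i = (x_i^A + x_i^B)/2$, with the convention $0 \log 0 = 0$.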
Bray-Curtis dissimilarity
$$BC = \frac{\sum_i \left|x_i^A - x_i^B\right|}{\sum_i \left(x_i^A + x_i^B\right)}.$$
Like the Morisita-Horn index, it also includes OTUs present in only one sample, but it is more sensitive to common ones.
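For the abundance-based indices, a minimal sketch in their standard textbook forms is given below. Whether the analysis applies them to relative abundances or to raw counts is not stated explicitly here, so using relative abundances is an assumption, as are the function names and the toy counts.

```python
import numpy as np


def relative_abundance(counts):
    """Convert a count vector into sampled relative abundances x = n / N."""
    n = np.asarray(counts, dtype=float)
    return n / n.sum()


def morisita_horn(counts_a, counts_b):
    """Morisita-Horn similarity: 2 sum(x*y) / (sum(x^2) + sum(y^2))."""
    x, y = relative_abundance(counts_a), relative_abundance(counts_b)
    return 2.0 * np.sum(x * y) / (np.sum(x**2) + np.sum(y**2))


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    x, y = relative_abundance(counts_a), relative_abundance(counts_b)
    return np.sum(np.abs(x - y)) / np.sum(x + y)


a, b = [5, 3, 0, 2], [2, 0, 4, 4]
mh = morisita_horn(a, b)
bc = bray_curtis(a, b)
print(f"Morisita-Horn = {mh:.4f}, Bray-Curtis = {bc:.4f}")
```

OTUs seen in only one sample contribute nothing to the Morisita-Horn numerator but do enter its denominator, while in Bray-Curtis they contribute their full abundance to the numerator.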
Dissimilarity
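The dissimilarity introduced in [10] is defined on the abundances $\hat{x}_i^A$ and $\hat{x}_i^B$, normalized on the set of OTUs common to the two samples: $\hat{x}_i^A = x_i^A / \sum_{j \in S_A \cap S_B} x_j^A$, and similarly for B. The dissimilarity is then computed as the root Jensen-Shannon divergence (rJSD) of $\hat{x}^A$ and $\hat{x}^B$:
$$D = \sqrt{\frac{1}{2}\left[D_{KL}(\hat{x}^A, m) + D_{KL}(\hat{x}^B, m)\right]},$$
where $m = (\hat{x}^A + \hat{x}^B)/2$ and $D_{KL}(x, y) = \sum_i x_i \log(x_i/y_i)$ is the Kullback-Leibler divergence between x and y.
This Dissimilarity measure accounts only for differences in the abundances of OTUs common to the two samples. Additionally, the measure is dominated by common OTUs, due to the factor $x_i$ in the Kullback-Leibler divergence.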
Overlap
The overlap is the average across the two samples of the fraction of total reads that come from OTUs observed in both samples:
$$O = \frac{1}{2}\sum_{i \in S_A \cap S_B} \left(x_i^A + x_i^B\right).$$
This measure accounts for both presence-absence and abundance. Indeed, two samples have a large overlap if most OTUs are present in both and those present in only one have small abundance.
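The two measures of [10] can be computed together, since both rely on the set of shared OTUs. The sketch below follows the verbal definitions given above; the function name `dissimilarity_overlap`, the use of natural logarithms (which bounds the Dissimilarity by sqrt(log 2)), and the choice to raise an error when no OTU is shared are assumptions.

```python
import numpy as np


def dissimilarity_overlap(counts_a, counts_b):
    """Return (dissimilarity, overlap) for two aligned count vectors."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    x, y = a / a.sum(), b / b.sum()  # sampled relative abundances
    shared = (a > 0) & (b > 0)       # OTUs observed in both samples
    if not shared.any():
        raise ValueError("the two samples share no OTU")
    # Overlap: mean fraction of reads belonging to shared OTUs.
    overlap = 0.5 * (x[shared].sum() + y[shared].sum())
    # Renormalize on the shared OTUs before comparing abundances.
    xh = x[shared] / x[shared].sum()
    yh = y[shared] / y[shared].sum()
    m = 0.5 * (xh + yh)

    def kl(p, q):
        # Kullback-Leibler divergence; all entries are positive on the shared set.
        return np.sum(p * np.log(p / q))

    jsd = 0.5 * (kl(xh, m) + kl(yh, m))
    return np.sqrt(max(jsd, 0.0)), overlap  # guard against tiny negative rounding


d, o = dissimilarity_overlap([5, 3, 0, 2], [2, 0, 4, 4])
d_same, o_same = dissimilarity_overlap([5, 3, 0, 2], [5, 3, 0, 2])
print(f"D = {d:.4f}, O = {o:.4f}")
```

Because the Dissimilarity is computed only after renormalizing on shared OTUs, it is formally blind to the OTUs that determine the Overlap, which is why any correlation between the two is informative about the sampling process or the underlying model.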
Supplementary sections and tables.
OTU selection, fitting a truncated log-normal distribution, and estimating the total number of species. Tables with the coefficient of determination R² and correlations between estimated values of K. (PDF)
Relationship between Spearman correlation and carrying capacity correlation ρ in model data.
Black points are averages of simulated data (point ranges correspond to 2 standard deviations). The black curve is obtained by a quadratic fit to the data, resulting in ρ ∼ 0.92 + 0.34s − 0.48s², where s is the Spearman correlation. (PDF)
Peer review history: the manuscript "The stochastic logistic model with correlated carrying capacities reproduces beta-diversity metrics of microbial communities" (PCOMPBIOL-D-21-02238R1) was handled at PLOS Computational Biology by Associate Editor Tobias Bollenbach and Deputy Editor James O'Dwyer. A revision was invited on 3 Jan 2022, the response to reviewers was submitted on 8 Mar 2022, the manuscript was provisionally accepted on 21 Mar 2022 and formally accepted on 28 Mar 2022.