Laurence M Shaw, Adam Blanchard, Qinglin Chen, Xinli An, Peers Davies, Sabine Tötemeyer, Yong-Guan Zhu, Dov J Stekel.
Abstract
High-throughput genomics technologies are widely applied to microbiomes in humans, animals, soil and water to detect changes in bacterial communities, or in the genes they carry, between different environments or treatments. We describe a method to test the statistical significance of differences in bacterial population or gene composition, applicable to metagenomic or quantitative polymerase chain reaction data. Our method goes beyond previously published work in two ways: it is universally most powerful, and therefore better able to detect statistically significant differences, and it is more reliable for smaller sample sizes. It can also be used for experimental design, to estimate how many samples to use in future experiments, again with the advantage of being universally most powerful. We present three example analyses in the area of antimicrobial resistance. The first applies the method to published data on bacterial communities and antimicrobial resistance genes (ARGs) in the environment; we show that there are significant changes in both ARG and community composition. The second applies it to new data on seasonality in bacterial communities and ARGs in the hooves of four sheep. While the observed differences are not significant, we show that a minimum group size of eight sheep would provide sufficient power to detect similar changes as significant in further experiments. The third applies it to published data on bacterial communities surrounding rice crops; this much larger data set is used to verify the new method. Our method has broad uses for statistical testing and experimental design in research on changing microbiomes, including studies on antimicrobial resistance.
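Figure 2 and the notation table refer to a Dirichlet likelihood ratio test (LRT). The sketch below illustrates roughly how such a test can be computed.

- It fits one Dirichlet distribution to all samples pooled, which is the null hypothesis of no difference.
- It fits one Dirichlet distribution per environment, which is the alternative.
- It compares the two maximised log-likelihoods.

The sketch makes several assumptions:

- All proportions are strictly positive. Zeros would need a pseudocount or similar treatment.
- The maximum-likelihood fit uses Minka's fixed-point iteration.
- The p-value comes from the asymptotic chi-squared distribution with (m − 1)K degrees of freedom.

All function names are hypothetical, and this is not the authors' implementation. In particular, the abstract stresses reliability at small sample sizes, and that is where the plain chi-squared approximation used here may be inaccurate.

```python
import math
import random

def digamma(x: float) -> float:
    """psi(x) for x > 0: upward recurrence to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1 / 12 - f * (1 / 120 - f / 252))

def trigamma(x: float) -> float:
    """psi'(x) for x > 0, computed the same way as digamma."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return result + 1 / x + f / 2 + f / x * (1 / 6 - f * (1 / 30 - f / 42))

def inv_digamma(y: float) -> float:
    """Solve psi(x) = y with Newton's method, using Minka's initialisation."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y + 0.5772156649015329)
    for _ in range(5):
        x -= (digamma(x) - y) / trigamma(x)
    return x

def fit_dirichlet(samples, iters=5000, tol=1e-10):
    """Maximum-likelihood Dirichlet parameters via Minka's fixed-point iteration."""
    k = len(samples[0])
    mean_log = [sum(math.log(s[j]) for s in samples) / len(samples) for j in range(k)]
    alpha = [1.0] * k
    for _ in range(iters):
        psi_total = digamma(sum(alpha))
        new = [inv_digamma(psi_total + ml) for ml in mean_log]
        if max(abs(a - b) for a, b in zip(new, alpha)) < tol:
            return new
        alpha = new
    return alpha

def dirichlet_loglik(alpha, samples):
    """Sum of Dirichlet log-densities of the compositions in `samples`."""
    norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return sum(norm + sum((a - 1) * math.log(x) for a, x in zip(alpha, s))
               for s in samples)

def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-squared distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:  # Q(k, y) = exp(-y) * sum_{i<k} y^i / i!
        return sum(math.exp(-y + i * math.log(y) - math.lgamma(i + 1))
                   for i in range(df // 2))
    # Q(k + 1/2, y) = erfc(sqrt(y)) + sum_{i<k} y^(i+1/2) exp(-y) / Gamma(i + 3/2)
    return math.erfc(math.sqrt(y)) + sum(
        math.exp(-y + (i + 0.5) * math.log(y) - math.lgamma(i + 1.5))
        for i in range(df // 2))

def dirichlet_lrt(groups):
    """LRT of 'one Dirichlet for all groups' against 'one Dirichlet per group'.

    groups: m lists of compositions, each a list of K strictly positive
    proportions summing to 1. Returns (statistic, df, asymptotic p-value).
    """
    pooled = [s for g in groups for s in g]
    ll_null = dirichlet_loglik(fit_dirichlet(pooled), pooled)
    ll_alt = sum(dirichlet_loglik(fit_dirichlet(g), g) for g in groups)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))  # guard against tiny negative values
    df = (len(groups) - 1) * len(groups[0][0])  # mK - K free parameters
    return stat, df, chi2_sf(stat, df)

def sample_dirichlet(alpha, rng):
    """Draw one composition by normalising independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

if __name__ == "__main__":
    rng = random.Random(1)
    before = [sample_dirichlet([20, 5, 5], rng) for _ in range(15)]
    after = [sample_dirichlet([5, 5, 20], rng) for _ in range(15)]
    print(dirichlet_lrt([before, after]))  # (statistic, df, p-value)
```

Only the Python standard library is used, so digamma and trigamma are approximated with a recurrence plus an asymptotic series. Where NumPy and SciPy are available, `scipy.special.digamma`, `scipy.special.polygamma` and `scipy.stats.chi2.sf` would normally replace the hand-written helpers.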
Year: 2019 PMID: 30787410 PMCID: PMC6382752 DOI: 10.1038/s41598-019-38873-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Compositional data of (a) ARGs by class and (b) bacterial taxa by phylum for each of the soil samples; similarly, (c) ARGs by class and (d) bacterial taxa by phylum for each of the sheep. By visual inspection, there appear to be differences in ARGs between the soil treatments, but it is not clear whether the bacterial communities differ. In (c) and (d) there is greater variation between individuals, and it is difficult to see whether there are differences between the seasons.
Figure 2. Power estimates of the Dirichlet likelihood ratio test (LRT) for hypothetical repetitions of the sheep hoof experiment with different numbers of sheep.
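Power curves such as the one in Figure 2 are typically estimated by Monte Carlo simulation. The following settings are illustrative assumptions:

- There are R simulated experiments at each group size n.
- Each experiment is generated from Dirichlet distributions estimated from the pilot data.
- Each experiment is analysed with the Dirichlet LRT at significance level 0.05.

Under these assumptions, the estimated power and its standard error are:

$$
\hat{\pi}(n) = \frac{1}{R}\sum_{r=1}^{R}\mathbf{1}\{p_r(n) \le 0.05\},
\qquad
\operatorname{SE}\big[\hat{\pi}(n)\big] = \sqrt{\frac{\hat{\pi}(n)\,\big(1-\hat{\pi}(n)\big)}{R}}
$$

Here $p_r(n)$ is the LRT p-value from the $r$-th simulated experiment with $n$ individuals. The recommended group size is then the smallest $n$ whose estimated power reaches the chosen target; 0.8 is a common convention.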
Key notation used in the Methods section.
| Notation | Explanation |
|---|---|
| | Number of classes in the multitype population being tested, labelled |
| | Number of environments, labelled |
| | Number of observations of populations in environment |
| | Vector of Dirichlet distribution parameters for environment |
| | Concatenation of the vectors |
| | Proportion of individuals in observation |
| | The set of all |
| | Likelihood of a parameterisation |
| | Test statistic used in the likelihood ratio test |
| | Test statistic used in goodness-of-fit testing from observed data |
| | Test statistic used in goodness-of-fit testing with data from simulation |
| | Parameterisation of |
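The likelihood and likelihood ratio statistic listed in the notation table can be written out explicitly for a Dirichlet model. The symbols used here are illustrative assumptions:

- $K$ classes labelled $k$.
- $m$ environments labelled $i$.
- $n_i$ observations in environment $i$.
- Parameter vectors $\boldsymbol{\alpha}_i = (\alpha_{i1}, \dots, \alpha_{iK})$.
- Proportions $x_{ijk}$ of class $k$ in observation $j$ from environment $i$.

With these choices the full model has $mK$ parameters, which is consistent with the dimensionality reported for each dataset. Wilks' theorem then gives an approximate chi-squared reference distribution for large samples:

$$
\ell(\boldsymbol{\alpha}) = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[\log\Gamma\Big(\sum_{k=1}^{K}\alpha_{ik}\Big) - \sum_{k=1}^{K}\log\Gamma(\alpha_{ik}) + \sum_{k=1}^{K}(\alpha_{ik}-1)\log x_{ijk}\right]
$$

$$
H_0:\ \boldsymbol{\alpha}_1 = \cdots = \boldsymbol{\alpha}_m,
\qquad
\Lambda = 2\left[\ell(\hat{\boldsymbol{\alpha}}) - \ell(\hat{\boldsymbol{\alpha}}_{H_0})\right] \;\dot\sim\; \chi^2_{(m-1)K}
$$

Here $\hat{\boldsymbol{\alpha}}$ is the unconstrained maximum-likelihood estimate and $\hat{\boldsymbol{\alpha}}_{H_0}$ the estimate with all $\boldsymbol{\alpha}_i$ constrained to be equal. The degrees of freedom are $mK - K = (m-1)K$.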
Summary of results for the data used in this paper, with the number of samples, n, and the dimensionality, mK, of each dataset.

| Data | LRT p-value | Randomization p-value | n | mK |
|---|---|---|---|---|
| Soil ARG | 1.03 × 10⁻¹²⁵ | 0.0002 | 24 | 64 |
| Soil Bacteria | 1.32 × 10⁻¹⁹ | 0.0266 | 23 | 96 |
| Sheep ARG | 0.057 | 0.586 | 12 | 27 |
| Sheep Bacteria | 0.597 | 0.898 | 12 | 15 |
| Root Bacteria | 2.81 × 10⁻²⁸⁶ | NA | 354 | 42 |
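A randomization (permutation) p-value is usually obtained in two steps:

1. Shuffle the environment labels many times.
2. Count how often the shuffled data give a statistic at least as extreme as the observed one.

The sketch below shows that procedure under stated assumptions. The statistic `between_group_ss` is a hypothetical stand-in and not necessarily the one used for these results; it is a size-weighted squared distance between group mean compositions. The values of `n_perm` and `seed` are illustrative defaults.

```python
import random

def between_group_ss(groups):
    """Hypothetical statistic: size-weighted squared distance of each group's
    mean composition from the overall mean composition."""
    pooled = [s for g in groups for s in g]
    k = len(pooled[0])
    overall = [sum(s[j] for s in pooled) / len(pooled) for j in range(k)]
    total = 0.0
    for g in groups:
        mean = [sum(s[j] for s in g) / len(g) for j in range(k)]
        total += len(g) * sum((a - b) ** 2 for a, b in zip(mean, overall))
    return total

def randomization_p_value(groups, statistic=between_group_ss, n_perm=4999, seed=0):
    """Monte Carlo permutation test: shuffle group labels, keeping group sizes fixed."""
    rng = random.Random(seed)
    observed = statistic(groups)
    pooled = [s for g in groups for s in g]  # new list, so the input is not modified
    sizes = [len(g) for g in groups]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:  # re-split the shuffled samples into groups of the original sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if statistic(shuffled) >= observed:
            at_least_as_extreme += 1
    # Count the observed labelling as one of the permutations (add-one rule).
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

Because the observed labelling is counted as one of the permutations, the smallest attainable p-value is 1/(n_perm + 1).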