Alice J Sommer, Annette Peters, Martina Rommel, Josef Cyrys, Harald Grallert, Dirk Haller, Christian L Müller, Marie-Abèle C Bind.
Abstract
Statistical analysis of microbial genomic data within epidemiological cohort studies holds the promise to assess the influence of environmental exposures on both the host and the host-associated microbiome. However, the observational character of prospective cohort data and the intricate characteristics of microbiome data make it challenging to discover causal associations between environment and microbiome. Here, we introduce a causal inference framework based on the Rubin Causal Model that can help scientists to investigate such environment-host microbiome relationships, to capitalize on existing, possibly powerful, test statistics, and test plausible sharp null hypotheses. Using data from the German KORA cohort study, we illustrate our framework by designing two hypothetical randomized experiments with interventions of (i) air pollution reduction and (ii) smoking prevention. We study the effects of these interventions on the human gut microbiome by testing shifts in microbial diversity, changes in individual microbial abundances, and microbial network wiring between groups of matched subjects via randomization-based inference. In the smoking prevention scenario, we identify a small interconnected group of taxa worth further scrutiny, including Christensenellaceae and Ruminococcaceae genera, that have been previously associated with blood metabolite changes. These findings demonstrate that our framework may uncover potentially causal links between environmental exposure and the gut microbiome from observational data. We anticipate the present statistical framework to be a good starting point for further discoveries on the role of the gut microbiome in environmental health.
Year: 2022 PMID: 35533202 PMCID: PMC9129050 DOI: 10.1371/journal.pcbi.1010044
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Fig 1. The four stages of the causal inference framework [21] adapted to the exploration of environment-gut microbiome relationships.
Stage 1: Formulate a plausible hypothetical intervention (e.g., decreasing inhaled environmental exposures) to examine its impact on the gut microbiome. Stage 2: Construct a hypothetical paired-randomized experiment in which the environmental intervention had been implemented randomly. Stage 3: Choose powerful test statistics comparing the gut microbiome had the subjects been hypothetically randomized to the environmental intervention vs. not, and test the sharp null hypotheses of no effect of the intervention at different aggregation levels of the data. Stage 4: Interpret the statistical analyses and make recommendations for future studies or for implementation of the intervention.
Potential outcomes for the subjects of the hypothetical experiment.
[Table body lost in extraction: the surviving layout shows columns 1, 2, …, B and rows 1, 2, …, N; the cell entries are not recoverable.]
Number of units before and after matching.
The thresholds for the air pollution experiment are based on the 90th and 10th percentiles of the PM2.5 distribution.
| | Air pollution | | Smoking | |
|---|---|---|---|---|
| | PM2.5 ≥ 13.0 | PM2.5 ≤ 10.3 | Smoker | Never smoker |
| Before | 206 | 193 | 302 | 908 |
| After | 99 | 99 | 271 | 271 |
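As a rough illustration of how percentile thresholds like those in the caption can define two exposure groups, the sketch below cuts a list of PM2.5 values at its 10th and 90th percentiles. The function name `exposure_groups`, the group labels, and the inclusive quantile method are assumptions made for this example; the matching step is not shown.

```python
from statistics import quantiles


def exposure_groups(pm25):
    """Label each PM2.5 value as 'high', 'low', or None (excluded).

    Hypothetical helper: cut-offs are the 10th and 90th percentiles of the
    supplied values, computed with the stdlib 'inclusive' method (an
    assumption, the source does not say how percentiles were computed).
    """
    deciles = quantiles(pm25, n=10, method="inclusive")  # 9 cut points
    low_cut, high_cut = deciles[0], deciles[-1]
    groups = []
    for value in pm25:
        if value >= high_cut:
            groups.append("high")  # at or above the 90th percentile
        elif value <= low_cut:
            groups.append("low")   # at or below the 10th percentile
        else:
            groups.append(None)    # middle of the distribution, not used
    return low_cut, high_cut, groups
```

Units in the middle of the distribution are dropped, which sharpens the contrast between the two hypothetical arms at the cost of sample size.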
Data transformation and choice of test statistics.
| analysis level | data transformation | test statistic |
|---|---|---|
| richness | breakaway […] | betta regression coefficient […] |
| α-diversity | DivNet […] | betta regression coefficient […] |
| β-diversity | pairwise distance matrices | MiRKAT score statistic […] |
| high-dimensional means | centered log ratios | mean abundance difference […] |
| abundance | normalization by ratio […] | LogFold mean difference |
| correlation | association matrices […] | differential associations […] |
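The centered log-ratio transformation listed in the table can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the pseudocount used to handle zero counts are assumptions. The second function computes the Aitchison distance, which is the Euclidean distance between centered log-ratio vectors.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed choice) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between the CLR vectors of two samples."""
    return math.dist(clr(a, pseudocount), clr(b, pseudocount))
```

Because each log count is centered by the sample's own mean log count, the transformed values sum to zero and, without a pseudocount, do not change when a sample is rescaled by sequencing depth.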
Baseline characteristics of the study population in the air pollution reduction (left table) and smoking prevention experiments (right table).
Continuous variables: mean and standard deviation (St. d.). Categorical variables: number of samples per category (N) and proportion of category (%).
| | | Air pollution: PM2.5 ≥ 13.0 | | Air pollution: PM2.5 ≤ 10.3 | | Smoking: Smoker | | Smoking: Never-smoker | |
|---|---|---|---|---|---|---|---|---|---|
| | | Mean | St. d. | Mean | St. d. | Mean | St. d. | Mean | St. d. |
| Age | | 60.6 | 12.4 | 60.3 | 12.4 | 54.2 | 9.4 | 54.4 | 9.6 |
| Body Mass Index | | 27.0 | 4.3 | 27.0 | 3.8 | 26.7 | 4.4 | 26.7 | 4.2 |
| Alcohol intake (g/day) | | 11.3 | 14.1 | 11.5 | 13.9 | 13.0 | 15.6 | 11.6 | 14.3 |
| Years of education | | 11.9 | 2.6 | 11.7 | 2.8 | 11.7 | 2.3 | 11.8 | 2.2 |
| | | N | % | N | % | N | % | N | % |
| Sex | F | 41 | 20.7 | 41 | 20.7 | 130 | 24.0 | 130 | 24.0 |
| | M | 58 | 29.3 | 58 | 29.3 | 141 | 26.0 | 141 | 26.0 |
| Smoking | Ex-S. | 27 | 13.6 | 27 | 13.6 | - | - | - | - |
| | Never-S. | 62 | 31.3 | 62 | 31.3 | - | - | - | - |
| | Smoker | 10 | 5.1 | 10 | 5.1 | - | - | - | - |
| Diabetes | No | 95 | 48.0 | 95 | 48.0 | 264 | 48.7 | 264 | 48.7 |
| | Yes | 4 | 2.0 | 4 | 2.0 | 7 | 1.3 | 7 | 1.3 |
| Phys. Activity | No | 36 | 18.2 | 36 | 18.2 | 125 | 23.1 | 125 | 23.1 |
| | Yes | 63 | 31.8 | 63 | 31.8 | 146 | 26.9 | 146 | 26.9 |
Fig 2. Richness and α-diversity.
Boxplots (with median), values of the test-statistics from the betta regression [54], and one-sided randomization-based p-values for 10,000 permutations of the intervention assignment following a matched-pair design. (A) Boxplots of the richness. (B) Boxplots of the α-diversity.
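The one-sided randomization-based p-value for a matched-pair design can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the mean within-pair difference as the test statistic rather than the betta regression coefficient, and the function name, the sign-flipping implementation, and the add-one correction are choices made for this example.

```python
import random


def paired_randomization_pvalue(treated, control, n_draws=10_000, seed=0):
    """One-sided randomization-based p-value for matched pairs.

    Under the sharp null of no effect, swapping the two labels within a
    pair leaves both outcomes unchanged, so each within-pair difference
    keeps its magnitude and its sign is + or - with equal probability.
    """
    if len(treated) != len(control):
        raise ValueError("need exactly one treated and one control per pair")
    rng = random.Random(seed)
    diffs = [t - c for t, c in zip(treated, control)]
    observed = sum(diffs) / len(diffs)
    at_least_as_large = 0
    for _ in range(n_draws):
        # Re-randomize the assignment within each pair by flipping signs
        stat = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if stat >= observed:
            at_least_as_large += 1
    # Count the observed assignment itself so the p-value is never zero
    return (at_least_as_large + 1) / (n_draws + 1)
```

Any other statistic, such as a regression coefficient or a kernel score, could take the place of the mean difference as long as it is recomputed for every re-randomized assignment.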
β-diversity.
Microbiome Regression-based Kernel Association Test (MiRKAT), unadjusted and adjusted one-sided randomization-based p-values for 10,000 permutations of the intervention assignment following a matched-pair design.
| | Air pollution | | | Smoking | | |
|---|---|---|---|---|---|---|
| | test statistic | unadjusted p-value | adjusted p-value | test statistic | unadjusted p-value | adjusted p-value |
| UniFrac | 12.1 | 0.0199 | 0.0506 | 61.5 | 0.0024 | 0.0070 |
| Aitchison | 82596.0 | 0.1096 | 0.2466 | 356921.5 | 0.0001 | 0.0003 |
| Jaccard | 19.4 | 0.0884 | 0.2043 | 84.5 | 0.0001 | 0.0003 |
| Gower | 0.2 | 0.0089 | 0.0250 | 0.1 | 0.0485 | 0.1204 |
Compositional equivalence test.
Test statistic for high-dimensional data suggested by [56] and one-sided randomization-based p-values for 10,000 permutations of the intervention assignment following a matched-pair design.
| | | ASV | Species | Genus | Family | Order | Class | Phylum |
|---|---|---|---|---|---|---|---|---|
| Air pollution | nb. of taxa (p) | 4,370 | 414 | 252 | 74 | 44 | 29 | 15 |
| | test statistic | 12.8 | 12.9 | 11.9 | 8.8 | 8.4 | 8.4 | 8.1 |
| | p-value | 0.1451 | 0.0722 | 0.0733 | 0.1521 | 0.1161 | 0.1021 | 0.0591 |
| Smoking | nb. of taxa (p) | 7,409 | 479 | 271 | 81 | 48 | 31 | 16 |
| | test statistic | 13.0 | 14.5 | 13.3 | 11.6 | 8.6 | 9.4 | 10.4 |
| | p-value | 0.1607 | 0.0302 | 0.0384 | 0.0279 | 0.0859 | 0.0440 | 0.0135 |
Fig 3. Differential abundance.
For each genus, adjusted two-sided randomization-based p-values for 10,000 permutations of the smoking prevention intervention assignment following a matched-pair design. Genera with no tip point belong to the set of reference taxa. Black circled tip point: differentially abundant genus (Marvinbryantia) in the air pollution reduction experiment.
Fig 4. Genus-genus associations of smokers and never-smokers (n = 271, p = 140).
(A) Visualization of the genus-genus partial correlations estimated with the SPIEC-EASI method. Edge thickness is proportional to the partial correlation, and edge color indicates its sign: red for negative, green for positive. Node size is proportional to the centered log ratio of the genus abundances, and node color indicates the phylum. Triangle-shaped nodes are differentially abundant (see Fig 3). (B) Zoom on the largest connected component and its differential associations (bold genera).
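For readers unfamiliar with partial correlations in graphical models, the standard relation between a precision (inverse covariance) matrix and the partial correlation of two variables is given below. The symbol $\Omega$ for the precision matrix is an assumption of this note, not notation taken from the source, and the source does not state how the plotted values were derived from the SPIEC-EASI estimate.

```latex
\rho_{ij} = -\frac{\Omega_{ij}}{\sqrt{\Omega_{ii}\,\Omega_{jj}}}, \qquad i \neq j
```

A zero entry $\Omega_{ij}$ therefore corresponds to a missing edge between genera $i$ and $j$.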
Differential associations of genera.
Smallest five adjusted two-sided randomization-based p-values for 10,000 permutations of the intervention assignment following a matched-pair design.
[Genus pair names and the label of each block were lost in extraction; only the p-values remain.]

| Genus-genus associations (-: disappearance after intervention) | p-value |
|---|---|
| | 0.0661 |
| | 0.1063 |
| | 0.2795 |
| | 0.4147 |
| | 0.4753 |

| Genus-genus associations (-: disappearance after intervention) | p-value |
|---|---|
| | 0.1585 |
| | 0.1585 |
| | 0.2031 |
| | 0.2376 |
| | 0.2492 |
Fig 5. Lipid metabolites exploration.
(A) Lipid metabolites correlation with selected genera from the smoking prevention experiment (green). (B) Scatterplots of high-density lipoprotein (HDL) cholesterol and triglycerides vs. centered log-ratio transformed relative abundances of the genera Ruminococcaceae-UCG-005 and Christensenellaceae-R-7-group.