Maryia Khomich, Ingrid Måge, Ida Rud, Ingunn Berget.
Abstract
Diet plays a major role in shaping gut microbiome composition and function in both humans and animals, and dietary intervention trials are often used to investigate and understand these effects. A plethora of statistical methods for analysing the differential abundance of microbial taxa exists, and new methods are constantly being developed, but there is a lack of benchmarking studies and of clear consensus on best multivariate statistical practice. This makes it hard for a biologist to decide which method to use. We compared the outcomes of generic multivariate ANOVA (ASCA and FFMANOVA) with those of statistical methods commonly used for community analyses (PERMANOVA and SIMPER) and of methods designed for count data from high-throughput sequencing experiments (ALDEx2, ANCOM and DESeq2). The comparison is based on both simulated data and five published dietary intervention trials representing different subjects and study designs. We found that the methods testing differences at the community level agreed on both effect size and statistical significance. However, the methods that rank and identify differentially abundant operational taxonomic units (OTUs) gave incongruent results, implying that the choice of method is likely to influence the biological interpretation. The generic multivariate ANOVA tools have the flexibility needed to analyse multifactorial experiments and provide output at both the community and the OTU level; their good performance in the simulation studies suggests that these tools are also suitable for microbiome data sets.
Year: 2021 PMID: 34793531 PMCID: PMC8601541 DOI: 10.1371/journal.pone.0259973
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A diagram of the statistical methods used in the study.
An overview of statistical methods and their properties.
| Method | Method name | Number of experimental factors allowed | Parametric | Multivariate | Univariate | Provides output at community level | Statistics for ranking OTUs | Reference |
|---|---|---|---|---|---|---|---|---|
| ALDEx2 | ANOVA-like differential expression tool for high-throughput sequencing data | any | yes | no | yes | no | p-values or effect sizes | [ |
| ANCOM | Analysis of composition of microbiomes | main factor + covariates | yes | no | yes | no | W-stat for the main variable | [ |
| ANOSIM | Analysis of similarities | one | no | yes | no | yes | no | [ |
| ASCA | ANOVA-simultaneous component analysis | any | yes | yes | no | yes | loadings or PLS-DA regression coefficients | [ |
| DESeq2 | Differential gene expression analysis based on the negative binomial distribution | any | yes (GLM) | no | yes | no | p-values or effect sizes (coefficients) | [ |
| FFMANOVA | Fifty-fifty multivariate ANOVA | any | yes | yes | yes (rotation tests) | yes | p-values | [ |
| PERMANOVA | Permutational multivariate analysis of variance | any | no | yes | no | yes | no | [ |
| SIMPER | Similarity percentage | two-group comparison | no | yes | no | no | permutation p-values | [ |
ANOVA—analysis of variance; GLM—generalized linear model; OTU—operational taxonomic unit; PLS-DA—partial least squares discriminant analysis.
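The table lists ASCA as giving output at the community level. A minimal sketch of the decomposition behind that output, under stated assumptions: the column-centred data matrix is replaced, factor by factor, by a matrix of level means (the effect matrix), and each factor's explained variance is that matrix's sum of squares as a percentage of the total. The sketch assumes a balanced two-factor design and omits the interaction term and the principal component analysis that ASCA then performs on each effect matrix; the function names and toy data are hypothetical, not taken from the study or from any ASCA package.

```python
from collections import defaultdict


def column_centre(X):
    """Subtract each column (OTU) mean so that the grand mean is removed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def effect_matrix(Xc, levels):
    """Replace every sample (row) by the mean of all samples sharing its factor level."""
    groups = defaultdict(list)
    for row, level in zip(Xc, levels):
        groups[level].append(row)
    p = len(Xc[0])
    level_means = {
        level: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for level, rows in groups.items()
    }
    return [level_means[level] for level in levels]


def sum_of_squares(M):
    return sum(v * v for row in M for v in row)


def explained_variance(X, factors):
    """Percentage of the centred sum of squares captured by each factor's effect matrix.

    Assumes a balanced design; interaction and residual matrices are not computed here.
    """
    Xc = column_centre(X)
    total = sum_of_squares(Xc)
    return {name: 100.0 * sum_of_squares(effect_matrix(Xc, levels)) / total
            for name, levels in factors.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples x 2 OTUs (e.g. clr values), 2 x 2 balanced design.
    X = [[1.0, 2.0], [1.0, 4.0], [3.0, 2.0], [3.0, 5.0]]
    design = {"diet": ["A", "A", "B", "B"], "time": ["t1", "t2", "t1", "t2"]}
    print(explained_variance(X, design))
```

On the toy data, diet and time explain about 39.5% and 58.1% of the variation; the remaining 2.3% belongs to the diet × time interaction, which this sketch does not model.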
Fig 2. Explained variance for simulated data and the proportion of simulations in which the simulated effect was detected.
Numbers indicate the percentage of simulated data sets in which the p-value was significant.
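The detection percentages are simple proportions over the simulated data sets. A minimal sketch, assuming a significance level of 0.05 (the threshold is not stated in this caption) and a list of hypothetical p-values, one per simulated data set:

```python
def detection_rate(p_values, alpha=0.05):
    """Percentage of simulated data sets whose p-value is strictly below alpha.

    alpha = 0.05 is an assumed threshold, not one taken from the study.
    """
    if not p_values:
        raise ValueError("need at least one simulated data set")
    hits = sum(1 for p in p_values if p < alpha)
    return 100.0 * hits / len(p_values)


# Hypothetical p-values from ten simulated data sets for one method.
simulated_p = [0.001, 0.004, 0.020, 0.049, 0.051, 0.130, 0.002, 0.300, 0.010, 0.700]
print(detection_rate(simulated_p))  # six of ten are below 0.05
```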
Community-level method comparison across five experimental data sets.
| Factor/predictor | FFMANOVA effect size (explained variance), % | FFMANOVA p-value | ASCA effect size (explained variance), % | ASCA p-value | PERMANOVA effect size (explained variance), % | PERMANOVA p-value | ANOSIM effect size (explained variance), % | ANOSIM p-value |
|---|---|---|---|---|---|---|---|---|
| *Data set 1* | | | | | | | | |
| | 34.31 | < 0.001 | 37.39 | < 0.001 | 37.26 | 0.001 | 82.19 | 0.001 |
| | 3.96 | < 0.001 | 4.14 | < 0.001 | 4.12 | 0.001 | | |
| | 5.85 | < 0.001 | 5.99 | < 0.001 | 5.96 | 0.002 | | |
| Residual | 55.75 | | 52.69 | | 52.67 | | 17.81 | |
| *Data set 2* | | | | | | | | |
| | 27.31 | < 0.001 | 30.50 | < 0.001 | 30.85 | 0.001 | 99.11 | 0.001 |
| | 12.18 | < 0.001 | 14.08 | < 0.001 | 13.99 | 0.001 | | |
| | 7.93 | < 0.001 | 7.51 | < 0.001 | 7.47 | 0.002 | | |
| Residual | 52.31 | | 47.74 | | 47.70 | | 0.89 | |
| *Data set 3* | | | | | | | | |
| | 1.78 | < 0.001 | 1.99 | < 0.001 | 2.14 | 0.001 | 13.33 | 0.001 |
| | 3.29 | < 0.001 | 3.45 | < 0.001 | 3.51 | 0.001 | | |
| | 1.32 | < 0.001 | 1.41 | 0.006 | 1.36 | 0.009 | | |
| | 26.53 | < 0.001 | 26.42 | 0.073 | 27.28 | 0.046 | | |
| Residual | 66.13 | | 65.71 | | 65.71 | | 86.67 | |
| *Data set 4* | | | | | | | | |
| | 2.49 | 0.067 | 2.11 | 0.038 | 2.27 | 0.036 | 4.32 | 0.143 |
| | 2.00 | 0.007 | 1.96 | 0.082 | 1.77 | 0.246 | | |
| | 5.15 | 0.824 | 4.41 | 0.777 | 4.45 | 0.701 | | |
| | 50.44 | < 0.001 | 54.23 | < 0.001 | 64.30 | 0.001 | | |
| Residual | 31.23 | | 27.25 | | 27.21 | | 95.68 | |
| *Data set 5* | | | | | | | | |
| | 1.75 | < 0.001 | 1.27 | 0.107 | 1.27 | 0.132 | -0.03 | 0.998 |
| | 69.38 | < 0.001 | 73.85 | 0 | 73.85 | 0.001 | | |
| Residual | 28.88 | | 24.87 | | 24.87 | | | |
The distance-based methods ANOSIM and PERMANOVA and the abundance-based methods ASCA and FFMANOVA were compared with respect to effect size (expressed as the percentage of explained variance) and the corresponding p-values.
¹ Based on the 50–50 F-test, 999 permutations.
² Based on 999 permutations.
³ Based on a combined factor with no interaction in the model (limitation of ANOSIM).
clr: centred log-ratio transformed data used as input.
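The footnotes mention permutation tests with 999 permutations and clr-transformed input. As an illustration only, the sketch below clr-transforms counts (with an assumed pseudocount of 0.5 to avoid log(0)), computes squared Euclidean distances between clr profiles (squared Aitchison distances), and runs a one-factor PERMANOVA using Anderson's pseudo-F with a permutation p-value. The pseudocount, the single-factor design and all function names are assumptions for illustration, not the implementation used in the study. With 999 permutations and the (b + 1)/(B + 1) estimator used here, the smallest attainable p-value is 1/1000 = 0.001, which is consistent with the lowest PERMANOVA and ANOSIM p-values reported in the table.

```python
import math
import random


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample; the pseudocount (assumed) avoids log(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def squared_distances(rows):
    """Squared Euclidean distances; on clr profiles these are squared Aitchison distances."""
    return [[sum((a - b) ** 2 for a, b in zip(r, s)) for s in rows] for r in rows]


def pseudo_f(d2, groups):
    """One-factor PERMANOVA pseudo-F and R^2 computed from squared distances."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(d2[i][j] for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        idx = [i for i, g in enumerate(groups) if g == label]
        pairs = sum(d2[i][j] for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, ss_among / ss_total


def permanova(counts_table, groups, n_perm=999, seed=1):
    """Return (pseudo-F, explained variance in %, permutation p-value) for one factor."""
    d2 = squared_distances([clr(sample) for sample in counts_table])
    f_obs, r2 = pseudo_f(d2, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute group labels across samples
        if pseudo_f(d2, shuffled)[0] >= f_obs:
            at_least_as_extreme += 1
    # The observed labelling counts as one permutation, so p >= 1 / (n_perm + 1).
    p = (at_least_as_extreme + 1) / (n_perm + 1)
    return f_obs, 100.0 * r2, p


if __name__ == "__main__":
    # Hypothetical counts: 8 samples x 3 OTUs, two diets with swapped dominant OTUs.
    table = [[100, 10, 5], [90, 12, 6], [110, 9, 4], [95, 11, 5],
             [10, 100, 5], [12, 90, 6], [9, 110, 4], [11, 95, 5]]
    diet = ["A"] * 4 + ["B"] * 4
    print(permanova(table, diet))
```

With only eight samples there are just 70 distinct ways to split them into two groups of four, so even a perfect separation cannot reach p = 0.001 here; the attainable minimum depends on the design as well as on the number of permutations.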
Fig 3. Sensitivity (true positive rate) for the four scenarios in the simulation study.
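Sensitivity, or true positive rate, is TP / (TP + FN): the fraction of OTUs simulated as differentially abundant that a method actually flags. A minimal sketch with hypothetical OTU identifiers; note that false positives do not enter this measure, so it says nothing about how many spurious OTUs a method reports.

```python
def sensitivity(truly_da, flagged):
    """True positive rate: share of truly differentially abundant OTUs that a method flags."""
    truly_da, flagged = set(truly_da), set(flagged)
    if not truly_da:
        raise ValueError("no truly differentially abundant OTUs to detect")
    true_positives = len(truly_da & flagged)
    return true_positives / len(truly_da)  # TP / (TP + FN)


# Hypothetical example: four OTUs were simulated as differentially abundant.
truth = {"OTU1", "OTU2", "OTU3", "OTU4"}
calls = {"OTU1", "OTU3", "OTU9"}  # OTU9 is a false positive; it does not change sensitivity
print(sensitivity(truth, calls))  # 2 of 4 detected
```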
Fig 4. Spearman’s correlation (y-axis) calculated for pairwise comparisons of statistical methods (x-axis) for (A) simulated data and (B) five experimental data sets.
Each point represents the Spearman rank correlation coefficient between the OTU ranking metrics of the two methods compared.
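Because Spearman’s correlation compares rankings rather than raw values, it can relate methods whose ranking metrics live on different scales (W-statistics, p-values, loadings, coefficients). A minimal pure-Python sketch that assigns tied values their average rank and then takes the Pearson correlation of the ranks; the example metrics are hypothetical. The metrics must point the same way before comparison, for instance by converting p-values to -log10(p) so that larger always means stronger evidence.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical ranking metrics for five OTUs from two methods.
w_stat = [10, 8, 8, 2, 0]
abs_effect = [2.5, 1.9, 2.2, 0.3, 0.1]
print(spearman(w_stat, abs_effect))
```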
Fig 5. Mean relative abundance (log scale) plotted against the ANCOM W-statistic.
(A) The “Few-Low” simulation scenario, (B) the Birkeland data set, (C) the “Many-Low” simulation scenario and (D) the Moen data set. Red points (panels A and C) indicate differentially abundant OTUs.
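The ANCOM W-statistic counts, for each OTU, how many of its log-ratios with the other OTUs differ significantly between groups. The sketch below illustrates only that counting idea, under assumptions that are not ANCOM's exact procedure: two groups, a pseudocount of 1 for zeros, a two-sided Mann-Whitney U test with a normal approximation and no tie correction, no multiple-testing correction, and a hypothetical alpha of 0.05. The published procedure differs in such details (it corrects for multiple testing, for example), and all names and data here are hypothetical.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value, normal approximation without tie correction."""
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def ancom_w(counts_table, groups, alpha=0.05, pseudocount=1.0):
    """W_i = number of OTUs j whose log-ratio log(x_i / x_j) differs between two groups."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    n_otus = len(counts_table[0])
    logs = [[math.log(c + pseudocount) for c in sample] for sample in counts_table]
    w = [0] * n_otus
    for i in range(n_otus):
        for j in range(i + 1, n_otus):
            ratios = [s[i] - s[j] for s in logs]
            first = [r for r, g in zip(ratios, groups) if g == labels[0]]
            second = [r for r, g in zip(ratios, groups) if g == labels[1]]
            if mann_whitney_p(first, second) < alpha:
                # log(x_i/x_j) = -log(x_j/x_i), so one rejection counts for both OTUs.
                w[i] += 1
                w[j] += 1
    return w


if __name__ == "__main__":
    # Hypothetical counts: 10 samples x 4 OTUs; only OTU 0 changes with diet.
    shared = [[100, 50, 20], [120, 45, 25], [90, 55, 18], [110, 48, 22], [105, 52, 21]]
    table = ([[c0] + rest for c0, rest in zip([10, 12, 8, 11, 9], shared)]
             + [[c0] + rest for c0, rest in zip([200, 180, 220, 190, 210], shared)])
    groups = ["ctrl"] * 5 + ["diet"] * 5
    print(ancom_w(table, groups))
```

In this toy example OTU 0 reaches the maximum W of 3, while each unchanged OTU still collects W = 1 from its ratio with OTU 0. This is why ANCOM declares an OTU differentially abundant only when W is large relative to the number of other OTUs, rather than whenever W is non-zero.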