Huijuan Zhou, Kejun He, Jun Chen, Xianyang Zhang.
Abstract
Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, only requires fitting linear regression models on the centered log-ratio (CLR) transformed data and correcting the bias due to compositional effects. We show that LinDA enjoys asymptotic false discovery rate (FDR) control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.
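The two steps named in the abstract (linear regression on CLR-transformed data, then a correction for the compositional bias) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single covariate, a pseudocount of 0.5 to avoid log(0), ordinary least-squares t-tests per taxon, and that most taxa are not differential, so the compositional bias can be estimated as the mode of the per-taxon slope estimates (here a Gaussian kernel density mode). Benjamini-Hochberg adjustment stands in for the FDR control step. The function names `clr`, `kde_mode`, `bh`, and `linda_sketch` are hypothetical.

```python
import numpy as np
from scipy import stats


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.
    The pseudocount (an assumed value) avoids log(0)."""
    logx = np.log(counts + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def kde_mode(values):
    """Mode of a 1-D sample, located on a grid via a Gaussian KDE."""
    kde = stats.gaussian_kde(values)
    grid = np.linspace(values.min(), values.max(), 512)
    return grid[np.argmax(kde(grid))]


def bh(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out


def linda_sketch(counts, x, pseudocount=0.5):
    """Hypothetical sketch: per-taxon OLS of CLR values on covariate x,
    then shift every slope by the estimated bias (mode of all slopes)."""
    y = clr(counts, pseudocount)                   # n x m
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])           # intercept + covariate
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # 2 x m coefficients
    resid = y - X @ beta
    df = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / df         # residual variance per taxon
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    slope = beta[1] - kde_mode(beta[1])            # bias-corrected slopes
    t = slope / se
    pvals = 2 * stats.t.sf(np.abs(t), df)
    return slope, pvals, bh(pvals)


# Usage on simulated counts: the first 10 of 100 taxa are differential.
rng = np.random.default_rng(0)
n, m = 60, 100
x = np.repeat([0.0, 1.0], n // 2)
log_mu = rng.normal(3, 1, size=m)
effect = np.zeros(m)
effect[:10] = 2.0
counts = rng.poisson(np.exp(log_mu + np.outer(x, effect)))
slope, p, q = linda_sketch(counts, x)
print(np.flatnonzero(q <= 0.05))
```

Using the mode rather than the mean or median reflects the assumption that non-differential taxa form the dominant peak of the slope distribution; in this sketch the standard errors ignore the uncertainty of the mode estimate.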
Keywords: Compositional effect; Differential abundance analysis; False discovery rate; Multiple testing
Year: 2022 PMID: 35421994 PMCID: PMC9012043 DOI: 10.1186/s13059-022-02655-5
Source DB: PubMed; Journal: Genome Biol; ISSN: 1474-7596; Impact factor: 17.906
Fig. 1 Performance comparison (S0C0: log-normal abundance distribution, a binary covariate). Empirical false discovery rate (A) and true positive rate (B) were averaged over 100 simulation runs. Error bars in (A) represent the 95% confidence intervals (CIs) of LinDA, and the dashed horizontal line indicates the target FDR level of 0.05.
Fig. 2 Performance comparison (S3C0: gamma abundance distribution, a binary covariate). Empirical false discovery rate (A) and true positive rate (B) were averaged over 100 simulation runs. Error bars in (A) represent the 95% CIs of LinDA, and the dashed horizontal line indicates the target FDR level of 0.05.
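The empirical FDR and true positive rate in these captions are averages over simulation runs. A minimal Python sketch of that bookkeeping follows. It assumes the usual definitions (false discovery proportion = false discoveries / max(discoveries, 1); TPR = true discoveries / number of truly differential taxa) and a normal-approximation 95% CI for the mean FDP; the captions do not state how the CIs were computed, so that part is an assumption. The names `fdp_tpr` and `summarize` are hypothetical.

```python
import numpy as np


def fdp_tpr(discovered, truth):
    """False discovery proportion and true positive rate for one run.
    discovered, truth: boolean arrays indexed by taxon."""
    discovered = np.asarray(discovered, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    n_disc = discovered.sum()
    false_disc = (discovered & ~truth).sum()
    true_disc = (discovered & truth).sum()
    fdp = false_disc / max(n_disc, 1)        # FDP is 0 when nothing is discovered
    tpr = true_disc / max(truth.sum(), 1)
    return fdp, tpr


def summarize(runs):
    """Empirical FDR (mean FDP) with an assumed normal-approximation 95% CI,
    and mean TPR, over an iterable of (discovered, truth) pairs."""
    fdps, tprs = map(np.array, zip(*(fdp_tpr(d, t) for d, t in runs)))
    half = 1.96 * fdps.std(ddof=1) / np.sqrt(len(fdps))
    mean_fdr = fdps.mean()
    return mean_fdr, (mean_fdr - half, mean_fdr + half), tprs.mean()


# Usage: two toy runs over four taxa, the first two truly differential.
truth = [True, True, False, False]
runs = [([True, False, True, False], truth),   # FDP 0.5, TPR 0.5
        ([True, True, False, False], truth)]   # FDP 0.0, TPR 1.0
mean_fdr, ci, mean_tpr = summarize(runs)
print(mean_fdr, ci, mean_tpr)
```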
Runtime (in seconds) comparison under different settings (R version 4.0.3 (2020-10-10); platform: x86_64-pc-linux-gnu (64-bit); CPU: E5-2670 v2 @ 2.50GHz; memory: 67.7 GB). The results are based on one simulation run. The “elapsed” time reported by the R command system.time() was used.
| S0C0 | S0C1 | S0C2 | |||||
|---|---|---|---|---|---|---|---|
| LinDA | ANCOM-BC | LinDA | ANCOM-BC | LinDA | ANCOM-BC | ||
| 21.835 | 22.057 | 64.519 | |||||
| 162.218 | 163.552 | 216.564 | |||||
| 184.972 | 162.611 | 599.985 | |||||
| 5135.393 | 5157.148 | 5506.353 | |||||
Characteristics of the four real microbiome datasets. NORA denotes new-onset untreated rheumatoid arthritis. The second and third columns list the number of taxa and the sample size, respectively, of each filtered dataset (prevalence ≥ 10%, library size ≥ 1000).
| Dataset | No. of taxa | Sample size | Comparison (group sizes) | Covariate (group sizes) |
|---|---|---|---|---|
| CDI | 123 | 183 | CDI/Diarrhea control (94 vs. 89) | |
| IBD | 579 | 81 | Crohn’s disease/Healthy (62 vs. 19) | Antibiotic use (n/y, 48 + 19 vs. 14 + 0) |
| RA | 438 | 72 | NORA/Healthy (44 vs. 28) | |
| SMOKE | 209 | 132 | Smoke (n/y, 67 vs. 65) | Female/Male (31 + 16 vs. 36 + 49) |
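The filtering criteria in the caption (prevalence ≥ 10%, library size ≥ 1000) can be applied to a samples × taxa count matrix as in the minimal Python sketch below. The order of the two filters is an assumption (samples are dropped by library size first, then taxa by prevalence among the remaining samples), as is the definition of prevalence as the fraction of samples with a nonzero count. The function name `filter_counts` is hypothetical.

```python
import numpy as np


def filter_counts(counts, min_prevalence=0.10, min_libsize=1000):
    """Hypothetical filter for a samples x taxa count matrix.
    Returns the filtered matrix plus indices of kept samples and taxa."""
    counts = np.asarray(counts)
    keep_samples = counts.sum(axis=1) >= min_libsize   # library size per sample
    sub = counts[keep_samples]
    prevalence = (sub > 0).mean(axis=0)                # fraction of samples with taxon present
    keep_taxa = prevalence >= min_prevalence
    return sub[:, keep_taxa], np.flatnonzero(keep_samples), np.flatnonzero(keep_taxa)


# Usage: the second sample (library size 20) is removed.
toy = np.array([[500, 500, 0],
                [10, 10, 0],
                [1000, 0, 5]])
filtered, kept_samples, kept_taxa = filter_counts(toy)
print(filtered.shape, kept_samples, kept_taxa)
```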
Fig. 3 Number of discoveries vs. target FDR level (0.01–0.25) for the three real datasets
Fig. 4 Overlap of differential taxa at a target FDR level of 0.1 for the four real datasets
Fig. 5 Density of . The panels on the left and right correspond to σ = 1 and σ ∼ IG(2, 1), respectively, where IG denotes the inverse-gamma distribution. The red curve is the density of the standard normal distribution. The blue and green curves are the densities of with and , and and , respectively.