Xinyan Zhang, Himel Mallick, Zaixiang Tang, Lei Zhang, Xiangqin Cui, Andrew K Benson, Nengjun Yi.
Abstract
BACKGROUND: Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of metagenomic sequencing data. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. In addition to the well-known properties of microbiome count measurements, for example, varied total sequence reads across samples, over-dispersion and zero-inflation, microbiome studies usually collect samples with hierarchical structures, which introduce correlation among the samples and thus further complicate the analysis and interpretation of microbiome count data.
Keywords: Correlated measures; Count data; Metagenomics; Microbiome; Negative binomial model; Penalized Quasi-likelihood; Random effects
Year: 2017 PMID: 28049409 PMCID: PMC5209949 DOI: 10.1186/s12859-016-1441-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Microbiome Data Structure
| | Feature 1 | Feature 2 | · | Feature | Total read | Host factors | Sample variables |
|---|---|---|---|---|---|---|---|
| Sample 1 | | | | | | | |
| Sample 2 | | | | | | | |
| · | · | · | · | · | · | · | · |
| · | · | · | · | · | · | · | · |
| · | · | · | · | · | · | · | · |
| Sample | | | | | | | |
Parameter Ranges in Simulation Studies
| Parameter | Range |
|---|---|
| log(T_i) + μ | Unif(0.1, 3.5) |
| Shape parameter θ | Unif(0.1, 5) |
| Fixed effect β | 0, Unif(0.2, 0.35), Unif(0.4, 0.55) |
| Standard deviation τ | Unif(0.5, 1) |
| Correlation ρ | Unif(−0.1, 0.1), Unif(0.5, 0.8), Unif(−0.8, −0.5) |
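To make the parameter ranges above concrete, the following is a minimal sketch of how one simulated dataset of correlated negative binomial counts could be drawn. It is not the authors' simulation code. The function names (`poisson_draw`, `simulate_nb_random_intercept`), the single random intercept per group, the alternating binary covariate, and the group sizes are all assumptions made for illustration. The correlation ρ is left out because this record does not describe how it enters the model, and log(T_i) + μ is treated as one constant drawn once per dataset.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_nb_random_intercept(n_groups=10, n_per_group=5,
                                 beta_range=(0.2, 0.35), seed=0):
    """Simulate NB counts that share a random intercept within each group.

    Hypothetical helper: the design (one binary covariate, one random
    intercept per group) is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)

    # Parameters drawn from the ranges listed in the table.
    offset_plus_mu = rng.uniform(0.1, 3.5)   # log(T_i) + mu, one constant here
    theta = rng.uniform(0.1, 5.0)            # NB shape parameter
    beta = 0.0 if beta_range is None else rng.uniform(*beta_range)
    tau = rng.uniform(0.5, 1.0)              # SD of the random intercept

    rows = []
    for g in range(n_groups):
        b = rng.gauss(0.0, tau)              # random intercept shared by group g
        for j in range(n_per_group):
            x = j % 2                        # assumed binary covariate
            mean = math.exp(offset_plus_mu + beta * x + b)
            # Gamma-Poisson mixture: mean `mean`, variance mean + mean**2 / theta.
            lam = rng.gammavariate(theta, mean / theta)
            rows.append({"group": g, "x": x, "y": poisson_draw(rng, lam)})
    return {"rows": rows, "theta": theta, "beta": beta, "tau": tau}
```

Passing `beta_range=None` fixes β at 0, which corresponds to the null setting used for type I error; the two uniform ranges for β correspond to the non-null settings used for power.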
Fig. 1 Type I error rates for the five methods in different simulation settings
Fig. 2 Empirical powers for the five methods in different simulation settings
Fig. 3 Differences between the estimates and their simulated values for the parameters β, τ², and θ in the proposed NBMM in different simulation settings. The points represent the average values and the lines represent the interval estimates
Fig. 4 The analyses of NBMM and LMM with the arcsine square root transformation: minus log transformed p-values for the significant differentially abundant taxa at the 5% significance threshold between high fat diet and control diet groups for species, genus, family, order, and class levels