
A marginalized two-part Beta regression model for microbiome compositional data.

Haitao Chai1,2, Hongmei Jiang3, Lu Lin1, Lei Liu2,4.   

Abstract

In microbiome studies, an important goal is to detect differential abundance of microbes across clinical conditions and treatment options. However, the microbiome compositional data (quantified by relative abundance) are highly skewed, bounded in [0, 1), and often have many zeros. A two-part model is commonly used to separate zeros and positive values explicitly by two submodels: a logistic model for the probability of a species being present in Part I, and a Beta regression model for the relative abundance conditional on the presence of the species in Part II. However, the regression coefficients in Part II cannot provide a marginal (unconditional) interpretation of covariate effects on the microbial abundance, which is of great interest in many applications. In this paper, we propose a marginalized two-part Beta regression model which captures the zero-inflation and skewness of microbiome data and also allows investigators to examine covariate effects on the marginal (unconditional) mean. We demonstrate its practical performance using simulation studies and apply the model to a real metagenomic dataset on mouse skin microbiota. We find that under the proposed marginalized model, without loss in power, the likelihood ratio test controls the type I error better than conventional methods.


Year:  2018        PMID: 30036363      PMCID: PMC6072097          DOI: 10.1371/journal.pcbi.1006329

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


Introduction

In recent years, metagenomics studies have been growing rapidly due to the advances of next-generation sequencing (NGS) technologies [1]. Microbiota have been known to be associated with various diseases, e.g., obesity and diabetes [2, 3], Crohn’s disease [4], bacterial vaginosis [5], and cancer [6, 7]. The microbial abundance is usually measured in read counts. However, such quantities are not directly comparable across samples due to the uneven total sequence counts of samples. Therefore, the read counts are often normalized to relative abundances which sum to 1 for all microbes in a sample [8]. Relative abundance can be characterized by a point mass at zero and a right-skewed continuous distribution with a positive support, the so-called “semi-continuous” or “zero-inflated continuous” data. The zero values indicate that certain microbes are absent in the sample, or the rare microbes are present but missed due to undersampling, while the continuous distribution with a positive support describes the levels of relative abundance among the present microbes. The relative abundance is often described by a two-part model [9], which separates zeros and positive values explicitly by two submodels: a logistic model for the probability of the outcome being positive in Part I and a (generalized) linear regression model for the amount of the (transformed) positive value in Part II. An important issue in such two-part models is to determine the distributional form in Part II. The nonzero relative abundance data are non-normally distributed and bounded in [0, 1). Beta distribution has been used to model this outcome. A two-part Beta regression model can be thus developed [10-12]. It includes two sets of parameters, one in the logistic regression for the presence of a microbe, and the other in the Beta regression for the relative abundance conditional on the presence of the microbe. 
These two sets of parameters are interpreted as effects on the presence of a microbe and on the level of relative abundance given that the microbe is present, respectively. That is, there is a conditional interpretation in Part II. However, it is often of great interest to have a straightforward interpretation of covariate effects on the overall marginal (unconditional) mean. For example, [13] proposed a marginalized two-part log-normal model by parameterizing covariate effects directly in terms of the marginal mean. As conventional two-part Beta regression models do not provide an unconditional interpretation of covariate effects, we propose a marginalized two-part Beta regression model for microbiome abundance data which parameterizes covariate effects in terms of the marginal mean. The proposed model not only accounts for the zero-inflated nature of the microbiome data but also yields more interpretable effect estimates. Of note, an alternative for describing zero-inflated data is the Tobit model [14], where zero values are considered as left-censored observations of underlying true negative values (from the Normal or another distribution that accommodates negative values). However, the Tobit model is not appropriate for the Beta distribution, whose support does not include negative values. Consequently, the Tobit model cannot be applied directly to the relative abundance data.

Models

In the following Section, we will introduce the conventional two-part Beta regression model and the proposed marginalized two-part Beta regression model. We will also describe their properties to assess the overall impact of covariates on the marginal mean, and demonstrate that the proposed model outperforms the conventional model.

Two-part Beta regression model

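We begin with the conventional two-part model with a Beta component in Part II [10-12]. For a given operational taxonomic unit (OTU), let $Y_i$ denote its semi-continuous relative abundance for subject $i$, where $0 \le Y_i < 1$ and $i = 1, 2, \ldots, n$. Specifically, a two-part Beta regression model has the following form:

$$Y_i \sim \begin{cases} 0 & \text{with probability } 1 - p_i, \\ \mathrm{Beta}(\mu_i\phi, (1 - \mu_i)\phi) & \text{with probability } p_i, \end{cases}$$

where the density function of the Beta distribution is parameterized as

$$f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1 - \mu)\phi)}\, y^{\mu\phi - 1} (1 - y)^{(1 - \mu)\phi - 1}, \quad 0 < y < 1,$$

with $\mu$ ($0 < \mu < 1$) and $\phi$ ($\phi > 0$) being the mean and dispersion parameters of the Beta distribution, respectively, and $p_i$ is the probability that the observation $Y_i$ is from the Beta distribution. The two-part model describes the probability $p_i$ in the logistic component and the conditional mean $\mu_i$ in the Beta component as functions of covariates,

$$\mathrm{logit}(p_i) = X_i^T \alpha, \tag{1}$$

$$\mathrm{logit}(\mu_i) = X_i^T \beta, \tag{2}$$

where $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_p)^T$ and $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ are vectors of regression coefficients, and $X_i = (1, x_{i1}, \ldots, x_{ip})^T$ is the (p + 1) dimensional covariate vector (including an intercept) for the i-th subject. We assume identical covariates for both parts of the model for simplicity of notation. One can instead allow for different sets of covariates for the two parts.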
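As an illustration of how the two parts combine, the following is a minimal Python sketch of the log-likelihood of the conventional two-part Beta model, assuming the mean parameterization Beta(μϕ, (1 − μ)ϕ) and logit links in both parts. The function names `beta_logpdf_mean_param` and `ctp_loglik` are hypothetical and are not taken from the paper's SAS code.

```python
import numpy as np
from scipy.special import expit, gammaln


def beta_logpdf_mean_param(y, mu, phi):
    """Log density of Beta(mu * phi, (1 - mu) * phi) evaluated at y in (0, 1)."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (gammaln(phi) - gammaln(a) - gammaln(b)
            + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))


def ctp_loglik(alpha, beta, phi, X, y):
    """Log-likelihood of the conventional two-part Beta model (sketch).

    X is an (n, k) design matrix whose first column is the intercept;
    y holds relative abundances in [0, 1).
    """
    p = expit(X @ alpha)    # Part I: probability that the OTU is present
    mu = expit(X @ beta)    # Part II: conditional mean given presence
    pos = y > 0
    ll_zero = np.log1p(-p[~pos]).sum()
    ll_pos = (np.log(p[pos])
              + beta_logpdf_mean_param(y[pos], mu[pos], phi)).sum()
    return ll_zero + ll_pos
```

Zeros contribute only through Part I, while positive values contribute through both parts, which is why β carries a conditional interpretation.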

Marginalized two-part Beta regression model

To obtain interpretable covariate effects on the marginal mean, we propose the following marginalized two-part Beta regression model. Let $v_i = E(Y_i)$ be the marginal mean of $Y_i$. The first part of the proposed marginalized two-part model is the same as Part I in the conventional two-part model,

$$\mathrm{logit}(p_i) = X_i^T \alpha. \tag{3}$$

In Part II, the marginal (unconditional) mean $v_i$, instead of the conditional mean $\mu_i$, is modeled as a function of covariates:

$$\mathrm{logit}(v_i) = X_i^T \gamma. \tag{4}$$

Because $v_i = p_i \mu_i$, the conditional mean implied by the model is $\mu_i = v_i / p_i$. As we can see, the marginalized two-part model not only captures zero-inflation and skewness as the conventional two-part model does, but also allows us to examine covariate effects on the overall marginal mean. In the S1 Text, we can see that the likelihood of the conventional two-part model can be reparameterized to that of $\alpha$, $\gamma$, and $\phi$ in the marginalized model. However, the interpretation of covariate effects is different in the two frameworks, which will be elaborated in the next subsection.

The estimation of the marginalized two-part model can be carried out in SAS Proc NLMIXED (the main code is shown in S1 Code). To obtain starting values for the estimation, a logistic model and a Beta regression model are fitted for the binary part and the positive part, respectively. The estimates of these two models are then used as starting values for the marginalized two-part model. The convergence of the estimation is determined by a threshold value of $1 \times 10^{-8}$ for the relative gradient, a common convergence criterion in SAS Proc NLMIXED. This criterion is satisfied in our simulations for all replicates, and in the real data analysis for all 131 OTUs.
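The reparameterization can be sketched in Python as follows. This is not the authors' SAS Proc NLMIXED code: it is a minimal illustration that assumes logit links for both p and v, recovers the conditional mean as μ = v / p, and uses SciPy's derivative-free Nelder-Mead optimizer with simple moment-based starting values instead of the logistic and Beta regression fits described above. The names `mtp_negloglik`, `mtp_start`, and `fit_mtp` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln, logit


def mtp_negloglik(theta, X, y):
    """Negative log-likelihood; theta = (alpha, gamma, log phi)."""
    k = X.shape[1]
    alpha, gamma, phi = theta[:k], theta[k:2 * k], np.exp(theta[2 * k])
    p = expit(X @ alpha)      # Part I: P(Y > 0)
    v = expit(X @ gamma)      # Part II: marginal mean E(Y)
    mu = v / p                # implied conditional mean E(Y | Y > 0)
    if np.any(mu >= 1.0):     # outside the parameter space
        return np.inf
    pos = y > 0
    a, b = mu[pos] * phi, (1.0 - mu[pos]) * phi
    log_beta = (gammaln(phi) - gammaln(a) - gammaln(b)
                + (a - 1.0) * np.log(y[pos]) + (b - 1.0) * np.log1p(-y[pos]))
    return -(np.log1p(-p[~pos]).sum() + (np.log(p[pos]) + log_beta).sum())


def mtp_start(X, y):
    """Feasible starting values (assumes column 0 of X is the intercept)."""
    k = X.shape[1]
    theta0 = np.zeros(2 * k + 1)
    frac_pos = np.clip(np.mean(y > 0), 1e-3, 1 - 1e-3)
    mean_y = np.clip(np.mean(y), 1e-6, 0.999 * frac_pos)  # keeps v < p
    theta0[0], theta0[k] = logit(frac_pos), logit(mean_y)
    return theta0


def fit_mtp(X, y):
    """Maximize the likelihood with a derivative-free optimizer (sketch)."""
    return minimize(mtp_negloglik, mtp_start(X, y), args=(X, y),
                    method="Nelder-Mead",
                    options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
```

The constraint μ < 1 (equivalently v < p) is handled here by returning an infinite objective, which is a crude choice; a gradient-based routine such as the one in Proc NLMIXED would need a different treatment of the boundary.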

Interpretation of covariate effects

For the conventional model

Using the conventional two-part model shown in Eqs (1) and (2), $\beta_j$ is interpreted as the effect of a unit increase in the j-th covariate on the logit of the conditional mean of $Y_i$ given that $Y_i$ is positive. In many applications, however, the primary interest is to examine the impact of covariates on the overall marginal mean $E(Y_i)$. For the conventional two-part model, we have

$$E(Y_i) = p_i \mu_i = \frac{\exp(X_i^T\alpha)}{1 + \exp(X_i^T\alpha)} \cdot \frac{\exp(X_i^T\beta)}{1 + \exp(X_i^T\beta)}. \tag{5}$$

Along the lines of [15], we can assess the effect of the j-th continuous covariate $x_{ij}$ on the unconditional mean as

$$\frac{\partial\, \mathrm{logit}\{E(Y_i)\}}{\partial x_{ij}} = \frac{\alpha_j \{1 + \exp(\eta_i^{\beta})\} + \beta_j \{1 + \exp(\eta_i^{\alpha})\}}{1 + \exp(\eta_i^{\alpha}) + \exp(\eta_i^{\beta})}, \tag{6}$$

where $\eta_i^{\alpha} = x_{ij}\alpha_j + X_{i(-j)}^T \alpha_{(-j)}$ and $\eta_i^{\beta} = x_{ij}\beta_j + X_{i(-j)}^T \beta_{(-j)}$, with $\alpha_j$ and $\beta_j$ being the coefficients corresponding to $x_{ij}$ in the conventional two-part model and $X_{i(-j)}$, $\alpha_{(-j)}$, and $\beta_{(-j)}$ being the corresponding vectors with the j-th covariate removed. A straightforward calculation shows that (6) can be equivalently written as

$$\frac{\partial\, \mathrm{logit}\{E(Y_i)\}}{\partial x_{ij}} = c_1(\alpha, \beta)\, \alpha_j + c_2(\alpha, \beta)\, \beta_j, \tag{7}$$

where

$$c_1(\alpha, \beta) = \frac{1 - p_i}{1 - p_i\mu_i}, \qquad c_2(\alpha, \beta) = \frac{1 - \mu_i}{1 - p_i\mu_i}.$$

As the logit transformation is a monotonically increasing function on the interval (0, 1), the hypothesis test of the covariate effects on the marginal mean is equivalent to that on its logit transformation. In Eq (7), the logit transformation of the marginal mean abundance is independent of covariate $x_{ij}$ if both $\alpha_j$ and $\beta_j$ are zero. However, if $\alpha_j$ and $\beta_j$ have opposite signs, the logit transformation of the marginal mean abundance may still be independent of covariate $x_{ij}$ even when neither coefficient is zero. Furthermore, the coefficients $c_1(\alpha, \beta)$ and $c_2(\alpha, \beta)$ in Eq (7) are functions of $\alpha$ and $\beta$. Thus, the independence between the marginal mean and covariate $x_{ij}$ cannot be tested simply as the hypothesis of $\alpha_j = 0$ and $\beta_j = 0$, e.g., by the likelihood ratio test. Instead, the Delta method has to be used for the hypothesis test based on Eq (7), which depends on $\alpha$ and $\beta$ in a complicated way.

When the interest is to assess the effect of a discrete variable on the response, e.g., placebo vs. treatment, Eq (7) no longer applies. Without loss of generality, consider a binary covariate $x_{ij}$ taking value 0 or 1. Similar to [15], the difference in the logit transformation of the marginal mean with $x_{ij} = 1$ vs. $x_{ij} = 0$ is used to evaluate the impact on the expected marginal mean response.
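Under the conventional two-part model, the difference between the logit transformations with $x_{ij} = 1$ and $x_{ij} = 0$ is

$$\mathrm{logit}\{E(Y_i \mid x_{ij} = 1)\} - \mathrm{logit}\{E(Y_i \mid x_{ij} = 0)\} = b_1(\alpha_j) + b_2(\beta_j) + b_3(\alpha_j, \beta_j), \tag{8}$$

where, writing $p_i^{(x)}$ and $\mu_i^{(x)}$ for the values of $p_i$ and $\mu_i$ at $x_{ij} = x$, the three terms can be written as

$$b_1(\alpha_j) = \log\frac{p_i^{(1)}}{p_i^{(0)}}, \qquad b_2(\beta_j) = \log\frac{\mu_i^{(1)}}{\mu_i^{(0)}}, \qquad b_3(\alpha_j, \beta_j) = \log\frac{1 - p_i^{(0)}\mu_i^{(0)}}{1 - p_i^{(1)}\mu_i^{(1)}}.$$

It is worth noting that $b_1(\alpha_j)$, $b_2(\beta_j)$, and $b_3(\alpha_j, \beta_j)$ all equal 0 if $\alpha_j$ and $\beta_j$ are 0. As with a continuous covariate, the logit transformation of the marginal mean abundance does not depend on the binary covariate $x_{ij}$ if both $\alpha_j$ and $\beta_j$ are zero. However, even when neither coefficient is zero, the transformed mean abundance may still be independent of the binary covariate $x_{ij}$ when $\alpha_j$ and $\beta_j$ have opposite signs. Eq (8) indicates that the independence between the response and the binary covariate $x_{ij}$ cannot be ascertained by directly testing $\alpha_j = 0$ and $\beta_j = 0$ with, e.g., the likelihood ratio test, as shown in the simulation studies and the real data analysis.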
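The cancellation described above can be checked numerically. The sketch below assumes only that E(Y) = pμ with logit links for p and μ, and uses hypothetical coefficient values. It splits the difference in logit marginal means into a Part I term log(p1/p0), a Part II term log(μ1/μ0), and a cross term log[(1 − p0μ0)/(1 − p1μ1)], which is one algebraically valid decomposition and may be arranged differently from the paper's Eq (8). It then solves for the Part II coefficient that exactly offsets a positive Part I coefficient.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit


def ctp_logit_mean_diff(a0, a1, b0, b1):
    """logit E(Y | x = 1) - logit E(Y | x = 0) when E(Y) = p * mu."""
    return (logit(expit(a0 + a1) * expit(b0 + b1))
            - logit(expit(a0) * expit(b0)))


def ctp_diff_terms(a0, a1, b0, b1):
    """Split the same difference into Part I, Part II, and cross terms."""
    p0, p1 = expit(a0), expit(a0 + a1)
    m0, m1 = expit(b0), expit(b0 + b1)
    return (np.log(p1 / p0),
            np.log(m1 / m0),
            np.log((1 - p0 * m0) / (1 - p1 * m1)))


# Hypothetical values: a positive Part I effect (a1) and an unknown
# Part II effect (b1) chosen so that the marginal mean does not change.
a0, a1, b0 = 1.5, 1.0, -2.0
b1_cancel = brentq(lambda b1: ctp_logit_mean_diff(a0, a1, b0, b1), -5.0, 0.0)
print(f"alpha1 = {a1}, beta1 = {b1_cancel:.4f}: marginal mean unchanged")
```

Here both coefficients are non-zero, so a joint test of α1 = 0 and β1 = 0 would tend to reject even though the covariate has no effect on the marginal mean.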

For the marginalized model

In the marginalized two-part model, Eqs (3) and (4), the effect of a continuous covariate $x_{ij}$ on the marginal mean $E(Y_i)$ can be characterized by

$$\frac{\partial\, \mathrm{logit}\{E(Y_i)\}}{\partial x_{ij}} = \gamma_j, \tag{9}$$

where $\gamma_j$ is the coefficient corresponding to $x_{ij}$ in Eq (4). Thus, the effect of the covariate $x_{ij}$ on the marginal mean abundance is determined by its coefficient in the marginalized model. With the marginalized two-part model, we can estimate the coefficient $\gamma_j$ as well as test the effect on the marginal mean. As for a binary covariate in the marginalized two-part model, the difference in the logit transformation of the marginal mean with $x_{ij} = 1$ vs. $x_{ij} = 0$ can be expressed as

$$\mathrm{logit}\{E(Y_i \mid x_{ij} = 1)\} - \mathrm{logit}\{E(Y_i \mid x_{ij} = 0)\} = \gamma_j. \tag{10}$$

One can see that the effect of a binary covariate $x_{ij}$ on the marginal mean abundance is determined by its coefficient $\gamma_j$ in the marginalized two-part model. The logit transformation of the marginal mean abundance with $x_{ij} = 1$ is larger than that with $x_{ij} = 0$ when $\gamma_j$ is positive, and the reverse is true when $\gamma_j$ is negative.
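Because the covariate effect on the marginal mean is a single coefficient in the marginalized model, a likelihood ratio test of that effect compares two nested fits that differ by one parameter, whereas the joint test of the Part I and Part II coefficients in the conventional model has two degrees of freedom. A minimal sketch of the test statistic follows, assuming the usual chi-square reference distribution. The helper name `lrt_pvalue` and the log-likelihood values in the usage lines are hypothetical.

```python
from scipy.stats import chi2


def lrt_pvalue(loglik_full, loglik_null, df):
    """Likelihood ratio statistic and its chi-square p-value."""
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)
    return stat, chi2.sf(stat, df)


# Hypothetical maximized log-likelihoods, for illustration only.
stat_mtp, p_mtp = lrt_pvalue(-100.0, -101.92, df=1)  # marginalized: gamma_j = 0
stat_ctp, p_ctp = lrt_pvalue(-100.0, -101.92, df=2)  # conventional: alpha_j = beta_j = 0
```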

Results

In this section, simulation studies and real data analysis are presented to assess the performance of the proposed marginalized and the conventional two-part models. Results show that the proposed model outperforms the conventional model, which is consistent with the theoretical results.

Simulation studies

In this section, we conduct simulation studies to evaluate the finite-sample performance of the proposed marginalized two-part model. To test the effect of the covariate on the overall marginal mean E(Y), likelihood ratio tests (LRT) are performed and compared under the marginalized two-part (MTP) model and the conventional two-part (CTP) model. In addition, the two-sample T-test and the Wilcoxon rank sum test are also compared. We assume that, in both parts, there is only one binary covariate x1, which is generated from the Bernoulli distribution with success probability 0.5. However, according to the interpretation of the covariate effects in the preceding section, the proposed model can be applied to multiple covariates. The response y is generated as follows:

$$y_i \sim \begin{cases} 0 & \text{with probability } 1 - p_i, \\ \mathrm{Beta}(\mu_i\phi, (1 - \mu_i)\phi) & \text{with probability } p_i, \end{cases} \qquad \mathrm{logit}(p_i) = \alpha_0 + \alpha_1 x_{i1}, \quad \mathrm{logit}(v_i) = \gamma_0 + \gamma_1 x_{i1},$$

where $\mu_i = v_i / p_i$ is the conditional mean given that y is positive and ϕ is the dispersion parameter of the Beta distribution. In the simulation studies, 1000 samples of sizes 200 and 400 are generated. We set the parameters as α0 = 1.5, γ0 = −2.5, and ϕ = 1, while α1 and γ1 take different values according to which of the two criteria is under study: the type I error or the power.

First, we evaluate the type I error for testing the null hypothesis H0: the binary covariate x1 has no effect on the marginal mean E(Y). In the MTP model, this is equivalent to testing γ1 = 0, as shown in Eq (10). However, testing α1 = 0 and β1 = 0 in the CTP model is not equivalent to testing H0. Specifically, according to Eq (8), even though neither of the coefficients is zero, the binary covariate x1 may still have no effect on the marginal mean. This means that the conventional model cannot control the type I error for testing H0 when both α1 and β1 are non-zero.

The results are shown in Fig 1. Type I errors are calculated under two settings: α1 = 0, γ1 = 0 and α1 = 1, γ1 = 0. For each setting, two significance levels are considered: 0.01 and 0.05. As we can see from Fig 1, under the first setting (α1 = 0, γ1 = 0), all the methods control the type I error reasonably well. Under the setting α1 = 1, γ1 = 0, the LRT under the MTP and the T-test control the type I error well, while the LRT under the CTP and the Wilcoxon test cannot control the type I error, with the LRT under the CTP model performing worst. This is because, in this setting, testing α1 = 0 and β1 = 0 in the CTP model is not equivalent to testing the null hypothesis H0.
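The data-generating mechanism of this simulation can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' simulation code: it assumes logit links for p and v, the conditional mean μ = v / p, and the Beta(μϕ, (1 − μ)ϕ) parameterization. The function name `simulate_mtp_beta` and the use of NumPy's random generator are assumptions.

```python
import numpy as np
from scipy.special import expit


def simulate_mtp_beta(n, alpha, gamma, phi, rng):
    """Draw (x1, y) from the marginalized two-part Beta model (sketch)."""
    x1 = rng.binomial(1, 0.5, size=n)        # binary covariate
    p = expit(alpha[0] + alpha[1] * x1)      # P(y > 0)
    v = expit(gamma[0] + gamma[1] * x1)      # marginal mean E(y)
    mu = v / p                               # conditional mean E(y | y > 0)
    present = rng.random(n) < p
    y = np.zeros(n)
    y[present] = rng.beta(mu[present] * phi, (1.0 - mu[present]) * phi)
    return x1, y


# Setting alpha1 = 1, gamma1 = 0: presence differs between the groups,
# but the marginal mean is the same in both.
rng = np.random.default_rng(0)
x1, y = simulate_mtp_beta(200, alpha=(1.5, 1.0), gamma=(-2.5, 0.0), phi=1.0, rng=rng)
```

Under this setting the proportion of zeros depends on x1 while E(y) does not, which is the situation in which tests that are sensitive to the zero part can reject a true null hypothesis about the marginal mean.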
Fig 1

Type I errors of the four methods.

The results in the upper panels correspond to the setting α1 = 0, γ1 = 0 and the lower panels correspond to setting α1 = 1, γ1 = 0. In each setting, the left panel shows the results for significance level 0.01 and the right panel shows the results for level 0.05. The dashed horizontal line in each panel represents the correct level. The results for sample size 400 can be found in S1 Fig in Supporting information.

The powers under two different settings, α1 = 0, γ1 = 1 and α1 = 1, γ1 = 1, are shown in Fig 2. As we can see, the LRT under the CTP and the MTP are the most powerful methods with the power close to 1 in all settings. The Wilcoxon test performs a little worse than the LRT while the T-test has the lowest power.
Fig 2

Powers of the four methods.

The upper panels show the powers corresponding to the setting α1 = 0, γ1 = 1 and the lower panels show the powers corresponding to the setting α1 = 1, γ1 = 1. In each setting, the left panel shows the results for significance level 0.01 and the right panel shows the results for level 0.05. The powers for sample size 400 are shown in S2 Fig in Supporting information.

We also estimate the coefficients in the MTP model under the setting α1 = 1, γ1 = 1. The results in Table 1 demonstrate that the biases are negligible and the coverage probabilities are acceptably close to the nominal level 0.95 for all the model parameters. In addition, we observe small differences between the empirical standard errors and our estimates. The mean squared errors for sample size 400 are smaller than those for sample size 200.
Table 1

Estimates of the coefficients in the marginalized two-part model under the setting α1 = 1, γ1 = 1.

Sample size = 200

Parameter | Est | SE | SEM | CP | MSE
α0 = 1.5 | 1.5321 | 0.2695 | 0.2646 | 0.955 | 0.0736
α1 = 1 | 1.0345 | 0.4876 | 0.4803 | 0.960 | 0.2387
γ0 = -2.5 | -2.5104 | 0.1673 | 0.1727 | 0.956 | 0.0281
γ1 = 1 | 0.9962 | 0.1758 | 0.1803 | 0.949 | 0.0309
ϕ = 1 | 1.0323 | 0.1374 | 0.1331 | 0.956 | 0.0199

Sample size = 400

Parameter | Est | SE | SEM | CP | MSE
α0 = 1.5 | 1.5153 | 0.1921 | 0.1854 | 0.943 | 0.0375
α1 = 1 | 1.0078 | 0.3457 | 0.3320 | 0.947 | 0.1195
γ0 = -2.5 | -2.5074 | 0.1210 | 0.1219 | 0.949 | 0.0147
γ1 = 1 | 1.0002 | 0.1253 | 0.1273 | 0.957 | 0.0157
ϕ = 1 | 1.0157 | 0.0929 | 0.0923 | 0.954 | 0.0089

Est: mean of the parameter estimates;

SE: standard error of the parameter estimates;

SEM: sample mean of the standard error estimates;

CP: coverage probability of the corresponding 95% confidence interval.
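As a reading aid for Table 1, the reported mean squared error can be related to the other columns through the usual decomposition MSE ≈ bias² + SE², where the bias is Est minus the true value. The short Python check below applies this to the sample size 200 columns; the tolerance is an assumption that allows for rounding in the published figures.

```python
# (true value, Est, SE, MSE) for sample size 200, copied from Table 1
table1_n200 = {
    "alpha0": (1.5, 1.5321, 0.2695, 0.0736),
    "alpha1": (1.0, 1.0345, 0.4876, 0.2387),
    "gamma0": (-2.5, -2.5104, 0.1673, 0.0281),
    "gamma1": (1.0, 0.9962, 0.1758, 0.0309),
    "phi": (1.0, 1.0323, 0.1374, 0.0199),
}


def mse_from_bias_and_se(true_value, est, se):
    """Approximate MSE as squared bias plus squared empirical SE."""
    return (est - true_value) ** 2 + se ** 2


for name, (true_value, est, se, mse) in table1_n200.items():
    approx = mse_from_bias_and_se(true_value, est, se)
    print(f"{name}: reported MSE = {mse:.4f}, bias^2 + SE^2 = {approx:.4f}")
```

In every row the squared bias is a small fraction of the MSE, which is consistent with the statement that the biases are negligible.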

According to the simulation results, the LRT under the MTP model has the best performance: it controls the type I error reasonably well and also achieves the best power. The T-test performs similarly in error control but is not as powerful as the LRT under the MTP model. The LRT under the CTP model is powerful; however, it fails to control the type I error. The Wilcoxon test performs worse than the LRT under the MTP model in both error control and power.

To assess the robustness of the proposed method, we consider a setting where positive responses are generated from another distribution. The only covariate x is generated from the Uniform distribution on (0, 1), while the response y has the following distribution:

$$y_i \sim \begin{cases} 0 & \text{with probability } 1 - p_i, \\ \mathrm{Bin}(100, \mu_i)/100 & \text{with probability } p_i, \end{cases}$$

where $\mathrm{logit}(p_i) = \alpha_0 + \alpha_1 x_i$ and the overall marginal mean $v_i$ of the response satisfies $\mathrm{logit}(v_i) = \gamma_0 + \gamma_1 x_i$. Instead of the Beta distribution, positive responses are generated from the Binomial distribution Bin(100, μ) and then divided by 100 to make them bounded in (0, 1). As in the previous simulation, we set $\mu_i = v_i / p_i$. The probability of having exactly 0 successes in 100 trials is $(1 - \mu)^{100}$, which is negligible with a proper choice of the parameters α and γ. Thus almost all the zero values in this zero-inflated Binomial data are structural zeros.

In this simulation study, 1000 samples of sizes 200 and 400 are generated. The parameters are set as α0 = 2, γ0 = −0.5, while α1 and γ1 take different values in order to calculate the type I errors and the powers. The type I errors are calculated under two settings: α1 = 0, γ1 = 0 and α1 = 1, γ1 = 0. For each setting, two significance levels are considered: 0.01 and 0.05. As we can see from Fig 3, under both settings, the proposed marginalized model controls the type I error reasonably well. The conventional model controls the type I error under the setting α1 = 0, γ1 = 0 while it fails under the setting α1 = 1, γ1 = 0, similar to Fig 1.
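The misspecified data-generating mechanism used in this robustness check can be sketched as follows. As before, this is an illustration under stated assumptions rather than the authors' code: logit links for p and v, the conditional mean μ = v / p, and scaled Binomial draws for the positive part. The function name `simulate_zi_binomial` is hypothetical.

```python
import numpy as np
from scipy.special import expit


def simulate_zi_binomial(n, alpha, gamma, rng, trials=100):
    """Draw (x, y) with zero-inflated, scaled Binomial responses (sketch)."""
    x = rng.uniform(0.0, 1.0, size=n)        # continuous covariate
    p = expit(alpha[0] + alpha[1] * x)       # P(structural non-zero)
    v = expit(gamma[0] + gamma[1] * x)       # marginal mean E(y)
    mu = v / p                               # Binomial success probability
    present = rng.random(n) < p
    y = np.zeros(n)
    y[present] = rng.binomial(trials, mu[present]) / trials
    return x, y


rng = np.random.default_rng(0)
x, y = simulate_zi_binomial(400, alpha=(2.0, 1.0), gamma=(-0.5, 0.0), rng=rng)
```

With α0 = 2 and γ0 = −0.5, μ stays well away from 0, so sampling zeros from the Binomial part are extremely unlikely and essentially every zero comes from the point mass.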
Fig 3

Type I errors for the CTP model and the MTP model.

The results in the upper panels correspond to the setting α1 = 0, γ1 = 0 and the lower panels correspond to the setting α1 = 1, γ1 = 0. In each setting, the left panel shows the results for significance level 0.01 and the right panel shows the results for level 0.05. The dashed horizontal line in each panel represents the correct α-level.

As shown in Fig 4, both the marginalized model and the conventional model have power equal to 1 under all settings.
Fig 4

Powers for the CTP model and the MTP model.

The powers in the upper panels correspond to the setting α1 = 0, γ1 = 1 and the lower panels correspond to setting α1 = 1, γ1 = 1. In each setting, the left panel shows the results for significance level 0.01 and the right panel shows the results for level 0.05.

From the simulation studies we can conclude that the proposed marginalized two-part Beta regression model is powerful and controls the type I error well. It is also robust against model misspecification.

Real data analysis

In this section, the proposed marginalized two-part model and the conventional two-part model are applied to a real metagenomic dataset on mouse skin microbiota to investigate the effects of immunization on the relative abundances of 131 core OTUs [16, 17]. The data are publicly available at https://www.nature.com/articles/ncomms3462#supplementary-information. In addition to the likelihood ratio tests under the CTP and MTP models, the T test and the Wilcoxon rank sum test are also included for comparison. All the tests are carried out with Bonferroni’s correction.

The skin dataset contains the relative abundances of the 131 most common OTUs for 261 mouse skin samples, including 78 non-immunized and 183 immunized individuals. A large proportion of the abundances in the skin data are zero: the percentage of zeros per OTU ranges from 0 to 68.97%, with mean 33.03% and median 33.72% (see S3 and S4 Figs). The positive values are highly right skewed, and the logit transformations in the MTP model and the CTP model capture the skewness (see S5 Fig).

Fig 5 shows the results for these four methods. As we can see, the LRT under the marginalized two-part model finds significant effects of immunization on 45 (namely, 31 + 14) OTUs. The LRT under the conventional two-part model has significant results for all these 45 OTUs and for 14 (namely, 8 + 4 + 2) additional OTUs. The T test identifies 31 of these 45 OTUs and another 7 (namely, 4 + 3) OTUs. Similar to the LRT under the conventional two-part model, the Wilcoxon test identifies all these 45 OTUs and 21 (namely, 9 + 8 + 4) additional OTUs. Finally, 60 OTUs are not identified by any method.
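For reference, Bonferroni's correction across the 131 OTUs amounts to comparing each per-OTU p-value with the family-wise level divided by the number of tests. The sketch below assumes a family-wise level of 0.05, which is not stated explicitly in the text above, and the function name `bonferroni_reject` is hypothetical.

```python
import numpy as np


def bonferroni_reject(p_values, level=0.05):
    """Flag tests whose p-value is below level / (number of tests)."""
    p_values = np.asarray(p_values, dtype=float)
    return p_values < level / p_values.size


# With 131 OTUs the per-test threshold is 0.05 / 131 (assumed level).
threshold = 0.05 / 131
```

The same rule would be applied separately to the p-values of each of the four methods before the identified OTUs are compared.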
Fig 5

Venn diagram for the OTUs.

Among all the 131 OTUs, 60 OTUs are not identified by any method and the other 71 OTUs are identified by at least one method. For example, “31” in the intersection of all sets indicates that 31 OTUs are identified by all methods, while “4”, located in the intersection of three sets, indicates that 4 OTUs are identified by three methods, namely, the T test, the CTP model, and the Wilcoxon test.

The LRT under the CTP model and the Wilcoxon test identify more OTUs than the LRT under the MTP due to their failure to control the type I error as shown in Simulation studies (Fig 1). Actually, for those 14 OTUs identified by the CTP but not by the MTP, all of them have significant coefficients in Part I of the two-part model. Out of the 21 OTUs that are identified by the Wilcoxon test but not by the MTP, 17 have significant coefficients in Part I of the two-part model. This corresponds to the setting α1 = 1, γ1 = 0 where both the CTP and the Wilcoxon test have much higher type I errors than the MTP (see the lower panel of Fig 1). Because it is less powerful than the MTP (Fig 2), the T test identifies fewer OTUs than the MTP.

Table 2 shows the 10 most significant OTUs from the MTP model. As in [17], for OTUs which cannot be classified at the species level, the next highest classifiable taxonomic level (denoted by ‘o’, ‘f’ and ‘g’ for order, family, and genus, respectively) is displayed. We use a number in the superscript to distinguish among different OTUs with the same classification name. The detailed results of all the 45 OTUs identified by the proposed MTP model are shown in S1 Table.
Table 2

Top 10 OTUs identified by the MTP model.

Rank | ID | Species | Est | SE | p Value
1 | 237040 | g_Alicyclobacillus | 2.3262 | 0.3148 | < 1E-16
2 | 101810 | g_Helicobacter¹ | -1.3934 | 0.1460 | < 1E-16
3 | 52884 | g_Helicobacter² | 1.9091 | 0.2875 | 4.88E-15
4 | N10167 | o_Bacteroidales¹ | 2.5761 | 0.4234 | 8.33E-15
5 | 381715 | f_Ruminococcaceae | 2.7674 | 0.4777 | 6.82E-14
6 | 269548 | g_Helicobacter³ | 1.2357 | 0.1833 | 3.57E-13
7 | N26397 | Acetobacter aceti | 1.8841 | 0.3055 | 4.86E-13
8 | 294146 | Acetobacter orleanensis | 2.0662 | 0.3387 | 5.66E-13
9 | N2007 | Acinetobacter lwoffii | 2.5549 | 0.4498 | 1.19E-12
10 | N8891 | g_Mucispirillum | 1.5868 | 0.2679 | 1.53E-11

Est: estimate of the coefficient of treatment in the second submodel;

SE: standard error of the coefficient of treatment in the second submodel.

Moreover, for most of the 131 OTUs, the proposed marginalized two-part model fits the observed data better than the conventional two-part model. Fig 6 shows the density curves of the observed relative abundances, the predicted relative abundances using the MTP model, and the predicted relative abundances using the CTP model for two OTUs. As we can see, the MTP model fits the observed data much better than the CTP model.
Fig 6

Density curves for two OTUs.

The blue curve shows the density of the observed data. The green curve shows the density of predictions from the MTP model while the red curve represents the density of predictions from the CTP model.


Discussion

In this paper, we propose a marginalized two-part Beta regression model for semi-continuous microbiome compositional data. This model allows investigators to obtain covariate effects on the marginal mean of the outcome. It takes into account the compositional and zero-inflated nature of the microbiome relative abundance data. It also has an unconditional interpretation of the covariate effect on the marginal mean. Our proposed marginalized two-part model has satisfactory performance in both simulation studies and real data analysis.

For count outcomes exhibiting many zeros, a zero-inflated Poisson (ZIP) regression model or a zero-inflated negative binomial (ZINB) model is often employed to examine the relation between covariates and the response. To model the overall population mean count directly, the marginalized ZIP model and the marginalized ZINB model were proposed by [18] and [19], respectively. However, in the case of bounded count data, the ZIP model is questionable, while the zero-inflated binomial (ZIB) model and its extension for over-dispersion, the zero-inflated beta-binomial (ZIBB) model, are available [20-22]. It is of interest to develop a marginalized modeling approach for ZIB or ZIBB.

More recently, there has been increasing interest in analyzing correlated zero-inflated semi-continuous data. The correlation may stem from the structure of clustered data or from longitudinal data where repeated measures are correlated for the same subject. Typically, random effects are included to account for the correlations between observations [10, 15, 23–25]. However, a similar limitation exists in these two-part random effects models, as they cannot account for covariate effects on the marginal mean. Recently, Smith et al. [26] proposed a marginalized two-part model for longitudinal semicontinuous data based on the log-skew normal distribution for positive values.
In future studies, we will extend our marginalized two-part model to correlated semi-continuous data bounded by 0 and 1. Finally, it is of interest to consider different microbiomes together, taking into account the constraint that the relative abundances of all OTUs sum to 1. Scealy and Welsh [27, 28] considered Kent models for such compositional data. It merits further consideration to incorporate zero values in the Kent model framework.

Supporting information

S1 Text. Likelihood derivation.

(PDF)

S1 Code. SAS code.

The main SAS codes for the conventional two-part model and the proposed marginalized two-part model are shown in this section. (PDF)

S1 Fig. Type I errors for the sample size 400.

This figure shows the type I errors of the four methods for sample size 400. The results in the upper panels correspond to the setting α1 = 0, γ1 = 0 and the lower panels correspond to the setting α1 = 1, γ1 = 0. In each setting, the left panel shows the results for significance level 0.01 and the right panel shows the results for significance level 0.05. The dashed horizontal line in each panel represents the significance level. (TIF)

S2 Fig. Powers for the sample size 400.

This figure shows the powers of the four methods for sample size 400. The upper panels show the powers corresponding to the setting α1 = 0, γ1 = 1 and the lower panels show the powers corresponding to the setting α1 = 1, γ1 = 1. In each setting, the left panel shows the results for significance level 0.01 and the right panel shows the results for significance level 0.05. (TIF)

S3 Fig. Zero-inflation of the skin data.

The figure shows the distributions of relative abundances of 6 OTUs. From the upper panel to the lower panel and from the left to the right, the proportions of zero values for these 6 OTUs are 0.77%, 3.45%, 4.97%, 14.18%, 29.89%, and 48.28%, respectively. (TIF)

S4 Fig. The figure shows the percentages of zero abundance in the 261 mouse skin samples for all 131 core OTUs.

The lower quartile and the upper quartile of the percentages are 20.11% and 48.28%, respectively. (TIF)

S5 Fig. Skewness of the skin data.

The figure shows the histogram of the relative abundance for 6 OTUs. The first one in every panel is the histogram of the OTU in the original scale, while the second one in every panel shows the histogram after logit transformation. (TIF)

S1 Table. The detailed results of the MTP model.

The table shows the detailed results of all the 45 OTUs that are identified by the proposed MTP model. (PDF)

1.  Analysis of repeated measures data with clumping at zero.

Authors:  Janet A Tooze; Gary K Grunwald; Richard H Jones
Journal:  Stat Methods Med Res       Date:  2002-08       Impact factor: 3.021

2.  Analyzing repeated measures semi-continuous data, with application to an alcohol dependence study.

Authors:  Lei Liu; Robert L Strawderman; Bankole A Johnson; John M O'Quigley
Journal:  Stat Methods Med Res       Date:  2012-04-02       Impact factor: 3.021

3.  The future of microbial metagenomics (or is ignorance bliss?).

Authors:  Jack A Gilbert; Folker Meyer; Mark J Bailey
Journal:  ISME J       Date:  2010-11-25       Impact factor: 10.302

4.  A marginalized two-part model for longitudinal semicontinuous data.

Authors:  Valerie A Smith; Brian Neelon; John S Preisser; Matthew L Maciejewski
Journal:  Stat Methods Med Res       Date:  2015-07-07       Impact factor: 3.021

5.  Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach by P. de Boeck and M. Wilson and Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models by A. Skrondal and S. Rabe-Hesketh.

Authors:  Jay Verkuilen
Journal:  Psychometrika       Date:  2006-06       Impact factor: 2.500

Review 6.  Analyzing the human microbiome: a "how to" guide for physicians.

Authors:  Andrea D Tyler; Michelle I Smith; Mark S Silverberg
Journal:  Am J Gastroenterol       Date:  2014-04-22       Impact factor: 10.864

7.  A two-part mixed-effects model for analyzing longitudinal microbiome compositional data.

Authors:  Eric Z Chen; Hongzhe Li
Journal:  Bioinformatics       Date:  2016-05-14       Impact factor: 6.937

Review 8.  Diabetes, obesity and gut microbiota.

Authors:  Amandine Everard; Patrice D Cani
Journal:  Best Pract Res Clin Gastroenterol       Date:  2013-02       Impact factor: 3.043

9.  Bacterial communities in women with bacterial vaginosis: high resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria.

Authors:  Sujatha Srinivasan; Noah G Hoffman; Martin T Morgan; Frederick A Matsen; Tina L Fiedler; Robert W Hall; Frederick J Ross; Connor O McCoy; Roger Bumgarner; Jeanne M Marrazzo; David N Fredricks
Journal:  PLoS One       Date:  2012-06-18       Impact factor: 3.240

10.  Genome-wide mapping of gene-microbiota interactions in susceptibility to autoimmune skin blistering.

Authors:  Girish Srinivas; Steffen Möller; Jun Wang; Sven Künzel; Detlef Zillikens; John F Baines; Saleh M Ibrahim
Journal:  Nat Commun       Date:  2013       Impact factor: 14.919

View more
