Testing microbiome associations with survival times at both the community and individual taxon levels.

Yingtian Hu¹, Yunxiao Li¹, Glen A Satten², Yi-Juan Hu¹.

Abstract

BACKGROUND: Finding microbiome associations with possibly censored survival times is an important problem, especially as specific taxa could serve as biomarkers for disease prognosis or as targets for therapeutic interventions. The two existing methods for survival outcomes, MiRKAT-S and OMiSA, are restricted to testing associations at the community level and do not provide results at the individual taxon level. An ad hoc approach testing each taxon with a survival outcome using the Cox proportional hazard model may not perform well in the microbiome setting with sparse count data and small sample sizes.
METHODS: We have previously developed the linear decomposition model (LDM) for testing continuous or discrete outcomes that unifies community-level and taxon-level tests into one framework. Here we extend the LDM to test survival outcomes. We propose to use the Martingale residuals or the deviance residuals obtained from the Cox model as continuous covariates in the LDM. We further construct tests that combine the results of analyzing each set of residuals separately. Finally, we extend PERMANOVA, the most commonly used distance-based method for testing community-level hypotheses, to handle survival outcomes in a similar manner.
RESULTS: Using simulated data, we showed that the LDM-based tests preserved the false discovery rate for testing individual taxa and had good sensitivity. The LDM-based community-level tests and PERMANOVA-based tests had comparable or better power than MiRKAT-S and OMiSA. An analysis of data on the association of the gut microbiome and the time to acute graft-versus-host disease revealed several dozen associated taxa that would not have been achievable by any community-level test, as well as improved community-level tests by the LDM and PERMANOVA over those obtained using MiRKAT-S and OMiSA.
CONCLUSIONS: Unlike existing methods, our new methods are capable of discovering individual taxa that are associated with survival times, which could be of important use in clinical settings.

Year: 2022 PMID: 36103548 PMCID: PMC9512219 DOI: 10.1371/journal.pcbi.1010509

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779

This is a PLOS Computational Biology Software paper.

Introduction

Advances in sequencing technologies for profiling human microbiomes have led to the discovery of numerous microbiome associations with clinical responses [1-3]. These successes suggest that microbial taxa may be useful as biomarkers for disease prognosis, or targets for therapeutic interventions [4]. For example, the miCARE study is attempting to find whether the gut microbiome can be used to predict colorectal cancer recurrence (Principal Investigator: Dr. Veronika Fedirko, personal communication). Like the miCARE study, studies conducted to establish these links would collect the subjects’ times to an event of interest (i.e., survival times) as the outcomes, some of which may have censored values. For the success of this research, finding microbiome associations with the survival outcomes only at the community level may be less important than finding associations with individual taxa (we use “taxon” generically to refer to any feature such as amplicon sequence variants or any other taxonomic or functional grouping of bacterial sequences). However, data from microbiome association studies can be difficult to analyze, because the taxa count data may have hundreds to thousands of taxa and 50–90% zero counts, and are typically highly overdispersed. In addition, there generally exist confounders, such as previous treatment history or current medications, that correlate with both the microbiome composition and the survival outcome and so must be properly adjusted for. Finally, the sample size in a microbiome association study is typically not large and the event rate may be low, especially for rare diseases such as cancers. Analysis methods that cannot account for these data complexities will typically not yield robust and clinically meaningful results. Two methods have been proposed specifically for testing association between the microbiome and survival outcomes: MiRKAT-S [5] and OMiSA [6]. 
Unfortunately, both methods are restricted to community-level (global) association tests. While OMiSA does allow testing pre-determined sets of taxa such as taxonomic classes, it requires each set to be comprised of multiple taxa. As a result, neither MiRKAT-S nor OMiSA can be used to find individual taxa that can act as biomarkers. A third, ad hoc, approach is to apply the Cox proportional hazard model [7] in a taxon-by-taxon manner [8, 9]. However, the performance of this approach has not been formally evaluated in the microbiome context, although it is known that small sample sizes and sparse count data may lead to inflated type I error when using the Cox model [10, 11]. Unfortunately, permutation-based inference, which might improve the performance of the ad hoc approach, is difficult for survival outcomes. We previously proposed the linear decomposition model (LDM) [12] for testing microbiome associations with continuous or categorical (including binary) outcomes, which not only performs the test at the community level but also at the individual taxon level with false discovery rate (FDR) control. Here, we extend the LDM to survival outcomes, in order to allow a unified test framework to test both community-level and taxon-specific associations for survival outcomes. The LDM is based on a linear model that regresses the microbial data at each taxon on the (confounding) covariates that we wish to adjust for and the outcome variable(s) that we wish to test. Inference is based on permutation to circumvent making parametric assumptions about the distribution of the taxon-level data. In addition, the LDM is highly versatile: it can analyze the taxon-level data at the relative abundance scale, the arcsin-root-transformed relative abundance scale (which is variance-stabilizing for Multinomial and Dirichlet-Multinomial count data) or any other transformation, as well as the presence-absence scale [13], and can also accommodate clustered samples [12, 14]. 
Our extension of the LDM was motivated by ideas developed in MiRKAT-S and OMiSA. Both of these tests first fit a Cox model to account for the relationship between any fixed covariates (excluding microbiome variables) and survival times. Then, using a random-effects framework, the variance-covariance matrix of the (Martingale) residuals from the Cox model is compared to a between-sample distance matrix calculated using the microbiome data; the similarity between these two matrices indicates the extent of association between the microbiome and the survival outcome. MiRKAT-S allows an arbitrary distance matrix, most commonly, the Bray-Curtis or Jaccard distance matrix. OMiSA extends MiRKAT-S by using a family of power transformations of the relative abundance data to weight abundant and rare taxa differently, but calculates the MiRKAT-S test statistic based on the Euclidean distance only. Our generalization of the LDM to survival outcomes is also based on obtaining residuals from the Cox model; however, we use these residuals as covariates in the LDM to directly assess the association between the microbiome and the survival outcome. In this way, we are able to use the LDM to test both community-level and taxon-level associations with a survival outcome. In a similar manner, we also extend PERMANOVA [15], the most commonly used method for testing microbiome associations, to handle survival outcomes, although the test is at the community level and distance-based like MiRKAT-S.

The rest of this paper is organized as follows. In the Methods section, we first describe our tests based on the Martingale residuals, showing their connection to MiRKAT-S, OMiSA, and the taxon-by-taxon Cox regression. Then we extend the tests to use the deviance residuals, which are transformations of the Martingale residuals that are more symmetric about zero, and then construct combination tests that combine the results from tests using the two types of residuals. In the Results section, we first present simulation studies and then an application of all methods to data on acute graft-versus-host disease (aGVHD) after allogeneic blood or marrow transplantation [16]. We conclude with a brief discussion section.

Methods

Suppose that, for n unrelated subjects, we have data on the time to an event of interest (e.g., disease onset or relapse) that may be subject to random censoring. For i = 1, 2, …, n, let $T_i$ be the (underlying) time to event for the ith subject and $C_i$ be the corresponding censoring time. Instead of observing $T_i$ and $C_i$, we only observe the time $U_i = \min(T_i, C_i)$ and the indicator $\Delta_i = I(T_i \le C_i)$ that indicates whether $U_i$ corresponds to the event or to censoring. Further, let $X_i$ be a set of possibly confounding covariates, which does not include the intercept. For j = 1, 2, …, J, let $Z_{ij}$ denote the microbiome data on taxon j from subject i, which can be the relative abundance, arcsin-root-transformed relative abundance, presence-absence status, or any (e.g., additive or centered) log-ratio transformed data. Following the conventions used in the LDM, we assume that both $X_i$ and $Z_{ij}$ are centered to have mean zero, i.e., $\sum_{i=1}^{n} X_i = 0$ and $\sum_{i=1}^{n} Z_{ij} = 0$ for any j.

Because survival times are censored, it is difficult to include them in the linear model framework used by the LDM. Following MiRKAT-S [5], we resolve this issue by first fitting a Cox model to the survival outcomes $(U_i, \Delta_i)$ and covariate data $X_i$; we then use the residuals from this model as a covariate in the LDM [12]. Because no microbiome data is used in the Cox model, the residuals should be associated with the microbiome data if the microbiome affects the survival outcome. If we use the Martingale residuals, denoted by $M_i$ for subject i, we propose to test the association of taxon j with the Martingale residuals while adjusting for covariates $X_i$ by using the LDM to fit the following linear model:

$$Z_{ij} = \beta_{X,j}^{\mathsf{T}} X_i + \beta_{M,j} M_i + e_{ij}, \qquad (1)$$

where $e_{ij}$ is the error term with mean zero and a constant variance (the only distributional assumption we make). Note that the Martingale residuals have the properties that $\sum_{i=1}^{n} M_i = 0$ and $\sum_{i=1}^{n} M_i X_i = 0$ [11]. To test $H_0: \beta_{M,j} = 0$, the LDM uses an F-statistic, the numerator of which is proportional to the square of $Q_j$ given by

$$Q_j = \sum_{i=1}^{n} M_i \left( Z_{ij} - \hat{\beta}_{X,j}^{\mathsf{T}} X_i \right),$$

where $\hat{\beta}_{X,j}$ is the least squares estimator of $\beta_{X,j}$ under the null model of (1). Further, the numerator of the global test statistic for testing the global association between the Martingale residuals and the microbiome is

$$\sum_{j=1}^{J} Q_j^2.$$

These test statistics can be used to show a connection between our approach and existing methods. First, the global statistic agrees with the variance-component score statistic in MiRKAT-S when the Euclidean distance (the linear kernel) is used, as well as the variance-component score statistic in OMiSA (the OMiSALN part) for untransformed data. Second, letting λ(⋅) denote the hazard function for a survival analysis and $\lambda_0(\cdot)$ the baseline hazard, the taxon-specific $Q_j$ coincides with the score statistic for testing α = 0 in the Cox model

$$\lambda(t \mid X_i, Z_{ij}) = \lambda_0(t) \exp\left( \gamma^{\mathsf{T}} X_i + \alpha Z_{ij} \right)$$

[11], which includes both the covariates and the microbiome data from the jth taxon as explanatory variables in the hazard function. These connections justify the use of the Martingale residual as a covariate in the LDM.

The main advantage of our approach is that results for individual taxa are available, and that the global test statistic is a coherent combination of these taxon-specific statistics; neither MiRKAT-S nor OMiSA provides taxon-specific results. However, the LDM is based on the Euclidean distance for combining taxon-specific statistics, while MiRKAT-S can use arbitrary distances. For this reason, we also provide an extension of PERMANOVA for testing survival outcomes that can be used with arbitrary distances, at the end of this section. An important feature of our approach is that, although the effect of $X_i$ has been removed from $M_i$ (i.e., $M_i$ and $X_i$ are uncorrelated), we still include $X_i$ in (1). In the S1 Text, we show how including this term allows our permutation tests to achieve higher power than the permutation tests currently available in MiRKAT-S. We further show how to obtain global tests with power similar to what we achieve using the original MiRKAT method [17] with the Martingale residual as a continuous outcome.
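The link between the Martingale residuals and the taxon-specific and global statistics can be illustrated with a minimal Python sketch. It is not the LDM implementation: to stay self-contained it assumes a Cox model with no covariates, in which case the Martingale residual of subject i reduces to the event indicator minus the Nelson-Aalen cumulative hazard at the observed time, and it leaves out the permutation step entirely. All function names and the toy data are hypothetical.

```python
def martingale_residuals_null(times, events):
    """Martingale residuals for a Cox model without covariates (assumption).

    The cumulative hazard is then the Nelson-Aalen estimator, and the
    residual of subject i is events[i] - H(times[i]).
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    increments = []
    for t in event_times:
        n_events = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        n_at_risk = sum(1 for u in times if u >= t)
        increments.append((t, n_events / n_at_risk))
    residuals = []
    for u, d in zip(times, events):
        cum_hazard = sum(h for t, h in increments if t <= u)
        residuals.append(d - cum_hazard)
    return residuals


def center_columns(table):
    """Center each taxon (column) to have mean zero across subjects."""
    n = len(table)
    means = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in table]


def taxon_scores(residuals, table):
    """Taxon-specific statistic: sum over subjects of residual * taxon value."""
    n_taxa = len(table[0])
    return [sum(m * row[j] for m, row in zip(residuals, table))
            for j in range(n_taxa)]


# Toy data: 6 subjects, 3 taxa (relative abundances).
times = [2.0, 3.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 1, 0, 1, 0, 1]
Z = [
    [0.5, 0.0, 0.5],
    [0.2, 0.3, 0.5],
    [0.0, 0.6, 0.4],
    [0.7, 0.0, 0.3],
    [0.1, 0.1, 0.8],
    [0.3, 0.3, 0.4],
]

M = martingale_residuals_null(times, events)
Zc = center_columns(Z)
scores = taxon_scores(M, Zc)
global_stat = sum(s * s for s in scores)  # sum of squared taxon statistics
print(scores, global_stat)
```

Because the residuals sum to zero, centering the taxon data does not change the taxon-specific statistics, and the global statistic equals the quadratic form of the residual vector with the linear kernel of the taxon data, which is the form used by kernel-based tests.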
Compared with the ad hoc approach of fitting a Cox model for each taxon, our permutation-based inference is robust to small sample size, low event rate, and sparse count data, while the Cox model is known to have inflated type I error in these situations [10, 11]. Compared with the ad hoc approach, both MiRKAT-S and our approach share the huge computational advantage that the Cox model only needs to be fit once. In addition, both methods only depend on the presence of an association between the Martingale residuals and the microbiome measures, and so do not depend on the correct specification of the Cox model for validity (i.e., type I error control), although power may be lost if the Cox model provides a poor fit to the data.

One deficiency of the Martingale residual is its skewness, because it has a maximum value 1 but a minimum value −∞. Because a residual measure with a more normal-like distribution may perform better in downstream analyses, Therneau et al. [18] introduced the deviance residual for a Cox model:

$$D_i = \operatorname{sign}(M_i) \sqrt{-2 \left[ M_i + \Delta_i \log(\Delta_i - M_i) \right]},$$

which is a non-linear transformation of the Martingale residual $M_i$. Therneau et al. found that with less than 25% censoring, the deviance residual is approximately normally distributed; with more than 40% censoring, too many points will lie near 0, making the distribution non-normal, although the deviance residuals remain approximately symmetric about 0. Therefore, we also consider a variation of our method that replaces $M_i$ by $D_i$ in the linear model (1). Although $D_i$ is not orthogonal to $X_i$, we can still use the LDM to fit (1) as long as $X_i$ enters the model before $D_i$ because, in this case, the LDM will make $D_i$ orthogonal to $X_i$ before testing for association with $Z_{ij}$. In our simulations, use of the Martingale residual sometimes gave better power and sensitivity; in other situations the deviance residual performed better.
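The transformation from a Martingale residual to a deviance residual can be sketched in a few lines of Python. This is an illustration of the standard formula of Therneau et al., not code from the LDM or survival packages; the function name and the example values are hypothetical. For a censored subject the logarithmic term is multiplied by zero and is dropped explicitly to avoid evaluating log(0).

```python
import math


def deviance_residual(m, delta):
    """Deviance residual from a Martingale residual m and event indicator delta."""
    if delta == 0:
        # Censored subject: m = -(cumulative hazard) <= 0 and the log term vanishes.
        if m > 0:
            raise ValueError("a censored subject cannot have a positive residual")
        inner = -2.0 * m
    else:
        # Event: m = 1 - (cumulative hazard) < 1, so log(1 - m) is defined.
        if m >= 1.0:
            raise ValueError("an event must have a residual below 1")
        inner = -2.0 * (m + math.log(1.0 - m))
    sign = (m > 0) - (m < 0)
    return sign * math.sqrt(max(inner, 0.0))


# Hypothetical (residual, event indicator) pairs spanning the skewed range.
examples = [(0.9, 1), (0.5, 1), (-0.5, 0), (-2.0, 1), (-3.0, 0)]
for m, delta in examples:
    print(m, delta, round(deviance_residual(m, delta), 3))
```

The transformation stretches residuals near the upper bound of 1 and compresses large negative residuals, which is why the deviance residuals are more symmetric about 0 than the Martingale residuals.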
Because we cannot characterize a priori which residual will perform better in a given scenario, we also combine the results from analyzing each residual separately into a single combination test. To account for differences in residual scale, we take the minimum of the p-values obtained from analyzing each residual separately, and use the corresponding minima of null p-values for each test from the permutation replicates to simulate the null distribution; the null p-value is calculated based on the rank of the test statistic among all permutation replicates [19].

We extend PERMANOVA to analyzing survival outcomes in a similar way. Like MiRKAT-S, PERMANOVA is distance-based and offers a global test of the association at the community level. To explain the variability in a given distance matrix, we use a linear model similar to (1) that includes the covariates X and the Martingale residual M as explanatory variables. We obtain the p-value for testing M, repeat the procedure with the deviance residual D, and then construct a combination test that takes the minimum of the two p-values as the final test statistic. A common use of PERMANOVA is through the function “adonis2” in the R package vegan. We have also provided an alternative implementation of PERMANOVA through the function “permanovaFL” in our LDM package [12], which differs from adonis2 in the way permutations are conducted. We found that permanovaFL outperforms adonis2 in many situations [12, 14, 20].
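The minimum-p combination described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the implementation in the LDM package: it assumes that each test has already produced an observed statistic and B null statistics from the same B permutations, that larger statistics are more extreme, and it uses hypothetical function names.

```python
import random
from bisect import bisect_left


def rank_p_values(null_stats):
    """p-value of each permutation replicate from its rank among all replicates."""
    n_perm = len(null_stats)
    ordered = sorted(null_stats)
    # Fraction of replicates whose statistic is at least as large as this one.
    return [(n_perm - bisect_left(ordered, s)) / n_perm for s in null_stats]


def observed_p_value(observed, null_stats):
    """Permutation p-value of the observed statistic."""
    n_perm = len(null_stats)
    return (sum(1 for s in null_stats if s >= observed) + 1) / (n_perm + 1)


def min_p_combination(observed_stats, null_stats_by_test):
    """Combine several tests by the minimum of their p-values.

    observed_stats: one observed statistic per test.
    null_stats_by_test: one list of null statistics per test; index b must
    refer to the same permutation in every list.
    """
    n_perm = len(null_stats_by_test[0])
    p_observed = [observed_p_value(obs, nulls)
                  for obs, nulls in zip(observed_stats, null_stats_by_test)]
    min_observed = min(p_observed)
    null_p = [rank_p_values(nulls) for nulls in null_stats_by_test]
    min_null = [min(p[b] for p in null_p) for b in range(n_perm)]
    p_combined = (sum(1 for m in min_null if m <= min_observed) + 1) / (n_perm + 1)
    return p_observed, p_combined


# Toy usage with simulated null statistics for two tests (e.g., one per residual).
rng = random.Random(0)
nulls_a = [rng.random() for _ in range(999)]
nulls_b = [rng.random() for _ in range(999)]
p_each, p_comb = min_p_combination([0.97, 0.80], [nulls_a, nulls_b])
print(p_each, p_comb)
```

Taking the minimum over p-values rather than over raw statistics puts the two residuals on a common scale, and recomputing the same minimum within every permutation replicate accounts for the correlation between the two tests.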

Results

Simulated design

We conducted simulation studies to evaluate the properties of our approach and compare our results to those obtained using competing methods. Our simulations were based on data on 856 taxa of the upper-respiratory-tract (URT) microbiome [21] that were also used in the MiRKAT-S paper. We considered a binary confounder X and assumed equal numbers of subjects with X = 1 and X = 0. We randomly sampled 100 taxa to be associated with X and generated their associations as follows. We first set two vectors, π1 and π2, to the taxon frequencies (i.e., relative abundances) estimated from the URT microbiome data, and then permuted the frequencies in π2 that belong to the set of 100 taxa selected to be associated with X, which ensured the same frequencies in π1 and π2 for taxa not selected. Next, we defined a subject-specific frequency vector to be , in which β can be interpreted as the effect of X on the selected taxa. When β = 0, there was no association between X and the microbiome, in which case X reduced to a simple covariate for the survival outcome instead of a confounder. Finally, we generated the taxon count data for each subject using the Dirichlet-Multinomial (DM) model with mean , overdispersion 0.02, and library size sampled from N(10000, (10000/3)2) and left-truncated at 1000. We considered two models, M1 and M2, for simulating the survival outcome. In what follows, we number the taxa by decreasing relative abundance so that taxon 1 is the most abundant. In model M1, we assumed that the relative abundances of taxa 1–10 determined the association with the survival outcome; in model M2, we assumed that the presence or absence of 10 randomly selected taxa, selected from taxa 11–100, determined this association. 
Specifically, we defined under M1 and under M2, where δs were directions taking values 1 and −1 with equal probabilities (and fixed across replicates of data), was the set of selected “causal” taxa, Z was the observed frequency (taxon count divided by the library size), and was the average frequency for the jth taxon across subjects. Then, we simulated the time to event from the Cox model with the baseline hazard following the Weibull distribution , namely, where V was sampled from the uniform distribution U[0, 1] and B = exp{βscale(X) + βscale(S)} with β characterizing the effects of the “causal” taxa on the event time, β being fixed at 0.5, and scale(.) standardizing the input vector to have mean 0 and standard deviation 1. The censoring time was simulated independently from the Exponential distribution Exp(μ), where μ was set to 0.03, 0.08, and 0.2 to achieve approximately 25%, 50%, and 75% censoring. Using this procedure, we generated n = 100 or 50 subjects for each replicate of data. To evaluate robustness of our methods to violation of the proportional hazard (PH) assumption, we also simulated the event time from the accelerated hazard (AH) model [22] with the baseline hazard following the lognormal distribution, namely, , where Φ−1 is the inverse cumulative distribution function of the standard normal distribution. The censoring time was simulated as before using μ = 0.5 to achieve approximately 50% censoring. The AH model generated data that strongly violated the PH assumption (specifically, 28.8% rejection rate for testing the PH assumption [23] using our simulated data, which was much higher than the nominal level 5% of the test) and even had crossing survival curves. Prior to analysis, we filtered out taxa that were found in fewer than 5 subjects in the dataset. We used the R package Survival to obtain the Martingale and deviance residuals, M and D, from fitting the Cox model for the survival outcomes with X as the explanatory covariate. 
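The event-time and censoring mechanism described above can be sketched in Python by inverting the survival function of a proportional hazards model with a Weibull baseline. The Weibull parameters used in the study are not recoverable from this text, so the `shape` and `scale` defaults below are placeholders, and the function names are hypothetical. Exp(μ) is treated as an exponential distribution with rate μ, which is consistent with larger values of μ producing more censoring.

```python
import math
import random


def simulate_survival(linear_predictors, censor_rate, shape=1.0, scale=0.1, rng=None):
    """Simulate (observed time, event indicator) pairs.

    Assumed hazard: scale * shape * t**(shape - 1) * exp(linear predictor).
    shape and scale are placeholder values, not those of the study.
    """
    rng = rng or random.Random()
    data = []
    for lp in linear_predictors:
        multiplier = math.exp(lp)
        v = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Solve exp(-scale * multiplier * t**shape) = v for t.
        event_time = (-math.log(v) / (scale * multiplier)) ** (1.0 / shape)
        censor_time = rng.expovariate(censor_rate)
        observed = min(event_time, censor_time)
        delta = 1 if event_time <= censor_time else 0
        data.append((observed, delta))
    return data


def censoring_fraction(data):
    """Proportion of subjects whose observed time is a censoring time."""
    return sum(1 for _, delta in data if delta == 0) / len(data)


# Toy usage: standardized linear predictors for 100 subjects.
rng = random.Random(2022)
lp = [0.5 * rng.gauss(0.0, 1.0) for _ in range(100)]
for mu in (0.03, 0.08, 0.2):
    sample = simulate_survival(lp, mu, rng=rng)
    print(mu, censoring_fraction(sample))
```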
For testing individual taxa, we applied the LDM with either X and M as covariates or X and D as covariates in the linear regression model (1), and refer to them as LDM-m and LDM-d, respectively. Specifically, for data generated under model M1, we applied the LDM to the relative abundance data and arcsin-root-transformed relative abundance data separately and used the omnibus test that combined their results; for data generated under model M2, we applied the LDM to the presence-absence data. We also obtained the combination test that combines the results from LDM-m and LDM-d, and refer to it as LDM-c. To evaluate the ad hoc approach, we fit the Cox model and the Firth-corrected Cox model (using the “coxphf” function in the R package coxphf) taxon by taxon, using X and the taxon relative abundance under model M1 or taxon presence-absence status under model M2 as covariates; the p-values for these taxon-specific tests were then adjusted for multiple testing using the Benjamini-Hochberg procedure [24]. We evaluated the sensitivity and empirical FDR at nominal level 10% for all taxon-specific tests, using 1000 replicates of data.

For testing global association, we obtained these results from LDM-m, LDM-d, and LDM-c, and we also applied permanovaFL in a similar way to obtain permanovaFL-m, permanovaFL-d, and permanovaFL-c. For permanovaFL-based tests and all other distance-based tests described below, we used the Bray-Curtis distance under model M1 and the Jaccard distance (without rarefying the taxa count table since the library sizes were balanced in the simulation) under model M2. For comparison, we applied MiRKAT-S using the permutation p-value, which was based on the Martingale residual only. We also applied OMiSA, specifically OMiSALN, the part of OMiSA that combines the results from analyzing differently power-transformed relative abundance data (with the default set of power values), which always analyzes data at the relative abundance scale even under model M2.
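As an illustration of how taxon-specific p-values can be adjusted and scored in one simulated replicate, the following Python sketch implements the Benjamini-Hochberg adjustment and computes the sensitivity and the false discovery proportion of the detected set. The function names and the toy p-values are hypothetical; the empirical FDR reported in a simulation study would be the average of the false discovery proportion over replicates.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sensitivity_and_fdp(detected, causal):
    """Sensitivity and false discovery proportion for one replicate."""
    detected, causal = set(detected), set(causal)
    true_positives = len(detected & causal)
    sensitivity = true_positives / len(causal) if causal else 0.0
    fdp = (len(detected) - true_positives) / len(detected) if detected else 0.0
    return sensitivity, fdp


# Toy usage: six taxa, of which taxa 0-2 are truly associated.
p_values = [0.001, 0.012, 0.20, 0.018, 0.60, 0.90]
q_values = benjamini_hochberg(p_values)
detected = [j for j, q in enumerate(q_values) if q <= 0.10]
print(detected, sensitivity_and_fdp(detected, causal=[0, 1, 2]))
```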
In addition, we considered a number of secondary tests to gain more insights. To verify the equivalence of MiRKAT-S to an implementation of MiRKAT, we applied MiRKAT with a linear regression model that used the Martingale residual as the continuous outcome and the microbiome profile as the covariates without adjusting for X, and refer to this test as MiRKAT-m1. We also applied a variation of MiRKAT-m1 that additionally adjusted X in the linear regression, referred to as MiRKAT-m, and a variation of MiRKAT-m that replaced the Martingale residual by the deviance residual, referred to as MiRKAT-d. Finally, we applied PERMANOVA implemented in adonis2, with either X and M as covariates or X and D as covariates to obtain adonis2-m and adonis2-d. All global tests were evaluated on their type I error and power at the nominal level 0.05, based on 10000 and 1000 replicates of data, respectively.

Simulation results

We focus on the results from simulated data with 50% censoring and sample size 100; the results when the censoring rate was varied to 75% or 25% or the sample size was reduced to 50 showed the same patterns and are thus deferred to Supplementary Materials (S1 Table, S3–S5 Figs). Fig 1 shows the sensitivity and empirical FDR results for the taxon-specific tests. In both scenarios M1 and M2, the deviance residual (LDM-d) corresponds to higher sensitivity than the Martingale residual (LDM-m), although the difference was small. We explored two more scenarios, one assuming taxon 11 to be associated with the event time (referred to as M3) and one assuming taxon 21 to be associated (referred to as M4), in which data were analyzed at the relative abundance scale and the presence-absence scale, respectively. The results were displayed in S1 Fig. We found that the Martingale residual led to higher sensitivity than the deviance residual under M3 and the two residuals performed very differently under M4. Fortunately, the combination test LDM-c tracked the results of the best-performing residual in all cases. As expected, all LDM tests controlled the FDR (except for some minor inflation when the sensitivity was extremely low). The ad hoc Cox regression had very inflated FDR in all cases; the Firth-corrected Cox regression also had inflated FDR, albeit to a lesser degree. We hypothesized that the inflated FDR for both methods is due to the sparsity of the data, with zero counts for many observations at many taxa. To confirm this hypothesis, we varied the overdispersion parameter from 0.02 to 0.002 and finally to 0.0002 to successively decrease the number of cells with zero counts; results from these simulations are found in S6 Fig. Indeed, as the data became less sparse, the FDR of both Cox models became less inflated.

Fig 1

Sensitivity and empirical FDR of the taxon-specific tests in analysis of simulated data with a confounder X (β = 0.8), 50% censoring, and n = 100.

“Cox-f” is the Firth-corrected Cox model. The gray dotted line represents the nominal FDR level 10%.

The type I error results of the global tests are summarized in Table 1, which shows that the LDM- and permanovaFL-related tests all yielded type I error close to the nominal level 0.05. MiRKAT-S and OMiSA produced conservative type I errors when X was a confounder; for example, their type I error rates were 0.007 and 0.034 under model M2. Note that all these tests yielded highly inflated type I error (> 0.4) when the confounder was not adjusted for in the entire analysis, confirming that we have generated substantial confounding effects. The type I error rates of all these tests were robust to violation of the PH assumption when the event times were instead simulated using the AH model.
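When reading empirical type I error rates, it helps to know how much Monte Carlo noise to expect. The sketch below is a back-of-the-envelope calculation, not part of the original analysis: it assumes independent replicates and a normal approximation to the binomial distribution, and computes the range in which an estimate based on 10000 replicates would typically fall if the true rate equalled the nominal level 0.05. The function name is hypothetical.

```python
import math


def monte_carlo_band(nominal, n_replicates, z=1.96):
    """Approximate 95% band for an empirical rejection rate at the nominal level."""
    standard_error = math.sqrt(nominal * (1.0 - nominal) / n_replicates)
    return nominal - z * standard_error, nominal + z * standard_error


low, high = monte_carlo_band(0.05, 10000)
print(round(low, 4), round(high, 4))
```

Under these assumptions, an empirical rate such as 0.007 lies far below the band and reflects genuine conservativeness rather than simulation noise.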

Table 1

Type I error of the global tests for simulated data with 50% censoring and n = 100.

Hazards model	Scenario	β_XZ	LDM-c	LDM-m	LDM-d	permanovaFL-c	permanovaFL-m	permanovaFL-d	MiRKAT-S	OMiSA
Cox	M1	0	0.051	0.049	0.047	0.051	0.051	0.048	0.052	0.050
Cox	M1	0.8	0.050	0.047	0.048	0.052	0.051	0.050	0.032	0.034
Cox	M1	0.8*	0.626	0.634	0.563	0.450	0.453	0.418	0.471	0.518
Cox	M2	0	0.044	0.042	0.044	0.046	0.046	0.044	0.048	0.050
Cox	M2	0.8	0.048	0.050	0.047	0.049	0.050	0.046	0.007	0.034
Cox	M2	0.8*	0.805	0.808	0.74	0.814	0.817	0.74	0.818	0.518
AH	M1	0	0.050	0.051	0.054	0.050	0.050	0.052	0.052	0.052
AH	M1	0.8	0.050	0.048	0.045	0.051	0.050	0.050	0.034	0.034
AH	M2	0	0.050	0.049	0.049	0.051	0.052	0.050	0.053	0.052
AH	M2	0.8	0.052	0.050	0.053	0.050	0.046	0.051	0.007	0.034

Note: AH is the accelerated hazards model [22]. When β = 0, X was a simple covariate (i.e., not correlated with the microbiome data); when β = 0.8, X was a confounder; when β = 0.8*, X was a confounder but omitted in the entire analysis.

Fig 2 displays the power for the global tests. Using either the LDM or permanovaFL, the Martingale and deviance residuals, as well as the combination test, all led to similar power. The similar power between the LDM and permanovaFL was a coincidence here and is not guaranteed in general, since permanovaFL results will vary depending on the distance measure used. MiRKAT-S had similar power to permanovaFL-m when X was a simple covariate (i.e., not correlated with the microbiome data) but had lower power than permanovaFL-m when X was a confounder (especially under Model M2), which is consistent with its conservative type I error results in this situation. OMiSA had very low power in both scenarios M1 and M2. We explored an additional scenario in which rare taxa (taxa 91–100) were associated with the event time; OMiSA yielded good power among all tests when the data were simulated and analyzed based on the relative abundance scale (S2 Fig).

Fig 2

Power of the global tests in the presence of a covariate (β = 0) and a confounder (β = 0.8).

The data were simulated with 50% censoring and n = 100. The gray dotted line represents the nominal type I error level 0.05.

Fig 3 displays the power for the secondary global tests and includes MiRKAT-S again as a calibration. Indeed, MiRKAT-S had equivalent power to MiRKAT-m1 in all cases. MiRKAT-m and MiRKAT-d always had very similar power to permanovaFL-m and permanovaFL-d, respectively, which was expected given the equivalent performance of MiRKAT and permanovaFL we have consistently observed in the context of testing continuous or binary outcomes. These results confirmed that the improvement in the power of permanovaFL-m over MiRKAT-S was truly due to its inclusion of X in the linear regression model (1). Lastly, adonis2-m and adonis2-d occasionally had lower power than permanovaFL-m and permanovaFL-d, as seen before [12, 14, 20].

Fig 3

See the caption to Fig 2.

The MiRKAT-S results are the same as those in Fig 2.

Analysis of the aGVHD data

We analyzed the same data on aGVHD [16] that were also analyzed in the MiRKAT-S paper. We first followed the same procedure as in the MiRKAT-S paper to process the 16S rRNA sequencing data to obtain 2436 operational taxonomic units (OTUs) in 94 subjects, and then removed subjects with library sizes less than 1000 and excluded OTUs that were found in fewer than 5 subjects to obtain a final set of 88 subjects and 441 OTUs for our analysis. We tested the association of the gut microbiome with two survival outcomes separately, the overall survival and the time to stage-III aGVHD, both adjusting for age and gender. The censoring rates for the overall survival and the time to stage-III aGVHD were 52.3% and 42.0%, respectively. The Martingale and deviance residuals obtained from the Cox model with age and gender as covariates are displayed in S7 Fig, which shows that neither set of residuals was normally or symmetrically distributed in this dataset. We applied the LDM, the Cox model, and the Firth-corrected Cox model for testing individual OTUs, and the LDM, permanovaFL, MiRKAT-S, and OMiSA for testing the global association. We applied these methods to both relative-abundance and presence-absence data scales, in the same way as in the simulation studies; in particular, we used the OMiSALN part only for OMiSA in analysis of the relative abundance data. For the presence-absence analyses, we considered both rarefied and unrarefied data for all methods. The unrarefied data may be subject to confounding by the library size, which varied considerably between 1,274 and 265,352 in this dataset. In the rarefaction-based analysis with rarefaction depth 1,274, the LDM was based on all rarefied OTU tables (the LDM-A method in [13]), and permanovaFL and MiRKAT-S were based on the expected Jaccard distance matrix over all rarefied OTU tables [20]. 
Unfortunately, the Cox model and Firth-corrected Cox model cannot handle multiple rarefied OTU tables except by manually rarefying and combining the results, while OMiSALN cannot be used for presence-absence analysis. Whenever possible, we also constructed the omnibus test for each method that combined their results from analyzing the relative abundance data and the presence-absence data (with all rarefactions). For LDM-m, LDM-d, and LDM-c, we applied LDM-omni3 [25] (an omnibus test that combines results from analyzing three data scales: relative abundance, arcsin-root-transformed relative abundance, and presence-absence scales) when analyzing the residuals of survival times. For permanovaFL-m, permanovaFL-d, and permanovaFL-c, we constructed an omnibus test based on the Bray-Curtis and Jaccard (using all rarefactions) distances. OMiSA itself is an omnibus test that combines results of OMiSALN and the omnibus version of MiRKAT-S (based on the Bray-Curtis distance and the weighted, unweighted, and generalized UniFrac [26, 27] distances without rarefaction). The Cox models and MiRKAT-S (original implementation in [5]) do not provide such omnibus tests; we did not construct a Cox model omnibus test combining relative abundance and presence-absence analyses since the marginal performance of the Cox models was so poor. All test results were summarized in Table 2. The LDM or permanovaFL combination tests (LDM-c, permanovaFL-c) always tracked the better results obtained using the Martingale residual and the deviance residual, so we focus on their combination tests hereafter. Among the different analyses we performed, presence-absence analyses based on all rarefied OTU tables consistently led to the most significant results for all tests. 
Specifically, in these analyses LDM-c detected 17 OTUs associated with overall survival and 29 OTUs associated with the time to stage-III aGVHD; the survival functions stratified by the presence or absence of each detected OTU (based on a singly rarefied OTU table) are plotted in Figs 4 and 5 and show a clear separation in each case. LDM-c, permanovaFL-c, and MiRKAT-S yielded p-values of 0.0002, 0.0006, and 0, respectively, for testing the global association of the gut microbiome with overall survival, and 0.0006, 0.0016, and 0.001 for the global association with the time to stage-III aGVHD. The substantial difference in results between the rarefied and unrarefied analyses implies that differences in library size played an important, although undesired, role in the unrarefied analysis. Based on the relative abundance data and a nominal significance level of 0.05, LDM-c and permanovaFL-c declared a significant global association of the gut microbiome with the time to stage-III aGVHD but not with overall survival; MiRKAT-S was not significant for either outcome; OMiSA was significant for both outcomes. The omnibus test results tracked the results of the best-performing data scale in all cases.
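The stratified survival functions mentioned above can be sketched with a plain Kaplan-Meier estimator. The helper names below are hypothetical and the code is only an illustration of the kind of curves shown in Figs 4 and 5, not the plotting code used for them.

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate: returns distinct event times and S(t) at them.

    time:  follow-up times; event: 1 if the event was observed, 0 if censored.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    # Number still at risk just before each event time, and events at it.
    at_risk = np.array([(time >= t).sum() for t in event_times])
    deaths = np.array([(event & (time == t)).sum() for t in event_times])
    survival = np.cumprod(1.0 - deaths / at_risk)
    return event_times, survival


def km_by_presence(time, event, present):
    """Separate Kaplan-Meier curves for subjects with and without an OTU."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    present = np.asarray(present, dtype=bool)
    return {
        "present": kaplan_meier(time[present], event[present]),
        "absent": kaplan_meier(time[~present], event[~present]),
    }
```

Here `present` would be one column of a singly rarefied presence-absence table, so that each detected OTU yields one pair of curves.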

Table 2

Results in analysis of the aGVHD data.

Outcome	Quantity	Method	Relative abundance	Presence-absence (unrarefied)	Presence-absence (all rarefactions)	Omnibus test
Overall survival	Number of detected OTUs	LDM-c	2	2	17	3
		LDM-m	5	10	28	16
		LDM-d	0	1	3	1
		Cox	0	2	-	-
		Cox-f	0	2	-	-
	Global p-value	LDM-c	0.0640	0.0456	0.000200	0.000200
		LDM-m	0.0565	0.0385	0.000200	0.000200
		LDM-d	0.0965	0.0737	0.000400	0.00100
		permanovaFL-c	0.0785	0.0376	0.000600	0.000800
		permanovaFL-m	0.0665	0.0316	0.000800	0.00140
		permanovaFL-d	0.132	0.0411	0.000400	0.000500
		MiRKAT-S	0.0581	0.0290	0	-
		OMiSA	0.002	-	-	0.01
Time to stage-III aGVHD	Number of detected OTUs	LDM-c	12	12	29	29
		LDM-m	50	15	64	57
		LDM-d	0	8	8	4
		Cox	0	0	-	-
		Cox-f	0	5	-	-
	Global p-value	LDM-c	0.0376	0.0365	0.000600	0.00180
		LDM-m	0.0315	0.0323	0.000600	0.00160
		LDM-d	0.0591	0.0668	0.00180	0.00400
		permanovaFL-c	0.0366	0.0411	0.00160	0.00300
		permanovaFL-m	0.0310	0.0355	0.00140	0.00260
		permanovaFL-d	0.0604	0.0624	0.00260	0.00500
		MiRKAT-S	0.0711	0.0300	0.00100	-
		OMiSA	0.004	-	-	0.012

Note: The OTUs were detected by controlling the FDR at 10% level. The permanovaFL and MiRKAT-S results were based on the Bray-Curtis distance in analysis of relative abundance data and the Jaccard distance in analysis of presence-absence data. The omnibus tests for LDM combined results from analyzing the relative abundance, arcsin-root transformed relative abundance, and presence-absence (all rarefactions) data. The omnibus tests for permanovaFL combined results from the relative abundance scale (using the Bray-Curtis distance) and the presence-absence scale (using the Jaccard distance and averaging over all rarefactions).
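For readers unfamiliar with FDR control, the sketch below shows the standard Benjamini-Hochberg adjustment and a 10% threshold. It is a generic illustration with a hypothetical function name; the LDM applies its own permutation-based procedure for taxon-level p-values, which this sketch does not reproduce.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# OTUs whose adjusted p-value is at most 0.10 are declared detected.
example = bh_adjust([0.01, 0.04, 0.03, 0.005])
detected = example <= 0.10
```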

Fig 4

Survival functions for the overall survival outcome by the presence (blue) and absence (red) status (based on a singly rarefied OTU table) of the OTUs detected by LDM-c.

The plots were ordered by the adjusted p-values from LDM-c.

Fig 5

See the caption to Fig 4.

The outcome is the time to stage-III aGVHD here.

Discussion

We have presented an approach that can be used in the LDM and PERMANOVA frameworks for testing microbiome associations with survival outcomes. This approach is based on a linear model that treats the Martingale and deviance residuals from the Cox proportional hazards model as continuous covariates. Unlike existing methods, which only give community-level (global) tests, our extension of the LDM gives both community-level and taxon-level association tests. Further, we find that the LDM global test and permanovaFL outperform the existing permutation-based global tests, MiRKAT-S and OMiSA, when there are strong confounders. Although the analysis of a single type of residuals can make use of the existing code of the LDM or permanovaFL, the test that combines the two, which is recommended over each single test, does entail additional programming and has been added to the LDM package. Note that the only additional computational burden for testing survival outcomes in the LDM framework is the single calculation of the Cox model residuals and the calculation of the combination test, which adds negligibly to the computation.

The gut microbiome data in the aGVHD dataset that we have analyzed here were generated from 16S rRNA sequencing. Our approach is readily applicable to microbiome data generated from shotgun metagenomic sequencing, although these data have different error profiles than 16S rRNA sequencing data. In fact, in a recent publication [28], we applied permanovaFL using the approach developed here to analyze shotgun metagenomic sequencing data of the gut microbiome and outcome data on progression-free survival that were generated from a melanoma immunotherapy study [29].

In our simulation studies, where the data were generated from the Cox model, we found that the tests based on the Cox model made many discoveries, including excessive false discoveries, leading to inflated FDR. Conversely, in our analysis of the aGVHD data, we found that the Cox model made fewer discoveries than the LDM and, in particular, none based on the relative abundance data. This disagreement reflects the fact that the aGVHD data do not follow the Cox model exactly. Indeed, some survival functions in Figs 4 and 5 show violations of the proportional hazards assumption.

In this article, we have primarily considered testing hypotheses that are expressed in terms of relative abundances. Some investigators may prefer to test hypotheses that are expressed in terms of ratios of counts or relative abundances. To this end, a common approach is to normalize the read count data using methods such as GMPR [30] and CSS [31] and then apply tests of differential abundance (such as the LDM) to the normalized data. This approach depends critically on the validity of the chosen normalization method, which may not perform well in the presence of sparse read count data. In addition, the LDM can be applied directly to log-ratio data, although some of the appealing features of the LDM, such as analyses on multiple scales, must then be forgone.
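As a concrete illustration of the residuals on which the approach rests, the sketch below computes Martingale and deviance residuals in the simplest special case of a Cox model with no covariates, where the baseline cumulative hazard reduces to the Nelson-Aalen (Breslow) estimate. This is a simplifying assumption made for the example: the analyses in this article adjust for covariates, which requires fitting a Cox model first and replacing Ĥ(t_i) by Ĥ0(t_i)·exp(x_i'β̂). The function name is hypothetical.

```python
import numpy as np


def null_cox_residuals(time, event):
    """Martingale and deviance residuals for a covariate-free Cox model.

    Martingale: M_i = delta_i - H(t_i), with H the Nelson-Aalen estimate.
    Deviance:   d_i = sign(M_i) * sqrt(-2 * [M_i + delta_i * log(delta_i - M_i)]).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)

    # Hazard increment at each distinct event time: events / number at risk.
    event_times = np.unique(time[event == 1])
    increments = np.array([
        ((time == t) & (event == 1)).sum() / (time >= t).sum()
        for t in event_times
    ])
    # Cumulative hazard evaluated at each subject's own follow-up time.
    cumhaz = np.array([increments[event_times <= t].sum() for t in time])

    martingale = event - cumhaz
    # delta_i - M_i equals cumhaz for events; the log term vanishes otherwise.
    log_term = np.log(np.where(event == 1, cumhaz, 1.0))
    inner = -2.0 * (martingale + event * log_term)
    deviance = np.sign(martingale) * np.sqrt(np.maximum(inner, 0.0))
    return martingale, deviance
```

Either vector of residuals can then be passed as a continuous covariate to a method that tests continuous traits, which is the step that the LDM and permanovaFL carry out. Martingale residuals are bounded above by 1 and skewed; the deviance transformation makes them more symmetric, which is one motivation for analyzing both.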

Supplementary text.

Type I error of the global tests for simulated data in other cases.

Results in scenarios M3 and M4 when taxa 11 and 21, respectively, were associated with the event time.

Results of sensitivity and empirical FDR were obtained when X was a confounder (β = 0.8).

Results in the scenario when rare taxa (taxa 91–100) were associated with the event time.

Left column: data were simulated and analyzed based on the relative abundance scale, same as in model M1. Right column: data were simulated and analyzed based on the presence-absence scale (except for OMiSA), same as in model M2. The censoring rate was 50% and n = 100. Results of sensitivity and empirical FDR were obtained when X was a confounder (β = 0.8).

Results for simulated data with 75% censoring and n = 100.

Results of sensitivity and empirical FDR were obtained when X was a confounder (β = 0.8).

Results for simulated data with 25% censoring and n = 100.

Results of sensitivity and empirical FDR were obtained when X was a confounder (β = 0.8).

Results for simulated data with 50% censoring and n = 50.

Results of sensitivity and empirical FDR were obtained when X was a confounder (β = 0.8).

The overdispersion parameter (“disp”) varied from 0.02, 0.002, to 0.0002. The results with overdispersion 0.02 are the same as those in Fig 1 (left column).

Martingale and deviance residuals, generated from the Cox model that fit age and gender as covariates in analysis of the aGVHD data.

15 Jun 2022

Dear Dr. Hu,

Thank you very much for submitting your manuscript "Testing microbiome associations with survival times at both the community and individual taxon levels" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,
Dina Schneidman-Duhovny
Software Editor
PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: In this manuscript, Hu et al. extended their LDM framework to analyze survival outcomes. By treating the Martingale or deviance residuals as a covariate, their LDM framework can be conveniently applied. One nice thing about the proposed method is that it achieves both the community-level and individual taxa-level tests. Simulations and real data analysis demonstrated the superior performance of the proposed method to the ad hoc Cox model (individual taxa tests) and MiRKAT-S and OMiSA (global test). As clinical research has been increasingly adding the microbiome component, statistical methods for integrating the microbiome data into survival analysis are urgently needed. However, previous research mainly focused on community-level tests. Methods for individual taxa-level tests predominantly use the traditional Cox regression model by treating the taxon abundance as a covariate. It is not clear whether this method is valid or efficient. In this work, the authors revealed that the Cox model could not control the type I error properly, which is quite surprising. Overall, I believe the work is a nice contribution to the microbiome community. Once fully validated, I envision it to be an essential tool in statistical analyses of microbiome data. I have a couple of minor questions and comments that need the authors’ feedback.

1. I do not quite understand why the traditional Cox model has such a high FDR inflation. If the null model is generated according to the Cox model, I will expect the Cox model could control the type I error very well.

2. The authors did not mention/address the potential compositional effects. For example, what if several highly abundant taxa are associated with survival in the same direction? I think the authors need to address it in the discussion by providing some possible solutions, e.g., using robust normalization (GMPR, CSS, etc.).

3. In the simulation, the 20% FDR cutoff may be too high. I think the authors should increase the signal strength instead of using a higher FDR since the real data uses 10% FDR.

4. In the discussion, the authors mentioned the LDM-omni3. Could the authors provide the omni3 p-value for the real data?

Reviewer #2: In this manuscript, Hu and colleagues present a method for associating microbiome with censored survival times. They extend their previous method (and package, so that the method is available for others) to be able to both detect community associations as well as associating particular taxa with the outcomes. Overall, the methods appear to be well done and the benchmarks based on simulated data do show the advantages of this method. However, the application to real data shown in the current version is still limited. I think it would strengthen the manuscript to present more of the analyses of the aGVHD data in the main text. In particular, I recommend moving Fig S7 to a main figure. One important claim that the authors make is that applying the Cox model in an ad hoc model leads to unacceptably high false discovery rates. This is demonstrated in simulated data, but it would be important to see the effect in real data as well. Although not something that I would personally consider to be strictly required, adding some shotgun metagenomics data analyses would also further bolster the case that this method is widely applicable. In principle, it should be possible to apply the method; but shotgun data does have different error profiles.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at .

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript.
Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

6 Jul 2022

Submitted filename: 2022-07-01_response.pdf

4 Aug 2022

Dear Dr. Hu,

Thank you very much for submitting your manuscript "Testing microbiome associations with survival times at both the community and individual taxon levels" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. Please address the point raised by Reviewer 2.

Sincerely,
Dina Schneidman-Duhovny
Software Editor
PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have done a great job in addressing my comments. I think the manuscript is in very good shape and ready for publication. Thanks for this important work!

Reviewer #2: I thank the authors for their answers and improvements to the manuscript. However, unless I am mistaken, the new results in Table 2 (related to the results of the Cox model) actually show the opposite of what was claimed based on simulated data: the Cox model returns fewer positives (in several instances, zero taxa) than the alternatives. Thus, it becomes hard to see how this corresponds to unacceptably high false positive rates (rather than its opposite, a lack of statistical power). I feel that this point should be clarified before publication.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: None
Reviewer #2: Yes

**********

Do you want your identity to be public for this peer review?

Reviewer #1: Yes: Jun Chen
Reviewer #2: No

11 Aug 2022

Submitted filename: 2022-08-04_response.pdf

23 Aug 2022

Dear Dr. Hu,

We are pleased to inform you that your manuscript 'Testing microbiome associations with survival times at both the community and individual taxon levels' has been provisionally accepted for publication in PLOS Computational Biology. Please update the final version based on the comments of the Reviewer #2.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards.
Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,
Dina Schneidman
Software Editor
PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #2: My concerns have been addressed as the new paragraph explicitly addresses the discrepancy between simulation and real data. I will defer to the editor as to whether it is also appropriate to make it more explicit earlier that references [10,11] which are used to support the statement that "it is known that small sample sizes and sparse count data may lead to inflated type I error when using the Cox model" are not microbiome-related (which may explain why the results in this manuscript contradict this statement). The second sentence in the new paragraph should probably read "Conversely, in our analysis of the aGVHD data, we found that the Cox model made fewer discoveries and particularly zero discoveries based on the relative abundance data." ("discoveries" and not "discovery").

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #2: Yes

**********

Do you want your identity to be public for this peer review?

Reviewer #2: No

8 Sep 2022

PCOMPBIOL-D-22-00454R2
Testing microbiome associations with survival times at both the community and individual taxon levels

Dear Dr Hu,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,
Anita Estes
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

24 in total

1. The commensal microbiome is associated with anti-PD-1 efficacy in metastatic melanoma patients.

Authors: Vyara Matson; Jessica Fessler; Riyue Bao; Tara Chongsuwat; Yuanyuan Zha; Maria-Luisa Alegre; Jason J Luke; Thomas F Gajewski
Journal: Science Date: 2018-01-05 Impact factor: 47.728

2. Intestinal Blautia Is Associated with Reduced Death from Graft-versus-Host Disease.

Authors: Robert R Jenq; Ying Taur; Sean M Devlin; Doris M Ponce; Jenna D Goldberg; Katya F Ahr; Eric R Littmann; Lilan Ling; Asia C Gobourne; Liza C Miller; Melissa D Docampo; Jonathan U Peled; Nicholas Arpaia; Justin R Cross; Tatanisha K Peets; Melissa A Lumish; Yusuke Shono; Jarrod A Dudakov; Hendrik Poeck; Alan M Hanash; Juliet N Barker; Miguel-Angel Perales; Sergio A Giralt; Eric G Pamer; Marcel R M van den Brink
Journal: Biol Blood Marrow Transplant Date: 2015-05-11 Impact factor: 5.742

3. Gut microbiome modulates response to anti-PD-1 immunotherapy in melanoma patients.

Authors: V Gopalakrishnan; C N Spencer; L Nezi; A Reuben; M C Andrews; T V Karpinets; P A Prieto; D Vicente; K Hoffman; S C Wei; A P Cogdill; L Zhao; C W Hudgens; D S Hutchinson; T Manzo; M Petaccia de Macedo; T Cotechini; T Kumar; W S Chen; S M Reddy; R Szczepaniak Sloane; J Galloway-Pena; H Jiang; P L Chen; E J Shpall; K Rezvani; A M Alousi; R F Chemaly; S Shelburne; L M Vence; P C Okhuysen; V B Jensen; A G Swennes; F McAllister; E Marcelo Riquelme Sanchez; Y Zhang; E Le Chatelier; L Zitvogel; N Pons; J L Austin-Breneman; L E Haydu; E M Burton; J M Gardner; E Sirmans; J Hu; A J Lazar; T Tsujikawa; A Diab; H Tawbi; I C Glitza; W J Hwu; S P Patel; S E Woodman; R N Amaria; M A Davies; J E Gershenwald; P Hwu; J E Lee; J Zhang; L M Coussens; Z A Cooper; P A Futreal; C R Daniel; N J Ajami; J F Petrosino; M T Tetzlaff; P Sharma; J P Allison; R R Jenq; J A Wargo
Journal: Science Date: 2017-11-02 Impact factor: 47.728

4. Integrative analysis of relative abundance data and presence-absence data of the microbiome using the LDM.

Authors: Zhengyi Zhu; Glen A Satten; Yi-Juan Hu
Journal: Bioinformatics Date: 2022-05-13 Impact factor: 6.931

5. Extension of PERMANOVA to Testing the Mediation Effect of the Microbiome.

Authors: Ye Yue; Yi-Juan Hu
Journal: Genes (Basel) Date: 2022-05-25 Impact factor: 4.141

6. UniFrac: a new phylogenetic method for comparing microbial communities.

Authors: Catherine Lozupone; Rob Knight
Journal: Appl Environ Microbiol Date: 2005-12 Impact factor: 4.792

7. Dietary fiber and probiotics influence the gut microbiome and melanoma immunotherapy response.

Authors: Christine N Spencer; Jennifer L McQuade; Vancheswaran Gopalakrishnan; John A McCulloch; Marie Vetizou; Alexandria P Cogdill; Lorenzo Cohen; Giorgio Trinchieri; Carrie R Daniel; Jennifer A Wargo; Md A Wadud Khan; Xiaotao Zhang; Michael G White; Christine B Peterson; Matthew C Wong; Golnaz Morad; Theresa Rodgers; Jonathan H Badger; Beth A Helmink; Miles C Andrews; Richard R Rodrigues; Andrey Morgun; Young S Kim; Jason Roszik; Kristi L Hoffman; Jiali Zheng; Yifan Zhou; Yusra B Medik; Laura M Kahn; Sarah Johnson; Courtney W Hudgens; Khalida Wani; Pierre-Olivier Gaudreau; Angela L Harris; Mohamed A Jamal; Erez N Baruch; Eva Perez-Guijarro; Chi-Ping Day; Glenn Merlino; Barbara Pazdrak; Brooke S Lochmann; Robert A Szczepaniak-Sloane; Reetakshi Arora; Jaime Anderson; Chrystia M Zobniw; Eliza Posada; Elizabeth Sirmans; Julie Simon; Lauren E Haydu; Elizabeth M Burton; Linghua Wang; Minghao Dang; Karen Clise-Dwyer; Sarah Schneider; Thomas Chapman; Nana-Ama A S Anang; Sheila Duncan; Joseph Toker; Jared C Malke; Isabella C Glitza; Rodabe N Amaria; Hussein A Tawbi; Adi Diab; Michael K Wong; Sapna P Patel; Scott E Woodman; Michael A Davies; Merrick I Ross; Jeffrey E Gershenwald; Jeffrey E Lee; Patrick Hwu; Vanessa Jensen; Yardena Samuels; Ravid Straussman; Nadim J Ajami; Kelly C Nelson; Luigi Nezi; Joseph F Petrosino; P Andrew Futreal; Alexander J Lazar; Jianhua Hu; Robert R Jenq; Michael T Tetzlaff; Yan Yan; Wendy S Garrett; Curtis Huttenhower; Padmanee Sharma; Stephanie S Watowich; James P Allison
Journal: Science Date: 2021-12-23 Impact factor: 47.728

8. Differential abundance analysis for microbial marker-gene surveys.

Authors: Joseph N Paulson; O Colin Stine; Héctor Corrada Bravo; Mihai Pop
Journal: Nat Methods Date: 2013-09-29 Impact factor: 28.547

9. A rarefaction-based extension of the LDM for testing presence-absence associations in the microbiome.

Authors: Yi-Juan Hu; Andrea Lane; Glen A Satten
Journal: Bioinformatics Date: 2021-01-21 Impact factor: 6.937

Review 10. Gut Microbiota as Potential Biomarker and/or Therapeutic Target to Improve the Management of Cancer: Focus on Colibactin-Producing Escherichia coli in Colorectal Cancer.

Authors: Julie Veziant; Romain Villéger; Nicolas Barnich; Mathilde Bonnet
Journal: Cancers (Basel) Date: 2021-05-05 Impact factor: 6.639