Zhiwen Jiang, Mengyu He, Jun Chen, Ni Zhao, Xiang Zhan.
Abstract
Increasing evidence has shown that the microbiome plays a critical role in many human diseases. Apart from continuous and binary traits that measure the extent or presence of a disease, multi-categorical outcomes, including variations/subtypes of a disease or ordinal levels of disease severity, are commonly seen in clinical studies. In addition, studies with a clustered design (i.e., family-based and longitudinal studies) are popular alternatives to population-based ones because they can identify characteristics at both the individual and population levels and investigate the trajectory of traits of interest over time. However, existing methods for microbiome association analysis are inadequate for multi-categorical outcomes, for either independent or clustered data. We propose a microbiome kernel association test with multi-categorical outcomes (MiRKAT-MC). Our method handles both nominal and ordinal outcomes for independent and clustered data. In addition, it incorporates multiple ecological distances to accommodate different association patterns between outcomes and microbiome compositions. A computationally efficient pseudo-permutation strategy is used to evaluate statistical significance. Comprehensive simulations show that MiRKAT-MC preserves the nominal type I error and increases statistical power under various scenarios and data types. We also apply MiRKAT-MC to real data sets with nominal and ordinal outcomes to gain biological insights. MiRKAT-MC is easy to implement and freely available as an R package at https://github.com/Zhiwen-Owen-Jiang/MiRKATMC, with a graphical user interface through R Shiny also available.
Keywords: beta-diversity; kernel association test; longitudinal studies; microbiome association analysis; multi-categorical outcomes
Year: 2022 PMID: 35432465 PMCID: PMC9010828 DOI: 10.3389/fgene.2022.841764
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
Empirical type I error rates of MiRKAT-MC for independent data with three categories.
| | MiRKAT-MCN | | MiRKAT-MCO | |
|---|---|---|---|---|
| Kw | 0.0463 | 0.0465 | 0.0440 | 0.0470 |
| Ku | 0.0436 | 0.0491 | 0.0487 | 0.0492 |
| KBC | 0.0488 | 0.0468 | 0.0469 | 0.0449 |
| K5 | 0.0479 | 0.0518 | 0.0476 | 0.0466 |
| HMP | 0.0502 | 0.0475 | 0.0461 | 0.0455 |
N denotes the sample size. Kw, the weighted UniFrac kernel; Ku, the unweighted UniFrac kernel; KBC, the Bray-Curtis kernel; K5, the generalized UniFrac kernel with parameter 0.5; HMP, the omnibus test using harmonic mean p-value test.
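The kernels listed above are built from pairwise ecological distances between samples. The following is a minimal sketch, not the MiRKAT-MC implementation: it computes Bray-Curtis dissimilarities and converts them to a kernel by Gower double centering, K = -1/2 J D² J with J = I - 11'/n, which is the usual construction in distance-based kernel tests. The function names and the count matrix are hypothetical, and the sketch omits the step that truncates negative eigenvalues to make K positive semi-definite.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def distance_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities for a list of samples."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def gower_kernel(dist):
    """Double-center the squared distances: K = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)]
            for i in range(n)]


# Hypothetical OTU counts: 3 samples (rows) by 3 taxa (columns).
otu_counts = [[10, 0, 5], [8, 2, 5], [0, 12, 3]]
D = distance_matrix(otu_counts)
K = gower_kernel(D)
```

The resulting K is symmetric and each of its rows sums to zero, which is a quick sanity check on the centering step.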
FIGURE 1. Statistical power of MiRKAT-MC for independent data with three categories. Scenario 1: a randomly selected common cluster among 20 clusters by PAM; Scenario 2: the rarest cluster among 20 clusters by PAM; Scenario 3: the 10 most abundant OTUs. Kw, the weighted UniFrac kernel; Ku, the unweighted UniFrac kernel; KBC, the Bray-Curtis kernel; K5, the generalized UniFrac kernel with parameter 0.5; HMP, the omnibus test using harmonic mean p-value test. (A) MiRKAT-MCN with 80 total samples; (B) MiRKAT-MCO with 80 total samples; (C) MiRKAT-MCN with 200 total samples; (D) MiRKAT-MCO with 200 total samples.
Empirical type I error rates of MiRKAT-MC for clustered data under a random-intercept and random-slope model with three-category outcomes.
| | g = 0.25 | g = 1 | g = 4 | g = 0.25 | g = 1 | g = 4 |
|---|---|---|---|---|---|---|
| MiRKAT-MCN | ||||||
| Kw | 0.0498 | 0.0492 | 0.0467 | 0.0478 | 0.0496 | 0.0484 |
| Ku | 0.0521 | 0.0533 | 0.0486 | 0.0449 | 0.0508 | 0.0478 |
| KBC | 0.0519 | 0.0542 | 0.0494 | 0.0522 | 0.0478 | 0.0497 |
| K5 | 0.0527 | 0.0516 | 0.0521 | 0.0521 | 0.0468 | 0.0505 |
| HMP | 0.0514 | 0.0533 | 0.0472 | 0.0465 | 0.0478 | 0.0488 |
| MiRKAT-MCO | ||||||
| Kw | 0.0500 | 0.0473 | 0.0474 | 0.0449 | 0.0498 | 0.0457 |
| Ku | 0.0486 | 0.0506 | 0.0487 | 0.0483 | 0.0483 | 0.0538 |
| KBC | 0.0535 | 0.0507 | 0.0487 | 0.0453 | 0.0493 | 0.0485 |
| K5 | 0.0519 | 0.0471 | 0.0489 | 0.0476 | 0.0501 | 0.0486 |
| HMP | 0.0495 | 0.0467 | 0.0481 | 0.0452 | 0.0483 | 0.0475 |
n indicates the number of clusters, while N is the total number of observations. g denotes the variance of random effects. The definitions of Kw, Ku, KBC, K5, and HMP are the same as in Table 1.
FIGURE 2. Two-dimensional PCoA plots depicting microbiome composition for different antibiotic treatment groups under various dissimilarity measures. All 499 fecal samples are included in the plots. PAT, therapeutic-dose pulsed antibiotic exposure; STAT, sub-therapeutic continuous antibiotic exposure. The crosses denote the centroid of points of each treatment group. (A) W.UniFrac: weighted UniFrac distance; (B) U.UniFrac: unweighted UniFrac distance; (C) G.UniFrac(0.5): generalized UniFrac distance with tuning parameter α = 0.5; (D) Bray-Curtis: Bray-Curtis dissimilarity.
FIGURE 3. Boxplot of the Shannon index across BMI categories in the United Kingdom twins study. Normal: BMI < 25; Overweight: 25 ≤ BMI < 30; Obese: BMI ≥ 30. The circle on each box denotes the mean of the Shannon index in that category.
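As a rough illustration of the two quantities in this figure, the sketch below computes a Shannon index from taxon counts and assigns the three ordinal BMI categories. It is not taken from the study's code: the function names are hypothetical, the natural logarithm is an assumption (the caption does not state the base), and the cutoffs are read as Normal below 25, Overweight from 25 up to but not including 30, and Obese at 30 or above.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bmi_category(bmi):
    """Map a BMI value to the ordinal category used in the figure (assumed cutoffs)."""
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"


# Hypothetical subject: taxon counts and a BMI value.
example_counts = [40, 30, 20, 10]
example_h = shannon_index(example_counts)
example_group = bmi_category(27.3)
```

Grouping the per-subject index by category gives the data behind a boxplot of this kind; the ordering Normal, Overweight, Obese is what makes the outcome ordinal rather than nominal.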
p-values for testing the BMI-microbiome association in the United Kingdom twins dataset using different methods and kernels.
| | CSKAT | GLMM-MiRKAT-Binary | MiRKAT-MCO | MiRKAT-MCN |
|---|---|---|---|---|
| Kw | 0.1455 | 0.1750 | 0.2223 | 0.3268 |
| Ku | 0.0036 | 0.0182 | | 0.0033 |
| KBC | | 0.0021 | 0.0016 | 0.0015 |
| K5 | 0.0278 | 0.0370 | | 0.0264 |
| HMP | 0.0036 | 0.0075 | | 0.0040 |
The bold value is the smallest significant p-value across the four methods for a given kernel/method. The definitions of Kw, Ku, KBC, K5, and HMP are the same as in Table 1.
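The HMP row combines the four kernel-specific p-values into one omnibus p-value. A minimal sketch, assuming equal weights and the uncalibrated harmonic mean (the exact weighting and any calibration used by MiRKAT-MC are not stated here), applied to the MiRKAT-MCN p-values reported in the table:

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values; equal weights if none are given."""
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# MiRKAT-MCN p-values for the four kernels, copied from the table.
mcn_p_values = {"Kw": 0.3268, "Ku": 0.0033, "KBC": 0.0015, "K5": 0.0264}

hmp = harmonic_mean_p(list(mcn_p_values.values()))
# hmp is about 0.00396, which agrees with the reported HMP of 0.0040
# after rounding to four decimals.
```

The harmonic mean is dominated by the smallest p-values, so a single strongly associated kernel (here KBC) can drive the omnibus result even when another kernel (Kw) shows no signal.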
Computational efficiency of MiRKAT-MC. Each result is the average time of one association test, averaged over 100 replicate association tests.
| | MiRKAT-MCN (s) | MiRKAT-MCO (s) |
|---|---|---|
| Independent data | | |
| | 0.0150 | 0.0139 |
| | 0.0914 | 0.0796 |
| | 0.0978 | 0.0426 |
| | 0.7627 | 0.2568 |
| Longitudinal data | | |
| | 6.438 | 2.844 |
| | 6.672 | 2.994 |
| | 11.964 | 4.758 |
| | 26.328 | 15.252 |
For longitudinal data, both random intercepts and random slopes of time are included in the null models. The weighted UniFrac kernel was applied without loss of generality. n denotes the number of clusters, whereas N is the total sample size. All computations were conducted on a MacBook Pro (15-inch, 2019) laptop with a 2.3 GHz 8-core Intel Core i9 processor and 16 GB memory, without parallelization or other speed-up strategies.