Nhan Thi Ho1,2, Fan Li3, Shuang Wang4, Louise Kuhn5.
Abstract
BACKGROUND: The rapid growth of high-throughput sequencing-based microbiome profiling has yielded tremendous insights into human health and physiology. Data generated from high-throughput sequencing of 16S rRNA gene amplicons are often preprocessed into composition or relative abundance. However, reproducibility has been lacking because of the many different experimental and computational approaches taken in these studies. Microbiome studies may report varying results on the same topic; meta-analyses that examine different microbiome studies to provide consistent and robust results are therefore important. So far, there is still a lack of implemented methods to properly examine differential relative abundances of microbial taxonomies and to perform meta-analysis examining the heterogeneity and overall effects across microbiome studies.
Keywords: GAMLSS; Gender; Infant; Meta-analysis; Microbiome; Pooling estimates; Random effect; Relative abundance; Zero-inflated beta
Year: 2019 PMID: 30991942 PMCID: PMC6469060 DOI: 10.1186/s12859-019-2744-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Type I error of GAMLSS-BEZI and LMAS
| Sample size | GAMLSS-BEZI, alpha level = 0.01 | LMAS, alpha level = 0.01 | GAMLSS-BEZI, alpha level = 0.05 | LMAS, alpha level = 0.05 | GAMLSS-BEZI, alpha level = 0.1 | LMAS, alpha level = 0.1 |
|---|---|---|---|---|---|---|
| 10 | 0.014 | 0.012 | 0.061 | 0.050 | 0.114 | 0.099 |
| 100 | 0.010 | 0.010 | 0.051 | 0.050 | 0.103 | 0.098 |
| 500 | 0.010 | 0.011 | 0.052 | 0.052 | 0.104 | 0.103 |
GAMLSS-BEZI Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero inflated beta (BEZI) family, LMAS linear model with arcsin square root transformation (implemented in the software MaAsLin)
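The arcsin square root transformation named in the LMAS definition can be sketched in a few lines. This is only an illustration of the transform itself, not MaAsLin's implementation; the function name `arcsin_sqrt` and the abundance values are hypothetical.

```python
import math


def arcsin_sqrt(relative_abundance):
    """Arcsin square root transform of a proportion in [0, 1]."""
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative abundance must lie in [0, 1]")
    return math.asin(math.sqrt(relative_abundance))


# Hypothetical relative abundances of one taxon in two groups.
cases = [0.10, 0.25, 0.40]
controls = [0.05, 0.10, 0.20]

# A linear model with a single binary covariate reduces to a difference
# in group means on the transformed scale.
mean_cases = sum(arcsin_sqrt(p) for p in cases) / len(cases)
mean_controls = sum(arcsin_sqrt(p) for p in controls) / len(controls)
difference = mean_cases - mean_controls
```

The transform maps proportions from [0, 1] onto [0, π/2], so LMAS estimates are on a different scale from the GAMLSS-BEZI estimates and the two sets of coefficients are not directly comparable in magnitude.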
Fig. 1 ROC curve and power of GAMLSS-BEZI vs. LMAS. a. ROC curve of GAMLSS-BEZI and LMAS for identifying species with differential abundance between case and control groups. b. Power of GAMLSS-BEZI vs. LMAS for different effect sizes of differential relative abundances between case and control groups. GAMLSS-BEZI: Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero inflated beta (BEZI) family; LMAS: linear model with arcsin square root transformation (implemented in the software MaAsLin); ROC curve: Receiver operating characteristic curve; AUC: area under the curve
Type I error of GAMLSS-BEZI and LMAS on real microbiome data
| Taxonomic level | GAMLSS-BEZI, alpha level = 0.01 (median (IQR)) | LMAS, alpha level = 0.01 (median (IQR)) | GAMLSS-BEZI, alpha level = 0.05 (median (IQR)) | LMAS, alpha level = 0.05 (median (IQR)) | GAMLSS-BEZI, alpha level = 0.1 (median (IQR)) | LMAS, alpha level = 0.1 (median (IQR)) |
|---|---|---|---|---|---|---|
| Cross-sectional microbiome data | | | | | | |
| Phylum (5 taxa) | 0.010 (0.007, 0.017) | 0.007 (0.003, 0.010) | 0.043 (0.043, 0.050) | 0.040 (0.033, 0.043) | 0.100 (0.093, 0.113) | 0.090 (0.073, 0.090) |
| Family (33 taxa) | 0.000 (0.000, 0.003) | 0.000 (0.000, 0.007) | 0.007 (0.000, 0.043) | 0.033 (0.007, 0.050) | 0.070 (0.003, 0.103) | 0.083 (0.053, 0.107) |
| Longitudinal microbiome data | | | | | | |
| Phylum (5 taxa) | 0.007 (0.002, 0.012) | 0.010 (0.008, 0.013) | 0.047 (0.030, 0.060) | 0.067 (0.063, 0.080) | 0.110 (0.075, 0.123) | 0.117 (0.113, 0.132) |
| Family (33 taxa) | 0.003 (0.000, 0.008) | 0.010 (0.007, 0.013) | 0.043 (0.036, 0.053) | 0.050 (0.043, 0.064) | 0.097 (0.082, 0.110) | 0.107 (0.089, 0.117) |
GAMLSS-BEZI Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero inflated beta (BEZI) family, LMAS linear model with arcsin square root transformation (implemented in the software MaAsLin); IQR interquartile range. For longitudinal data, subject random intercepts were added to the models
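For readers unfamiliar with the zero inflated beta (BEZI) family, the sketch below writes out its density. It assumes the parameterization used by the R package gamlss.dist (mu: mean of the beta part, sigma: precision, nu: probability of an exact zero); the function names are hypothetical and no model fitting is attempted.

```python
import math


def bezi_density(y, mu, sigma, nu):
    """Density of a zero inflated beta variable on [0, 1).

    Assumed parameterization: a point mass nu at y == 0 and, with
    probability 1 - nu, a Beta(mu * sigma, (1 - mu) * sigma) draw.
    """
    if not (0.0 < mu < 1.0 and sigma > 0.0 and 0.0 < nu < 1.0):
        raise ValueError("need 0 < mu < 1, sigma > 0, 0 < nu < 1")
    if y == 0.0:
        return nu
    if not 0.0 < y < 1.0:
        return 0.0
    a = mu * sigma
    b = (1.0 - mu) * sigma
    log_beta_pdf = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)
    )
    return (1.0 - nu) * math.exp(log_beta_pdf)


def bezi_mean(mu, sigma, nu):
    """Expected relative abundance: zeros pull the beta mean toward 0."""
    return (1.0 - nu) * mu
```

Because zeros are modeled by a separate probability, taxa that are absent from many samples need no pseudocount before analysis, which is the practical appeal of this family for relative abundance data.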
Fig. 2 Relative abundances of bacterial phyla in non-exclusively breastfed vs. exclusively breastfed infants ≤6 months of age. Data from Bangladesh study
Results of GAMLSS-BEZI and LMAS: real microbiome data example 1
| Bacterial phyla | GAMLSS-BEZI estimate | GAMLSS-BEZI 95% lower limit | GAMLSS-BEZI 95% upper limit | GAMLSS-BEZI p-value | GAMLSS-BEZI FDR adjusted p-value | LMAS estimate | LMAS 95% lower limit | LMAS 95% upper limit | LMAS p-value | LMAS FDR adjusted p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Actinobacteria | −0.37 | −0.65 | −0.10 | | 0.0166 | −0.13 | −0.23 | −0.03 | | 0.0207 |
| Bacteroidetes | 0.26 | 0.00 | 0.53 | | 0.0499 | 0.03 | 0.00 | 0.05 | | 0.0390 |
| Firmicutes | 0.24 | 0.00 | 0.47 | | 0.0499 | 0.07 | 0.00 | 0.14 | 0.0668 | 0.0668 |
| Proteobacteria | 0.37 | 0.11 | 0.64 | | 0.0166 | 0.10 | 0.02 | 0.17 | | 0.0207 |
Data from Bangladesh study. Comparison of longitudinal monthly gut bacterial relative abundances at phylum level between non-exclusively breastfed (non-EBF) vs. exclusively breastfed (EBF) infants from birth to ≤6 months of age using GAMLSS-BEZI vs. LMAS. Significant p-values (< 0.05) are in bold
GAMLSS-BEZI Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero inflated beta (BEZI) family, LMAS linear model with arcsin square root transformation (implemented in the software MaAsLin), FDR false discovery rate
Fig. 3 Relative abundances of bacterial phyla in infants from 6 months to 2 years of age with solid food introduction after 5 months vs. before 5 months. Data from Bangladesh study
Results of GAMLSS-BEZI and LMAS: real microbiome data example 2
| Bacterial phyla | GAMLSS-BEZI estimate | GAMLSS-BEZI 95% lower limit | GAMLSS-BEZI 95% upper limit | GAMLSS-BEZI p-value | GAMLSS-BEZI FDR adjusted p-value | LMAS estimate | LMAS 95% lower limit | LMAS 95% upper limit | LMAS p-value | LMAS FDR adjusted p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Actinobacteria | 0.19 | 0.04 | 0.34 | | 0.0208 | 0.05 | −0.06 | 0.16 | 0.3451 | 0.3451 |
| Bacteroidetes | −0.26 | −0.42 | −0.10 | | 0.0070 | −0.05 | −0.09 | −0.01 | | 0.1079 |
| Firmicutes | −0.16 | −0.30 | −0.03 | | 0.0208 | −0.04 | −0.12 | 0.04 | 0.3168 | 0.3451 |
| Proteobacteria | 0.14 | −0.02 | 0.30 | 0.0861 | 0.0861 | 0.02 | −0.02 | 0.07 | 0.2916 | 0.3451 |
Data from Bangladesh study. Comparison of longitudinal monthly gut bacterial relative abundances at phylum level between infants from 6 months to 2 years of age with solid food introduction after 5 months vs. before 5 months of age using GAMLSS-BEZI vs. LMAS. Significant p-values (< 0.05) are in bold
GAMLSS-BEZI Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero inflated beta (BEZI) family, LMAS linear model with arcsin square root transformation (implemented in the software MaAsLin), FDR false discovery rate
Fig. 4 Relative abundance of bacterial phyla in infants from 6 months to 2 years of age with diarrhea vs. without diarrhea at the time of stool sample collection stratified by duration of exclusive breastfeeding (EBF). Data from Bangladesh study
Results of GAMLSS-BEZI and LMAS: real microbiome data example 3
| Bacterial phyla | GAMLSS-BEZI estimate | GAMLSS-BEZI 95% lower limit | GAMLSS-BEZI 95% upper limit | GAMLSS-BEZI p-value | GAMLSS-BEZI FDR adjusted p-value | LMAS estimate | LMAS 95% lower limit | LMAS 95% upper limit | LMAS p-value | LMAS FDR adjusted p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| In infants with duration of EBF ≤ 2 months (diarrhea vs. no diarrhea comparison) | | | | | | | | | | |
| Actinobacteria | −0.73 | −1.12 | −0.34 | | 0.0011 | −0.12 | −0.23 | 0.0 | | 0.0848 |
| Bacteroidetes | −0.29 | −0.68 | 0.10 | 0.1524 | 0.2032 | 0.06 | −0.12 | 0.01 | 0.0852 | 0.1136 |
| Firmicutes | 0.49 | 0.15 | 0.84 | | 0.0109 | 0.11 | 0.01 | 0.2 | | 0.0848 |
| Proteobacteria | −0.17 | −0.54 | 0.20 | 0.3729 | 0.3729 | 0.00 | −0.07 | 0.08 | 0.9060 | 0.9060 |
| In infants with duration of EBF > 2 months (diarrhea vs. no diarrhea comparison) | | | | | | | | | | |
| Actinobacteria | 0.02 | −0.42 | 0.46 | 0.9243 | 0.9243 | 0.00 | −0.10 | 0.10 | 0.9626 | 0.9989 |
| Bacteroidetes | 0.07 | −0.41 | 0.56 | 0.7680 | 0.9243 | 0.01 | −0.07 | 0.09 | 0.8101 | 0.9707 |
| Firmicutes | −0.02 | −0.40 | 0.36 | 0.9142 | 0.9243 | −0.01 | −0.13 | 0.12 | 0.8927 | 0.9707 |
| Proteobacteria | 0.12 | −0.33 | 0.56 | 0.6043 | 0.9243 | 0.02 | −0.06 | 0.11 | 0.5875 | 0.9191 |
Data from Bangladesh study. Comparison of longitudinal monthly gut bacterial relative abundances at phylum level in infants from 6 months to 2 years of age with diarrhea vs. no diarrhea at the time of stool sample collection stratified by duration of exclusive breastfeeding (EBF). Significant p-values (< 0.05) are in bold
EBF exclusive breastfeeding, GAMLSS-BEZI Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero inflated beta (BEZI) family, LMAS linear model with arcsin square root transformation (implemented in the software MaAsLin), FDR false discovery rate
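The FDR adjustment reported in these tables can be illustrated with a short sketch. The source does not name the procedure, so the Benjamini–Hochberg step-up method is an assumption here, and the function name is hypothetical. The input is the set of four GAMLSS-BEZI p-values from the EBF > 2 months stratum of example 3.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria
# (GAMLSS-BEZI, EBF > 2 months stratum of example 3).
raw = [0.9243, 0.7680, 0.9142, 0.6043]
adjusted = benjamini_hochberg(raw)
```

Under this assumption every adjusted value equals the largest raw p-value, 0.9243, which agrees with the GAMLSS-BEZI FDR adjusted column for that stratum. The LMAS FDR adjusted values in the same stratum do not follow from the four listed p-values in this way, so they were presumably adjusted over a different set of tests.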
Summary of four published microbiome studies included in meta-analysis
| Published study | Data origin (study population) | Study design/ | Sample size (for only infants ≤ 6 months of age) | Clinical variables used in meta-analysis | Target region of 16S rRNA genes /sequence platform | Starting files used and data processing done in this project |
|---|---|---|---|---|---|---|
| Subramanian et al. (2014). Persistent gut microbiota immaturity in malnourished Bangladeshi children [ | Bangladesh (b) | Longitudinal gut microbiome data from stool samples collected monthly from birth to 6 months of age of 50 healthy Bangladeshi infants | Number of samples: 322 | Gender, feeding status (EBF, non-EBF, non-BF), infant age at sample collection | V4 /Illumina MiSeq | Assembled 16S reads used for OTU picking (.fna file), mapping and meta-data files. |
| Bender et al. (2016). Maternal HIV infection influences the microbiome of HIV-uninfected infants [ | Haiti | One time gut microbiome data from stool samples of 48 HIV negative infants with age varied from 0 to 6 months whose mothers were HIV negative ( | Number of samples: 48 (female = 25, male = 21) | Gender, feeding status (EBF, non-EBF), infant age at sample collection | V4 /Illumina MiSeq | |
| Pannaraj et al. (2017). Association Between Breast Milk Bacterial Communities and Establishment and Development of the Infant Gut Microbiome [ | USA (California and Florida) | Longitudinal gut microbiome data from stool samples of 113 healthy full-term infants collected at 0 to 7 days, 8 to 30 days, 31 to 90 days, 91 to 180 days. | Number of samples: 221 (female = 120, male = 101) | Gender, feeding status (EBF, non-EBF, non-BF), infant age at sample collection | V4 /Illumina MiSeq | |
| Thompson et al. (2015). Milk- and solid-feeding practices and daycare attendance are associated with differences in bacterial diversity, predominant communities, and metabolic and immune function of the infant gut microbiome [ | USA (North Carolina) | Longitudinal gut microbiome data from stool samples of 6 healthy full-term infants with age varied from 0 to 6 months. | Number of samples: 21 (female = 14, male = 7) | Gender, feeding status (EBF, non-EBF, non-BF), infant age at sample collection | V1–2 /Roche GS FLX Titanium | |
(a) This healthy cohort was used as reference in the comparison with malnourished cohorts in the original published paper. (b) The healthy cohort of this Bangladesh study also contains 674 stool samples from infants > 6 months of age. The data of this healthy cohort were also used in the analyses comparing the performance of GAMLSS-BEZI (Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero inflated beta (BEZI) family) vs. LMAS (linear model with arcsin square root transformation) in examples 1, 2 and 3 above. Data from this study were downloaded from the authors’ website: https://gordonlab.wustl.edu/Subramanian_6_14/Nature_2014_Processed_16S_rRNA_datasets.html. Data from the three other studies were obtained directly from the investigators. EBF: exclusive breastfeeding; non-EBF: non-exclusive breastfeeding; non-BF: non-breastfeeding
Fig. 5 Meta-analysis for the difference in relative abundances of gut bacterial taxa between male vs. female infants ≤6 months of age. a: Phylum level: heatmap of log (odds ratio) (log (OR)) of relative abundances of all gut bacterial phyla between male vs. female infants for each study and forest plot of pooled estimates across all studies with 95% confidence intervals (95% CI). b: Genus level: heatmap of log (OR) of relative abundances of all gut bacterial genera between male vs. female infants for each study and forest plot of pooled estimates across all studies with 95% CI. All log (OR) estimates of each bacterial taxon from each study were from Generalized Additive Models for Location Scale and Shape (GAMLSS) with beta zero inflated family (BEZI) and were adjusted for feeding status and age of infants at sample collection. Pooled log (OR) estimates and 95% CI (forest plot) were from random effect meta-analysis models with inverse variance weighting and DerSimonian–Laird estimator for between-study variance based on the adjusted log (OR) estimates and corresponding standard errors of all included studies. Bacterial taxa with p-values for differential relative abundances < 0.05 are denoted with * and those with p-values < 0.0001 are denoted with **. Pooled log (OR) estimates with pooled p-values < 0.05 are in red and those with false discovery rate (FDR) adjusted pooled p-values < 0.1 are shown as triangles. Missing (unavailable) values are in white. USA: United States of America; CA: California; FL: Florida; NC: North Carolina
Fig. 6 Meta-analysis for the difference in relative abundances of gut microbial KEGG pathways between male vs. female infants ≤6 months of age. Heatmap of log (odds ratio) (log (OR)) of relative abundances of gut microbial KEGG pathways at level 2 between male vs. female infants for each study and forest plot of pooled estimates of all studies with 95% confidence intervals (95% CI). All log (OR) estimates of each pathway from each study were from Generalized Additive Models for Location Scale and Shape (GAMLSS) with beta zero inflated family (BEZI) and were adjusted for feeding status and age of infants at sample collection. Pooled log (OR) estimates and 95% CI (forest plot) were from random effect meta-analysis models with inverse variance weighting and DerSimonian–Laird estimator for between-study variance based on the adjusted log (OR) estimates and corresponding standard errors of all included studies. Pathways with p-values for differential relative abundances < 0.05 are denoted with * and those with p-values < 0.0001 are denoted with **. Pooled log (OR) estimates with pooled p-values < 0.05 are in red and those with false discovery rate (FDR) adjusted pooled p-values < 0.1 are shown as triangles. KEGG: Kyoto Encyclopedia of Genes and Genomes; USA: United States of America; CA: California; FL: Florida; NC: North Carolina
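The pooling step described in the captions of Fig. 5 and Fig. 6 (random effect meta-analysis with inverse variance weighting and the DerSimonian–Laird estimator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the normal-approximation confidence interval and p-value, and the input numbers are all hypothetical.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool per-study log(OR) estimates with a random effect model."""
    k = len(estimates)
    # Fixed effect (inverse variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and the DerSimonian-Laird between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random effect weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)
    z = pooled / pooled_se
    return {
        "pooled": pooled,
        "se": pooled_se,
        "tau2": tau2,
        "ci95": (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se),
        # Two-sided p-value from the standard normal distribution.
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
    }


# Hypothetical log(OR) estimates and standard errors from two studies.
result = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
```

With these two strongly disagreeing hypothetical studies the estimated between-study variance is 0.49 and the pooled standard error grows from about 0.07 under a fixed effect model to 0.5, which shows how heterogeneity widens the pooled confidence interval. Exponentiating the pooled log (OR) gives the pooled odds ratio.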