Wenxuan Zuo, Beibei Wang, Xin Bai, Yihui Luan, Yingying Fan, Sonia Michail, Fengzhu Sun.
Abstract
Dysbiosis of the human gut microbiota has been reported in association with ulcerative colitis (UC) in both children and adults using either 16S rRNA gene or shotgun sequencing data. However, these studies used either 16S rRNA or metagenomic shotgun sequencing, but not both. We sequenced fecal samples from 19 children with UC and 23 healthy children aged 7 to 21 years using both 16S rRNA and metagenomic shotgun sequencing. The samples were analyzed using three types of data: 16S rRNA genus-level abundance profiles, microbial species abundance profiles, and pathway abundance profiles. We demonstrated that (a) the alpha diversity of pediatric UC cases is lower than that of healthy controls; (b) the beta diversity within children with UC is more variable than within healthy children; (c) several microbial families, including Akkermansiaceae, Clostridiaceae, Eggerthellaceae, Lachnospiraceae, and Oscillospiraceae, contain species that are depleted in pediatric UC compared to controls; (d) a few associated species unique to pediatric UC, but not adult UC, were also identified, e.g. some species in the Christensenellaceae family were found to be depleted and some species in the Enterobacteriaceae family were found to be enriched in pediatric UC; and (e) both 16S rRNA and shotgun sequencing data can predict pediatric UC status with an area under the receiver operating characteristic curve (AUROC) close to 0.90 based on cross validation. We showed that 16S rRNA data yielded results similar to those from shotgun data in terms of alpha diversity, beta diversity, and prediction accuracy. Our study demonstrated that pediatric UC subjects harbor a dysbiotic and less diverse gut microbial population with distinct differences from healthy children. We also showed that 16S rRNA data yielded disease prediction accuracy comparable to that of shotgun data, which can be more expensive and laborious to generate. These conclusions were confirmed in an independent data set of 7 pediatric UC cases and 8 controls.
Year: 2022 PMID: 35440670 PMCID: PMC9018687 DOI: 10.1038/s41598-022-07995-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Metadata demographics. There are no significant differences in age or gender between the UC cases and healthy controls. However, the numbers of reads for the healthy controls are much higher than those for the UC cases for both the 16S rRNA and shotgun sequencing data. Statistical significance (p-value) for the differences was calculated using the Wilcoxon rank-sum test.
| | Healthy | UC | p-value |
|---|---|---|---|
| Number of subjects | 23 | 19 | NA |
| Age in years (mean ± sd) | 14.04 ± 3.5 | 14.47 ± 3.5 | 0.559 |
| Male, n (%) | 10 (43.48%) | 10 (52.63%) | 0.569 |
| Number of reads (million) for 16S rRNA (mean ± sd) | 0.19 ± 0.31 | 0.09 ± 0.06 | 0.002 |
| Number of reads (million) for shotgun (mean ± sd) | 4.30 ± 0.80 | 2.45 ± 2.17 | 0.0002 |
Figure 1. PCoA plots of the samples, with colors representing disease status and shapes representing therapies of UC patients (A: patients taking 5-aminosalicylates; B: patients taking biologic therapy; I: patients taking immunomodulators; S: patients using steroids; BI: both B and I). (A) PCoA based on the Bray–Curtis distance calculated from the 16S rRNA genus level abundance profiles. (B) PCoA based on the Bray–Curtis distance calculated from the bacterial species level abundance using the shotgun reads data.
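The Bray–Curtis distance behind these PCoA plots can be illustrated with a minimal sketch. It assumes the standard definition (sum of absolute abundance differences divided by the total abundance of both samples); the function name `bray_curtis` and the example counts are hypothetical and are not taken from the study.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one sample must have non-zero abundance")
    # Sum of per-taxon absolute differences, scaled by the combined abundance.
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical genus-level counts for two samples (illustration only).
sample_a = [10, 0, 5, 5]
sample_b = [5, 5, 5, 5]
d_ab = bray_curtis(sample_a, sample_b)
print(d_ab)
```

The value is 0 for identical profiles and 1 for profiles that share no taxa; a PCoA would then be run on the matrix of all pairwise distances.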
Coefficients of variables for multiple regression on matrices. Two coefficients are reported for disease status: the first indicates the difference in beta diversity between (D, H) sample pairs and (H, H) sample pairs, and the second indicates the difference between (D, D) sample pairs and (H, H) sample pairs, where D and H represent UC and healthy individuals, respectively. The coefficient for age measures the contribution of age to beta diversity. Two coefficients are reported for gender: the first measures the difference in beta diversity between (M, F) pairs and (M, M) pairs, and the second measures the difference between (F, F) pairs and (M, M) pairs, where M and F represent male and female, respectively. If a partial regression coefficient is reported, its significance level is < 0.05. *P < 0.01 and **P < 0.001.
| | 16S genus level abundance | Species level abundance |
|---|---|---|
| Disease status: (D, H) vs (H, H) | Not significant | 0.249 (*) |
| Disease status: (D, D) vs (H, H) | 0.090 | 0.189 (**) |
| Age group | Not significant | Not significant |
| Gender: (M, F) vs (M, M) | Not significant | Not significant |
| Gender: (F, F) vs (M, M) | Not significant | Not significant |
Joint contributions of disease status, age and gender to alpha diversity. Coefficients and statistical significance of the factors are quantified by linear model (2). If a regression coefficient is reported, its significance level (t test) is < 0.05. *P < 0.01 and **P < 0.001.
| | 16S genus level abundance | Species level abundance |
|---|---|---|
| Disease status | −0.223 (**) | −0.140 (*) |
| Age | Not significant | Not significant |
| Gender | Not significant | Not significant |
Figure 2. Box plots of Shannon indices for gut samples of pediatric UC cases and healthy controls stratified by disease status. P-values were calculated using the Wilcoxon rank-sum test. (A) Shannon indices were calculated based on the 16S genus level abundance. (B) Shannon indices were calculated using the bacterial species abundance based on shotgun reads data.
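As a reminder of what the Shannon index measures, the sketch below computes it from a vector of taxon abundances. It assumes the usual definition with the natural logarithm; the record does not state which log base the authors used, and the function name `shannon_index` and the example profiles are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    h = 0.0
    for x in abundances:
        if x > 0:  # taxa with zero abundance contribute nothing
            p = x / total
            h -= p * math.log(p)
    return h


# Hypothetical profiles (illustration only): an even and a dominated community.
even_profile = [25, 25, 25, 25]
skewed_profile = [97, 1, 1, 1]
print(shannon_index(even_profile), shannon_index(skewed_profile))
```

An even community attains the maximum value ln(number of taxa), while a community dominated by one taxon scores close to 0, which is the sense in which a lower index indicates a less diverse sample.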
Figure 3. Bacterial families associated with pediatric UC ranked according to their statistical significance and heatmap using the (A) 16S, (B) shotgun profiling data. Each row represents a bacterial family. The left panel is a dot plot of the negative log10 transformed FDR value and the right two panels are heatmaps of log-transformed abundance for healthy controls and UC cases. Purple indicates high abundance, while pink indicates low abundance.
Figure 4. Comparison of UC associated species using 16S rRNA gene and shotgun reads data with the associated species from Vila et al. [46]. For each microbial family, the numbers of species that were increased (green) or decreased (red) are shown.
Figure 5. Pathways associated with pediatric UC ranked according to their statistical significance and heatmap using the shotgun pathway abundance.
Prediction performance of the random forests algorithm, measured by AUROC with its 95% confidence interval, based on the genus level abundance profiles from 16S rRNA gene data and the microbial species and pathway abundance profiles from shotgun sequencing data.
| Type of abundance | AUROC | 95% CI | Age involved | Gender involved |
|---|---|---|---|---|
| 16S genus level abundance | 0.908 | (0.810, 1) | FALSE | FALSE |
| 16S genus level abundance | 0.929 | (0.843, 1) | TRUE | TRUE |
| Shotgun species-level abundance | 0.910 | (0.813, 1) | FALSE | FALSE |
| Shotgun species-level abundance | 0.903 | (0.802, 1) | TRUE | TRUE |
| Shotgun pathway abundance | 0.951 | (0.878, 1) | FALSE | FALSE |
| Shotgun pathway abundance | 0.964 | (0.902, 1) | TRUE | TRUE |
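To make the AUROC values in this table concrete, the following minimal sketch computes AUROC directly from its pairwise definition: the fraction of (case, control) pairs in which the case receives the higher predicted score, with ties counted as one half. This is only an illustration of the metric, not the authors' random forests pipeline; the function name `auroc`, the labels, and the scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random case outscores a random control."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (1 = UC case, 0 = healthy control).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auroc(labels, scores)
print(auc)
```

This pairwise count equals the Mann-Whitney U statistic divided by the number of case-control pairs, so an AUROC of 0.5 corresponds to scores that do not separate the two groups and 1.0 to perfect separation.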
Figure 6. Validation ROC curves based on the random forests models with 500 trees developed from the training data. Numbers in the square brackets represent the confidence interval of the AUROC score.