Anthony Fodor, Penny Gordon-Larsen, Bing Zhang, Shan Sun, Huijun Wang, Matthew Cb Tsilimigras, Annie Green Howard, Wei Sha, Jiguo Zhang, Chang Su, Zhihong Wang, Shufa Du, Michael Sioda, Farnaz Fouladi.
Abstract
OBJECTIVE: The human gut microbiota plays important roles in human health but is also known to vary widely between populations from different regions. Yet most studies do not adequately account for this regional variation in their analyses. This study examines the extent to which geographical variation can act as a confounding variable in studies that associate the microbiota with human phenotypic variation.
Keywords: Gut microbiota; geographic differences; machine learning; microbiota-host associations
Year: 2020 PMID: 33444181 PMCID: PMC7678355 DOI: 10.1136/bmjopen-2020-038163
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 3.006
Characteristics* of CHNS microbiota study participants
| Characteristic | Participants |
|---|---|
| N | 2164 |
| Age, years | 51.8 (14.0) |
| Female, % | 50.2 |
| Province/megacity, % | |
| Beijing | 6.05 |
| Liaoning | 6.52 |
| Heilongjiang | 10.86 |
| Shanghai | 6.47 |
| Jiangsu | 6.75 |
| Zhejiang | 6.56 |
| Shandong | 6.05 |
| Henan | 6.38 |
| Hubei | 6.19 |
| Hunan | 6.56 |
| Guangxi | 5.96 |
| Guizhou | 6.33 |
| Yunnan | 6.01 |
| Chongqing | 6.84 |
| Shaanxi | 6.47 |
| Urban population†, % | 39.2 |
| Urbanisation index‡ | 75.4 (17.5) |
| BMI, kg/m² | 24.4 (4.1) |
*Mean (SD) or percentage.
†Government urban/rural status (according to National Bureau of Statistics of China).
‡Community-level, multidimensional 12-component urbanisation index derived from household and community surveys, ranging from 29.2 to 104.4 in this cohort. (Jones-Smith JC, Popkin BM. Understanding community context and adult health changes in China: development of an urbanicity scale. Social Science & Medicine. 2010;71(8):1436-46.)
BMI, body mass index; CHNS, China Health and Nutrition Survey.
Figure 1. Geographic variation of the human gut microbiota in our CHNS cohort and the AGP. (A) PCoA ordination of microbial composition in the CHNS cohort, coloured by province/megacity. Ellipses indicate 95% confidence limits. (B) Estimated effect sizes of major host factors driving microbial variation in the CHNS cohort, measured as R² in PERMANOVA tests. (C) PCoA ordination of microbial composition in the AGP cohort, coloured by state. Ellipses indicate 95% confidence limits. (D) Estimated effect sizes of major factors driving microbial variation in the AGP cohort, measured as R² in PERMANOVA tests. AGP, American Gut Project; CHNS, China Health and Nutrition Survey; PCoA, principal coordinates analysis; PERMANOVA, permutational multivariate analysis of variance.
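Panels (A) and (C) are PCoA ordinations and panels (B) and (D) report PERMANOVA R², the share of the total sum of squared distances explained by a grouping (R² = SS_between / SS_total), with significance assessed by permuting group labels. The sketch below shows both computations in plain NumPy on a precomputed sample-by-sample distance matrix. It is illustrative only and not the authors' pipeline: the captions do not name the dissimilarity metric (Bray-Curtis is a common choice but is an assumption here), and the sketch tests a single factor, whereas the published models may be specified differently. The function names `pcoa` and `permanova` are hypothetical.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinates analysis of a square distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centring: B = -0.5 * J D^2 J, where J = I - (1/n) 11'
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)          # returned in ascending order
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    kept = np.clip(eigvals[:n_axes], 0.0, None)   # negative eigenvalues give no real axis
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum() # fraction of positive inertia per axis
    return coords, explained


def permanova(dist, groups, n_perm=999, seed=0):
    """One-factor PERMANOVA: returns (R^2, pseudo-F, permutation p-value)."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = groups.size
    # total sum of squares: squared distances over all pairs, divided by N
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    def ss_within(labels):
        ss = 0.0
        for g in np.unique(labels):
            idx = np.flatnonzero(labels == g)
            block = d2[np.ix_(idx, idx)]
            # within-group pairs, divided by that group's size
            ss += block[np.triu_indices(idx.size, k=1)].sum() / idx.size
        return ss

    def pseudo_f(labels):
        a = np.unique(labels).size
        ssw = ss_within(labels)
        return ((ss_total - ssw) / (a - 1)) / (ssw / (n - a))

    r2 = 1.0 - ss_within(groups) / ss_total
    f_obs = pseudo_f(groups)
    rng = np.random.default_rng(seed)
    # count permutations at least as extreme as the observed grouping
    hits = sum(pseudo_f(rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    return r2, f_obs, (hits + 1) / (n_perm + 1)
```

In practice an established implementation such as R's `vegan::adonis2`, which reports R² per term, would normally be used; the sketch only makes the quantity plotted in panels (B) and (D) concrete.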
Figure 2. Significant variation in genus composition across provinces/megacities, analysed with Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC). The key indicates z-scores of the relative abundance of each genus. Only genera with relative abundance >0.01% are shown. The legend for region numbers is shown in figure 1. The ANCOM-BC outputs are shown in online supplemental table S2.
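The heatmap scaling described for Figure 2 can be sketched as follows. This does not reproduce ANCOM-BC itself (an R/Bioconductor package that estimates bias-corrected differential abundance); it only illustrates, under the assumption that each genus's mean relative abundance per region was z-scored across regions, how the display values and the >0.01% filter (1e-4 as a proportion) might be computed. The function name `genus_heatmap_z` and the use of the sample SD (ddof=1) are hypothetical choices.

```python
import numpy as np


def genus_heatmap_z(counts, regions, min_rel_abund=1e-4):
    """Row-wise z-scores of mean genus relative abundance across regions.

    counts:  samples x genera array of read counts.
    regions: region label for each sample.
    Returns (kept genus indices, region labels, genera x regions z-scores).
    """
    counts = np.asarray(counts, dtype=float)
    regions = np.asarray(regions)
    # per-sample relative abundance (each row sums to 1)
    rel = counts / counts.sum(axis=1, keepdims=True)
    # ASSUMPTION: the 0.01% filter is applied to the overall mean abundance
    keep = np.flatnonzero(rel.mean(axis=0) > min_rel_abund)
    labels = np.unique(regions)
    # mean relative abundance of each kept genus within each region
    means = np.column_stack([rel[regions == r][:, keep].mean(axis=0) for r in labels])
    mu = means.mean(axis=1, keepdims=True)
    sd = means.std(axis=1, ddof=1, keepdims=True)
    # z-score each genus (row) across regions; guard against zero spread
    z = (means - mu) / np.where(sd > 0, sd, 1.0)
    return keep, labels, z
```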
Figure 3. Variation in the estimated effect sizes (R²) of associations between major host factors and microbial composition across provinces/megacities. To scale these values for visualisation, the PERMANOVA R² values for each host factor (rows) were z-transformed (expressed as the number of SD from the mean), as shown by the key (inset); cells with no colour indicate an FDR-adjusted p-value >0.1. Only factors significant in more than two provinces are shown. The legend for region numbers is shown in figure 1. FDR, false discovery rate; PERMANOVA, permutational multivariate analysis of variance.
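The Figure 3 display combines FDR adjustment of the per-region PERMANOVA p-values, row-wise z-transformation of R², and blanking of non-significant cells. A minimal sketch follows, under several assumptions the caption does not settle: Benjamini-Hochberg adjustment applied separately to each host factor across regions, z-scores computed over all regions before masking, and "significant in more than two provinces" read as at least three regions with FDR ≤ 0.1. The names `bh_adjust` and `r2_heatmap` are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adj, 0.0, 1.0)
    return out


def r2_heatmap(r2, pvals, fdr_cutoff=0.1, min_regions=3):
    """r2, pvals: host factors x regions arrays of PERMANOVA R^2 and raw p-values.

    Returns (z-scores for the kept factors with NaN where FDR > cutoff, keep mask).
    """
    r2 = np.asarray(r2, dtype=float)
    # ASSUMPTION: BH adjustment within each factor (row) across regions
    q = np.apply_along_axis(bh_adjust, 1, np.asarray(pvals, dtype=float))
    sd = r2.std(axis=1, ddof=1, keepdims=True)
    z = (r2 - r2.mean(axis=1, keepdims=True)) / np.where(sd > 0, sd, 1.0)
    significant = q <= fdr_cutoff
    z = np.where(significant, z, np.nan)      # NaN = uncoloured cell
    keep = significant.sum(axis=1) >= min_regions
    return z[keep], keep
```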
Figure 4. Within-region and cross-region performance of random forest models predicting host factors from microbial composition. The top panel shows the performance of random forest classification of categorical factors, with true positive rate (TPR) on the y axis. The remaining panels show the performance of random forest regression of continuous factors, with relative root mean square error (rRMSE) on the y axis. In each panel, the leftmost plot shows within-region models, trained and tested on data from the same region. The second plot from the left shows the same models with the outcome labels randomly shuffled. The third plot from the left shows cross-region (between) models, trained on data from one province/megacity and tested on data from each of the other 14 regions. The rightmost plot shows the same cross-region models with the host factors randomly shuffled as outcome labels. TPRs and rRMSEs were compared with t-tests; their means, SDs and test statistics are shown in online supplemental table S6.
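The cross-region design and the two metrics in Figure 4 can be sketched as follows. The caption does not define its metrics, so two common definitions are assumed: rRMSE as RMSE divided by the mean observed value, and TPR for a multi-class factor as per-class recall averaged over classes. The model is left as a caller-supplied `fit_predict` hook (hypothetical) into which a random forest, for example scikit-learn's `RandomForestRegressor` or `RandomForestClassifier`, could be wrapped; the shuffled-label control simply permutes the outcome before training. All function names here are hypothetical.

```python
import numpy as np


def rrmse(y_true, y_pred):
    """Relative RMSE; ASSUMED here to be RMSE divided by the mean observed value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(y_true)


def mean_tpr(y_true, y_pred):
    """Per-class true positive rate (recall) averaged over classes (ASSUMED definition)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))


def cross_region_pairs(regions):
    """All (train, test) region pairs: each region's model is tested on every other region."""
    labels = list(dict.fromkeys(regions))  # unique labels in first-seen order
    return [(a, b) for a in labels for b in labels if a != b]


def shuffled_labels(y, seed=0):
    """Null control: outcome values randomly permuted relative to the microbiome data."""
    return np.random.default_rng(seed).permutation(np.asarray(y))


def evaluate_cross_region(X, y, regions, fit_predict, metric):
    """Train on one region and score on each other region.

    fit_predict(X_train, y_train, X_test) -> predictions is a hypothetical hook.
    """
    X, y, regions = np.asarray(X), np.asarray(y), np.asarray(regions)
    scores = {}
    for train_r, test_r in cross_region_pairs(regions.tolist()):
        tr, te = regions == train_r, regions == test_r
        scores[(train_r, test_r)] = metric(y[te], fit_predict(X[tr], y[tr], X[te]))
    return scores
```

With 15 regions, `cross_region_pairs` yields 15 × 14 = 210 train/test pairs, consistent with each model being tested on "each of the other 14 regions".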