Tanvi P Honap, Krithivasan Sankaranarayanan, Stephanie L Schnorr, Andrew T Ozga, Christina Warinner, Cecil M Lewis.
Abstract
CrAssphage (cross-assembly phage) is a bacteriophage that was first discovered in human gut metagenomic data. CrAssphage belongs to a diverse family of crAss-like bacteriophages thought to infect gut commensal bacteria belonging to Bacteroides species. However, little is known about the biogeography of crAssphage or whether certain strains are associated with specific human populations. In this study, we screened publicly available human gut metagenomic data from 3,341 samples for the presence of crAssphage sensu stricto (NC_024711.1). We found that crAssphage prevalence is low in traditional, hunter-gatherer populations, such as the Hadza from Tanzania and the Matses from Peru, compared with industrialized, urban populations. Statistical comparisons showed no association of crAssphage prevalence with variables such as age, sex, body mass index, and health status of individuals. Phylogenetic analyses show that crAssphage strains reconstructed from the same individual over multiple time-points cluster together. CrAssphage strains from individuals from the same study population do not always cluster together. Some evidence of clustering is seen at the level of broadly defined geographic regions; however, the relative positions of these clusters within the crAssphage phylogeny are not well-supported. We hypothesize that this lack of strong biogeographic structuring is suggestive of an expansion event within crAssphage. Using a Bayesian dating approach, we estimate that this expansion occurred fairly recently. Overall, we determine that crAssphage presence is associated with an industrialized lifestyle and that the absence of strong biogeographic structuring within global crAssphage strains is likely due to a recent population expansion within this bacteriophage.
Year: 2020 PMID: 31940321 PMCID: PMC6961876 DOI: 10.1371/journal.pone.0226930
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Publicly available gut metagenomic datasets used in this study.
| Dataset | Description | Samples | Individuals | Accession | Reference |
|---|---|---|---|---|---|
| BCK | Urban mother-infant pairs from Sweden | 400 | 200 | ERP005989 | [ |
| CNA | Urban individuals of Cheyenne, Arapaho, and non-native ancestry from Oklahoma, USA | 61 | 61 | PRJNA299502 | [ |
| HAD | Hadza hunter-gatherers from Tanzania | 67 | 67 | SRP056480, SRP110665 | [ |
| HMP | Urban individuals from USA | 204 | 123 | phs000228.v3.p1 | [ |
| ITA | Urban individuals from Italy | 11 | 11 | SRP056480 | [ |
| ISR | Urban individuals from Israel | 950 | 851 | PRJEB11532 | [ |
| KRL | Urban individuals from Sweden | 145 | 145 | ERP002469 | [ |
| LIU | Traditional pastoralists from Mongolia | 110 | 110 | SRP080787 | [ |
| MAT | Matses hunter-gatherers from Peru | 25 | 25 | PRJNA268964 | [ |
| MHC | Urban individuals from China | 363 | 363 | SRA045646, SRA050230 | [ |
| MHE | Urban individuals from Sweden and Denmark | 756 | 606 | ERP003612, ERP004605, ERP002061 | [ |
| XIE | Urban twin-pairs from the UK | 249 | 249 | ERP010708 | [ |
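As a quick consistency check on the table above, the per-dataset counts can be summed and compared with the 3,341 samples stated in the abstract. The sketch below only transcribes the Samples and Individuals columns; the variable names are illustrative and not taken from the study.

```python
# (samples, individuals) per dataset, transcribed from the table above.
datasets = {
    "BCK": (400, 200), "CNA": (61, 61),   "HAD": (67, 67),
    "HMP": (204, 123), "ITA": (11, 11),   "ISR": (950, 851),
    "KRL": (145, 145), "LIU": (110, 110), "MAT": (25, 25),
    "MHC": (363, 363), "MHE": (756, 606), "XIE": (249, 249),
}

total_samples = sum(samples for samples, _ in datasets.values())
total_individuals = sum(individuals for _, individuals in datasets.values())

# Datasets with more samples than individuals were sampled repeatedly.
repeated_sampling = sorted(
    name for name, (samples, individuals) in datasets.items()
    if samples > individuals
)

print(total_samples)       # 3341
print(total_individuals)   # 2811
print(repeated_sampling)   # ['BCK', 'HMP', 'ISR', 'MHE']
```

The four datasets with repeated sampling are the same ones that the Fig 2 caption lists as contributing multiple samples per individual.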
Association of health status with prevalence of crAssphage.
| Dataset | Total number of individuals | Healthy | IGT | T2D | CD | IBD | UC | Not specified |
|---|---|---|---|---|---|---|---|---|
| BCK | 200 | 39 (200) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| CNA | 61 | 14 (37) | 0 (0) | 11 (18) | 0 (0) | 0 (0) | 0 (0) | 2 (6) |
| HAD | 67 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (67) |
| HMP | 123 | 47 (123) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| ISR | 851 | 185 (851) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| ITA | 11 | 0 (11) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| KRL | 145 | 15 (43) | 12 (49) | 20 (53) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| LIU | 110 | 11 (110) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| MAT | 25 | 2 (25) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| MHC | 363 | 37 (185) | 0 (0) | 23 (176) | 0 (0) | 0 (0) | 0 (0) | 1 (2) |
| MHE | 606 | 87 (350) | 0 (0) | 0 (0) | 2 (9) | 7 (25) | 12 (48) | 53 (174) |
| XIE | 249 | 64 (211) | 0 (0) | 5 (10) | 0 (0) | 0 (0) | 0 (0) | 12 (28) |
Values indicate number of crAss-positive individuals (total number of individuals in the category). Health status categories refer to: IGT–Impaired Glucose Tolerance, T2D –Type 2 Diabetes, CD–Crohn’s Disease, IBD–Inflammatory Bowel Disorder, UC–Ulcerative Colitis.
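To make the "positive (total)" notation concrete, the sketch below converts one pair of cells into prevalences and compares them. It uses the row reporting 37 (185) healthy and 23 (176) T2D individuals. This excerpt does not say which statistical test the authors applied, so the two-sided Fisher's exact test here is an assumption chosen for illustration, and the function names are hypothetical.

```python
from math import comb


def prevalence(positive, total):
    """Fraction of crAss-positive individuals; None for an empty category."""
    return positive / total if total else None


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


# Counts from the table: crAss-positive (total) per health category.
healthy_pos, healthy_total = 37, 185
t2d_pos, t2d_total = 23, 176

p_value = fisher_exact_two_sided(
    healthy_pos, healthy_total - healthy_pos,
    t2d_pos, t2d_total - t2d_pos,
)
print(prevalence(healthy_pos, healthy_total))
print(prevalence(t2d_pos, t2d_total))
print(p_value)
```

A single-row comparison like this ignores differences between datasets, so it should be read as an illustration of the table's notation rather than a reproduction of the study's analysis.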
Association of age, sex, and BMI of individuals with prevalence of crAssphage.
| Dataset | CNA | HMP | KRL | MHC | MHE | XIE |
|---|---|---|---|---|---|---|
| | 9 (23) | 47 (123) | 0 (0) | 16 (74) | 11 (35) | 1 (4) |
| | 4 (12) | 0 (0) | 0 (0) | 19 (103) | 47 (215) | 44 (134) |
| | 1 (2) | 0 (0) | 15 (43) | 2 (7) | 2 (19) | 19 (73) |
| | 5 (18) | 30 (65) | 0 (0) | 19 (95) | 31 (128) | 0 (0) |
| | 9 (19) | 17 (58) | 15 (43) | 18 (90) | 29 (142) | 64 (211) |
| | 0 (0) | 0 (0) | 1 (1) | 2 (17) | 0 (3) | 0 (4) |
| | 2 (13) | 29 (68) | 6 (18) | 17 (104) | 29 (130) | 35 (108) |
| | 12 (24) | 18 (55) | 8 (24) | 18 (64) | 58 (207) | 29 (98) |
Values denote number of crAss-positive individuals (total number of individuals in the category).
Fig 1. Phylogenomic analysis of crAssphage strains.
The Maximum Likelihood tree was based on a multi-genome alignment comprising 97,065 sites. The tree was built using the Generalized Time-Reversible model with gamma-distributed rate variation and proportion of invariant sites. Bootstrap support was estimated from 100 replicates; only values greater than 50% are shown. Strains are color-coded according to the geographic location: dark green–The Americas, light green–Asia, light violet–Europe, and dark purple–Middle East (Israel). Symbols are used to denote strains from the same individual.
Fig 2. Relationships of crAssphage strains recovered from the same individual and twin-pairs.
The Maximum Likelihood tree was based on the multi-gene alignment comprising 12,642 nucleotide sites. Sites with missing data were eliminated. The tree was built using the Generalized Time-Reversible model with gamma-distributed rate variation and proportion of invariant sites. Bootstrap support values estimated from 100 replicates are given; only values greater than 50% are shown. CrAssphage strains recovered from multiple samples from the same individual are color-coded accordingly (MHE, ISR, HMP, and BCK datasets). Symbols are used to denote crAssphage strains recovered from the same twin-pair (XIE dataset).
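The step "sites with missing data were eliminated" corresponds to complete deletion: any alignment column in which at least one strain lacks a called base is dropped before tree building. The following is a minimal sketch of that filter, not the authors' pipeline. The function name, the toy alignment, and the choice of `-`, `N`, and `?` as missing-data symbols are all assumptions.

```python
def drop_incomplete_sites(alignment, missing="-N?"):
    """Return the alignment restricted to columns with no missing data.

    alignment: dict mapping strain name -> aligned sequence (equal lengths).
    missing:   characters treated as missing data (an assumption here).
    """
    sequences = list(alignment.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    missing_chars = set(missing.upper())
    # Keep a column only if every strain has a called base at that site.
    kept_sites = [
        i for i in range(length)
        if all(seq[i].upper() not in missing_chars for seq in sequences)
    ]
    return {
        name: "".join(seq[i] for i in kept_sites)
        for name, seq in alignment.items()
    }


# Toy alignment with hypothetical strain names.
toy = {
    "strain_1": "ACGT-A",
    "strain_2": "ACNTTA",
    "strain_3": "A?GTTA",
}
filtered = drop_incomplete_sites(toy)
print(filtered)  # {'strain_1': 'ATA', 'strain_2': 'ATA', 'strain_3': 'ATA'}
```

Complete deletion trades alignment length for comparability, which helps explain why a filtered multi-gene alignment can be much shorter than the full multi-genome alignment.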