Breck A Duerkop1, Manuel Kleiner2, David Paez-Espino3, Wenhan Zhu4, Brian Bushnell3, Brian Hassell5, Sebastian E Winter4, Nikos C Kyrpides3, Lora V Hooper6,7. 1. Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA. breck.duerkop@ucdenver.edu. 2. Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA. manuel_kleiner@ncsu.edu. 3. Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA. 4. Department of Microbiology, University of Texas Southwestern Medical Center, Dallas, TX, USA. 5. Department of Immunology, University of Texas Southwestern Medical Center, Dallas, TX, USA. 6. Department of Immunology, University of Texas Southwestern Medical Center, Dallas, TX, USA. lora.hooper@utsouthwestern.edu. 7. Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA. lora.hooper@utsouthwestern.edu.
Abstract
The dysregulation of intestinal microbial communities is associated with inflammatory bowel diseases (IBD). Studies aimed at understanding the contribution of the microbiota to inflammatory diseases have primarily focused on bacteria, yet the intestine harbours a viral component dominated by prokaryotic viruses known as bacteriophages (phages). Phage numbers are elevated at the intestinal mucosal surface and phages increase in abundance during IBD, suggesting that phages play an unidentified role in IBD. We used a sequence-independent approach for the selection of viral contigs and then applied quantitative metagenomics to study intestinal phages in a mouse model of colitis. We discovered that during colitis the intestinal phage population is altered and transitions from an ordered state to a stochastic dysbiosis. We identified phages specific to pathobiotic hosts associated with intestinal disease, whose abundances are altered during colitis. Additionally, phage populations in healthy and diseased mice overlapped with phages from healthy humans and humans with IBD. Our findings indicate that intestinal phage communities are altered during inflammatory disease, establishing a platform for investigating phage involvement in IBD.
The dysregulation of intestinal microbial communities is associated with inflammatory bowel diseases (IBD). Studies aimed at understanding the contribution of the microbiota to inflammatory diseases have primarily focused on bacteria, yet the intestine harbours a viral component dominated by prokaryotic viruses known as bacteriophages (phages). Phage numbers are elevated at the intestinal mucosal surface and phages increase in abundance during IBD, suggesting that phages play an unidentified role in IBD. We used a sequence-independent approach for the selection of viral contigs and then applied quantitative metagenomics to study intestinal phages in a mouse model of colitis. We discovered that during colitis the intestinal phage population is altered and transitions from an ordered state to a stochastic dysbiosis. We identified phages specific to pathobiotic hosts associated with intestinal disease, whose abundances are altered during colitis. Additionally, phage populations in healthy and diseased mice overlapped with phages from healthy humans and humans with IBD. Our findings indicate that intestinal phage communities are altered during inflammatory disease, establishing a platform for investigating phage involvement in IBD.
Inflammatory bowel diseases (IBD), including Crohn’s disease and ulcerative colitis, are chronic disorders characterized by persistent inflammation of the intestine[1]. IBD imposes significant health and economic burdens worldwide[2], inspiring intense investigation into the host and environmental factors involved in IBD. The intestinal microbiota is integral to intestinal health[3-7] and has been implicated in IBD[8]. Efforts to define the composition and function of intestinal microbial communities during health and disease have yielded key insights into how the immune system maintains a beneficial relationship with the microbiota[9].Studies aimed at understanding the contribution of the microbiota to inflammatory diseases have focused primarily on the bacterial component. However, metagenomic studies have revealed a significant viral component of the microbiota[10-13]. While both eukaryotic and prokaryotic viruses are part of the microbiota, prokaryotic viruses, known as bacteriophages (phages) dominate. Intestinal phages exist in two states: as infectious lytic particles or as lysogenic prophage elements integrated within bacterial chromosomes[14]. Stimuli such as nutrient cues and reactive oxygen species, common signals associated with intestinal inflammation, can induce prophage excision and replication from bacterial chromosomes[15-18]. Increased phage infectivity during inflammation points to the immune system and inflammation as a driver of prophage excision and phage particle biosynthesis in the intestine[19]. However, the exact signals and mechanisms that promote prophage excision and lytic viral particle production in the intestine remain to be determined and the role of phages in intestinal inflammation is unclear.Phages associate with intestinal mucosal surfaces and elevated levels of certain double stranded DNA phages have been implicated in IBD[20-23]. However, there is a limited understanding of the effects of inflammation on phage communities and whether phages play a role in the development of IBD. To begin to address these questions we used a quantitative metagenomics approach to define the abundances of host associated phage populations in an experimental mouse model of T cell-mediated colitis. We observed that the abundance of intestinal phages is altered during colitis, and that phages that infect members of the microbiota with pathogenic potential (pathobionts) are elevated during colitis. Our data show that the host inflammatory response correlates with compositional changes in intestinal phage communities, and highlight the need for further investigation of phage biology during intestinal disease.
Results
A mouse model of intestinal colitis for the isolation of virus-like particles (VLPs).
To determine if colitis alters the intestinal phage community we studied a mouse model of T cell adoptive transfer that exhibits chronic inflammation of the colon (Fig. S1a)[24,25]. Rag1mice were injected with CD4+ CD45RBHigh T cells or saline, housed in individual cages and monitored for six weeks (42 days) for weight loss and inflammatory markers (Fig. S1b & S1c). Feces were collected from each mouse at day 0, 14, and 42 (Fig. S1a), and genomic DNA from the whole metagenome and virus-like particles (VLPs) was isolated. Whole metagenome and VLP DNAs were sequenced without amplification. Similar to Kang et al., the resulting viromes are among the first quantitative viromes available from the microbiota[26]. A similar number of VLPs were recovered from feces when counted by epifluorescence microscopy (Fig. S1d). Bacterial composition analysis of the fecal microbiota showed an increased abundance of Proteobacteria at 42 days (Fig. S1e). Similar results were obtained in mice co-housed with their treatment groups for 30 days after T cell transfer (Fig. S2a). These data support prior reports showing that Proteobacterial communities expand during intestinal inflammation[27-29].
Generation and verification of de novo VLP contigs
To alleviate the limitation of a priori knowledge of virus gene sequence features during virus contig identification we developed a sequence independent approach. Our method employs the quantitative comparison of contigs between whole metagenomes and VLPmetagenomes, making use of information obtained from physical VLP enrichments. We used sequence based approaches to validate these contigs as viral (see Materials and Methods). We used a quantitative metagenomic approach to detect viruses whose abundances differ between healthy and inflamed mice using an established method of read recruitment to the de novo assembled contigs (Fig. S3)[26,30]. Although we used a read mapping ratio approach to limit contigs of potential bacterial origin within our VLP contig dataset, we cannot rule out the presence of bacterial genomic DNA sequences in viral contigs originating from induced prophages. Our custom VLP database consisted of 1104 contigs ranging in size from 2 kb to 486 kb. Approximately 80% of contigs were 2 to 25 kb in length, 16% were 25 to 100 kb, and the remaining 4% were >100 kb.To validate the viral origin of the contigs we employed two separate approaches. First, we compared their open reading frames to a set of 25,281 viral protein families (VPFs) from known viruses described in Paez-Espino et al[31]. We identified 278 viral contigs with >70% of genes belonging to VPFs (Table S1). We found 600 and 118 viral contigs with 35–70% and less than 35% of their genes belonging to VPFs, respectively (Table S1). The remaining 108 contigs contained genes that did not match any VPF. Contrary, to the standard application of VPFs for the extraction of viral sequences from metagenomes, which requires meeting stringent criteria, we considered our viral contigs validated if they contained at least one VPF. We further validated our VLP contigs using VirSorter, a tool that identifies viral sequences from complex microbial DNA samples[32]. VirSorter identified 156 of the VLP contigs as phages and of these, 152 were also identified using VPFs (Table S2).We next compared our contigs to the IMG/VR viral database, which contains over 264,000 metagenomic viral contigs and 8,000 viral isolate genomes[33]. We used a sequence-based classification framework to group closely-related viral sequences[31]. Of the 1104 contigs in our database, 382 viral contigs (representing 34% of the total contigs) grouped within 142 viral clusters (Table S3). All viral contigs from IMG/VR that clustered with the contigs from this study originated from intestinal viruses (Table S3). The remaining 722 viral contigs did not group with any viral contigs in IMG/VR or our curated database and were considered singletons.
Quantitative metagenomics reveals alterations to phage communities during colitis.
To determine whether compositional changes occur within the phage community during colitis, we mapped our VLP reads to all publicly available Caudovirales genomes downloaded from NCBI, all viral isolate and metagenomes in IMG/VR and our curated VLP contig database.Mapping to phage reference genomes from NCBI showed divergence of the phage community in T cell-treated animals at 42 days, suggesting that disease status correlates with an altered fecal phage community (Fig. 1a). Enterobacteriaceae-specific phages were enriched in T cell-treated mice compared to healthy controls at day 42 (Fig. 1b and Table S4), including phages resembling those that infect Proteobacteria that expand during inflammation (Fig. S1f & S2b). Reads that mapped to phages predicted to infect Enterococci were also more abundant in T cell-treated animals on day 42 (Fig. 1b and Table S4). These data correlated with an increased proportion of phage reads that mapped to known viruses in colitic mice (Fig. S4). The presence of some phages correlated with an increased abundance of prospective host bacteria, suggesting that phage interact with known intestinal pathobionts during colitis. (Fig. S1f & Fig. S2b). In addition, VLP reads that mapped to Caudovirales phage families, including the Siphoviridae, Myoviridae and Podoviridae, were elevated in T cell-treated animals (Fig. 1c). These data accord with the altered phage abundances in human IBD patients[21]. The VLP reads that mapped to the Caudovirales genome dataset accounted for less than 0.01% of the total VLP reads per group of mice, indicating that most of the phage specific VLP reads are unique to our dataset or fall below our identity cutoff.
Figure 1.
Caudovirales phage abundances are altered during T cell-mediated colitis. (a) Heat map indicating the abundance of VLP sequencing reads with at least 97% identity to select sequenced Caudovirales phage genomes deposited in NCBI. Relative read abundances were summed by phage-host genus and z-scores were normalized by column. (b) Quantitative read mapping abundances for select groups of phages shown in panel A. Esch. – Escherichia phages, Sal. – Salmonella phages, Shig. – Shigella phages and Entero. – Enterococcus phages. **** p=0.0003, *** p=0.003, ** p=0.02, * p=0.05 by two-tailed nonparametric Mann-Whitney test. (c) Mapped read abundances for the Caudovirales phage families Siphoviridae (Sipho.), Myoviridae (Myo.) and Podoviridae (Podo.) at Days 0, 14 and 42 post T cell transfer. Mock – saline injected control mice, T cells – mice receiving CD4+ CD45RBHi T cells. Z-scores show how many standard deviations a value in a column is above and below the population mean. The data represent three independent mice per group. Error bars represent ± standard error of the mean. * p=0.007 by one-way ANOVA, NS – not significant.
To improve VLP read mapping for our samples we mapped the VLP reads to the IMG/VR database, as well as the 1104 viral contigs extracted from assemblies of our dataset. IMG/VR contains viral isolate genomes and metagenomes from ocean, soil, sewage and animal hosts[33]. A total of 1068 viral genomes from IMG/VR recruited reads from our mouseVLP samples (Table S5). 958 of these viral genomes originate from intestinal samples including 406 from humans, 417 from mouse, 102 from rat and 33 from other mammals (Fig. S5 & Table S5). This increased our read mapping percentage by two orders of magnitude (1.2%).We next mapped each sample’s VLP reads to our curated database of 1104 viral contigs (Fig 2a). Using the curated contig dataset allowed us increase our read mapping average to 22% (Table S6), yet 78% of the reads were not mapped to any database. Comparative read mapping analysis of day 0 samples showed both T cell-treated and control animals had a similar viral community. However, as colitis progressed the viral community shifted, with the most marked shift occurring 42 days after T cell treatment (Fig. 2a, 2b & 2c). This shift in viral community composition was also observed in animals that remained co-housed with their treatment group for the duration of the experiment (Fig. S6). Thus, T cell-mediated colitis was associated with a divergence of the intestinal viral community.
Figure 2.
Phage community alterations during colitis were identified using a curated VLP contig database. (a) Heat map showing the differential abundance analysis of VLP sequencing reads mapped to a curated database consisting of 1104 VLP contigs during colitis. Numbers 1–3 represent individual mice within each group. Each row representing a unique phage contig was z-score normalized. (b) Principal component analysis showing that phage community abundances are different between healthy and T cell treated animals. Ellipses were manually added to guide the reader and do not indicate confidence intervals. (c) Bray-Curtis dissimilarity β diversity metric quantifying the dissimilarity of the phage communities between healthy and T cell treated animals across the time series. The closer the value is to 1 the more dissimilar the communities are in terms of species composition. The whiskers represent the 5–95 percentile of an all against all comparison with the central line in each box representing the median value of the data set. (d) Volcano plots of two-tailed t-tests testing for changes to viral contig abundances during colitis. The multiple-testing problem was corrected by calculating a permutation-based false discovery rate (FDR). A FDR cutoff of 0.05 was used. Horizontal lines indicate the FDR based p-value cut off for statistical significance for each time point when comparing the contig abundances between healthy and diseased animals. Data points highlighted in blue and red indicate contigs that are significantly reduced or enriched in colitic animals respectively and grey data points indicate contigs for which phage host taxonomic assignments could be made. The data represent three independent mice per group. ** p=0.0001, * p=0.01 by Kruskal-Wallis test and a post hoc Dunn’s test for multiple comparisons.
The shift in phage community composition at day 42 reflected a decrease in viral contig abundances (Table S7) and a significant loss of diversity leading to an increased dissimilarity between healthy and diseased mice (Fig. 2c). Rarefaction analysis revealed sufficient read mapping coverage for all samples, ruling out low sampling as a cause of the divergence (Fig. S7). In addition to altered phage abundances there was a marked divergence in the bacterial community during colitis (Fig. S8). We found that some viral contigs differed significantly between control and diseased mice at day 42 (Fig. 2d). Of the contigs that were statistically significant, 35 contigs were at least 10-fold more abundant and 41 contigs were at least 8-fold less abundant during colitis.When comparing the presence of various contigs between healthy and colitic animals, we observed a stochastic pattern consistent with a random distribution of contigs within animals at various time points (Fig. 3a). 4% of the VLP contigs were shared between healthy animals as compared to only 1% in animals with colitis (Fig. 3a). Furthermore, the colitic animals contained a greater number of singleton viral contigs (Fig 3b). Although contigs shared between healthy animals accounted for a small proportion of the total VLPs, these data suggest that the viral community diverges as the intestine transitions from a healthy to dysbiotic state.
Figure 3.
Colitic animals share less VLP connections relative to healthy animals. (a) Connected VLP contigs (both shared and singleton contigs) across all time points and animals, from only healthy animals (Core healthy) which includes animals prior to T cell treatment and T cell treated animals 14 and 42 days post T cell administration (Core colitis). Circle diameter indicates the number of singleton VLP contigs that do not cluster with contigs in IMG/VR or to contigs within the curated database. Line thickness represents the number of shared contigs between animals and time points. (b) Abundance of singleton VLP contigs between mock and T cell treated animals across the time series. The data represent three independent mice per group. Error bars represent ± standard error of the mean. * p=0.02 by two-tailed unpaired Student’s t-test, NS – not significant.
At day 42 there was a dramatic reduction in the number of reads recruited to VLP contigs in colitic mice. This was most pronounced for mouse number two (Fig. 2a). 20% of the mapped reads from this animal aligned with four highly abundant contigs. Three of these contigs were among the most differentially abundant contigs at the height of disease (Fig. 2d). An 18.4 kb contig recruited 8.3% of the reads and the other three contigs of 136.6 kb, 145.5 kb and 176.1 kb recruited the remaining 11.7 % of the reads (Fig. S9 & Table S8). Although these contigs recruited a large proportion of the VLP reads from T cell-treated animals, read recruitment to these contigs varied between animals (Fig. S9).We were able to assign phage taxonomy or bacterial host information to two of these four contigs. We assigned phage taxonomy or host information using a list of signature phage orthologous groups (POGs), phage-encoded tRNAs, and clustered regularly interspaced short palindromic repeat (CRISPR) spacers that serve as a historical record of phage-bacteria interactions[34-36]. The 145.5 kb contig contained a tRNA 100% identical to the genus Acholeplasma, a member of the Mollicutes. This contig had a region of ~8 kb that resembled known phage genes (Fig. S9 & Table S8). The 136.7 kb contig matched a POG belonging to Spounaviridae, a subfamily of the Myoviridae composed of phages known to infect pathogens such as Staphylococcus aureus[37].Of the remaining 35 statistically significant contigs of high abundance, only three other contigs could be assigned taxonomy or host information (Table S9). A tRNA broadly connected one contig to either the Firmicutes or Tenericutes, one contig was assigned to a POG representing the genus Betalipothrixviruses which infect Archaea[38], and the third contig was linked to the Clostridiales through a CRISPR spacer. Although we assigned taxonomy or host information to only a few significantly different contigs, this was expected based on previous studies[21,39]. This highlights the power of our approach for identifying VLPs independent of existing database information. Of the 41 contigs with significantly reduced abundance, CRISPR spacer and tRNA matching revealed six to be potential Clostridiales phages and two were identified as phages that infect the Lachnospiraceae family within the Clostridiales. These data indicate that colitis is associated with a reduction in Clostridiales-like phages.
VLP identification reveals changes in the phage community during colitis
We next determined taxonomic or host assignments for all 1104 contigs in the curated VLP database. We were able to assign 116 contigs (11%) to a specific phage taxonomic group or host designation using one or more of the three identification methods described above. The number of total assigned contigs is smaller than the number of contigs predicted to be of viral origin using VPFs because numerous VPFs only contain viral hallmark genes and lack taxonomic or host information. CRISPR spacer matching resulted in the greatest number of assigned contigs (50 contigs), followed by tRNAs (41 contigs) and POGs (36 contigs) (Fig. S10). Both tRNA and CRISPR spacer matching identified 11 shared contigs, accounting for 27% and 22% of the total contigs identified using those methods respectively. Of the 36 contigs identified using POGs, 26% belonged to the Myoviridae, 18% to the Siphoviridae and 37% to the Podoviridae. The remaining 19% of contigs matched several archaeal viruses belonging to the Fuselloviridae and Lipothrixviridae (Fig. S10a).POG matching identified numerous phages with taxonomic signatures of known Enterobacteriaceae-like phages. This is consistent with the increased abundance of Proteobacteria in the metagenomes of T cell-treated mice and the mouseVLP reads mapping with higher abundance to Proteobacterial phage genomes (Fig. S1e & Fig. 1a). Since viruses within a given POG can infect very different bacterial hosts, we cannot completely exclude that the contigs classified as Enterobacteriaceae-like phages are from viruses that infect bacteria outside the Proteobacteria. Additionally, POGs identified phages taxonomically related to phiCD119, phiKZ, phiHp1, AHJD and Spounavirinae phages that infect pathobionts including Clostridium difficile, Pseudomonas aeruginosa, Haemophilus influenza, and Staphylococcus aureus[37,40-42] (Fig. S10a & Table S9). We were able to obtain order, family and in some cases genus-specific identification of multiple phage contigs using tRNA and CRIPSR spacer matching. As expected, the most highly represented phages in our database matched hosts in the phyla Bacteroidetes and Firmicutes. Firmicute-specific phages were dominated by those predicted to infect Clostridiales bacteria including phages specific for Lachnospiraceae, Ruminococcaceae and Clostridia. The next most prevalent were phages specific for the Streptococci. Bacteroidetes phages were almost exclusively represented by Alistipes phages (Fig. S10b, S10c & Table S9).To identify differences in the abundance of host-assigned phage contigs we grouped the contigs based on host identity, summed the read-mapping derived abundances and averaged these values for healthy and colitic animals. At the order level there was a significant decrease in the abundance of Clostridiales phages in T cell-treated animals (Fig. 4a). This supports our finding, from the analysis of the differentially abundant contigs that a2 reduction in Clostridiales phages occurs in colitis. When these data were stratified into family- and genus-specific contigs we discovered that phages predicted to infect Clostridium sp. KLE 1755 and some Lachnospiraceae bacteria decreased, whereas Lachnospiraceae phages that target strain COE1 were more abundant during colitis (Fig. 4b, 4c & 4d). Additionally, Clostridiales phages predicted to infect Clostridium difficile were more abundant in the T cell-treated animals (Fig. 4e).
Figure 4.
Phage taxonomy and host bacterial assignments for curated VLP contigs reveals differential abundances of phages that infect both commensals and pathobionts during colitis. (a) All phage contigs linked to the order Clostridiales. Each data point represents a unique Clostridiales phage contig found in both healthy and diseased animals. (b-e) Family and genus-specific phages constituting lower taxonomic ranks of the Clostridiales. (f-g) Abundances of Streptococcal phages and Alistipes phages. (h) Bacterial 16s rDNA sequencing read abundances for the C – Clostridiales, L – Lachnospiraceae, S – Streptococci and A – Alistipes. In panels b-g, lines connect shared contigs between healthy and diseased animals. The data represent three independent mice per group. Error bars represent ± standard error of the mean. **** p<0.0001, *** p=0.0002, ** p=0.008, *p=0.004 by two-tailed nonparametric Mann-Whitney test.
Streptococcus sp. phages were more abundant in T cell-treated animals, supporting a prior report on intestinal phages from IBD patients[21], as well as phages predicted to infect Alistipes (Fig. 4f & 4g). The 16S rRNA gene read abundances for the host bacteria infected by these phages were not statistically different between healthy and colitic animals, although Alistipes trended toward lower abundance during colitis (Fig. 4h). This suggests that shifts in phage abundance are not solely dependent on bacterial host density. Together, these data support the idea of a defined phage community during colitis.
Mouse VLP reads map to phages from human phageomes
To determine if mouseVLP sequencing reads matched intestinal phages from humans, we mapped our VLP reads from healthy and colitic animals to groups of phage contigs derived from healthy humans and humans with IBD[21,39]. A total of 38 contigs, accounting for 25% of the healthy human phageome, recruited reads from both healthy and colitic animals (Fig. 5a). Comparative analysis of the read mapping frequency revealed multiple healthy human phage contigs that recruited fewer reads from colitic animals and seven phages that were exclusively in healthy animals (Fig. 5a). Consistent with the observation that more contig connections were maintained under healthy conditions across time points (Fig. 3a), these data suggest that the healthy component of the murine phageome overlaps with that of humans and that the abundances of these phages decrease during colitis.
Figure 5.
VLP reads from healthy and colitic animals share identity to human associated intestinal phages. (a) VLP reads from healthy (mock) and colitic (T cells) mice 42 days post T cell treatment were mapped against a cohort of conserved phage contigs found in healthy humans. (b) VLP reads from healthy (mock) and colitic (T cells) mice 42 days post T cell treatment were mapped against phage contigs assembled from IBD patients. Panel b shows the top 85 IBD phage contigs with read recruitment. Normalized read abundance is the total number of mapped reads divided by the length of the contig in base pairs. Contigs highlighted in red indicate contigs which are only found in healthy or colitic mice. The data represent three independent mice per group. Error bars represent ± standard error of the mean. * p<0.1 by a two sample unpaired t-test with a permutation-based false discovery rate cutoff of 0.1 (Supplementary Table 10).
We next assessed the abundance of VLP reads that mapped to a set of 519 VLP contigs generated from the Chicago cohort of Crohn’s and ulcerative colitispatients described by Norman et al.[21]. A total of 169 contigs, accounting for 33% of VLP contigs generated from IBD patients, recruited reads from either healthy or colitic animals (Fig. 5b). Phages from diseased humans recruited more reads from colitic animals and in some cases only human IBD phage contig specific reads were represented in colitic mice (Fig. 5b). Thus, the murine phageome overlaps with that of humans both during intestinal health and disease.
Discussion
Using a metagenomics approach, we discovered that intestinal phage communities in colitic animals undergo compositional shifts that are similar to those observed in human IBD patients[21]. Accordingly, we observed a decrease in phage community diversity and an expansion of subsets of phages in animals with colitis. We observed decreased abundance of certain phages, such as Clostridiales phages, during colitis. Phage populations that expanded during colitis were frequently connected to pathobiont hosts that benefit from or are linked to intestinal inflammation. Not all colitis-specific phage expansion reflected increased abundances of assigned bacterial hosts, suggesting that alternative mechanisms drive altered phage abundances during colitis.We hypothesize that inflammation or other host defenses alter phage abundances during colitis. Such stresses could produce ecological disturbances in the intestinal environment, driving alterations within the viral community. Disturbances leading to the stochastic assembly of microbial communities have been described in other ecological networks[43,44], suggesting a plausible explanation for the high level of variation and divergence observed in the viral communities of colitic animals.Changes in viral community structure during colitis could have several underlying causes. For instance, most intestinal phages are integrated into bacterial chromosomes as lysogenic prophages[10,11]. Host inflammatory products could enhance prophage excision leading to increased phage abundances. Nutrient availability may also impact prophage excision. During inflammation, the metabolic landscape of the intestine dramatically changes and bacteria adapt to this environment[45]. Furthermore, prophage excision from bacterial chromosomes and progression to the lytic cycle are linked to nutrient availability[14,18]. As dysbiotic bacterial communities establish during colitis and competition for resources ensues, increased prophage induction could account for the emergence of specific phages.Our findings suggest that members of the Spounaviridae subfamily of phages could serve as informative markers for colitis. Additionally, during colitis there was an overall heightened abundance of phages predicted to infect Streptococci and phages that target Alistipes bacteria. Streptococcal phages are abundant in human IBD patients[21] thus highlighting the possibility of using these phages as disease markers. Elevated abundance of Alistipes phages was unexpected as these phages associate with members of the Bacteroidetes whose abundances decline during dysbiosis[46,47]. Thus, prophage excision in the Bacteroidetes might be driven by colitis and could be an indicator of disease.Overall, we observed the most dramatic changes in phages predicted to infect pathobionts. Since phages infect bacteria with strain-level specificity[48-50], elevated abundances of specific phages during IBD could serve as a proxy for strain-level resolution of disease-causing bacteria. This correlated with both reduced and increased abundances of phage DNAs in colitic animals that had similarity to phage genomes from healthy humans and humans with IBD respectively. Previous studies have shown that there is a high degree of inter-individual variation among human intestinal phage communites[10,11]. We were surprised to discover that there is significant overlap of murine intestinal phage reads with phages from both healthy humans and humans with IBD. These data suggest that experimental colitis in mice is a suitable model for the study of phage-bacterial interactions in the intestine, and that the findings from this model could be relevant to human disease.
Materials and Methods
Animals.
Two-week-old male C57BL6/J Rag1mice[51] were weaned and co-housed together in the same cage for four weeks. At six weeks of age three mice C57BL6/J Rag1mice were administered 5×105 naïve CD4+ CD45RBHigh T cells via retro-orbital injection. A second group of three C57BL6/J Rag1mice were given a saline injection as a control. All mice were housed in either separate cages or by treatment group for the duration of the experiment. The sample size of three mice per group was used because this was the minimum number of mice needed for reliable statistical analyses. Four male wild-type C57BL6/J mice aged six to eight weeks were used as spleen donors for T cell isolation. Male germ free C57BL6/J Rag1mice were reared in sterile isolators as described previously[52]. Fecal samples from three eight week old germ free Rag1 were pooled in duplicate. After co-housing for four weeks experimental mice were randomly assigned to treatment groups. It was not possible to blind researchers to the treatment conditions of the animals prior to experimentation. The Institutional Animal Care and Use Committee of UT Southwestern Medical Center approved all animal protocols.
T cell isolation.
The spleens from four male wild-type C57BL6/J mice were dissected and added to RPMI (BD Biosciences, San Jose, CA) containing 10% fetal bovine serum (Gemini Bio-Products, Sacramento, CA). The spleens were homogenized using the frosted ends of sterile microscope slides and passed through a 70 μm cell strainer. CD4 cells were negatively selected using a Dynabeads Untouched MouseCD4 Cells Kit (Invitrogen, Carlsbad, CA). Purified mouseCD4 cells were Fc-blocked using purified rat anti-mouse CD16/CD32, clone 2.4G2, (BD Biosciences) followed by two color staining using anti-mouseCD4-FITC, clone RM4–5 and anti-mouseCD45 RB-APC, clone C363–16A (Thermo Fisher, Waltham, MA). CD4+ CD45RBHigh cells were collected on a MoFlo XDP cell-sorting machine at the UT Southwestern Flow Cytometry Core Facility.
Metagenomic DNA isolation.
VLPs and their DNA were isolated as described previously[53]. Briefly, fecal pellets were homogenized in 2 ml of SM+ buffer[50] and 1.3 ml sample volume was clarified by centrifugation (2500g for 5 min followed by 5000g for 15 min at 4ºC). The supernatant was filtered through a 0.45 ¼m Millex-HV syringe filter (EMD Millipore, Burlington, MA) and treated with 100 units of DNase I (Roche, Pleasanton, CA) and 15 units of RNase A (Sigma, St. Louis, MO) for 1 hr at 37ºC. VLPs were not further purified by cesium chloride (CsCl) gradient centrifugation. Although CsCl purification is efficient at removing potential contaminating nucleic acids, this method is not suited for quantitative metagenomic comparison of a large number of samples due to its limited throughput[53]. Additionally, this method discriminates against specific phages and can vary in reproducibility due to the necessity of removing the VLPs from precise locations within the gradient that are difficult to normalize between samples[53]. Therefore, we proceeded to isolate VLP DNA from filtered and DNase/RNase treated samples by adding 100 mM EDTA, 50 ¼g/ml Proteinase K (New England Biolabs, Ipswich, MA) and 0.5% sodium dodecyl sulfate to each sample and to incubate them at 56ºC for 30 min. The samples were extracted with an equal volume of phenol:chloroform:isoamyl alcohol 25:24:1 (Sigma) followed by an additional extraction using just chloroform. The DNA was precipitated with isopropanol and cleaned using a MinElute Reaction Cleanup Kit (Qiagen, Hilden, Germany). Following a rigorous VLP purification strategy can greatly reduce the possibility of contaminating prokaryotic host genomic DNA that can lead to false positive viral gene identification from the VLP fraction[54,55]. For the isolation of total metagenomic DNA, the remaining 0.7 ml of fecal material was treated with 5 mg/ml lysozyme (Amersco, Solon, OH) at 37ºC for 30 min. The samples were transferred to 2 ml Lysing Matrix B tubes (MP Biomedicals, Solon, OH) and were disrupted by bead-beating in a Bullet Blender 5 (Next Advance, Troy, NY) at a speed setting of 9 for 1 minute followed by a 2 minute incubation on ice. This was repeated 5 times. Samples were centrifuged at 16,300g for 1 minute and DNA from the supernatant was extracted and purified as described above.
Metagenomic DNA sequencing.
DNA sequencing was performed using Illumina HiSeq 2500 Rapid Run at 150 cycles in paired-end mode. DNA concentration and quality was determined using the Qubit dsDNA HS kit (Invitrogen) and a Bioanalyzer 2100 (Agilent, Santa Clara, CA). DNA libraries were prepared using a KAPA High Throughput Library Preparation Kit with standard PCR library amplification (KAPA Biosystems, Wilmington, MA). TruSeq adapters (Illumina, San Diego, CA) with 6 bp indices were used to enable multiplexed sequencing. Illumina libraries were size selected using AMPure XP beads (Beckman Coulter, Indianapolis, IN). Library quality was measured on a Bioanalyzer 2100 by product size and concentration. On average, 43 million paired-end whole metagenome reads and 30 million paired-end VLP reads were generated for each sample. Sequencing reads were demultiplexed using the 6 bp TruSeq adapter sequences allowing for one mismatch using the Illumina CASAVA software. DNA quality control, HiSeq 2500 sequencing and library preparation was performed by the UT Southwestern Eugene McDermott Center for Human Growth and Development Next Generation Sequencing Core.
Decontamination and processing of reads.
Read decontamination and trimming was performed using the BBMap short read aligner v36.99 (https://sourceforge.net/projects/bbmap). Raw reads were mapped to the internal Illumina control phiX174 (J02482.1) and the mouse (mm10) and human (hg38) reference genomes using the bbsplit algorithm with default settings. The resulting unmapped reads were adapter trimmed and low-quality reads and reads of insufficient length were removed using the bbduk algorithm with the following parameters: ktrim=lr, k=20, mink=4, minlength=20, qtrim=f. These reads went through a second round of decontamination by mapping to a set of reference contigs assembled from samples obtained from the feces of germ-free C57BL6/J Rag1mice using bbsplit. These final unmapped reads constituted the cleaned M and P read sets.
Read based quantification of taxa.
To assess microbial community composition we used phyloFlash Version 2.0 (https://github.com/HRGV/phyloFlash) to reconstruct 16S and 18S rRNA genes from the Illumina read datasets and quantified taxa by read mapping. 16S and 18S rRNA genes were classified with phyloFlash against the SILVA SSU Ref NR 99 database 123[56]. Taxa abundances were summed at the phylum level and the genus level for specific genera.
Metagenomic assembly.
Whole metagenome, VLP and germ free mouse decontamination reference assemblies were performed using the IDBA-UD assembler v1.1.1[57]. Prior to assembly, paired end reads were interleaved using the fq2fa merge filter command within IDBA-UD. Assemblies were generated in the absence of error correction using the default parameters.
Construction of the curated VLP contig database.
VLP reads were assembled into 18 sets of contigs, with one set per animal per time point. VLP contigs smaller than 2 kb were removed from each sample’s contig set. Using these pruned contig sets, each sample’s VLP and whole metagenome reads were mapped to their corresponding sample’s contigs using the bbmap algorithm within BBMap with the following settings: ambiguous=random, qtrim=lr, minid=0.97. Read mapping coverage for each contig was generated by setting the ‘covstats’ parameter. The percentage of reads that mapped to each contig was used to calculate a read-mapping ratio per contig (VLP/whole metagenome). We assumed that if this ratio was greater than 1 (i.e., more VLP reads than whole metagenome reads mapped to a contig), then these contigs were more highly represented in the viral fraction and likely represented genuine viral contigs. From each of the 18 read mapping sets we selected and pooled the top 100 contigs based on the largest ratios. We removed redundancy by sequence clustering using cd-hit-est[58] at an identity threshold of 99%, resulting in a collection of 1104 unique contigs that constituted our curated viral contig database.
Quantitative read mapping.
The complete genomes of 1993 Caudovirales genomes downloaded from NCBI, a group of 154 phage genomes found in healthy humans from the Manrique study[39], contigs obtained from a Chicago cohort of IBD patients from the Norman study[21] and our curated viral contig database generated from healthy and colitic mice were used as mapping references using the bbmap algorithm. For the Caudovirales genomes and the curated viral contig database we used the following bbmap parameters: ambiguous=random, qtrim=lr, minid=0.97. For the Manrique and Norman datasets we decreased the identity threshold to minid=0.90, to account for increased heterogeneity between mouse and human samples[39]. Read mapping statistics for each contig were generated by setting the ‘scafstats’ parameter. For the Caudovirales genomes, the Manrique dataset and the Norman dataset the total number of unambiguous reads per contig were tallied and averaged across animal treatments and time points. For the curated viral contig database, the percent ambiguous and unambiguous mapped reads were combined for each contig and compared between treatments and time points.
Confirming viral contigs, establishing level of similarity with existing viruses, and defining host associations.
We used a curated set of 25,281 viral protein families (VPFs)[31] and the VirSorter database[32] to compare open reading frames within the 1104 viral contigs from this study for viral protein similarity. VPFs are a recently developed tool yet to be widely used for virus discovery. VPFs have been benchmarked for high specificity for viruses and only in rare cases a VPF may include a prokaryotic host protein. Additionally, we obtained taxonomic information for viral contigs by using phage orthologous gene clusters (POGs) that are recognized as taxon-specific signatures for virus classification[35]. For all Hidden Markov Models searches we used the algorithm hmmsearch with trusted cutoffs from the hmmer web server[59] and an e-value below 1e-05. To cluster viral sequences we used the public viral database IMG/VR composed of over 264 thousand metagenomic viral contigs and almost 8 thousand virus isolates[33]. We used the sequence-based classification framework described in Paez-Espino et al.[31] for grouping closely related viral sequences. The method relies on both nucleotide identity and the fraction of total alignment for pairwise comparisons of viral sequences. We used BLASTn in the Blast+ package[60] to find hits with an e-value cutoff of 1e−10 and at least 50% identity across ≥ 50% of the shortest sequence length. We used three different computational methods to predict host-virus connections: i) Grouping with isolated viruses, metagenomic viral contigs and host or taxonomic association using POGs. ii) Prokaryotic CRISPR spacer matching to viral contigs requiring at least 95% identity over the whole spacer length. To acquire CRISPR spacers from our experimental dataset each animal’s whole metagenome reads from each time point were combined and contigs were assembled using IDBA-UD. Binning of metagenome assemblies to retrieve genome bins was performed using MetaWatt[61] with default settings except avg read length=75, contig length threshold=500 and minimum bin seed size=5000. Taxonomy was assigned to each bin based on MetaWatt taxonomy binning. CRISPR repeat patterns and spacers were extracted from bins using metaCRT[62] with the following settings minNR=4, minRL=19, maxRL=48. Spacers were extracted from CRISPRs and aligned to the curated viral contig database using the seal algorithm in the BBMap software suite with the k parameter set to 19. iii) Matching viral transfer RNA sequences to bacterial transfer RNAs. We identified tRNAs from our viral contig set using ARAGORN v1.2[63] and compared these against isolate microbial genomes in IMG/M[64] using BLASTn. Only hits with a 100% identity over 100% of the length were considered. There are multiple methods available for phage host identification from VLPmetagenomes[65]. We chose to use CRISPRs and tRNAs to make host connections because CRISPR matching does not depend on preexisting databases for sequence comparison and tRNAs are common features of phage genomes that provide host specific information without the need for a reference phage genome.
Data Availability
The DNA sequencing reads and the curated VLP contigs database associated with this study are deposited at the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under accession number PRJEB22710.
Authors: Elizabeth A Dinsdale; Robert A Edwards; Dana Hall; Florent Angly; Mya Breitbart; Jennifer M Brulc; Mike Furlan; Christelle Desnues; Matthew Haynes; Linlin Li; Lauren McDaniel; Mary Ann Moran; Karen E Nelson; Christina Nilsson; Robert Olson; John Paul; Beltran Rodriguez Brito; Yijun Ruan; Brandon K Swan; Rick Stevens; David L Valentine; Rebecca Vega Thurber; Linda Wegley; Bryan A White; Forest Rohwer Journal: Nature Date: 2008-03-12 Impact factor: 49.962
Authors: David Paez-Espino; Emiley A Eloe-Fadrosh; Georgios A Pavlopoulos; Alex D Thomas; Marcel Huntemann; Natalia Mikhailova; Edward Rubin; Natalia N Ivanova; Nikos C Kyrpides Journal: Nature Date: 2016-08-17 Impact factor: 49.962
Authors: Pilar Manrique; Benjamin Bolduc; Seth T Walk; John van der Oost; Willem M de Vos; Mark J Young Journal: Proc Natl Acad Sci U S A Date: 2016-08-29 Impact factor: 11.205
Authors: Dmitry V Ostanin; Jianxiong Bao; Iurii Koboziev; Laura Gray; Sherry A Robinson-Jackson; Melissa Kosloski-Davidson; V Hugh Price; Matthew B Grisham Journal: Am J Physiol Gastrointest Liver Physiol Date: 2008-11-25 Impact factor: 4.052
Authors: Jeremy J Barr; Rita Auro; Mike Furlan; Katrine L Whiteson; Marcella L Erb; Joe Pogliano; Aleksandr Stotland; Roland Wolkowicz; Andrew S Cutting; Kelly S Doran; Peter Salamon; Merry Youle; Forest Rohwer Journal: Proc Natl Acad Sci U S A Date: 2013-05-20 Impact factor: 11.205
Authors: Andrew J Hryckowian; Bryan D Merrill; Nathan T Porter; William Van Treuren; Eric J Nelson; Rebecca A Garlena; Daniel A Russell; Eric C Martens; Justin L Sonnenburg Journal: Cell Host Microbe Date: 2020-07-10 Impact factor: 21.023
Authors: Mária Džunková; Soo Jen Low; Joshua N Daly; Li Deng; Christian Rinke; Philip Hugenholtz Journal: Nat Microbiol Date: 2019-08-05 Impact factor: 17.745
Authors: Lasha Gogokhia; Kate Buhrke; Rickesha Bell; Brenden Hoffman; D Garrett Brown; Christin Hanke-Gogokhia; Nadim J Ajami; Matthew C Wong; Arevik Ghazaryan; John F Valentine; Nathan Porter; Eric Martens; Ryan O'Connell; Vinita Jacob; Ellen Scherl; Carl Crawford; W Zac Stephens; Sherwood R Casjens; Randy S Longman; June L Round Journal: Cell Host Microbe Date: 2019-02-13 Impact factor: 21.023