Qiyun Zhu, Christopher L Dupont, Marcus B Jones, Kevin M Pham, Zhi-Dong Jiang, Herbert L DuPont, Sarah K Highlander.
Abstract
BACKGROUND: Travelers' diarrhea (TD) is often caused by enterotoxigenic Escherichia coli, enteroaggregative E. coli, other bacterial pathogens, Norovirus, and occasionally parasites. Nevertheless, standard diagnostic methods fail to identify pathogens in more than 40% of TD patients. It is predicted that new pathogens may be causative agents of the disease.
Keywords: Dark matter; Escherichia coli; Strain-level; TM7; Travelers’ diarrhea; Virulence factor; crAssphage
Year: 2018 PMID: 30409177 PMCID: PMC6225641 DOI: 10.1186/s40168-018-0579-0
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 Phylum-level taxonomic profiles. Bar lengths represent relative abundances of sequences classified in taxonomic groups. a 16S rRNA gene-based profile, in which the baseline is the pool of all classified 16S rRNA sequences. Phyla with less than ten sequences in total are not displayed. “Unclassified” represents sequences marked as “unclassified Bacteria” by mothur. b WGS-based profile. Phyla with an average relative abundance lower than 0.001% are not displayed. “Unclassified” represents sequences not mapped to any of the reference sequences in the database. Samples are sorted by the 16S rRNA gene-based relative abundance of Firmicutes from low to high.
Fig. 2 16S rRNA gene-based beta diversity of samples. a Scatter plot of the top three axes by principal coordinates analysis (PCoA). The four highly Proteobacteria-dominant samples, 160, 678, 6163, and 50076, formed a distinct cluster on the PC1 axis (vs. other TDs, AMOVA p value < 0.001). Three Proteobacteria-rich samples (76, 156, and 6165) also mapped near this cluster. The two Firmicutes-predominant samples, 147 and 6128, formed a small cluster (vs. other TDs, AMOVA p value = 0.012). b Dendrogram reconstructed using the UPGMA algorithm based on the average Yue & Clayton measure of dissimilarity between pairs of samples.
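The Yue & Clayton measure named in the Fig. 2 caption can be illustrated with a short sketch. This is not the authors' pipeline (the caption does not say how the values were computed); it is a minimal pure-Python version of the standard theta index, 1 − Σab / (Σa² + Σb² − Σab) over relative abundances, and the function name and input layout are assumptions made for illustration.

```python
from typing import Sequence


def yue_clayton_dissimilarity(counts_a: Sequence[float],
                              counts_b: Sequence[float]) -> float:
    """Yue & Clayton dissimilarity (1 - theta) between two samples.

    counts_a and counts_b hold per-taxon (or per-OTU) counts in the same
    order. This layout is an illustrative assumption.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list the same taxa")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample needs at least one sequence")

    # Convert raw counts to relative abundances.
    a = [x / total_a for x in counts_a]
    b = [y / total_b for y in counts_b]

    sum_ab = sum(x * y for x, y in zip(a, b))
    sum_aa = sum(x * x for x in a)
    sum_bb = sum(y * y for y in b)

    theta = sum_ab / (sum_aa + sum_bb - sum_ab)
    return 1.0 - theta
```

A matrix of these pairwise values is the kind of input an average-linkage (UPGMA) clustering routine would take to produce a dendrogram such as the one in panel b.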
Fig. 3 Illustration of metagenomic contig clustering pattern and binning process. a–d VizBin-computed, k-mer signature-based scatter plots of contigs ≥ 1 kb of the low-diversity sample 6163, in which E. coli was the dominant species (91.3%, by WGS reads, same below) and multiple E. coli genomes were detected and separated. The area of each dot is proportional to the contig size. a Taxonomic assignments of contigs. Genera with relative abundance ≥ 0.2% are colored. A contig is colored if ≥ 75% of reads mapped to it were mapped to a single genus. The dashed area shows a manually selected cluster of mostly Escherichia contigs. The kernel density function of the Escherichia contigs is plotted alongside, with peaks manually divided to represent genomes of multiple E. coli strains. b Contig coverage indicated by opacity. c Taxonomic assignment rate (proportion of reads mapped to the reference genome database) indicated by color depth. d Contigs with SSU(s) are highlighted. e High-diversity sample 101, from which multiple known and “dark matter” genomes were isolated. f Sample 76, characterized by the presence of multiple Enterobacteriaceae genera. g Sample 540, a healthy traveler control with moderate diversity.
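The contig coloring rule in panel a (a contig is colored when at least 75% of the reads mapped to it belong to a single genus) can be sketched as below. The function name, the per-read list of genus labels, and the use of `None` for reads without a genus assignment are all assumptions for illustration; the caption does not describe the authors' implementation, and the separate ≥ 0.2% genus abundance filter is not modeled here.

```python
from collections import Counter
from typing import Optional, Sequence


def contig_genus_label(read_genera: Sequence[Optional[str]],
                       min_fraction: float = 0.75) -> Optional[str]:
    """Return the genus used to color a contig, or None if it stays uncolored.

    read_genera holds one entry per read mapped to the contig: the genus
    that read was assigned to, or None when it has no genus assignment
    (hypothetical input layout).
    """
    if not read_genera:
        return None
    genus, n_reads = Counter(read_genera).most_common(1)[0]
    # If unassigned reads dominate, genus is None and the contig stays uncolored.
    if genus is not None and n_reads / len(read_genera) >= min_fraction:
        return genus
    return None
```

For example, a contig with three Escherichia reads and one Shigella read sits exactly at the 75% threshold and would be colored as Escherichia under this sketch.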
Fig. 4 Basic statistics of the 565 genome bins extracted from 29 metagenomes. The three axes indicate relative abundance (calculated as sum of length × coverage of member contigs, normalized by the whole assembly), CheckM-computed completeness, and taxonomic assignment rate (proportion of classifiable reads mapped to member contigs), respectively. Dot area is proportional to the total length of contigs of each bin. Color scale indicates the number of SSUs identified in each bin.
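The relative abundance definition in the Fig. 4 caption (sum of length × coverage over a bin's contigs, normalized by the same sum over the whole assembly) amounts to the following arithmetic. This is a minimal sketch; the `(length, coverage)` tuple layout and the function name are illustrative assumptions, not taken from the source.

```python
from typing import Iterable, Tuple

# (contig length in bp, mean read coverage); hypothetical layout.
Contig = Tuple[int, float]


def bin_relative_abundance(bin_contigs: Iterable[Contig],
                           assembly_contigs: Iterable[Contig]) -> float:
    """Relative abundance of a bin, in percent of the whole assembly."""
    bin_mass = sum(length * cov for length, cov in bin_contigs)
    total_mass = sum(length * cov for length, cov in assembly_contigs)
    if total_mass <= 0:
        raise ValueError("assembly has no covered contigs")
    return 100.0 * bin_mass / total_mass
```

Weighting by coverage as well as length means a short but deeply sequenced genome can outrank a longer, sparsely covered one.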
Features of predicted pathogenic E. coli strains by sample. Relative abundance, predicted serotype, predicted MLST type, and predicted pathogenic type are reported. Extended detail is provided in Additional file 1: Tables S8-S10
| Sample (strain) | Relative abundance (%) | Predicted serotype | Predicted MLST type | Predicted pathogenic type |
|---|---|---|---|---|
| 10 (a) | 6.28 | H4 | ST-10 | ExPEC |
| 10 island | 0.06 | NT | NT | TTSS genes |
| 78 (a) | 17.63 | O1:H7 | Unknown | DAEC |
| 78 island 2 | 0.06 | NT | NT | EHEC gene |
| 101 | 1.05 | O162:H33 | ST-378 | EPEC |
| 101 island | 0.01 | NT | NT | EHEC genes |
| 538 | 3.37 | O89:H33 | NT | ExPEC |
| 678 (b) | 13.12 | O69:H5 | NT | EHEC |
| 678 island | 3.52 | NT | NT | ExPEC genes |
| 715 (b) | 0.12 | H15 | NT | ExPEC genes |
| 715 island | 0.02 | NT | NT | EHEC genes |
| 6163 (a) | 78.90 | O145 | ST-10 | EPEC |
| 6163 (b) | 14.70 | O111:H8 | NT | EHEC |
| 6163 (c) | 0.73 | O166:H15 | NT | ExPEC |
| 6165 island | 0.66 | NT | NT | EPEC/EAEC genes |
| 6168 (b) | 0.78 | O111 | NT | EPEC? |
| 50076 (a) | 67.00 | H2 | ST-10 | ExPEC? |
| 50076 (c) | 4.54 | O99:H33 | NT | ExPEC? |
| 50076 island 1 | 8.97 | NT | NT | ExPEC genes |
| 50076 island 3 | 0.26 | NT | NT | TTSS genes |
| 50395 | 15.00 | H8 | ST-590 | EPEC |
| 80129 (b) | 0.15 | H34 | NT | ExPEC/NMEC |
| 80142 | 8.70 | H8 | Unknown | ExPEC |
| 80142 island 1 | 0.14 | NT | NT | EPEC genes |
Defining features and definitions:
Enterotoxigenic E. coli (ETEC): heat labile toxin, heat stable toxin
Enteroaggregative E. coli (EAEC): Aaf fimbriae, dispersin
Enteropathogenic E. coli (EPEC): LEE, STX-, bundle-forming pilus
Enterohemorrhagic E. coli (EHEC): LEE, STX+, Efa1 adhesin, ToxB
Diffusely adherent E. coli (DAEC): Afa/Dr fimbriae
Neonatal meningitis E. coli (NMEC): K1 capsule, Ibe invasion proteins
Abbreviations: LEE, locus of enterocyte effacement; TTSS, type three secretion system; STX, Shiga toxin; ND, not determined; NT, not tested; ?, probable but not conclusive
Fig. 5 Phylogenetic tree of identified E. coli genomes. The tree was reconstructed with the maximum likelihood method from a conserved set of protein sequences. Multiple reference E. coli genomes were included to indicate the phylogenetic positions of the identified E. coli strains. Only near-complete (completeness ≥ 80%) genomes were included in the analysis. The tree is rooted with Salmonella as an outgroup. Nodal labels represent bootstrap support values (out of 100 replicates). Strains marked with an asterisk were part of a polymicrobial sample. Group A is shaded yellow, B1 and B2 blue, D green, E violet, and F peach.
Fig. 6 Phylogenetic tree of 320 bins representing cellular organisms. Taxon labels take the form sample ID, dot, bin ID (see Additional file 1: Table S7). Black and gray lines represent branches with ≥ and < 75 out of 100 bootstrap support, respectively. Branch labels are taxonomic groups to which all child taxa except for unidentified organisms belong. The circular bar plots represent relative abundance (red, square root scale), completeness as a cellular organism (blue, linear scale), and proportion of reads mapped to the reference genome database (green, linear scale). All three plots are in a 0 to 100% range. Unidentified organisms (assignment < 40%) are indicated by gray lines (clusters) and dots (singletons) around the circle.
Putative cellular “dark matter” genomes identified in the metagenomes. Eight dark matter (dm) groups representing monophyletic, closely related genomes are listed, along with 22 singleton genomes that are also included in the phylogenetic tree (Fig. 6). The size of a group is calculated as the maximum size of its member bins. Numeric cell values represent the number of bins isolated per sample. Question marks indicate that there is clear evidence (clusters of contigs with high sequence similarity with other member genomes) that at least one genome is present in this sample. However, it was not isolated as bin(s) or included in the phylogenetic analysis because its relative abundance or completeness is low, or because its member contigs are mixed with those from other genomes in the plot, making it difficult to separate. The background color depth is proportional to the total relative abundance of the genome(s)
Fig. 7 Clustering patterns of crAssphage and “crish” viruses. a Examples of the contig co-clustering patterns in the k-mer signature-based scatter plot in samples 3, 50395, and 540. The large panels are zoomed-in views of the red boxes in the small panels, which represent the entire microbiomes. The size and opacity of a dot are proportional to the length and coverage of the contig, respectively. Contigs mapped to five representative bacteria in proximity to the viruses are colored. Extracted virus bins are highlighted by red edges and labeled by the bin ID and the virus cluster name. b Pairwise average nucleotide identity (ANI) matrix of crAssphages and nine clusters of “crish” viruses (assigned letters A to I). ANI values below 70% are grayed out. The dendrogram shows the hierarchical clustering result based on the ANI matrix. The reference crAssphage genome is included for comparison. Bins that are too fragmented, incomplete, and/or low in abundance are not included. Singletons are not included.
Putative viral genomes isolated from the metagenomes. crAssphage and 24 novel putative virus groups (ph1 to ph24, sorted by the number of isolated genomes (bins) from high to low), as well as 21 singleton putative viral bins, are listed. Cell values represent the number of bins per sample. The background color depth is proportional to the total relative abundance of the genome(s). The size of a group is calculated in one of two ways: (1) if there are one or more complete (circular) genomes, as the median of their sizes; (2) if not, as the maximum size of the bins with the least number of contigs.
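The two-case rule for the size of a virus group can be written out as a small function. This is one reading of the caption, not the authors' code: in particular, "the bins with the least number of contigs" is interpreted here as all bins tied for the smallest contig count, and the `(size, contig count, is circular)` tuple layout is a hypothetical input format.

```python
from statistics import median
from typing import Sequence, Tuple

# (bin size in bp, number of contigs, is a complete circular genome);
# hypothetical layout chosen for illustration.
ViralBin = Tuple[int, int, bool]


def group_size(bins: Sequence[ViralBin]) -> float:
    """Estimate the genome size of a virus group from its member bins."""
    if not bins:
        raise ValueError("a group needs at least one bin")

    # Case 1: one or more complete (circular) genomes -> median of their sizes.
    complete_sizes = [size for size, _, circular in bins if circular]
    if complete_sizes:
        return median(complete_sizes)

    # Case 2: no complete genome -> among the least fragmented bins
    # (fewest contigs), take the largest size.
    fewest = min(n_contigs for _, n_contigs, _ in bins)
    return max(size for size, n_contigs, _ in bins if n_contigs == fewest)
```

Preferring circular genomes, and otherwise the least fragmented bins, keeps heavily fragmented or chimeric bins from inflating the reported group size.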