Emmanuel Gonzalez, Frederic E. Pitre, Nicholas J. B. Brereton.
Abstract
Analysis of 16S ribosomal RNA (rRNA) gene amplification data for microbial barcoding can be inaccurate across complex environmental samples. A method, ANCHOR, is presented that is designed for improved species-level microbial identification using paired-end sequences directly, multiple high-complexity samples and multiple reference databases. A standard operating procedure (SOP) is reported alongside benchmarking against artificial, single-sample and replicated mock data sets. The method is then tested directly on a real-world data set from surface swabs of the International Space Station (ISS). Simple mock community analysis identified 100% of the expected species and 99% of expected gene copy variants (100% identical). On a replicated mock community, ANCHOR recovered similar or greater numbers of expected species compared with MetaAmp, DADA2, Mothur and QIIME1. Analysis of the ISS microbiome identified 714 putative unique species/strains, and differential abundance analysis distinguished significant differences between the Destiny module (U.S. laboratory) and the Harmony module (sleeping quarters). Harmony was remarkably dominated by human gastrointestinal tract bacteria, similar to enclosed environments on Earth; however, Destiny module bacteria also derived from nonhuman microbiome carriers present on the ISS, the laboratory's research animals. ANCHOR can help substantially improve the sequence resolution of 16S rRNA gene amplification data within biologically replicated environmental experiments, and integrated multidatabase annotation enhances interpretation of complex, nonreference microbiomes.
Year: 2019 PMID: 30990927 PMCID: PMC6851558 DOI: 10.1111/1462-2920.14632
Source DB: PubMed Journal: Environ Microbiol ISSN: 1462-2912 Impact factor: 5.491
Figure 5. Destiny and Harmony module differential abundance.
A. Fold change and normalized mean counts. Fold change (log2 FC) is the relative difference in abundance between locations. +/− INF (demarcated by the dashed red line) indicates 'infinite' fold change, where an OTU had detectable counts in samples from only a single location. Normalized mean counts are taken from the DESeq2 baseMean output. Species are grouped by phylum.
B. Chord diagram illustrating the putative association of each differentially abundant (DA) OTU with the location where it was detected at the greatest abundance. The complete differential abundance table, including relative abundance, fold change, annotation, count distribution, BLAST statistics, alternative database hits and sequences, is provided in Supplementary file 5. Interactive figures are available at https://github.com/gonzalezem/ANCHOR/tree/master/article. [Correction added on 18 June 2019, after first online publication: Figure 5 caption has been corrected in this version]. [Color figure can be viewed at http://wileyonlinelibrary.com]
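To make the '+/− INF' convention in panel A concrete, the sketch below computes a naive log2 fold change from the normalized mean counts of one OTU in each location. It is an illustrative assumption, not the DESeq2 estimator (DESeq2 fits a negative binomial model and reports its own log2 fold change); the helper name and the Destiny-over-Harmony sign convention are hypothetical.

```python
import math


def naive_log2_fold_change(mean_destiny: float, mean_harmony: float) -> float:
    """Return log2(mean_destiny / mean_harmony) from normalized mean counts.

    Hypothetical helper: a plain ratio of group means, NOT the DESeq2 estimate.
    An OTU detected in only one location yields +inf or -inf, mirroring the
    '+/- INF' convention of the figure (sign convention assumed).
    """
    if mean_destiny < 0 or mean_harmony < 0:
        raise ValueError("normalized counts must be non-negative")
    if mean_destiny == 0 and mean_harmony == 0:
        raise ValueError("OTU not detected in either location")
    if mean_harmony == 0:
        return math.inf   # detected only in Destiny
    if mean_destiny == 0:
        return -math.inf  # detected only in Harmony
    return math.log2(mean_destiny / mean_harmony)


print(naive_log2_fold_change(80.0, 20.0))  # 2.0
print(naive_log2_fold_change(0.0, 15.0))   # -inf
```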
Expected species in Kozich's mock community data set, as identified from ANCHOR OTU annotation.
| ANCHOR OTUs | Expected species | Tax level | Ambiguous annotation | Identity % | Total counts |
|---|---|---|---|---|---|
| Acinetobacter baumannii_1 | | Species | Unique | 100.0 | 407 |
| Actinomyces odontolyticus_1 | | Species | Unique | 100.0 | 356 |
| Bacillus MS_1 | | Species | 8 = | 100.0 | 377 |
| Bacteroides vulgatus_1 | | Species | Unique | 100.0 | 204 |
| Bacteroides vulgatus_2 | | Species | Unique | 100.0 | 42 |
| Bacteroides vulgatus_3 | | Species | Unique | 100.0 | 21 |
| Clostridium MS_1 | | Species | 4 = | 100.0 | 277 |
| Clostridium beijerinckii_1 | | Species | Unique | 100.0 | 27 |
| Deinococcus radiodurans_1 | | Species | Unique | 100.0 | 116 |
| Enterobacterales MS_1 | | Species | 8 = | 100.0 | 198 |
| Enterococcus MS_1 | | Species | 14 = | 100.0 | 196 |
| Helicobacter pylori_1 | | Species | Unique | 100.0 | 355 |
| Lactobacillus MS_1 | | Species | 4 = | 100.0 | 139 |
| Listeria MS_1 | | Species | 5 = | 100.0 | 156 |
| Neisseria meningitidis_1 | | Species | Unique | 100.0 | 303 |
| Porphyromonas gingivalis_1 | | Species | Unique | 100.0 | 104 |
| Pseudomonas aeruginosa_1 | | Species | Unique | 100.0 | 144 |
| Rhodobacter MS_1 | | Species | 3 = | 100.0 | 53 |
| Staphylococcus MS_1 | | Species | 13 = | 99.605 | 4 |
| Staphylococcus MS_2 | | Species | 12 = | 100.0 | 599 |
| Streptococcus agalactiae_1 | | Species | Unique | 100.0 | 218 |
| Streptococcus MS_1 | | Species | | 100.0 | 24 |
| Streptococcus mutans_1 | | Species | Unique | 100.0 | 226 |
Ambiguity refers to annotation for a given OTU comprising multiple species with equal BLASTn scores. The parameters were a high-count threshold of 3, 99% ANCHOR annotation selection and 98% low-count sequence capture (see Methods). Data available in Supplementary File 3.
Retained for interest, but flagged as a potential chimera by UCHIME during QC: its difference from C. beijerinckii falls within nucleotides 1–40, a region that is 100% identical to the Bacillus and both Staphylococcus sequences.
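The ambiguity column can be read as a tie rule over BLASTn hits. A minimal sketch, assuming each hit carries a lineage (order down to species), a bitscore and a percent identity, and that the 99% annotation-selection parameter acts as an identity cut-off: hits tied at the best bitscore are collapsed to their lowest shared rank and labelled `MS` with the number of tied species, mirroring entries such as `Bacillus MS_1 | 8 =`. The `BlastHit` record, the `annotate_otu` function and the `Bacteria` fallback are hypothetical, not ANCHOR's code, and the `_1`, `_2` suffixes are not generated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    # Hypothetical record: lineage runs (order, family, genus, species).
    lineage: tuple[str, ...]
    bitscore: float
    identity: float  # percent identity


def annotate_otu(hits: list[BlastHit], min_identity: float = 99.0) -> tuple[str, str]:
    """Return (label, ambiguity) for one OTU from its BLASTn hits."""
    kept = [h for h in hits if h.identity >= min_identity]
    if not kept:
        return ("Unassigned", "-")
    best = max(h.bitscore for h in kept)
    # Distinct lineages tied at the best bitscore.
    tied = sorted({h.lineage for h in kept if h.bitscore == best})
    if len(tied) == 1:
        return (tied[0][-1], "Unique")
    # Walk down the ranks while every tied lineage still agrees.
    common = "Bacteria"  # assumed fallback when even the order differs
    for ranks in zip(*tied):
        if len(set(ranks)) > 1:
            break
        common = ranks[0]
    return (f"{common} MS", f"{len(tied)} =")


hits = [
    BlastHit(("Bacillales", "Bacillaceae", "Bacillus", "Bacillus cereus"), 800.0, 100.0),
    BlastHit(("Bacillales", "Bacillaceae", "Bacillus", "Bacillus anthracis"), 800.0, 100.0),
    BlastHit(("Bacillales", "Bacillaceae", "Bacillus", "Bacillus subtilis"), 780.0, 99.5),
]
print(annotate_otu(hits))  # ('Bacillus MS', '2 =')
```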
Expected species in Kleiner's mock community data set, as identified from ANCHOR OTU annotation.
| ANCHOR OTUs | Expected species | Tax level | Ambiguous annotation | Identity % | Total counts |
|---|---|---|---|---|---|
| Agrobacterium fabrum_1 | | Species | Unique | 100 | 17,289 |
| Alteromonas MS_1 | | Species | 4 = | 100 | 5413 |
| Alteromonas macleodii_1 | | Species | Unique | 100 | 1342 |
| Bacillus MS_1 | | Species | 2 = | 100 | 16,021 |
| Bacillus MS_2 | | Species | 2 = | 100 | 2543 |
| Bacillus MS_3 | | Species | 2 = | 100 | 2370 |
| Bacillus subtilis_1 | | Species | Unique | 100 | 5442 |
| Bacillus subtilis_2 | | Species | Unique | 99.6 | 268 |
| Chromobacterium MS_1 | | Species | 3 = | 100 | 12,685 |
| Cupriavidus metallidurans_1 | | Species | Unique | 100 | 34,913 |
| Desulfovibrio vulgaris_1 | | Species | Unique | 100 | 276 |
| Enterobacterales MS_1 | | Species | 5 = | 100 | 12,066 |
| Enterobacteriaceae MS_1 | | Species | 2 = | 99.8 | 131 |
| Paracoccus MS_1 | | Species | 4 = | 100 | 5958 |
| Pseudomonas MS_1 | | Species | 5 = | 100 | 16,681 |
| Pseudomonas MS_3 | | Species | 2 = | 100 | 2114 |
| Pseudomonas fluorescens_1 | | Species | Unique | 100 | 10,881 |
| Pseudomonas MS_2 | | Species | 4 = | 100 | 14,714 |
| Rhizobiaceae MS_1 | | Species | 27 = | 100 | 24,671 |
| Rhodobacteraceae MS_1 | | Genus | 3 = | 100 | 5652 |
| Salmonella enterica_1 | | Species | Unique | 100 | 40,898 |
| Salmonella enterica_2 | | Species | Unique | 100 | 6234 |
| Salmonella enterica_3 | | Species | Unique | 100 | 137 |
| Salmonella enterica_4 | | Species | Unique | 99.8 | 51 |
| Salmonella enterica_5 | | Species | Unique | 100 | 100 |
| Staphylococcus MS_2 | | Species | 2 = | 100 | 6352 |
| Bacteria MS_1 | | Species | 7 = | 100 | 18,638 |
| Thermus thermophilus_1 | | Species | Unique | 100 | 4307 |
Ambiguity refers to annotation for a given OTU comprising multiple species with equal BLASTn scores. The parameters were a high-count threshold of 3, 99% ANCHOR annotation selection and 98% low-count sequence capture (see Methods). Data available in Supplementary File 4.
Paracoccus denitrificans ATCC 17741 has been recognized as a mistakenly archived P. pantotrophus LMG 4218 (start with Fig. 5 of Goodhew et al., 1996; see also Rainey et al., 1999; Kelly et al., 2006).
Mistaken for Ps. denitrificans, nomen rejiciendum (Bacteriology, 1982).
This mock community member is not currently classified to a species or genus. ANCHOR annotated it in the family Rhodobacteraceae, the consensus phylogenetic placement between RDP and SILVA; however, the assembled ANCHOR OTU was 100% similar to the original isolate, Uncultured bac AK199 (NCBI: JQ256816) (Lenk et al., 2012).
The Staphylococcus succinus sequence (NCBI: KJ534522.1) is misannotated in the NCBI nt database; this was readily apparent from both the high-level taxon disparity and the ambiguous annotation. The OTU would require manual curation (removal of the erroneous Staphylococcus hit) to be relabelled correctly as Xanthomonadaceae AS, which highlights database integrity challenges.
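The high-count threshold and low-count capture parameters listed with these tables can be illustrated with a simplified sketch. It assumes that exact sequences with a total count of at least the threshold (3) seed OTUs, and that each lower-count sequence is folded into its most similar seed when identity is at least 98%. Identity is computed here as an ungapped match fraction between equal-length sequences, a deliberate simplification of alignment-based comparison. All names are hypothetical; the ANCHOR Methods remain the authoritative description.

```python
def identity(a: str, b: str) -> float:
    """Ungapped percent identity; sequences of unequal length score 0 (simplification)."""
    if len(a) != len(b) or not a:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def capture_low_counts(counts: dict[str, int], high_count: int = 3,
                       capture_identity: float = 98.0) -> tuple[dict[str, int], int]:
    """Fold low-count sequences into their most similar high-count sequence.

    Returns (otu_counts, uncaptured_reads). Hypothetical sketch only.
    """
    high = {seq: c for seq, c in counts.items() if c >= high_count}
    otus = dict(high)
    uncaptured = 0
    for seq, c in counts.items():
        if seq in high:
            continue
        best = max(high, key=lambda h: identity(seq, h), default=None)
        if best is not None and identity(seq, best) >= capture_identity:
            otus[best] += c       # captured into the seed OTU
        else:
            uncaptured += c       # left out of the OTU table
    return otus, uncaptured


ref, near, far = "A" * 50, "A" * 49 + "C", "A" * 48 + "CC"
# near is 98% identical to ref and is captured; far (96%) is not.
print(capture_low_counts({ref: 10, near: 2, far: 1}))
```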
Gene copies of expected species in Kozich's mock community data set.
| Identified species with reference genomes | No. of gene copies (variants @ full length) | Amplified region: variants | Amplified region: distribution | Amplified region: gene copies | ANCHOR OTU (100% similarity to gene copy) | ANCHOR OTU counts |
|---|---|---|---|---|---|---|
| | 6(1) | 1 | 1 | Ab‐ | Acinetobacter baumannii_1 | 407 |
| | 2(1) | 1 | 1 | Ao‐ | Actinomyces odontolyticus_1 | 356 |
| | 12(3) | 1 | 1 | Bc‐ | Bacillus MS_1 | 377 |
| | 7(6) | 3 | 5 | Bv‐ | Bacteroides vulgatus_1 | 204 |
| | | | 1 | Bv‐ | Bacteroides vulgatus_2 | 42 |
| | | | 1 | Bv‐ | Bacteroides vulgatus_3 | 21 |
| | 14(14) | 2 | 13 | Cb‐ | Clostridium MS_1 | 277 |
| | | | 1 | Cb‐ | Clostridium beijerinckii_1 | 27 |
| | 3(2) | 1 | 1 | Dr‐ | Deinococcus radiodurans_1 | 116 |
| | 4(2) | 1 | 1 | Ef‐ | Enterococcus MS_1 | 196 |
| | 7(6) | 1 | 1 | Ec‐ | Enterobacterales MS_1 | 198 |
| | 2(2) | 1 | 1 | Hp‐ | Helicobacter pylori_1 | 355 |
| | 6(1) | 1 | 1 | Lg‐ | Lactobacillus MS_1 | 139 |
| | 6(4) | 1 | 1 | Lm‐ | Listeria MS_1 | 156 |
| | 4(1) | 1 | 1 | Nm‐ | Neisseria meningitidis_1 | 303 |
| | 4(1) | 1 | 1 | Pg‐ | Porphyromonas gingivalis_1 | 104 |
| | 4(2) | 1 | 1 | Pa‐ | Pseudomonas aeruginosa_1 | 144 |
| | 3(2) | 1 | 1 | Rs‐ | Rhodobacter MS_1 | 53 |
| | 5(5) | 1 | 1 | Sa‐ | Staphylococcus MS_2 | 599 |
| | 5(5) | 2 | 4 | Se‐ | | |
| | | | 1 | Se‐ | X | ‐ |
| | 7(1) | 1 | 1 | Stra‐ | Streptococcus agalactiae_1 | 218 |
| | 5(2) | 1 | 1 | Strm‐ | Streptococcus mutans_1 | 226 |
| | 4(1) | 1 | 1 | Strp‐ | Streptococcus MS_1 | 24 |
Full-length expected gene copies from Kozich's mock were manually extracted from strain-specific reference genomes (Supplementary File 3). The number of gene copies per genome was validated against the (very useful) University of Michigan Centre for Microbial Systems Ribosomal RNA Database (Klappenbach et al., 2001). Gene copies are named using E. coli nomenclature but are assigned a letter based on their arbitrary order of occurrence in the specific strain genome assembly to aid data navigation (these labels for specific copies should not be compared phylogenetically or across strains). Data available in Supplementary File 3.
No E. coli strain was provided, but K12 (MG1655) had 100% similarity across the amplified region.
There are ambiguous nucleotide calls in the amplified region of the H. pylori 26695 assembly (none disagree with the ANCHOR OTU).
S. epidermidis (4/5) and S. aureus (5/5) gene copies share 100% identity for the amplified region.
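The gene copy and amplified-region columns follow from simple counting once the copies are extracted. The sketch below assumes each genome's copies are supplied as a label-to-sequence mapping and that the amplified region is given as fixed slice coordinates (a real workflow would locate it by primer matching). It returns the copy count, the number of distinct full-length variants and how many copies share each amplified-region variant (for example, 5, 1 and 1 for the three B. vulgatus variants). Function names and the toy sequences are hypothetical.

```python
from collections import Counter


def copy_variant_summary(copies: dict[str, str], start: int, end: int):
    """Summarize the 16S gene copies of one genome.

    copies: hypothetical mapping of gene copy label -> full-length sequence.
    start/end: assumed 0-based slice bounds of the amplified region.
    Returns (n_copies, n_full_length_variants, distribution), where
    distribution counts copies per amplified-region variant, largest first.
    """
    full_length_variants = set(copies.values())
    amplicons = Counter(seq[start:end] for seq in copies.values())
    distribution = sorted(amplicons.values(), reverse=True)
    return len(copies), len(full_length_variants), distribution


def format_copies(n_copies: int, n_variants: int) -> str:
    # Mirrors the table notation, e.g. "7(6)".
    return f"{n_copies}({n_variants})"


copies = {"X-rrsA": "AAAACCCC", "X-rrsB": "AAAACCCG", "X-rrsC": "AAAATCCC"}
n, variants, dist = copy_variant_summary(copies, 2, 6)
print(format_copies(n, variants), dist)  # 3(3) [2, 1]
```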
Figure 1. ANCHOR sequence processing diagram.
The four design targets were: (1) FASTQ-ready, with no preprocessing required from users; (2) no sequence modification (sequence integrity retained); (3) low resource demands; and (4) integrated, exhaustive cross-database annotation. [Correction added on 18 June 2019, after first online publication: Figure 1 caption has been corrected in this version].
Kleiner's mock community assessed using five different methods.
| | Mothur | QIIME1 | DADA2 | MetaAmp | ANCHOR |
|---|---|---|---|---|---|
| Number of expected species | 23 | 23 | 23 | 23 | 23 |
| Expected species (species-level ID) | N/A | 8 | 5 | N/A | 16 |
| Expected species (genus-level ID) | 19 | 11 | 11 | 17 | 1 |
| No. of unexpected OTUs/ASVs | 17,037 | 297 | 31 | 8 | 6 |
| Average count per OTU/ASV | 5 | 360 | 4478 | 4864 | 8013 |
| Total counts (% raw reads) | 275,610 (53.6%) | 340,895 (66.2%) | 259,699 (50.5%) | 126,459 (24.6%) | 272,941 (53.0%) |
Kleiner's mock community is composed of 12 samples: 3 conditions (types) × 4 sample replicates. Only amplicons within the length range of 436–467 nt were selected to allow for comparisons across methods. Method‐specific parameters used (defaults where possible) and resulting data are available in Supplementary File 4.
‐ = Not detected.
High taxon OTUs: annotated at phylum, class, order or family level.
This mock community member is not currently classified to a species or genus. ANCHOR annotated it in the family Rhodobacteraceae, the consensus phylogenetic placement between RDP and SILVA; however, the assembled ANCHOR OTU was 100% similar to the original isolate, Uncultured bac AK199 (NCBI: JQ256816) (Lenk et al., 2012). Rhodobacteraceae OTUs/ASVs from other methods are also presented as potentially representing Uncultured bac AK199.
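Several rows of the five-method comparison can be derived from a count table with a few lines of code. A minimal sketch, assuming each OTU/ASV has a known amplicon length, that a set of OTU identifiers judged to represent expected species is available, and that the raw read total for the run is known; the 436–467 nt window from the footnote is applied before summarizing. The function and its field names are hypothetical.

```python
def summarize_method(otu_counts: dict[str, int], amplicon_lengths: dict[str, int],
                     expected: set[str], raw_reads: int,
                     min_len: int = 436, max_len: int = 467) -> dict[str, float]:
    """Hypothetical per-method summary in the spirit of the comparison table."""
    # Keep only amplicons inside the shared length window (inclusive).
    kept = {otu: c for otu, c in otu_counts.items()
            if min_len <= amplicon_lengths[otu] <= max_len}
    total = sum(kept.values())
    n_unexpected = sum(1 for otu in kept if otu not in expected)
    return {
        "n_unexpected": n_unexpected,
        "avg_count_per_otu": total / len(kept) if kept else 0.0,
        "total_counts": total,
        "pct_raw_reads": 100.0 * total / raw_reads,
    }


summary = summarize_method(
    otu_counts={"a": 100, "b": 50, "c": 10, "d": 5},
    amplicon_lengths={"a": 450, "b": 440, "c": 460, "d": 500},  # "d" falls outside
    expected={"a", "b"},
    raw_reads=200,
)
print(summary["total_counts"], summary["pct_raw_reads"])  # 160 80.0
```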
Gene copies of expected species in Kleiner's mock community data set.
| Identified species with reference genomes | No. gene copies (variants @ full length) | Amplified region: variants | Amplified region: distribution | Amplified region: gene copy labels | OTU (100% similarity to gene copy) | OTU counts | % Celleq avg. | % Proteq avg. | % Uneven avg. |
|---|---|---|---|---|---|---|---|---|---|
| | 4 (1) | 1 | 4 | Af‐ | Agrobacterium fabrum_1 | 17,289 | 21.13 | 33.50 | 45.37 |
| | 5 (3) | 2 | 3 | Am‐ | Alteromonas macleodii_1 | 1342 | 27.20 | 70.34 | 2.46 |
| | | | 2 | Am‐ | Alteromonas MS_1 | 5413 | 27.90 | 69.94 | 2.16 |
| | 10 (9) | 5 | 2 | Bs‐ | Bacillus subtilis_1 | 5442 | 59.45 | 38.66 | 1.89 |
| | | | 5 | Bs‐ | Bacillus MS_1 | 16,021 | 58.85 | 39.34 | 1.81 |
| | | | 1 | Bs‐ | Bacillus MS_2 | 2543 | 58.08 | 40.07 | 1.85 |
| | | | 1 | Bs‐ | Bacillus MS_3 | 2370 | 56.92 | 41.35 | 1.73 |
| | | | 1 | Bs‐ | X | X | ‐ | ‐ | ‐ |
| | 8 (1) | 1 | 8 | Cv‐ | Chromobacterium MS_1 | 12,685 | 82.71 | 15.03 | 2.25 |
| | 4 (1) | 1 | 4 | Cm‐ | Cupriavidus metallidurans_1 | 34,913 | 9.56 | 23.92 | 66.51 |
| | 5 (4) | 2 | 4 | Dv‐ | Desulfovibrio vulgaris_1 | 276 | 0.00 | 0.00 | 100.00 |
| | | | 1 | Dv‐ | X | ‐ | ‐ | ‐ | |
| | 7 (1) | 7 | 7 | Ec‐ | Enterobacterales MS_1 | 12,066 | 30.41 | 47.39 | 22.20 |
| | 1 | 1 | 1 | Ppa‐ | Paracoccus MS_1 | 5958 | 24.32 | 65.36 | 10.32 |
| | 5 (3) | 1 | 5 | Psp‐ | Pseudomonas MS_1 | 16,681 | 41.20 | 44.90 | 13.90 |
| | 6 (3) | 2 | 5 | Pf‐ | Pseudomonas fluorescens_1 | 10,881 | 28.69 | 41.81 | 29.50 |
| | | | 1 | Pf‐ | Pseudomonas MS_3 | 2114 | 30.09 | 44.18 | 25.73 |
| | 5 (3) | 1 | 5 | Pps‐ | Pseudomonas MS_2 | 14,714 | 56.94 | 40.04 | 3.02 |
| | 3 | 1 | 3 | Rl‐ | Rhizobiaceae MS_1 | 24,671 | 22.15 | 57.00 | 20.85 |
| | 7 (5) | 2 | 6 | Se‐ | Salmonella enterica_1 | 40,898 | 26.83 | 34.35 | 38.81 |
| | | | 1 | Se‐ | Salmonella enterica_2 | 6234 | 27.25 | 34.66 | 38.08 |
| | 6 (5) | 2 | 1 | Pa1‐ | X | X | ‐ | ‐ | ‐ |
| | | | 5 | Pa1‐ | Staphylococcus MS_2 | 6352 | 5.81 | 84.08 | 10.11 |
| | 6 (3) | 1 | 6 | Pa2‐ | | | | | |
| | 2 (1) | 1 | 1 | Tt‐ | Thermus thermophilus_1 | 4307 | 48.57 | 45.48 | 5.94 |
| Unexpected species | | | | | | | | | |
| | 6 (5) | 4 | 3 | SeR1‐ | Staphylococcus MS_1 | 2126 | 9.83 | 87.30 | 2.87 |
| | | | 1 | SeR1‐ | Staphylococcus epidermidis_1 | 537 | 9.87 | 87.90 | 2.23 |
| | | | 1 | SeR1‐ | X | X | ‐ | ‐ | ‐ |
| | | | 1 | SeR1‐ | X | X | ‐ | ‐ | ‐ |
Full-length expected gene copies from Kleiner's mock were manually extracted from strain-specific reference genomes (Supplementary File 4). The number of gene copies per genome was validated against the (very useful) University of Michigan Centre for Microbial Systems Ribosomal RNA Database (Klappenbach et al., 2001). Gene copies are named using E. coli nomenclature but are assigned a letter based on their arbitrary order of occurrence in the specific strain genome assembly to aid data navigation (these labels for specific copies should not be compared phylogenetically or across strains). Data available in Supplementary File 4.
Genome and megaplasmid.
Only one copy could be mined from each of the four currently available partial P. pantotrophus genomes (strains J40, J46, DSM1403 and DSM 11073); in each, it was 100% identical to the amplicon.
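In the Kleiner gene-copy table, the three '% avg.' columns for each OTU sum to 100% (within rounding). One plausible reading, stated here as an assumption rather than the authors' procedure, is that counts are averaged over the replicates of each condition (Celleq, Proteq, Uneven) and the averages are then expressed as percentages of their sum. The sketch below does exactly that and ignores any sequencing-depth normalization; the function and sample names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def condition_shares(sample_counts: dict[str, int],
                     sample_condition: dict[str, str]) -> dict[str, float]:
    """Percent of one OTU's mean abundance falling in each condition (assumed method)."""
    by_condition: dict[str, list[int]] = defaultdict(list)
    for sample, count in sample_counts.items():
        by_condition[sample_condition[sample]].append(count)
    # Average over replicates within each condition.
    means = {cond: mean(vals) for cond, vals in by_condition.items()}
    total = sum(means.values())
    if total == 0:
        return {cond: 0.0 for cond in means}
    # Scale the condition means so that they sum to 100.
    return {cond: 100.0 * m / total for cond, m in means.items()}


conditions = {"C1": "Celleq", "C2": "Celleq", "P1": "Proteq",
              "P2": "Proteq", "U1": "Uneven", "U2": "Uneven"}
shares = condition_shares({"C1": 10, "C2": 30, "P1": 20, "P2": 20, "U1": 50, "U2": 70},
                          conditions)
print(shares)  # {'Celleq': 20.0, 'Proteq': 20.0, 'Uneven': 60.0}
```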
Figure 2. ANCHOR OTU and gene copy alignment for B. vulgatus ATCC 8482 in Kozich's mock community.
The B. vulgatus ATCC 8482 genome (GCA_000012825.1 ASM1282v1) was downloaded from NCBI and explored using Geneious 7.1.9 (https://www.geneious.com). All sequences are provided in Supplementary File 4. All seven expected 16S rRNA gene copies of B. vulgatus ATCC 8482 are illustrated at full length (Bv‐rrsA‐H) with the three corresponding ANCHOR OTUs (amplicons) highlighted. [Correction added on 18 June 2019, after first online publication: Figure 2 caption has been corrected in this version].
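The '100% similarity to gene copy' relationship illustrated in this alignment amounts to an exact containment test between an OTU amplicon and each full-length copy. A minimal sketch, assuming uppercase DNA strings and checking both strands; `copies_matching_otu` and the toy sequences are hypothetical and far shorter than real 16S genes.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def copies_matching_otu(otu_seq: str, copies: dict[str, str]) -> list[str]:
    """Labels of gene copies containing the OTU exactly (100% identity), either strand."""
    rc = reverse_complement(otu_seq)
    return sorted(label for label, seq in copies.items()
                  if otu_seq in seq or rc in seq)


copies = {"Bv-rrsA": "TTACGTACGGA",   # contains the OTU on the forward strand
          "Bv-rrsB": "TTACGAACGGA",   # one mismatch: no exact match
          "Bv-rrsC": "CCGTACGTCC"}    # contains the reverse complement
print(copies_matching_otu("ACGTAC", copies))  # ['Bv-rrsA', 'Bv-rrsC']
```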
Comparison with the most abundant organisms found by Lang et al. (Lang et al., 2017).
| ANCHOR OTU 19 most abundant species | % Total raw counts | Amplicon ambiguity |
|---|---|---|
| Staphylococcus MS_3 | 8.77 | 12 = |
| Lawsonella clevelandensis_1 | 4.32 | Unique |
| Lactobacillus MS_5 | 3.98 | 4 = |
| Streptococcus MS_6 | 2.52 | 5 = |
| Corynebacterium tuberculostearicum_1 | 2.20 | Unique |
| Homo Sapiens_53 | 2.15 | Unique |
| Homo Sapiens_40 | 1.52 | Unique |
| Pseudomonas MS_4 | 1.39 | 9 = |
| Akkermansia muciniphila_1 | 0.93 | Unique |
| Haemophilus parainfluenzae_1 | 0.92 | Unique |
| Pseudomonas lini_1 | 0.82 | Unique |
| Alistipes_2 | 0.81 | Unique |
| Corynebacterium MS_9 | 0.81 | 3 = |
| Homo Sapiens_4 | 0.80 | Unique |
| Finegoldia magna_1 | 0.73 | Unique |
| Corynebacterium MS_12 | 0.72 | 2 = |
| Bacteroides fragilis_1 | 0.68 | Unique |
| Acinetobacter johnsonii_1 | 0.65 | Unique |
Equivalent ANCHOR OTUs to the stated dominant genera are provided (the dominant genus in the order did not include the most abundant species in all cases). All 3347 ANCHOR OTUs, relative abundance and annotation as well as count distribution, blast statistics, alternative database hits and sequences are provided in Supplementary file 5.
Corynebacterium has now been placed in the order Corynebacteriales (Corynebacteriales ord. nov. Goodfellow and Jones 2015).
The second most abundant Corynebacterium genus annotated OTU in Lang et al. (Lang et al., 2017) was equivalent to ANCHOR OTU C. tuberculostearicum_1 at 100% similarity.
Revised from presented data in Lang et al. (Lang et al., 2017) using their raw data.
Figure 3. Total community makeup from International Space Station Destiny and Harmony module surface swabs.
Krona graph [139] presenting an overview of OTUs and their abundance across all samples. The complete OTU table, including relative abundance, annotation, count distribution, BLAST statistics, alternative database hits and sequences, is provided in Supplementary file 5. MS, MG and MF refer to annotation as potentially multiple species, genera or families due to sequence conservation at the amplified region. Interactive figure available at https://github.com/gonzalezem/ANCHOR/tree/master/article. [Correction added on 18 June 2019, after first online publication: Figure 3 caption has been corrected in this version].
Figure 4. Destiny and Harmony module community comparison.
A. Diagram of the ISS (https://www.nasa.gov/feature/facts-and-figures), in which the Destiny module is labelled U.S. Lab and the Harmony module is labelled Node 2 (which includes the sleeping stations).
B. Photograph (ISS016‐E‐012617, 24 Nov. 2007) of the Destiny and Harmony modules: astronaut Peggy Whitson (Expedition 16 commander, in frame) works during a 7‐h, 4‐min spacewalk with astronaut Daniel Tani (out of shot) to outfit the Harmony module in its position in front of the Destiny module.
C. Microbial community alpha diversity, as measured by the Shannon and inverse Simpson indices, differed significantly between the Destiny and Harmony modules (t‐test, p < 0.05).
D. Composition of ISS communities in the Harmony and Destiny modules, represented by PCoA on Bray-Curtis distances (PERMANOVA, Pr < 0.05).
The first coordinate explains 22.3% of the total variation and the second 17.0%. Destiny n = 4 and Harmony n = 10 samples. Further richness and ordination analyses are available at https://github.com/gonzalezem/ANCHOR/tree/master/article. [Correction added on 18 June 2019, after first online publication: Figure 4 caption has been corrected in this version].
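Panels C and D rest on standard community-ecology calculations. The sketch below, assuming a samples × OTUs count matrix, computes the Shannon (natural log) and inverse Simpson indices and a classical PCoA on Bray-Curtis dissimilarities. Each axis is reported as a percentage of the summed positive eigenvalues; Bray-Curtis is not Euclidean, so negative eigenvalues can occur, and corrections such as Lingoes or Cailliez are not applied. It does not reproduce the t-test or PERMANOVA and is not the authors' pipeline; all function names are hypothetical.

```python
import math

import numpy as np


def shannon(counts) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts) -> float:
    """Inverse Simpson index 1 / sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts)


def bray_curtis(x: np.ndarray, y: np.ndarray) -> float:
    """Bray-Curtis dissimilarity sum|x - y| / sum(x + y) between two samples."""
    return float(np.abs(x - y).sum() / (x + y).sum())


def pcoa(counts, n_axes: int = 2):
    """Classical PCoA on Bray-Curtis dissimilarities between rows (samples).

    Returns (coordinates, percent of variation explained per axis).
    """
    x = np.asarray(counts, dtype=float)
    n = x.shape[0]
    d = np.array([[bray_curtis(x[i], x[j]) for j in range(n)] for i in range(n)])
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring   # Gower double-centring
    eigvals, eigvecs = np.linalg.eigh(b)        # returned in ascending order
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    explained = 100.0 * eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained


print(round(shannon([25, 25, 25, 25]), 4))  # 1.3863
print(inverse_simpson([25, 25, 25, 25]))    # 4.0
```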