
Non-specific amplification of human DNA is a major challenge for 16S rRNA gene sequence analysis.

Sidney P Walker1,2,3,4, Maurice Barrett3,4, Glenn Hogan1,2, Yensi Flores Bueso1,2,3, Marcus J Claesson3,4, Mark Tangney5,6,7.   

Abstract

The targeted sequencing of the 16S rRNA gene is one of the most frequently employed techniques in the field of microbial ecology, and the bacterial communities of a wide variety of niches in the human body have been characterised in this way. This is performed by targeting one or more hypervariable (V) regions within the 16S rRNA gene in order to produce an amplicon suitable in size for next generation sequencing. To date, all technical research has focused on the ability of different V regions to accurately resolve the composition of bacterial communities. We present here an underreported artefact associated with 16S rRNA gene sequencing, namely the off-target amplification of human DNA. By analysing 16S rRNA gene sequencing data from a selection of human sites, we highlighted samples susceptible to this off-target amplification when using the popular primer pair targeting the V3-V4 region of the gene. The most severely affected sample type identified (breast tumour samples) was then re-analysed using the V1-V2 primer set, showing a considerable reduction in off-target amplification. Our data indicate that human biopsy samples should preferably be amplified using primers targeting the V1-V2 region. It is shown here that these primers result in, on average, 80% fewer human genome-aligning reads, allowing for more statistically significant analysis of the bacterial communities residing in these samples.

Year:  2020        PMID: 33004967      PMCID: PMC7529756          DOI: 10.1038/s41598-020-73403-7

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

This communication highlights off-target amplification of human DNA in 16S rRNA gene sequencing, detailing the circumstances necessary for this to occur, and the effects on ensuing research. Such artefacts are not a universal problem, and only occur in samples containing an overwhelming ratio of human to bacterial DNA. This leaves stool samples and skin samples, which contain less than 10% and 90% human DNA respectively, unaffected, but can critically impact the analysis of human biopsy samples, where over 97% of the DNA present is of human origin[1]. Given the increased use of human biopsies from a number of body sites in microbiome research[2-5], this communication serves as a timely and, to our knowledge, unique methodological warning and remedy, particularly as only one mention of this issue can currently be found in the literature[6].

Currently, comparisons of primer pairs and the hypervariable regions they target in the 16S rRNA gene have focused exclusively on differing levels of taxonomic resolution and specificity[7,8]. The degree to which bacterial resolution is lost to the production of human-derived amplicons has, so far, received no attention. This is because workflows for the analysis of 16S rRNA gene sequencing data typically remove reads that fall too far from the mean or median sequence length, or that are not classified taxonomically as originating from bacterial DNA. This is effective in ensuring that the presence of amplified human DNA does not have any impact on downstream analysis. Unaddressed is the fact that in a sequencing experiment yielding a finite amount of data (up to 15 Gb on a typical MiSeq run[9]), a significant proportion of this can be wasted due to off-target amplification. This affects sequencing studies in two ways. Prospectively: If this loss of data is anticipated, fewer samples can be sequenced on a given sequencing run, adding to an expense which is already prohibitive for smaller labs. Retrospectively: If this loss of data is not anticipated, insufficient bacterial reads may be yielded to accurately characterise the samples being sequenced, particularly if attempting to identify the prevalence of rare taxa between different treatment groups.

Here, we show that the most commonly used primer set for 16S rRNA sequencing, targeting the V3–V4 hypervariable regions, is particularly susceptible to this off-target amplification, while another commonly used primer set, targeting the V1–V2 hypervariable regions, shows almost no off-target amplification, as outlined in Fig. 1. While this off-target amplification does not appear to affect research using stool or skin swab samples, we would urge all groups carrying out metataxonomic analysis of low microbial biomass human biopsy samples using high throughput sequencing to use the V1–V2 primer set in future.
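The prospective and retrospective costs described above come down to simple arithmetic on the read budget of a run. The following is a minimal sketch, assuming reads are shared evenly between samples; the function names and all figures used with them are hypothetical illustrations, not values from this study.

```python
import math


def bacterial_reads_per_sample(total_reads, n_samples, human_fraction):
    """Expected non-human reads per sample, assuming reads are shared evenly."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must lie between 0 and 1")
    return total_reads * (1.0 - human_fraction) / n_samples


def max_samples_per_run(total_reads, target_bacterial_reads, human_fraction):
    """Largest number of samples that still reaches the target bacterial depth."""
    usable_reads = total_reads * (1.0 - human_fraction)
    return math.floor(usable_reads / target_bacterial_reads)
```

Under these assumptions, a run in which three quarters of the reads are human-derived supports only a quarter of the samples it could otherwise carry at the same bacterial read depth.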
Figure 1

Proposed mechanism for off-target amplification of mammalian DNA by V3–V4 primers, as opposed to V1–V2 primers. (A) DNA extracted from human biopsies is known to contain large proportions of human DNA. In these circumstances, the degenerate V3–V4 primers, which also align to a region in human mitochondrial DNA (as shown), can bind and amplify human DNA. There is no such alignment for the degenerate V1–V2 primers. (B) Off-target amplification significantly alters the 16S rRNA gene sequencing profile of a sample.

Materials/methods

Sample collection

Breast tissue was collected from women undergoing breast surgery at Cork University Hospital, Cork, Ireland. Breast tumour core-biopsies were aseptically resected using an Achieve 14G Breast Biopsy System (Iskus Health, UT, USA). The specimens were transported in sterile PBS to the lab, where they were flash-frozen and kept at − 80 °C until further processing. DNA from the specimens was purified following the protocol and reagents provided in the Ultra Deep Microbiome Prep (Molzym, GmbH & Co. KG., Bremen, Germany) and eluted in 100 µl of Tris–HCl.

DNA purification

Samples were processed and DNA purified following the procedures specified in protocols listed in Table 1. In all cases, DNA was eluted in Tris–HCl buffer and stored at − 20 °C until further analysis.
Table 1

Samples and corresponding DNA extraction strategy.

Sample | DNA extraction strategy
Breast: tumour and normal | Molzym Ultradeep Microbiome (Molzym, Bremen, Germany)
Oesophageal biopsies | AllPrep DNA/RNA Mini Kit (Qiagen, Hilden, Germany) with modifications[10]
Skin swab samples | QIAamp UCP Pathogen Mini Kit (Qiagen, Hilden, Germany)
Stool samples | Repeated bead beating method as previously described, with modifications[11, 12]

16S rRNA gene sequencing library preparation

Genomic DNA was amplified by PCR with primers targeting the hypervariable V1–V2 region or the V3–V4 region of the 16S rRNA gene. Table 2 details the primer sequences, together with the adapter sequences (underlined in the original table) included for compatibility with the Illumina 16S Metagenomic Sequencing Protocol (Illumina, CA, USA).
Table 2

Primers used for 16S rRNA gene sequencing analysis.

Region | Name | F/R | Sequence
V1–V2[13, 14] | S-D-Bact-0027-b-S-20 | F | 5′-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG AGM GTT YGA TYM TGG CTC AG
V1–V2[13, 14] | S-D-Bact-0338-a-A-18 | R | 5′-GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA G GCT GCC TCC CGT AGG AGT
V3–V4[15] | S-D-Bact-0341-b-S-17 | F | 5′-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG CCT ACG GGN GGC WGC AG
V3–V4[15] | S-D-Bact-0785-a-A-21 | R | 5′-GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA G GAC TAC HVG GGT ATC TAA TCC
For breast tumour and normal adjacent samples, amplification was performed in 50 µl reactions containing 1X NEBNext High Fidelity 2X PCR Master Mix (NEB, USA), 0.5 µM of each primer, 8 µl template (5–15 ng/µl) and 12 µl nuclease-free water. The thermal profile comprised an initial denaturation at 98 °C for 30 s, followed by 25 cycles of denaturation at 98 °C for 10 s, annealing at 55 °C for 30 s (V3–V4) or 62 °C for 30 s (V1–V2) and extension at 72 °C for 30 s, plus a final extension at 72 °C for 5 min. Amplification was confirmed by running 5 µl of PCR product on a 2% agarose gel and visualising a band of ≈ 310 bp for V1–V2 or ≈ 460 bp for V3–V4.

Faecal microbial genomic DNA was amplified using Phusion High-Fidelity DNA Polymerase (Thermo Scientific, Massachusetts, USA) with the following PCR thermocycler protocol: an initiation step of 98 °C for 3 min, followed by 25 cycles of 98 °C for 30 s, 55 °C for 60 s and 72 °C for 20 s, and a final extension step of 72 °C for 5 min. Microbial genomic DNA from oesophageal biopsies and skin swab samples was amplified using MTP Taq DNA Polymerase (Merck KGaA, Darmstadt, Germany) with the following PCR thermocycler protocol: an initiation step of 94 °C for 1 min, followed by 35 cycles of 94 °C for 60 s, 55 °C for 45 s and 72 °C for 30 s, and a final extension step of 72 °C for 5 min.

An index PCR was performed to add sample-specific DNA barcodes to sample amplicons in accordance with the Illumina 16S Metagenomic Sequencing Protocol (Illumina, California, USA)[16]. Library DNA concentration was quantified using a Qubit fluorometer (Invitrogen) with the ‘High Sensitivity’ assay, and samples were pooled at a standardised concentration[16]. The pooled library was sequenced on the Illumina MiSeq platform (Illumina, California, USA) utilising 2 × 300 bp chemistry.

16S rRNA sequence analysis

The quality of the paired-end sequencing data was visualised using FastQC v0.11.9, and reads were trimmed using Trimmomatic v0.39, ensuring a minimum average quality of 25. Reads were then imported into the R environment v3.6.3[17] to be resolved into Amplicon Sequence Variants (ASVs) by the DADA2 package v1.12.
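To make the idea of a minimum average quality concrete, the sketch below computes the mean Phred score of a FASTQ quality line and keeps only records that reach a threshold of 25. It is an illustration of the principle only, assuming Phred+33 encoding; it is not Trimmomatic's implementation, and the function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one FASTQ quality line (Phred+33 assumed by default)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_fastq_records(records, min_mean=25.0):
    """Keep (header, sequence, quality) tuples whose mean quality meets the threshold."""
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean]
```

A record whose quality line averages exactly 25 is retained here; whether the boundary is inclusive in a given trimming tool should be checked against that tool's documentation.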

Contamination control

In all samples a contamination control strategy was implemented in keeping with the RIDE checklist as proposed by Eisenhofer et al.[18], incorporating aseptic techniques and a variety of negative controls from different stages of the sample-to-sequence data process. Retrospective contamination assessment and removal based on sequencing data from negative controls was also performed following published guidelines[19].

Retrospective bioinformatics based removal of human amplicons

Sequencing reads aligning to the human genome (GRCh38) within the fasta file generated by DADA2 were identified using Bowtie2[20]. To confirm that reads mapped to the human genome were not erroneously aligned bacterial reads, all human-aligning reads were classified with Mothur[21], using the RDP database v11.4 as a reference.
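Bowtie2 reports alignments in SAM format, in which bit 0x4 of the FLAG field marks an unmapped read. The sketch below shows one way the human-aligning ASVs could be picked out of such output and turned into a per-sample percentage. It assumes that each DADA2 sequence was given a unique identifier in the fasta file; the function names and the ASV identifiers in the usage are hypothetical.

```python
def human_aligned_ids(sam_lines):
    """Return the names of sequences that aligned to the reference genome."""
    aligned = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means the sequence is unmapped
            aligned.add(name)
    return aligned


def percent_human(asv_counts, human_ids):
    """Percentage of one sample's reads that belong to human-aligning ASVs."""
    total = sum(asv_counts.values())
    if total == 0:
        return 0.0
    human = sum(n for asv, n in asv_counts.items() if asv in human_ids)
    return 100.0 * human / total
```

In practice `sam_lines` could simply be an open file handle on the Bowtie2 output, and `asv_counts` one row of the ASV count table.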

Statistical analysis and data visualisation

All statistical analysis was carried out in the R environment, using the following libraries: Phyloseq v1.30, Vegan v2.5.6, ggplot2 v3.3.0 and reshape2 v1.4.3.

Ethical approval

All procedures in this study were performed in accordance with national ethical guidelines, following ethical approval from the University College Cork Clinical Research Committee.

Informed consent

Patients provided written informed consent for sample collection and subsequent analyses.

Results and discussion

All three biopsy sample types in which an overwhelming ratio of host DNA was expected (normal breast, breast tumour and oesophageal) showed significant off-target amplification of human DNA when amplified using the V3–V4 primer set (Fig. 2). This was not seen when sequencing samples with lower levels of human DNA, such as skin swabs and stool samples. An average of 34.1% of all Amplicon Sequence Variants (ASVs) detected in normal breast tissue samples were shown to align to the human genome GRCh38 using Bowtie2. This included the most prevalent ASV, which was identified further using BLAST as Homo sapiens haplogroup H8 mitochondrion, complete genome (Accession no. MN986463.1), with an E-value of 7e-138 and 100% identity. In the breast tumour samples, 77.2% of all ASVs detected aligned to the human genome, with the most prevalent ASV again being identified as Homo sapiens haplogroup H8 mitochondrion, complete genome (Accession no. MN986463.1), with an E-value of 7e-138 and 100% identity. The same pattern was seen in oesophageal biopsies, with 55.6% of ASVs aligning to the human genome and the same top BLAST hit (Homo sapiens haplogroup H8 mitochondrion, complete genome, Accession no. MN986463.1, E-value of 7e-138 and 100% identity). The skin swab samples showed a much lower level of amplification of human DNA; these reads were present at very low levels and aligned to chromosomal DNA, most frequently Homo sapiens chromosome 17, clone RP11-646F1, complete sequence.
Figure 2

The scale of the problem of off-target amplification. Percentage of sequencing reads, produced by MiSeq 2 × 300 bp sequencing of amplicons generated with primers targeting the V3–V4 regions, that were shown to align to the human genome.

While human contamination is a very common problem in amplification-free shotgun metagenomic sequencing strategies[22], it is under-reported as an issue for 16S rRNA gene sequencing, due to the use of bacteria/archaea-specific primers. However, degenerate primers are routinely used for 16S rRNA sequencing[23]. This increases coverage, in terms of the number of 16S rRNA sequences matched by at least one primer, but also allows for off-target amplification of non-bacterial DNA. Figure 1A shows that the V3–V4 primers align to a region within the human mitochondrial DNA. We show here that when the ratio of host:bacterial DNA is overwhelming, human mitochondrial DNA can be amplified by primers targeting the 16S rRNA gene region. To ensure the validity of the results, reads identified as aligning to the human genome using Bowtie2 were classified using the Mothur[21] classifier trained on the RDP database. In all cases, the reads identified as aligning to the human genome could not be classified when screened against the RDP database, as shown in Table 3.
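The trade-off introduced by degenerate primers can be illustrated with the IUPAC nucleotide codes. The sketch below counts how many distinct oligonucleotides a degenerate primer represents and scans a template for its closest binding site. The gene-specific portions of the Table 2 primers (the part after the Illumina adapter) are used in the usage note; the function names and the template sequence are hypothetical, and only the given strand is searched.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n


def mismatches(primer, site):
    """Positions at which the template base is not allowed by the primer code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def best_binding_site(primer, template):
    """Return (position, mismatches) of the closest match, or None if too short."""
    best = None
    for i in range(len(template) - len(primer) + 1):
        m = mismatches(primer, template[i:i + len(primer)])
        if best is None or m < best[1]:
            best = (i, m)
    return best
```

For example, `degeneracy("CCTACGGGNGGCWGCAG")` gives 8 for the V3–V4 forward primer, whereas the V1–V2 reverse primer `GCTGCCTCCCGTAGGAGT` contains no degenerate positions and gives 1. A real off-target search would also need to consider the reverse complement and the tolerance of the polymerase for mismatches, which this sketch does not model.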
Table 3

Summary of Mothur output when classifying reads identified as aligning to the human genome by Bowtie2.

Sample | % reads unclassified at Kingdom level | % reads unclassified at Phylum level
Oesophageal samples | 99.5373235 | 0.4626765
Normal adjacent samples | 98.867576 | 1.132424
Tumour samples | 98.710027 | 1.289973
Skin samples | 99.8588468 | 0.1411532

The most heavily affected sample type in our study (breast tumour tissue) was reanalysed by performing a pairwise comparison of samples amplified with the V3–V4 and V1–V2 primer sets (Fig. 3).
Figure 3

Rarefaction curve generated by plotting observed species vs read depth on a per sample basis. (A) Rarefaction curve prior to removal of human genome aligning reads. (B) Rarefaction curve following removal of human genome aligning reads.

Rarefaction curves were generated by plotting the number of observed species against the number of reads per sample. The curves for the paired breast tumour samples amplified with the V1–V2 and V3–V4 primer pairs show a clear difference between the two groups. Figure 3A shows that, prior to the removal of human reads, the distribution of samples in this 2D plane appears to be stochastic. Figure 3B, following removal of human reads, shows clearly that samples amplified with the V1–V2 primer pair consistently yield more observable species, a greater number of reads per sample, and a plateauing of the rarefaction curve, which suggests that sufficient sampling depth is available for accurate characterisation. The community structure in samples amplified with V1–V2 primers was visually similar to that in samples amplified with V3–V4 primers (Fig. 4A), and no bacterial family was found to be significantly elevated using one primer set over the other as per the Wilcoxon signed-rank test, once p-values had been corrected for multiple testing using the FDR method (Supplementary Table 1). There was also no significant difference in terms of Shannon diversity (Fig. 4B), indicating that the choice of primers did not have any adverse effect on the downstream results. Of considerable interest to any groups carrying out low biomass research in the future is the huge discrepancy in the number of reads yielded once human contamination had been filtered out. As can be seen in Fig. 4C, samples amplified with primers targeting the V1–V2 region have a consistently and significantly higher number of ASVs per sample following the removal of ASVs aligning to the human genome.
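A rarefaction curve of the kind described above can be sketched by repeatedly subsampling a sample's reads without replacement and counting the distinct ASVs observed at each depth. The following is a minimal illustration rather than the routine used in this study (the analysis here was carried out in R); the function name, the seed and the depths are hypothetical choices.

```python
import random


def rarefaction_curve(asv_counts, depths, seed=0):
    """Observed ASVs at each subsampling depth for a single sample.

    asv_counts maps ASV id -> read count; depths is an increasing list of depths.
    """
    rng = random.Random(seed)
    # Expand the count table into one entry per read.
    pool = [asv for asv, n in asv_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            break  # cannot subsample more reads than the sample contains
        drawn = rng.sample(pool, depth)  # sampling without replacement
        curve.append((depth, len(set(drawn))))
    return curve
```

Because each depth is drawn independently, a single curve can wobble slightly; averaging over several seeds smooths it. A curve that stops rising well before the final depth corresponds to the plateau taken above as evidence of sufficient sampling depth.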
Figure 4

Pairwise comparison of matched samples using primers targeting the V1–V2 and V3–V4 regions of the 16S rRNA gene fragment. (A) Sample composition at the family level of paired samples. (B) Average Shannon Diversity comparison between samples amplified using V1–V2 primers (red) and V3–V4 primers (blue). (C) Percentage of total sequencing reads aligning to human genome. In both (B) and (C) statistical testing is performed using Wilcoxon signed-rank test.
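The Shannon diversity compared in panel (B) is H = -Σ p_i ln p_i, where p_i is the proportion of reads assigned to taxon i. The sketch below computes it from a vector of counts using the natural logarithm; it is an illustration with a hypothetical function name, not the R code used for the figure.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h
```

A sample dominated by a single taxon gives a value of 0, while two equally abundant taxa give ln 2, so the index rises with both richness and evenness.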

Future perspectives

Third generation sequencing technologies, such as those produced by Oxford Nanopore Technologies and Pacific Biosciences, are now being utilised in 16S rRNA gene sequencing experiments. The Pacific Biosciences SMRT platform has shown the greatest promise in this regard with the implementation of “Circular Consensus Sequencing” in conjunction with denoising algorithms, allowing for the production of long reads of high quality[24]. Earl et al. showed that this new method, using degenerate primers targeting the entire 16S rRNA gene, still resulted in off-target amplification of the human genome[25]. This study also noted that this off-target amplification was related to the ratio of human to bacterial DNA. The human genome must be considered when designing or choosing primers now and in the future.

Supplementary Table 1.
References (10 of 23 listed)

1.  Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities.

Authors:  Patrick D Schloss; Sarah L Westcott; Thomas Ryabin; Justine R Hall; Martin Hartmann; Emily B Hollister; Ryan A Lesniewski; Brian B Oakley; Donovan H Parks; Courtney J Robinson; Jason W Sahl; Blaz Stres; Gerhard G Thallinger; David J Van Horn; Carolyn F Weber
Journal:  Appl Environ Microbiol       Date:  2009-10-02       Impact factor: 4.792

2.  Towards standards for human fecal sample processing in metagenomic studies.

Authors:  Paul I Costea; Georg Zeller; Shinichi Sunagawa; Eric Pelletier; Adriana Alberti; Florence Levenez; Melanie Tramontano; Marja Driessen; Rajna Hercog; Ferris-Elias Jung; Jens Roat Kultima; Matthew R Hayward; Luis Pedro Coelho; Emma Allen-Vercoe; Laurie Bertrand; Michael Blaut; Jillian R M Brown; Thomas Carton; Stéphanie Cools-Portier; Michelle Daigneault; Muriel Derrien; Anne Druesne; Willem M de Vos; B Brett Finlay; Harry J Flint; Francisco Guarner; Masahira Hattori; Hans Heilig; Ruth Ann Luna; Johan van Hylckama Vlieg; Jana Junick; Ingeborg Klymiuk; Philippe Langella; Emmanuelle Le Chatelier; Volker Mai; Chaysavanh Manichanh; Jennifer C Martin; Clémentine Mery; Hidetoshi Morita; Paul W O'Toole; Céline Orvain; Kiran Raosaheb Patil; John Penders; Søren Persson; Nicolas Pons; Milena Popova; Anne Salonen; Delphine Saulnier; Karen P Scott; Bhagirath Singh; Kathleen Slezak; Patrick Veiga; James Versalovic; Liping Zhao; Erwin G Zoetendal; S Dusko Ehrlich; Joel Dore; Peer Bork
Journal:  Nat Biotechnol       Date:  2017-10-02       Impact factor: 54.908

3.  Tumor Microbiome Diversity and Composition Influence Pancreatic Cancer Outcomes.

Authors:  Erick Riquelme; Yu Zhang; Liangliang Zhang; Maria Montiel; Michelle Zoltan; Wenli Dong; Pompeyo Quesada; Ismet Sahin; Vidhi Chandra; Anthony San Lucas; Paul Scheet; Hanwen Xu; Samir M Hanash; Lei Feng; Jared K Burks; Kim-Anh Do; Christine B Peterson; Deborah Nejman; Ching-Wei D Tzeng; Michael P Kim; Cynthia L Sears; Nadim Ajami; Joseph Petrosino; Laura D Wood; Anirban Maitra; Ravid Straussman; Matthew Katz; James Robert White; Robert Jenq; Jennifer Wargo; Florencia McAllister
Journal:  Cell       Date:  2019-08-08       Impact factor: 41.582

4.  Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies.

Authors:  Anna Klindworth; Elmar Pruesse; Timmy Schweer; Jörg Peplies; Christian Quast; Matthias Horn; Frank Oliver Glöckner
Journal:  Nucleic Acids Res       Date:  2012-08-28       Impact factor: 16.971

5.  A non-endoscopic device to sample the oesophageal microbiota: a case-control study.

Authors:  Daffolyn R Fels Elliott; Alan W Walker; Maria O'Donovan; Julian Parkhill; Rebecca C Fitzgerald
Journal:  Lancet Gastroenterol Hepatol       Date:  2016-11-12

6.  Can Targeting Non-Contiguous V-Regions With Paired-End Sequencing Improve 16S rRNA-Based Taxonomic Resolution of Microbiomes?: An In Silico Evaluation.

Authors:  Nishal Kumar Pinna; Anirban Dutta; Mohammed Monzoorul Haque; Sharmila S Mande
Journal:  Front Genet       Date:  2019-07-12       Impact factor: 4.599

7.  Impact of Host DNA and Sequencing Depth on the Taxonomic Resolution of Whole Metagenome Sequencing for Microbiome Analysis.

Authors:  Joana Pereira-Marques; Anne Hout; Rui M Ferreira; Michiel Weber; Ines Pinto-Ribeiro; Leen-Jan van Doorn; Cornelis Willem Knetsch; Ceu Figueiredo
Journal:  Front Microbiol       Date:  2019-06-12       Impact factor: 5.640

8.  High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution.

Authors:  Benjamin J Callahan; Joan Wong; Cheryl Heiner; Steve Oh; Casey M Theriot; Ajay S Gulati; Sarah K McGill; Michael K Dougherty
Journal:  Nucleic Acids Res       Date:  2019-10-10       Impact factor: 16.971

9.  Improving saliva shotgun metagenomics by chemical host DNA depletion.

Authors:  Clarisse A Marotz; Jon G Sanders; Cristal Zuniga; Livia S Zaramela; Rob Knight; Karsten Zengler
Journal:  Microbiome       Date:  2018-02-27       Impact factor: 14.650

10.  Species-level bacterial community profiling of the healthy sinonasal microbiome using Pacific Biosciences sequencing of full-length 16S rRNA genes.

Authors:  Joshua P Earl; Nithin D Adappa; Jaroslaw Krol; Archana S Bhat; Sergey Balashov; Rachel L Ehrlich; James N Palmer; Alan D Workman; Mariel Blasetti; Bhaswati Sen; Jocelyn Hammond; Noam A Cohen; Garth D Ehrlich; Joshua Chang Mell
Journal:  Microbiome       Date:  2018-10-23       Impact factor: 14.650

