Stéphanie Nouws, Bert Bogaerts, Bavo Verhaegen, Sarah Denayer, Denis Piérard, Kathleen Marchal, Nancy H C Roosens, Kevin Vanneste, Sigrid C J De Keersmaecker.
Abstract
Whole genome sequencing (WGS) has proven to be the ultimate tool for bacterial isolate characterization and relatedness determination. However, standardized and harmonized workflows, e.g. for DNA extraction, are required to ensure robust and exchangeable WGS data. Data sharing between (inter)national laboratories is essential to support foodborne pathogen control, including outbreak investigation. This study evaluated eight commercial DNA preparation kits for their potential influence on: (i) DNA quality for Nextera XT library preparation; (ii) MiSeq sequencing (data quality, read mapping against plasmid and chromosome references); and (iii) WGS data analysis, i.e. isolate characterization (serotyping, virulence and antimicrobial resistance genotyping) and phylogenetic relatedness (core genome multilocus sequence typing and single nucleotide polymorphism analysis). Shiga toxin-producing Escherichia coli (STEC) was selected as a case study. Overall, data quality and inferred phylogenetic relationships between isolates were not affected by the DNA extraction kit choice, irrespective of the presence of confounding factors such as EDTA in DNA solution buffers. Nevertheless, completeness of STEC characterization was, although not substantially, influenced by the plasmid extraction performance of the kits, especially when using Nextera XT library preparation. This study contributes to addressing the WGS challenges of standardizing protocols to support data portability and to enable full exploitation of its potential.
Year: 2020 PMID: 32887913 PMCID: PMC7474065 DOI: 10.1038/s41598-020-71207-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of selected DNA extraction kit characteristics and performance.
| DNA extraction kit | Sample cost (€)* / total no. reactions | DNA extraction method | Average completion time (h:min) | Average DNA yield (µg/ml culture) ± s.d. | Average DNA concentration (ng/µl) ± s.d. | DNA purity A260/280 (average ± s.d.) | DNA purity A260/230 (average ± s.d.) | Length range of fragments (kb) | Average DIN** ± s.d. | General convenience of protocol | Plasmid extraction performance | Remark |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DNeasy Blood & Tissue*** - Qiagen | €3.76/250 | Solid-phase (silica column) | 2:30 | 2.40 ± 0.42 | 23.99 ± 4.24 | 1.99 ± 0.01 | 1.78 ± 0.13 | [19.88, 25.49] | 7.07 ± 0.24 | +++ | Impaired | Used by CDC |
| DNeasy UltraClean Microbial - Qiagen | €2.87/250 | Solid-phase (bead beating) | 2:50 | 0.78 ± 0.31 | 28.13 ± 11.04 | 1.92 ± 0.04 | 1.87 ± 0.17 | [22.50, >60.00] | 8.42 ± 0.33 | ++ | Impaired | Bead-beating: also Gram positive bacteria |
| Easy-DNA Genomic DNA Purification*** - Invitrogen | €2.29/200 | Salting-out | 3:30 | 1.98 ± 0.83 | 19.82 ± 8.28 | 1.90 ± 0.01 | 2.09 ± 0.04 | [22.81, >60.00] | 8.96 ± 0.83 | + | Impaired | HMW DNA |
| GenElute Bacterial Genomic DNA*** - Sigma-Aldrich | €2.59/350 | Solid-phase (silica column) | 2:40 | 5.53 ± 0.95 | 33.16 ± 5.70 | 1.91 ± 0.02 | 2.19 ± 0.13 | [23.37, 51.59] | 9.13 ± 0.18 | +++ | Good | Especially for Gram-negative bacteria |
| Genomic-tip 20/G - Qiagen | €10.88/75 | Solid-phase (anion-exchange) | 8:20 | 1.17 ± 0.29 | 17.58 ± 4.40 | 1.84 ± 0.04 | 1.71 ± 0.19 | [53.31, >60.00] | 8.93 ± 0.37 | + | Good | HMW DNA ensured |
| MasterPure Complete DNA Purification*** - Lucigen | €3.61/200 | Salting-out | 2:50 | 2.67 ± 0.53 | 38.14 ± 7.51 | 1.87 ± 0.03 | 1.79 ± 0.21 | [58.58, >60.00] | 9.53 ± 0.21 | + | Moderate | Recommended by Illumina |
| NucliSENS miniMag - bioMérieux | €7.19/48 | Solid-phase (magnetic beads) | 1:50 | 0.72 ± 0.18 | 7.15 ± 2.50 | 2.09 ± 0.07 | 0.76 ± 0.69 | [2.46, >60.00] | 7.47 ± 0.70 | ++ | Moderate | Possibility of automation |
| Wizard Genomic DNA Purification*** - Promega | €0.69/500 | Salting-out | 2:50 | 2.06 ± 0.65 | 22.66 ± 4.55 | 1.94 ± 0.03 | 2.10 ± 0.09 | >60.00 | 9.40 ± 0.34 | + | Impaired | High quality DNA ensured |
The number of cells used as starting material was 7.36 × 10⁸ ± 8.56 × 10⁷ per ml. All averages and ranges were calculated from the three replicates of the seven DNA extracts. Both DNA concentration and yield are shown since the applied workflows accompanying the kits differed in recommended DNA elution/rehydration volume. General convenience (labor intensity, turn-around time and handling convenience) of the kits is indicated as experienced in this study, from less (+) to more (+++) convenient. Excluding the Genomic-tip 20/G kit with its one-day protocol, all solid-phase procedures were experienced as user-friendly, while the salting-out procedures were experienced as less convenient. Similarly, the plasmid extraction performance, as observed in this study from uniquely mapped plasmid reads per million input reads, was rated as least (Impaired), medium (Moderate) or best (Good) performing. Fragment lengths are shown as ranges because the TapeStation Genomic DNA ScreenTape only gives exact measurements up to 60 kb. Note that only for the NucliSENS miniMag, extra DNA fragments of ~2.5 and ~5.5 kb were systematically observed across all DNA extracts, resulting in the very large range.
NGS next generation sequencing, HMW high molecular weight.
*Prices as of August 2020 (excl. VAT, shipping, and handling costs). Prices were calculated from kits with highest throughput. Costs of extra products or materials required but not provided with the kit were not taken into account.
**DIN: DNA Integrity Number, ranging from 1.00 for highly degraded to 10.00 for highly intact DNA[30].
***DNA was eluted/rehydrated in 10 mM Tris–HCl (pH 8.5) instead of in the EDTA-containing buffer provided with the kit (see Supplementary Table S1 online).
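The table reports both concentration and yield because the kits differ in elution/rehydration volume. A minimal sketch of how the two quantities relate is given below; the function name and the example volumes are hypothetical and are not taken from the study's protocols.

```python
def yield_ug_per_ml(conc_ng_per_ul: float, elution_ul: float, culture_ml: float) -> float:
    """Convert a DNA concentration into a yield per ml of culture.

    conc_ng_per_ul : measured DNA concentration (ng/µl)
    elution_ul     : elution/rehydration volume (µl), kit dependent
    culture_ml     : volume of culture used as starting material (ml)
    """
    total_ng = conc_ng_per_ul * elution_ul   # total DNA recovered, in ng
    total_ug = total_ng / 1000.0             # ng -> µg
    return total_ug / culture_ml             # µg per ml of culture


# Hypothetical volumes, for illustration only: 100 µl eluate from 1 ml culture.
example = yield_ug_per_ml(conc_ng_per_ul=23.99, elution_ul=100.0, culture_ml=1.0)
print(f"{example:.2f} µg/ml culture")
```

Two kits with similar concentrations can therefore differ markedly in yield when their elution volumes differ, which is why both columns are needed for a fair comparison.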
Characteristics of the selected isolates.
| TIAC reference | Outbreak (O) / non-outbreak (N) | Origin | Conventional methods: serotype | Conventional methods: virulence genes (1 = present, 0 = absent) | Conventional methods: AMR phenotype | WGS analyses: serotype | WGS analyses: AMR genes identified with all kits |
|---|---|---|---|---|---|---|---|
| TIAC1151 | O | Beef | O157:H7 | 1, 1, 1, 1 | Susceptible | O157:H7θ | |
| TIAC1152 | O | Beef | O157:H7 | 1, 1, 1, 1 | Susceptible | O157:H7 | |
| TIAC1153 | N | Swab carcass bovine | O157:H7 | 1, 1, 1, 1 | AMP, KAN, STR, SUL, TET, TMP | O157:H7 | |
| TIAC1165 | O | Human feces | O157:H7 | 1, 1, 1, 1 | Susceptible | O157:H7o | |
| TIAC1169 | O | Human feces | O157:H7 | 1, 1, 1, 1 | Susceptible | O157:H7* | |
| TIAC1638 | N | Human feces | O157:H7 | 1, 1, 1, 1 | Not tested | O157:H7 | |
| TIAC1660 | N | Human feces | O113:H21 | 0, 1, 0, 1 | Not tested | O113:H21* | |
Results on STEC serotyping, AMR susceptibility (disc diffusion), and presence of virulence genes assessed previously[18] are shown (for food isolates according to ISO/TS13136:2012). Gene presence is indicated with ‘1’, absence with ‘0’. The O- and H-type, and AMR genotype determined with WGS for each sample are indicated. Obtained serotypes and AMR genotypes per sample were independent of the applied kit. AMR gene names refer to the ResFinder[31,32] database. When no AMR genes were detected, this is indicated as “–”. No ambiguities with regard to H-typing were observed. Exceptions in O-typing are represented with a symbol:
θ Ambiguous O-typing was retrieved when processed with the DNeasy Blood & Tissue kit, in one of three sequencing run replicates (allele 201 was not called with BLAST+ for both the O157-encoding wzx and wzy genes), but could be resolved in the other two sequencing replicates.
o Ambiguous O-typing was retrieved when processed with the DNeasy UltraClean Microbial kit, in two of three sequencing run replicates (allele 201 was not called with BLAST+ for both the O157-encoding wzx and wzy genes), but could be resolved in one of the three sequencing replicates.
* Ambiguous O-typing was retrieved when processed with the Genomic-tip 20/G kit (TIAC1169: allele 201 was not called with BLAST+ for both the O157-encoding wzx and wzy genes; TIAC1660: allele 115 was not called with BLAST+ for gene wzx encoding the O113-genotype).
AMP ampicillin, KAN kanamycin, STR streptomycin, SUL sulphonamides, TET tetracycline, TMP trimethoprim.
Figure 1. Overview of median mapping depths against the Sakai E. coli O157:H7 reference genome and the Sakai E. coli pO157 plasmid for sequencing run replicates TIAC1151 and TIAC1165 per kit, sequenced in run 2. The median read mapping depth for each sample was calculated using a sliding window of 10,000 bases shifted by 5,000 bases for each data point. Abbreviations: DNeasy Blood & Tissue kit (DNeasy B & T), DNeasy UltraClean Microbial kit (DNeasy UltraClean), Easy-DNA gDNA Purification kit (Easy-DNA), GenElute Bacterial gDNA kit (GenElute), Genomic-tip 20/G kit (gTip 20), MasterPure Complete DNA Purification kit (MasterPure), NucliSENS miniMag (NucliSens), Wizard gDNA Purification kit (Wizard).
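The sliding-window calculation described in the caption can be sketched as follows. This is an illustrative implementation, not the authors' script: the function name is hypothetical, the input is assumed to be a list of per-base depths, and dropping the incomplete trailing window is an assumption, since the caption does not say how sequence ends were handled.

```python
from statistics import median


def sliding_median_depth(depths, window=10_000, step=5_000):
    """Return (window_start, median_depth) pairs along a reference.

    depths : per-base mapping depth, indexed by 0-based reference position
    window : window size in bases (10,000 in the caption)
    step   : shift between consecutive windows (5,000 in the caption)
    """
    if not depths:
        return []
    points = []
    # Assumption: only full windows are reported; a reference shorter than
    # one window yields a single point covering the whole sequence.
    last_start = max(len(depths) - window, 0)
    for start in range(0, last_start + 1, step):
        points.append((start, median(depths[start:start + window])))
    return points


# Toy example with a small window so the overlap is easy to follow.
toy_depths = [1] * 10 + [3] * 10
print(sliding_median_depth(toy_depths, window=10, step=5))
```

Because each window overlaps the previous one by half, a local drop in coverage shows up in two consecutive data points, which smooths the plotted profile.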
Figure 2. Overview of the virulence genotype obtained for all samples. Presence and absence of virulence genes are indicated in green and red, respectively, as determined using BLAST+ and SRST2. *Virulence genes detected only with SRST2; —Missed virulence genes, referred to as false negatives (neither detected with SRST2 nor BLAST+ while presence of the gene was expected, i.e. detected in the same isolate processed with a different kit, or detected in a sequencing run replicate of the isolate).
Figure 3. Average number of reads mapping uniquely to the Sakai E. coli pO157 plasmid reference, normalized per one million trimmed input reads, for the eight kits. Reads were mapped simultaneously against the Sakai E. coli pO157 plasmid (NC_002128.1) and Sakai E. coli O157:H7 genome (NC_002695.2) references, and the number of reads mapping uniquely to the plasmid was expressed per million input reads. Values are averaged over all E. coli O157:H7 samples (TIAC1151, TIAC1152, TIAC1153, TIAC1165, TIAC1169 and TIAC1638) that were generated with each kit, without inclusion of the sequencing run replicate results for TIAC1151 and TIAC1165. Bars represent the standard deviation across samples for each kit. Significant differences in average plasmid reads per million trimmed input reads were identified with the Kruskal–Wallis test (n: 48, α: 0.05, p-value: 2.80 × 10⁻⁷) followed by Dunn post-hoc analysis with Holm correction, as shown in the accompanying table with significant values in bold.
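Two steps in this caption lend themselves to a short sketch: the per-million normalization and the Holm step-down correction applied to the pairwise post-hoc p-values. The code below is a generic, standard-library illustration with hypothetical function names and made-up numbers; it does not reproduce the study's Kruskal–Wallis or Dunn statistics.

```python
def reads_per_million(plasmid_reads: int, trimmed_input_reads: int) -> float:
    """Normalize uniquely mapped plasmid reads per one million input reads."""
    return plasmid_reads * 1_000_000 / trimmed_input_reads


def holm_adjust(p_values):
    """Holm step-down adjustment of a list of p-values.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing.
    Results are returned in the original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up numbers, for illustration only.
print(reads_per_million(plasmid_reads=2_500, trimmed_input_reads=1_000_000))
print(holm_adjust([0.01, 0.04, 0.03]))
```

With eight kits there are 28 pairwise comparisons, so the smallest raw p-value would be multiplied by 28 before being compared with α.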
Figure 4. cgMLST-based tree of all samples. A minimum spanning tree was created with GrapeTree using the MSTreeV2 method on all outbreak and non-outbreak samples generated with the eight kits, excluding sequencing run replicates. All outbreak samples (TIAC1151, TIAC1152, TIAC1165 and TIAC1169) consistently cluster together, while non-outbreak samples TIAC1153, TIAC1638 and TIAC1660 are separated from the outbreak cluster and delineated per isolate. The scale bar represents the number of cgMLST allele differences between samples. One cgMLST allele difference with other outbreak samples was observed for only four samples (TIAC1152 generated with the Genomic-tip 20/G, TIAC1152 generated with the DNeasy UltraClean Microbial kit, TIAC1165 generated with the DNeasy UltraClean Microbial kit, and TIAC1153 generated with the Easy-DNA gDNA Purification kit), which is not visible in the figure because of the large scale.
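The "number of cgMLST allele differences" used as the scale in this tree can be illustrated with a plain pairwise count. The sketch below is not GrapeTree's MSTreeV2 algorithm; the function name, the locus names and the convention of skipping uncalled loci (represented as `None`) are assumptions made for illustration.

```python
def allele_differences(profile_a, profile_b):
    """Count loci with different allele calls between two cgMLST profiles.

    Each profile maps locus name -> allele number, or None when no allele
    was called. Loci missing or uncalled in either profile are skipped
    (an assumption; tools differ in how they treat missing data).
    """
    differences = 0
    for locus in profile_a.keys() & profile_b.keys():
        allele_a, allele_b = profile_a[locus], profile_b[locus]
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            differences += 1
    return differences


# Hypothetical profiles for the same isolate extracted with two kits.
kit_1 = {"locus_1": 12, "locus_2": 7, "locus_3": None}
kit_2 = {"locus_1": 12, "locus_2": 9, "locus_3": 4}
print(allele_differences(kit_1, kit_2))
```

Under this convention an uncalled locus never adds to the distance, so a kit that causes allele dropouts would not by itself separate replicates of the same isolate.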
Figure 5. SNP-based tree of all O157:H7 samples. A maximum likelihood SNP tree was generated using the K2 nucleotide substitution model, containing all O157:H7 samples. Non-O157:H7 samples (TIAC1660) were excluded from SNP calling, due to high divergence from the Sakai E. coli O157:H7 reference genome. All outbreak samples (TIAC1151, TIAC1152, TIAC1165 and TIAC1169) consistently clustered together, irrespective of the employed kit. Within the outbreak clade, all TIAC1165 samples showed a limited number of discrepant SNPs with other outbreak samples, largely confined to a hypothetical transposase region (ydcC gene). The non-outbreak samples (TIAC1153 and TIAC1638) were separated from the outbreak clade, and clustered together per isolate. Notably, for TIAC1153, the sample generated with the MasterPure Complete DNA Purification kit differed from all other TIAC1153 samples in a small number of SNPs relative to the reference genome. This difference was solely due to masking of a low-quality region (see “Results”). The distance scale bar represents the average number of nucleotide substitutions per site.
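To give a feel for a scale expressed in substitutions per site, the sketch below computes a pairwise distance under the Kimura 2-parameter model, assuming that is what "K2" refers to here. It is a simplified pairwise formula, not the maximum likelihood tree inference used for the figure, and the function name and toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q), where P and Q are the
    proportions of transitions and transversions among compared sites.
    Sites with characters other than A, C, G, T in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy alignment: one transition (A->G) in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

The model weights transitions and transversions separately, which is why the distance is slightly larger than the raw proportion of differing sites.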