High Interlaboratory Reproducibility and Accuracy of Next-Generation-Sequencing-Based Bacterial Genotyping in a Ring Trial.

Alexander Mellmann¹, Paal Skytt Andersen², Stefan Bletz³, Alexander W Friedrich⁴, Thomas A Kohl^5,6, Berit Lilje², Stefan Niemann^5,6, Karola Prior⁷, John W Rossen⁴, Dag Harmsen⁷.

Abstract

Today, next-generation whole-genome sequencing (WGS) is increasingly used to determine the genetic relationships of bacteria on a nearly whole-genome level for infection control purposes and molecular surveillance. Here, we conducted a multicenter ring trial comprising five laboratories to determine the reproducibility and accuracy of WGS-based typing. The participating laboratories sequenced 20 blind-coded Staphylococcus aureus DNA samples using 250-bp paired-end chemistry for library preparation in a single sequencing run on an Illumina MiSeq sequencer. The run acceptance criteria were sequencing outputs >5.6 Gb and Q30 read quality scores of >75%. Subsequently, spa typing, multilocus sequence typing (MLST), ribosomal MLST, and core genome MLST (cgMLST) were performed by the participants. Moreover, discrepancies in cgMLST target sequences in comparisons with the included and also published sequence of the quality control strain ATCC 25923 were resolved using Sanger sequencing. All five laboratories fulfilled the run acceptance criteria in a single sequencing run without any repetition. Of the 400 total possible typing results, 394 of the reported spa types, sequence types (STs), ribosomal STs (rSTs), and cgMLST cluster types were correct and identical among all laboratories; only six typing results were missing. An analysis of cgMLST allelic profiles corroborated this high reproducibility; only 3 of 183,927 (0.0016%) cgMLST allele calls were wrong. Sanger sequencing confirmed all 12 discrepancies of the ring trial results in comparison with the published sequence of ATCC 25923. In summary, this ring trial demonstrated the high reproducibility and accuracy of current next-generation sequencing-based bacterial typing for molecular surveillance when done with nearly completely locked-down methods.

Keywords: cgMLST; interlaboratory reproducibility; molecular subtyping; ring trial; whole-genome sequencing

Year: 2017 PMID: 28053217 PMCID: PMC5328459 DOI: 10.1128/JCM.02242-16

Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 5.948

INTRODUCTION

Today, next-generation sequencing (NGS) is increasingly used to determine the genetic relationships of bacteria on a nearly whole-genome level for infection control purposes and phylogenetic studies. In a shotgun approach, fragmented bacterial DNA is usually sequenced in a highly parallel way, resulting in millions of short reads (up to 400 nucleotides in length) that are either compared to an ideally closely related reference genome (mapping) or are assembled de novo for the subsequent extraction of genomic information. Currently, two different approaches, based on single nucleotide polymorphisms (SNPs) (1, 2) or allelic changes (core genome multilocus sequence typing [cgMLST]) (3–5), are used to extract whole-genome sequencing (WGS) information for subsequently displaying the genotypic relationship. For continuous infection control surveillance, typing methods should be highly reproducible, ideally generating identical typing results across different laboratories. Previously, we demonstrated that this is the case for spa typing that is based on the DNA sequence determination of a repetitive region of the protein A gene (spa) of Staphylococcus aureus using Sanger sequencing (6). For NGS data, it is known that different sequencing technologies exhibit different error characteristics at the read level (7, 8). Moreover, the analysis pipelines, including assemblers and analytical parameters, can influence the final typing results (7, 9, 10). However, it is unknown how reproducible the overall process of WGS-based bacterial typing is when applied in a multicenter study. Therefore, we investigated the reproducibility and accuracy of microbial WGS-based typing, employing an international ring trial of five laboratories in three European countries (Denmark, Germany, and The Netherlands).

RESULTS AND DISCUSSION

All five laboratories met the minimum run quality criteria in a single run without repetition (Table 1). Mean sample coverage was 131-fold. The coverage per sample varied markedly, from 29- to 256-fold, but only samples NGSRT07C1 and NGSRT16C3 exhibited coverages of <75-fold (see Table S1 in the supplemental material). The mean N50 assembly metric also differed markedly, whereas the mean percentages of called cgMLST targets were quite even between the laboratories (Table 1). Sample N50 values and percentages of called cgMLST targets were consistently low in samples with <75-fold coverage (see Table S1). All of the reported spa types, sequence types (STs), ribosomal STs (rSTs), and cluster types (CTs) were identical (see Table S1). Also, Sanger sequencing-based spa typing and BIGSdb revealed identical spa types, STs, and rSTs. Only in the two low-coverage samples were the rST and CT (and, for NGSRT16C3, also the ST) not assigned. Moreover, in sample NGSRT13C3, the sequence of the 16-repeat spa type t032 was not determined.

TABLE 1

Summary of sequencing run characteristics and cumulative analysis results from the five participating laboratories

Laboratory designation	Sequencing run characteristics					Sequencing analysis results
Laboratory designation	Cluster density (K/mm²)	Run output (Gb)	% reads >Q30	Mean read length (bp)^a	Mean fold coverage (SD^b, range)^c	Mean N50	Mean % called cgMLST targets (SD, range)
C1	935	8.6	89.9	225	129 (27, 43–170)	56,620	97.5 (5.7, 73.3–99.6)
C2	715	6.7	92.2	237	106 (17, 76–144)	153,228	99.2 (0.4, 98.0–99.8)
C3	1,297	10.6	87.7	238	169 (54, 29–256)	253,745	98.9 (2.2, 89.7–100)
C4	878	8.2	88.4	221	123 (16, 98–164)	101,873	99.2 (0.4, 98.5–99.9)
C5	1,247	10.8	78.1	180	129 (24, 86–187)	225,594	99.3 (0.3, 98.7–100)
Total mean	1,014	9.0	87.3	220	131 (27, 66–184)	158,212	98.8 (1.8, 91.6–99.9)

^a Read lengths after Illumina base calling and adapter removal.

^b SD, standard deviation.

^c Coverage is calculated for an S. aureus genome size of 2.8 Mb.
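The run acceptance criteria (sequencing output >5.6 Gb and >75% of reads above Q30) can be checked against the values in Table 1 with a few lines of Python. This is a minimal sketch rather than part of the ring trial protocol: the function names are hypothetical, and the coverage estimate assumes that the run output is shared evenly among the 20 samples and uses the 2.8-Mb genome size stated for the coverage calculation.

```python
# Run acceptance criteria as stated in the paper.
MIN_OUTPUT_GB = 5.6
MIN_Q30_PERCENT = 75.0
GENOME_SIZE_MB = 2.8   # S. aureus genome size used for coverage
SAMPLES_PER_RUN = 20

# (run output in Gb, % reads > Q30), copied from Table 1.
RUNS = {
    "C1": (8.6, 89.9),
    "C2": (6.7, 92.2),
    "C3": (10.6, 87.7),
    "C4": (8.2, 88.4),
    "C5": (10.8, 78.1),
}


def raw_fold_coverage(output_gb, n_samples=SAMPLES_PER_RUN, genome_mb=GENOME_SIZE_MB):
    """Average raw fold coverage per sample, assuming an even split of the output."""
    return (output_gb * 1000.0) / (n_samples * genome_mb)


def run_accepted(output_gb, q30_percent):
    """True if both thresholds are strictly exceeded."""
    return output_gb > MIN_OUTPUT_GB and q30_percent > MIN_Q30_PERCENT


for lab, (output_gb, q30) in RUNS.items():
    status = "accepted" if run_accepted(output_gb, q30) else "repeat run"
    print(f"{lab}: {status}, raw coverage ~{raw_fold_coverage(output_gb):.0f}-fold")
```

The 5.6-Gb threshold corresponds to 100-fold raw coverage under these assumptions (5,600 Mb / (20 × 2.8 Mb) = 100). The raw estimates printed here are upper bounds and are not expected to match the mean coverages in Table 1, which reflect reads after processing.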

In-depth analysis of the up to 1,861 reported cgMLST genes per sample demonstrated that the majority of isolates shared identical allelic profiles (Fig. 1). A comparison with the controls (NGSRT06-15), which exhibited no deviation, further corroborated this high reproducibility independent from DNA extraction. Samples NGSRT11C1 and NGSRT11C4 varied in one gene (hypothetical protein, SACOL0424), very likely due to a misassembly at the end of the gene. Also, for NGSRT02C1, a wrong allele was called in SACOL2642 (hypothetical protein) due to a low local coverage of 2-fold. These findings are in line with those of a previous study, where an N50 plateau effect for Illumina data was noted above a threshold of 75-fold average coverage (7).

FIG 1

Minimum-spanning tree illustrating the comparison of cgMLST results from the 20 S. aureus isolates sent to five laboratories (C1 to C5) in a blinded fashion. Each circle represents a single genotype, i.e., an allelic profile based on up to 1,861 target genes (23) present in the isolates with the “pairwise ignoring missing values” option turned on in the SeqSphere+ software during comparison. The circles are named with the sample ID(s) colored by the participating laboratory, and the sizes are proportional to the number of isolates with an identical genotype. The numbers on connecting lines display the number of differing alleles between the connected genotypes. The control samples colored in white originated from independent cultivations and DNA extractions of samples NGSRT06 to NGSRT15.

In total, of 183,927 cgMLST allele calls, only 3 (0.0016%) were wrong, resulting in an average of 0.03 wrong alleles per sample. This low error rate does not significantly affect the ruling in or out of samples during outbreak investigations (11) and ensures high intra- and interlaboratory reproducibilities. To control the accuracy, the 1,861 cgMLST target sequencing results from sample NGSRT16, i.e., ATCC 25923, which were identical for all five laboratories (Fig. 1), were compared with the published genome sequence of this strain (NZ_CP009361) revealing 12 single nucleotide polymorphisms (SNPs) (see Table S2). Suitable primers for amplifying and subsequent Sanger sequencing were designed for the 12 regions spanning these SNPs (see Table S2) and confirmed that all 12 SNPs were correctly determined during the ring trial. Most likely, the discrepancy with the published sequence can be explained by microevolutionary events that occurred during the freezing, thawing, and repeated cultivation of this strain, which was originally isolated from a clinical specimen in 1945 (12). Similar effects were already previously detected (7).

Our study comprises four limitations.
First, we sent only DNA instead of living organisms to the five laboratories, and thereby did not test the influence of DNA extraction methods. The results from the 10 controls analyzed in parallel in the ring trial organizer's laboratory indicated that this might be of minor importance, as long as high-quality DNA is deployed. Second, we only compared data from one type of sequencing machine and one type of sequencing chemistry. However, there is evidence from the literature that results from a single laboratory are not significantly biased by the sequencing machine nor by the sequencing chemistry (13, 14). Third, all of the ring trial participants used the same software for analysis. To partially address this issue, we used another tool to verify ST and rST assignment and demonstrate reproducibility across different tools. The Global Microbial Identifier (GMI) wet- and dry-laboratory proficiency test attempts to overcome such limitations, but is challenged by making in-depth comparisons of the heterogeneous results (15). Finally, the accuracy was only determined for cgMLST targets. In accordance with recent practices in public health and clinical microbiology (16), the intergenic regions in particular were not controlled here. In summary, with the shown high reproducibility and accuracy of WGS-based microbial typing when using a standardized methodology, our study provides the basis for a proficiency testing program, which is one crucial component for ensuring the quality of next-generation sequencing in clinical laboratory practice (17).
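The headline reproducibility figures reported in this section and in the abstract can be re-derived with simple arithmetic. The sketch below only uses numbers stated in the text; the assignment of the six missing typing results to individual samples follows our reading of the Results paragraph and should be treated as an assumption, as should all variable names.

```python
N_SAMPLES = 20
N_LABS = 5
N_TYPING_METHODS = 4  # spa type, ST, rST, cgMLST cluster type (CT)

possible_results = N_SAMPLES * N_LABS * N_TYPING_METHODS

# Missing typing results as described in the Results text
# (assumed mapping of result types to samples).
missing = {
    "NGSRT07C1": ["rST", "CT"],        # low-coverage sample
    "NGSRT16C3": ["ST", "rST", "CT"],  # low-coverage sample
    "NGSRT13C3": ["spa"],              # spa repeat sequence not determined
}
n_missing = sum(len(result_types) for result_types in missing.values())
n_reported = possible_results - n_missing

# cgMLST allele call error rate.
total_allele_calls = 183_927
wrong_allele_calls = 3
error_rate_percent = 100 * wrong_allele_calls / total_allele_calls
wrong_per_sample = wrong_allele_calls / (N_SAMPLES * N_LABS)

print(f"typing results: {n_reported} of {possible_results} reported")
print(f"allele call error rate: {error_rate_percent:.4f}%")
print(f"wrong alleles per sequenced sample: {wrong_per_sample:.2f}")
```

Note that the per-sample average divides by the 100 sequenced samples (20 samples in each of five laboratories), not by the 20 distinct DNA samples.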

MATERIALS AND METHODS

Twenty Staphylococcus aureus DNA samples (NGSRT01 to NGSRT20) (Table 2), selected from a diverse collection of isolates (livestock-associated, community-/hospital-acquired methicillin-susceptible and -resistant S. aureus from sporadic cases and outbreaks, and a quality control strain), along with duplicates to assess intralaboratory reproducibility, were distributed in a blind-coded manner to the five participating laboratories. DNA samples were prepared using the MagAttract HMW DNA kit (Qiagen, Hilden, Germany) in accordance with the manufacturer's instructions with the addition of 120 U lysostaphin (Sigma, Taufkirchen, Germany) to lyse methicillin-resistant S. aureus (MRSA). In addition, the laboratories received a protocol (supplemental material) for performing a single sequencing run on an Illumina MiSeq sequencer using the Nextera XT library preparation kit and the 250-bp paired-end sequencing chemistry version 2 (Illumina, San Diego, CA, USA). Sequencing indices from the Nextera XT index kit were used for multiplexing; participants were free to choose any index combination for the samples. The run acceptance criteria were a sequencing output >5.6 Gb (to achieve an average sequencing coverage of >100-fold for the 20 samples with genome sizes of 2.8 Mb) and a Q30 read quality score of >75%. Otherwise, the sequencing run had to be repeated. SeqSphere+ software version 2.4 or higher (Ridom GmbH, Münster, Germany) run on a Microsoft Windows operating system was used with default parameters for quality trimming, de novo assembly, and allele calling. Specifically, reads were trimmed at their 5′- and 3′-ends until an average base quality of 30 was reached in a window of 20 bases and subsequently down-sampled to 120-fold coverage. De novo assembly was performed using the incorporated Velvet tool version 1.1.04 and a SeqSphere+ specific k-mer optimization procedure (18). SeqSphere+ searched the defined genes using BLAST (19) with parameters described previously (20). 
In addition, the genes were assessed for quality, i.e., the absence of frameshifts and ambiguous nucleotides. A gene was called only if all above-mentioned criteria were met. The spa types (10), MLST sequence types (ST) (21), ribosomal MLST types (rST) (22), cgMLST cluster types (CT), and allelic profiles of the 1,861 cgMLST genes (23) determined in this way were reported to A.M., the ring trial organizer.
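The quality trimming step described above (trimming both read ends until a 20-base window reaches an average base quality of 30) can be illustrated with a small sliding-window routine. This is a sketch of one plausible reading of that description, not the SeqSphere+ implementation, whose internals are not given here; the function name and return convention are hypothetical.

```python
def trim_read(qualities, window=20, min_mean_q=30.0):
    """Return (start, end) slice indices of the part of a read to keep.

    Bases are dropped from the 5' end until the first window of `window`
    bases has a mean Phred quality >= min_mean_q, and likewise from the
    3' end. Returns (0, 0) if no window qualifies.
    """
    n = len(qualities)
    if n < window:
        return (0, 0)

    # Prefix sums give each window mean in constant time.
    prefix = [0]
    for q in qualities:
        prefix.append(prefix[-1] + q)

    def window_mean(i):
        return (prefix[i + window] - prefix[i]) / window

    # Scan from the 5' end for the first acceptable window.
    start = None
    for i in range(n - window + 1):
        if window_mean(i) >= min_mean_q:
            start = i
            break
    if start is None:
        return (0, 0)

    # Scan from the 3' end; the window at `start` guarantees a hit.
    end = start + window
    for j in range(n - window, start - 1, -1):
        if window_mean(j) >= min_mean_q:
            end = j + window
            break
    return (start, end)


# Toy read: poor-quality ends around a high-quality core.
quals = [10] * 5 + [35] * 40 + [12] * 5
start, end = trim_read(quals)
print(f"keep bases {start}..{end - 1} of {len(quals)}")
```

Because the criterion is a window mean, a few low-quality bases can survive at the edges of the retained region. The subsequent down-sampling to 120-fold coverage is not modeled here.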

TABLE 2

Characteristics of the 20 human S. aureus isolates that were sent as DNA samples to the five participating laboratories in a blinded fashion and used as controls

Sample ID		Spa type (based on Sanger sequencing)	Comment/reference
Ring trial	Original	Spa type (based on Sanger sequencing)	Comment/reference
NGSRT01	00468	t011	Livestock-associated MRSA
NGSRT02	00551	t011	Livestock-associated MRSA, identical cgMLST genotype as NGSRT01
NGSRT03	01346	t011	Livestock-associated MRSA
NGSRT04	01354	t010	Classical hospital-acquired MRSA
NGSRT05	01360	t011	Livestock-associated MRSA, identical cgMLST genotype as NGSRT03
NGSRT06^a	02180	t002	Central European community-acquired PVL^b-positive MRSA
NGSRT07^a	02482	t008	US typical community-acquired PVL-positive MRSA
NGSRT08^a	02560	t044	Central European community-acquired PVL-positive MRSA
NGSRT09^a	02638	t012	Classical hospital-acquired MRSA
NGSRT10^a	02786	t843	mecC-positive MRSA
NGSRT11^a	02949	t843	mecC-positive MRSA
NGSRT12^a	02994	t003	Classical hospital-acquired MRSA
NGSRT13^a	03039	t032	Classical hospital-acquired MRSA
NGSRT14^a	COL	t008	MRSA strain COL
NGSRT15^a	COL	t008	Duplicate of MRSA reference strain COL
NGSRT16	ATCC 25923	t021	MSSA quality control strain ATCC 25923
NGSRT17	P1	t001	Isolate P1 from reference 23
NGSRT18	P3	t001	Isolate P3 from reference 23
NGSRT19	P4	t001	Isolate P4 from reference 23, identical cgMLST genotype as NGSRT18
NGSRT20	P12	t001	Isolate P12 from reference 23

^a These samples were separately cultivated, and DNA was extracted and sequenced as controls.

^b PVL, Panton-Valentine leukocidin.

In parallel, ST and rST were also determined from the assembly contigs using the BIGSdb system (21, 22, 24). Moreover, spa typing data using Sanger sequencing were available for all isolates (Table 2). Furthermore, 10 strains (NGSRT06 to NGSRT15; hereafter referred to as controls) were separately cultivated, and DNA was extracted and sequenced in the ring trial organizer's laboratory. For detailed analysis and visualization, a minimum-spanning tree based on the reported cgMLST allelic profiles was constructed using SeqSphere+ with the option “pairwise ignoring missing values” turned on. Finally, as we had included as NGSRT16 the well-known quality control strain ATCC 25923 that was recently completely sequenced (12), we determined whether potential discrepancies were due to NGS errors during the ring trial. Discrepancies that were detected between the published sequence (NZ_CP009361) and the ring trial data from all participants were further analyzed by bidirectional Sanger sequencing from the same DNA that was also sent to the participants. For this confirmatory Sanger sequencing, flanking regions of approximately 250 nucleotides up- and downstream of the detected discrepancies were extracted from the genome sequence, and primers were designed using the NCBI Primer-BLAST service (25). The amplified fragments were purified and Sanger sequenced as described previously (26). The resulting chromatogram files were also analyzed using the SeqSphere+ software.
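The allelic distance underlying such a minimum-spanning tree can be sketched as follows. The example assumes that “pairwise ignoring missing values” means that a locus is skipped for a given pair of samples whenever either profile lacks a call at that locus; this interpretation, the function names, and the toy profiles are illustrative assumptions and do not reproduce SeqSphere+ code or ring trial data.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles, skipping loci missing (None) in either profile."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci in the same order")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def distance_matrix(profiles):
    """Symmetric pairwise distance table for a dict of name -> allelic profile."""
    dist = {name: {name: 0} for name in profiles}
    for x, y in combinations(profiles, 2):
        d = allelic_distance(profiles[x], profiles[y])
        dist[x][y] = d
        dist[y][x] = d
    return dist


# Hypothetical toy profiles over five loci; None marks a target not called.
profiles = {
    "sample_A": [1, 4, 2, 7, 3],
    "sample_B": [1, 4, None, 7, 3],
    "sample_C": [1, 5, 2, None, 3],
}

for name, row in distance_matrix(profiles).items():
    print(name, row)
```

With this definition, two samples that differ only by a missing call have a distance of zero and would be drawn as the same genotype, so a low-coverage sample with uncalled targets does not appear as a spurious variant.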

Accession number(s).

Raw reads have been deposited in the European Nucleotide Archive (ENA) under study accession number PRJEB15231.

25 in total

1. Performance comparison of benchtop high-throughput sequencing platforms.

Authors: Nicholas J Loman; Raju V Misra; Timothy J Dallman; Chrystala Constantinidou; Saheer E Gharbia; John Wain; Mark J Pallen
Journal: Nat Biotechnol Date: 2012-05 Impact factor: 54.908

2. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors: Daniel R Zerbino; Ewan Birney
Journal: Genome Res Date: 2008-03-18 Impact factor: 9.043

3. Whole-genome-based Mycobacterium tuberculosis surveillance: a standardized, portable, and expandable approach.

Authors: Thomas A Kohl; Roland Diel; Dag Harmsen; Jörg Rothgänger; Karen Meywald Walter; Matthias Merker; Thomas Weniger; Stefan Niemann
Journal: J Clin Microbiol Date: 2014-04-30 Impact factor: 5.948

4. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction.

Authors: Jian Ye; George Coulouris; Irena Zaretskaya; Ioana Cutcutache; Steve Rozen; Thomas L Madden
Journal: BMC Bioinformatics Date: 2012-06-18 Impact factor: 3.169

5. Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain.

Authors: Keith A Jolley; Carly M Bliss; Julia S Bennett; Holly B Bratcher; Carina Brehony; Frances M Colles; Helen Wimalarathna; Odile B Harrison; Samuel K Sheppard; Alison J Cody; Martin C J Maiden
Journal: Microbiology (Reading) Date: 2012-01-27 Impact factor: 2.777

6. BIGSdb: Scalable analysis of bacterial genome variation at the population level.

Authors: Keith A Jolley; Martin C J Maiden
Journal: BMC Bioinformatics Date: 2010-12-10 Impact factor: 3.169

7. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology.

Authors: Alexander Mellmann; Dag Harmsen; Craig A Cummings; Emily B Zentz; Shana R Leopold; Alain Rico; Karola Prior; Rafael Szczepanowski; Yongmei Ji; Wenlan Zhang; Stephen F McLaughlin; John K Henkhaus; Benjamin Leopold; Martina Bielaszewska; Rita Prager; Pius M Brzoska; Richard L Moore; Simone Guenther; Jonathan M Rothberg; Helge Karch
Journal: PLoS One Date: 2011-07-20 Impact factor: 3.240

8. Identification of intermediate in evolutionary model of enterohemorrhagic Escherichia coli O157.

Authors: Christian Jenke; Shana R Leopold; Thomas Weniger; Jörg Rothgänger; Dag Harmsen; Helge Karch; Alexander Mellmann
Journal: Emerg Infect Dis Date: 2012-04 Impact factor: 6.883

9. Complete Genome Sequence of the Quality Control Strain Staphylococcus aureus subsp. aureus ATCC 25923.

Authors: Todd J Treangen; Rosslyn A Maybank; Sana Enke; Mary Beth Friss; Lynn F Diviak; David K R Karaolis; Sergey Koren; Brian Ondov; Adam M Phillippy; Nicholas H Bergman; M J Rosovitz
Journal: Genome Announc Date: 2014-11-06

10. GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers.

Authors: Sebastian Jünemann; Karola Prior; Andreas Albersmeier; Stefan Albaum; Jörn Kalinowski; Alexander Goesmann; Jens Stoye; Dag Harmsen
Journal: PLoS One Date: 2014-09-08 Impact factor: 3.240
