
External Quality Assessment of SARS-CoV-2 Sequencing: an ESGMD-SSM Pilot Trial across 15 European Laboratories.

Fanny Wegner^1,2,3, Tim Roloff^1,2,3, Gilbert Greub^4,5,6, Adrian Egli^1,2,6, Michael Huber^7, Samuel Cordey^8, Alban Ramette^9,6, Yannick Gerth^10, Claire Bertelli^4,5, Madlen Stange^1,2,3, Helena M B Seth-Smith^1,2,3,6, Alfredo Mari^1,2,3, Karoline Leuzinger^11,12, Lorenzo Cerutti^13, Keith Harshman^13, Ioannis Xenarios^13, Philippe Le Mercier^14, Pascal Bittel^9, Stefan Neuenschwander^9, Onya Opota^4,5,6, Jonas Fuchs^15, Marcus Panning^15, Charlotte Michel^16, Marie Hallin^16, Thomas Demuyser^17, Ricardo De Mendonca^18, Paul Savelkoul^19,6, Jozef Dingemans^19, Brian van der Veer^19, Stefan A Boers^20, Eric C J Claas^20,6, Jordy P M Coolen^21, Willem J G Melchers^21,6, Marianne Gunell^22,23, Teemu Kallonen^22,23, Tytti Vuorinen^22,23, Antti J Hakanen^22,23, Eva Bernhoff^24, Marit Andrea Klokkhammer Hetland^24, Hadar Golan Berman^25,26, Sheera Adar^26, Jacob Moran-Gilad^27,6, Dana G Wolf^25,28, Stephen L Leib^9,6, Oliver Nolte^10, Laurent Kaiser^8, Stefan Schmutz^7, Verena Kufner^7, Maryam Zaheri^7, Alexandra Trkola^7, Hege Vangstein Aamot^29,30,6, Hans H Hirsch^12,31,32.

Abstract

This first pilot trial on external quality assessment (EQA) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) whole-genome sequencing, initiated by the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) Study Group for Genomic and Molecular Diagnostics (ESGMD) and the Swiss Society for Microbiology (SSM), aims to build a framework among laboratories in order to improve pathogen surveillance sequencing. Ten samples with various viral loads were sent out to 15 clinical laboratories that had free choice of sequencing methods and bioinformatic analyses. The individual centers were compared on the identification of (i) single nucleotide polymorphisms (SNPs) and indels, (ii) Pango lineages, and (iii) clusters between samples. The participating laboratories used a wide array of methods and analysis pipelines. Most were able to generate whole genomes for all samples. Genomes were sequenced to various depths (a more than 100-fold difference in mean depth across centers). There was a very good consensus regarding the majority of reporting criteria, but there were a few discrepancies in lineage and cluster assignments. Additionally, there were inconsistencies in variant calling. The main reasons for discrepancies were missing data, bioinformatic choices, and interpretation of data. The pilot EQA was overall a success. It was able to show the high quality of participating laboratories and provide valuable feedback in cases where problems occurred, thereby improving the sequencing setup of laboratories. A larger follow-up EQA should, however, improve on defining the variables and format of the report. Additionally, contamination and/or minority variants should be a further aspect of assessment.


Keywords: NGS; external quality assessment; ring trial; whole-genome sequencing


Year: 2021 PMID: 34757834 PMCID: PMC8769736 DOI: 10.1128/JCM.01698-21

Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 5.948

INTRODUCTION

Whole-genome sequencing (WGS) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) isolates has been used in many countries, mainly to determine (i) specific viral lineages and (ii) the molecular epidemiological context. WGS will become increasingly important both as a typing technology in routine virological diagnostics of individual patients and for epidemiological surveillance. The European Centre for Disease Prevention and Control (ECDC) recently published a document to support the use and implementation of WGS of SARS-CoV-2 in European countries (1). Quality management is a central element in ensuring accurate and robust laboratory results for both routine diagnostic and reference laboratories. Internal and external controls are integral to the assessment of quality, e.g., in an ISO-accredited environment. In particular, external quality assessments (EQAs) represent a cornerstone in introducing new test methods, building capacity, and ensuring a baseline quality level. This is even more important in a pandemic, where a novel, previously unknown pathogen necessitates prompt development, validation, and rollout of assays for which microbiological expertise and diagnostic knowledge are still limited. In this context, EQAs can ensure and improve testing quality and result comparability. If sufficiently scaled, they also allow the test performances of in-house-developed and commercial assays to be compared. To date, no EQA results focusing on WGS of SARS-CoV-2 have been reported, although some publications have shared quality aspects of single centers’ experiences (2, 3). Along these lines, individual centers in Switzerland have reported WGS protocols addressing different epidemiological questions (4, 5). The Swiss Institute of Bioinformatics has previously coordinated EQAs for viral metagenomics (6) and bacterial typing (7), an important first step in building WGS capacity across diagnostic laboratories.
Many other European countries are following suit. For this reason, the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) Study Group for Genomic and Molecular Diagnostics (ESGMD) and the Swiss Society for Microbiology (SSM) aimed to conduct a first EQA pilot trial on SARS-CoV-2 WGS, focusing on three key aspects of genome analysis: (i) identification of single nucleotide polymorphisms (SNPs) and deletions, (ii) identification of Pango lineages (8), and (iii) assessment of genomic relatedness using a molecular epidemiological approach. The aim is to exchange knowledge and build a framework among diagnostic laboratories that improves quality in view of the continuing demand for high-quality genomes to address epidemiological questions during an ongoing pandemic.

MATERIALS AND METHODS

Design of the external quality assessment.

The EQA was designed such that each laboratory could choose its own sequencing method as well as bioinformatic analysis. This introduces variability and makes disentangling methodological effects more difficult but best reflects clinical reality. Moreover, it provides direct feedback to laboratories concerning their sequencing pipeline. An overview of the individual analysis pipelines is shown in Table 1, and a full description can be found in the supplemental material.

TABLE 1

Summary of the methods used by the participating centers

Center	Primer panel (manufacturer)	Sequencing technology	Bioinformatics pipeline	Reference(s)
1	Artic nCoV-2019 v3	Illumina MiSeq, 150-bp SE	SmaltAlign	14
2	Artic nCoV-2019 v3	Nanopore	Artic bioinformatics pipeline v1.1.3	15
3	Artic nCoV-2019 v3	Illumina MiSeq, 150-bp PE	virSEAK pipeline (JSI Medical Systems)
4	CleanPlex SARS-CoV-2 (Paragon Genomics)	Illumina MiSeq, 150-bp PE	GENCOV	29
5	Artic nCoV-2019 v3	Illumina MiSeq, 150-bp PE	Custom Galaxy pipeline	16, 17
6	Custom	Nanopore	MACOVID pipeline	18, 19
7	EasySeq RC-PCR SARS-CoV-2 (NimaGen)	Illumina MiniSeq, 150-bp PE	Custom pipeline	20
8	EasySeq RC-PCR SARS-CoV-2 (NimaGen)	Illumina MiniSeq, 150-bp PE	EasySeq pipeline	21, 22
9	Midnight primer panel (IDT)	Nanopore	Artic bioinformatics pipeline	15
10	Artic nCoV-2019 v3	Nanopore	Artic bioinformatics pipeline	15, 20
11	Artic nCoV-2019 v3	Nanopore	SusCovONT	23
12	QIAseq SARS-CoV-2 primer panel (Qiagen)	Illumina MiniSeq, 150-bp PE	Illumina BaseSpace Dragen COVID lineage
13	Illumina COVIDSeq test	Illumina NovaSeq, 50-bp PE	Health 2030 Genome Center in Geneva pipeline	24
14	Illumina COVIDSeq test	Illumina NovaSeq, 150-bp PE	Custom pipeline	25, 26
15	Artic nCoV-2019 v3	Illumina NextSeq, 150-bp PE	COVGAP	4, 27, 28

A detailed method description by each center can be found in the supplemental material. SE, single end; PE, paired end; RC-PCR, reverse complement PCR.

The key aspects of the EQA (SNPs/indels, Pango lineage assignment, and cluster assignment), as well as additional features such as read depth and percentage of missing data, were reported to the sequencing team at the University Hospital Basel (the coordinating center for this pilot study).
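The article does not specify a report format, and its conclusions call for defining the reported variables more clearly. As a purely hypothetical sketch, the variables listed above could be captured in one record per center and sample. All field names, the deletion notation, and the validation rules below are assumptions for illustration; this is not the format used in this EQA.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleReport:
    """Hypothetical per-center, per-sample EQA report (field names are illustrative)."""
    center: str
    sample: str                                         # e.g. "NGS1"
    variants: List[str] = field(default_factory=list)   # e.g. "G21255C"; deletions as "del:<start>-<end>" (assumed notation)
    pango_lineage: Optional[str] = None                 # None if no lineage could be assigned
    pangolin_version: Optional[str] = None              # lineage calls depend on the tool version
    cluster: Optional[str] = None
    mean_depth: Optional[float] = None
    percent_missing: Optional[float] = None             # percentage of N's in the consensus genome

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record looks complete."""
        problems = []
        if self.mean_depth is not None and self.mean_depth < 0:
            problems.append("mean_depth must be non-negative")
        if self.percent_missing is not None and not 0 <= self.percent_missing <= 100:
            problems.append("percent_missing must be between 0 and 100")
        if self.pango_lineage is not None and self.pangolin_version is None:
            problems.append("pangolin_version missing for a reported lineage")
        return problems
```

The version field is included because lineage assignments depend on the Pangolin release. Recording it next to each call would make an outdated installation visible in the report itself.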

Samples.

Large quantities of virus suspension were needed for the EQA, so the virus was cultured to generate enough material. Vero76 cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) (10% fetal bovine serum, 1% glutamine) in flat-bottom 96-well plates (Thermo Fisher Scientific, MA, USA). One hundred microliters of SARS-CoV-2-positive naso-oropharyngeal fluid was added, and cells were incubated for 48 h at 37°C. The cell culture supernatants were harvested, and SARS-CoV-2 RNA was quantified using the laboratory-developed Basel-SCoV2-112bp nucleic acid test (NAT), as described previously (9), which targets the spike glycoprotein (S) gene. A total of 10 samples (named NGS1 to -10) of the cell culture supernatants were frozen and shipped on dry ice to participating laboratories. The viral isolates originated from routine diagnostic samples from Clinical Virology, University Hospital Basel, and reflected diverse epidemiological backgrounds. The cell culture supernatants contained a range of SARS-CoV-2 viral loads similar to those typically observed in routine diagnostics of acutely ill coronavirus disease 2019 (COVID-19) patients (see Table S1 in the supplemental material). To ensure that no changes occurred during culture, both the primary material and the cell culture supernatant were sequenced and compared; the resulting sequences were identical (results not shown).

Assessment of variant calling.

SNPs and indels relative to the Wuhan-Hu-1 reference strain were assessed as reported (usually as a list of variants). To compare results across centers and samples, a score was developed. Because there is no “correct solution” to compare results against, a majority consensus approach was chosen; i.e., an SNP/indel was considered correct if the majority of laboratories detected it (ignoring missing data). A correct base call scored 1 per site, an incorrect base call −1, and missing data 0. If an ambiguous base was called at a true SNP site and the correct base was included in the IUPAC ambiguity code, a score of 0.5 was given; ambiguous calls at other sites were not counted as SNPs. Deletions that were present in the consensus genome but not reported scored −1, because centers had been instructed to report deletions and a failure to report could be an artifact of the bioinformatics pipeline. Finally, the score was normalized per sample by the number of correct (majority consensus) variants, so that a complete and correct set of calls scores 1.
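The scoring rules above can be sketched in Python. This is a minimal illustration, not the authors’ implementation, and it rests on several assumptions:

- Each center’s call is the base in its consensus genome at every position where any center reported a variant. This is the reference base if no variant was called, and None or N if data are missing.
- Deletions are encoded as "-".
- Ties in the majority vote are resolved by whichever call was seen first.
- Two cases the text does not state explicitly are scored −1: an ambiguity code that does not contain the correct base, and a variant reported where the consensus has the reference base.

```python
from collections import Counter

# IUPAC nucleotide codes and the bases each one can stand for (N handled as missing).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}
MISSING = {None, "N"}   # no call, or a masked position in the consensus genome


def majority_call(calls):
    """Most frequent call across centers, ignoring missing data (None if all missing)."""
    counts = Counter(c for c in calls if c not in MISSING)
    return counts.most_common(1)[0][0] if counts else None


def site_score(call, truth, ref_base):
    """Score one center's call at one position against the majority consensus."""
    if call in MISSING:
        return 0.0                                  # missing data
    if truth == ref_base:                           # not a consensus variant site
        if call == ref_base or len(IUPAC.get(call, "")) > 1:
            return 0.0                              # reference call, or ambiguity not counted as SNP
        return -1.0                                 # assumption: false-positive variant
    if call == truth:
        return 1.0                                  # correct base or correctly reported deletion ("-")
    if truth in IUPAC.get(call, "") and len(IUPAC[call]) > 1:
        return 0.5                                  # ambiguity code that includes the correct base
    return -1.0                                     # wrong base or unreported deletion


def variant_scores(ref, calls_by_site):
    """
    ref: {position: reference base}
    calls_by_site: {position: {center: call}}, listing every center's base at
    each position where at least one center reported a variant.
    Returns {center: score normalized by the number of consensus variants}.
    """
    centers = {c for calls in calls_by_site.values() for c in calls}
    truth = {pos: majority_call(calls.values()) for pos, calls in calls_by_site.items()}
    n_true = sum(1 for pos, t in truth.items() if t is not None and t != ref[pos])
    scores = {}
    for center in centers:
        total = sum(site_score(calls.get(center), truth[pos], ref[pos])
                    for pos, calls in calls_by_site.items() if truth[pos] is not None)
        scores[center] = total / n_true if n_true else float("nan")
    return scores
```

Because false positives and unreported deletions subtract points, a center’s normalized score can fall below zero. Only a complete set of correct calls reaches 1.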

Assessment of lineage and cluster assignment.

The “correct answer” was again assumed to be the majority consensus. Clusters were relabeled to unify the nomenclature and allow comparison between laboratories. We did not provide a strict definition of a cluster but allowed laboratories to determine clusters based on internal criteria. In addition, no classical epidemiological metadata that could have aided interpretation were provided.
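Because each center chose its own cluster names, labels must be made comparable before a majority consensus can be taken. One hypothetical way to do this is sketched below; it is not necessarily the relabeling used in this study, and all function names are illustrative. The approach has three steps:

1. Rename clusters in order of first appearance.
2. Take the most common labelling across centers as the consensus.
3. List the sample pairs that a center groups differently from that consensus.

A per-sample majority vote is included for lineage calls. Missing assignments (NA) are encoded as None, and the simple A, B, C… renaming assumes at most 26 clusters.

```python
from collections import Counter
from itertools import combinations


def majority_label(calls):
    """Most common non-missing call (e.g. Pango lineages for one sample)."""
    counts = Counter(c for c in calls if c is not None)
    return counts.most_common(1)[0][0] if counts else None


def canonical(labels):
    """Relabel clusters A, B, C, ... in order of first appearance; None stays None."""
    mapping = {}
    out = []
    for lab in labels:
        if lab is None:
            out.append(None)
        else:
            mapping.setdefault(lab, chr(ord("A") + len(mapping)))
            out.append(mapping[lab])
    return tuple(out)


def majority_partition(assignments):
    """Most common canonical labelling across centers, used as the consensus."""
    return Counter(canonical(v) for v in assignments.values()).most_common(1)[0][0]


def discordant_pairs(labels, consensus):
    """1-based sample pairs grouped differently from the consensus (missing data skipped)."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if None in (labels[i], labels[j], consensus[i], consensus[j]):
            continue
        if (labels[i] == labels[j]) != (consensus[i] == consensus[j]):
            pairs.append((i + 1, j + 1))
    return pairs
```

Comparing pairs rather than letters makes the check independent of naming. Applied to the rows of Table 3, it flags only NGS1/NGS4 for center 14. For center 12, it flags every pair among NGS1, -2, -4, and -5 except NGS2/NGS5.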

RESULTS

Genome depth, coverage, and assembly.

The mean read depth per center ranged from 313× to 37,172×, a >100-fold difference across centers. However, this was driven mostly by center 14, which sequenced to an extremely high read depth (Fig. 1A; see also Table S2 in the supplemental material). Centers 7 and 9 were at the lower end of the spectrum (mean depths ± standard deviations [SD] of 325× ± 275× and 313× ± 132×, respectively), whereas all other laboratories typically sequenced to a mean depth of between 1,000× and 8,000×.
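The >100-fold figure follows directly from the extreme center means quoted above. This is a worked check, not an additional result:

```latex
\frac{37{,}172\times}{313\times} \approx 118.8 > 100
```

Because the top of the range is set by center 14 alone, the spread among the remaining centers is much narrower.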

FIG 1

(A) Mean read depth per sample (x axis) and center (y axis). Colors have been scaled for high resolution for values of between 0 and 10,000; values larger than this are displayed in the same color. (B) Percentage of N’s in the genome per sample (x axis) and center (y axis). (C) Score for variant detection per sample (x axis) and center (y axis) as well as mean score for each center across all samples and mean score for each sample across centers (ø). The numerical values underlying each plot can be found in Tables S2 to S4 in the supplemental material.

The majority of samples could be assembled to a consensus genome by all centers, with the exception of NGS8, for which assembly failed partially for center 7 and completely for center 9, as seen by the percentage of missing data shown in Fig. 1B (numeric values are shown in Table S3).
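The percentage of missing data in Fig. 1B is the share of N’s in a consensus genome. The sketch below assumes the consensus has the same length as the reference and that unresolved positions are masked as N. This is a common, but not universal, pipeline convention, and the function names are hypothetical. Under this convention, a genome coverage of 91% (as for NGS3 from center 7) corresponds to 9% N’s.

```python
def percent_missing(consensus):
    """Percentage of positions masked as N in a reference-length consensus genome."""
    if not consensus:
        raise ValueError("empty consensus sequence")
    return 100.0 * consensus.upper().count("N") / len(consensus)


def genome_coverage(consensus):
    """Percentage of the genome resolved to a base (the complement of percent_missing)."""
    return 100.0 - percent_missing(consensus)
```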

SNPs and indels.

Variants were assessed as reported and are displayed in Fig. S1A to J as dot plots indicating the presence or absence of each variant. Some centers reported mixed sites using ambiguity codes, while others did not. Moreover, not all centers reported deletions. For each such variant, we therefore checked whether it had been correctly called in the consensus genome and, if so, marked it specifically in Fig. S1. Additionally, Table S5 lists the number of correct, wrong, and missing SNP calls for each sample and laboratory. A variant calling score was developed to quantify and compare the variant calls per sample and laboratory (see Materials and Methods). The results are shown in Fig. 1C (numerical values are shown in Table S4), together with average scores per sample across all centers (row marked ø) as a measure of congruence across laboratories. As expected, samples with a higher proportion of missing data produced a lower score if the affected regions harbored many variants (e.g., NGS3 by center 7, with a genome coverage of 91%). Samples NGS7, -9, and -10 had many deletions, and laboratories not reporting these deletions received correspondingly lower scores. NGS8, however, caused problems for many centers: many laboratories reported missing data at variant loci, and incorrect base calls were made, in particular by center 15 (Fig. S1H). A combination of several of these factors can in turn result in a lower mean score for a center (e.g., center 7, with an average score of 0.75) (Table S4).
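Because some centers did not list deletions, each deletion had to be checked directly in the consensus genome. A minimal sketch of such a check follows. It assumes the consensus has been pairwise-aligned to Wuhan-Hu-1, so that deleted reference bases appear as "-" and low-coverage positions as N. The function name and its return labels are hypothetical. Many pipelines drop deleted bases from the consensus instead, in which case an alignment step would be needed first.

```python
def deletion_status(ref_aln, cons_aln, start, end):
    """
    Check whether reference positions start..end (1-based, inclusive) are deleted
    in a consensus genome, given a pairwise alignment of consensus to reference.

    Returns "present" (all positions are gaps), "missing" (any position is N),
    or "absent" (otherwise).
    """
    if len(ref_aln) != len(cons_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0
    bases = []
    for r, c in zip(ref_aln, cons_aln):
        if r == "-":
            continue                    # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        if start <= ref_pos <= end:
            bases.append(c.upper())
    if len(bases) != end - start + 1:
        raise ValueError("interval extends beyond the reference")
    if all(b == "-" for b in bases):
        return "present"
    if any(b == "N" for b in bases):
        return "missing"
    return "absent"
```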

Lineage assignment.

Correct lineage assignment depends on correct SNP calling and sufficient coverage across the genome. The majority of centers assigned all samples to the correct lineage (Table 2). The two centers with the lowest mean depths failed to correctly assign the lineage of one sample, NGS8 (B.1.177) (Table S2). Center 7, which provided a 57% complete genome (mean read depth of 39×), could assign the sample only to lineage B, while center 9 could not assign a lineage at all (NA in Table 2). Rather surprisingly, the laboratory with by far the highest depth, center 14, assigned the lineages of two samples incorrectly: NGS7 and -9 were both assigned only to lineage A rather than to the more specific consensus lineage A.27. This was due to an outdated version of Pangolin.
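Several discrepancies in Table 2 (A instead of A.27, B instead of B.1.177) are correct but less specific ancestor lineages rather than contradictory assignments. The hypothetical helper below separates these cases. It handles unaliased Pango names only: aliased names, where a new letter prefix stands in for a long B.1… path, would first have to be expanded, which this sketch does not do. Ancestry is judged by whole dot-separated components, so a plain string-prefix test (which would wrongly treat B.1.3 as an ancestor of B.1.36) is avoided.

```python
def lineage_status(reported, consensus):
    """Compare a reported Pango lineage with the consensus lineage (hypothetical helper)."""
    if reported is None:
        return "missing"
    if reported == consensus:
        return "correct"
    r, c = reported.split("."), consensus.split(".")
    if len(r) < len(c) and c[:len(r)] == r:
        return "under-resolved"         # a correct but less specific ancestor
    return "discordant"
```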

TABLE 2

Pango lineage assignments

Center	NGS1	NGS2	NGS3	NGS4	NGS5	NGS6	NGS7	NGS8	NGS9	NGS10
1	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
2	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
3	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
4	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
5	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
6	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
7	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B	A.27	B.1.1.7
8	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
9	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	NA	A.27	B.1.1.7
10	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
11	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
12	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
13	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7
14	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A	B.1.177	A	B.1.1.7
15	B.1.416.1	B.1.36.17	B.1.177	B.1.258	B.1.36.17	B.1.177	A.27	B.1.177	A.27	B.1.1.7

NA, not applicable; lineage assignment was impossible. Shading highlights cases discussed in more detail in the text.


Cluster identification.

Almost all centers reported the same clusters (Table 3). Samples NGS2 and NGS5 formed one cluster (cluster B); NGS3, NGS6, and NGS8 formed the second cluster (cluster C); and NGS7 and NGS9 formed the third cluster (cluster E).

TABLE 3

Cluster assignments

Center	NGS1	NGS2	NGS3	NGS4	NGS5	NGS6	NGS7	NGS8	NGS9	NGS10
1	A	B	C	D	B	C	E	C	E	F
2	A	B	C	D	B	C	E	C	E	F
3	A	B	C	D	B	C	E	C	E	F
4	A	B	C	D	B	C	E	C	E	F
5	A	B	C	D	B	C	E	C	E	F
6	A	B	C	D	B	C	E	C	E	F
7	A	B	C	D	B	C	E	C*	E	F
8	A	B	C	D	B	C	E	C	E	F
9	A	B	C	D	B	C	E	NA	E	F
10	A	B	C	D	B	C	E	C	E	F
11	A	B	C	D	B	C	E	C	E	F
12	B	B	C	B	B	C	E	C	E	F
13	A	B	C	D	B	C	E	C	E	F
14	A	B	C	A	B	C	E	C	E	F
15	A	B	C	D	B	C	E	C	E	F

NA, not applicable; cluster assignment was impossible. Shading highlights discrepant cases discussed in more detail in the text. An asterisk (*) indicates that the center reported a presumed cluster assignment based on a partial genome.

The low coverage of sample NGS8 was a challenge for the two above-mentioned centers 7 and 9. Center 7 nevertheless reported a presumed allocation to the correct cluster based on the partial genome (asterisk in Table 3). Center 9 could not identify the cluster due to unsuccessful sequencing (9× mean depth [Table S2, highlighted in red]), so its cluster C comprised only NGS3 and NGS6. Center 12 had difficulties with two samples (NGS1 and -4) and incorrectly allocated them to cluster B (together with NGS2 and -5) (Table 3, shading), even though these samples fall into different Pango lineages (Table 2). Center 14 incorrectly assigned NGS1 and NGS4 to a single shared cluster (Table 3, shading), again despite their differing Pango lineage assignments. All other clusters were correctly assigned by both laboratories.

DISCUSSION

Impact of methodological choices.

Given that laboratories had free choice of their experimental and analytical protocols, the individual effects of these differences cannot be disentangled. A factor known to influence sequencing success is viral load. NGS8 (threshold cycle [CT] value of 28) was at the lower end of the viral load spectrum, comparable to NGS9 and -10 (CT values of 28.4 and 28.1, respectively) (see Table S1 in the supplemental material). This could be why many centers had problems with this sample. When the sequencing methods were grouped roughly into Illumina single-end, Illumina paired-end, and Oxford Nanopore Technologies (ONT) methods, no platform-related effect was apparent (Fig. S2). In fact, centers 7 and 8 had very similar sequencing setups, differing only in their analysis pipelines (Table 1). Center 8, however, sequenced to a greater depth and achieved higher overall coverage across the genome, and was therefore better able to perform accurate genomic analyses. Moreover, the small genome of SARS-CoV-2 and its lack of long repeat regions allow the use of short reads or single-end sequencing, which would be more problematic for WGS of other pathogens.

Mean depth mattered only insofar as too low a depth leads to too much missing data. Once a sufficient read depth had been achieved, there was no further clear correlation between variant calling score and depth (Fig. S3). In general, depth across the genome can be very uneven, and average depth does not fully capture this. Technically, read depths of 100× to 200× can be enough for genotyping. For example, samples NGS2 and -5 for center 7 had mean depths of 191× and 131×, respectively, with little missing data and high variant calling scores (Fig. 1).
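The point that mean depth can hide uneven coverage can be made concrete by computing two values from per-position depths: the mean depth, and the breadth of coverage (the percentage of positions at or above a minimum depth). The 20× cutoff below is an assumed, hypothetical threshold; pipelines use different values. The function name is likewise illustrative.

```python
def depth_summary(depths, min_depth=20):
    """Return (mean depth, breadth of coverage in %) from per-position read depths.

    Breadth is the percentage of positions with depth >= min_depth; the default
    of 20x is a hypothetical cutoff, not a value taken from this study.
    """
    if not depths:
        raise ValueError("no positions given")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, 100.0 * covered / len(depths)
```

For instance, a toy profile of [1000, 1000, 0, 0] gives a mean of 500× but only 50% breadth, whereas [120, 120, 120, 120] gives 120× with full breadth. The lower-depth profile would yield the more complete consensus.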
However, when coverage is uneven, missing data can still be an issue, even at a higher average depth (e.g., NGS10 for center 7 at 246×) (Fig. 1; Table S2). For accurate genotyping of SARS-CoV-2, it is necessary to capture the entire genome and not just selected regions (even biologically important ones such as the S gene), because the software used to determine the lineage (the pangoLEARN algorithm within Pangolin) builds its models on whole-genome diversity (8). It is therefore important to strive for the best possible coverage across the genome (i.e., a small amount of missing data), and a “sufficient read depth,” as mentioned above, is a function of this. More even coverage in amplicon-based sequencing can be achieved, for example, by balancing primer sets. Beyond average depth, other factors such as variant reporting capacity, mapping quality, and interpretation of data play a larger role, which is an important point for diagnostic laboratories with respect to operational costs.

This was highlighted by center 14, which sequenced to by far the highest depth but had difficulties with lineage and cluster assignments despite very good variant calling. Upon receiving a preliminary report, center 14 reexamined its analysis pipeline and found that it had used outdated versions of Pangolin and pangoLEARN. The Pango lineage nomenclature is dynamic: it develops as SARS-CoV-2 evolves, and lineage definitions and names can change over time (8). The pilot EQA thus provided valuable feedback for the center to improve its workflows.

Cluster assignment, on the other hand, highlighted another challenge for the development of any EQA: communication and interpretation. Most of the other centers defined a cluster as a putative transmission cluster whose genomes differ by 0 to at most 2 SNPs (thresholds vary slightly between centers) (see supplemental methods in the supplemental material).
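The threshold-based cluster definition used by most centers can be sketched as single-linkage clustering on pairwise SNP distances. This is a hypothetical illustration under three assumptions:

- Genomes are aligned to equal length.
- Only positions with an unambiguous base (A, C, G, or T) in both genomes are compared.
- The default cutoff of 2 SNPs mirrors the upper end of the thresholds described above.

The centers’ actual criteria are given in the supplemental methods and may differ.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where both aligned genomes have an unambiguous base and they differ."""
    if len(a) != len(b):
        raise ValueError("genomes must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x in "ACGT" and y in "ACGT" and x != y)


def threshold_clusters(genomes, max_snps=2):
    """Single-linkage clusters: samples are joined if any pair differs by <= max_snps SNPs."""
    names = list(genomes)
    parent = {n: n for n in names}

    def find(n):
        # union-find root lookup with path halving
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if snp_distance(genomes[a], genomes[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

Single linkage can chain samples together: two genomes may end up in one cluster even if they differ by more than the cutoff, provided each is close to a third. Positions masked as N can make genomes look closer than they are. With 27 SNPs separating NGS1 and NGS4, any cutoff in the 0-to-2 range keeps them apart.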
Two centers had difficulties, which could be resolved upon feedback. Center 12 had interpreted the term “cluster” differently and instead reported the Nextclade assignment (10); center 14, in turn, deemed samples NGS1 and NGS4 to belong to a single cluster. While these two samples share an ancestor, most other laboratories deemed them sufficiently different to assign them to two separate clusters. In fact, they differ by 27 SNPs, whereas the genomes within the consensus clusters (clusters B, C, and E in Table 3) differed by 0 to 1 SNPs. This highlights an element of subjectivity in data interpretation in the absence of clear definitions, as well as the need to clarify the objective of the task (in this case, the assessment of transmission clusters rather than simply related sequences in a phylogenetic tree).

An important factor for routine sequencing is cost. In general, the amplicon-based protocols used in this study consist of a reverse transcription step, an amplification step, library preparation, and sequencing. As the first two steps are largely the same across sequencing technologies, cost is driven mainly by library preparation and sequencing itself. In terms of speed, Oxford Nanopore sequencing allows faster data generation owing to real-time base calling, whereas a run on an Illumina instrument typically takes slightly more than a day (11). The price per sample decreases with increasing throughput, but the many library preparation kits and the wide range of sequencing instruments used here (Table 1) make cost comparisons between the centers difficult.

All participating centers used amplicon-based sequencing, and primer bias can influence sequencing accuracy. Primer sets varied between laboratories (Table 1). For the Artic v3 primers (which are public), we found no apparent bias in the data reported here compared to the other primer panels.
However, centers 7 and 8, which used the same primer panel, did not detect the variant G21255C in samples NGS3, -6, and -8 (Fig. S1C, F, and H). This SNP is present in almost all representatives of lineage B.1.177 (12). Whether this failure in detection is truly due to primer bias cannot be conclusively answered, however, as commercial primer sequences are often not public. One way to address this issue bioinformatically is to trim primer sequences prior to assembly. Nevertheless, primer bias is a real issue if it leads to amplicon dropouts. Fortunately, this is actively monitored by the community. For example, dropouts of the Artic v3 panel have been reported, especially for the Beta and Delta variants, and a new primer panel has therefore been developed to avoid high-frequency variant sites in the newer lineages (13).
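Where a primer scheme is available as a BED file (0-based start, end-exclusive intervals), variants that lie under primer binding sites can be flagged for closer inspection. This minimal sketch is not the analysis performed in this EQA. The function name is hypothetical, as are the primer name and coordinates in any example data; they do not describe a real panel. Flagging is also distinct from the primer trimming mentioned above, which removes primer-derived bases from reads before consensus calling.

```python
def variants_in_primer_sites(variant_positions, primers):
    """
    Flag variants that fall inside primer binding sites.

    variant_positions: 1-based reference positions (e.g. 21255 for G21255C)
    primers: (name, start, end) tuples in BED convention (0-based start, end-exclusive)
    Returns {position: [primer names]} for affected variants only.
    """
    primers = list(primers)             # accept any iterable; it is scanned once per variant
    hits = {}
    for pos in variant_positions:
        zero_based = pos - 1            # convert the 1-based position to BED coordinates
        names = [name for name, start, end in primers if start <= zero_based < end]
        if names:
            hits[pos] = names
    return hits
```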

Factors not assessed in this pilot EQA.

This pilot EQA focused on reporting findings related to consensus genome sequences and did not include minority variant reporting. Center 15 reported issues with contamination for sample NGS8, yet its lineage and cluster assignments were successful because the key sites were not affected. However, some contamination spilled over into the consensus genome, as shown by a number of wrong variant calls (Fig. S1H). Similarly, some laboratories reported mixed loci as SNPs, although the focus of this EQA was on fixed changes. Differentiating between contamination and true, albeit rare, mixed infections or possible in-host evolution can be very difficult, especially in a clinical setting with high sample throughput. Assessing contamination and analyzing minority variants would allow more detailed feedback to the laboratories. Contamination, for example, would likely be confined to a single center and result in mixed sites there, whereas a true mixture would be detected across all centers. At the same time, this would offer an interesting analytical challenge, particularly if samples with true mixed infections were sent to participants.
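The reasoning above (contamination confined to one center versus a mixture seen everywhere) can be sketched as a simple heuristic. This is a hypothetical illustration, not an assessment performed in this pilot. The minor allele fraction is computed from per-position base counts. The cutoff at which a center would call a site "mixed" is left to each center and is not specified here. The majority rule used to classify a site is an assumption, and the function names are illustrative.

```python
def minor_allele_fraction(base_counts):
    """Fraction of reads supporting the second most common base at one position."""
    total = sum(base_counts.values())
    if total == 0:
        return None
    counts = sorted(base_counts.values(), reverse=True)
    return counts[1] / total if len(counts) > 1 else 0.0


def classify_mixed_site(mixed_at, centers_with_data):
    """
    Hypothetical heuristic: a mixed site seen by most centers with data suggests a
    true mixture in the sample; one seen by a single center suggests contamination.
    """
    data = set(centers_with_data)
    mixed = set(mixed_at) & data
    if not data:
        return "no data"
    if not mixed:
        return "not mixed"
    if len(mixed) > len(data) / 2:
        return "likely true mixture"
    if len(mixed) == 1:
        return "possible contamination"
    return "unclear"
```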

Conclusion and lessons learned.

The first ESGMD-SSM pilot EQA of SARS-CoV-2 sequencing was overall a success. Most centers generated whole-genome sequences and correctly identified all lineages and clusters. Additionally, there was consensus on the majority of called SNPs, despite the strong effect that missing data and unreported deletions (although present in the data) had on some centers’ scores. This suggests an overall high quality across the participating centers. Standardized reporting of important genomic variations should be the focus of improvement for some bioinformatic pipelines. The most critical aspect was coverage across the genome, which correlated with correct lineage and cluster assignments. For a follow-up EQA, the variables to document and their format have to be defined more clearly. Moreover, samples with mixed infections should be included to assess the reporting of minority variants. Information on primer sets for amplicon-based methods should be carefully recorded, especially in light of new virus lineages. Instead of culture supernatants, it might also be of interest to include primary patient samples diluted in a clinical collection matrix, as well as an empty (negative) control. Finally, to trigger a discussion on cluster definition, samples with high similarity but 2 to 5 SNP differences could also be included.

The COVID-19 pandemic required a rapid global laboratory response involving the development and rollout of new diagnostic assays and platforms on an unprecedented scale. In response to the emergence and spread of virus variants of concern, WGS is increasingly used not only for surveillance but also for diagnostic purposes, which necessitates the rapid deployment and sharing of quality assurance schemes. This EQA pilot demonstrates the feasibility of developing and operationalizing an EQA for WGS in a pandemic context, and the lessons learned from its design, delivery, and results should inform future pandemic preparedness.

17 in total

1. Rapid SARS-CoV-2 whole-genome sequencing and analysis for informed public health decision-making in the Netherlands.

Authors: Aura Timen; Marion Koopmans; Bas B Oude Munnink; David F Nieuwenhuijse; Mart Stein; Áine O'Toole; Manon Haverkate; Madelief Mollers; Sandra K Kamga; Claudia Schapendonk; Mark Pronk; Pascal Lexmond; Anne van der Linden; Theo Bestebroer; Irina Chestakova; Ronald J Overmars; Stefan van Nieuwkoop; Richard Molenkamp; Annemiek A van der Eijk; Corine GeurtsvanKessel; Harry Vennema; Adam Meijer; Andrew Rambaut; Jaap van Dissel; Reina S Sikkema
Journal: Nat Med Date: 2020-07-16 Impact factor: 53.440

2. Viral Metagenomics in the Clinical Realm: Lessons Learned from a Swiss-Wide Ring Trial.

Authors: Thomas Junier; Michael Huber; Stefan Schmutz; Verena Kufner; Osvaldo Zagordi; Stefan Neuenschwander; Alban Ramette; Jakub Kubacki; Claudia Bachofen; Weihong Qi; Florian Laubscher; Samuel Cordey; Laurent Kaiser; Christian Beuret; Valérie Barbié; Jacques Fellay; Aitana Lebrand
Journal: Genes (Basel) Date: 2019-08-28 Impact factor: 4.096

3. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.

Authors: Victor M Corman; Olfert Landt; Marco Kaiser; Richard Molenkamp; Adam Meijer; Daniel Kw Chu; Tobias Bleicker; Sebastian Brünink; Julia Schneider; Marie Luisa Schmidt; Daphne Gjc Mulders; Bart L Haagmans; Bas van der Veer; Sharon van den Brink; Lisa Wijsman; Gabriel Goderski; Jean-Louis Romette; Joanna Ellis; Maria Zambon; Malik Peiris; Herman Goossens; Chantal Reusken; Marion Pg Koopmans; Christian Drosten
Journal: Euro Surveill Date: 2020-01

4. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update.

Authors: Vahid Jalili; Enis Afgan; Qiang Gu; Dave Clements; Daniel Blankenberg; Jeremy Goecks; James Taylor; Anton Nekrutenko
Journal: Nucleic Acids Res Date: 2020-07-02 Impact factor: 16.971

5. SARS-CoV-2 outbreak in a tri-national urban area is dominated by a B.1 lineage variant linked to a mass gathering event.

Authors: Madlen Stange; Alfredo Mari; Tim Roloff; Helena Mb Seth-Smith; Michael Schweitzer; Myrta Brunner; Karoline Leuzinger; Kirstine K Søgaard; Alexander Gensch; Sarah Tschudin-Sutter; Simon Fuchs; Julia Bielicki; Hans Pargger; Martin Siegemund; Christian H Nickel; Roland Bingisser; Michael Osthoff; Stefano Bassetti; Rita Schneider-Sliwa; Manuel Battegay; Hans H Hirsch; Adrian Egli
Journal: PLoS Pathog Date: 2021-03-19 Impact factor: 6.823

6. Timely intervention and control of a novel coronavirus (COVID-19) outbreak at a large skilled nursing facility-San Francisco, California, 2020.

Authors: Ellora N Karmarkar; Irin Blanco; Pauli N Amornkul; Amie DuBois; Xianding Deng; Patrick K Moonan; Beth L Rubenstein; David A Miller; Idamae Kennedy; Jennifer Yu; Justin P Dauterman; Melissa Ongpin; Wilmie Hathaway; Lisa Hoo; Stephanie Trammell; Ejovwoke F Dosunmu; Guixia Yu; Zenith Khwaja; Wendy Lu; Nawzaneen Z Talai; Seema Jain; Janice K Louie; Susan S Philip; Scot Federman; Godfred Masinde; Debra A Wadford; Naveena Bobba; Juliet Stoltey; Adrian Smith; Erin Epson; Charles Y Chiu; Ayanna S Bennett; Amber M Vasquez; Troy Williams
Journal: Infect Control Hosp Epidemiol Date: 2020-12-14 Impact factor: 3.254

7. SARS-CoV-2 whole-genome sequencing using reverse complement PCR: For easy, fast and accurate outbreak and variant analysis.

Authors: Jordy P M Coolen; Femke Wolters; Alma Tostmann; Lenneke F J van Groningen; Chantal P Bleeker-Rovers; Edward C T H Tan; Nannet van der Geest-Blankert; Jeannine L A Hautvast; Joost Hopman; Heiman F L Wertheim; Janette C Rahamat-Langendoen; Marko Storch; Willem J G Melchers
Journal: J Clin Virol Date: 2021-10-02 Impact factor: 3.168

8. Analysis of the ARTIC Version 3 and Version 4 SARS-CoV-2 Primers and Their Impact on the Detection of the G142D Amino Acid Substitution in the Spike Protein.

Authors: James J Davis; S Wesley Long; Paul A Christensen; Randall J Olsen; Robert Olson; Maulik Shukla; Sishir Subedi; Rick Stevens; James M Musser
Journal: Microbiol Spectr Date: 2021-12-08

9. Global Genomic Analysis of SARS-CoV-2 RNA Dependent RNA Polymerase Evolution and Antiviral Drug Resistance.

Authors: Alfredo Mari; Tim Roloff; Madlen Stange; Kirstine K Søgaard; Erblin Asllanaj; Gerardo Tauriello; Leila Tamara Alexander; Michael Schweitzer; Karoline Leuzinger; Alexander Gensch; Aurélien E Martinez; Julia Bielicki; Hans Pargger; Martin Siegemund; Christian H Nickel; Roland Bingisser; Michael Osthoff; Stefano Bassetti; Parham Sendi; Manuel Battegay; Catia Marzolini; Helena M B Seth-Smith; Torsten Schwede; Hans H Hirsch; Adrian Egli
Journal: Microorganisms Date: 2021-05-19
