Charles S P Foster, Sacha Stelzer-Braid, Ira W Deveson, Rowena A Bull, Malinna Yeang, Jane-Phan Au, Mariana Ruiz Silva, Sebastiaan J van Hal, Rebecca J Rockett, Vitali Sintchenko, Ki Wook Kim, William D Rawlinson.
Abstract
Whole-genome sequencing of viral isolates is critical for informing transmission patterns and for tracking the ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability, as in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and the resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. The differences that were identified were driven predominantly by ambiguous site content. When ambiguous sites were ignored, differences remained in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons, and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if variant allele frequency thresholds for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practices approach to bioinformatics between laboratories working on a common outbreak problem.
Keywords: Pango lineage; SARS-CoV-2; bioinformatics; whole-genome sequencing
Year: 2022; PMID: 35215779; PMCID: PMC8875182; DOI: 10.3390/v14020185
Source DB: PubMed; Journal: Viruses; ISSN: 1999-4915; Impact factor: 5.048
Figure 1. Histograms depicting the frequency of the number of differences between matched SARS-CoV-2 genome sequences. (a) Total number of differences; (b) number of differences where either or both sequences being compared had an IUPAC ambiguity code at a given site; (c) number of differences where both sequences being compared had a standard nucleotide base (A, T, G, C) at a given site.
Figure 2. Boxplots depicting the number of differences between matched SARS-CoV-2 genome sequences across 216 pairwise comparisons. (a) Total number of differences; (b) number of differences where either or both sequences being compared had an IUPAC ambiguity code at a given site; (c) number of differences where both sequences being compared had a standard nucleotide base (A, T, G, C) at a given site. Within each panel, results are grouped into three classes of analysis: (1) sequence derived from Lab1 vs. sequence derived from Lab2, both assembled with the bioinformatics protocol of Lab1 (Lab1 vs. Lab2-New); (2) sequence derived from Lab1 assembled with the bioinformatics protocol of Lab1 vs. sequence derived from Lab2 assembled with the bioinformatics protocol of Lab2 (Lab1 vs. Lab2-Original); (3) sequence derived from Lab2 assembled with the bioinformatics protocol of Lab1 vs. sequence derived from Lab2 assembled with the bioinformatics protocol of Lab2 (Lab2-New vs. Lab2-Original).
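The three difference categories used in the figure panels (total, ambiguity-associated, and standard-base differences) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `count_differences`, the assumption that both sequences are already aligned to the same length, and the choice to ignore gaps and other unexpected characters are all assumptions made for illustration.

```python
# Minimal sketch, assuming two pre-aligned consensus sequences of equal length.
STANDARD_BASES = set("ACGT")
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHVN")


def count_differences(seq_a, seq_b):
    """Count site differences, split by whether an ambiguity code is involved."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"total": 0, "ambiguous": 0, "unambiguous": 0}
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == base_b:
            continue
        if base_a in IUPAC_AMBIGUITY_CODES or base_b in IUPAC_AMBIGUITY_CODES:
            # Either or both sequences carry an IUPAC ambiguity code (panel b).
            counts["ambiguous"] += 1
        elif base_a in STANDARD_BASES and base_b in STANDARD_BASES:
            # Both sequences carry a standard base that differs (panel c).
            counts["unambiguous"] += 1
        else:
            # Assumption: gaps and other characters are not counted.
            continue
        counts["total"] += 1
    return counts


# Hypothetical toy sequences, not data from the study.
print(count_differences("ACGTRACGTN", "ACGTAACCTA"))
```

In the toy example, the `R` vs. `A` and `N` vs. `A` sites fall into the ambiguity-associated category, while the `G` vs. `C` site is a standard-base difference.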
Summary statistics for the number of differences in pairwise sequence comparisons between matched samples from the same patient that were sequenced at different laboratory sites, with or without using the same bioinformatics pipeline. Abbreviations: IQR = interquartile range; Lab1 = Lab1 sequence assembled with Lab1 bioinformatics pipeline; Lab2-New = Lab2 sequence assembled with Lab1 bioinformatics pipeline; Lab2-Original = Lab2 sequence assembled with Lab2 bioinformatics pipeline.
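| Pair | Mean (IUPAC ambiguities included) | Median (IUPAC ambiguities included) | IQR (IUPAC ambiguities included) | Range (IUPAC ambiguities included) | Mean (IUPAC ambiguities excluded) | Median (IUPAC ambiguities excluded) | IQR (IUPAC ambiguities excluded) | Range (IUPAC ambiguities excluded) |
|---|---|---|---|---|---|---|---|---|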
| Lab1 vs. Lab2-New | 1.97 | 0 | 2 | 0–27 | 0.03 | 0 | 0 | 0–1 |
| Lab1 vs. Lab2-Original | 1.32 | 1 | 2 | 0–14 | 0.04 | 0 | 0 | 0–1 |
| Lab2-New vs. Lab2-Original | 1.72 | 1 | 2 | 0–14 | 0 | 0 | 0 | 0–0 |
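
The summary columns above (mean, median, IQR, range) can be computed from a list of per-comparison difference counts along the following lines. This is a sketch only: the function name `summarize` and the input values are hypothetical, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the source does not state which IQR definition was used.

```python
import statistics


def summarize(difference_counts):
    """Return mean, median, IQR, and range for a list of difference counts."""
    # Assumption: inclusive quartiles; other conventions can give other IQRs.
    q1, _, q3 = statistics.quantiles(difference_counts, n=4, method="inclusive")
    return {
        "mean": statistics.mean(difference_counts),
        "median": statistics.median(difference_counts),
        "iqr": q3 - q1,
        "range": (min(difference_counts), max(difference_counts)),
    }


# Hypothetical counts, not data from the study.
print(summarize([0, 0, 1, 2, 5]))
```

Because the counts are heavily right-skewed (most comparisons have zero or one difference, with a few outliers), the median and IQR are more informative here than the mean alone.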