Nathalie Mugnier1, Aurélien Griffon1, Bruno Simon2, Maxence Rambaud1, Hadrien Regue2, Antonin Bal2, Gregory Destras2, Maud Tournoud1, Magali Jaillard1, Abel Betraoui1, Emmanuelle Santiago1, Valérie Cheynet1,3, Alexandre Vignola4, Véronique Ligeon1, Laurence Josset2, Karen Brengel-Pesce1,3.
Abstract
Whole-genome sequencing has become an essential tool for real-time genomic surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) worldwide. The handling of raw next-generation sequencing (NGS) data is a major challenge for sequencing laboratories. We developed an easy-to-use web-based application (EPISEQ SARS-CoV-2) to analyse SARS-CoV-2 NGS data generated on common sequencing platforms using a variety of commercially available reagents. This application performs in one click a quality check, a reference-based genome assembly, and the analysis of the generated consensus sequence as to coverage of the reference genome, mutation screening and variant identification according to the up-to-date Nextstrain clade and Pango lineage. In this study, we validated the EPISEQ SARS-CoV-2 pipeline against a reference pipeline and compared the performance of NGS data generated by different sequencing protocols using EPISEQ SARS-CoV-2. We showed a strong agreement in SARS-CoV-2 clade and lineage identification (>99%) and in spike mutation detection (>99%) between EPISEQ SARS-CoV-2 and the reference pipeline. The comparison of several sequencing approaches using EPISEQ SARS-CoV-2 revealed 100% concordance in clade and lineage classification. It also uncovered reagent-related sequencing issues with a potential impact on SARS-CoV-2 mutation reporting. Altogether, EPISEQ SARS-CoV-2 allows an easy, rapid and reliable analysis of raw NGS data to support the sequencing efforts of laboratories with limited bioinformatics capacity and those willing to accelerate genomic surveillance of SARS-CoV-2.
Keywords: SARS-CoV-2; bioinformatics; genome assembly; mutation screening; next-generation sequencing; nextstrain clade; pango lineage; variant identification
Year: 2022 PMID: 36016297 PMCID: PMC9416160 DOI: 10.3390/v14081674
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.818
Sequencers and reagents used for the kits and sequencing platform comparison study.
| Sequencer | Primer Pool | Kits |
|---|---|---|
| MiSeq (Illumina) | ARTIC v3 | NEBNext® ARTIC SARS-CoV-2 Library Prep Kit (Illumina) (NEB, E7650) |
| MiSeq (Illumina) | ARTIC v4 | NEBNext® ARTIC SARS-CoV-2 Library Prep Kit (Illumina) (NEB, E7650); ARTIC V4 NCOV-2019 Panel (IDT, 10008554) |
| MiSeq (Illumina) | ARTIC v4.1 | NEBNext® ARTIC SARS-CoV-2 FS Library Prep Kit (Illumina) (NEB, E7658); ARTIC V4.1 NCOV-2019 Panel (IDT, 10011442) |
| MiSeq (Illumina) | VSS v1 | NEBNext® ARTIC SARS-CoV-2 FS Library Prep Kit (Illumina) (NEB, E7658) |
| MiSeq (Illumina) | VSS v2 | NEBNext® ARTIC SARS-CoV-2 FS Library Prep Kit (Illumina) (NEB, E7658) |
| GridION Mk1 (Oxford Nanopore Technologies) | ARTIC v3 | NEBNext® ARTIC SARS-CoV-2 Companion Kit (ONT) (NEB, E7660) |
| GridION Mk1 (Oxford Nanopore Technologies) | ARTIC v4 | NEBNext® ARTIC SARS-CoV-2 Companion Kit (ONT) (NEB, E7660); |
| GridION Mk1 (Oxford Nanopore Technologies) | ARTIC v4.1 | NEBNext® ARTIC SARS-CoV-2 Companion Kit (ONT) (NEB, E7660); |
| GridION Mk1 (Oxford Nanopore Technologies) | VSS v1 | NEBNext® ARTIC SARS-CoV-2 Companion Kit (ONT) (NEB, E7660) |
| GridION Mk1 (Oxford Nanopore Technologies) | VSS v2 | NEBNext® ARTIC SARS-CoV-2 Companion Kit (ONT) (NEB, E7660) |
Abbreviations: IDT, Integrated DNA Technologies; NEB, New England Biolabs; ONT, Oxford Nanopore Technologies; VSS, VarSkip Short.
Figure 1. Correlation of the percentage of genome coverage of SARS-CoV-2 sequences evaluated by EPISEQ SARS-CoV-2 vs. the reference method (n = 1632). Spearman (non-parametric) correlation coefficient r = 0.883 (95% confidence interval: 0.871–0.893; p < 0.0001).
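As an illustration of the statistic reported in Figure 1, the following is a minimal sketch of a Spearman rank correlation between two lists of coverage percentages. The function names and the five coverage values are hypothetical and are not taken from the study; the authors' actual statistical software is not stated in this record.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical coverage percentages for five samples (not study data).
coverage_pipeline_a = [99.8, 97.2, 85.0, 99.9, 60.3]
coverage_pipeline_b = [99.7, 98.0, 80.1, 99.9, 65.0]
print(spearman(coverage_pipeline_a, coverage_pipeline_b))
```

Because the coefficient depends only on ranks, it measures whether the two pipelines order samples by coverage in the same way, not whether the coverage values themselves are equal.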
Agreement of SARS-CoV-2 sequence analyses by the EPISEQ SARS-CoV-2 vs. the reference pipeline as to clade and lineage identification (samples with >95% coverage based on the reference method; n = 1362).
| Sequencing Kit | Nextstrain Clade, n/N 1 | Nextstrain Clade, % [95% CI] | Pango Lineage, n/N 1 | Pango Lineage, % [95% CI] |
|---|---|---|---|---|
| ARTIC v3 | 527/527 2 | 100.0% [99.3–100.0] | 525/527 | 99.6% [98.6–99.9] |
| ARTIC v4 | 316/316 3 | 100.0% [98.8–100.0] | 315/316 | 99.7% [98.3–99.9] |
| ARTIC v4.1 | 517/519 4 | 99.6% [98.6–99.9] | 512/519 | 98.7% [97.2–99.5] |
| Total | 1360/1362 | 99.9% [99.5–100.0] | 1352/1362 | 99.3% [98.7–99.7] |
1 n/N is the ratio of the number of sequences attributed the same clade or lineage, respectively, by both bioinformatics pipelines to the number of sequences analysed. 2 Clade distribution (n = 527): 19B, 4; 20A, 80; 20B, 10; 20C, 1; 20D, 1; 20E (EU1), 26; 20H (Beta, V2), 16; 20I (Alpha, V1), 388; 21D (Eta), 1. 3 Clade distribution (n = 316): 20A, 5; 20B, 1; 20H (Beta, V2), 2; 20I (Alpha, V1), 54; 21I (Delta), 16; 21J (Delta), 238. 4 Clade distribution (n = 519): 20A, 7; 20H (Beta, V2), 5; 21I (Delta), 3; 21J (Delta), 43; 21K (Omicron), 204; 21L (Omicron), 257. Abbreviation: CI, confidence interval.
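The agreement percentages above are binomial proportions with 95% confidence intervals. This record does not say which interval method the authors used, so the sketch below assumes a Wilson score interval purely for illustration; `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wilson_interval(successes, total, z=Z_95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Total row of the clade column: 1360 concordant sequences out of 1362.
low, high = wilson_interval(1360, 1362)
print(f"{100 * 1360 / 1362:.1f}% [{100 * low:.1f}-{100 * high:.1f}]")
```

Under this assumption the bounds for the total clade row round to the same values as printed in the table. That agreement does not establish the method, since other interval methods can round to the same figures at these sample sizes.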
Proportion of SARS-CoV-2 sequences presenting single-nucleotide polymorphisms (SNPs) between consensus sequence assemblies generated by EPISEQ SARS-CoV-2 and the reference method.
| Sequencing Kit | 0 SNP | 1 SNP | 2 SNPs | >2 SNPs |
|---|---|---|---|---|
| ARTIC v3 | 524/527 (99.4%) | 3/527 (0.6%) | 0/527 (0.0%) | 0/527 (0.0%) |
| ARTIC v4 | 253/316 (80.1%) | 55/316 (17.4%) | 8/316 (2.5%) | 0/316 (0.0%) |
| ARTIC v4.1 | 363/519 (69.9%) | 137/519 (26.4%) | 15/519 (2.9%) | 4/519 (0.8%) |
| Total | 1140/1362 (83.7%) | 195/1362 (14.3%) | 23/1362 (1.7%) | 4/1362 (0.3%) |
1 n/N is the ratio of the number of consensus sequences with the indicated number of SNPs (0, 1, 2 or >2, respectively) between analyses by both bioinformatics pipelines to the number of consensus sequences analysed.
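To make the SNP comparison concrete, here is a minimal sketch that counts single-nucleotide differences between two consensus sequences and bins the count into the table's categories. It assumes the two sequences are already aligned to the same length and that positions carrying N, gaps or other ambiguity codes are ignored; how the authors handled such positions is not stated here. The function names and toy sequences are hypothetical.

```python
UNAMBIGUOUS_BASES = set("ACGT")


def count_snps(seq_a, seq_b):
    """Count positions where both sequences have an unambiguous base that differs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in UNAMBIGUOUS_BASES and b in UNAMBIGUOUS_BASES and a != b
    )


def snp_category(n_snps):
    """Map an SNP count to the column labels used in the table."""
    return {0: "0 SNP", 1: "1 SNP", 2: "2 SNPs"}.get(n_snps, ">2 SNPs")


# Toy aligned fragments (not real SARS-CoV-2 sequence).
pipeline_a = "ACGTNACGTA"
pipeline_b = "ACCTAACGTA"
n = count_snps(pipeline_a, pipeline_b)
print(n, snp_category(n))
```

Skipping undetermined positions keeps coverage gaps from being counted as disagreements; a stricter comparison could treat an N against a called base as a difference.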
Agreement of SARS-CoV-2 sequence analyses by the EPISEQ SARS-CoV-2 vs. the reference pipeline as to amino acid mutation identification (samples with >95% coverage by reference method; n = 1362).
| Sequencing Kit | Spike Mutations, n/N (%) 1 |
|---|---|
| ARTIC v3 | 527/527 (100.0%) |
| ARTIC v4 | 315/316 (99.7%) |
| ARTIC v4.1 | 510/519 (98.3%) |
| Total | 1352/1362 (99.3%) |
1 n/N is the ratio of the number of sequences with the same amino acid mutations identified by both bioinformatics pipelines within the spike protein to the number of sequences analysed.
Figure 2. Percentage of reference genome coverage determined by EPISEQ SARS-CoV-2 upon whole-genome sequencing with different kits and sequencing platforms (Table 1). (a) Pre-omicron SARS-CoV-2-positive samples (including seven 20A-G (EU1), four alpha, two beta, one gamma, six delta and one Eta SARS-CoV-2 variants; n = 21) were sequenced using four different commercial kits and primer pools (ARTIC v3, v4, v4.1 and VSS v1) on two NGS platforms (Illumina MiSeq and ONT GridION), generating 168 sequencing results. (b) Omicron-positive SARS-CoV-2 samples (including nine BA.1 and ten BA.2 omicron sub-variants; n = 19) were sequenced using two different commercial kits and primer pools (ARTIC v4.1 and VSS v2) on the same two NGS platforms (Illumina MiSeq and ONT GridION), generating 76 sequencing results. In total, 244 raw NGS datasets were generated and analysed using EPISEQ SARS-CoV-2. The dashed line indicates 95% coverage (quality control criterion). Abbreviations: Av3, ARTIC kit version 3; Av4, ARTIC kit version 4; Av4.1, ARTIC kit version 4.1; Illumina, Illumina sequencing (San Diego, USA); ONT, Oxford Nanopore Technologies sequencing (Oxford, UK); VSSv1, VarSkip Short kit version 1; VSSv2, VarSkip Short kit version 2.
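The coverage metric and the 95% threshold in Figure 2 can be sketched as follows. The exact definition used by EPISEQ SARS-CoV-2 is not given in this record; the sketch assumes coverage is the share of reference positions with an unambiguous base call in a consensus expressed in reference coordinates (no insertions). The function names are hypothetical, and the toy sequences are far shorter than a real genome.

```python
def percent_coverage(consensus, reference_length):
    """Percentage of reference positions with an A/C/G/T call (assumed definition)."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return 100.0 * called / reference_length


def passes_coverage_qc(consensus, reference_length, threshold=95.0):
    """Apply the >95% coverage criterion from the figure caption."""
    return percent_coverage(consensus, reference_length) > threshold


# Toy 100-base consensus with four undetermined positions (not real data).
toy_consensus = "ACGT" * 24 + "NNNN"
print(percent_coverage(toy_consensus, 100), passes_coverage_qc(toy_consensus, 100))
```

For a real sample, `reference_length` would be the length of the reference genome used by the pipeline.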
Concordance of sequencing results of SARS-CoV-2-positive samples generated by different kits and sequencing platforms and analysed using the EPISEQ SARS-CoV-2 pipeline.
| SARS-CoV-2 Samples | Nextstrain Clade | Pango Lineage |
|---|---|---|
| Pre-omicron variants 1 | 21/21 (100.0%) | 21/21 (100.0%) |
| Omicron variants 2 | 19/19 (100.0%) | 19/19 (100.0%) |
| Total | 40/40 (100.0%) | 40/40 (100.0%) |
1 Samples (n = 21) sequenced using primers ARTIC v4.1, ARTIC v4, ARTIC v3 and VSS v1 on the Illumina and ONT platforms (n = 168 output results); 2 samples (n = 19) sequenced using primers ARTIC v4.1 and VSS v2 on the Illumina and ONT platforms (n = 76 output results).
Figure 3. Spike mutations identified by EPISEQ SARS-CoV-2. (a) Pre-omicron SARS-CoV-2 variants (including 4 alpha, 2 beta, 1 gamma, 6 delta and 8 other SARS-CoV-2 variants; n = 21); (b) SARS-CoV-2 omicron variants (including 9 BA.1 [21K] and 10 BA.2 [21L] sub-variants; n = 19). Concordance in mutation detection between kits and sequencing platforms is shown in dark green (mutation detected in all eight (a) or four (b) conditions) and light grey (no mutation detected in all eight (a) or four (b) conditions). Other colours (pink, orange and red) represent mutations detected with some but not all kit/sequencer combinations, thus indicating a discordance in identified mutations (see Tables S2 and S3 for details).
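The colour coding in Figure 3 amounts to a three-way classification of each mutation per sample: detected under every kit/sequencer condition, under none, or under only some. A minimal sketch of that logic is shown below; the function name, the return labels and the detection pattern are hypothetical and do not reproduce any result from the study.

```python
def classify_detection(detected_by_condition):
    """Classify one mutation in one sample across kit/sequencer conditions.

    detected_by_condition maps a (kit, platform) tuple to True or False.
    Returns "all", "none" or "partial" (partial means discordant).
    """
    flags = list(detected_by_condition.values())
    if not flags:
        raise ValueError("at least one condition is required")
    if all(flags):
        return "all"
    if not any(flags):
        return "none"
    return "partial"


# Hypothetical detection pattern for the four omicron conditions.
example = {
    ("ARTIC v4.1", "Illumina"): True,
    ("ARTIC v4.1", "ONT"): True,
    ("VSS v2", "Illumina"): False,
    ("VSS v2", "ONT"): True,
}
print(classify_detection(example))
```

In the figure's terms, "all" and "none" correspond to the concordant cells (dark green and light grey), while "partial" corresponds to the discordant cells shown in the other colours.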
Figure 4. Genomic assembly of sequences of the S gene of SARS-CoV-2 omicron variants by EPISEQ SARS-CoV-2 following sequencing with two different kits (based on ARTIC v4.1 and VSS v2) and on two different platforms (Illumina and ONT). (a) SARS-CoV-2 omicron BA.1 (21K) variants (n = 9). (b) SARS-CoV-2 omicron BA.2 (21L) variants (n = 10). Horizontal black bars represent undetermined bases (N). Legend of sequence alignment (performed with Geneious 10.0.7): bright green, all aligned sequences are identical; light green, some aligned sequences differ (mismatch, undetermined nucleotides or deletions in some sequences). The genomic region covering nucleotides ~700 to 1700 shows sequencing gaps with ARTIC v4.1 (between nucleotides ~700 and 1250, overlapping amplicon 75) and with VSS v2 (between nucleotides ~1300 and 1700, overlapping amplicon 57).