Sun Hee Rosenthal, Anna Gerasimova, Rolando Ruiz-Vega, Kayla Livingston, Ron M Kagan, Yan Liu, Ben Anderson, Renius Owen, Laurence Bernstein, Alla Smolgovsky, Dong Xu, Rebecca Chen, Andrew Grupe, Pranoot Tanpaiboon, Felicitas Lacbawan.
Abstract
Monitoring new mutations in SARS-CoV-2 provides crucial information for identifying diagnostic and therapeutic targets and important insights for achieving a more effective COVID-19 control strategy. Next-generation sequencing (NGS) technologies have been widely used for whole genome sequencing (WGS) of SARS-CoV-2. While various NGS methods have been reported, one chief limitation has been the complexity of the workflow, which limits scalability. Here, we overcome this limitation by designing a laboratory workflow optimized for high-throughput studies. The workflow uses modified ARTIC network v3 primers for SARS-CoV-2 whole genome amplification. NGS libraries were prepared by a 2-step PCR method, similar to a previously reported tailed PCR method, with further optimizations to improve amplicon balance, to minimize amplicon dropout for viral genomes harboring primer-binding site mutation(s), and to integrate robotic liquid handlers. Validation studies demonstrated that the optimized workflow can process up to 2688 samples in a single sequencing run without compromising sensitivity or accuracy, and with fewer amplicon dropout events than the standard ARTIC protocol. We additionally report results for over 65,000 SARS-CoV-2 whole genome sequences from clinical specimens collected in the United States between January and September of 2021, as part of an ongoing national genomics surveillance effort.
Year: 2022 PMID: 35136154 PMCID: PMC8826425 DOI: 10.1038/s41598-022-06091-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. High-throughput NGS workflow. Sequencing library preparation was performed on robotic liquid handlers. (A) RNA extracted from SARS-CoV-2-positive specimens was converted to cDNA and amplified using a touchdown PCR method with primers published by the ARTIC network, modified to add Illumina sequencing primer binding sites. Specimen-specific indexed sequencing adapters were then added by fusion PCR using these primer binding sites. An Agilent Bravo was used for each step. (B) The final indexed PCR products were purified with Ampure XP beads on a BlueCat BlueWasher, pooled into a single library on a Hamilton Starlet, and size-selected on a Sage Science Blue Pippin. (C) The final library was sequenced on an Illumina NovaSeq 6000. (D) An in-house bioinformatics pipeline was used to generate consensus genomes and to call variants relative to the reference genome MN908947.3 (Wuhan-Hu-1).
Figure 2. Comparison of amplicon coverage balance. Bar plots show the average sequencing depth of each amplicon: (A) equimolar primer pools, in which low-performing amplicons were observed; (C) optimized modified primer pools, in which coverage of low-performing amplicons was improved; (B, D) ARTIC v3 standard primers with even coverage. The same clinical sample set was used for plots (A) and (B) (Ct 24, n = 11), and for plots (C) and (D) (Ct 24, n = 5). All samples were normalized to 200,000 mapped reads for comparison.
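The normalization to 200,000 mapped reads mentioned in the Figure 2 caption can be illustrated with a minimal sketch. It assumes simple proportional scaling of per-amplicon depth; the caption does not say whether depths were scaled or reads were subsampled, and the function name and example depths are hypothetical.

```python
TARGET_MAPPED_READS = 200_000  # normalization target stated in the Figure 2 caption


def normalize_depths(amplicon_depths, mapped_reads, target=TARGET_MAPPED_READS):
    """Scale per-amplicon depths as if the sample had `target` mapped reads.

    Assumption: proportional scaling. The source does not state whether
    depths were scaled or reads were subsampled.
    """
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    factor = target / mapped_reads
    return {name: depth * factor for name, depth in amplicon_depths.items()}


# Hypothetical sample with 400,000 mapped reads: every depth is halved.
example_depths = normalize_depths(
    {"amplicon_1": 3000, "amplicon_2": 500}, mapped_reads=400_000
)
```

Scaling to a common read count makes amplicon depths comparable between samples that were sequenced to different total depths.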
Comparison of percent amplicon dropout by annealing temperature settings.
| Clade (a) | Samples tested (N) | % Amplicon dropout, annealing at 65 °C | % Amplicon dropout, annealing at 65–55 °C (b) |
|---|---|---|---|
| 20A | 12 | 0.34 | 0 |
| 20B | 10 | 0.51 | 0 |
| 20C | 3 | 1.7 | 0.34 |
| 20G | 24 | 0.21 | 0 |
| 20H (Beta, V2) | 1 | 2.04 | 0 |
| 20I (Alpha, V1) | 9 | 0 | 0 |
| 21C (Epsilon) | 17 | 1.08 | 0 |
| 21F (Iota) | 3 | 0 | 0 |
| Total | 79 | 0.5 | 0.01 |
(a) SARS-CoV-2 clades were assigned with Nextclade v1.3.0 (https://clades.nextstrain.org/).
(b) Touchdown PCR gradually reduced the annealing temperature from 65 °C to 55 °C (0.7 °C/second) within each cycle.
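The dropout percentages in this table are consistent with dividing the number of dropped amplicons by the total number of amplicons tested, assuming the 98 amplicons of the ARTIC v3 scheme. For example, 2.04% for the single 20H sample matches 2 of 98. That denominator and the dropped-amplicon counts are inferred rather than stated here, so the sketch below is only an illustration. It also works out how long one 65 °C to 55 °C ramp takes at the stated 0.7 °C/second.

```python
ARTIC_V3_AMPLICONS = 98  # amplicon count of the ARTIC v3 scheme (assumed denominator)


def percent_dropout(dropped_amplicons, n_samples, n_amplicons=ARTIC_V3_AMPLICONS):
    """Percent of amplicons that dropped out across all tested samples."""
    if n_samples <= 0 or n_amplicons <= 0:
        raise ValueError("n_samples and n_amplicons must be positive")
    return 100.0 * dropped_amplicons / (n_samples * n_amplicons)


def ramp_seconds(start_c=65.0, end_c=55.0, rate_c_per_s=0.7):
    """Time for one annealing ramp at a constant cooling rate."""
    return abs(start_c - end_c) / rate_c_per_s


# Inferred counts (not given in the source): 2 dropouts in 1 sample of clade 20H,
# and 1 dropout across 3 samples of clade 20C under touchdown annealing.
dropout_20h = round(percent_dropout(2, 1), 2)
dropout_20c_touchdown = round(percent_dropout(1, 3), 2)
ramp_time = ramp_seconds()  # roughly 14.3 seconds per cycle
```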
Figure 3. Percent genome coverage for intra-assay (A) and inter-assay (B) precision samples. Square dots indicate the average percent coverage of the SARS-CoV-2 genome for each replicate. The box plot shows the medians, interquartile ranges (IQR) and 1.5 × IQR at each Ct value. Ct values were rounded down.
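The box plot elements named in the Figure 3 caption (median, IQR, and 1.5 × IQR) can be computed as in the following minimal sketch. The quartile method is an assumption, because the plotting software and its quantile convention are not stated, and the coverage values are hypothetical.

```python
import statistics


def box_stats(values):
    """Median, quartiles, IQR, and 1.5 x IQR fences for one group of values.

    Assumption: 'inclusive' quartiles; other tools may interpolate differently.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


# Hypothetical percent-coverage values for replicates at one Ct value.
example_box = box_stats([97.0, 98.0, 99.0, 99.5, 100.0])
```

Values outside the two fences would typically be drawn as outliers.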
Intra- and inter-assay concordance of clade and lineage assignment using the automated, high-throughput SARS-CoV-2 NGS workflow.
| Clade/lineage (a) | Intra-assay precision (number of samples) | Inter-assay precision (number of samples) | Total tested/Total correct |
|---|---|---|---|
| 20A/B.1 | 5 | 1 | 6/6 |
| 20A/B.1.189 | 0 | 1 | 1/1 |
| 20A/B.1.232 | 0 | 1 | 1/1 |
| 20A/B.1.234 | 0 | 3 | 3/3 |
| 20A/B.1.243 | 0 | 3 | 3/3 |
| 20A/B.1.525 | 2 | 1 | 3/3 |
| 20A/B.1.539 | 0 | 1 | 1/1 |
| 20A/B.1.628 | 1 | 0 | 1/1 |
| 20B/B.1.1 | 0 | 1 | 1/1 |
| 20B/B.1.1.222 | 1 | 0 | 1/1 |
| 20B/B.1.1.231 | 1 | 0 | 1/1 |
| 20B/B.1.1.265 | 1 | 0 | 1/1 |
| 20B/B.1.1.316 | 1 | 0 | 1/1 |
| 20B/B.1.1.318 | 2 | 0 | 2/2 |
| 20B/B.1.1.345 | 0 | 1 | 1/1 |
| 20B/B.1.1.348 | 1 | 1 | 2/2 |
| 20B/B.1.1.434 | 2 | 0 | 2/2 |
| 20B/B.1.1.519 | 8 | 15 | 23/23 |
| 20B/P.2 | 0 | 1 | 1/1 |
| 20B/R.1 | 0 | 4 | 4/4 |
| 20C/B.1 | 3 | 0 | 3/3 |
| 20C/B.1.1 | 3 | 0 | 3/3 |
| 20C/B.1.324 | 1 | 0 | 1/1 |
| 20C/B.1.427 | 2 | 5 | 7/7 |
| 20C/B.1.429 | 2 | 16 | 18/18 |
| 20C/B.1.517 | 6 | 1 | 7/7 |
| 20C/B.1.526 | 8 | 4 | 12/12 |
| 20C/B.1.526.1 | 3 | 4 | 7/7 |
| 20C/B.1.526.2 | 9 | 5 | 14/14 |
| 20C/B.1.575 | 2 | 2 | 4/4 |
| 20C/B.1.637 | 1 | 0 | 1/1 |
| 20G/B.1.2 | 12 | 38 | 50/50 |
| 20G/B.1.596 | 4 | 6 | 10/10 |
| 20I (Alpha, V1)/B.1.1.7 | 77 | 44 | 121/121 |
| 20I (Alpha, V1)/Q.4 | 2 | 0 | 2/2 |
| 20I (Alpha, V1)/Q.8 | 1 | 0 | 1/1 |
| 20J (Gamma, V3)/P.1 | 5 | 1 | 6/6 |
| 21D (Eta)/B.1.525 | 2 | 0 | 2/2 |
| 21F (Iota)/B.1.526 | 5 | 0 | 5/5 |
| 21H/B.1.621 | 2 | 0 | 2/2 |
| Total | 175 | 160 | 335/335 |
(a) SARS-CoV-2 clades were assigned with Nextclade v1.3.0 (https://clades.nextstrain.org/) and lineages were assigned with Pangolin v3.1.11 (https://pangolin.cog-uk.io/).
Figure 4. Sequencing pass rate (%) for Ct values between 17 and 35. Percent sample pass at over 97% consensus sequence coverage (A), and at over 90% consensus sequence coverage (B). Sample numbers tested per Ct are indicated in parentheses. Ct values were rounded down.
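The pass rates in Figure 4 can be tabulated as in the sketch below: each Ct value is rounded down, samples are grouped by the resulting bin, and the share meeting a coverage threshold is reported. The input layout and example values are hypothetical. The sketch uses "greater than or equal to" to match the ≥ 97% definition given for complete consensus sequences, which is an assumption for the figure's "over 97%" wording.

```python
import math
from collections import defaultdict


def pass_rate_by_ct(samples, threshold_percent):
    """Return {floor(Ct): percent of samples with coverage >= threshold}.

    `samples` is an iterable of (ct_value, coverage_percent) pairs.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for ct, coverage in samples:
        ct_bin = math.floor(ct)  # Ct values are rounded down, as in the caption
        totals[ct_bin] += 1
        if coverage >= threshold_percent:
            passed[ct_bin] += 1
    return {b: 100.0 * passed[b] / totals[b] for b in sorted(totals)}


# Hypothetical (Ct, % genome coverage) pairs.
example_samples = [(24.9, 99.1), (24.2, 95.0), (30.7, 92.3), (30.1, 88.0)]
rates_97 = pass_rate_by_ct(example_samples, 97.0)
rates_90 = pass_rate_by_ct(example_samples, 90.0)
```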
Proportion of SARS-CoV-2 samples generating complete consensus sequence with the automated, high-throughput SARS-CoV-2 NGS workflow, by clade.
| Clade (b) | High-throughput workflow (c): samples generating complete consensus sequence, % (complete/total) (a) | ARTIC v3 workflow (d): samples generating complete consensus sequence, % (complete/total) (a) |
|---|---|---|
| 19B | 100 (3/3) | 0 (0/3) |
| 20A | 83.0 (44/53) | 62.2 (33/53) |
| 20B | 96.2 (102/106) | 59.4 (63/106) |
| 20C | 88.7 (87/98) | 47.9 (47/98) |
| 20G | 96 (168/175) | 81.1 (142/175) |
| 20H (Beta, V2) | 100 (7/7) | 0 (0/7) |
| 20I (Alpha, V1) | 96.9 (691/713) | 78.2 (558/713) |
| 20J (Gamma, V3) | 100 (44/44) | 45.4 (20/44) |
| 21A (Delta) | 100 (1/1) | 0 (0/1) |
| 21C (Epsilon) | 98.0 (99/101) | 0.99 (1/101) |
| 21D (Eta) | 100 (10/10) | 100 (10/10) |
| 21F (Iota) | 97.7 (128/131) | 61.8 (81/131) |
| 21G (Lambda) | 100 (1/1) | 0 (0/1) |
| 21H | 100 (3/3) | 66.6 (2/3) |
| Total | 96.0 (1388/1446) | 66.1 (957/1446) |
(a) Complete consensus sequence as defined by obtaining ≥ 97% SARS-CoV-2 genome coverage.
(b) SARS-CoV-2 clades were assigned with Nextclade v1.3.0 (https://clades.nextstrain.org/).
(c) Samples were sequenced on the Illumina NovaSeq 6000 with the SP reagent kit using 2 × 251 cycles.
(d) Samples were sequenced on the Illumina MiSeq with the 600 cycle v3 kit using 2 × 251 cycles.
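The ≥ 97% completeness criterion can be sketched as a check on a consensus sequence. The example assumes that coverage is the number of unambiguous bases (A, C, G, T) divided by the 29,903-base length of the reference MN908947.3. How the in-house pipeline actually computes coverage is not described here, so the function names and the counting rule are illustrative only.

```python
REFERENCE_LENGTH = 29_903  # length of MN908947.3 (Wuhan-Hu-1)
COMPLETE_THRESHOLD = 97.0  # percent, from the table footnote


def genome_coverage_percent(consensus, reference_length=REFERENCE_LENGTH):
    """Percent of reference positions with an unambiguous base call.

    Assumption: N and other ambiguity codes count as uncovered.
    """
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return 100.0 * called / reference_length


def is_complete(consensus):
    """True when the consensus meets the >= 97% coverage criterion."""
    return genome_coverage_percent(consensus) >= COMPLETE_THRESHOLD


# Hypothetical consensus sequences: 29,006 called bases pass, 29,005 do not.
passing = is_complete("A" * 29_006 + "N" * 897)
failing = is_complete("A" * 29_005 + "N" * 898)
```

Under this assumption a genome may have at most 897 uncalled positions and still count as complete.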
Figure 5. Weekly proportions of SARS-CoV-2 variants. (A) Variant proportions were calculated against the total number of cases sequenced each week. (B) The total number of cases sequenced each week is indicated with a scale bar.
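The weekly proportions in Figure 5 (A) divide each variant's count by the total number of cases sequenced that week. A minimal sketch follows, with a hypothetical record layout and made-up example records.

```python
from collections import Counter, defaultdict


def weekly_variant_proportions(records):
    """Return {week: {variant: percent of that week's sequenced cases}}.

    `records` is an iterable of (week_label, variant_name) pairs, one per
    sequenced case.
    """
    counts = defaultdict(Counter)
    for week, variant in records:
        counts[week][variant] += 1
    proportions = {}
    for week, variant_counts in counts.items():
        total = sum(variant_counts.values())
        proportions[week] = {
            variant: 100.0 * n / total for variant, n in variant_counts.items()
        }
    return proportions


# Hypothetical records, for illustration only.
example_records = [
    ("2021-W26", "Alpha"),
    ("2021-W26", "Delta"),
    ("2021-W26", "Delta"),
    ("2021-W26", "Delta"),
    ("2021-W27", "Delta"),
]
example_proportions = weekly_variant_proportions(example_records)
```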