
Analytical validity of nanopore sequencing for rapid SARS-CoV-2 genome analysis.

Rowena A Bull1,2, Thiruni N Adikari1,2, James M Ferguson3, Jillian M Hammond3, Igor Stevanovski3, Alicia G Beukers4, Zin Naing2,5, Malinna Yeang2,5, Andrey Verich1, Hasindu Gamaarachchi3,6, Ki Wook Kim5,7, Fabio Luciani1,2, Sacha Stelzer-Braid2,5, John-Sebastian Eden8,9, William D Rawlinson2,5,7,10, Sebastiaan J van Hal4,11, Ira W Deveson12,13.   

Abstract

Viral whole-genome sequencing (WGS) provides critical insight into the transmission and evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Long-read sequencing devices from Oxford Nanopore Technologies (ONT) promise significant improvements in turnaround time, portability and cost, compared to established short-read sequencing platforms for viral WGS (e.g., Illumina). However, adoption of ONT sequencing for SARS-CoV-2 surveillance has been limited due to common concerns around sequencing accuracy. To address this, here we perform viral WGS with ONT and Illumina platforms on 157 matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, enabling rigorous evaluation of analytical performance. We report that, despite the elevated error rates observed in ONT sequencing reads, highly accurate consensus-level sequence determination was achieved, with single nucleotide variants (SNVs) detected at >99% sensitivity and >99% precision above a minimum ~60-fold coverage depth, thereby ensuring suitability for SARS-CoV-2 genome analysis. ONT sequencing also identified a surprising diversity of structural variation within SARS-CoV-2 specimens that were supported by evidence from short-read sequencing on matched samples. However, ONT sequencing failed to accurately detect short indels and variants at low read-count frequencies. This systematic evaluation of analytical performance for SARS-CoV-2 WGS will facilitate widespread adoption of ONT sequencing within local, national and international COVID-19 public health initiatives.

Year:  2020        PMID: 33298935      PMCID: PMC7726558          DOI: 10.1038/s41467-020-20075-6

Source DB:  PubMed          Journal:  Nat Commun        ISSN: 2041-1723            Impact factor:   14.919


Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative pathogen of COVID-19[1,2]. SARS-CoV-2 is a positive-sense single-stranded RNA virus with a ~30-kb poly-adenylated genome[1,2]. Complete genome sequences published in January 2020[1,3] enabled development of RT-PCR assays for SARS-CoV-2 detection that have served as the diagnostic standard during the ongoing COVID-19 pandemic[4]. Whole-genome sequencing (WGS) of SARS-CoV-2 provides additional data to complement routine diagnostic testing. Viral WGS informs public health responses by defining the phylogenetic structure of disease outbreaks[5]. Integration with epidemiological data identifies transmission networks and can infer the origin of unknown cases[6-11]. Large-scale, longitudinal surveillance by viral WGS may also provide insights into virus evolution, with important implications for vaccine development[12-15].

WGS can be performed via PCR amplification or hybrid-capture of the reverse-transcribed SARS-CoV-2 genome sequence, followed by high-throughput sequencing. Short-read sequencing technologies (e.g., Illumina) enable accurate sequence determination and are the current standard for pathogen genomics. However, long-read sequencing devices from Oxford Nanopore Technologies (ONT) offer an alternative with several advantages. ONT devices are portable, cheap, require minimal supporting laboratory infrastructure or technical expertise for sample preparation, and can be used to perform rapid sequencing analysis with flexible scalability[16]. The use of ONT devices for viral surveillance has been demonstrated during Ebola, Zika and other disease outbreaks[17-19]. Although protocols for ONT sequencing of SARS-CoV-2 have been established and applied in both research and public health settings[20-22], adoption of the technology has been limited due to concerns around accuracy. ONT devices exhibit lower read-level sequencing accuracy than short-read platforms[23-25]. This may have a disproportionate impact on SARS-CoV-2 analysis, due to the virus’s low mutation rate (8 × 10⁻⁴ substitutions per site per year[26]), which ensures erroneous (false-positive) or undetected (false-negative) genetic variants have a strong confounding effect.

In order to address concerns regarding ONT sequencing accuracy and evaluate its analytical validity for SARS-CoV-2 genomics, we have performed amplicon-based nanopore and short-read WGS on matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, allowing rigorous evaluation of ONT performance characteristics.

Results

Analysis of synthetic SARS-CoV-2 controls

Synthetic DNA or RNA reference standards can be used to assess the accuracy and reproducibility of next-generation sequencing assays[27]. We first sequenced synthetic RNA controls that were generated by in vitro transcription of the SARS-CoV-2 genome sequence. The controls matched the Wuhan-Hu-1 reference strain at all positions, allowing analytical errors to be unambiguously identified. To mimic a real-world viral WGS experiment, synthetic RNA was reverse-transcribed then amplified using multiplexed PCR of 98 × ~400 bp amplicons that enabled evaluation of ~95% of the SARS-CoV-2 genome. Eight independent replicates were sequenced on ONT PromethION and Illumina MiSeq instruments (see Methods section). We aligned the resulting reads to the Wuhan-Hu-1 reference genome to assess sequencing accuracy and related quality metrics (Supplementary Fig. 1a–i). Illumina and ONT platforms exhibited distinct read-level error profiles, with the latter characterised by an elevated rate of both substitution (23-fold) and insertion-deletion (indel) errors (76-fold; Table 1 and Supplementary Fig. 1d, e). Per-base error frequency profiles showed clear correlation between ONT replicates (substitution R² = 0.67; indel R² = 0.82; Supplementary Fig. 1f, g). This indicates that ONT sequencing errors are not entirely random but are influenced by local sequence context. For example, indel errors were enriched (1.4-fold) at low-complexity sequences within the SARS-CoV-2 genome (i.e., sites with homopolymeric or repetitive content; ~1% of the genome; Supplementary Fig. 1d). Illumina error profiles showed weaker correlation between replicates (substitution R² = 0.15; indel R² = 0.42), indicating that short-read sequencing errors were less systematic than for ONT libraries (Supplementary Fig. 1h, i).
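The replicate-to-replicate comparison above can be illustrated with a minimal sketch. It assumes that R² is the squared Pearson correlation between two per-base error-frequency profiles; the text does not state which estimator was used. The function names and the example frequencies are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def r_squared(profile_a, profile_b):
    """Squared Pearson correlation (assumed definition of R^2)."""
    return pearson_r(profile_a, profile_b) ** 2

# Hypothetical per-position error frequencies for two replicates (made-up values).
replicate_1 = [0.010, 0.012, 0.050, 0.008, 0.030]
replicate_2 = [0.011, 0.010, 0.045, 0.009, 0.033]
print(f"R^2 = {r_squared(replicate_1, replicate_2):.2f}")
```

A value near 1 would indicate that the same positions are error-prone in both replicates, which is the pattern reported for ONT libraries.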
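Table 1

Sequencing accuracy for Illumina and ONT whole-genome sequencing of synthetic SARS-CoV-2 controls.

Read-level error rates are given as errors per base per read.

| Illumina sample | Reportable (bp) | Error rate: total | Error rate: mismatch | Error rate: deletion | Error rate: insertion | Erroneous variants: total | Erroneous variants: SNVs | Erroneous variants: indels | Consensus accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| A | 28,687 | 0.00152 | 0.00083 | 0.00058 | 0.00011 | 0 | 0 | 0 | 100 |
| B | 28,687 | 0.00153 | 0.00082 | 0.00060 | 0.00012 | 0 | 0 | 0 | 100 |
| C | 28,687 | 0.00148 | 0.00079 | 0.00057 | 0.00012 | 0 | 0 | 0 | 100 |
| D | 28,687 | 0.00172 | 0.00098 | 0.00063 | 0.00011 | 0 | 0 | 0 | 100 |
| E | 28,687 | 0.00124 | 0.00089 | 0.00024 | 0.00011 | 0 | 0 | 0 | 100 |
| F | 28,687 | 0.00170 | 0.00137 | 0.00023 | 0.00011 | 0 | 0 | 0 | 100 |
| G | 28,687 | 0.00122 | 0.00088 | 0.00022 | 0.00011 | 0 | 0 | 0 | 100 |
| H | 28,687 | 0.00118 | 0.00084 | 0.00024 | 0.00011 | 0 | 0 | 0 | 100 |
| Mean | 28,687 | 0.00145 | 0.00092 | 0.00041 | 0.00011 | 0 | 0 | 0 | 100 |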
Despite their distinct error profiles, both sequencing platforms demonstrated high consensus-level sequencing accuracy across the SARS-CoV-2 genome. We used iVar and Medaka workflows to determine consensus genome sequences for Illumina and ONT libraries, respectively (see Methods section). We detected just two erroneous variant candidates in a single ONT library (Table 1). Both of these were single-base insertions occurring at low-complexity sites (Supplementary Fig. 2), with no erroneous SNVs detected in any replicate (n = 8). All Illumina libraries exhibited perfect accuracy (Table 1). Therefore, the sequencing artefacts affecting both technologies had minimal impact on the accuracy of consensus-level sequence determination, with indel errors in ONT samples being a possible exception.

Analysis of matched patient isolates

To further evaluate the suitability of ONT sequencing for SARS-CoV-2 genomics, we conducted rigorous proficiency testing using bona fide clinical specimens. We performed ONT and Illumina WGS on matched, de-identified SARS-CoV-2-positive cases collected at public hospital laboratories in Eastern & Southern New South Wales and Metropolitan Sydney from March to April 2020 (see Methods section). Selected specimens covered a range of SARS-CoV-2 lineages and viral titres (~10³–10⁸ copies/μL; Supplementary Data 1). The SARS-CoV-2 genome was enriched by PCR amplification, using a custom set of 14 × ~2.5 kb amplicons that covers 29,783/29,903 bp (99.6%) of the genome, including 100% of annotated protein-coding positions[6]. Pooled amplicons then underwent parallel library preparation and sequencing on an ONT GridION/PromethION and an Illumina MiSeq instrument (see Methods section). Short-read sequencing was performed according to a pathogen genomics accredited diagnostic workflow in a reference NSW Health Pathology laboratory, enabling direct comparison of nanopore sequencing to the established standard for pathogen genomics. In total, we obtained complete (99.6%) genome coverage with both technologies for 157 matched positive cases (Supplementary Data 1). By comparison to the Wuhan-Hu-1 reference strain, Illumina sequencing identified 7.6 consensus single-nucleotide variants (SNVs) and 0.04 indels, on average, per sample. A further 1.0 SNVs and 0.2 indels per sample were detected at sub-consensus read-count frequencies (20–80%), indicative of intra-specimen genetic diversity (see below). Excluding positions with evidence of sub-consensus variation, this provides an overall comparison set of 1201 consensus variants and 4,674,554 positions that match the reference strain in a given sample, against which to assess the accuracy of SARS-CoV-2 nanopore sequencing (Supplementary Data 1).
We used each of two best-practice bioinformatics pipelines developed by the ARTIC network to identify consensus variants with ONT sequencing data. The alternative pipelines differed primarily in their use of either Medaka or Nanopolish to call variants (see Methods section). In general, ONT variant candidates identified by both pipelines were highly concordant with the Illumina comparison set. Illumina variants were detected with 99.17% sensitivity and 99.58% precision by Nanopolish, compared to 98.33% sensitivity and 99.24% precision by Medaka (Table 2). Undetected variants (false-negatives) were more frequent than erroneous candidates (false-positives), occurring in 14/157 (9%) and 9/157 (6%) of Medaka samples, respectively (Supplementary Data 2). Only 1/7 (14%) of consensus indels in the Illumina comparison set was detected by either Nanopolish or Medaka, while a further five and nine false-positive indels were detected by the respective pipelines (Supplementary Data 2). While the scarcity of consensus indels detected with either sequencing technology prevented a more thorough evaluation of indel accuracy, this indicates that ONT is inadequate for accurate detection of small indels in the SARS-CoV-2 genome. In contrast, SNVs were detected by Nanopolish and Medaka with high accuracy: overall, we found 99.66% and 98.83% concordance between ONT and Illumina SNVs, as measured by Jaccard similarity, with identical results in 153/157 (97%) and 145/157 (92%) of samples, respectively (Table 2).
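Table 2

Consensus-level accuracy of ONT whole-genome SARS-CoV-2 sequencing on patient specimens.

| Metric | Medaka | Medaka minus blacklistᵃ | Nanopolish | Nanopolish minus blacklistᵃ |
|---|---|---|---|---|
| Cases analysed | 157 | 157 | 157 | 157 |
| Genome coverage (%) | 99.59 | 98.56 | 99.59 | 98.56 |
| Negative positions | 4,674,554 | 4,627,768 | 4,674,554 | 4,627,768 |
| All variants: Illumina variants | 1201 | 1162 | 1201 | 1162 |
| All variants: ONT variants | 1190 | 1159 | 1196 | 1164 |
| All variants: TPs | 1181 | 1155 | 1191 | 1160 |
| All variants: FNs | 20 | 7 | 10 | 2 |
| All variants: FPs | 9 | 4 | 5 | 4 |
| All variants: Sensitivity (%) | 98.33 | 99.40 | 99.17 | 99.83 |
| All variants: Precision (%) | 99.24 | 99.65 | 99.58 | 99.66 |
| All variants: Jaccard similarity (%) | 97.60 | 99.06 | 98.76 | 99.49 |
| All variants: Perfect concordance (cases) | 140/157 | 149/157 | 147/157 | 152/157 |
| SNVs only: Illumina SNVs | 1194 | 1162 | 1194 | 1162 |
| SNVs only: ONT SNVs | 1180 | 1155 | 1190 | 1160 |
| SNVs only: TPs | 1180 | 1155 | 1190 | 1160 |
| SNVs only: FNs | 14 | 7 | 4 | 2 |
| SNVs only: FPs | 0 | 0 | 0 | 0 |
| SNVs only: Sensitivity (%) | 98.83 | 99.40 | 99.66 | 99.83 |
| SNVs only: Precision (%) | 100 | 100 | 100 | 100 |
| SNVs only: Jaccard similarity (%) | 98.83 | 99.40 | 99.66 | 99.83 |
| SNVs only: Perfect concordance (cases) | 145/157 | 152/157 | 153/157 | 155/157 |

ᵃBlacklist sites are error-prone low-complexity sequences (n = 15; 9–42 bp; see text for details).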

Inspection of false-positive and false-negative variant candidates detected with ONT sequencing data showed that these tended to occur in low-complexity sequences, which are known to be refractory to ONT base-calling algorithms[23]. For example, false-negative and/or false-positive candidates were found within a 21-bp T-rich site in the orf1ab gene in multiple samples (Supplementary Fig. 3a, b). We identified 15 problematic low-complexity sites in the SARS-CoV-2 genome, ranging from 9 to 42 bp in length, that showed elevated read-level sequencing error rates (Supplementary Fig. 1d and Supplementary File 1). Exclusion of these positions (~1% of the genome) improved the fidelity of ONT variant detection, with consensus SNVs in the Illumina comparison set being detected with 99.83% and 99.40% sensitivity by Nanopolish and Medaka, respectively, and perfect precision for both. Consensus SNVs detected with the Nanopolish workflow were identical between ONT and Illumina data in 155/157 (99%) of samples (Table 2 and Supplementary Data 3). This suggests that the accuracy of nanopore WGS may be improved via the exclusion of a small number of ‘blacklist’ low-complexity sites in the SARS-CoV-2 genome from downstream analysis. We next assessed the impact of sequencing depth on ONT performance. To do so, we down-sampled nanopore sequencing reads from a uniform 200-fold coverage across the SARS-CoV-2 genome and repeated variant detection across a range of coverage depths (see Methods section). Both sensitivity and precision of variant detection were strongly influenced by sequencing coverage, showing a sharp decline below ~50-fold coverage depth, with minimal improvement observed above ~60-fold (Supplementary Fig. 1a, b).
As above, excluding error-prone low-complexity sequences afforded consistent improvements to sensitivity and overall concordance across the range of depths tested (Fig. 1a, b).
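The down-sampling step can be sketched as follows. This is an illustration only: it assumes reads are drawn uniformly at random so that the expected depth scales with the fraction of reads retained, which may differ from the procedure actually used. The function name, its parameters and the fixed seed are hypothetical.

```python
import random

def downsample_reads(reads, current_depth, target_depth, seed=0):
    """Randomly keep roughly target_depth / current_depth of the reads."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    reads = list(reads)
    if target_depth >= current_depth:
        return reads  # nothing to remove
    n_keep = round(len(reads) * target_depth / current_depth)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, n_keep)

# Hypothetical read identifiers for a library sequenced to ~200-fold depth.
read_ids = [f"read_{i}" for i in range(2000)]
for depth in (100, 60, 50, 20):
    subset = downsample_reads(read_ids, current_depth=200, target_depth=depth)
    print(depth, len(subset))
```

Variant calling would then be repeated on each subset to trace sensitivity and precision as a function of depth.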
Fig. 1

Variant detection performance for whole-genome ONT sequencing of SARS-CoV-2.

(a; upper) Sensitivity with which Illumina comparison SNVs at consensus-level variant frequencies (80–100%) were detected via ONT sequencing on matched SARS-CoV-2 specimens (n = 157). Bars show mean ± range. (a; lower) Fraction of specimens tested in which SNVs were detected with perfect sensitivity (sn). Data are plotted separately for genome-wide variant detection (gold) and variant detection with error-prone ‘blacklist’ sites excluded (red). b Same as in a but Jaccard similarity (jac) scores for all variant candidates are plotted instead of SNV sn. c Correlation of variant frequencies observed for SNV candidates detected at sub-consensus frequencies (20–80%) with Illumina and ONT sequencing. Candidates detected with ONT but not Illumina were considered to be false-positives (FP; red) and candidates detected with Illumina but not ONT were considered to be false-negatives (FN; pink). d Sensitivity (blue) and precision (green) of SNV detection with ONT sequencing at sub-consensus variant frequencies (20–80%). Data are plotted separately for high (60–80%), intermediate (40–60%) and low (20–40%) frequencies. Error bars show 95% confidence intervals (Clopper-Pearson) calculated over all specimens (n = 157). Source data are provided as a Source Data file.

To verify these observations and assess reproducibility, we re-sequenced 12 specimens, selected to cover a range of SARS-CoV-2 titres (Ct = 10³–10⁷), to generate triplicate (n = 3) data on both Illumina and ONT platforms. We measured reproducibility by performing pairwise comparisons of detected variant candidates between replicates for a given sample (Supplementary Data 4). No discordant variants were detected between Illumina replicates across any of the 36 pairwise sample comparisons (309 variants total), confirming the reliability of short-read WGS.
ONT also showed high reproducibility, with 99.36% Jaccard similarity between Medaka replicates for consensus variants (310 total) and perfect concordance for SNVs (Supplementary Data 4). In summary, ONT sequencing enabled highly accurate and reproducible detection of consensus-level SNVs in SARS-CoV-2 patient isolates but appears generally unsuitable for the detection of small indel variants.

Detection of intra-specimen variation

Within-host genetic diversity is a common feature of RNA viruses, with divergent quasi-species present in a single infection. Within-host diversity may help infecting viruses evade the host immune response, adapt to changing environments and can cause more severe and/or long-lasting disease[28-30]. Resolving this diversity may also better inform studies of virus transmission than consensus-level phylogenetics alone[31-33]. Therefore, we next evaluated the capacity of nanopore sequencing to identify intra-specimen genetic variation by detecting variants present at sub-consensus frequencies (i.e. variants detected in <80% of mapped reads). Analysis of the SARS-CoV-2 synthetic RNA controls and replicate short-read sequencing libraries (see above) showed that sequencing artefacts in Illumina libraries could be misinterpreted as variants at read-count frequencies below ~20% (Supplementary Fig. 2b and Supplementary Data 5), effectively establishing a lower bound for variant detection. We therefore limited our analysis to variants detected at ≥20% frequency, taking variants detected by Illumina sequencing above this level to be genuine. Overall, short-read sequencing identified sub-consensus variants (20–80%) in 54/157 samples, comprising 156 SNVs and 20 indels (Supplementary Data 6). Using Varscan2, we identified 154 sub-consensus SNV candidates in ONT sequencing libraries (Supplementary Data 6). We detected 119 SNVs (sensitivity = 76.3%) in the Illumina comparison set and 25 false-positives (precision = 82.6%; Supplementary Data 6). Read-count frequencies for variants identified with both technologies were correlated (R² = 0.69), indicating that these were bona fide variants, rather than sequencing artefacts (Fig. 1c). While the overall performance of sub-consensus SNV detection was quite poor, most false-positives and false-negatives were confined to the lower end of the frequency range assessed here (Fig. 1c, d).
For example, SNVs at high (60–80%) and intermediate (40–60%) sub-consensus frequencies were detected with relatively high sensitivity (95.7%, 91.3%) and precision (100%, 97.7%), whereas low-frequency variants (20–40%) were detected with low sensitivity (63.2%) and precision (69.6%; Fig. 1d). Unsurprisingly, the high rate of indel errors in ONT sequencing libraries meant that they were unsuitable for detecting indel diversity, with errors overwhelming true variants (Supplementary Data 6). In summary, ONT sequencing enabled detection of within-specimen SNVs at frequencies from ~40–80% with adequate accuracy but was generally unsuitable for the detection of indels or rare SNVs (<40%).
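Fig. 1d reports Clopper-Pearson 95% confidence intervals for these proportions. The sketch below shows one way such an exact binomial interval can be computed with the standard library alone, by bisection on the binomial tail probabilities. Pooling counts across specimens (here the 119 of 156 sub-consensus SNVs reported above) is an assumption about how the interval was applied, and the function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root_increasing(func, iters=100):
    """Bisection root of a function that rises from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    half = alpha / 2
    # Lower bound: the p at which P(X >= k) equals alpha / 2 (zero when k == 0).
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - half)
    # Upper bound: the p at which P(X <= k) equals alpha / 2 (one when k == n).
    upper = 1.0 if k == n else _root_increasing(
        lambda p: half - binom_cdf(k, n, p))
    return lower, upper

low, high = clopper_pearson(119, 156)
print(f"sensitivity = {119 / 156:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval is conservative by construction, which suits the small per-bin counts in the low-frequency (20–40%) range.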

Detection of structural variation

Large genomic deletions or rearrangements can have a major impact on virus function and evolution; however, there are currently just a few reported cases of SARS-CoV-2 specimens harbouring structural variants (SVs)[15,34]. Therefore, we next evaluated the detection of SVs in SARS-CoV-2 specimens with ONT sequencing. We used NGMLR-Sniffles to identify potential SVs in ONT libraries and validated these with supporting evidence from short-read sequencing (see Methods section). Across all SARS-CoV-2 patient specimens, we detected 16 candidate deletions ranging in size from 15 to 1840 bp (Table 3), while no other SV types were identified. Of these, 13/16 were supported by split short-read alignments and/or discordant read-pairs in matched Illumina libraries (Supplementary Fig. 4a and Table 3). For 7/16 candidates, short-read evidence confirmed the presence of the deletion but indicated that the breakpoint position was not accurately placed by ONT reads (Supplementary Fig. 4b and Table 3). Among the thirteen deletions detected by both platforms were examples in genes S, M, N, ORF3, ORF6, ORF8 and orf1ab (Table 3). Only one variant, a 328-bp deletion in ORF8 (Supplementary Fig. 4c), was detected in multiple specimens, although highly similar (but not identical) 28-bp and 29-bp deletions were also detected in S in two unrelated specimens (Supplementary Fig. 4d).
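Table 3

Detection of structural variation in SARS-CoV-2 specimens with ONT sequencing.

| Specimen | SV type | Size (bp) | Position | Gene | Supporting ONT reads | Short-read evidence | Breakpoint resolution |
|---|---|---|---|---|---|---|---|
| nCoV_077 | Deletion | 15 | 18019-18034 | orf1ab | 94 | Yes | 0, 0 |
| nCoV_087 | Deletion | 1132 | 1082-2214 | orf1ab | 48 | No | |
| nCoV_088 | Deletion | 34 | 26786-26820 | M | 75 | Yes | 0, 0 |
| nCoV_106 | Deletion | 548 | 6004-6552 | orf1ab | 20 | No | |
| nCoV_125 | Deletion | 27 | 27263-27290 | ORF6 | 20 | Yes | −2, −3 |
| nCoV_183 | Deletion | 15 | 25533-25548 | ORF3 | 41 | Yes | −2, −2 |
| nCoV_214 | Deletion | 29 | 23554-23583 | S | 28 | Yes | +1, +2 |
| nCoV_200 | Deletion | 328 | 27906-28234 | ORF8 | 385 | Yes | 0, 0 |
| nCoV_209 | Deletion | 639 | 2771-3410 | orf1ab | 48 | Yes | 0, 0 |
| nCoV_211 | Deletion | 1840 | 509-2349 | orf1ab | 22 | No | |
| nCoV_225 | Deletion | 328 | 27906-28234 | ORF8 | 387 | Yes | 0, 0 |
| nCoV_235 | Deletion | 37 | 26783-26820 | M | 21 | Yes | +3, +4 |
| nCoV_249 | Deletion | 702 | 2664-3366 | orf1ab | 52 | Yes | −1, 0 |
| nCoV_164 | Deletion | 588 | 22690-23278 | S | 59 | Yes | +1, +4 |
| nCoV_083 | Deletion | 28 | 23554-23582 | S | 38 | Yes | 0, 0 |
| nCoV_083 | Deletion | 13 | 29478-29491 | N | 36 | Yes | +1, +1 |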
Overall, this analysis demonstrates that large deletions can be reliably detected using ONT sequencing and suggests that structural variation in the SARS-CoV-2 genome is more common and diverse than currently appreciated.

Discussion

Viral WGS can be used to study the transmission and evolution of SARS-CoV-2, and is increasingly recognised as a critical tool for public health responses to COVID-19. Nanopore sequencing offers an alternative to established short-read platforms for viral WGS with several advantages. ONT devices: (i) are relatively inexpensive, highly portable and require minimal associated laboratory infrastructure; (ii) enable rapid generation of sequencing data and even real-time data analysis; (iii) require comparatively simple procedures for library preparation and; (iv) offer flexibility in sample throughput, accommodating single (e.g., Flongle), multiple (e.g., MinION/GridION) or tens/hundreds (e.g., PromethION) of specimens per flow-cell[16,18]. Therefore, ONT sequencing could further empower SARS-CoV-2 surveillance initiatives by enabling point-of-care WGS analysis and improved turnaround time for critical cases, particularly in isolated or poorly resourced settings[35]. Due to the relatively low mutation rate observed in SARS-CoV-2[26], accurate sequence determination is vital to correctly define the phylogenetic structure of disease outbreaks. With ONT sequencing known to exhibit higher read-level sequencing error rates than short-read technologies[23-25], reasonable concerns exist about suitability of the technology for SARS-CoV-2 genomics. Moreover, public databases for SARS-CoV-2 data (e.g., GISAID: https://www.gisaid.org/) already contain consensus genome sequences generated via ONT sequencing, potentially confounding investigations that rely on these resources. The present study resolves these concerns, demonstrating accurate consensus-level SARS-CoV-2 sequence determination with ONT data. 
We report that: (i) variants at consensus-level read-count frequencies (80–100%) were detected with >99% sensitivity and >99% precision across 157 SARS-CoV-2-positive specimens, confirming the suitability of ONT sequencing for standard phylogenetic analyses; (ii) high accuracy and reproducibility were achieved by each of two alternative tools for ONT variant detection, with Nanopolish showing modest improvements over Medaka; (iii) a minimum ~60-fold sequencing depth was required to ensure accurate detection of SNVs, but little or no improvement was achieved above this level; (iv) false-positive and false-negative variants were typically observed at low-complexity sequences, with fidelity improved by excluding these problematic sites; (v) in contrast to consensus SNVs, ONT sequencing performed poorly in the detection of consensus indels or low-frequency variants (such variants should therefore be interpreted with caution); (vi) while the high indel error rate in ONT sequencing impedes accurate detection of small indels, long nanopore reads appear well-suited for the detection of large deletions and potentially other structural variants. Although SNVs alone are sufficient for routine phylogenetic analysis, small indels and large structural variants can profoundly impact gene function and are, therefore, of interest to studies of virus evolution and pathogenicity[15]. As the first systematic evaluation of nanopore sequencing for SARS-CoV-2 WGS, this study removes an important barrier to its widespread adoption in the ongoing COVID-19 pandemic. While short-read sequencing platforms remain the gold standard for high-throughput viral sequencing, the advantages in portability, cost and turnaround time afforded by nanopore sequencing imply that this emerging technology can serve an important complementary role in local, national and international COVID-19 response strategies.

Methods

Synthetic RNA controls

Synthetic controls used in this study were manufactured by Twist Biosciences and are commercially available (Catalog item 101024). The controls comprise synthetic RNA generated by in vitro transcription (IVT) of the SARS-CoV-2 genome sequence, representing the complete genome in 6 × ~5 kb continuous sequences. The controls used in this study are identical in sequence to the Wuhan-Hu-1 reference strain (MN908947.3), allowing sequencing artefacts to be readily identified. Synthetic controls were prepared for sequencing via a protocol established by the ARTIC network for viral surveillance (https://artic.network/ncov-2019). Briefly, reverse-transcription was performed on aliquots of synthetic RNA (at 10⁶ copies per μL) using Superscript IV (Thermo Fisher Scientific) with both random hexamers and oligo-dT primers. Prepared cDNA was then amplified using multiplexed PCR with 98 × ~400 bp amplicons tiling the SARS-CoV-2 genome (ARTIC V3 primer set; see Supplementary Data 7). Amplification was performed with Q5 Hotstart DNA Polymerase (New England Biolabs) with 1.5 μL of cDNA per reaction. PCR products were cleaned using AMPure XP beads (0.8X bead ratio), quantified using a Qubit fluorometer (Thermo Fisher Scientific) and partitioned into separate aliquots for analysis by short-read and nanopore sequencing. We note that it is not possible to amplify the entire SARS-CoV-2 genome in this way, since amplicons that span boundaries of the 6 × ~5-kb IVT products necessarily fail. Nevertheless, we were able to evaluate ~95% of the SARS-CoV-2 genome sequence.

SARS-CoV-2 specimens

SARS-CoV-2-positive extracts from 157 cases, tested at NSW Health Pathology East Serology and Virology Division (SaViD), were retrieved from storage and included in this study. Wherever relevant, ethical regulations for work with human participants with informed consent were observed, with oversight by HREC at South Eastern Sydney Local Health District (SESLHD; 2020/ETH00287). All specimens were nasopharyngeal swabs originating from patients in New South Wales during March–April 2020. Specimens underwent total nucleic acid extraction using the Roche MagNA Pure DNA and total NA kit on an automated extraction instrument (MagNA pure 96). Reverse-transcription was performed on viral RNA extracts using Superscript IV VILO Master Mix (Thermo Fisher), which contains both random hexamers and oligo-dT primers. Prepared cDNA was then amplified separately with each of 14 × ~2.5-kb amplicons tiling the SARS-CoV-2 genome, as described elsewhere[6] (see Supplementary Data 7). Amplification was performed with Platinum SuperFi Green PCR Mastermix (Thermo Fisher) with 1.5 μL of cDNA per reaction. PCR products were cleaned using AMPure XP beads (0.8X bead ratio) and quantified using PicoGreen dsDNA Assay (Thermo Fisher). All 14 amplicon products from a given sample were then pooled at equal abundance and partitioned into separate aliquots for analysis by short-read and nanopore sequencing. This strategy ensured that any sequence artefacts potentially introduced during reverse-transcription and/or PCR amplification were common to matched ONT/Illumina samples, so would not be interpreted as false-positives/negatives during technology comparison. Technical replicates were generated by reamplification and sequencing of existing RNA extracts (not by re-extraction).

Short-read sequencing

Pooled amplicons were prepped for short-read sequencing using the Illumina DNA Prep Kit, according to the manufacturer’s protocol. Samples were multiplexed using Nextera DNA CD Indexes and sequenced on an Illumina MiSeq. Within each sequencing lane, a blank sample was also prepared and sequenced, in order to monitor for contamination and/or index swapping between samples. The resulting reads were aligned to the Wuhan-Hu-1 reference genome (MN908947.3) using bwa mem (0.7.12-r1039)[36]. Primer sequences were trimmed from the termini of read alignments using iVar (1.0)[37]. Trimmed alignments were converted to pileup format using samtools mpileup (v1.9)[38], with anomalous read pairs retained (--count-orphans), base alignment quality disabled (--no-BAQ) and all bases considered, regardless of PHRED quality (--min-BQ 0). Variants were identified using bcftools call (v1.9)[38], assuming a ploidy of 1 (--ploidy 1), then filtered for a minimum read depth of 30 and minimum quality of 20. Variants were classified according to their read-count frequencies as consensus (>80% reads supporting the variant) or sub-consensus (20–80%) variants, with the latter further divided into high (60–80%), intermediate (40–60%) or low-frequency (20–40%). Variants at read-count frequencies below 20% were considered to be potentially spurious and excluded on this basis.
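The frequency classes described above can be expressed as a small helper. This is a minimal sketch rather than the pipeline's own code: the function name and labels are hypothetical, the minimum depth of 30 mirrors the read-depth filter above, and the handling of values falling exactly on a boundary (for example exactly 60%) is an assumption, since the text gives only the ranges.

```python
def classify_variant(alt_reads, total_reads, min_depth=30):
    """Assign a variant to a read-count frequency class."""
    if total_reads < min_depth:
        return "insufficient depth"
    freq = alt_reads / total_reads
    if freq > 0.80:
        return "consensus"
    if freq >= 0.60:
        return "sub-consensus (high)"
    if freq >= 0.40:
        return "sub-consensus (intermediate)"
    if freq >= 0.20:
        return "sub-consensus (low)"
    return "excluded (<20%, potentially spurious)"

# Hypothetical read counts: (reads supporting the variant, total reads at the site).
for alt, total in [(190, 200), (130, 200), (90, 200), (50, 200), (10, 200), (9, 10)]:
    print(alt, total, classify_variant(alt, total))
```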

Nanopore sequencing

ARTIC amplicons (~400 bp) from the synthetic RNA controls were prepared for nanopore sequencing using the ONT Native Barcoding Expansion kit (EXP-NBD104). The longer amplicons (~2.5 kb) used on SARS-CoV-2 patient specimens were prepared for nanopore sequencing using the ONT Rapid Barcoding Kit (SQK-RBK004). Both kits were used according to the manufacturer’s protocol. Up to 12 samples were multiplexed on a FLO-FLG001, FLO-MIN106D or FLO-PRO002 flow-cell and sequenced on a GridION X5 or PromethION P24 device, respectively. In addition, a no-template negative control from the PCR amplification step was prepared in parallel and sequenced on each flow-cell (Supplementary Data 8). The RAMPART (v1.0.6) software package[39] was used to monitor sequencing performance in real-time, with runs proceeding until a minimum ~200-fold coverage was achieved across all amplicons. At this point, the run was terminated and the flow-cell washed using the ONT Flow Cell Wash kit (EXP-WSH003), allowing re-use in subsequent runs. The resulting reads were basecalled using Guppy (4.0.14) and aligned to the Wuhan-Hu-1 reference genome (MN908947.3) using minimap2 (2.17-r941)[40]. The ARTIC tool align_trim was used to trim primer sequences from the termini of read alignments and cap sequencing depth at a maximum of 400-fold coverage. Consensus-level variant candidates were identified using each of two workflows developed by ARTIC (https://github.com/artic-network/artic-ncov2019), using Nanopolish[41] or Medaka (0.11.5) to call variants, respectively. Nanopolish variant candidates were filtered directly with the ARTIC artic_vcf_filter tool, while Medaka candidates were evaluated by LongShot (0.4.1)[42] before filtering. Sub-consensus level variant candidates were identified using Varscan2 (v2.4.3)[43].
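The run-termination criterion (a minimum of ~200-fold coverage across all amplicons) and the subsequent 400-fold depth cap can be illustrated with a minimal sketch. It does not use the RAMPART or ARTIC interfaces: the function names and the dictionary of per-amplicon depths are hypothetical, and treating a depth of exactly 200 as sufficient is an assumption.

```python
from typing import Dict, List

MIN_RUN_DEPTH = 200    # approximate minimum fold-coverage per amplicon before stopping
MAX_ALIGN_DEPTH = 400  # maximum fold-coverage retained after alignment trimming


def amplicons_below_target(depths: Dict[str, float],
                           min_depth: float = MIN_RUN_DEPTH) -> List[str]:
    """Return the names of amplicons that have not yet reached min_depth."""
    return sorted(name for name, depth in depths.items() if depth < min_depth)


def run_can_stop(depths: Dict[str, float],
                 min_depth: float = MIN_RUN_DEPTH) -> bool:
    """True once every amplicon has reached the minimum coverage."""
    return bool(depths) and not amplicons_below_target(depths, min_depth)


def capped_depth(depth: float, max_depth: float = MAX_ALIGN_DEPTH) -> float:
    """Depth that remains after capping coverage at max_depth."""
    return min(depth, max_depth)
```

With depths of 250-fold and 180-fold for two hypothetical amplicons, the run would continue, because the second amplicon is still below the threshold.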

Performance evaluation

For synthetic RNA controls, read-level quality metrics, such as sequencing error rates, were derived from read alignments using pysamstats, with any bases that differed from the Wuhan-Hu-1 reference sequence considered errors. The accuracy of variant detection by ONT sequencing was evaluated by comparison to the set of variants identified by Illumina sequencing in matched cases. To ensure consistent representation of variants across calls generated by different programs: (i) multi-allelic variant candidates were separated into individual SNVs/indels using bcftools norm (1.9)[38]; (ii) multi-nucleotide variants were decomposed into their simplest set of individual components using rtg-tools vcfdecompose (3.10.1); and (iii) indels at simple repeats were left-aligned using gatk LeftAlignAndTrimVariants (4.0.11.0). Variant candidates identified by Illumina/ONT could then be considered concordant based on matching genome position, reference base and alternative base/s. For a given case, variant candidates identified by both ONT and Illumina were classified as true-positives (TPs), candidates identified by ONT but not Illumina as false-positives (FPs) and candidates identified by Illumina but not ONT as false-negatives (FNs). The following statistical definitions were used to evaluate results: sensitivity = TP/(TP + FN); precision = TP/(TP + FP).
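The concordance rule and the TP/FP/FN classification can be sketched with set operations. This is a minimal illustration rather than the authors' evaluation code: the `Variant` type and `compare_calls` function are hypothetical names, the inputs are assumed to be already normalised as in steps (i) to (iii), and the metrics use the conventional definitions of sensitivity, TP/(TP + FN), and precision, TP/(TP + FP).

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """A normalised variant: concordance requires all three fields to match."""
    pos: int
    ref: str
    alt: str


def compare_calls(ont: Set[Variant], illumina: Set[Variant]) -> Dict[str, float]:
    """Classify ONT calls against Illumina calls for one matched case."""
    tp = len(ont & illumina)   # found by both platforms
    fp = len(ont - illumina)   # found by ONT only
    fn = len(illumina - ont)   # found by Illumina only
    # Metrics are undefined (NaN) when their denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "precision": precision}
```

Because concordance is defined on the (position, reference, alternative) triple, two calls at the same position with different alternative bases count as one false-positive and one false-negative rather than a match.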

Structural variation

To identify structural variation, nanopore reads were re-aligned to the Wuhan-Hu-1 reference genome (MN908947.3) using the rearrangement-aware aligner NGMLR (v0.2.7)[44]. Sniffles (v1.0.11)[44] was then used to detect candidate variants with a minimum length of 10 bp and ≥20 supporting reads. To validate SVs detected with ONT alignments, split short-read alignments and discordant read-pairs were extracted from matched Illumina libraries using lumpy[45]. Variant candidates were then manually inspected to verify evidence from ONT and short-reads and assess breakpoint position resolution.

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
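The thresholds applied to structural variant candidates (minimum length of 10 bp and ≥20 supporting reads) amount to a simple filter, sketched here. The `SvCandidate` record and `passes_sv_thresholds` function are hypothetical and do not reflect the Sniffles interface; the absolute value of the length is taken on the assumption that deletions may be reported with a negative length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvCandidate:
    """Hypothetical summary of one structural variant candidate."""
    sv_type: str           # e.g. "DEL", "INS"
    start: int             # 1-based start coordinate on the reference
    length: int            # may be negative for deletions
    supporting_reads: int


def passes_sv_thresholds(sv: SvCandidate,
                         min_length: int = 10,
                         min_reads: int = 20) -> bool:
    """Keep candidates of at least min_length bp with at least min_reads reads."""
    return abs(sv.length) >= min_length and sv.supporting_reads >= min_reads
```

Candidates that pass such a filter would still require the manual inspection and short-read validation described in the text.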