SARS-CoV-2 variant detection with ADSSpike.

Daniel Castañeda-Mogollón¹, Claire Kamaliddin¹, Laura Fine¹, Lisa K Oberding¹, Dylan R Pillai².

Abstract

The SARS-CoV-2 coronavirus pandemic has been an unprecedented challenge to global pandemic response and preparedness. With the continuous appearance of new SARS-CoV-2 variants, it is imperative to implement tools for genomic surveillance and diagnosis in order to decrease viral transmission and prevalence. The ADSSpike workflow was developed to identify signature SNPs in the S gene associated with SARS-CoV-2 variants through amplicon deep sequencing. Seventy-two samples were sequenced, and 30 mutations were identified. Among those, signature SNPs were linked to 2 Zeta-VOI (P.2) samples and one Alpha-VOC (B.1.1.7) sample. An average depth of 700 reads was found to correctly identify all SNPs and deletions pertinent to SARS-CoV-2 mutants. ADSSpike is the first workflow to provide a practical, cost-effective, and scalable solution to diagnose SARS-CoV-2 VOC/VOI in the clinical laboratory, at approximately $41.85 USD/reaction, adding a valuable tool to public health measures to fight the COVID-19 pandemic.

Keywords: Amplicon deep sequencing; S gene; SARS-CoV-2; Variants of concern; Variants of interest

Year: 2021; PMID: 34963097; PMCID: PMC8608664; DOI: 10.1016/j.diagmicrobio.2021.115606

Source DB: PubMed; Journal: Diagn Microbiol Infect Dis; ISSN: 0732-8893; Impact factor: 2.803

Introduction

The COVID-19 pandemic's causative agent is the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first identified at the end of 2019 in Wuhan, China (Wu et al., 2020). Following the appearance of mutations in specific SARS-CoV-2 genes, different viral phenotypes have emerged as the main threat to epidemic control. These viral variants, known as variants of concern (VOC) and variants of interest (VOI), are mutants associated with one or more of the following: increased transmissibility, changes in epidemiological patterns or clinical presentation, increased virulence, decreased effectiveness of public health and social measures for epidemic control, and decreased effectiveness of available diagnostics, vaccines, or therapeutics. Genomic surveillance of these mutants is essential for monitoring viral spread and implementing targeted measures to reduce VOC/VOI transmission. Current surveillance strategies rely on whole genome sequencing (WGS) for identification of VOC/VOI, or on assays that monitor and detect known mutations. WGS using next generation sequencing (NGS) is based on commercially available sequencing kits or on protocols published by international consortia such as the ARTIC network (COVID-19 ARTIC, 2020). Most targeted assays rely on PCR amplification of a genomic region followed by amplicon Sanger sequencing (Cabral et al., 2021; Sanger Sequencing Solutions for SARS-CoV-2 Research - CA). Other assays are based on reverse transcription quantitative polymerase chain reaction (RT-qPCR) using TaqMan probes targeting specific SNPs (Bier et al., 2021), amplicon melting curve analysis, or high-resolution melting analysis (Banada et al., 2021; Diaz-Garcia et al., 2021). These targeted assays are limited to currently known VOC/VOI-associated SNPs and cannot rapidly identify new mutations, a critical shortcoming during a pandemic, when turnaround time is a key factor for molecular surveillance.
The SARS-CoV-2 S gene encodes the surface (spike) glycoprotein of the virus, which mediates viral adhesion to and entry into human host cells. Mutations in the S gene of VOC/VOI have resulted in amino acid changes in the S protein that affect binding to the human cell receptor ACE2 (Boehm et al., 2021; Hoffmann et al., 2020), antibody recognition, vaccine efficacy, and antibody therapy efficacy (Boehm et al., 2021). The S gene is also an attractive target for SARS-CoV-2 VOC/VOI identification, as signature SNPs, insertions, and deletions can be readily identified in this region through amplicon deep sequencing (ADS). Here, a novel workflow using ADS of the S gene (ADSSpike) was implemented to provide scalable, high-throughput, and unbiased identification of VOC/VOI from clinical SARS-CoV-2 samples. ADSSpike also simplifies the ARTIC Illumina V3 protocol (COVID-19 ARTIC, 2020) by incorporating Illumina overhangs into PCR primers targeting 400 bp regions spanning the SARS-CoV-2 S gene, with the aim of reducing the number of PCR cycles and thus chimera formation (Lahr and Katz, 2009; Sze and Schloss, 2019). To our knowledge, ADSSpike is one of the first workflows to identify SNPs by S gene-targeted amplicon deep sequencing of SARS-CoV-2 specimens. A previous pipeline describes a similar methodology (Fass et al., 2021); however, several pitfalls were observed in comparison with the pipeline presented here.

Methodology

Ethics statement

Ethical approval was obtained from the Conjoint Health Research Ethics Board of the University of Calgary (REB 20-0567, REB 20-0402). All archived specimens were de-identified prior to analysis in this study.

Sample collection and nucleic acid extraction

Clinical nasopharyngeal swab and throat swab specimens (n = 72) were collected and tested for SARS-CoV-2 by Alberta Precision Laboratories between March 2020 and February 2021 as previously described (Pabbaraju et al., 2021). RNA was extracted from 90 to 120 µL of sample using the Qiagen QIAamp® Viral RNA Mini Kit (Cat. No/ID 52906, Qiagen, USA), following the manufacturer's protocol with slight modifications. Briefly, samples were digested with proteinase K for 10 minutes at 56°C prior to extraction and treated with DNase I (Promega) to remove any remaining DNA. RNA was eluted in 2 centrifugation rounds of 40 µL of nuclease-free water.

Amplicon deep sequencing (ADS) strategy

The ADS pipeline (Fig. 1) was designed to optimize sample preparation time. Briefly, extracted RNA was subjected to cDNA synthesis before PCR amplification. Fourteen pairs of primers spanning the S gene and its adjacent regions, generating ∼400 bp amplicons, were designed and adapted from published protocols (COVID-19 ARTIC, 2020) (Supplementary Table 1). Illumina adapter overhangs were fused to each primer pair so that they are incorporated during PCR amplification, allowing PCR products to move directly to library preparation and sequencing while reducing PCR artifacts and chimeric reads. Fusing Illumina adapters to locus-specific primers has been used in previous protocols, and the strategy was adapted here from current protocols for SNP calling and haplotype screening (Schnell et al., 2015; 16S Metagenomic Sequencing Library Preparation). The performance of a pooling strategy was also evaluated by comparing sequencing results from PCR conducted either individually (iPCR: 14 individual PCR reactions, each yielding a ∼400 bp amplicon) or as a pooled PCR (pPCR: a one-pot reaction with all 14 primer pairs, generating a mix of ∼400 bp amplicons). For each sample, the iPCR products were pooled prior to purification and subsequent analysis (Supplementary Fig. 1).
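The adapter-fusion idea can be sketched in a few lines of Python. This is a minimal illustration, not the ADSSpike primer design: it assumes the overhang sequences published in Illumina's 16S Metagenomic Sequencing Library Preparation guide (cited above as a source of the strategy), and the locus-specific primer pair is a hypothetical placeholder, since the actual primers are listed only in Supplementary Table 1.

```python
# Illumina overhang adapter sequences (5'->3') as given in Illumina's
# "16S Metagenomic Sequencing Library Preparation" guide.
FORWARD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

VALID_BASES = set("ACGT")


def fuse_overhangs(locus_fwd: str, locus_rev: str) -> tuple:
    """Prepend the Illumina overhangs to a locus-specific primer pair."""
    for primer in (locus_fwd, locus_rev):
        if not primer or set(primer.upper()) - VALID_BASES:
            raise ValueError(f"invalid primer sequence: {primer!r}")
    return FORWARD_OVERHANG + locus_fwd.upper(), REVERSE_OVERHANG + locus_rev.upper()


# Hypothetical placeholder primers, NOT the ADSSpike primers.
fwd, rev = fuse_overhangs("ACGTTGCAACGTTGCAACGT", "TTGCAACGTTGCAACGTTGC")
print(len(fwd), len(rev))  # 53 54
```

Because the overhangs already carry the Nextera-compatible sequence, an indexing PCR such as the one with the Nextera XT Index Kit V2 described under Library preparation can attach sample indices without a separate adapter-ligation step.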

Fig. 1

Spike gene amplicon deep sequencing (ADS) pipeline, named ADSSpike. The workflow is divided into 4 steps: sample processing, PCR and library preparation, sequencing, and data analysis (read filtering, alignment, SNP calling, and variant assessment).

cDNA synthesis and PCR amplification of the SARS-CoV-2 S Gene

The cDNA synthesis and PCR amplification of the SARS-CoV-2 S gene are detailed in the Supplementary data. Briefly, cDNA synthesis was performed using the LunaScript® RT Supermix (New England Biolabs, #E3010L) following the manufacturer's protocol. The S gene and its adjacent regions were amplified using 14 primer pairs, followed by a single PCR run and SPRIselect bead purification (Beckman Life Science, USA). Three nuclease-free water negative controls were used to assess potential amplicon contamination in the lab and to help tune the parameters for SNP calling (Table 1).

Table 1

Summary of the clinical samples used in this study.

Variables	SARS-CoV-2 positive (n = 68)	SARS-CoV-2 negative (n = 4)	P-value
Anatomical swabbing site (n = 72)			0.3058
Nasopharyngeal (n = 50)	46	4
Oropharyngeal (n = 22)	22	0
Swab and transport media (n = 72)	68	4	0.2567
Saline (n = 16)	16	0
Aptima (n = 19)	19	0
E-swab (n = 5)	5	0
UTM (n = 35)	28	4
RT-PCR E gene Ct value (n = 68), mean ±SD [min, max]	22.83 ± 4.77 [17.28, 35.73]	N/A

Library preparation and sequencing

Fifteen µL of pooled, purified amplicons per sample were sent to the Center for Health Genomics and Informatics (CHGI) at the University of Calgary for library preparation and sequencing. Each sample was indexed using the Nextera XT Index Kit V2 with KAPA HiFi polymerase. The indexed libraries were pooled and sequenced on an Illumina MiSeq instrument (San Diego, CA) in paired-end mode (2 × 250 bp) using the Illumina MiSeq Reagent Kit V2 Nano (500 cycles), for a total of 1 million reads, with a 5% spike-in of Enterobacteria phage PhiX control.

Read filtering and variant calling parameter optimization

Samples were submitted to the IDseq pipeline for adapter and primer trimming, removal of low-quality reads, and removal of host sequences (Kalantar et al., 2020). Reads were mapped with the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009) against the SARS-CoV-2 reference genome (GenBank MN908947.3 (Wu et al., 2020)). Samtools was used for file sorting and indexing (Li et al., 2009), followed by FreeBayes (Garrison and Marth, 2012) for SNP calling and insertion/deletion (indel) identification. To determine which parameters best identify SARS-CoV-2 VOC/VOI, several parameter combinations were tested, and the positive predictive value (PPV) and analytical sensitivity were computed for each. Depth of coverage (number of reads supporting a SNP or deletion) was tested from 5 to 40 reads, together with a depth fraction (fraction of reads supporting the SNP or deletion) from 15% to 90%; a Phred score of 20 was kept constant across all combinations. Details of the PPV and sensitivity assessment are available in the Supplementary data. Read depth and coverage, the latter defined as the percentage of the S gene covered by at least one read, were assessed with BEDTools (Quinlan and Hall, 2010) and visualized with Tablet (Milne et al., 2010).
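The thresholds above can be expressed as a simple post-filter on FreeBayes VCF records. The sketch below is hypothetical: it assumes the read-depth threshold applies to reads supporting the alternate allele (FreeBayes INFO field `AO`), that the depth fraction is `AO` divided by the total depth (INFO field `DP`), and that the Phred 20 cut-off is applied to the VCF `QUAL` column; the original analysis may have applied these cut-offs differently. The defaults match the combination eventually selected (read depth 10, depth fraction 50%), and the example records are invented.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column (KEY=VALUE;KEY=VALUE) into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_thresholds(vcf_line: str, min_alt_reads: int = 10,
                      min_fraction: float = 0.50, min_qual: float = 20.0) -> bool:
    """Return True if a FreeBayes VCF record meets the depth, fraction, and quality cut-offs."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual = float(cols[5]) if cols[5] != "." else 0.0
    info = parse_info(cols[7])
    total_depth = int(info.get("DP", "0"))
    if total_depth == 0:
        return False
    # AO holds one count per ALT allele; keep the best-supported allele (assumption).
    alt_reads = max(int(x) for x in info.get("AO", "0").split(","))
    fraction = alt_reads / total_depth
    return alt_reads >= min_alt_reads and fraction >= min_fraction and qual >= min_qual


# Hypothetical records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
well_supported = "MN908947.3\t23403\t.\tA\tG\t900.5\t.\tDP=850;AO=840;RO=10"
minority_allele = "MN908947.3\t23403\t.\tA\tG\t150.0\t.\tDP=200;AO=40;RO=160"
print(passes_thresholds(well_supported), passes_thresholds(minority_allele))  # True False
```

Keeping the filter separate from variant calling makes it cheap to re-run the whole parameter grid on the same VCFs, which is what a sweep like the one in Table 2 requires.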

Statistical analysis

Depth and coverage are summarized as mean ± standard deviation. Non-parametric Mann-Whitney tests were used for continuous variables, and Fisher's exact tests for categorical variables. A non-parametric Kruskal-Wallis test followed by multiple pairwise comparisons was used to test for differences in coverage among Ct value groups. Results were considered significant at adjusted P-values below 0.05. Statistical analyses and figures were produced with GraphPad Prism for Mac.
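As a worked check of the Fisher's exact test, the two-sided P-value for anatomical swabbing site in Table 1 (nasopharyngeal: 46 positive, 4 negative; oropharyngeal: 22 positive, 0 negative) can be recomputed from the hypergeometric distribution in plain Python. The sketch sums the probabilities of all tables with the same margins that are no more likely than the observed one, the usual two-sided definition, with a small relative tolerance for floating-point ties. The study used GraphPad Prism, so this is an independent reimplementation, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Rows: nasopharyngeal, oropharyngeal; columns: SARS-CoV-2 positive, negative.
p_site = fisher_exact_two_sided([[46, 4], [22, 0]])
print(f"{p_site:.4f}")  # 0.3058
```

The result matches the P-value of 0.3058 reported in Table 1. The transport media comparison involves a 4 × 2 table and would need the Fisher-Freeman-Halton extension, which this sketch does not cover.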

Results

Included samples

Seventy-two samples were selected and tested with ADSSpike (68 positive and 4 negative for SARS-CoV-2). This sample size was chosen to target an average of 1,000 reads per amplicon per positive sample, for a total of 14,000 reads per positive sample. Of the tested samples, 50 were nasopharyngeal swabs and 22 were throat swabs. Most samples were stored in UTM (n = 35), followed by Aptima (n = 19), saline (n = 16), and E-swab (n = 5). Ct values for the E gene RT-PCR among the positive samples ranged from 17.28 to 35.73, with a mean of 22.83. The samples had previously been sequenced with the Illumina COVIDSeq SARS-CoV-2 NGS test, which identified 3 VOC/VOI samples (2 Zeta/P.2 and one Alpha/B.1.1.7).
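The sample-size reasoning is simple arithmetic and can be checked directly. This sketch assumes that the ~1 million reads stated for the MiSeq V2 Nano run (see Library preparation and sequencing) include the 5% PhiX spike-in, and it ignores negative samples, controls, and reads lost to filtering.

```python
AMPLICONS_PER_SAMPLE = 14
TARGET_READS_PER_AMPLICON = 1_000
POSITIVE_SAMPLES = 68
RUN_READS = 1_000_000   # total run output stated in the text (assumed to include PhiX)
PHIX_PERCENT = 5        # PhiX spike-in stated in the text

reads_per_sample = AMPLICONS_PER_SAMPLE * TARGET_READS_PER_AMPLICON
reads_needed = reads_per_sample * POSITIVE_SAMPLES
reads_available = RUN_READS * (100 - PHIX_PERCENT) // 100

print(reads_per_sample, reads_needed, reads_available)  # 14000 952000 950000
```

Under these assumptions, the target of 952,000 reads roughly matches the ~950,000 non-PhiX reads expected from a single Nano run.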

Preliminary sequencing assessment

A significant difference in purified DNA concentration after S gene amplification was observed between SARS-CoV-2 positive and negative samples (positive: 13.8 ± 15.0 vs negative: 1.0 ± 0.2 ng/µL; P-value = 0.03, Mann-Whitney test). Amplicons from 79 samples, including negative controls and paired samples, were sequenced on an Illumina MiSeq in paired-end mode. A total of 835,676 raw read pairs were retrieved; of these, 1,210,743 reads were properly mapped to the S gene reference sequence, with a median of 10,610 (Q1 and Q3: [4432, 14877]) and a mean of 15,325 mapped reads per sample. The positive samples displayed an average read depth of 869.81 ± 477.37, while the negative samples had an average read depth of 30.68 ± 66.31 (Fig. 2A). Analysis of read depth by Ct value group showed the expected pattern, with lower Ct values tending to have higher read depth across the S gene (Fig. 2B). An average coverage of 70.38% was observed across all samples, with 76.37% ± 34.90% for the SARS-CoV-2 positive samples and 33.18% ± 13.46% for the negative samples. Coverage by Ct value group showed a similar trend and differed significantly across groups (Kruskal-Wallis test; P-value < 0.0001) (Fig. 2C). Multiple pairwise comparisons showed significant differences in coverage between Ct values of 15 to 20 vs 25 to 30 (adjusted P-value = 0.02), 15 to 20 vs 30 to 35 (adjusted P-value = 0.0002), and 20 to 25 vs 30 to 35 (adjusted P-value = 0.0003), as detailed in the Supplementary data.
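Mean read depth and coverage, as defined in the Methods (coverage is the percentage of S gene positions covered by at least one read), can be computed from per-base depth output such as that of `bedtools genomecov -d`, which reports chromosome, 1-based position, and depth. This is a minimal sketch: the S gene coordinates (21,563 to 25,384 in MN908947.3) come from the reference annotation, and positions missing from the input are assumed to have zero depth.

```python
S_START, S_END = 21563, 25384  # S gene in MN908947.3, 1-based inclusive


def depth_and_coverage(rows, start=S_START, end=S_END):
    """Return (mean depth, % of positions with depth >= 1) over [start, end]."""
    depth_at = {}
    for _chrom, pos, depth in rows:
        pos = int(pos)
        if start <= pos <= end:
            depth_at[pos] = int(depth)
    values = [depth_at.get(p, 0) for p in range(start, end + 1)]
    mean_depth = sum(values) / len(values)
    coverage_pct = 100 * sum(1 for v in values if v >= 1) / len(values)
    return mean_depth, coverage_pct


# Toy example over a hypothetical 4-base window; position 4 is absent (depth 0).
toy = [("MN908947.3", 1, 10), ("MN908947.3", 2, 0), ("MN908947.3", 3, 30)]
print(depth_and_coverage(toy, start=1, end=4))  # (10.0, 50.0)
```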

Fig. 2

Read depth and coverage assessment. (A) Mean read depth by individual nucleotide position for SARS-CoV-2 positive iPCR samples, SARS-CoV-2 negative samples, and negative controls (NFWiPCR). The dashed purple line marks a read depth of 700. (B) Mean read depth by individual nucleotide position for SARS-CoV-2 positive iPCR samples, grouped by Ct value. (C) Side-by-side box plots of coverage by Ct value group, assessed by a Kruskal-Wallis test and multiple pairwise comparisons (*: adjusted P-value < 0.05; **: adjusted P-value < 0.01; ***: adjusted P-value < 0.001). (D) Dot plot of the position-specific depth of identified SNPs for the VOC/VOI samples (left) vs missed SNPs. The blue dashed line marks the lowest depth for a called SNP, and the red dashed line marks the highest depth recorded for a missed SNP.

Assessment of parameter selection for variant calling

Seven combinations of read depth and depth fraction were analyzed for SARS-CoV-2 variant calling (Table 2). The highest PPV and sensitivity were obtained with 3 combinations: (1) read depth of 10 and depth fraction of 50%, (2) read depth of 40 and depth fraction of 60%, and (3) read depth of 40 and depth fraction of 70%. These combinations captured all the signature SNPs pertinent to the Alpha/B.1.1.7 sample and to one of the Zeta/P.2 VOI samples. In the second Zeta/P.2 sample, only one of the 3 signature SNPs was captured by any combination; on closer inspection, no reads were generated in the region where the remaining 2 SNPs were expected. The other parameter combinations were either not stringent enough (sacrificing positive predictive value, and therefore analytical specificity) or too stringent to capture real, expected SNPs (sacrificing analytical sensitivity). Because 3 parameter combinations tied, a variant screening procedure was performed across all sequenced samples.

Table 2

Parameter iteration for PPV and sensitivity calculation in SARS-CoV-2 VOC/VOI.

Parameter combination	Alpha/B.1.1.7 VOC PPV and sensitivity	Zeta/P.2 VOI (sample 1) PPV and sensitivity	Zeta/P.2 VOI (sample 2) PPV and sensitivity
Read depth = 5; Depth fraction = 15%	22.5%; 100%	9.67%; 100%	1.47%; 33.33%
Read depth = 10; Depth fraction = 33%	100%; 100%	100%; 100%	16.67%; 33.33%
Read depth = 10; Depth fraction = 50%	100%; 100%	100%; 100%	100%; 33.33%
Read depth = 40; Depth fraction = 60%	100%; 100%	100%; 100%	100%; 33.33%
Read depth = 40; Depth fraction = 70%	100%; 100%	100%; 100%	100%; 33.33%
Read depth = 40; Depth fraction = 80%	100%; 88.89%	100%; 100%	100%; 33.33%
Read depth = 40; Depth fraction = 90%	100%; 77.78%	100%; 66.67%	100%; 33.33%

Variant calling was then performed across all sequenced samples with each of the top 3 parameter combinations. A read depth of 10 and a depth fraction of 50% recorded 121 SNPs and deletions; a read depth of 40 and a depth fraction of 60% recorded 119; and a read depth of 40 and a depth fraction of 70% recorded 118. Inspection of each called variant revealed 4 false positives with the first combination, 3 false positives and 2 false negatives with the second, and 3 false positives and 3 false negatives with the third. Accordingly, a read depth of 10 and a depth fraction of 50% had a PPV of 96.69%, while a read depth of 40 with a depth fraction of 60% or 70% had a PPV of 97.45% and 97.43%, respectively (Supplementary table 2). However, with a read depth of 40 and a depth fraction of 60% and 70%, 2 and 3 false negatives were observed, decreasing sensitivity to 98.29% and 97.43%, respectively. A read depth of 10 and a depth fraction of 50% therefore gave the optimum values for SNP calling and SARS-CoV-2 variant identification, and these parameters were used for variant analysis.
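The metrics above follow the standard definitions PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The short sketch below recomputes the values for the selected combination (121 calls, 4 false positives, no false negatives reported) and the Alpha/B.1.1.7 sensitivities in Table 2, assuming the 9 signature S gene mutations listed for that sample in the next section.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of calls that are true variants."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Analytical sensitivity: share of true variants that were called."""
    return tp / (tp + fn)


calls, false_pos, false_neg = 121, 4, 0   # read depth 10, depth fraction 50%
true_pos = calls - false_pos              # 117

print(round(100 * ppv(true_pos, false_pos), 2))          # 96.69
print(round(100 * sensitivity(true_pos, false_neg), 2))  # 100.0

# Table 2, Alpha/B.1.1.7 sample: 8 or 7 of 9 signature mutations recovered.
print(round(100 * sensitivity(8, 1), 2), round(100 * sensitivity(7, 2), 2))  # 88.89 77.78
```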

Read depth as a proxy for accurate SNP calling

The previously sequenced VOC/VOI samples P739 (Alpha/B.1.1.7-VOC; Ct value 27.17), P743 (Zeta/P.2-VOI; Ct value 23.68), and P744 (Zeta/P.2-VOI; Ct value 30.17) were used as controls to assess the identification of VOC/VOI from S gene sequences. These samples had average read depths of 713, 699, and 406, respectively. The P739 sample contained all S gene mutations pertinent to the Alpha/B.1.1.7-VOC (Δ69/70, Δ144/145, N501Y, A570D, D614G, P681H, T716I, S982A, and D1118H). The P743 sample contained all S gene mutations associated with the Zeta/P.2-VOI (E484K, D614G, and V1176F); in contrast, the P744 sample showed only the VOC/VOI-flagged V1176F mutation, suggesting that a minimum average read depth of approximately 700 may be required to identify all SNPs pertinent to VOC/VOI. The lowest position-specific depth at which a SNP was correctly called was 97, while the highest depth at which a SNP was missed was 536 (Fig. 2D). In addition, 55/76 (72.36%) samples had at least 50% coverage, a requirement for submission to the GISAID database (Shu and McCauley, 2017).

SNP calling and variant confirmation

Sixty-eight samples from SARS-CoV-2 positive patients were successfully sequenced. The negative samples did not display any SNPs (coverage: 33.18% ± 11.55%; depth: 30.68 ± 66.31). Among the positive samples, a total of 30 unique mutations were identified after applying the SNP calling thresholds, with a total frequency of 121 (Fig. 3A). Of these, 10/30 were synonymous (total frequency of 15/121), 17/30 were nonsynonymous (total frequency of 102/121), one was a nonsense mutation (total frequency of 1/121), and 2 were indels (total frequency of 3/121) (Fig. 3B). Fifteen SARS-CoV-2 positive samples did not display any SNPs. These samples had a significantly lower read depth (mean = 168.05; SD = 105.71) than those with at least 1 identified SNP (mean = 1040.39; SD = 71.86) (P-value < 0.0001, Mann-Whitney test) (Fig. 3C).

Fig. 3

Identified SNPs and indels in SARS-CoV-2 positive samples. (A) Nonsynonymous mutations, deletions, and nonsense mutations identified among the 64 iPCR SARS-CoV-2 positive samples in the S gene and flanking ORF3a gene region. Blue dashed lines represent the receptor binding domain (RBD), delimited by nucleotides 318 and 510, and its receptor binding motif (RBM). Mutants in red are flagged for their presence in VOC/VOI. (B) Pie chart of the SNP type distribution. (C) Side-by-side bar chart of read depth for SARS-CoV-2 positive samples with no SNPs detected vs samples with at least one identified SNP. (D) Box plot of the 5 most prevalent mutants identified across all positive samples (n = 72).

The most commonly identified mutants were D614G (frequency of 50/68 or 73.52%), followed by the Q57H mutant in the ORF3a gene (30/68 or 44.11%), L1224F (6/68 or 8.82%), deletion 144/145 (2/68 or 2.94%), and the synonymous R628R mutant (2/68 or 2.94%) (Fig. 3D). In addition, 4 independent samples carried the synonymous A21583G mutation, encoding L7L in the S gene, a variant that has not been recorded in the literature.
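Classifying a substitution as synonymous, nonsynonymous, or nonsense requires mapping its genome coordinate to an S gene codon and translating the reference and alternate codons. The sketch below is illustrative, not the authors' annotation pipeline: it assumes the S gene starts at position 21,563 of MN908947.3, handles single-nucleotide substitutions only (not indels), and uses example codons rather than sequences read from the reference. As a check, position 21,583 maps to codon 7 and position 23,403 to codon 614, consistent with the L7L and D614G labels.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

S_START, S_LENGTH = 21563, 3822  # S gene in MN908947.3, including the stop codon


def spike_codon(genome_pos: int) -> tuple:
    """Map a 1-based genome position to (codon number, position within codon)."""
    offset = genome_pos - S_START
    if not 0 <= offset < S_LENGTH:
        raise ValueError(f"{genome_pos} is outside the S gene")
    return offset // 3 + 1, offset % 3 + 1


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a single codon change as synonymous, nonsynonymous, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


print(spike_codon(21583), spike_codon(23403))  # (7, 3) (614, 2)
print(classify_substitution("TTA", "TTG"))     # synonymous (Leu -> Leu)
print(classify_substitution("GAT", "GGT"))     # nonsynonymous (Asp -> Gly)
print(classify_substitution("CAG", "TAG"))     # nonsense (Gln -> stop)
```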

Pooling strategy assessment

An alternative strategy was evaluated to test the efficacy of a simplified experimental workflow (14 iPCRs, one per primer pair, vs a single pPCR with all 14 primer pairs). Aligned read depths differed significantly, with a mean depth of 560 ± 237.88 for the iPCR samples and 312.87 ± 372.72 for the pPCR samples (P-value < 0.0001; paired t-test). Across the 3 samples tested with both approaches, a total of 5 mutations were identified in the S gene, with a positive percent agreement of 80%. The one missed mutation (Δ144/145) had a read depth of 422 in the regions flanking the indel with the iPCR approach (Supplementary Table 3).
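Positive percent agreement (PPA) is computed like sensitivity, but against a comparator method rather than a reference standard. A minimal sketch, assuming the iPCR results serve as the comparator and using the counts stated above (5 mutations, one missed):

```python
def positive_percent_agreement(detected_by_both: int, missed_by_test: int) -> float:
    """PPA (%) = mutations found by both methods / mutations found by the comparator."""
    return 100 * detected_by_both / (detected_by_both + missed_by_test)


# Pooling comparison: 5 mutations in total, 1 missed (assumed comparator: iPCR).
print(positive_percent_agreement(4, 1))  # 80.0
```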

Discussion

The close monitoring of SARS-CoV-2 mutants is imperative, especially for the continued development of therapeutics and vaccines and for a better understanding of pathogenesis. This study reports the development of ADSSpike, a practical workflow for selective ADS of the SARS-CoV-2 S gene. The overall workflow is an adaptation of the international ARTIC V3 consortium protocol (COVID-19 ARTIC, 2020) and is similar to the HiSpike pipeline (Fass et al., 2021). One key difference from HiSpike lies in the SNP calling parameters, which are relaxed in HiSpike. Indeed, screening for SNPs pertinent to SARS-CoV-2 VOC/VOI with the parameters employed by HiSpike (read depth of 15) yielded a PPV below 25% for each VOC/VOI analyzed (Table 2). In addition, the HiSpike workflow does not state the fraction of reads required for base calling, a parameter of utmost importance for identifying multiple SARS-CoV-2 strains in a single host. Although HiSpike is described as a cost-effective tool, no cost analysis is provided, which adds uncertainty to the methodology, and the study provides little information on SNPs detected in negative controls or SARS-CoV-2 negative samples. On the other hand, consistent with the findings of Fass et al., coverage declined when samples had a Ct value of 30 or higher. ADSSpike was able to identify SNPs and deletions in clinical specimens that tested positive for SARS-CoV-2, and no non-specific sequences were retrieved from the negative control samples, showing the specificity of the workflow. The results presented here suggest that ADSSpike is an accurate SNP detection workflow regardless of the SARS-CoV-2 lineage.
ADSSpike provides a practical, cost-effective, and scalable solution to diagnose and monitor SARS-CoV-2 VOC/VOI in the clinical laboratory at approximately $41.85 USD per reaction, adding a valuable tool to public health measures combatting the COVID-19 pandemic. Current approaches to identify VOC/VOI are based on capillary sequencing, WGS, or targeted PCR or RT-PCR assays. Targeted assays are commercially available but only cover existing mutations, meaning new SNPs can be missed (Wang et al., 2021). A comparison of the cost, time, advantages, and limitations of current SARS-CoV-2 VOC/VOI detection methods is provided in Supplementary table 4. SARS-CoV-2 VOC detection using WGS by capillary sequencing or NGS has been widely used for surveillance (Goncalves Cabecinhas et al., 2021; Miller et al., 2020; Pillay et al., 2020; Shaibu et al., 2021). In contrast to WGS, amplicon deep sequencing focuses on a specific gene or genomic region, providing greater sequencing depth and coverage while reducing costs and data burden. Sanger-based approaches have also been employed (Bezerra et al., 2021; Goes et al., 2021). These either focus on a small window of the S gene, and hence miss relevant SNPs outside that region, or target SNPs specific to particular VOC/VOIs. While SNPs can be identified using Sanger sequencing, mutations present in a minority of the sequenced viral population can be missed (Davidson et al., 2012; Gómez-Romero et al., 2018). ADSSpike data analysis employs thresholds based on high-quality reads, Phred score, coverage, and a cut-off of 50% of reads supporting the SNP, as previously described (Kishikawa et al., 2019; Lerch et al., 2017; Lerch et al., 2019; Phred Quality Score - an overview | ScienceDirect Topics; Song et al., 2016). Careful evaluation and SNP screening showed that these parameters accurately call signature SNPs pertinent to VOC/VOIs while minimizing the number of false negatives.
With these variant calling parameters, 15 SARS-CoV-2 positive samples did not show any SNPs, suggesting either an S gene sequence identical to the reference or a read depth insufficient to detect mutations. The read depth analysis performed across SARS-CoV-2 positive samples suggests that samples with an average read depth below 170 are unlikely to yield any SNP calls. The analysis performed here suggests an average of 700 reads spanning the S gene as a minimum to detect SNPs and correlate them to SARS-CoV-2 variants. Moreover, a workflow is proposed that could increase the likelihood of detecting SNPs, and potentially a VOC/VOI, based on the initial average read depth and coverage (Supplementary Fig. 2). A limitation of the study is that the recommended read depth of 700 is based on a small sample size (3 VOC/VOI samples) and is therefore subject to potential selection bias. To summarize, ADSSpike is one of the first workflows to target the SARS-CoV-2 S gene for VOC/VOI detection by NGS amplicon deep sequencing. Additionally, ADSSpike detected the A21583G SNP in 4 independent samples, a mutation that has not been recorded in the literature. This suggests that the presented workflow can identify SNPs in particular population clusters, which could then be assessed for increased risk, spread, and overall transmissibility. Moreover, the method proposed here can be used as a faster and more feasible approach than WGS, with even greater potential through the use of a one-pot, primer-pooled approach.
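The depth thresholds discussed above can be summarized as a simple triage rule. The sketch below is hypothetical: the function name and tier labels are illustrative, the cut-offs (about 170 and 700 mean reads across the S gene) come from this study's small sample, and it does not reproduce the workflow in Supplementary Fig. 2.

```python
LOW_DEPTH = 170       # below this mean depth, SNP calls were unlikely in this study
VARIANT_DEPTH = 700   # suggested minimum mean depth for full VOC/VOI SNP recovery


def depth_tier(mean_depth: float) -> str:
    """Assign a hypothetical triage tier from the mean S gene read depth."""
    if mean_depth < LOW_DEPTH:
        return "insufficient"
    if mean_depth < VARIANT_DEPTH:
        return "partial"
    return "adequate"


for sample, depth in [("P739", 713), ("P743", 699), ("P744", 406)]:
    print(sample, depth_tier(depth))
```

P743 (mean depth 699) falls just below the cut-off yet recovered all Zeta/P.2 mutations, which illustrates why the 700-read threshold should be read as approximate.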

Conclusions

The emergence of SARS-CoV-2 variants calls for molecular surveillance tools. The ADSSpike workflow provides a cost-effective, scalable and practical solution for SARS-CoV-2 variant identification, including VOC/VOIs. Additionally, the experimental design has been validated with the proper controls and parameter tuning to increase its reliability for SNP calling. The primer mixture approach to amplify the S gene may reduce the time to SNP detection, which is imperative during a pandemic.

Data availability

The data that support the results presented here are available upon reasonable request.

Author contributions

DCM, CK, LO, and DP contributed towards the experimental design and conceptualization. CK and LO performed the laboratory work. DCM performed the bioinformatics and statistical analysis. LF performed the cost analysis. DCM, CK, and LF wrote the original draft. DCM, CK, LO, LF, and DP reviewed and edited the manuscript. DP led and supervised the project. All authors read and approved the final manuscript.

References (28 in total)

1. Reducing the impact of PCR-mediated recombination in molecular evolution and environmental studies using a new-generation high-fidelity DNA polymerase.

Authors: Daniel J G Lahr; Laura A Katz
Journal: Biotechniques Date: 2009-10 Impact factor: 1.993

2. Improving the limit of detection for Sanger sequencing: a comparison of methodologies for KRAS variant detection.

Authors: Colin J Davidson; Emily Zeringer; Kristen J Champion; Marie-Pierre Gauthier; Fawn Wang; Jerry Boonyaratanakornkit; Julie R Jones; Edgar Schreiber
Journal: Biotechniques Date: 2012-09 Impact factor: 1.993

3. IDseq-An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring.

Authors: Katrina L Kalantar; Tiago Carvalho; Charles F A de Bourcy; Boris Dimitrov; Greg Dingle; Rebecca Egger; Julie Han; Olivia B Holmes; Yun-Fang Juan; Ryan King; Andrey Kislyuk; Michael F Lin; Maria Mariano; Todd Morse; Lucia V Reynoso; David Rissato Cruz; Jonathan Sheu; Jennifer Tang; James Wang; Mark A Zhang; Emily Zhong; Vida Ahyong; Sreyngim Lay; Sophana Chea; Jennifer A Bohl; Jessica E Manning; Cristina M Tato; Joseph L DeRisi
Journal: Gigascience Date: 2020-10-15 Impact factor: 6.524

4. A Sanger-based approach for scaling up screening of SARS-CoV-2 variants of interest and concern.

Authors: Matheus Filgueira Bezerra; Lais Ceschini Machado; Viviane do Carmo Vasconcelos De Carvalho; Cássia Docena; Sinval Pinto Brandão-Filho; Constância Flávia Junqueira Ayres; Marcelo Henrique Santos Paiva; Gabriel Luz Wallau
Journal: Infect Genet Evol Date: 2021-05-08 Impact factor: 3.342

5. GISAID: Global initiative on sharing all influenza data - from vision to reality.

Authors: Yuelong Shu; John McCauley
Journal: Euro Surveill Date: 2017-03-30

6. Precise detection of de novo single nucleotide variants in human genomes.

Authors: Laura Gómez-Romero; Kim Palacios-Flores; José Reyes; Delfino García; Margareta Boege; Guillermo Dávila; Margarita Flores; Michael C Schatz; Rafael Palacios
Journal: Proc Natl Acad Sci U S A Date: 2018-05-07 Impact factor: 11.205

7. Novel SARS-CoV-2 variants: the pandemics within the pandemic. (Review)

Authors: Erik Boehm; Ilona Kronig; Richard A Neher; Isabella Eckerle; Pauline Vetter; Laurent Kaiser
Journal: Clin Microbiol Infect Date: 2021-05-17 Impact factor: 8.067

8. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

9. SARS-CoV-2 N501Y Introductions and Transmissions in Switzerland from Beginning of October 2020 to February 2021-Implementation of Swiss-Wide Diagnostic Screening and Whole Genome Sequencing.

Authors: Ana Rita Goncalves Cabecinhas; Tim Roloff; Madlen Stange; Claire Bertelli; Michael Huber; Alban Ramette; Chaoran Chen; Sarah Nadeau; Yannick Gerth; Sabine Yerly; Onya Opota; Trestan Pillonel; Tobias Schuster; Cesar M J A Metzger; Jonas Sieber; Michael Bel; Nadia Wohlwend; Christian Baumann; Michel C Koch; Pascal Bittel; Karoline Leuzinger; Myrta Brunner; Franziska Suter-Riniker; Livia Berlinger; Kirstine K Søgaard; Christiane Beckmann; Christoph Noppen; Maurice Redondo; Ingrid Steffen; Helena M B Seth-Smith; Alfredo Mari; Reto Lienhard; Martin Risch; Oliver Nolte; Isabella Eckerle; Gladys Martinetti Lucchini; Emma B Hodcroft; Richard A Neher; Tanja Stadler; Hans H Hirsch; Stephen L Leib; Lorenz Risch; Laurent Kaiser; Alexandra Trkola; Gilbert Greub; Adrian Egli
Journal: Microorganisms Date: 2021-03-25

10. A Simple Reverse Transcriptase PCR Melting-Temperature Assay To Rapidly Screen for Widely Circulating SARS-CoV-2 Variants.

Authors: Padmapriya Banada; Raquel Green; Sukalyani Banik; Abby Chopoorian; Deanna Streck; Robert Jones; Soumitesh Chakravorty; David Alland
Journal: J Clin Microbiol Date: 2021-07-21 Impact factor: 5.948