Ephraim Fass1, Gal Zizelski Valenci1, Mor Rubinstein1, Paul J Freidlin1, Shira Rosencwaig1, Inna Kutikov1, Robert Werner1, Nofar Ben-Tovim1, Efrat Bucris2, Oran Erster2, Neta S Zuckerman2, Orna Mor2,3, Ella Mendelson2,3, Zeev Dveyrin1, Efrat Rorman1, Israel Nissan1.
Abstract
The changing nature of the SARS-CoV-2 pandemic poses unprecedented challenges to the world's health systems. Emerging spike gene variants jeopardize global efforts to produce immunity and reduce morbidity and mortality. These challenges require effective real-time genomic surveillance solutions that the medical community can quickly adopt. The SARS-CoV-2 spike protein mediates host receptor recognition and entry into the cell and is susceptible to generation of variants with increased transmissibility and pathogenicity. The spike protein is the primary target of neutralizing antibodies in COVID-19 patients and the most common antigen for induction of effective vaccine immunity. Tight monitoring of spike protein gene variants is key to mitigating COVID-19 spread and generation of vaccine escape mutants. Currently, SARS-CoV-2 sequencing methods are labor intensive and expensive. When sequencing demand is high, sequencing resources are quickly exhausted. Consequently, most SARS-CoV-2 strains are sequenced in only a few developed countries and rarely in developing regions. This poses the risk that undetected, dangerous variants will emerge. In this work, we present HiSpike, a method for high-throughput, cost-effective targeted next-generation sequencing of the spike gene. This simple three-step method can be completed in < 30 h and can sequence 10-fold more samples than conventional methods at a fraction of their cost. HiSpike has been validated in Israel, and has identified multiple spike variants from real-time field samples including Alpha, Beta, Delta and the emerging Omicron variants. HiSpike provides affordable sequencing options to help laboratories conserve resources for widespread high-throughput, near real-time monitoring of spike gene variants.
Keywords: HiSpike; NGS; Omicron variant; SARS-CoV-2; cost effective; high-throughput; spike variants; variants of concern
Year: 2022 PMID: 35087848 PMCID: PMC8787038 DOI: 10.3389/fmed.2021.798130
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1. Spike gene scheme with mutations occurring in variants of concern. The spike protein encoded by the spike gene is a large protein of 1,273 amino acids that contains key structural motifs. The figure shows the positions of these motifs as well as the locations of key amino acid changes that occur in important variants. The start and end positions of each motif are given in parentheses. NTD, N-terminal domain (light blue); RBD, receptor binding domain (yellow); FP, fusion peptide (black); HR1 and HR2, heptapeptide repeat sequences 1 and 2 (purple); TM, transmembrane domain (green); CD, cytoplasmic domain (gray).
Figure 2. HiSpike method illustration. (A) HiSpike three-step protocol outline. (B) HiSpike RT-PCR1 and PCR2 Illumina library preparation.
Figure 3. Comparison between ARTIC and HiSpike methods. Spike gene sequences of 70 samples (with > 90% spike sequence coverage in both ARTIC and HiSpike methods) were compared using a heat-map. Gray shades from white to black indicate 0 to 10 SNPs, respectively. HiSpike sequences were ordered vertically according to a hierarchical clustering tree (shown to the right), and the ARTIC sequences of the same samples were ordered horizontally to match. Pairs from the same sample lie on the diagonal (outlined in red). Sample 11 (counting from the left) exhibits a single SNP difference between the HiSpike and ARTIC results; all other samples show identical sequences.
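The pairwise comparison behind a heat-map like the one in Figure 3 can be sketched as a count of differing positions between aligned sequences. This is a minimal illustration, not the authors' pipeline: the function names `snp_distance` and `snp_matrix` are hypothetical, and skipping `N` and gap characters is an assumption about how uncalled bases might be handled.

```python
from typing import List

# Assumption: uncalled bases and alignment gaps are not counted as SNPs.
IGNORED = {"N", "-"}


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different called bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in IGNORED and y not in IGNORED
    )


def snp_matrix(rows: List[str], cols: List[str]) -> List[List[int]]:
    """Build a rows-by-cols matrix of SNP counts (e.g. HiSpike rows vs. ARTIC columns)."""
    return [[snp_distance(r, c) for c in cols] for r in rows]
```

With both lists in the same sample order, each diagonal entry of the matrix compares the two methods on one sample, so a zero diagonal corresponds to identical spike sequences.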
Figure 4. Phylogenetic tree based on spike gene sequences generated by the HiSpike method. Spike gene sequences of 441 samples generated by the HiSpike method were uploaded to the Nextclade platform. SARS-CoV-2 clades are illustrated by different colors and locations on the rectangular tree. The X axis indicates the number of mutations relative to the reference full viral genome. HiSpike sequences are represented by filled circles against the background of Nextclade's global representative clade tree of December 8, 2021.
Figure 5. Spike sequence breadth and depth as a function of sample Ct level. Spike gene sequences of 396 samples were divided into six groups based on their Ct values (up to 15, 15–20, 20–25, 25–30, 30–35, and over 35). The number (n) of samples in each group is given in brackets. (A) The percentage of the spike gene covered by at least 5 reads at different Ct levels is shown by boxplots. Each box represents the distribution of coverage breadth values from the lower to the upper quartile. The inner horizontal line indicates the median. The vertical whiskers extend to the most extreme data point that lies no more than 1.5 times the interquartile range from the box. More extreme data points are drawn as circles. (B) Sequence coverage depth along the spike gene. The average coverage depth was plotted, on a logarithmic scale, along the genome positions. Lines were smoothed with a sliding window of length 20. At a coverage depth of 5×, the spike gene region encoding the receptor binding domain is marked with a red line and annotated (RBD, with its amino acid location in brackets).
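The quantities described in the Figure 5 caption can be sketched in a few lines. This is an illustrative example under stated assumptions, not the authors' code: the function names are hypothetical, the Ct group boundaries are treated as upper-inclusive (the caption does not say which group a boundary value belongs to), and the smoothing is a plain moving average over full windows only.

```python
from typing import List, Sequence


def coverage_breadth(depths: Sequence[int], min_reads: int = 5) -> float:
    """Percentage of positions covered by at least `min_reads` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_reads)
    return 100.0 * covered / len(depths)


def ct_group(ct: float) -> str:
    """Assign a Ct value to one of six groups (upper-inclusive edges are an assumption)."""
    edges = [15, 20, 25, 30, 35]
    labels = ["<=15", "15-20", "20-25", "25-30", "30-35", ">35"]
    for edge, label in zip(edges, labels):
        if ct <= edge:
            return label
    return labels[-1]


def sliding_mean(values: Sequence[float], window: int = 20) -> List[float]:
    """Moving average over full windows; returns len(values) - window + 1 points."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window by one position: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out
```

Here `depths` would be a per-position read depth vector over the spike gene for one sample. Grouping the per-sample breadth values by `ct_group` gives the inputs for a boxplot like panel A, and `sliding_mean` applied to the averaged depth vector gives a smoothed curve like those in panel B.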