Daniele Ramazzotti, Davide Maspero, Fabrizio Angaroni, Silvia Spinelli, Marco Antoniotti, Rocco Piazza, Alex Graudenzi.
Abstract
A key task of genomic surveillance of infectious viral diseases lies in the early detection of dangerous variants. Unexpected help to this end is provided by the analysis of deep sequencing data of viral samples, which are typically discarded after creating consensus sequences. Such analysis allows one to detect intra-host low-frequency mutations, which are a footprint of mutational processes underlying the origination of new variants. Their timely identification may improve public-health decision-making compared with traditional approaches exploiting consensus sequences. We present the analysis of 220,788 high-quality deep sequencing SARS-CoV-2 samples, showing that many spike and nucleocapsid mutations of interest associated with the most widely circulating variants, including Beta, Delta, and Omicron, might have been intercepted several months in advance. Furthermore, we show that a refined genomic surveillance system leveraging deep sequencing data might allow one to pinpoint emerging mutation patterns, providing automated data-driven support to virologists and epidemiologists.
Keywords: bioinformatics; genomic analysis; microbiology; virology
Year: 2022 PMID: 35677393 PMCID: PMC9162787 DOI: 10.1016/j.isci.2022.104487
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Table 1. Hazardous SARS-CoV-2 variants
| WHO label | Pango lineage | Spike mutations of interest (SMoI) | ECDC category | CDC category | WHO category | Early detection | Refined surveillance |
|---|---|---|---|---|---|---|---|
| Alpha | B.1.1.7 and Q lineages | S:N501Y, S:D614G, S:P681H | DEV | VUM | VOC | ✓ | |
| Alpha+ | B.1.1.7 + S:L452R | S:L452R, S:N501Y, S:D614G, S:P681H | DEV | VUM | VOC | ✓ | |
| Alpha+ | B.1.1.7 + S:E484K | S:E484K, S:N501Y, S:D614G, S:P681H | DEV | VUM | VOC | ✓ | |
| Alpha+ | B.1.1.7 + S:S494P | S:S494P, S:N501Y, S:D614G, S:P681H | DEV | VUM | VOC | ✓ | |
| Beta | B.1.351 and descendants | S:K417N, S:E484K, S:N501Y, S:D614G, S:A701V | VOC | VUM | VOC | ✓ | ✓ |
| Beta+ | B.1.351 + S:L18F | S:L18F, S:K417N, S:E484K, S:N501Y, S:D614G, S:A701V | VOI | VUM | VOC | ✓ | ✓ |
| Beta+ | B.1.351 + S:P384L | S:P384L, S:K417N, S:E484K, S:N501Y, S:D614G, S:A701V | VOI | VUM | VOC | ✓ | ✓ |
| Beta+ | B.1.351 + S:E516Q | S:K417N, S:E484K, S:N501Y, S:E516Q, S:D614G, S:A701V | VOI | VUM | VOC | ✓ | |
| Gamma | P.1 and descendants | S:K417T, S:E484K, S:N501Y, S:D614G, S:H655Y | VOC | VUM | VOC | ✓ | |
| Gamma+ | P.1.7 | S:K417T, S:E484K, S:N501Y, S:D614G, S:H655Y, S:P681H | VOI | VUM | VOC | ✓ | ✓ |
| Delta | B.1.617.2 | S:L452R, S:T478K, S:D614G, S:P681R | VOC | VOC | VOC | ✓ | ✓ |
| Delta+ | AY lineages | S:K417N, S:L452R, S:T478K, S:D614G, S:P681R | VOI | VOC | VOC | ✓ | |
| Delta+ | AY.34 | S:L452R, S:T478K, S:D614G, S:Q677H, S:P681R | VOI | VOC | VOC | ✓ | ✓ |
| Delta+ | B.1.617.2 + S:E484X | S:L452R, S:T478K, S:E484X, S:D614G, S:P681R | VOI | VOC | VOC | ✓ | |
| Delta+ | B.1.617.2 + S:Q613H | S:L452R, S:T478K, S:Q613H, S:D614G, S:P681R | VOI | VOC | VOC | ✓ | ✓ |
| Epsilon | B.1.427 and B.1.429 | S:L452R, S:D614G | DEV | VUM | VUM | ✓ | |
| Zeta | P.2 | S:E484K, S:D614G | DEV | VUM | – | | |
| Eta | B.1.525 | S:E484K, S:D614G, S:Q677H | DEV | VUM | VUM | ✓ | |
| Theta | P.3 | S:E484K, S:N501Y, S:D614G, S:P681H | DEV | – | – | | |
| Iota | B.1.526 | S:E484K, S:D614G, S:A701V | DEV | VUM | VUM | ✓ | |
| Kappa | B.1.617.1 | S:L452R, S:E484Q, S:D614G, S:P681R | DEV | VUM | VUM | ✓ | ✓ |
| Lambda | C.37 | S:L452Q, S:F490S, S:D614G | VOI | – | VOI | ✓ | |
| Mu | B.1.621 and B.1.621.1 | S:R346K, S:E484K, S:N501Y, S:D614G, S:P681H | VOI | VUM | VOI | ✓ | |
| Omicron | B.1.1.529 | S:K417N, S:S477N, S:T478K, S:N501Y, S:D614G, S:H655Y, S:N679K, S:P681H | VOC | VOC | VOC | ✓ | |
| – | A.23.1 | S:V367F, S:E484K, S:Q613H | DEV | – | – | ✓ | |
| – | A.27 | S:L452R, S:N501Y, S:A653V, S:H655Y | DEV | – | – | ✓ | |
| – | A.28 | S:E484K, S:N501T, S:H655Y | DEV | – | – | ✓ | |
| – | AY.4.2 | S:Y145H, S:A222V, S:L452R, S:T478K, S:D614G, S:P681R | VUM | – | – | ✓ | |
| – | AT.1 | S:E484K, S:D614G, S:N679K | DEV | – | – | | |
| – | AV.1 | S:N439K, S:E484K, S:D614G, S:P681H | DEV | – | – | | |
| – | B.1.1.318 | S:E484K, S:D614G, S:P681H | VUM | – | VUM | ✓ | |
| – | B.1.1.519 | S:T478K, S:D614G | DEV | – | VUM | ✓ | ✓ |
| – | B.1.1.523 | S:E484K, S:S494P, S:D614G | – | – | VUM | | |
| – | B.1.214.2 | S:Q414K, S:N450K, S:D614G | DEV | – | VUM | ✓ | ✓ |
| – | B.1.466.2 | S:N439K, S:D614G, S:P681R | – | – | VUM | | |
| – | B.1.616 | S:V483A, S:D614G, S:H655Y, S:G669S | DEV | – | – | ✓ | |
| – | B.1.617.3 | S:L452R, S:E484Q, S:D614G, S:P681R | DEV | VUM | – | ✓ | ✓ |
| – | B.1.619 | S:E484K, S:D614G | – | – | VUM | | |
| – | B.1.620 | S:S477N, S:E484K, S:D614G, S:P681H | DEV | – | VUM | | |
| – | B.1.630 | S:A222V, S:L452R, S:E484Q, S:D614G, S:H655Y | – | – | VUM | ✓ | |
| – | C.1.2 | S:D614G, S:E484K, S:H655Y, S:N501Y, S:N679K, S:Y449H | VUM | – | VUM | ✓ | |
| – | C.16 | S:L452R, S:D614G | DEV | – | – | ✓ | |
| – | C.36.3 | S:L452R, S:D614G, S:Q677H | VUM | – | VUM | ✓ | |
| – | R.1 | S:E484K, S:D614G | – | – | VUM | | |
List of SARS-CoV-2 variants of concern (VOC), of interest (VOI), under monitoring (VUM), and de-escalated variants (DEV), updated on October 26th, 2021, according to the categorizations of the World Health Organization (2021), the European Centre for Disease Prevention and Control (2021), and the Centers for Disease Control and Prevention (2021). The Omicron variant was added to the list even though it was designated as VOC only on November 26th, 2021. The table reports the WHO label, the constituting Pango lineages (Rambaut et al., 2020), the associated spike mutations of interest (SMoI), and the institution-specific categories. Variant labels marked with “+” include additional SMoIs with respect to the related upstream variant. The last two columns report the variants for which an early detection of the related SMoIs and/or a refined surveillance (via homoplasy analysis) is enabled by exploiting deep sequencing data (see Results). Notice that A.23.1 and B.1.525 (Eta) are included in the list of so-called variants of note in the Cov-Lineages.org lineage report (O’Toole et al., 2021a, 2021b).
Notice that, at the time of writing, no SMoIs were explicitly associated with the Omicron variant. Here, we indicate the S mutations present in this variant that are identified as SMoIs in at least one of the remaining 43 variants included in the table; for an updated characterization of Omicron and other variants, we refer the reader to Hodcroft (2021).
Figure 1. SARS-CoV-2 samples in GISAID and NCBI public repositories
Number of SARS-CoV-2 samples for which either deep sequencing data or consensus sequences are available, grouped by the month in which the related dataset was released in the period January 2020–August 2021. Source databases are NCBI (National Center for Biotechnology Information, 2021) for deep sequencing data and GISAID (Shu and McCauley, 2017) for consensus sequences (update: August 2021).
Figure 2. Early detection of 6 SMoIs associated with hazardous variants from deep sequencing data
Analysis of SMoIs: S:L18F, S:Q414K, S:L452R, S:T478K, S:H655Y, and S:A701V (see Table 1). Circles with purple borders mark the first month in which the mutation was detected as minor (MF and ) in at least 5 samples while still undetected as fixed (MF ); circles with blue borders mark the month in which the mutation was first detected as fixed in at least 1 sample; red lines highlight the anticipation (when months). The analysis is performed by splitting the samples into the 6 distinct geographical regions and by reporting the corresponding results at the global scale. All circles contain a pie chart that displays the ratio of samples showing that mutation either as minor or as fixed in that month (further details are provided in the main text). For each SMoI, the related variants are also reported.
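The detection rule described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the `Call` record, the function names, and the choice of strict versus non-strict inequalities are all assumptions made for illustration. The MF thresholds are deliberately left as required parameters rather than given defaults, and the sketch assumes that the "at least 5 samples" condition is evaluated within a single month and that "still undetected as fixed" means strictly before the first fixed month.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class Call(NamedTuple):
    """One mutation observed in one sample (hypothetical record layout)."""
    sample_id: str
    month: str      # collection month as "YYYY-MM", so string order is chronological
    mutation: str   # e.g. "S:L452R"
    mf: float       # mutation frequency of this mutation in this sample


def months_between(earlier: str, later: str) -> int:
    """Number of calendar months from `earlier` to `later` ("YYYY-MM" strings)."""
    y1, m1 = (int(x) for x in earlier.split("-"))
    y2, m2 = (int(x) for x in later.split("-"))
    return (y2 - y1) * 12 + (m2 - m1)


def early_detection(
    calls: Iterable[Call],
    mutation: str,
    minor_low: float,    # assumption: minor means minor_low < MF <= minor_high
    minor_high: float,
    fixed_min: float,    # assumption: fixed means MF > fixed_min
    min_minor_samples: int = 5,   # from the caption
    min_fixed_samples: int = 1,   # from the caption
) -> dict:
    minor = defaultdict(set)   # month -> samples carrying the mutation as minor
    fixed = defaultdict(set)   # month -> samples carrying the mutation as fixed
    for c in calls:
        if c.mutation != mutation:
            continue
        if c.mf > fixed_min:
            fixed[c.month].add(c.sample_id)
        elif minor_low < c.mf <= minor_high:
            minor[c.month].add(c.sample_id)

    fixed_months = sorted(m for m, s in fixed.items() if len(s) >= min_fixed_samples)
    first_fixed: Optional[str] = fixed_months[0] if fixed_months else None

    # First month with enough minor observations, strictly before the first fixed month.
    first_minor: Optional[str] = None
    for month in sorted(minor):
        if first_fixed is not None and month >= first_fixed:
            break
        if len(minor[month]) >= min_minor_samples:
            first_minor = month
            break

    anticipation = None
    if first_minor is not None and first_fixed is not None:
        anticipation = months_between(first_minor, first_fixed)
    return {
        "first_minor": first_minor,
        "first_fixed": first_fixed,
        "anticipation_months": anticipation,
    }
```

Running the function once per geographical region and once on the pooled samples would mirror the regional and global rows of the figure; the grouping itself is left to the caller.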
Figure 3. Mutation frequency and prevalence variation in time of SMoIs S:L452R and S:H655Y
The leftmost panels show the distribution of the mutation frequency (MF) of all samples with SMoIs S:L452R (upper panels) and S:H655Y (lower panels), grouped by month and geographical region. Each cell shows the proportion of samples showing the mutation with that specific MF. The rightmost panels show the number of samples showing the mutations either as minor (MF and ) or as fixed (MF ). The lineages associated with both variants are also displayed.
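As a rough illustration of how the cells of the leftmost panels could be computed, the following Python sketch bins the MF values of the samples carrying a mutation and normalizes the counts within each month and region. The function name, the input layout, the number of bins, and the per-group normalization are assumptions; the caption does not specify them.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def mf_distribution(
    records: Iterable[Tuple[str, str, float]],
    n_bins: int = 10,   # assumption: equal-width MF bins over [0, 1]
) -> Dict[Tuple[str, str], List[float]]:
    """Proportion of samples per MF bin for each (month, region) group.

    `records` holds one (month, region, mf) triple per sample carrying the mutation.
    """
    counts = defaultdict(Counter)
    for month, region, mf in records:
        # MF equal to 1.0 falls into the last bin.
        bin_index = min(int(mf * n_bins), n_bins - 1)
        counts[(month, region)][bin_index] += 1

    distribution = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        distribution[key] = [counter[b] / total for b in range(n_bins)]
    return distribution
```

Each returned list sums to 1, so groups with very different sample counts remain comparable in a heatmap-style display.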
Figure 4. Early detection of 6 S mutations not associated with known variants
Analysis of 6 S mutations originally detected as minor (in at least 5 samples) and only subsequently as fixed at the global scale, namely, S:W152C, S:S297L, S:C361S, S:G446V, S:A570D, and S:T791K. For further details, please refer to the caption of Figure 2. S mutations first detected as minor at the local scale are shown in Figure S1 in the supplementary information.
Figure 5. Early detection of N mutations
Analysis of NMoI N:D377Y and of the three highly diffused N mutations originally detected as minor (in at least 5 samples) and only subsequently as fixed at the global scale, namely, N:L219F, N:A254S, and N:A254V. For further details, please refer to the caption of Figure 2. N mutations first detected as minor at the local scale are shown in Figure S2 in the supplementary information.
Figure 6. Analysis of homoplastic minor variants
(A–D) The heatmaps show the prevalence (i.e., number of samples over the total) of the SMoIs (panel A), additional highly diffused S mutations (B), the NMoIs (C), and the additional highly diffused N mutations (D) retrieved as minor (MF and ) in the samples associated with the variants of Table 1 via Pangolin (O’Toole et al., 2021a, 2021b). Only the mutations observed in at least of the samples of any variant are shown.
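The prevalence defined in this caption (samples carrying a mutation as minor, over the total number of samples of a variant) can be sketched as follows. This is an illustrative Python fragment, not the published pipeline: the function name and input layout are hypothetical, the variant assignment is assumed to have been done upstream, and the display threshold is a required parameter because its value is not given here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def minor_prevalence(
    samples: Iterable[Tuple[str, Set[str]]],
    min_prevalence: float,   # assumption: fraction in [0, 1], supplied by the caller
) -> Dict[str, Dict[str, float]]:
    """Per-variant prevalence of minor mutations.

    `samples` holds one (variant, minor_mutations) pair per sample, where
    `minor_mutations` is the set of mutations found as minor in that sample.
    A mutation is kept if its prevalence reaches `min_prevalence` in any variant.
    """
    totals = defaultdict(int)                        # variant -> number of samples
    counts = defaultdict(lambda: defaultdict(int))   # variant -> mutation -> samples
    for variant, minor_mutations in samples:
        totals[variant] += 1
        for mutation in set(minor_mutations):
            counts[variant][mutation] += 1

    prevalence = {
        v: {m: n / totals[v] for m, n in muts.items()} for v, muts in counts.items()
    }
    kept = sorted(
        {m for muts in prevalence.values() for m, p in muts.items() if p >= min_prevalence}
    )
    # Rows are variants, columns are kept mutations; absent pairs get prevalence 0.
    return {v: {m: prevalence.get(v, {}).get(m, 0.0) for m in kept} for v in totals}
```

The returned nested dictionary maps directly onto a heatmap with variants as rows and mutations as columns.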
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Software and algorithms | | |
| BWA-MEM 0.7.17-r1188 | Li, Heng, and Richard Durbin. “Fast and accurate short read alignment with Burrows–Wheeler transform.” Bioinformatics 25.14 (2009): 1754–1760. | |
| Samtools 1.10 | Li, Heng. “Improving SNP discovery by base alignment quality.” Bioinformatics 27.8 (2011): 1157–1158. | |
| iVar 1.3.1 | Grubaugh, Nathan D. et al. “An amplicon-based sequencing framework for accurately measuring intra-host virus diversity using PrimalSeq and iVar.” Genome biology 20.1 (2019): 1–19. | |
| The R Project for Statistical Computing | R Core Team. “R: A language and environment for statistical computing.” (2013): 201. | |
| Custom code to replicate the analyses presented in the text. | This paper. | |