Ernest R Chan, Lucas D Jones, Marlin Linger, Jeffrey D Kovach, Maria M Torres-Teran, Audric Wertz, Curtis J Donskey, Peter A Zimmerman.
Abstract
SARS-CoV-2 whole genome sequencing has played an important role in documenting the emergence of polymorphisms in the viral genome and its continuing evolution during the COVID-19 pandemic. Here we present data from over 360 patients to characterize the complex sequence diversity of individual infections identified during multiple variant surges (e.g., Alpha and Delta). Across our survey, we observed significantly increasing SARS-CoV-2 sequence diversity during the pandemic and frequent occurrence of multiple biallelic sequence polymorphisms in all infections. This sequence polymorphism shows that SARS-CoV-2 infections are heterogeneous mixtures. Convention for reporting microbial pathogens guides investigators to report a majority consensus sequence. In our study, we found that this approach would under-report sequence variation in all samples tested. As we find that this sequence heterogeneity is efficiently transmitted from donors to recipients, our findings illustrate that infection complexity must be monitored and reported more completely to understand SARS-CoV-2 infection and transmission dynamics. Many of the nucleotide changes that would not be reported in a majority consensus sequence have now been observed as lineage defining SNPs in Omicron BA.1 and/or BA.2 variants. This suggests that minority alleles in earlier SARS-CoV-2 infections may play an important role in the continuing evolution of new variants of concern.
Year: 2022 PMID: 36074769 PMCID: PMC9455841 DOI: 10.1371/journal.pgen.1010200
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 6.020
Fig 4. Summary of the Distribution of SARS-CoV-2 Sequence Variation and Heterogeneity.
Counts of samples observed to show variation (among the 140 meeting the analytical threshold) across the SARS-CoV-2 genome. Black histogram components include those positions where biallelic variation was observed at or below 50% and may not have been included in reported consensus sequences. Red histogram components identify positions and occurrences where the alternate allele was observed at proportions from 51–100% and would have been included in the reported consensus sequence. The multicolored bar below the graph identifies genomic segments encoding different SARS-CoV-2 genes designated in the legend (ORF–Open Reading Frame; S–Spike; E–Envelope; M–Membrane; N–Nucleocapsid). Quantitative details summarized in Fig 4 are available in S2A to S2B Table.
Fig 2. Assessment of Alternate and Reference Allele Proportions (AAF:RAF) Across 140 Infections.
Box and whisker plots illustrate the spread of AAF:RAF proportions for each mutation (median (bar); 25th and 75th percentiles (bottom and top of box, respectively); 10th and 90th percentiles (bottom and top whiskers, respectively)). The red dashed line at 50% marks the threshold above which the alternate allele is recognized and reported as the conventional consensus nucleotide that characterizes the infecting virus. These 75 mutations were chosen for further analysis because they all showed evidence of biallelic states above the 5% background threshold and were shared in more than 10 infections (blue dashed line). Among these varying positions, Illumina sequence reads for 30 mutations did not reach an AAF:RAF >50% for any of the 140 infections (shaded gray in aligned table); 45 positions exceeded the AAF:RAF 50% threshold for a portion of the 140 infections and would therefore have contributed to characterization of the infecting viral strain. The accompanying table enumerates the number of samples with AAF between 5 and 95%, <50% and >50%. Overall, among the 5,390 alternate alleles detected at a frequency >5%, just 975 (18%) were observed at the >50% frequency necessary to be reported in the consensus sequence. Nucleotide positions characterized by biallelic iSNVs in our VA study population and the samples queried in the sequence read archive (SRA) (S1 Fig) are designated by a green dot.
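The two thresholds described in this caption (a 5% background cutoff and the 50% consensus cutoff) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the treatment of values exactly at each cutoff, and the example AAF values are assumptions made for illustration. Only the counts 975 and 5,390 come from the caption.

```python
def classify_aaf(aaf, background=0.05, consensus=0.50):
    """Label one alternate-allele frequency given as a fraction in [0, 1].

    Assumed boundaries (illustrative, based on the caption wording):
      aaf <= 5%         -> "background" (not above the background threshold)
      5% < aaf <= 50%   -> "minority"   (missed by a majority consensus)
      aaf > 50%         -> "consensus"  (reported in the consensus sequence)
    """
    if not 0.0 <= aaf <= 1.0:
        raise ValueError("AAF must be a fraction between 0 and 1")
    if aaf <= background:
        return "background"
    if aaf <= consensus:
        return "minority"
    return "consensus"


def consensus_share(n_consensus, n_detected):
    """Fraction of detected alternate alleles that reach the consensus."""
    return n_consensus / n_detected


# Hypothetical AAF values, not taken from the study data.
example_aafs = [0.02, 0.08, 0.50, 0.51, 0.97]
labels = [classify_aaf(x) for x in example_aafs]
print(labels)

# Counts quoted in the caption: 975 of 5,390 alleles exceeded 50%.
share = consensus_share(975, 5390)
print(f"{share:.1%} reported, {1 - share:.1%} not reported")
```

Under these assumptions, the caption's figures imply that roughly four out of five alternate alleles detected above background would be absent from a majority consensus sequence.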
Assessment of SNVs at Time Quartiles of the COVID-19 Pandemic.
| Quartile | SNP Avg (AAF>5%) | Min (>5%) | Max (>5%) | SNP Avg (AAF>50%) | Min (>50%) | Max (>50%) |
|---|---|---|---|---|---|---|
| 8/5/20 | 37.74 | 12 | 63 | 10.89 | 7 | 17 |
| 12/9/20 | 48.68 | 34 | 60 | 16.16 | 9 | 24 |
| 4/14/21 | 59.91 | 36 | 80 | 24.45 | 15 | 41 |
| 8/18/21 | 72.33 | 40 | 132 | 33.76 | 8 | 46 |
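
As a rough reading aid for the table above, the following Python sketch derives two quantities per quartile: the average number of SNPs seen only as minority alleles (AAF between 5% and 50%) and the ratio of the two reported averages. The class and field names are hypothetical, and the subtraction assumes both averages were computed over the same samples in each quartile, which the table does not state. The ratio is a ratio of averages, not a per-sample average of ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuartileRow:
    date: str            # quartile date as printed in the table
    avg_over_5: float    # SNP Avg (AAF>5%)
    avg_over_50: float   # SNP Avg (AAF>50%)

    @property
    def minority_avg(self) -> float:
        # Average SNPs above background but not above the 50% cutoff
        # (assumes both averages cover the same set of samples).
        return self.avg_over_5 - self.avg_over_50

    @property
    def consensus_ratio(self) -> float:
        # Ratio of the two table averages.
        return self.avg_over_50 / self.avg_over_5


# Values copied from the table.
ROWS = [
    QuartileRow("8/5/20", 37.74, 10.89),
    QuartileRow("12/9/20", 48.68, 16.16),
    QuartileRow("4/14/21", 59.91, 24.45),
    QuartileRow("8/18/21", 72.33, 33.76),
]

for row in ROWS:
    print(
        f"{row.date:>8}  minority avg = {row.minority_avg:6.2f}  "
        f"consensus ratio = {row.consensus_ratio:.1%}"
    )
```

On the table values, both derived quantities increase from the first quartile to the last, which is consistent with the rising averages in the table itself.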