Shay Leary, Silvana Gaudieri, Matthew D Parker, Abha Chopra, Ian James, Suman Pakala, Eric Alves, Mina John, Benjamin B Lindsey, Alexander J Keeley, Sarah L Rowland-Jones, Maurice S Swanson, David A Ostrov, Jodi L Bubenik, Suman R Das, John Sidney, Alessandro Sette, Thushan I de Silva, Elizabeth Phillips, Simon Mallal.
Abstract
BACKGROUND: Genetic variations across the SARS-CoV-2 genome may influence transmissibility of the virus and the host's anti-viral immune response, in turn affecting the frequency of variants over time. In this study, we examined the adjacent amino acid polymorphisms in the nucleocapsid (R203K/G204R) of SARS-CoV-2 that arose on the background of the spike D614G change and describe how strains harboring these changes became dominant circulating strains globally.
Keywords: COVID-19; SARS-CoV-2; homologous recombination; sub-genomic RNA transcript; transcription-regulating sequence; viral polymorphism
Year: 2021 PMID: 34541432 PMCID: PMC8439434 DOI: 10.20411/pai.v6i2.460
Source DB: PubMed Journal: Pathog Immun ISSN: 2469-2964
Figure 1. Proportion of weekly deposited SARS-CoV-2 sequences globally (n=455774). The D614G (B.1) variant became one of the dominant forms globally between January 2020 and January 2021. Note that a small proportion of deposited sequences did not include a specific collection date and were therefore excluded.
Figure 2. Proportion of weekly deposited SARS-CoV-2 sequences by region. The proportion of R203/G204 to K203/R204 sub-variants of the D614G variant varied by region between January 2020 and January 2021, with recent increases in the frequency of new variants.
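The weekly proportions plotted in Figures 1 and 2 can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `weekly_proportions`, the `(collection_date, variant)` record layout, and the use of ISO weeks are assumptions made here for illustration. The only step taken from the captions is that sequences without a collection date are excluded.

```python
from collections import Counter, defaultdict
from datetime import date

def weekly_proportions(records):
    """Return {(iso_year, iso_week): {variant: proportion}}.

    `records` is an iterable of (collection_date, variant) pairs
    (hypothetical layout). Records whose collection_date is None are
    skipped, mirroring the exclusion noted in the Figure 1 caption.
    """
    counts = defaultdict(Counter)
    for collected, variant in records:
        if collected is None:
            continue  # no collection date: excluded
        iso = collected.isocalendar()  # (ISO year, ISO week, weekday)
        counts[(iso[0], iso[1])][variant] += 1

    proportions = {}
    for week in sorted(counts):
        total = sum(counts[week].values())
        proportions[week] = {v: n / total for v, n in counts[week].items()}
    return proportions

# Toy input only; these are not data from the study.
example = [
    (date(2020, 3, 2), "RG"),
    (date(2020, 3, 3), "KR"),
    (date(2020, 3, 4), "KR"),
    (None, "KR"),
    (date(2020, 3, 9), "KR"),
]
result = weekly_proportions(example)
```

Grouping by region, as in Figure 2, would amount to adding the region to the dictionary key.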
Figure 3. The configuration of canonical sgRNAs and the novel non-canonical nucleocapsid sgRNA (N*) in SARS-CoV-2. The bottom bar illustrates the leader sequence (blue text) followed by the transcription-regulating sequence (TRS; red text) within the genomic sequence that continues into the first ORF, ORF1a. Other canonical sgRNA transcripts, in which the leader sequence and TRS precede the start codon (methionine; pink) of the other proteins, are shown. The novel non-canonical sgRNA transcript containing the K203/R204 polymorphisms (N*) is also shown, together with the ARTIC primer locations and resultant amplicons.
Figure 4. Exploration of sgRNAs in 981 samples from Sheffield, United Kingdom. A. A heatmap showing presence or absence of sgRNAs from different ORFs. K203/R204 (KR)-containing sequences have evidence of the novel truncated N ORF sgRNA (N*, red, 233/553, 42%). An ORF sgRNA was deemed present if >=1 supporting read was found. The heatmap is ordered by the presence or absence of the novel sgRNA. There were a total of 448 R203/G204 (RG)-containing sequences, of which 1 had evidence of a novel sgRNA (likely a false positive, Figure S2). B. Total sgRNA is significantly higher (Mann-Whitney U P < 2.2e-16) in KR-containing than in RG-containing sequences. C. Sub-genomic RNA is significantly increased in KR-containing compared to RG-containing sequences for a number of ORFs, most notably nucleocapsid (N; Mann-Whitney U P = 2.06e-37, corrected for multiple testing using the Holm method). The Y-axis denotes square root transformed sub-genomic reads normalized to 100,000 genomic reads from the same ARTIC amplicon. D. There is no difference in genomic RNA levels (normalized to total mapped reads) between KR- and RG-containing sequences. *The novel sgRNA, ORF10, and ORF1a are excluded from this analysis because ORF10 is not expressed, ORF1a sgRNA is difficult to discriminate from genomic RNA, and the novel truncated N sgRNA is present only in KR-containing sequences. *** P < 0.001, ** P < 0.01, * P < 0.05. All P values shown are following correction for multiple testing with the Holm method.
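Two of the steps named in the Figure 4 caption can be sketched in code: the square root transformed normalization of sub-genomic reads per 100,000 genomic reads, and the Holm correction for multiple testing. The helper names `normalize_sgrna` and `holm_adjust` are hypothetical, the input numbers are toy values, and this is an illustration of the standard procedures rather than the authors' code. The Mann-Whitney U test itself is not implemented here.

```python
import math

def normalize_sgrna(sg_reads, genomic_reads, per=100_000):
    """Sub-genomic reads per `per` genomic reads, square root transformed.

    Both counts are assumed to come from the same amplicon.
    """
    if genomic_reads <= 0:
        raise ValueError("genomic_reads must be positive")
    return math.sqrt(sg_reads * per / genomic_reads)

def holm_adjust(p_values):
    """Holm step-down adjusted P values, returned in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below
        # the adjusted value of any smaller raw P value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Toy values only; these are not results from the study.
level = normalize_sgrna(sg_reads=25, genomic_reads=100_000)
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

With one P value per ORF, `holm_adjust` would be applied once across the full set of per-ORF tests, and each adjusted value compared against the significance thresholds listed in the caption.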
Figure 5. Spike 614 and nucleocapsid 203/204 status, diagnostic metrics, and levels of sub-genomic and genomic RNA. A. E gene cycle threshold (CT) normalized to RNase P CT, stratified by variant status, in N = 478 individuals from the Sheffield dataset with day of symptom onset data available. This normalization was done to combine and display E gene CT data from 2 different extraction protocols. The Y-axis is reversed to aid interpretation, as lower normalized CT values indicate higher virus levels. B. Normalized E gene CT vs the day of sampling relative to day of symptom onset. P values provided are from a generalized multivariable linear regression model (GLM) for the difference in normalized E gene CT value between samples containing each variant, with extraction method and day of illness included in the model (Table S6). C. Normalized (per 1000 genomic reads) sgRNA levels for ORFs S and N. D. Normalized (per 100,000 mapped reads) genomic RNA levels for ORFs S and N.