S A J Wilkinson, Alex Richter, Anna Casey, Husam Osman, Jeremy D Mirza, Joanne Stockton, Josh Quick, Liz Ratcliffe, Natalie Sparks, Nicola Cumley, Radoslaw Poplawski, Samuel N Nicholls, Beatrix Kele, Kathryn Harris, Thomas P Peacock, Nicholas J Loman.
Abstract
Long-term severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections in immunodeficient patients are an important source of variation for the virus but are understudied. Many published case studies describe one or a small number of long-term infected individuals, but no study has combined these sequences into a cohesive dataset. This work aims to rectify this and to study the genomics of this patient group by combining literature searches with new case series identified directly from the COVID-19 Genomics UK (COG-UK) dataset. The spike gene receptor-binding domain (RBD) and N-terminal domain (NTD) were identified as mutation hotspots. Numerous mutations associated with variants of concern were observed to emerge recurrently. Additionally, a mutation in the envelope gene, T30I, was determined to be the second most frequent recurrently occurring mutation arising in persistent infections. A high proportion of recurrent mutations in immunodeficient individuals are associated with ACE2 affinity, immune escape, or viral packaging optimisation. There is an apparent selective pressure for mutations that aid cell-cell transmission within the host or persistence, and these often differ from mutations that aid inter-host transmission. However, the fact that multiple recurrent de novo mutations (DNMs) are considered defining for variants of concern strongly indicates that this potential source of novel variants should not be discounted.
Keywords: SARS-CoV-2; convergent evolution; genomics; immunodeficiency; persistent infection; variant emergence
Year: 2022 PMID: 35996593 PMCID: PMC9384748 DOI: 10.1093/ve/veac050
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 2. Cumulative occurrences of non-synonymous recurrent de novo mutations (DNMs) in the S gene, divided by gene domain, in 168 genomes obtained from 28 patients. Substitution mutations were clustered by amino acid locus; this is denoted with the International Union of Pure and Applied Chemistry (IUPAC) ambiguity code X, which indicates any possible amino acid. Lines for cumulative sites are dashed for easier differentiation. Only loci that were notable when clustered (a significant difference from the non-clustered equivalent, or loci not highlighted without clustering) were included in the figure. Mutations were observed in the following domains: NTD, receptor-binding domain (RBD), and the SP (Xia 2021). Deletions (Δ) were clustered within a window of six amino acids (AA) regardless of the length or position of the deletion; a full breakdown can be found at https://github.com/BioWilko/recurrent-sars-cov-2-mutations/blob/main/dataset/mutation_calls.csv. The first genome from each patient was considered to be day 0. The sampling periods and frequencies within the dataset were highly variable: 218 days was the longest period covered, but the majority were much shorter. Full details of the dataset are available in Supplementary Table S1. All recurrent de novo mutations were labelled on the graph.
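The clustering described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function names, the regular expression, and the greedy windowing rule (a deletion joins a cluster when its start lies within six residues of the cluster's first position) are assumptions made for illustration, and the positions in the usage note are made-up inputs.

```python
import re
from collections import OrderedDict

# Matches annotations such as "S:E484K" (gene, reference AA, position, new AA).
SUB_RE = re.compile(r"^(?P<gene>[^:]+):(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def cluster_substitution(annotation: str) -> str:
    """Collapse a substitution to its locus, using X for any amino acid."""
    m = SUB_RE.match(annotation)
    if m is None:
        raise ValueError(f"not a substitution annotation: {annotation!r}")
    return f"{m['gene']}:{m['ref']}{m['pos']}X"


def cluster_deletions(starts, window=6):
    """Greedily group deletion start positions (assumed rule, see lead-in).

    Returns an ordered mapping from the first position of each cluster to
    the sorted positions assigned to it.
    """
    clusters = OrderedDict()
    anchor = None
    for pos in sorted(starts):
        # Open a new cluster when the position falls outside the window.
        if anchor is None or pos - anchor > window:
            anchor = pos
            clusters[anchor] = []
        clusters[anchor].append(pos)
    return clusters
```

Under these assumptions, `cluster_substitution("S:E484K")` and `cluster_substitution("S:E484G")` both map to `S:E484X`, and `cluster_deletions([69, 67, 141, 138, 144, 243])` yields three clusters anchored at 67, 138, and 243.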
Figure 3. Cumulative occurrences of non-synonymous recurrent DNMs in genes other than S or ORF1ab, subdivided by gene, in 168 genomes obtained from 28 patients. Recurrent DNMs were observed in the E (encodes envelope protein) and M (encodes membrane glycoprotein) genes; full details of the gene definitions used are available in Wu et al. (2020). The first genome from each patient was considered to be day 0. The sampling periods and frequencies within the dataset were highly variable: 218 days was the longest period covered, but the majority were much shorter. Full details of the dataset are available in Supplementary Table S1. All recurrent DNMs were labelled on the graph.
Figure 4. Cumulative occurrences of non-synonymous recurrent DNMs in the ORF1ab polyprotein, subdivided by gene, in 168 genomes obtained from 28 patients. The first genome from each patient was considered to be day 0. The sampling periods and frequencies within the dataset were highly variable: 218 days was the longest period covered, but the majority were much shorter. Full details of the dataset are available in Supplementary Table S1. All recurrent DNMs were labelled on the graph.
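The cumulative curves in Figures 2 to 4 count DNM occurrences against days since each patient's first genome. The sketch below shows one way such curves could be derived. It is illustrative only: the input layout, the function names, and the working definition of a DNM (a mutation absent from the day 0 genome that first appears in a later genome from the same patient) are assumptions, not the authors' pipeline.

```python
from collections import defaultdict
from datetime import date


def dnm_events(series):
    """Return sorted (day, mutation, patient) emergence events.

    `series` maps a patient ID to a list of (sample_date, set_of_mutations).
    The earliest genome per patient is treated as day 0 and as the baseline.
    """
    events = []
    for patient, genomes in series.items():
        ordered = sorted(genomes, key=lambda g: g[0])
        day0, baseline = ordered[0]
        seen = set(baseline)
        for sampled, muts in ordered[1:]:
            # Record each mutation only the first time it appears.
            for mut in sorted(muts - seen):
                events.append(((sampled - day0).days, mut, patient))
            seen |= muts
    return sorted(events)


def cumulative_counts(events):
    """Turn events into per-mutation lists of (day, cumulative occurrences)."""
    totals = defaultdict(int)
    curves = defaultdict(list)
    for day, mut, _patient in events:
        totals[mut] += 1
        curves[mut].append((day, totals[mut]))
    return dict(curves)


# Toy input with two hypothetical patients; dates and mutations are invented.
toy = {
    "p1": [(date(2021, 1, 1), {"S:N501Y"}),
           (date(2021, 2, 10), {"S:N501Y", "S:E484K"})],
    "p2": [(date(2021, 3, 1), set()),
           (date(2021, 3, 21), {"S:E484K", "E:T30I"})],
}
curves = cumulative_counts(dnm_events(toy))
```

With this toy input, `S:E484K` emerges at day 20 in one patient and day 40 in the other, so its curve steps from 1 to 2, while `S:N501Y` is excluded because it was already present at day 0.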
DNM occurrence frequencies for all recurrent DNMs in this analysis and in the COG-UK dataset (n = 1,576,942). COG-UK figures were generated using the dataset as it existed on 7 December 2021. Data were generated via CLIMB-COVID (Nicholls et al. 2021). The COG-UK dataset was used as the background dataset because of the quality of its metadata and the programmatic access to variant information through existing CLIMB-COVID tools.
| DNM annotation | Frequency in DNM occurrence analysis | Frequency in COG-UK dataset | Percentage of genome series in which DNM occurred | Percentage of genomes in COG-UK with DNM |
|---|---|---|---|---|
| S:E484K | 8 | 3,437 | 28.57% | 0.2180% |
| E:T30I | 6 | 208 | 21.43% | 0.0132% |
| M:H125Y | 4 | 2,188 | 14.29% | 0.1387% |
| S:Δ138 region | 4 | 283,289 | 14.29% | 17.9645% |
| NSP4:T295I | 3 | 1,933 | 10.71% | 0.1226% |
| S:Q493K | 3 | 59 | 10.71% | 0.0037% |
| S:Δ67 region | 2 | 292,969 | 7.14% | 18.5783% |
| S:S13I | 2 | 211 | 7.14% | 0.0134% |
| NSP12:V792I | 2 | 10 | 7.14% | 0.0006% |
| NSP3:P822L | 2 | 28,410 | 7.14% | 1.8016% |
| NSP3:T820I | 2 | 442 | 7.14% | 0.0280% |
| NSP3:T504P | 2 | 18 | 7.14% | 0.0011% |
| S:L452R | 2 | 1,010,866 | 7.14% | 64.1029% |
| S:Q498R | 2 | 225 | 7.14% | 0.0143% |
| S:E484G | 2 | 46 | 7.14% | 0.0029% |
| S:Δ243 region | 2 | 546 | 7.14% | 0.0346% |
| S:F486I | 2 | 6 | 7.14% | 0.0004% |
| S:G142V | 2 | 1,361 | 7.14% | 0.0863% |
| S:T95I | 2 | 682,286 | 7.14% | 43.2664% |
| NSP3:K977Q | 2 | 391 | 7.14% | 0.0248% |
| S:F490L | 2 | 463 | 7.14% | 0.0294% |
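
The two percentage columns follow directly from the counts: the first divides by the 28 genome series in this analysis and the second by the 1,576,942 genomes in the COG-UK background dataset. A minimal sketch of that arithmetic, with the helper name `pct` chosen here for illustration:

```python
N_SERIES = 28          # genome series (patients) in this analysis
N_COGUK = 1_576_942    # genomes in the COG-UK dataset on 7 December 2021


def pct(count: int, total: int, digits: int) -> str:
    """Format count/total as a percentage with a fixed number of decimals."""
    return f"{100 * count / total:.{digits}f}%"


# S:E484K: 8 of 28 series, 3,437 of 1,576,942 COG-UK genomes.
series_pct = pct(8, N_SERIES, 2)
coguk_pct = pct(3437, N_COGUK, 4)
```

For S:E484K this reproduces the tabulated 28.57% and 0.2180%, which makes the contrast explicit: the mutation arose in more than a quarter of the persistent infection series but is present in well under 1% of background genomes.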
Recurrent mutations that are variant defining according to United Kingdom Health Security Agency (UKHSA) variant definitions. Variant definitions were parsed from the UKHSA variant definition files available at https://github.com/phe-genomics/variant_definitions. Lineages were called using pangolin (O’Toole et al. 2021b).
| Mutation annotation | Pango lineage | UKHSA label | WHO label |
|---|---|---|---|
| NSP3:K977Q | P.1 | VOC-21JAN-02 | Gamma |
| NSP3:P822L | AV.1 | VUI-21MAY-01 | n/a |
| S:E484K | B.1.351 | VOC-20DEC-02 | Beta |
| S:E484K | B.1.525 | VUI-21FEB-03 | Eta |
| S:E484K | P.1 | VOC-21JAN-02 | Gamma |
| S:E484K | A.23.1 | VUI-21FEB-01 | n/a |
| S:E484K | AV.1 | VUI-21MAY-01 | n/a |
| S:E484K | B.1.1.318 | VUI-21FEB-04 | n/a |
| S:E484K | B.1.1.7 (with E484K) | VOC-21FEB-02 | n/a |
| S:E484K | B.1.324.1 | VUI-21MAR-01 | n/a |
| S:E484K | P.3 | VUI-21MAR-02 | Theta |
| S:E484K | P.2 | VUI-21JAN-01 | Zeta |
| S:E484K | B.1.621 | VUI-21JUL-01 | n/a |
| S:L452R | B.1.617.2 | VOC-21APR-02 | Delta |
| S:L452R | B.1.617.1 | VUI-21APR-01 | Kappa |
| S:L452R | B.1.617.3 | VUI-21APR-03 | n/a |
| S:L452R | C.36.3 | VUI-21MAY-02 | n/a |
| S:Q498R | BA.1 | VOC-21NOV-01 | Omicron |
| S:T95I | AV.1 | VUI-21MAY-01 | n/a |
| S:T95I | B.1.1.318 | VUI-21FEB-04 | n/a |
| S:T95I | B.1.621 | VUI-21JUL-01 | n/a |
| S:Δ67 region/RDR1 | B.1.1.7 | VOC-20DEC-01 | Alpha |
| S:Δ138 region/RDR2 | B.1.1.7 | VOC-20DEC-01 | Alpha |
| S:Δ138 region/RDR2 | AV.1 | VUI-21MAY-01 | n/a |
| S:Δ138 region/RDR2 | B.1.1.318 | VUI-21FEB-04 | n/a |
| S:Δ243 region/RDR4 | C.37 | VUI-21JUN-01 | Lambda |
Figure 1. Distribution of de novo mutations (DNMs) included in this study across the entire SARS-CoV-2 genome. Schematic of the SARS-CoV-2 genome with relevant ORFs annotated. The DNMs with the highest frequency are annotated by amino acid position and substitution; X indicates that multiple amino acids form DNMs at this position.
Figure 5. Spike mutational profiles of particular interest described by this study. Selected spikes from late sequencing of three long-term Alpha infections are shown as spike schematics. Spike variants from WT Alpha, Delta, and BA.1 Omicron are shown for comparison. Mutations shown in grey are existing lineage-defining Alpha mutations. Mutations marked with an asterisk indicate mixed but resolvable bases in the sequence.