Priya Ramarao-Milne1, Yatish Jain1,2, Letitia M F Sng1, Brendan Hosking1, Carol Lee1, Arash Bayat3, Michael Kuiper4, Laurence O W Wilson1,2, Natalie A Twine1,2, Denis C Bauer1,2,5.
Abstract
New SARS-CoV-2 variants emerge as part of the virus' adaptation to the human host. Health organizations monitor newly emerging variants with suspected impact on disease or vaccination efficacy as Variants Being Monitored (VBM), such as Delta and Omicron. Genetic changes (SNVs) relative to the Wuhan variant characterize VBMs, with current emphasis on the spike protein and lineage markers. However, monitoring VBMs in this way might miss SNVs with a functional effect on disease. Here we introduce a lineage-agnostic, genome-wide approach to identify SNVs associated with disease. We curated a case-control dataset of 10,520 samples and identified 117 SNVs significantly associated with adverse patient outcome. While 40% (47) of these SNVs are already monitored and 36% (43) are in the spike protein, we also identified 70 new SNVs that are associated with disease outcome. Of these, 31 are disease-worsening and predominantly located in the 3'-5' exonuclease (NSP14), with structural modelling revealing a concise cluster in the Zn binding domain, which has a known host-immune modulating function. Furthermore, we generate clade-independent VBM groupings by identifying interacting SNVs (epistasis). We find 37 sets of higher-order epistatic interactions joining 5 genomic regions (nsp3, nsp14, Spike S1, ORF3a, N). Structural modelling of these regions provides insights into potential mechanistic pathways of increased virulence as well as orthogonal methods of validation. Clade-independent monitoring of functionally interacting (epistasis, co-evolution) SNVs detected emerging VBMs a week before they were flagged by health organizations and, in conjunction with structural modelling, provides faster, mechanistic insight into emerging strains to guide public health interventions.
Keywords: AlphaFold; COVID-19; Case-control; GWAS; Nsp14; SARS-CoV-2
Year: 2022 PMID: 35677774 PMCID: PMC9162986 DOI: 10.1016/j.csbj.2022.06.005
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Summary of sample inclusion into the case-control dataset.
| Patient status annotation | Action | Samples |
|---|---|---|
| Patient status annotated as ‘Unknown’ | Removed | 3,312,914 |
| Ambiguous annotations that cannot be associated with better or worse disease outcome, including ‘Live’, ‘Hospitalized’, ‘Outpatient’, ‘Symptomatic’, ‘Released’, ‘Ambulatory’, ‘Inpatient’, ‘other’ | Removed | 120,429 |
| Unannotated (missing patient status) | Removed | 27,939 |
| ‘Deceased’, ‘Severe’, ‘Critical’, ‘Dead’, ‘Post-mortem’, ‘Death’ and ‘ICU’ | Kept – Cases | 3,639 |
| ‘Asymptomatic’, ‘Mild’, ‘Mild clinical signs without hospitalisation’ and ‘Recovered’ | Kept – Controls | 7,157 |
| Total Samples Removed | | |
| Total Samples Kept | | |
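The inclusion rules in the table above amount to a lookup from the patient-status annotation to a case, control, or removed label. The following is a minimal sketch of that mapping, not the authors' curation code: the status strings are copied from the table, while the function name `classify_status` and the case-insensitive, whitespace-trimmed matching are assumptions made for illustration.

```python
# Status strings are taken from the sample-inclusion table.
# ASSUMPTION: matching is case-insensitive and ignores surrounding whitespace.
CASE_STATUSES = {
    "deceased", "severe", "critical", "dead", "post-mortem", "death", "icu",
}
CONTROL_STATUSES = {
    "asymptomatic", "mild",
    "mild clinical signs without hospitalisation", "recovered",
}


def classify_status(status):
    """Return 'case', 'control', or None when the sample would be removed.

    Hypothetical helper: unknown, ambiguous, and missing annotations all
    fall through to None, mirroring the 'Removed' rows of the table.
    """
    if status is None:
        return None
    key = status.strip().lower()
    if key in CASE_STATUSES:
        return "case"
    if key in CONTROL_STATUSES:
        return "control"
    return None


if __name__ == "__main__":
    for s in ["ICU", "Mild", "Hospitalized", "Unknown", None]:
        print(repr(s), "->", classify_status(s))
```

Treating every unlisted annotation as removed keeps the rule conservative: a status such as ‘Hospitalized’ is never guessed into either group.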
Fig. 1. Results from association analysis. A) Manhattan plot of VariantSpark Gini importance scores for the 10,520 case/control samples. 100 bp are removed from each end of the genome. Mutations associated with current and previous variants being monitored (VBM) are labelled and coloured, while mutations not currently associated with a VBM are grey and unlabelled. The red dot with a yellow border represents a hit from a previous GWAS (Hahn et al., 2021). B) SARS-CoV-2 genome and the regions corresponding to each protein. Protein regions coloured in dark red contain significant clusters of mutations (from Fig. 1A). Protein regions highlighted in blue are involved in putative highly associative 4-SNV interactions. The inset shows the AlphaFold prediction and the location of the amino acid residues corresponding to the nsp14 mutation cluster identified by VariantSpark. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
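The Gini importance plotted in Fig. 1A comes from a random-forest model, where a variant gains importance each time splitting on it reduces the Gini impurity of the case/control labels. The sketch below computes that reduction for a single split on one SNV using the textbook definition; it is not VariantSpark's implementation, and the function names and toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(has_snv, labels):
    """Impurity decrease from splitting samples on presence of one SNV.

    has_snv: iterable of truthy/falsy flags, one per sample (toy input).
    labels:  case/control label per sample, e.g. 1 = case, 0 = control.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    carriers = [y for g, y in zip(has_snv, labels) if g]
    others = [y for g, y in zip(has_snv, labels) if not g]
    weighted = (len(carriers) / n) * gini_impurity(carriers) \
        + (len(others) / n) * gini_impurity(others)
    return gini_impurity(labels) - weighted


if __name__ == "__main__":
    # Hypothetical toy data: 8 samples, 3 of which carry the SNV.
    snv = [1, 1, 1, 0, 0, 0, 0, 0]
    outcome = [1, 1, 0, 0, 0, 0, 0, 1]
    print(round(gini_decrease(snv, outcome), 4))
```

In a forest, such decreases would be accumulated over every node and tree that splits on the variant; a single split is shown here only to make the quantity concrete.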
VariantSpark predicted 31 novel variants associated with worse disease outcome.
| Locus | REF | ALT | p-value | Gene | Consequence | Product |
|---|---|---|---|---|---|---|
| 19,276 | G | N | 4.75E-07 | NSP14 | 'G413S', 'G413R', 'G413C' | 3′-to-5′ exonuclease |
| 19,277 | G | N | 9.08E+00 | NSP14 | 'G413D', 'G413A', 'G413V' | 3′-to-5′ exonuclease |
| 19,278 | T | N | 9.14E+00 | NSP14 | 'G413G', 'G413G', 'G413G' | 3′-to-5′ exonuclease |
| 19,279 | T | N | 9.03E+00 | NSP14 | 'C414S', 'C414R', 'C414G' | 3′-to-5′ exonuclease |
| 19,280 | G | N | 2.34E-06 | NSP14 | 'C414Y', 'C414S', 'C414F' | 3′-to-5′ exonuclease |
| 19,281 | T | N | 2.10E-06 | NSP14 | 'C414*', 'C414C', 'C414W' | 3′-to-5′ exonuclease |
| 19,282 | G | N | 1.54E-06 | NSP14 | 'D415N', 'D415H', 'D415Y' | 3′-to-5′ exonuclease |
| 19,283 | A | N | 4.36E-07 | NSP14 | 'D415A', 'D415G', 'D415V' | 3′-to-5′ exonuclease |
| 19,284 | T | N | 5.29E-07 | NSP14 | 'D415E', 'D415D', 'D415E' | 3′-to-5′ exonuclease |
| 19,285 | G | N | 2.98E-07 | NSP14 | 'G416S', 'G416R', 'G416C' | 3′-to-5′ exonuclease |
| 19,286 | G | N | 1.34E-06 | NSP14 | 'G416D', 'G416A', 'G416V' | 3′-to-5′ exonuclease |
| 19,287 | T | N | 5.62E-06 | NSP14 | 'G416G', 'G416G', 'G416G' | 3′-to-5′ exonuclease |
| 19,288 | G | N | 9.62E-06 | NSP14 | 'G417S', 'G417R', 'G417C' | 3′-to-5′ exonuclease |
| 20,800 | A | N | 6.14E-05 | NSP16 | 'T48P', 'T48A', 'T48S' | 2′-O-ribose methyltransferase |
| 20,801 | C | N | 5.22E-05 | NSP16 | 'T48N', 'T48S', 'T48I' | 2′-O-ribose methyltransferase |
| 20,802 | T | N | 6.71E-05 | NSP16 | 'T48T', 'T48T', 'T48T' | 2′-O-ribose methyltransferase |
| 20,803 | C | N | 6.68E-05 | NSP16 | 'Q49K', 'Q49E', 'Q49*' | 2′-O-ribose methyltransferase |
| 20,804 | A | N | 6.85E-05 | NSP16 | 'Q49P', 'Q49R', 'Q49L' | 2′-O-ribose methyltransferase |
| 20,805 | A | N | 6.46E-05 | NSP16 | 'Q49H', 'Q49Q', 'Q49H' | 2′-O-ribose methyltransferase |
| 20,809 | T | N | 2.21E-05 | NSP16 | 'C51S', 'C51R', 'C51G' | 2′-O-ribose methyltransferase |
| 20,810 | G | N | 2.16E-05 | NSP16 | 'C51Y', 'C51S', 'C51F' | 2′-O-ribose methyltransferase |
| 20,811 | T | N | 2.25E-05 | NSP16 | 'C51*', 'C51C', 'C51W' | 2′-O-ribose methyltransferase |
| 20,812 | C | N | 2.31E-05 | NSP16 | 'Q52K', 'Q52E', 'Q52*' | 2′-O-ribose methyltransferase |
| 20,813 | A | N | 2.18E-05 | NSP16 | 'Q52P', 'Q52R', 'Q52L' | 2′-O-ribose methyltransferase |
| 26,492 | A | T | 5.93E-05 | Between E and M region | | |
| 27,512 | A | N | 1.03E-04 | ORF7a | 'Y40S', 'Y40C', 'Y40F' | Accessory protein |
| 27,513 | C | N | 1.02E-04 | ORF7a | 'Y40*', 'Y40*', 'Y40Y' | Accessory protein |
| 27,514 | G | N | 1.06E-04 | ORF7a | 'E41K', 'E41Q', 'E41*' | Accessory protein |
| 27,516 | G | N | 7.21E-05 | ORF7a | 'E41E', 'E41D', 'E41D' | Accessory protein |
| 28,272 | A | T | 4.13E-06 | Between ORF8 and N region | | |
| 29,782 | A | * | 8.62E-05 | N/A | | |
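Where ALT is given as N, the Consequence column lists the amino acid change for each of the three possible alternative bases at that locus. The sketch below reproduces such entries from a reference codon under the standard genetic code. It is an illustration rather than the authors' annotation pipeline: the function name `consequences` is hypothetical, the codons GGT, TGT and GAT are read off the REF column for loci 19,276 to 19,284, and listing alternatives in A, C, G, T order is an assumption that happens to match the rows checked.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO
    )
}


def consequences(ref_codon, pos, residue):
    """List amino acid changes for every alternative base at one codon position.

    ref_codon: reference codon, e.g. 'GGT' (REF bases at loci 19,276-19,278).
    pos:       0-based position within the codon that is substituted.
    residue:   residue number used in the label, e.g. 413.
    ASSUMPTION: alternatives are enumerated in A, C, G, T order.
    """
    ref_aa = CODON_TABLE[ref_codon]
    out = []
    for alt in "ACGT":
        if alt == ref_codon[pos]:
            continue  # skip the reference base itself
        alt_codon = ref_codon[:pos] + alt + ref_codon[pos + 1:]
        out.append(f"{ref_aa}{residue}{CODON_TABLE[alt_codon]}")
    return out


if __name__ == "__main__":
    print(consequences("GGT", 0, 413))  # row for locus 19,276
    print(consequences("TGT", 2, 414))  # row for locus 19,281
```

Third-position substitutions often leave the amino acid unchanged, which is why rows such as locus 19,278 list only synonymous changes ('G413G').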
Fig. 2. Network of 4-SNV combinations showing highly associative interactions. Coloured nodes indicate SNVs found in VBM. Node size is proportional to the frequency with which that SNV is involved in highly associated 4-SNV interactions.
Fig. 3. Structural analysis of protein models. AlphaFold models (verified with crystallographic data where possible). A) 3′-5′ exonuclease (nsp14) (cyan and blue) complexed with nsp10 (yellow), showing the position of the 413–417 cluster in the N7-MTase domain relative to Zn binding residues and other ion binding sites. B) Close-up of the 413–417 cluster in nsp14 showing its proximity to the Zn binding domain. C) Structure of the nsp10/nsp16 complex (from PDB: 6W4H and AlphaFold models) showing the nsp16 mutational cluster (T48, Q49, C51, Q52) and its proximity to the nsp10 binding interface, in particular to residue Leu45 from nsp10. D) Predicted AlphaFold model of the ORF7a accessory protein showing putative mutation sites Y40 and E41. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)