Lue Ping Zhao, Pavitra Roychoudhury, Peter Gilbert, Joshua Schiffer, Terry P Lybrand, Thomas H Payne, April Randhawa, Sara Thiebaud, Margaret Mills, Alex Greninger, Chul-Woo Pyo, Ruihan Wang, Renyu Li, Alexander Thomas, Brandon Norris, Wyatt C Nelson, Keith R Jerome, Daniel E Geraghty.
Abstract
SARS-CoV-2 is spreading worldwide with continuously evolving variants, some of which occur in the Spike protein and appear to increase viral transmissibility. However, variants that cause severe COVID-19 or lead to other breakthroughs have not been well characterized. To discover such viral variants, we assembled a cohort of 683 COVID-19 patients; 388 inpatients ("cases") and 295 outpatients ("controls") from April to August 2020 using electronically captured COVID test request forms and sequenced their viral genomes. To improve the analytical power, we accessed 7137 viral sequences in Washington State to filter out viral single nucleotide variants (SNVs) that did not have significant expansions over the collection period. Applying this filter led to the identification of 53 SNVs that were statistically significant, of which 13 SNVs each had 3 or more variant copies in the discovery cohort. Correlating these selected SNVs with case/control status, eight SNVs were found to significantly associate with inpatient status (q-values < 0.01). Using temporal synchrony, we identified a four-SNV haplotype (t19839-g28881-g28882-g28883) that was significantly associated with case/control status (Fisher's exact p = 2.84 × 10⁻¹¹). This haplotype appeared in April 2020, peaked in June, and persisted into January 2021. The association was replicated (OR = 5.46, p-value = 4.71 × 10⁻¹²) in an independent cohort of 964 COVID-19 patients (June 1, 2020 to March 31, 2021). The haplotype included a synonymous change N73N in endoRNase, and three non-synonymous changes coding residues R203K, R203S and G204R in the nucleocapsid protein. This discovery points to the potential functional role of the nucleocapsid protein in triggering "cytokine storms" and severe COVID-19 that led to hospitalization. The study further emphasizes a need for tracking and analyzing viral sequences in correlations with clinical status.
Year: 2022 PMID: 35075180 PMCID: PMC8786941 DOI: 10.1038/s41598-021-04376-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Results from analyzing 7137 viral genomes sequenced by laboratories in Washington State and deposited to GISAID. (A) Results from counting mutational numbers per nucleotide throughout the viral genome. Upper arrow indicates observed counts greater than 300. The viral genome is annotated with gene designations immediately below. (B) Computed q-values and maximum values of variant proportions in November 2020, December 2020, and January 2021, obtained from fitting generalized linear models to all individual SNVs. SNVs exceeding established threshold q-value and maximum proportions are highlighted in red (upper right corner). (C) Eight selected SNVs with significant and substantial temporalities are mapped using their locally averaged variant proportions over time from fitted generalized linear models (color key upper left).
Descriptive statistics of 683 and 964 participating patients in, respectively, the discovery and replication case–control studies.
| Variable | Description | Discovery OUT (n = 295) | Discovery IN (n = 388) | p-value | Replication OUT (n = 476) | Replication IN (n = 488) | p-value |
|---|---|---|---|---|---|---|---|
| Sex | Female | 170 (57.63) | 184 (47.42) | 1.69E−02 | 216 (45.38) | 231 (47.34) | 2.39E−26 |
| | Male | 124 (42.03) | 200 (51.55) | | 177 (37.18) | 256 (52.46) | |
| | UNK | 1 (0.34) | 4 (1.03) | | 83 (17.44) | 1 (0.20) | |
| Age (year) | 1- | 9 (3.06) | 29 (7.47) | 9.53E−04 | 112 (23.53) | 73 (14.96) | 1.15E−06 |
| | 20- | 112 (38.1) | 132 (34.02) | | 213 (44.75) | 180 (36.89) | |
| | 40- | 139 (47.28) | 151 (38.92) | | 105 (22.06) | 148 (30.33) | |
| | 60–100 | 34 (11.56) | 76 (19.59) | | 45 (9.45) | 86 (17.62) | |
| | UNK | 1 (0.34) | | | 1 (0.21) | 1 (0.20) | |
| Collection month | March | 167 (56.61) | 178 (45.88) | 5.00E−04 | | | |
| | April | 113 (38.31) | 72 (18.56) | | | | |
| | May | 12 (4.07) | 24 (6.19) | | | | |
| | June | 3 (1.02) | 14 (3.61) | | 179 (37.61) | 111 (22.75) | 5.00E−06 |
| | July | | 96 (24.74) | | 24 (5.04) | 29 (5.94) | |
| | August | | 4 (1.03) | | 104 (21.85) | 137 (28.07) | |
| | September | | | | 21 (4.41) | 16 (3.28) | |
| | October | | | | 27 (5.67) | 14 (2.87) | |
| | November | | | | 57 (11.97) | 6 (1.23) | |
| | December | | | | 4 (0.84) | 10 (2.05) | |
| | January | | | | 12 (2.52) | 109 (22.34) | |
| | February | | | | 48 (10.08) | 56 (11.48) | |
Association results of selected SNVs (with at least 10 mutations) with hospitalization status (inpatient vs outpatient) in a case–control study of 683 COVID-19 cases: frequencies of wildtypes/mutations among outpatients and inpatients, estimated coefficients, standard errors, p-values, q-values, corresponding residue (if SNV is in the coding region), indicators for urgent SNVs, and corresponding genes.
| ID | SNV | Wild/mutant (OUT) | Wild/mutant (IN) | Unadjusted Coef | Unadjusted SE | Unadjusted Z | Unadjusted p | Unadjusted q | Adjusted* Coef | Adjusted* SE | Adjusted* Z | Adjusted* p | Adjusted* q | Residue | Genes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | c241 | 88/207 | 101/287 | 0.19 | 0.17 | 1.10 | 2.72E−01 | 3.93E−01 | 0.22 | 0.17 | 1.28 | 1.99E−01 | 2.93E−01 | | |
| 2 | c1059 | 102/193 | 207/181 | −0.77 | 0.16 | −4.85 | 1.24E−06 | | −0.79 | 0.16 | −4.94 | 7.98E−07 | | T85I | nsp2 |
| 3 | c3037 | 86/209 | 101/287 | 0.16 | 0.17 | 0.91 | 3.65E−01 | 4.74E−01 | 0.20 | 0.18 | 1.12 | 2.62E−01 | 3.41E−01 | F106F | nsp2 |
| 4 | c14408 | 85/210 | 101/287 | 0.14 | 0.17 | 0.81 | 4.19E−01 | 4.95E−01 | 0.18 | 0.18 | 1.00 | 3.19E−01 | 3.77E−01 | | |
| 5 | t19839 | 295/0 | 345/43 | >> 0 | | | | | >> 0 | | | | | N73N | endoRNase |
| 6 | a20268 | 294/1 | 343/45 | 3.65 | 1.01 | 3.60 | 3.16E−04 | | 3.76 | 1.02 | 3.70 | 2.15E−04 | | L216L | endoRNase |
| 7 | a23403 | 88/207 | 101/287 | 0.19 | 0.17 | 1.10 | 2.72E−01 | 3.93E−01 | 0.22 | 0.17 | 1.27 | 2.03E−01 | 2.93E−01 | D614G | S-spike-protein |
| 8 | g25563 | 98/197 | 204/184 | −0.80 | 0.16 | −5.01 | 5.52E−07 | | −0.83 | 0.16 | −5.11 | 3.24E−07 | | Q57H | ORF3a |
| 9 | c27964 | 288/7 | 370/18 | 0.69 | 0.45 | 1.53 | 1.25E−01 | 2.32E−01 | 0.70 | 0.45 | 1.55 | 1.22E−01 | 2.26E−01 | S24L | ORF8 |
| 10 | c28854 | 295/0 | 346/42 | >> 0 | | | | | >> 0 | | | | | S194L | Nucleocapsid |
| 11 | g28881 | 287/8 | 334/54 | 1.76 | 0.39 | 4.54 | 5.66E−06 | | 1.88 | 0.39 | 4.80 | 1.60E−06 | | R203K | Nucleocapsid |
| 12 | g28882 | 287/8 | 334/54 | 1.76 | 0.39 | 4.54 | 5.66E−06 | | 1.88 | 0.39 | 4.80 | 1.60E−06 | | R203S** | Nucleocapsid |
| 13 | g28883 | 287/8 | 334/54 | 1.76 | 0.39 | 4.54 | 5.66E−06 | | 1.88 | 0.39 | 4.80 | 1.60E−06 | | G204R | Nucleocapsid |
*The analysis adjusted for age and sex.
**R203S or R203R.
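The Z statistics and p-values in the table above are related by the standard two-sided normal approximation for a Wald statistic. The Python sketch below recomputes p from the rounded Z values printed in the unadjusted columns. The function name `two_sided_p` is illustrative and does not come from the paper, and small differences from the printed p-values are expected because Z is rounded to two decimals.

```python
from math import erfc, sqrt


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a Z statistic under the standard normal."""
    # For a standard normal variable, P(|N| >= |z|) = erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2.0))


# Rounded Z values copied from the unadjusted columns of the table above.
z_scores = {"c241": 1.10, "c1059": -4.85, "a20268": 3.60, "g28881": 4.54}

for snv, z in z_scores.items():
    print(f"{snv}: Z = {z:+.2f}, two-sided p = {two_sided_p(z):.2e}")
```

Running this gives values close to the printed p column (for example, roughly 0.27 for c241), which is also how one can tell that the single value shown for the strongly associated SNVs is a p-value rather than a q-value.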
Association results of three SNV-haplotypes with hospitalization status (inpatient and outpatient) in a case–control study of 683 COVID-19 patients: frequencies of haplotypes among outpatients and inpatients, estimated coefficients, and Fisher's exact p-values, respectively, across three haplotypes.
| Hap | Discovery OUT | Discovery IN | Fisher's exact p | Replication OUT | Replication IN | Fisher's exact p | WA (%) |
|---|---|---|---|---|---|---|---|
| caac | | 43 (11.08) | 2.84E−11 | 35 (7.35) | 110 (22.54) | 2.21E−10 | 16.83 |
| taac | 8 (2.71) | 11 (2.84) | | 45 (9.45) | 41 (8.4) | | 5.93 |
| tagg | | | | 2 (0.42) | 1 (0.2) | | 0.06 |
| tggg | 287 (97.29) | 334 (86.08) | | 394 (82.77) | 336 (68.85) | | 75.84 |
| ac | 294 (99.66) | 341 (87.89) | 4.56E−11 | 361 (75.84) | 399 (81.76) | 5.28E−02 | 87.95 |
| at | | 2 (0.52) | | 7 (1.47) | 2 (0.41) | | 0.60 |
| gc | 1 (0.34) | 4 (1.03) | | 2 (0.42) | 1 (0.2) | | 0.55 |
| gt | | 40 (10.31) | | 106 (22.27) | 86 (17.62) | | 8.55 |
| nc* | | 1 (0.26) | | | | | 1.33 |
| cg | 94 (31.86) | 199 (51.29) | 2.40E−06 | 195 (40.97) | 239 (48.98) | 7.19E−02 | 54.95 |
| ct | 8 (2.71) | 8 (2.06) | | 15 (3.15) | 15 (3.07) | | 2.70 |
| tg | 4 (1.36) | 3 (0.77) | | 2 (0.42) | 3 (0.61) | | 0.04 |
| tt | 189 (64.07) | 176 (45.36) | | 264 (55.46) | 231 (47.34) | | 41.04 |
| yg* | | 2 (0.52) | | | | | 0.00 |
Also included are haplotype frequencies in the general population of Washington State (far right column).
*n: untyped nucleotide; y: ambiguous typing of either c or t.
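As a rough check on the first haplotype block, the sketch below runs a two-sided Fisher's exact test on a 2×2 table that collapses the discovery counts to caac versus all other haplotypes (43 of 388 inpatients, 0 of 295 outpatients). Collapsing the table this way is an assumption made here for illustration; the reported p-value may have been computed on the full haplotype-by-status table, so the result need not equal 2.84E−11. The helper `fisher_exact_two_sided` is a hypothetical name and uses only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: inpatients, outpatients. Columns: caac, any other haplotype.
# Counts are taken from the discovery columns of the table above.
p_caac = fisher_exact_two_sided(43, 388 - 43, 0, 295)
print(f"caac vs other haplotypes, discovery cohort: p = {p_caac:.2e}")
```

Exact integer binomials keep the computation stable even though the individual coefficients are very large; only the final ratio is converted to floating point.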
Replication results for the association of the haplotype (t19839-g28881-g28882-g28883) with hospitalization status (inpatient and outpatient) in a case–control study of 476 outpatients and 488 inpatients: estimated coefficients, standard errors, Z-scores and p-values across three haplotypes, from the marginal and adjusted analyses.
| Hap | Unadjusted Coef | Unadjusted OR | Unadjusted SE | Unadjusted Z | Unadjusted p | Adjusted Coef | Adjusted OR | Adjusted SE | Adjusted Z | Adjusted p |
|---|---|---|---|---|---|---|---|---|---|---|
| tggg* | 0.00 | 1.00 | | | | 0.00 | 1.00 | | | |
| caac | 1.30 | 3.69 | 0.21 | 6.28 | | 1.70 | 5.46 | 0.25 | 6.91 | |
| taac | 0.07 | 1.07 | 0.23 | 0.29 | 7.72E−01 | −0.15 | 0.86 | 0.25 | −0.59 | 5.53E−01 |
| tagg | −0.53 | 0.59 | 1.23 | −0.44 | 6.63E−01 | −0.79 | 0.46 | 1.24 | −0.63 | 5.26E−01 |
| Male | | | | | | 0.34 | 1.41 | 0.14 | 2.38 | |
| Unknown sex | | | | | | << 0 | | | | |
| Age | | | | | | 0.01 | 1.01 | 0.00 | 3.68 | |
| Time | | | | | | −0.01 | 0.99 | 0.00 | −1.80 | 7.25E−02 |
| Time*Time | | | | | | 0.00 | 1.00 | 0.00 | 2.82 | |
The adjusted analysis controlled for sex, age, collection time, and its square (to account for a possible non-linear time effect).
*"tggg" is treated as a reference haplotype for comparison with other haplotypes.
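The OR columns above are the exponentiated coefficients, OR = exp(Coef). The short Python sketch below reproduces that conversion for the adjusted caac estimate and adds a Wald 95% confidence interval, exp(Coef ± 1.96 × SE). The interval is not reported in the table; it is an illustrative assumption derived from the rounded Coef and SE, and `odds_ratio_ci` is a hypothetical helper name.

```python
from math import exp


def odds_ratio_ci(coef: float, se: float, z_crit: float = 1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE.

    Uses a Wald interval on the log-odds scale, then exponentiates.
    """
    return exp(coef), exp(coef - z_crit * se), exp(coef + z_crit * se)


# Rounded adjusted estimate for caac from the table above.
or_caac, ci_low, ci_high = odds_ratio_ci(1.70, 0.25)
print(f"caac adjusted OR = {or_caac:.2f} (Wald 95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because the inputs are rounded to two decimals, the recomputed OR can differ from the printed 5.46 in the last digit.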
Figure 2. Evolving haplotype frequencies of SNV haplotype (t19839-g28881-g28882-g28883) over January 2020–2021 in Washington. Total number of samples sequenced in each month is placed below the plot. For convenience, patient numbers in discovery and replication cohorts are also included below the plot. Besides rare haplotypes, four haplotypes are annotated together with number of SNVs in each haplotype and haplotypic frequency in bracket.
Relationships of identified SNV haplotype (t19839-g28881-g28882-g28883) with classifications of Nextstrain and clades by GISAID.
| Hap | caac | taac | cggg | tggg | Rares |
|---|---|---|---|---|---|
| Nextstrain | n = 1201 | 434 | 29 | 5462 | 11 |
| Unspecified | | | | 7 (0.13) | |
| 19A | | | | 42 (0.77) | |
| 19B | | 1 (0.23) | 8 (27.59) | 1422 (26.03) | 1 (9.09) |
| 20A | | | 5 (17.24) | 1032 (18.89) | 4 (36.36) |
| 20A.EU2 | | | | | |
| 20B | 1201 (100) | 423 (97.47) | 13 (44.83) | 2 (0.04) | 3 (27.27) |
| 20C | | | 3 (10.34) | 2005 (36.71) | 3 (27.27) |
| 20D | | 2 (0.46) | | | |
| 20E (EU1) | | | | 1 (0.02) | |
| 20F | | | | | |
| 20G | | | | 951 (17.41) | |
| 20H/501Y.V2 | | | | | |
| 20I/501Y.V1 | | 8 (1.84) | | | |
| 20J/501Y.V3 | | | | | |
| GISAID | | | | | |
| G | | 1 (0.23) | 18 (62.07) | 858 (15.71) | 4 (36.36) |
| GH | | | 3 (10.34) | 3131 (57.32) | 3 (27.27) |
| GR | 1196 (99.58) | 429 (98.85) | | | 2 (18.18) |
| GV | | | | 1 (0.02) | |
| L | | | | | |
| O | 5 (0.42) | 2 (0.46) | | 35 (0.64) | 1 (9.09) |
| S | | 2 (0.46) | 8 (27.59) | 1413 (25.87) | 1 (9.09) |
| V | | | | 24 (0.44) | |
Relationships of identified SNV haplotype (t19839-g28881-g28882-g28883) with lineages assigned by Pangolin.
| Hap | caac | taac | cggg | tggg |
|---|---|---|---|---|
| Lineage | n = 1201 | 434* | 29 | 5462* |
| A.1 | | 2 (0.46) | 8 (27.59) | 1412 (25.85) |
| B.1 | | 1 (0.23) | 5 (17.24) | 937 (17.15) |
| B.1.1 | 75 (6.24) | | | |
| B.1.1.110 | 1 (0.08) | | | |
| B.1.1.158 | 37 (3.08) | | 1 (3.45) | |
| B.1.1.222 | 23 (1.92) | | | |
| B.1.1.26 | | 136 (31.34) | | 2 (0.04) |
| B.1.1.290 | 84 (6.99) | 1 (0.23) | 1 (3.45) | |
| B.1.1.291 | 956 (79.6) | 7 (1.61) | 11 (37.93) | |
| B.1.1.65 | | 18 (4.15) | | 1 (0.02) |
| B.1.1.76 | 20 (1.67) | | | 1 (0.02) |
| B.1.169 | | 8 (1.84) | | 52 (0.95) |
| B.1.333.1 | 5 (0.42) | 5 (1.15) | | 7 (0.13) |
| B.1.371 | | | 1 (3.45) | 617 (11.30) |
| B.1.426 | | | 2 (6.90) | 12 (0.22) |
*Corresponding lineages associated with "taac" only or "tggg" only are excluded. For a complete list, see a separate table.