Pakorn Aiewsakun1,2, Patrawee Nilplub2, Patompon Wongtrakoongate3,4, Suradej Hongeng5, Arunee Thitithanyanont1,2.
Abstract
In this study, we performed genome-wide association analyses on SARS-CoV-2 genomes to identify genetic mutations associated with pre-symptomatic/asymptomatic COVID-19 cases. Various potential covariates and confounding factors of COVID-19 severity, including patient age, gender and country, as well as virus phylogenetic relatedness were adjusted for. In total, 3021 full-length genomes of SARS-CoV-2 generated from original clinical samples and whose patient status could be determined conclusively as either 'pre-symptomatic/asymptomatic' or 'symptomatic' were retrieved from the GISAID database. We found that the mutation 11 083G>T, located in the coding region of non-structural protein 6, is significantly associated with asymptomatic COVID-19. Patient age is positively correlated with symptomatic infection, while gender is not significantly correlated with the development of the disease. We also found that the effects of the mutation, patient age and gender do not vary significantly among countries, although each country appears to have varying baseline chances of COVID-19 symptom development.
Keywords: GWAS; SARS-CoV-2; nsp6
Year: 2021 PMID: 34870573 PMCID: PMC8767342 DOI: 10.1099/mgen.0.000734
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
SARS-CoV-2 polymorphic sites under study
Twenty-six polymorphic sites, each with <50 % ambiguous bases and an aggregated minor allele frequency of >5 % among the collected sequences, were analysed. Sites with co-occurring variants, i.e. those with pairwise uncertainty coefficients of >0.90 (Table S5), were grouped together for analysis.
| Nucleotide position* | Nucleotide grouping* | Reference variant [frequency (%)]* | Alternative variant [frequency (%)] | Gene location | Substitution type | Amino acid change† |
|---|---|---|---|---|---|---|
| 313 | 313 | C(70.18) | T(29.66), N(0.1), –(0.03), Y(0.03) | ORF1ab:NSP1 | Synonymous | L16 |
| 1059 | 1059 | C(91.43) | T(8.47), Y(0.07), N(0.03) | ORF1ab:NSP2 | Non-synonymous | T85I |
| 3037 | 3037 | C(17.71) | T(82.16), N(0.07), Y(0.07) | ORF1ab:NSP3 | Synonymous | F106 |
| 11 083 | 11 083 | G(92.62) | T(6.79), N(0.56), K(0.03) | ORF1ab:NSP6 | Non-synonymous | L37F |
| 18 877 | 18 877 | C(94.21) | T(5.76), N(0.03) | ORF1ab:NSP14 | Synonymous | L280 |
| 20 268 | 20 268 | A(93.02) | G(6.59), N(0.4) | ORF1ab:NSP15 | Synonymous | L216 |
| 23 403 | 23 403 | A(18.21) | G(81.53), N(0.2), R(0.07) | S | Non-synonymous | D614G |
| 25 563 | 25 563 | G(83.88) | T(14.96), C(1.06), N(0.1) | ORF3a | Non-synonymous | Q57H |
| 26 730 | 26 730 | G(94.64) | A(5.1), T(0.13), C(0.07), N(0.07) | M | Non-synonymous | V70I, V70F, V70L |
| 26 735 | 26 735 | C(94.87) | T(5.06), N(0.07) | M | Synonymous | Y71 |
| 28 975 | 28 975 | G(92.98) | T(6.29), C(0.53), N(0.2) | N | Non-synonymous | M234I |
| 29 692 | 29 692 | G(92.65) | T(5.66), –(1.42), N(0.26) | 3′UTR | Non-coding | – |
| 241 | 241/14 408 | C(16.95) | T(82.75), G(0.13), A(0.07), Y(0.07), –(0.03) | 5′UTR | Non-coding | – |
| 14 408 | 241/14 408 | C(17.28) | T(82.36), Y(0.2), N(0.17) | ORF1ab:RDRP | Non-synonymous | P323L |
| 8782 | 8782/28 144 | C(92.98) | T(6.92), N(0.1) | ORF1ab:NSP4 | Synonymous | S76 |
| 28 144 | 8782/28 144 | T(93.25) | C(6.75) | ORF8 | Non-synonymous | L84S |
| 18 167 | 18 167/21 518 | C(94.14) | T(5.79), A(0.07) | ORF1ab:NSP14 | Non-synonymous | P43L, P43H |
| 21 518 | 18 167/21 518 | G(93.88) | T(5.83), N(0.3) | ORF1ab:NSP16 | Non-synonymous | R287I |
| 28 881 | 28 881/28 882/28 883 | G(49.75) | A(49.85), N(0.3), R(0.1) | N | Non-synonymous | R203K, G204R |
| 28 882 | 28 881/28 882/28 883 | G(49.95) | A(49.65), N(0.26), R(0.1), T(0.03) | N | Non-synonymous | R203K, G204R |
| 28 883 | 28 881/28 882/28 883 | G(50.05) | C(49.59), N(0.26), S(0.1) | N | Non-synonymous | R203K, G204R |
| 4346 | 4346/9286/10 376/14 708/28 725 | T(94.24) | C(5.76) | ORF1ab:NSP3 | Non-synonymous | S543P |
| 9286 | 4346/9286/10 376/14 708/28 725 | C(94.11) | T(5.79), N(0.1) | ORF1ab:NSP4 | Synonymous | N244 |
| 10 376 | 4346/9286/10 376/14 708/28 725 | C(93.88) | T(6.06), N(0.07) | ORF1ab:NSP5 | Non-synonymous | P108S |
| 14 708 | 4346/9286/10 376/14 708/28 725 | C(94.17) | T(5.79), N(0.03) | ORF1ab:RDRP | Non-synonymous | A423V |
| 28 725 | 4346/9286/10 376/14 708/28 725 | C(94.11) | T(5.86), N(0.03) | N | Non-synonymous | P151L |
*With respect to the reference SARS-CoV-2 genome (RefSeq accession number: NC_045512.2).
†Reported for non-ambiguous base changes only.
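The site-selection and grouping rules described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline. It assumes each site is supplied as a list of bases (one per genome), treats the "aggregated minor allele frequency" as one minus the frequency of the most common symbol, and uses the symmetric form of the uncertainty coefficient, since the source does not say which variant was computed. The helper names `passes_filter`, `symmetric_uncertainty` and `group_sites` are hypothetical, and the toy columns are invented.

```python
import math
from collections import Counter
from itertools import combinations

# IUPAC ambiguity codes (N plus the two- and three-base codes)
AMBIGUOUS = set("NRYKMSWBDHV")


def passes_filter(column, max_ambiguous=0.50, min_minor=0.05):
    """Keep a site with <50 % ambiguous bases and aggregated minor allele frequency >5 %.

    Assumption: aggregated minor allele frequency = 1 - frequency of the most common symbol.
    """
    n = len(column)
    ambiguous_frac = sum(base in AMBIGUOUS for base in column) / n
    major_frac = Counter(column).most_common(1)[0][1] / n
    return ambiguous_frac < max_ambiguous and (1.0 - major_frac) > min_minor


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def symmetric_uncertainty(x, y):
    """Symmetric uncertainty coefficient 2*I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0  # both sites invariant: nothing to co-vary
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def group_sites(columns, threshold=0.90):
    """Merge sites whose pairwise coefficient exceeds `threshold` (single linkage)."""
    parent = {site: site for site in columns}

    def find(site):
        while parent[site] != site:
            site = parent[site]
        return site

    for a, b in combinations(columns, 2):
        if symmetric_uncertainty(columns[a], columns[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for site in columns:
        groups.setdefault(find(site), []).append(site)
    return sorted(sorted(members) for members in groups.values())


# Toy alignment columns for five genomes (illustrative values, not study data)
columns = {
    8782: ["C", "C", "T", "T", "C"],
    28144: ["T", "T", "C", "C", "T"],
    11083: ["G", "T", "G", "G", "T"],
}
print(group_sites(columns))  # [[8782, 28144], [11083]]
```

Single linkage means a site joins a group if it is strongly linked to any member, which is one plausible reading of "grouped together"; real data would also need a rule for ambiguous bases, which this sketch treats as ordinary symbols when computing the coefficient.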
Fig. 1. SARS-CoV-2 phylogeny. The tree was estimated under the maximum-likelihood framework implemented in IQ-TREE2 [27] based on a manually curated alignment of 3021 full-length SARS-CoV-2 genomes. Potential recombination within the alignment was checked by using the Phi test implemented in SplitsTree4 [26], but no evidence was found (P=0.91). The best-fit nucleotide substitution model was determined to be GTR+F+R5 (the general time reversible model+empirical base frequencies+the 5-discrete-rate-category FreeRate model) by ModelFinder [28] under the Bayesian information criterion and was used for tree reconstruction. We compared our tree with the global tree obtained from GISAID and identified the terminal branch leading to sample EPI_ISL_407976 as a suitable location for root placement. Bar, substitutions per site. The tree file in Newick format with bootstrap clade-support values, computed based on 1000 bootstrap trees, can be found in Data S1. The three columns on the right indicate the United Nations (UN) geoscheme subregion, the GISAID haplogroup assignment and the patient status of the sequences, respectively (see keys). The world map below the tree shows the countries from which the sequences were sampled, colored according to the UN geoscheme subregions.
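ModelFinder's choice of GTR+F+R5 rests on ranking candidate models by the Bayesian information criterion, BIC = −2 ln L + k ln n. The sketch below shows only that final ranking step, assuming maximised log-likelihoods are already available for each candidate. The log-likelihood values are invented for illustration, `n_sites` is set to the 29 903-nt length of the NC_045512.2 reference as a stand-in for the alignment length, and `bic` and `best_model` are hypothetical helpers, not IQ-TREE2 functions.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: -2 ln L + k ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model(fits, n_sites):
    """Return the candidate model name with the lowest BIC.

    `fits` maps a model name to (maximised log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


# Hypothetical fits for illustration only; parameter counts cover the
# substitution model and exclude branch lengths, which every candidate shares
# when evaluated on the same tree.
toy_fits = {
    "HKY+F": (-100000.0, 4),      # kappa + 3 free base frequencies
    "GTR+F+R5": (-99900.0, 16),   # 5 exchangeabilities + 3 frequencies + 8 FreeRate parameters
}
print(best_model(toy_fits, n_sites=29903))  # GTR+F+R5
```

The ln n penalty is what lets a simpler model win when the likelihood gain from extra parameters is small; here the 100-unit log-likelihood gain outweighs the 12 extra parameters.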
Fig. 2. Screening for candidate sites with genetic variations associated with COVID-19 pathogenicity by using TreeWAS [29]. (a) Maximum-likelihood tree of SARS-CoV-2 (as shown in Fig. 1) is shown on the left. Bar, substitutions per site. The viruses’ patient status (pat. stat.) and mutational profiles of the 26 polymorphic sites investigated (Table 1) are shown on the right (see keys for details). Sites determined as strongly linked loci are indicated with black horizontal bars and numbers at the top. (b) Three separate tests of genotype–phenotype association implemented in the software TreeWAS [29] were performed, namely the ‘Terminal’ (left), ‘Simultaneous’ (middle) and ‘Subsequent’ tests (right), with Bonferroni multiple-testing correction (adjusted P value threshold = 5 %/17 sets of polymorphic sites analysed = 0.294 %). To account for phylogenetic uncertainty, the tests were applied to the entire distribution of the 1000 bootstrap trees to obtain the distributions of correlation scores and null scores (Cor. score null dist.). The horizontal red strips indicate the 95 % highest density intervals of the score cut-offs obtained from the 1000 bootstrap analyses. The horizontal red dotted lines indicate the score cut-off obtained from the maximum-likelihood tree analysis. All tests revealed that site 11 083 had the highest scores (horizontal red solid lines). The Simultaneous test suggested that site 11 083 was the only site with genetic variations significantly associated with COVID-19 patient status (marked with an asterisk; positive bootstrap testing rate = 58.5 %), while the other two tests did not detect significant signals.
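The multiple-testing threshold and the bootstrap summaries in this caption reduce to a few lines of arithmetic. The sketch assumes that the "positive bootstrap testing rate" is the share of bootstrap trees on which a site's score exceeds that tree's own cut-off, and it computes the 95 % highest density interval empirically as the shortest interval covering 95 % of the sorted bootstrap values. Both interpretations and all helper names (`bonferroni_threshold`, `hdi`, `positive_rate`) are assumptions, not TreeWAS internals.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def hdi(samples, mass=0.95):
    """Shortest interval spanning `mass` of the sorted samples (empirical HDI)."""
    xs = sorted(samples)
    n = len(xs)
    span = int(math.floor(mass * n))  # index distance between interval ends
    if span >= n:
        return xs[0], xs[-1]
    start = min(range(n - span), key=lambda i: xs[i + span] - xs[i])
    return xs[start], xs[start + span]


def positive_rate(site_scores, cutoffs):
    """Share of bootstrap trees in which a site's score exceeds that tree's cut-off."""
    hits = sum(score > cut for score, cut in zip(site_scores, cutoffs))
    return hits / len(site_scores)


# 17 sets of polymorphic sites tested at an overall 5 % level
print(f"{100 * bonferroni_threshold(0.05, 17):.3f} %")  # 0.294 %
```

The 17 tests correspond to the 12 ungrouped sites plus the 5 groups of co-occurring sites listed in the polymorphic-site table, which together cover all 26 sites.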
GISAID sequences with ambiguous bases at position 11 083
The sequence NC_045512.2 is included as the reference sequence of the original strain. Ambiguous bases are in bold type. The base at position 11 083 is underlined.
| Accession no. | Country | Lineage – GISAID haplogroup | Patient status | Sequence (11073–11093) |
|---|---|---|---|---|
| NC_045512.2 | China | B – L | Symptomatic | TCTTTTTTTT |
| EPI_ISL_454602 | Croatia | B.1.1 – GR | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_539777 | Czech Republic | B.1 – GH | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_626570 | Czech Republic | B.1 – GH | Symptomatic | TCTTTTTTT |
| EPI_ISL_626613 | Czech Republic | B.1.258 – G | Symptomatic | TCTTTTTTT |
| EPI_ISL_437454 | India | B.6 – O | Symptomatic | TCTTTTTTTT |
| EPI_ISL_479520 | India | B.6 – O | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_436137 | India | B.6 – O | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_436140 | India | B.1.80 – G | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_436141 | India | B.1.80 – G | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_436156 | India | B.6 – O | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_436157 | India | B.6 – O | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_486386 | India | B.6 – O | Pre-symptomatic/Asymptomatic | TCTTTTTTT |
| EPI_ISL_486394 | India | B.6 – O | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_486403 | India | B.6 – O | Pre-symptomatic/Asymptomatic | TCTTTTTTT |
| EPI_ISL_486407 | India | B.1 – S | Pre-symptomatic/Asymptomatic | TCTTTTTTTT |
| EPI_ISL_486384 | India | B.1 – O | Symptomatic | TCT |
| EPI_ISL_447776 | Colombia | B.1 – GH | Symptomatic | |
| EPI_ISL_447801 | Colombia | B.1.5 – G | Symptomatic | |
Fig. 3. Adjusted odds ratios and 95 % confidence intervals of various potential risk factors for COVID-19 symptom development. The values were estimated based on the best-fit binomial generalized linear mixed model M2, in which the effects of the mutation 11 083G>T, patient gender and age on the disease outcome were treated as fixed effects, and the effects of country of sampling and virus phylogenetic relatedness were treated as random effects. The model allowed each individual virus and country to have varying baseline chances of symptom development while adjusting for the virus phylogenetic structure. See model specification in Table S6 and estimated parameter values in Table S7.
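Assuming M2 uses a logit link, each fixed-effect coefficient lives on the log-odds scale, and an adjusted odds ratio is its exponential. The sketch below shows that conversion with Wald-type 95 % intervals (the source does not state how its intervals were computed) and the inverse-logit of a linear predictor with country and virus random intercepts. Coefficient values and helper names are hypothetical, and the sketch does not fit the model, which would need a mixed-model package and the model specification in Table S6.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95 % standard-normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald 95 % CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def symptom_probability(intercept, fixed_effects, covariates,
                        country_effect=0.0, virus_effect=0.0):
    """Inverse-logit of a linear predictor with fixed effects plus random intercepts.

    `country_effect` and `virus_effect` stand for the random intercepts of the
    sampling country and of the (phylogenetically correlated) individual virus.
    """
    eta = intercept + country_effect + virus_effect
    eta += sum(fixed_effects[name] * covariates[name] for name in fixed_effects)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficient, for illustration only
odds, lower, upper = odds_ratio_ci(beta=-0.5, se=0.2)
print(f"OR = {odds:.2f} (95 % CI {lower:.2f}-{upper:.2f})")  # OR = 0.61 (95 % CI 0.41-0.90)
```

An odds ratio below 1 whose interval excludes 1 would indicate lower odds of symptoms; the random intercepts shift each country's and virus's baseline without changing the fixed-effect odds ratios.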