Shuyi Fang1, Sheng Liu2,3, Jikui Shen4, Alex Z Lu5, Audrey K Y Wang5, Yucheng Zhang2,3, Kailing Li1, Juli Liu6, Lei Yang6, Chang-Deng Hu7,8, Jun Wan1,2,3,9.
Abstract
By analyzing newly collected SARS-CoV-2 genomes and comparing them with our previous study of SARS-CoV-2 single nucleotide variants (SNVs) before June 2020, we found that the SNV clustering had changed remarkably since June 2020. In addition to the group of SNVs represented by two nonsynonymous mutations, A23403G (S:D614G) and C14408T (ORF1ab:P4715L), which became dominant, a few emerging groups of SNVs were recognized with sharply increased monthly incidence ratios of up to 70% in November 2020. Further investigation revealed sets of SNVs specific to patients' ages and/or gender, or strongly associated with mortality. Our logistic regression model explored features contributing to mortality status, including three critical SNVs, G25088T (S:V1176F), T27484C (ORF7a:L31L), and T25A (upstream of ORF1ab), age above 40 years, and male gender. The protein structure analysis indicated that the emerging subgroups of nonsynonymous SNVs and the mortality-related ones were located on the protein surface. The clashes in protein structure introduced by these mutations might in turn affect viral pathogenesis by altering protein conformation, leading to differences in transmission and virulence. In particular, we found that nonsynonymous SNVs tended to occur in intrinsically disordered regions of Spike and ORF1ab and to significantly increase hydrophobicity, suggesting a potential role in changes of protein folding related to immune evasion.
Keywords: SARS-CoV-2; age; gender; mortality risk factor; single nucleotide variants
Year: 2021 PMID: 34245452 PMCID: PMC8426680 DOI: 10.1002/jmv.27191
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
Figure 1. SNVs identified in more than 3% of SARS‐CoV‐2 genomes after June 1, 2020. (A) Two‐way clustering of 52 high‐frequency SNVs with possible annotated AA changes in 76,926 genomes worldwide. (B) Monthly occurrence ratios of corresponding SNVs. (C) Temporal patterns of the emerging groups A.E1, A.E2, and A.E3. (D) Geographical distributions of emerging SNVs in groups A.E1–3, respectively. AA, amino acid; SNV, single nucleotide variant
Figure 2. SNVs specific to age and gender. (A) Sample distribution for five age groups. (B) SNVs significantly over‐represented in at least two age groups. (C) SNVs enriched in one age group. (D) SNVs specific to gender, with ratios in females and males. (E) Statistical significance of the gender‐specific SNVs in (D), represented by FDR‐adjusted p values (−log10). FDR, false‐discovery rate; SNV, single nucleotide variant
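The FDR‐adjusted p values in panel (E) can be illustrated with a minimal sketch. The caption does not name the adjustment procedure, so the Benjamini‐Hochberg method used here is an assumption, and the raw p values are made‐up illustration inputs, not data from the study.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    Assumption: BH is used only as a common FDR procedure; the source
    does not state which adjustment was applied.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(p):
    """Transform a p value to the -log10 scale used for plotting."""
    return -math.log10(p)


# Hypothetical raw p values for five SNVs (illustration only).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw_p)
scores = [neg_log10(p) for p in adj]
```

Larger `scores` values correspond to smaller adjusted p values, which is why a −log10 axis makes the most significant SNVs stand out.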
Figure 3. Mortality‐related SNVs. (A) Number of SARS‐CoV‐2 samples and death ratio for each month in the study. (B) Forty‐one SNVs significantly over‐represented in the death group with corresponding total numbers of occurrences, ratios in the death and nondeath groups, and enrichment p value. (C) Thirty SNVs significantly enriched in the nondeath group with corresponding total numbers of occurrences, ratios in the death and nondeath groups, and p value. (D) Overlap of SNVs specific to age, gender, and mortality. (E) ROC curve of the logistic regression model to predict mortality. SNV, single nucleotide variant
Figure 4. Protein structure variation caused by selected nonsynonymous SNVs. (A) S:A222V, (B) S:L18F, (C) ORF10:V30F, (D) N:A220V, (E) nsp7:L71F, (F) nsp14:A320V, (G) S:V1176F, (H) Ratios of nonsynonymous SNVs in the whole region or IDR of proteins S, ORF1ab, and ORF3a, (I) Hydrophobic scores before (REF) and after alterations (ALT) of nonsynonymous SNVs in the IDRs of proteins S, ORF1ab, and ORF3a. IDR, intrinsically disordered region; SNV, single nucleotide variant
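The REF versus ALT hydrophobic scores in panel (I) can be illustrated with a rough sketch that compares the hydropathy of the reference and altered residue for a substitution label. The source does not say which hydrophobicity scale was used, so the Kyte‐Doolittle index below is an assumption, and `parse_substitution` and `hydropathy_shift` are hypothetical helper names. The labels are taken from panels (A) to (G), which are not necessarily the IDR substitutions summarized in panel (I).

```python
# Kyte-Doolittle hydropathy index (assumed scale; not named in the source).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_substitution(label):
    """Split a label such as 'S:A222V' into (protein, ref, position, alt)."""
    protein, change = label.split(":")
    ref, alt = change[0], change[-1]
    position = int(change[1:-1])
    return protein, ref, position, alt


def hydropathy_shift(label):
    """Return the ALT minus REF hydropathy for one amino acid substitution."""
    _, ref, _, alt = parse_substitution(label)
    return KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref]


# Substitution labels as listed in the figure caption.
labels = ["S:A222V", "S:L18F", "ORF10:V30F", "N:A220V",
          "nsp7:L71F", "nsp14:A320V", "S:V1176F"]
for label in labels:
    print(label, round(hydropathy_shift(label), 1))
```

A positive shift means the altered residue is more hydrophobic on this particular scale; the sign and size of the shift can differ under other scales, so the output should not be read as reproducing the study's result.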