Roman Matyášek, Kateřina Řehůřková, Kristýna Berta Marošiová, Aleš Kovařík.
Abstract
The genomic diversity of
Keywords: SARS-CoV-2; amino acid hydrophobicity; apolipoprotein B mRNA editing enzyme (APOBEC); coronavirus; evolution; genetic variation; mutability
Year: 2021 PMID: 34072181 PMCID: PMC8227412 DOI: 10.3390/genes12060826
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Table 1. Number of genomes used in the study and their phylogenetic classification.
| Group | Clade | Number of Genomes (Total 2) | Number of Genomes (This Study) | Coverage (%) | Characteristic Nucleotide Variations 1 |
|---|---|---|---|---|---|
| Early | L | 4699 | 2222 | 47.3 | C241, C3037, C8782, G11083, A23403, G25563, and U28144C |
| Early | O | 5681 | 2714 | 47.8 | G11083U, C22227U, A23403G, and G26144U |
| Early | S | 7893 | 4532 | 57.4 | C8782U and U28144C |
| Early | V | 5320 | 1896 | 35.6 | C241U, C28311U, and C23929U |
| Late | G | 62,786 | 8638 | 13.8 | C241U, C3037U, and A23403G |
| Late | GH | 89,908 | 23,375 | 26 | C241U, C3037U, A23403G, and G25563U |
| Late | GR | 136,083 | 35,857 | 26.3 | C241U, C3037U, A23403G, and A28111G |
| Late | GV | 92,617 | 15,930 | 17.2 | C241U, C3037U, A23403G, and C22227U |
1 Taken from the GISAID database except for clade O, where SNVs represent mutations occurring in >30% of genomes in our data set. 2 Up to 31 January 2021.
Figure 1. Genetic variation and character of SNVs in prominent SARS-CoV-2 phylogenetic clades. (A) Clade abundance evolution in the first year (taken from the GISAID web page on 1 February 2021). (B) Unrooted Neighbor Joining tree constructed from consensus sequences of genomes derived from individual clades. (C) Frequency of substitutions in individual clades (Table 1). Variants were called using a 5% frequency threshold.
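The caption states only that variants were called using a 5% frequency threshold. The following is a minimal sketch of what such a filter might look like, assuming pre-aligned genomes of equal length and a threshold applied as "frequency at or above 5%". The function name `call_variants`, the toy sequences, and the inclusive comparison are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

def call_variants(reference, genomes, threshold=0.05):
    """Report substitutions whose frequency among aligned genomes is >= threshold.

    Hypothetical helper: assumes all genomes are aligned to `reference`
    and have the same length. Ambiguous bases (e.g. N) are ignored.
    """
    calls = []
    for pos, ref_base in enumerate(reference):
        # Collect the unambiguous bases observed at this alignment column.
        column = [g[pos] for g in genomes if g[pos] in "ACGU"]
        if not column:
            continue
        for base, n in sorted(Counter(column).items()):
            freq = n / len(column)
            if base != ref_base and freq >= threshold:
                # 1-based coordinate, written like the SNVs in Table 1 (e.g. C241U).
                calls.append((f"{ref_base}{pos + 1}{base}", freq))
    return calls

# Toy data (not real SARS-CoV-2 sequences): 40 genomes, 4 carry C2U, 1 carries U4A.
toy_reference = "ACGU"
toy_genomes = ["ACGU"] * 35 + ["AUGU"] * 4 + ["ACGA"] * 1
toy_calls = call_variants(toy_reference, toy_genomes)
```

In this toy set, the substitution present in 4 of 40 genomes passes the filter, while the one present in a single genome (2.5%) does not.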
Figure 2. Mutation spectra of SARS-CoV-2 genomes analyzed in global data sets and virus sequences isolated from a single patient. (A) Relative frequency of individual types of substitutions. Left columns: the character of 135 substitutions identified in eight global lineages (Figure 1). Right columns: the character of 100 substitutions identified in virus genomes from a single individual. Data can be found in Tables S3 and S4. (B,C) Spearman’s and Pearson’s statistics for 12 types of substitutions. (B) Correlation between the non-synonymous and all substitutions at the global-set (upper panel) and single-individual (bottom panel) levels. (C) Correlation between global sets and virus genomes from a single individual for all (upper panel) and non-synonymous (bottom panel) substitutions. Black lines represent linear regressions derived from measurements. Gray lines represent the equality between global data sets and a single individual (y = x).
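Panels B and C compare substitution-type frequencies with Pearson's and Spearman's statistics. As a reminder of how the two differ, here is a minimal pure-Python sketch: Pearson's coefficient is computed on the raw values, and Spearman's is the same formula applied to ranks (ties receive the average rank). The function names and the toy numbers are assumptions for illustration; they are not the values from Tables S3 and S4.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy, made-up frequencies: monotonic but non-linear relationship.
toy_x = [1, 2, 3, 4]
toy_y = [1, 4, 9, 16]
toy_pearson = pearson(toy_x, toy_y)
toy_spearman = spearman(toy_x, toy_y)
```

For a monotonic but non-linear relationship such as this toy pair, the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1, which is one reason a caption may report both.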
Figure 3. Distribution of non-synonymous substitutions along the SARS-CoV-2 genome and the character of induced amino acid changes. (A) A scheme of the SARS-CoV-2 genome organization. Numerals in italics indicate the number of substitutions identified within each gene. Genes with no detected substitution are in empty boxes. Nsp1-16 indicate genes for nonstructural proteins, including RdRp (RNA-dependent RNA polymerase), Hel (helicase), Exo (3′-to-5′ exonuclease), and Met (2′-O-ribose methyltransferase). Structural proteins are represented by surface glycoprotein (S), membrane glycoprotein (M), nucleocapsid phosphoprotein (N), and envelope protein (E). (B) Amino acid hydrophobicity changes were computed using the hydrophobicity scale of Kyte and Doolittle [33] (Table S6). For better resolution, values (circles) are shown in black or gray, matching the color (black or gray) of the corresponding genes in panel A. The number of substitutions in individual hydrophobicity intervals is given on the right margin. Arrow: the widespread Pro323Leu substitution within the NSP12 protein. Oval: cluster of C>U substitutions underlying prominent amino acid changes in the linker region of the N-protein.
Figure 4. Shifts in amino acid hydrophobicity caused by SNVs in SARS-CoV-2 genomes. Box plots were constructed from data sets in Table S6. Boxes span the first (Q1) and third (Q3) quartiles. The vertical line inside the box marks the median. Whiskers extend to the minimum and maximum values. Hydrophobicity scales are according to Kyte and Doolittle [33]. Lineage groups are as in Table 1. The number of SNVs is indicated for each group. Differences between the groups were not significant (p > 0.05, Mann–Whitney U test).
Figure 5. Shifts in amino acid hydrophobicity caused by SNVs in SARS-CoV-2 subregions. Cumulative and average values were computed using the amino acid hydrophobicity scales according to [33] (A) and [34] (B).
Biochemical characteristics of amino acids in major S-protein variants involved in virus infectivity and transmissibility.
| Name 1 | Mutation Coordinate (Genome) | Mutation Coordinate (Protein) | Hydrophobicity 2 (Ref.) | Hydrophobicity 2 (Allele) | Hydrophobicity 2 (Shift) | Note |
|---|---|---|---|---|---|---|
| N440K | U22882G | Asn440Lys | −3.5 | −3.9 | −0.4 | Suspected to increase the infectivity of the virus |
| L452R | U22917G | Leu452Arg | 3.8 | −4.5 | −8.3 | Thought to increase immune evasion and ACE2 binding |
| S477G * | A22991G | Ser477Gly | −0.8 | −0.4 | 0.4 | Suspected to strengthen receptor interaction |
| S477N | G22992A | Ser477Asn | −0.8 | −3.5 | −2.7 | Strengthens receptor interaction |
| E484K | G23012A | Glu484Lys | −3.5 | −3.9 | −0.4 | Increased evasion from the host’s immune system |
| E484Q | G23012C | Glu484Gln | −3.5 | −3.5 | 0 | Suspected to increase the infectivity of the virus |
| N501Y | A23063U | Asn501Tyr | −3.5 | −1.3 | 2.2 | Enhances binding activity to the ACE2 receptor and is a variant of concern |
| D614G * | A23403G | Asp614Gly | −3.5 | −0.4 | 3.1 | Dominant form in the pandemic |
| P681H | C23604A | Pro681His | −1.6 | −3.2 | −1.6 | Increasing prevalence worldwide |
| P681R * | C23604G | Pro681Arg | −1.6 | −4.5 | −2.9 | May evade the immune system |
| Total | | | −18.5 | −29.1 | −10.6 | |
| Average | | | −1.85 | −2.91 | −1.06 | |
1 The list is according to the ECDC variant surveillance data report [40]. Variants included in the global data set (Table S6) are marked with asterisks (*). 2 Hydrophobicity values are according to the Kyte and Doolittle scale [33].
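The Shift column is the allele hydrophobicity minus the reference hydrophobicity on the Kyte and Doolittle scale. A minimal sketch of that arithmetic follows, using the standard Kyte-Doolittle values for the amino acids involved. The names `KD`, `S_VARIANTS`, and `hydrophobicity_shift` are illustrative, and the residue pairs are read from the one-letter variant names (N = Asn, K = Lys, and so on).

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "Ala": 1.8, "Arg": -4.5, "Asn": -3.5, "Asp": -3.5, "Cys": 2.5,
    "Gln": -3.5, "Glu": -3.5, "Gly": -0.4, "His": -3.2, "Ile": 4.5,
    "Leu": 3.8, "Lys": -3.9, "Met": 1.9, "Phe": 2.8, "Pro": -1.6,
    "Ser": -0.8, "Thr": -0.7, "Trp": -0.9, "Tyr": -1.3, "Val": 4.2,
}

# S-protein variants from the table: name -> (reference residue, variant residue).
S_VARIANTS = {
    "N440K": ("Asn", "Lys"), "L452R": ("Leu", "Arg"),
    "S477G": ("Ser", "Gly"), "S477N": ("Ser", "Asn"),
    "E484K": ("Glu", "Lys"), "E484Q": ("Glu", "Gln"),
    "N501Y": ("Asn", "Tyr"), "D614G": ("Asp", "Gly"),
    "P681H": ("Pro", "His"), "P681R": ("Pro", "Arg"),
}

def hydrophobicity_shift(ref_aa, alt_aa):
    """Allele minus reference hydropathy, rounded to one decimal place."""
    return round(KD[alt_aa] - KD[ref_aa], 1)

shifts = {name: hydrophobicity_shift(ref, alt)
          for name, (ref, alt) in S_VARIANTS.items()}
total_shift = round(sum(shifts.values()), 1)
average_shift = round(total_shift / len(shifts), 2)

for name, shift in shifts.items():
    print(f"{name}: {shift:+.1f}")
print(f"Total: {total_shift:+.1f}, average: {average_shift:+.2f}")
```

A negative shift means the variant residue is more hydrophilic than the reference residue; summing the ten per-variant shifts reproduces the Total and Average rows of the table.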
Figure 6. Dynamics of the nucleotide composition in the SARS-CoV-2 genomes. (A) Graphs represent the nucleotide compositions of consensus sequences typical for individual clades. (B) A, C, G, and U counts. Note: the reduction of Cs in V, G, GH, and GV clades was accompanied by increases in U nucleotides. Data can be found in Table S9.
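Counting A, C, G, and U in a consensus sequence, as panel B describes, is a simple tally. The sketch below is one way to do it, assuming sequences may be stored with T in place of U and may contain ambiguity codes that should be skipped. The function name `nucleotide_composition` and the toy sequences are hypothetical and do not come from Table S9.

```python
from collections import Counter

def nucleotide_composition(seq):
    """Return {base: (count, fraction)} for A, C, G, U in an RNA/DNA string.

    T is treated as U; any other symbol (e.g. N or gaps) is ignored.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "ACGU")
    total = sum(counts.values())
    return {b: (counts[b], counts[b] / total if total else 0.0) for b in "ACGU"}

# Toy example (made-up sequences): a single C>U change lowers C and raises U by one.
before = nucleotide_composition("ACGTCN")
after = nucleotide_composition("ACGTTN")
delta = {b: after[b][0] - before[b][0] for b in "ACGU"}
```

Comparing the counts of two consensus sequences in this way shows directly whether a loss of C is matched by a gain of U.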
Figure 7. Relationship between the amino acid hydrophobicity and nucleotide composition of the codons. Hydrophobicity scales are according to [33] (A) and [34] (B). All 20 amino acids were considered. Thick and thin horizontal lines represent median and average, respectively. Differences are highlighted at the statistical level of p < 0.05 and p < 0.0001 (Mann–Whitney U test).