Rui Wang, Jiahui Chen, Kaifu Gao, Guo-Wei Wei.
Abstract
Recently, the SARS-CoV-2 variants from the United Kingdom (UK), South Africa, and Brazil have received much attention for their increased infectivity, potentially high vi[…]
Keywords: Antibody; Binding affinity; COVID-19; Deep learning; Mutation; Persistent homology; SARS-CoV-2; Vaccine escape
Year: 2021 PMID: 34004284 PMCID: PMC8123493 DOI: 10.1016/j.ygeno.2021.05.006
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736
The distribution of 12 SNP types among 6945 unique mutations and 2,194,305 non-unique mutations on the S gene of SARS-CoV-2 worldwide. NU is the number of unique mutations and NNU is the number of non-unique mutations. RU and RNU represent the ratios of 12 SNP types among unique and non-unique mutations. In this table, we bold the ratios that are greater than 10%.
| SNP type | Mutation type | NU | NNU | RU | RNU | SNP type | Mutation type | NU | NNU | RU | RNU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A>T | Transversion | 655 | 187,467 | 9.43% | 8.54% | C>T | Transition | 609 | 488,323 | 8.77% | **22.25%** |
| A>C | Transversion | 567 | 12,914 | 8.16% | 0.59% | C>A | Transversion | 466 | 369,637 | 6.71% | **16.85%** |
| A>G | Transition | 908 | 530,814 | **13.07%** | **24.19%** | C>G | Transversion | 269 | 3965 | 3.87% | 0.18% |
| T>A | Transversion | 589 | 6690 | 8.48% | 0.30% | G>T | Transversion | 523 | 111,949 | 7.53% | 5.10% |
| T>C | Transition | 976 | 60,918 | **14.05%** | 2.78% | G>C | Transversion | 342 | 182,984 | 4.92% | 8.34% |
| T>G | Transversion | 498 | 179,748 | 7.17% | 8.19% | G>A | Transition | 543 | 58,896 | 7.82% | 2.68% |
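The ratios RU and RNU in the table above follow directly from the counts: each is a SNP type's share of the 6945 unique or 2,194,305 non-unique S-gene mutations. The following is a minimal sketch of that arithmetic, not the authors' code; the names `S_GENE_COUNTS` and `snp_ratios` are hypothetical, and the counts are copied from the table.

```python
# (NU, NNU) per SNP type, copied from the S-gene table above.
S_GENE_COUNTS = {
    "A>T": (655, 187_467), "C>T": (609, 488_323),
    "A>C": (567, 12_914),  "C>A": (466, 369_637),
    "A>G": (908, 530_814), "C>G": (269, 3_965),
    "T>A": (589, 6_690),   "G>T": (523, 111_949),
    "T>C": (976, 60_918),  "G>C": (342, 182_984),
    "T>G": (498, 179_748), "G>A": (543, 58_896),
}


def snp_ratios(counts):
    """Return {snp: (RU, RNU)} as fractions of the column totals."""
    total_nu = sum(nu for nu, _ in counts.values())
    total_nnu = sum(nnu for _, nnu in counts.values())
    return {
        snp: (nu / total_nu, nnu / total_nnu)
        for snp, (nu, nnu) in counts.items()
    }


ratios = snp_ratios(S_GENE_COUNTS)
for snp, (ru, rnu) in ratios.items():
    # Mark rows where either ratio exceeds the 10% threshold used for bolding.
    flag = "*" if max(ru, rnu) > 0.10 else " "
    print(f"{snp}  RU={ru:7.2%}  RNU={rnu:7.2%} {flag}")
```

The same function applies unchanged to the RBD table if its counts are substituted.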
The distribution of 12 SNP types among 1024 unique mutations and 266,458 non-unique mutations on the spike RBD gene of SARS-CoV-2 worldwide. NU is the number of unique mutations and NNU is the number of non-unique mutations. RU and RNU represent the ratios of 12 SNP types among unique and non-unique mutations. In this table, we bold the ratios that are greater than 10%.
| SNP type | Mutation type | NU | NNU | RU | RNU | SNP type | Mutation type | NU | NNU | RU | RNU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A>T | Transversion | 84 | 170,165 | 8.20% | **63.86%** | C>T | Transition | 90 | 11,562 | 8.79% | 4.34% |
| A>C | Transversion | 75 | 3685 | 7.32% | 1.38% | C>A | Transversion | 66 | 16,551 | 6.45% | 6.21% |
| A>G | Transition | 134 | 2310 | **13.09%** | 0.87% | C>G | Transversion | 38 | 694 | 3.71% | 0.26% |
| T>A | Transversion | 89 | 890 | 8.69% | 0.33% | G>T | Transversion | 79 | 7419 | 7.71% | 2.78% |
| T>C | Transition | 161 | 7308 | **15.72%** | 2.74% | G>C | Transversion | 47 | 907 | 4.59% | 0.34% |
| T>G | Transversion | 76 | 11,318 | 7.42% | 4.25% | G>A | Transition | 85 | 33,649 | 8.30% | **12.63%** |
Fig. 1. 2D sequence alignment for the S protein RBD of SARS-CoV-2, Bat-SL-RaTG13, Pangolin-CoV, SARS-CoV, and Bat-SL-BM48-31.
Fig. 2. Illustration of SARS-CoV-2 mutation-induced BFE changes for the complexes of S protein and ACE2. Here, the 100 most observed mutations on the S RBD are illustrated.
Fig. 3. Illustration of the time evolution of 424 ACE2 binding-strengthening RBD mutations (blue) and 227 ACE2 binding-weakening RBD mutations (red) on the S protein RBD of SARS-CoV-2 from Jan 07, 2020 to April 18, 2021. The x-axis represents the date and the y-axis represents the natural log of the frequency of each mutation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
List of the top 40 high-frequency (HF) mutations and their corresponding BFE changes (unit: kcal/mol) for the binding of S protein and ACE2. Here, count is the observed frequency of each mutation in 2021.
| Rank | HF mutation | Count | BFE change | Rank | HF mutation | Count | BFE change |
|---|---|---|---|---|---|---|---|
| Top 1 | N501Y | 168,801 | 0.5499 | Top 21 | N450K | 184 | 0.3535 |
| Top 2 | L452R | 9843 | 0.5752 | Top 22 | E484Q | 182 | 0.0057 |
| Top 3 | E484K | 9350 | 0.0946 | Top 23 | P330S | 182 | 0.0533 |
| Top 4 | S477N | 9276 | 0.018 | Top 24 | A522V | 179 | 0.0705 |
| Top 5 | N439K | 6056 | 0.1792 | Top 25 | D427N | 164 | −0.1133 |
| Top 6 | T478K | 4935 | 0.9994 | Top 26 | P479S | 153 | 0.3844 |
| Top 7 | K417N | 1634 | 0.1661 | Top 27 | V382L | 151 | 0.0355 |
| Top 8 | K417T | 1508 | 0.0116 | Top 28 | T385N | 151 | 0.0049 |
| Top 9 | S494P | 1483 | 0.0902 | Top 29 | Q414R | 143 | 0.0708 |
| Top 10 | N501T | 1295 | 0.4514 | Top 30 | R346K | 135 | 0.1234 |
| Top 11 | A520S | 819 | 0.1495 | Top 31 | T385I | 127 | 0.0314 |
| Top 12 | A522S | 621 | 0.1283 | Top 32 | R403K | 121 | 0.1778 |
| Top 13 | V367F | 536 | 0.1764 | Top 33 | L455F | 99 | −0.0415 |
| Top 14 | N440K | 432 | 0.6161 | Top 34 | V483F | 99 | 0.5428 |
| Top 15 | S477R | 394 | 0.082 | Top 35 | A475V | 96 | 0.3069 |
| Top 16 | P384L | 389 | 0.2681 | Top 36 | G446V | 86 | 0.1583 |
| Top 17 | R357K | 373 | 0.1393 | Top 37 | L452M | 83 | 0.5966 |
| Top 18 | F490S | 363 | 0.4406 | Top 38 | A348S | 82 | 0.4616 |
| Top 19 | P384S | 263 | 0.1151 | Top 39 | T478I | 81 | 0.1269 |
| Top 20 | Q414K | 224 | 0.1234 | Top 40 | A352S | 78 | 0.2576 |
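The sign of a BFE change determines how a mutation is labeled: as the Fig. 5 caption states, positive changes strengthen the binding and negative changes weaken it. The sketch below applies that convention to a handful of rows copied from the table above. The names `BFE_CHANGES` and `classify` are hypothetical, and the "neutral" label for an exactly zero change is an assumption made only so the function is total.

```python
# Subset of rows (mutation -> BFE change in kcal/mol) copied from the table above.
BFE_CHANGES = {
    "N501Y": 0.5499,
    "L452R": 0.5752,
    "E484K": 0.0946,
    "T478K": 0.9994,
    "D427N": -0.1133,
    "L455F": -0.0415,
}


def classify(bfe_change: float) -> str:
    """Label a mutation by the sign of its BFE change."""
    if bfe_change > 0:
        return "strengthening"
    if bfe_change < 0:
        return "weakening"
    return "neutral"  # assumption: the source does not discuss a zero change


labels = {mutation: classify(value) for mutation, value in BFE_CHANGES.items()}
for mutation, label in labels.items():
    print(f"{mutation}: {BFE_CHANGES[mutation]:+.4f} kcal/mol -> binding-{label}")
```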
Fig. 4. The 3D structure of SARS-CoV-2 S protein RBD bound with ACE2 (PDB ID: 6M0J). We choose blue and red colors to mark the binding-strengthening and binding-weakening mutations, respectively. Vaccine escape mutations described in Table 4 are labeled. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
List of vaccine escape (VE) and vaccine weakening (VW) mutations. Their corresponding BFE changes (unit: kcal/mol) for the binding of S protein and ACE2 are provided as well. Here, the count is the number of antibodies for which a specific mutation is an AD mutation.
| VE mutation | BFE change | Count | VW mutation | BFE change | Count |
|---|---|---|---|---|---|
| S494P | 0.0902 | 50 | N501Y | 0.5499 | 21 |
| Q493L | 0.2279 | 43 | Q493R | 0.1271 | 21 |
| K417N | 0.1661 | 43 | R408I | 0.1949 | 19 |
| F490S | 0.4406 | 42 | Q493H | 0.2385 | 18 |
| F486L | 0.1456 | 41 | P384S | 0.1151 | 18 |
| R403K | 0.1778 | 34 | K378N | 0.0573 | 16 |
| E484K | 0.0946 | 31 | G496S | 0.0187 | 15 |
| L452R | 0.5752 | 28 | L455F | −0.0415 | 15 |
| K417T | 0.0116 | 28 | I410V | 0.7105 | 14 |
| F490L | 0.5139 | 25 | R346S | 0.0374 | 14 |
| E484Q | 0.0057 | 25 | V483A | 0.6695 | 13 |
| A475S | −0.0732 | 24 | K444N | 0.1024 | 12 |
| | | | N501T | 0.4514 | 11 |
| | | | P384L | 0.2681 | 11 |
Fig. 5. Illustration of the BFE changes induced by the 100 most observed SARS-CoV-2 S RBD mutations for the complexes of S protein and 106 antibodies or ACE2. Here, red represents negative changes that weaken the binding, while green represents positive changes that strengthen the binding. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Statistical analysis of mutations on the S protein RBD for 31 countries with large sequencing data. Nseq is the number of sequences in each country. NU is the number of unique mutations on the RBD and NNU is the number of non-unique mutations on the RBD. Npositive and Nnegative are the numbers of unique single RBD mutations that result in positive and negative BFE changes, respectively, for the binding of S protein and ACE2.
| Country (Country code) | Nseq | NU | NNU | Npositive | Nnegative |
|---|---|---|---|---|---|
| United Kingdom (UK) | 174,372 | 297 | 98,015 | 234 | 63 |
| United States (USA) | 127,809 | 352 | 44,660 | 252 | 100 |
| Denmark (DK) | 29,689 | 94 | 9628 | 81 | 13 |
| Germany (DE) | 18,778 | 324 | 16,033 | 207 | 117 |
| Canada (CA) | 13,050 | 64 | 1180 | 55 | 9 |
| Netherlands (NL) | 12,293 | 86 | 7824 | 74 | 12 |
| Sweden (SE) | 12,183 | 54 | 8346 | 51 | 3 |
| Switzerland (CH) | 10,257 | 70 | 5623 | 62 | 8 |
| Australia (AU) | 9822 | 41 | 7654 | 34 | 7 |
| France (FR) | 8945 | 76 | 6925 | 64 | 12 |
| Belgium (BE) | 7057 | 68 | 4806 | 63 | 5 |
| Italy (IT) | 6568 | 62 | 4056 | 58 | 4 |
| Spain (ES) | 6435 | 75 | 2340 | 61 | 14 |
| Ireland (IE) | 4193 | 41 | 3498 | 38 | 3 |
| Brazil (BR) | 3914 | 39 | 2899 | 32 | 7 |
| Iceland (IS) | 3868 | 13 | 158 | 13 | 0 |
| India (IN) | 3728 | 53 | 342 | 48 | 5 |
| Luxembourg (LU) | 3719 | 36 | 2224 | 33 | 3 |
| Norway (NO) | 3271 | 27 | 1374 | 26 | 1 |
| Poland (PL) | 3102 | 40 | 2505 | 34 | 6 |
| Mexico (MX) | 2908 | 48 | 1715 | 46 | 2 |
| Portugal (PT) | 2625 | 34 | 1370 | 31 | 3 |
| Latvia (LV) | 2391 | 21 | 761 | 20 | 1 |
| Lithuania (LT) | 2001 | 22 | 1052 | 21 | 1 |
| Slovenia (SI) | 1831 | 27 | 1543 | 20 | 7 |
| Finland (FI) | 1734 | 24 | 784 | 21 | 3 |
| Turkey (TR) | 1729 | 33 | 1126 | 32 | 1 |
| Czech Republic (CZ) | 1685 | 24 | 1339 | 22 | 2 |
| United Arab Emirates (AE) | 1581 | 21 | 80 | 21 | 0 |
| Austria (AT) | 1580 | 25 | 815 | 22 | 3 |
| Singapore (SG) | 1423 | 22 | 319 | 21 | 1 |
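In every row of the table above, Npositive and Nnegative sum to NU, so each country's unique RBD mutations split cleanly into binding-strengthening and binding-weakening groups. The sketch below checks that identity and computes the strengthening share for a few rows copied from the table. The share itself is an illustrative derived quantity rather than one reported in the source, and `CountryStats`, `ROWS`, and `strengthening_share` are hypothetical names.

```python
from typing import NamedTuple


class CountryStats(NamedTuple):
    n_seq: int       # Nseq
    n_unique: int    # NU
    n_positive: int  # Npositive
    n_negative: int  # Nnegative


# A few rows copied from the table above, keyed by country code.
ROWS = {
    "UK": CountryStats(174_372, 297, 234, 63),
    "USA": CountryStats(127_809, 352, 252, 100),
    "DE": CountryStats(18_778, 324, 207, 117),
    "IS": CountryStats(3_868, 13, 13, 0),
    "SG": CountryStats(1_423, 22, 21, 1),
}


def strengthening_share(row: CountryStats) -> float:
    """Fraction of unique RBD mutations with a positive BFE change."""
    if row.n_positive + row.n_negative != row.n_unique:
        raise ValueError("Npositive + Nnegative must equal NU")
    return row.n_positive / row.n_unique


shares = {code: strengthening_share(row) for code, row in ROWS.items()}
for code, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{code}: {share:.1%} of unique RBD mutations strengthen ACE2 binding")
```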
Fig. 6. The log growth rate and log frequency of mutations on S protein RBD in the United Kingdom. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. The log growth rate and log frequency of mutations on S protein RBD in the United States. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. The log growth rate and log frequency of mutations on S protein RBD in Denmark. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 9. The log growth rate and log frequency of mutations on S protein RBD in the Netherlands. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 10. The log growth rate and log frequency of mutations on S protein RBD in India. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 11. The log growth rate and log frequency of mutations on S protein RBD in Singapore. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 12. The log growth rate and log frequency of mutations on S protein RBD in Brazil. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Most significant mutations on S protein RBD of 31 countries with large sequencing data.
| Country | Most significant mutations |
|---|---|
| United Kingdom | N439K, S477N, S494P, and N501Y |
| United States | A520S, N501Y, S494P, E484K, S477N, N501T, and L452R |
| Denmark | S477N, Y453F, S477R, N439K, and N501Y |
| Germany | N439K, S477N, and N501Y |
| Canada | R357K, E484K, and L452R |
| Netherlands | N501Y, K417N, E484K, F486L, S477N, N439K, and K417T |
| Sweden | E484K, S477N, N439K, N501Y, and K417N |
| Switzerland | N439K, S477N, N501Y, Q414K, N450K, L452R, and T478K |
| Australia | S477N, N501Y, L452R, L455F, N439K, and N501T |
| France | S477N, N439K, L452R, A522S, E484K, N501Y, and K417T |
| Belgium | N501Y, S477N, E484K, N450K, K417N, and K417T |
| Italy | N439K, S477N, L452R, E484K, N501Y, K417T, N440K, and Q414K |
| Spain | S477N, N501Y, S494P, and E484K |
| Ireland | N439K, N501Y, and E484K |
| Brazil | E484K, K417T, and N501Y |
| Iceland | S477N, N439K, and E406Q |
| India | N440K, A520S, P384L, S477N, S494P, L452R, E484Q, N501Y, and E484K |
| Luxembourg | S477N, N439K, and N501Y |
| Norway | N439K, S477N, A520S, and N501Y |
| Poland | N439K, S477N, A522S, N501Y, and F494P |
| Mexico | L452R and T478K |
| Portugal | S477N, L452R, and N501Y |
| Latvia | E484K, N501Y, N439K, V367F, A522V, S494P, and K417N |
| Lithuania | V362F, N439K, N501Y, S477N, S490L, L452R, S477I, and E471Q |
| Slovenia | N439K, S477R, S477N, N501Y, K356R, and E484K |
| Finland | P384L, S477N, N439K, A352S, and N501Y |
| Turkey | S477N, N501Y, K417N, N501T, and E484K |
| Czech Republic | S459Y, N439K, S477N, N501Y, E484K, and K417N |
| United Arab Emirates | N501Y, N440K, S477N, N439K, E484K, and K417N |
| Austria | S477N, N439K, and N501Y |
| Singapore | F490L, N440K, N439K, S477N, L452R, E484K, N501Y, and K417N |
Fig. 13. The log growth rate and log frequency of mutations on S protein RBD in Mexico. Blue and red represent the binding-strengthening and binding-weakening mutations on RBD, respectively. Darker blue/red indicates binding-strengthening/binding-weakening mutations with a higher growth rate in a specific 10-day period. Darker purple indicates a mutation with a higher log frequency. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)