Rui Wang, Jiahui Chen, Kaifu Gao, Guo-Wei Wei.
Abstract
Recently, the SARS-CoV-2 variants from the United Kingdom (UK), South Africa, and Brazil have received much attention for their increased infectivity, potentially high virulence, and possible threats to existing vaccines and antibody therapies. The question remains whether other, more infectious variants are being transmitted around the world. We carry out a large-scale study of 506,768 SARS-CoV-2 genome isolates from patients to identify many other rapidly growing mutations on the spike (S) protein receptor-binding domain (RBD). We reveal that essentially all of the 100 most observed mutations strengthen the binding between the RBD and the host angiotensin-converting enzyme 2 (ACE2), indicating that the virus evolves toward more infectious variants. In particular, we discover new fast-growing RBD mutations N439K, S477N, S477R, and N501T that also enhance the RBD and ACE2 binding. We further unveil that mutation N501Y, involved in the United Kingdom (UK), South Africa, and Brazil variants, may moderately weaken the binding between the RBD and many known antibodies, while mutations E484K and K417N, found in the South Africa and Brazil variants, and L452R and E484Q, found in the India variants, can potentially disrupt the binding between the RBD and many known antibodies. Among these RBD mutations, L452R is also now known as part of the California variant B.1.427. Finally, we hypothesize that RBD mutations that can simultaneously make SARS-CoV-2 more infectious and disrupt the existing antibodies, called vaccine escape mutations, will pose an imminent threat to the current crop of vaccines. A list of the most likely vaccine escape mutations is given, including S494P, Q493L, K417N, F490S, F486L, R403K, E484K, L452R, K417T, F490L, E484Q, and A475S. Mutation T478K appears to make the Mexico variant B.1.1.222 the most infectious one.
Our comprehensive genetic analysis and protein-protein binding study show that the genetic evolution of SARS-CoV-2 on the RBD, which may be regulated by host gene editing, viral proofreading, random genetic drift, and natural selection, gives rise to more infectious variants that will potentially compromise existing vaccines and antibody therapies.
Keywords: Antibody; Binding affinity; COVID-19; Deep learning; Mutation; Persistent homology; SARS-CoV-2; Vaccine escape
Year: 2021 PMID: 34004284 PMCID: PMC8123493 DOI: 10.1016/j.ygeno.2021.05.006
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736
The distribution of 12 SNP types among 6945 unique mutations and 2,194,305 non-unique mutations on the S gene of SARS-CoV-2 worldwide. NU is the number of unique mutations and NNU is the number of non-unique mutations. RU and RNU represent the ratios of 12 SNP types among unique and non-unique mutations. In this table, we bold the ratios that are greater than 10%.
| SNP type | Mutation type | NU | NNU | RU | RNU | SNP type | Mutation type | NU | NNU | RU | RNU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A>T | Transversion | 655 | 187,467 | 9.43% | 8.54% | C>T | Transition | 609 | 488,323 | 8.77% | **22.25%** |
| A>C | Transversion | 567 | 12,914 | 8.16% | 0.59% | C>A | Transversion | 466 | 369,637 | 6.71% | **16.85%** |
| A>G | Transition | 908 | 530,814 | **13.07%** | **24.19%** | C>G | Transversion | 269 | 3965 | 3.87% | 0.18% |
| T>A | Transversion | 589 | 6690 | 8.48% | 0.30% | G>T | Transversion | 523 | 111,949 | 7.53% | 5.10% |
| T>C | Transition | 976 | 60,918 | **14.05%** | 2.78% | G>C | Transversion | 342 | 182,984 | 4.92% | 8.34% |
| T>G | Transversion | 498 | 179,748 | 7.17% | 8.19% | G>A | Transition | 543 | 58,896 | 7.82% | 2.68% |
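The ratios in the table above follow directly from the counts: each ratio is a count divided by the column total. A minimal Python sketch of that arithmetic, together with the standard purine/pyrimidine rule for labelling a SNP as a transition or transversion, is given here. The function names `mutation_type` and `ratios` are illustrative and do not come from the paper.

```python
# Purines and pyrimidines; a substitution within one class is a transition,
# a substitution across classes is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(snp: str) -> str:
    """Classify a SNP written as 'X>Y' as a transition or transversion."""
    ref, alt = snp.split(">")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "Transition" if same_class else "Transversion"


def ratios(counts: dict) -> dict:
    """Return each count as a percentage of the total of all counts."""
    total = sum(counts.values())
    return {snp: 100.0 * n / total for snp, n in counts.items()}


# Unique-mutation counts (NU) copied from the S gene table above.
nu_s_gene = {
    "A>T": 655, "A>C": 567, "A>G": 908, "T>A": 589, "T>C": 976, "T>G": 498,
    "C>T": 609, "C>A": 466, "C>G": 269, "G>T": 523, "G>C": 342, "G>A": 543,
}

ru = ratios(nu_s_gene)
print(sum(nu_s_gene.values()))   # 6945, the total stated in the caption
print(f"{ru['A>T']:.2f}%")       # 9.43%, matching the RU entry for A>T
print(mutation_type("A>G"))      # Transition
```

The same `ratios` call applied to the NNU column would give the RNU percentages.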
The distribution of 12 SNP types among 1024 unique mutations and 266,458 non-unique mutations on the spike RBD gene of SARS-CoV-2 worldwide. NU is the number of unique mutations and NNU is the number of non-unique mutations. RU and RNU represent the ratios of 12 SNP types among unique and non-unique mutations. In this table, we bold the ratios that are greater than 10%.
| SNP type | Mutation type | NU | NNU | RU | RNU | SNP type | Mutation type | NU | NNU | RU | RNU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A>T | Transversion | 84 | 170,165 | 8.20% | **63.86%** | C>T | Transition | 90 | 11,562 | 8.79% | 4.34% |
| A>C | Transversion | 75 | 3685 | 7.32% | 1.38% | C>A | Transversion | 66 | 16,551 | 6.45% | 6.21% |
| A>G | Transition | 134 | 2310 | **13.09%** | 0.87% | C>G | Transversion | 38 | 694 | 3.71% | 0.26% |
| T>A | Transversion | 89 | 890 | 8.69% | 0.33% | G>T | Transversion | 79 | 7419 | 7.71% | 2.78% |
| T>C | Transition | 161 | 7308 | **15.72%** | 2.74% | G>C | Transversion | 47 | 907 | 4.59% | 0.34% |
| T>G | Transversion | 76 | 11,318 | 7.42% | 4.25% | G>A | Transition | 85 | 33,649 | 8.30% | **12.63%** |
Fig. 1. 2D sequence alignment for the S protein RBD of SARS-CoV-2, Bat-SL-RaTG13, Pangolin-CoV, SARS-CoV, and Bat-SL-BM48-31.
Fig. 2. Illustration of SARS-CoV-2 mutation-induced binding free energy (BFE) changes for the complexes of S protein and ACE2. Here, the 100 most observed mutations on the S RBD are illustrated.
Fig. 3. Illustration of the time evolution of 424 ACE2 binding-strengthening RBD mutations (blue) and 227 ACE2 binding-weakening RBD mutations (red) on the S protein RBD of SARS-CoV-2 from Jan 07, 2020 to April 18, 2021. The x-axis represents the date and the y-axis represents the natural log of the frequency of each mutation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
List of the top 40 high-frequency (HF) mutations and their corresponding BFE changes (unit: kcal/mol) for the binding of S protein and ACE2. Here, the count is the frequency of each mutation observed in 2021.
| Rank | HF mutation | Count | BFE change | Rank | HF mutation | Count | BFE change |
|---|---|---|---|---|---|---|---|
| Top 1 | N501Y | 168,801 | 0.5499 | Top 21 | N450K | 184 | 0.3535 |
| Top 2 | L452R | 9843 | 0.5752 | Top 22 | E484Q | 182 | 0.0057 |
| Top 3 | E484K | 9350 | 0.0946 | Top 23 | P330S | 182 | 0.0533 |
| Top 4 | S477N | 9276 | 0.018 | Top 24 | A522V | 179 | 0.0705 |
| Top 5 | N439K | 6056 | 0.1792 | Top 25 | D427N | 164 | −0.1133 |
| Top 6 | T478K | 4935 | 0.9994 | Top 26 | P479S | 153 | 0.3844 |
| Top 7 | K417N | 1634 | 0.1661 | Top 27 | V382L | 151 | 0.0355 |
| Top 8 | K417T | 1508 | 0.0116 | Top 28 | T385N | 151 | 0.0049 |
| Top 9 | S494P | 1483 | 0.0902 | Top 29 | Q414R | 143 | 0.0708 |
| Top 10 | N501T | 1295 | 0.4514 | Top 30 | R346K | 135 | 0.1234 |
| Top 11 | A520S | 819 | 0.1495 | Top 31 | T385I | 127 | 0.0314 |
| Top 12 | A522S | 621 | 0.1283 | Top 32 | R403K | 121 | 0.1778 |
| Top 13 | V367F | 536 | 0.1764 | Top 33 | L455F | 99 | −0.0415 |
| Top 14 | N440K | 432 | 0.6161 | Top 34 | V483F | 99 | 0.5428 |
| Top 15 | S477R | 394 | 0.082 | Top 35 | A475V | 96 | 0.3069 |
| Top 16 | P384L | 389 | 0.2681 | Top 36 | G446V | 86 | 0.1583 |
| Top 17 | R357K | 373 | 0.1393 | Top 37 | L452M | 83 | 0.5966 |
| Top 18 | F490S | 363 | 0.4406 | Top 38 | A348S | 82 | 0.4616 |
| Top 19 | P384S | 263 | 0.1151 | Top 39 | T478I | 81 | 0.1269 |
| Top 20 | Q414K | 224 | 0.1234 | Top 40 | A352S | 78 | 0.2576 |
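The sign convention used throughout is that a positive BFE change strengthens the binding between the S protein and ACE2, while a negative change weakens it. A small Python sketch that applies this convention to a few rows copied from the table above is given here. The function name `classify` and the handling of an exactly zero change are assumptions made for illustration.

```python
def classify(bfe_change: float) -> str:
    """Label a BFE change (kcal/mol) by its sign.

    Positive values strengthen binding and negative values weaken it;
    treating exactly zero as 'neutral' is an assumption of this sketch.
    """
    if bfe_change > 0:
        return "binding-strengthening"
    if bfe_change < 0:
        return "binding-weakening"
    return "neutral"


# A few (mutation, BFE change) rows copied from the table above.
bfe_changes = {
    "N501Y": 0.5499,
    "L452R": 0.5752,
    "E484K": 0.0946,
    "T478K": 0.9994,
    "D427N": -0.1133,
    "L455F": -0.0415,
}

for mutation, change in bfe_changes.items():
    print(f"{mutation}: {change:+.4f} kcal/mol -> {classify(change)}")

# The mutation with the largest positive BFE change among these rows.
strongest = max(bfe_changes, key=bfe_changes.get)
print(strongest)  # T478K
```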
Fig. 4. The 3D structure of SARS-CoV-2 S protein RBD bound with ACE2 (PDB ID: 6M0J). We choose blue and red colors to mark the binding-strengthening and binding-weakening mutations, respectively. Vaccine escape mutations described in Table 4 are labeled.
List of vaccine escape (VE) and vaccine weakening (VW) mutations. Their corresponding BFE changes (unit: kcal/mol) for the binding of S protein and ACE2 are provided as well. Here, the count shows the number of antibodies for which a specific mutation is an AD mutation.
| VE mutation | BFE change | Count | VW mutation | BFE change | Count |
|---|---|---|---|---|---|
| S494P | 0.0902 | 50 | N501Y | 0.5499 | 21 |
| Q493L | 0.2279 | 43 | Q493R | 0.1271 | 21 |
| K417N | 0.1661 | 43 | R408I | 0.1949 | 19 |
| F490S | 0.4406 | 42 | Q493H | 0.2385 | 18 |
| F486L | 0.1456 | 41 | P384S | 0.1151 | 18 |
| R403K | 0.1778 | 34 | K378N | 0.0573 | 16 |
| E484K | 0.0946 | 31 | G496S | 0.0187 | 15 |
| L452R | 0.5752 | 28 | L455F | −0.0415 | 15 |
| K417T | 0.0116 | 28 | I410V | 0.7105 | 14 |
| F490L | 0.5139 | 25 | R346S | 0.0374 | 14 |
| E484Q | 0.0057 | 25 | V483A | 0.6695 | 13 |
| A475S | −0.0732 | 24 | K444N | 0.1024 | 12 |
| | | | N501T | 0.4514 | 11 |
| | | | P384L | 0.2681 | 11 |
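The count column above is a tally over antibodies. As a rough illustration of how such a tally could be computed, the sketch below counts, for a single mutation, the antibody complexes whose BFE change falls below a cutoff. The cutoff value, the antibody names, and the BFE numbers are all hypothetical placeholders chosen for the example; they are not taken from the paper, and the paper's actual criterion may differ.

```python
def count_disrupted_antibodies(bfe_by_antibody: dict, cutoff: float) -> int:
    """Count antibody complexes whose BFE change is below the cutoff.

    A negative BFE change weakens the antibody-RBD binding, so values
    below a negative cutoff are treated here as disrupted complexes.
    """
    return sum(1 for change in bfe_by_antibody.values() if change < cutoff)


# Hypothetical cutoff in kcal/mol (an assumption, not a value from the source).
HYPOTHETICAL_CUTOFF = -0.3

# Synthetic BFE changes for one mutation against four made-up antibodies.
synthetic_bfe = {
    "antibody_A": -0.80,
    "antibody_B": -0.10,
    "antibody_C": 0.20,
    "antibody_D": -0.45,
}

count = count_disrupted_antibodies(synthetic_bfe, HYPOTHETICAL_CUTOFF)
print(count)  # 2 (antibody_A and antibody_D fall below the cutoff)
```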
Fig. 5. Illustration of the BFE changes induced by the 100 most observed SARS-CoV-2 S RBD mutations for the complexes of S protein and 106 antibodies or ACE2. Here, red represents negative changes that will weaken the binding, while green shows positive changes that will strengthen the binding.
The statistical analysis of mutations on S protein RBD of 31 countries with large sequencing data. Nseq is the number of sequences in each country. NU-RBD is the number of unique mutations on RBD and NNU-RBD is the number of non-unique mutations on RBD. Npositive and Nnegative represent the number of unique single mutations that will respectively result in positive and negative BFE changes of S protein and ACE2 induced by mutations on S protein RBD.
| Country (Country code) | Nseq | NU-RBD | NNU-RBD | Npositive | Nnegative |
|---|---|---|---|---|---|
| United Kingdom (UK) | 174,372 | 297 | 98,015 | 234 | 63 |
| United States (USA) | 127,809 | 352 | 44,660 | 252 | 100 |
| Denmark (DK) | 29,689 | 94 | 9628 | 81 | 13 |
| Germany (DE) | 18,778 | 324 | 16,033 | 207 | 117 |
| Canada (CA) | 13,050 | 64 | 1180 | 55 | 9 |
| Netherlands (NL) | 12,293 | 86 | 7824 | 74 | 12 |
| Sweden (SE) | 12,183 | 54 | 8346 | 51 | 3 |
| Switzerland (CH) | 10,257 | 70 | 5623 | 62 | 8 |
| Australia (AU) | 9822 | 41 | 7654 | 34 | 7 |
| France (FR) | 8945 | 76 | 6925 | 64 | 12 |
| Belgium (BE) | 7057 | 68 | 4806 | 63 | 5 |
| Italy (IT) | 6568 | 62 | 4056 | 58 | 4 |
| Spain (ES) | 6435 | 75 | 2340 | 61 | 14 |
| Ireland (IE) | 4193 | 41 | 3498 | 38 | 3 |
| Brazil (BR) | 3914 | 39 | 2899 | 32 | 7 |
| Iceland (IS) | 3868 | 13 | 158 | 13 | 0 |
| India (IN) | 3728 | 53 | 342 | 48 | 5 |
| Luxembourg (LU) | 3719 | 36 | 2224 | 33 | 3 |
| Norway (NO) | 3271 | 27 | 1374 | 26 | 1 |
| Poland (PL) | 3102 | 40 | 2505 | 34 | 6 |
| Mexico (MX) | 2908 | 48 | 1715 | 46 | 2 |
| Portugal (PT) | 2625 | 34 | 1370 | 31 | 3 |
| Latvia (LV) | 2391 | 21 | 761 | 20 | 1 |
| Lithuania (LT) | 2001 | 22 | 1052 | 21 | 1 |
| Slovenia (SI) | 1831 | 27 | 1543 | 20 | 7 |
| Finland (FI) | 1734 | 24 | 784 | 21 | 3 |
| Turkey (TR) | 1729 | 33 | 1126 | 32 | 1 |
| Czech Republic (CZ) | 1685 | 24 | 1339 | 22 | 2 |
| United Arab Emirates (AE) | 1581 | 21 | 80 | 21 | 0 |
| Austria (AT) | 1580 | 25 | 815 | 22 | 3 |
| Singapore (SG) | 1423 | 22 | 319 | 21 | 1 |
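In each row of the table above, the number of unique RBD mutations is the sum of Npositive and Nnegative. A short Python sketch that checks this identity and derives the share of binding-strengthening mutations for a few rows copied from the table is given here. The helper name `positive_share` is illustrative only.

```python
def positive_share(n_unique: int, n_positive: int, n_negative: int) -> float:
    """Return the percentage of unique mutations with a positive BFE change.

    Raises ValueError if the positive and negative counts do not add up
    to the number of unique mutations.
    """
    if n_positive + n_negative != n_unique:
        raise ValueError("Npositive + Nnegative must equal the unique count")
    return 100.0 * n_positive / n_unique


# (unique, Npositive, Nnegative) copied from the table above.
rows = {
    "United Kingdom": (297, 234, 63),
    "United States": (352, 252, 100),
    "Germany": (324, 207, 117),
    "Iceland": (13, 13, 0),
}

shares = {country: positive_share(*counts) for country, counts in rows.items()}
for country, share in shares.items():
    print(f"{country}: {share:.1f}% binding-strengthening")
```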
Fig. 6. The log growth rate and log frequency of mutations on S protein RBD in the United Kingdom. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
Fig. 7. The log growth rate and log frequency of mutations on S protein RBD in the United States. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
Fig. 8. The log growth rate and log frequency of mutations on S protein RBD in Denmark. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
Fig. 9. The log growth rate and log frequency of mutations on S protein RBD in the Netherlands. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
Fig. 10. The log growth rate and log frequency of mutations on S protein RBD in India. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
Fig. 11. The log growth rate and log frequency of mutations on S protein RBD in Singapore. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
Fig. 12. The log growth rate and log frequency of mutations on S protein RBD in Brazil. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
Most significant mutations on S protein RBD of 31 countries with large sequencing data.
| Country | Most significant mutations |
|---|---|
| United Kingdom | N439K, S477N, S494P, and N501Y |
| United States | A520S, N501Y, S494P, E484K, S477N, N501T, and L452R |
| Denmark | S477N, Y453F, S477R, N439K, and N501Y |
| Germany | N439K, S477N, and N501Y |
| Canada | R357K, E484K, and L452R |
| Netherlands | N501Y, K417N, E484K, F486L, S477N, N439K, and K417T |
| Sweden | E484K, S477N, N439K, N501Y, and K417N |
| Switzerland | N439K, S477N, N501Y, Q414K, N450K, L452R, and T478K |
| Australia | S477N, N501Y, L452R, L455F, N439K, and N501T |
| France | S477N, N439K, L452R, A522S, E484K, N501Y, and K417T |
| Belgium | N501Y, S477N, E484K, N450K, K417N, and K417T |
| Italy | N439K, S477N, L452R, E484K, N501Y, K417T, N440K, and Q414K |
| Spain | S477N, N501Y, S494P, and E484K |
| Ireland | N439K, N501Y, and E484K |
| Brazil | E484K, K417T, and N501Y |
| Iceland | S477N, N439K, and E406Q |
| India | N440K, A520S, P384L, S477N, S494P, L452R, E484Q, N501Y, and E484K |
| Luxembourg | S477N, N439K, and N501Y |
| Norway | N439K, S477N, A520S, and N501Y |
| Poland | N439K, S477N, A522S, N501Y, and S494P |
| Mexico | L452R and T478K |
| Portugal | S477N, L452R, and N501Y |
| Latvia | E484K, N501Y, N439K, V367F, A522V, S494P, and K417N |
| Lithuania | V362F, N439K, N501Y, S477N, F490L, L452R, S477I, and E471Q |
| Slovenia | N439K, S477R, S477N, N501Y, K356R, and E484K |
| Finland | P384L, S477N, N439K, A352S, and N501Y |
| Turkey | S477N, N501Y, K417N, N501T, and E484K |
| Czech Republic | S459Y, N439K, S477N, N501Y, E484K, and K417N |
| United Arab Emirates | N501Y, N440K, S477N, N439K, E484K, and K417N |
| Austria | S477N, N439K, and N501Y |
| Singapore | F490L, N440K, N439K, S477N, L452R, E484K, N501Y, and K417N |
Fig. 13. The log growth rate and log frequency of mutations on S protein RBD in Mexico. The blue and red colors respectively represent the binding-strengthening and binding-weakening mutations on RBD. A darker blue/red indicates a binding-strengthening/binding-weakening mutation with a higher growth rate in a specific 10-day period. A darker purple represents a mutation with a higher log frequency.
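The country figures report a log frequency and a log growth rate per 10-day period, but the captions do not spell out the formulas. One plausible reading can be sketched in Python as follows, assuming (this is an assumption, not the paper's stated definition) that the frequency is the number of observations of a mutation in each 10-day window and the growth rate is the ratio of counts in consecutive windows. The dates are synthetic and all function names are illustrative.

```python
import math
from collections import Counter
from datetime import date


def window_counts(dates, start, window_days=10):
    """Count observations per consecutive window, beginning at `start`.

    Assumes every date is on or after `start`.
    """
    counts = Counter((d - start).days // window_days for d in dates)
    n_windows = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_windows)]


def log_frequency(counts):
    """Natural log of each window count; None where the count is zero."""
    return [math.log(c) if c > 0 else None for c in counts]


def log_growth_rate(counts):
    """Natural log of the ratio between consecutive window counts.

    None is returned where either count is zero and the ratio is undefined.
    """
    rates = []
    for prev, cur in zip(counts, counts[1:]):
        rates.append(math.log(cur / prev) if prev > 0 and cur > 0 else None)
    return rates


# Synthetic observation dates for one mutation (not real data).
start = date(2021, 1, 1)
observed = [date(2021, 1, 2)] * 2 + [date(2021, 1, 15)] * 4 + [date(2021, 1, 25)] * 8

counts = window_counts(observed, start)
print(counts)                   # [2, 4, 8]
print(log_frequency(counts))    # natural logs of 2, 4, and 8
print(log_growth_rate(counts))  # two entries, each equal to ln(2)
```

Under this reading, a mutation whose count doubles from one window to the next has a log growth rate of ln 2 in that window.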