
Emerging Vaccine-Breakthrough SARS-CoV-2 Variants.

Rui Wang¹, Jiahui Chen¹, Yuta Hozumi¹, Changchuan Yin², Guo-Wei Wei¹,³,⁴.

Abstract

The surge of COVID-19 infections has been fueled by new SARS-CoV-2 variants, namely Alpha, Beta, Gamma, Delta, and so forth. The molecular mechanism underlying such surge is elusive due to the existence of 28 554 unique mutations, including 4 653 non-degenerate mutations on the spike protein. Understanding the molecular mechanism of SARS-CoV-2 transmission and evolution is a prerequisite to foresee the trend of emerging vaccine-breakthrough variants and the design of mutation-proof vaccines and monoclonal antibodies. We integrate the genotyping of 1 489 884 SARS-CoV-2 genomes, a library of 130 human antibodies, tens of thousands of mutational data, topological data analysis, and deep learning to reveal SARS-CoV-2 evolution mechanism and forecast emerging vaccine-breakthrough variants. We show that prevailing variants can be quantitatively explained by infectivity-strengthening and vaccine-escape (co-)mutations on the spike protein RBD due to natural selection and/or vaccination-induced evolutionary pressure. We illustrate that infectivity strengthening mutations were the main mechanism for viral evolution, while vaccine-escape mutations become a dominating viral evolutionary mechanism among highly vaccinated populations. We demonstrate that Lambda is as infectious as Delta but is more vaccine-resistant. We analyze emerging vaccine-breakthrough comutations in highly vaccinated countries, including the United Kingdom, the United States, Denmark, and so forth. Finally, we identify sets of comutations that have a high likelihood of massive growth: [A411S, L452R, T478K], [L452R, T478K, N501Y], [V401L, L452R, T478K], [K417N, L452R, T478K], [L452R, T478K, E484K, N501Y], and [P384L, K417N, E484K, N501Y]. We predict they can escape existing vaccines. We foresee an urgent need to develop new virus combating strategies.


Keywords: COVID-19; SARS-CoV-2; comutations; infectivity; vaccine-breakthrough; vaccine-resistant

Year: 2022 PMID: 35133792 PMCID: PMC8848511 DOI: 10.1021/acsinfecdis.1c00557

Source DB: PubMed Journal: ACS Infect Dis ISSN: 2373-8227 Impact factor: 5.084

The death toll of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exceeded 4.4 million as of August 2021. Tremendous efforts in combating SARS-CoV-2 have led to several authorized vaccines, which mainly target the viral spike (S) proteins. However, the emergence of mutations on the S gene has resulted in more infectious variants and vaccine breakthrough infections. Emerging vaccine breakthrough SARS-CoV-2 variants pose a grand challenge to the long-term control and prevention of the COVID-19 pandemic. Therefore, forecasting emerging breakthrough SARS-CoV-2 variants is of paramount importance for the design of new mutation-proof vaccines and monoclonal antibodies (mAbs). To predict emerging breakthrough SARS-CoV-2 variants, one must understand the molecular mechanism of viral transmission and evolution, which is one of the greatest challenges of our time. SARS-CoV-2 entry into a host cell depends on the binding between S protein and the host angiotensin-converting enzyme 2 (ACE2), primed by host transmembrane protease, serine 2 (TMPRSS2).[1] Such a process inaugurates the host’s adaptive immune response, and consequently antibodies are generated to combat the invading virus either through direct neutralization or non-neutralizing binding.[2,3] The S protein receptor-binding domain (RBD) is a short immunogenic fragment that facilitates the S protein binding with ACE2. Epidemiological and biochemical studies have suggested that the binding free energy (BFE) between the S RBD and the ACE2 is proportional to the infectivity.[1,4−7] Additionally, the strong binding between the RBD and mAbs leads to effective direct neutralization.[8−10] Therefore, RBD mutations have dominating impacts on viral infectivity, mAb efficacy, and vaccine protection rates. 
Mutations may occur for various reasons, including random genetic drift, replication error, polymerase error, host immune responses, gene editing, and recombinations.[11−15] Benefiting from the genetic proofreading mechanism regulated by NSP12 (a.k.a. RNA-dependent RNA polymerase) and NSP14,[16,17] SARS-CoV-2 has a higher fidelity in its replication process than other RNA viruses such as influenza. Nonetheless, nearly 700 non-degenerate mutations are observed on the RBD, contributing many key mutations in emerging variants, that is, N501Y for Alpha; K417N, E484K, and N501Y for Beta; K417T, E484K, and N501Y for Gamma; L452R and T478K for Delta; L452Q and F490S for Lambda; and so forth.[18] Given the importance of the RBD for SARS-CoV-2 infectivity, vaccine efficacy, and mAb effectiveness, it is imperative to understand the mechanism governing RBD mutations. In June 2020, when there were only 89 non-degenerate mutations on the RBD and the highest observed mutational frequency was only around 50 globally, we were able to show that natural selection underpins SARS-CoV-2 evolution based on the genotyping of 24 715 SARS-CoV-2 sequences isolated from patients and a topology-based deep learning model for RBD-ACE2 binding analysis.[19] In the same work, we predicted that RBD residues 452 and 501 “have high chances to mutate into significantly more infectious COVID-19 strains”.[19] Currently, these residues are the key mutational sites of all prevailing SARS-CoV-2 variants. We further foresaw a list of 1149 most likely RBD mutations among 3686 possible RBD mutations.[19] To date, every one of the observed 683 RBD mutations belongs to the list. 
In April 2021, we demonstrated that all of the 100 most observed RBD mutations of 651 existing RBD mutations from 506 768 viral genomes had enhanced the binding between RBD and ACE2, resulting in more infectious variants.[18] The odds for these 100 most observed mutations to be there accidentally are smaller than one chance in 1.2 nonillion ($2^{100} \approx 1.2 \times 10^{30}$). (Note: The average BFE change of 1149 RBD mutations for the RBD-ACE2 complex is −0.28 kcal/mol. Randomly, each RBD mutation has a 50% chance to assume a BFE change above or below −0.28 kcal/mol, which leads to $2^{100} = 1.267651 \times 10^{30}$ possible states for 100 mutations.) There is no doubt that natural selection via viral infectivity, rather than any other competing theories,[11−15] is the dominating mechanism for SARS-CoV-2 transmission and evolution. This mechanistic discovery lays the foundation for forecasting future emerging SARS-CoV-2 variants. Understanding SARS-CoV-2 variant threats to current vaccines and mAbs is another urgent issue facing the scientific community.[20] The World Health Organization (WHO) identified variants of concern (VOCs) and variants of interest (VOIs). The former describes variants that have an increment in the transmissibility and virulence or adversely affect the effectiveness of vaccines, therapeutics, and diagnostics with clear clinical correlation evidence. The latter describes variants that carry genetic changes, which are predicted or known to reduce neutralization by antibodies generated against vaccination, the efficacy of treatments, and affect transmissibility, virulence, disease severity, immune escape, diagnostics, and so forth, which cause significant community transmission and suggest an emerging risk to the public. 
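The size of the state space quoted in the note above can be checked with exact integer arithmetic. The following is a minimal sketch of that arithmetic only; the variable names are illustrative, and the 50% per-mutation chance is the assumption stated in the note, not something the code derives.

```python
from fractions import Fraction

# Assumption taken from the note: each of the 100 most observed mutations
# independently has a 50% chance of a BFE change above the average.
n_mutations = 100

# Number of equally likely above/below outcomes for 100 mutations.
n_states = 2 ** n_mutations

# Chance that all 100 fall on the binding-strengthening side by accident.
p_accidental = Fraction(1, n_states)

print(f"{n_states:.6e}")       # scientific notation of 2**100
print(float(p_accidental))     # corresponding probability
```

Python integers are arbitrary precision, so `2 ** 100` is exact and only the final formatting rounds.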
Currently, WHO listed four VOCs, that is, variants B.1.1.7 (Alpha),[21−23] B.1.351 (Beta),[22,24] P.1 (Gamma),[22] and B.1.617.2 (Delta),[25] and five VOIs, that is, variants B.1.525 (Eta),[26] B.1.526 (Iota),[26,27] B.1.617.1 (Kappa),[28] C.37 (Lambda),[29] and B.1.621 (Mu) (a general introduction about the prevailing and emerging variants is given in Section S1 of the Supporting Information). Our hypothesis is that the severity of variants to infectivity, vaccine efficacy, and mAbs effectiveness depends mainly on how the associated RBD mutations impact the binding with ACE2 and antibodies. On the basis of this hypothesis, we collected and analyzed a library of antibodies and unveiled that most of the RBD mutations would weaken the binding of S protein and antibodies and disrupt the efficacy and reliability of antibody therapies and vaccines.[20] We predicted “the urgent need to develop new mutation-resistant vaccines and antibodies and prepare for seasonal vaccination” in early 2021.[20] We further identified vaccine-escape (i.e., vaccine-breakthrough) mutations and fast-growing mutations.[18] Our predictions of the threats from VOCs and VOIs were in great agreement with experimental data.[30] The objective of this work is to forecast emerging SARS-CoV-2 variants that pose an imminent threat to combating COVID-19 and long-term public health. To this end, we carry out an RBD-specific analysis of SARS-CoV-2 comutations involving a wide variety of combinations of 683 unique single mutations on the RBD. We take a unique approach that integrates viral genotyping of 1 489 884 complete genome sequences isolated from patients, algebraic topology algorithms that won the worldwide competition in computer-aided drug discovery,[31] deep learning models trained with tens of thousands of mutational data points,[20,30] and a library of 130 SARS-CoV-2 antibody structures. 
By analyzing the frequency, binding free energy (BFE) changes, and antibody disruption counts of RBD comutations, we reveal that nine RBD comutation sets, namely [L452R, T478K], [L452Q, F490S], [E484K, N501Y], [F490S, N501Y], [S494P, N501Y], [K417T, E484K, N501Y], [K417N, L452R, T478K], [K417N, E484K, N501Y], and [P384L, K417N, E484K, N501Y], may strongly disrupt existing vaccines and mAbs with relatively high infectivity and transmissibility among the populations. We predict that low-frequency comutation sets [A411S, L452R, T478K], [L452R, T478K, N501Y], [V401L, L452R, T478K], and [L452R, T478K, E484K, N501Y] are on the path to become dangerous new variants. The associated new mutations, P384L, V401L, and A411S, call for the new design of boosting vaccines and mAbs.

Results

Vaccine-Breakthrough S Protein RBD Mutations

To understand the molecular mechanisms of vaccine-escape mutations, we analyze single nucleotide polymorphisms (SNPs) of 1 489 884 complete SARS-CoV-2 genome sequences, resulting in 683 non-degenerate RBD mutations and their associated frequencies. A full set of mutation information is available on our interactive web page Mutation Tracker (https://users.math.msu.edu/users/weig/SARS-CoV-2_Mutation_Tracker.html, accessed August 5, 2021). The infectivity of each mutation is mainly determined by the mutation-induced BFE change to the binding complex of RBD and ACE2. To estimate the impact of each mutation on vaccines, we collect a library of 130 antibody structures (Supporting Information S2.1.2), including Food and Drug Administration (FDA)-approved mAbs from Eli Lilly and Regeneron. For a given RBD mutation, its number of antibody disruptions is given by the number of antibodies whose mutation-induced antibody-RBD BFE changes are smaller than −0.3 kcal/mol (a list of names for antibodies that are disrupted by mutations can be found in the Supporting Information S2.1.1). BFE changes following mutations are predicted by our deep learning model, TopNetTree.[32] We have created an interactive web page, Mutation Analyzer (https://weilab.math.msu.edu/MutationAnalyzer/, accessed August 5, 2021), to list all RBD mutations, their observed frequencies, their RBD-ACE2 BFE changes following mutations, their number of antibody disruptions, and various ranks. Figure 1 illustrates RBD mutations associated with prevailing SARS-CoV-2 variants, time evolution trajectories of all RBD mutations, and the BFE changes of RBD-ACE2 and 130 RBD-antibodies induced by 75 significant mutations. A summary of our analysis is given in Table 1.

Figure 1

Table 1

Top 25 Most Observed S Protein RBD Mutations

mutation	worldwide count	worldwide rank	BFE change (kcal/mol)	BFE change rank	antibody disruption count	antibody disruption ratio (%)	antibody disruption rank
N501Y	744354	1	0.5499	30	24	18.46	160
L452R	259345	2	0.5752	28	39	30.0	98
T478K	239619	3	0.9994	2	2	1.54	557
E484K	84167	4	0.0946	272	38	29.23	104
K417T	37748	5	0.0116	433	37	28.46	107
S477N	32673	6	0.0180	422	0	0.0	650
N439K	16154	7	0.1792	159	11	8.46	272
K417N	8399	8	0.1661	176	53	40.77	61
F490S	5617	9	0.4406	52	51	39.23	67
S494P	5119	10	0.0902	282	62	47.69	46
N440K	3379	11	0.6161	22	0	0.0	645
E484Q	3229	12	0.0057	442	30	23.08	130
L452Q	2858	13	0.9802	3	27	20.77	144
A520S	2727	14	0.1495	199	3	2.31	497
N501T	2054	15	0.4514	48	17	13.08	202
R357K	1973	16	0.1393	208	5	3.85	388
A522S	1959	17	0.1283	221	2	1.54	543
R346K	1686	18	0.1234	229	6	4.62	380
V367F	1395	19	0.1764	161	0	0.0	637
N440S	1361	20	0.1499	197	2	1.54	542
P384L	1155	21	0.2681	105	18	13.85	199
Y449S	1146	22	–0.8112	632	85	65.38	16
D427N	1106	23	–0.1133	558	1	0.77	589
R346S	1037	24	0.0374	386	20	15.38	182
A475V	891	25	0.3069	94	10	7.69	289

Here, BFE change refers to the BFE change for the S protein and human ACE2 complex induced by a single-site S protein RBD mutation. A positive mutation-induced BFE change strengthens the binding between S protein and ACE2, which results in more infectious variants. Counts of antibody disruption represent the number of antibody and S protein complexes disrupted by a specific RBD mutation. Here, an antibody and S protein complex is considered disrupted if its binding affinity is reduced by more than 0.3 kcal/mol.[18] In addition, we calculate the antibody disruption ratio (%), which is the ratio of the number of disrupted antibody and S protein complexes over 130 known complexes. Ranks are computed from 683 observed RBD mutations.
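The disruption count and ratio defined in the table note can be sketched in a few lines. This is an illustration of the stated rule only (a complex counts as disrupted when its BFE change is below −0.3 kcal/mol, and the ratio is taken over 130 complexes); the function names and the toy input list are hypothetical and are not taken from the authors' code.

```python
# Threshold and library size as stated in the text.
DISRUPTION_THRESHOLD = -0.3   # kcal/mol
N_ANTIBODY_COMPLEXES = 130

def disruption_count(antibody_bfe_changes, threshold=DISRUPTION_THRESHOLD):
    """Count antibody-RBD complexes whose BFE change is below the threshold."""
    return sum(1 for change in antibody_bfe_changes if change < threshold)

def disruption_ratio(count, total=N_ANTIBODY_COMPLEXES):
    """Antibody disruption ratio in percent."""
    return 100.0 * count / total

# Toy per-antibody BFE changes (hypothetical values, kcal/mol).
toy_changes = [-0.50, -0.30, 0.10, -0.31]
print(disruption_count(toy_changes))        # only values strictly below -0.3 count

# Ratios recomputed from counts listed in Table 1.
for mutation, count in [("N501Y", 24), ("L452R", 39), ("K417N", 53)]:
    print(mutation, round(disruption_ratio(count), 2))
```

Recomputing the ratio column from the count column this way is a quick consistency check on the table.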

Figure 1. Most significant RBD mutations. (a) The 3D structure of SARS-CoV-2 S protein RBD and ACE2 complex (PDB ID: 6M0J). The RBD mutations in 10 variants are marked with color. (b) Illustration of the time evolution of 455 ACE2 binding-strengthening RBD mutations (blue) and 228 ACE2 binding-weakening RBD mutations (red). The x-axis represents the date and the y-axis represents the natural log of frequency. There has been a surge in the number of infections since early 2021. (c) BFE changes of RBD complexes with ACE2 and 130 antibodies induced by 75 significant RBD mutations. A positive BFE change (blue) means the mutation strengthens the binding, while a negative BFE change (red) means the mutation weakens the binding. Most mutations, except for vaccine-resistant Y449H and Y449S, strengthen the RBD binding with ACE2. Y449S and K417N are highly disruptive to antibodies.

First, the 10 most observed or fast-growing RBD mutations are N501Y, L452R, T478K, E484K, K417T, S477N, N439K, K417N, F490S, and S494P, as shown in Table 1. 
All of these top mutations strengthen the binding with ACE2 (positive BFE changes) and thus make the virus more infectious, following the natural selection mechanism.[19] Figure 1b shows that the frequencies of the top three mutations have increased dramatically since 2021 due to Alpha, Beta, Gamma, Delta, and other variants. Second, among the top 25 most observed RBD mutations, T478K, L452Q, N440K, L452R, N501Y, N501T, F490S, A475V, and P384L are the nine most infectious ones judged by their ability to strengthen the binding with ACE2, as shown in Figure 1c. The BFE change of S protein and ACE2 for mutation T478K is nearly 1.00 kcal/mol, which strongly enhances the binding of the RBD–ACE2 complex.[33] Together with L452R (BFE change: 0.58 kcal/mol), T478K makes Delta the most infectious variant among the VOCs. Third, among the top 25 most observed RBD mutations, Y449S, S494P, K417N, F490S, L452R, E484K, K417T, E484Q, L452Q, and N501Y are the 10 most antibody disruptive ones, judged by their interactions with 130 antibodies shown in Figure 1c. It can be seen that mutations L452R, K417N, F490S, and S494P disrupt 30% or more of antibody-RBD complexes, while mutations E484K and K417T disrupt nearly 30% of antibody-RBD complexes, indicating their ability to undermine the efficacy and reliability of antibody therapies and vaccines. The most dangerous mutations are the ones that are both infectivity-strengthening and antibody disruptive. Four RBD mutations, N501Y, L452R, F490S, and L452Q, appear in both lists and are key mutations in WHO’s VOC and VOI lists. Among them, F490S and L452Q are the key RBD mutations in Lambda, making Lambda a more dangerous emerging variant than Delta. Note that high-frequency mutation S477N does not significantly weaken any antibody and RBD binding and thus does not appear in any prevailing variants.

Vaccine-Breakthrough S Protein RBD Comutations

The recent surge in COVID-19 infections is due to the occurrence of RBD comutations that combine two or more infectivity-strengthening mutations. The most dangerous future SARS-CoV-2 variants must be RBD comutations that combine infectivity-strengthening mutation(s) with antibody disruptive mutation(s). A list of 1 139 244 RBD comutations that are decoded from 1 489 884 complete SARS-CoV-2 genome sequences can be found in Section S2.1.3 of the Supporting Information, and all of the non-degenerate RBD comutations with their frequencies, antibody disruption counts, total BFE changes, and the first detection dates and countries can be found in Section S2.1.4 of the Supporting Information. Figure 2 illustrates the properties of S protein RBD 2, 3, and 4 comutations. The height of each bar shows the predicted total BFE change of each set of comutations on RBD, the color represents the natural log of frequency for each set of RBD comutations, and the number at the top of each bar is the AI-predicted number of antibody-RBD complexes that each set of RBD comutations may disrupt based on a total of 130 RBD and antibody complexes. Notably, for a specific set of comutations, the higher the number at the top of the bar, the stronger its ability to break through vaccines. From Figure 2a, RBD 2 comutation set [L452R, T478K] (Delta variant) has the highest frequency (219 362) and the highest BFE change (1.575 kcal/mol). Moreover, the Delta variant would disrupt 40 antibody-RBD complexes, suggesting that Delta would not only enhance the infectivity but also be a vaccine breakthrough variant. In addition, [L452Q, F490S] (Lambda) is another comutation with high frequency, high BFE change (1.421 kcal/mol), and high antibody disruption count (59). Lambda is considered to be more dangerous than Delta due to its higher antibody disruption count. Further, [R346K, E484K, N501Y] (Mu variant) has a BFE change of 0.768 kcal/mol and high antibody disruption count (60). 
It is not as infectious as Delta and Lambda, but has a similar ability as Lambda in escaping vaccines. Note that among all VOCs and VOIs, Beta has the highest ability to break through vaccines, but its infectivity is relatively low (BFE change: 0.656 kcal/mol). Furthermore, high-frequency 2 comutation sets [E484K, N501Y], [F490S, N501Y], and [S494P, N501Y] are all considered to be emerging variants that have the potential to escape vaccines. From Figure 2b, 3 comutation sets [R346K, E484K, N501Y] (Mu), [K417T, E484K, N501Y] (Gamma), and [K417N, E484K, N501Y] (Beta) draw our attention. They are all prevailing 3 comutations with moderate BFE changes but very high antibody disruption counts (more than 60). With a BFE change of 1.4 kcal/mol and antibody disruption count of 82, comutation set [K417N, L452R, T478K] (Delta plus) appears to be more dangerous than all of the current VOCs and VOIs. For 4 comutations in Figure 2c, [P384L, K417N, E484K, N501Y] (Beta plus) could penetrate all vaccines due to its highest antibody disruption count of 101. We note that all of the comutation sets, except for [Y449S, N501Y] in Figure 2, have positive BFE changes, consistent with natural selection. We anticipate that although comutation sets [V401L, L452R, T478K], [L452R, T478K, N501Y], [A411S, L452R, T478K], and [L452R, T478K, E484K, N501Y] have relatively low frequencies at this point, they may become dangerous variants soon due to their large BFE changes and antibody disruption counts.
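Several of the total BFE changes quoted above can be reproduced by summing single-mutation values from Table 1, in line with the additivity assumption stated in the Discussion. The sketch below is only an illustration of that sum: the dictionary copies a few Table 1 entries, the function name is hypothetical, and it does not stand in for the authors' TopNetTree predictions.

```python
# Single-mutation RBD-ACE2 BFE changes (kcal/mol), copied from Table 1.
SINGLE_BFE_CHANGE = {
    "N501Y": 0.5499,
    "L452R": 0.5752,
    "T478K": 0.9994,
    "E484K": 0.0946,
    "L452Q": 0.9802,
    "F490S": 0.4406,
    "R346K": 0.1234,
}

def total_bfe_change(comutation):
    """Additive estimate: sum the single-mutation BFE changes of a comutation set."""
    return sum(SINGLE_BFE_CHANGE[mutation] for mutation in comutation)

named_sets = {
    "Delta":  ["L452R", "T478K"],
    "Lambda": ["L452Q", "F490S"],
    "Mu":     ["R346K", "E484K", "N501Y"],
}
for name, comutation in named_sets.items():
    print(name, round(total_bfe_change(comutation), 3))
```

Under this additive reading, the sums for Delta, Lambda, and Mu round to the 1.575, 1.421, and 0.768 kcal/mol quoted in the text.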

Figure 2

Properties of RBD comutations. (a) Illustration of RBD 2 comutations with a frequency greater than 90. (b) Illustration of RBD 3 comutations with a frequency greater than 30. (c) Illustration of RBD 4 comutations with a frequency greater than 20. Here, the x-axis lists RBD comutations and the y-axis represents the predicted total BFE change between S RBD and ACE2 of each set of RBD comutations. The number on the top of each bar is the AI-predicted number of antibody and RBD complexes that may be significantly disrupted by the set of RBD comutations, and the color of each bar represents the natural log of frequency for each set of RBD comutations. (Please check the interactive HTML files in the Supporting Information S2.2.4 for a better view of these plots.)

It is important to understand the general trend of SARS-CoV-2 evolution. To this end, we carry out a statistical analysis of RBD comutations. Among 1 489 884 SARS-CoV-2 genome isolates, a total of 1113 distinctive 2 comutations, 612 distinctive 3 comutations, and 217 distinctive 4 comutations are found. Figure 3a–c illustrate the 2D histograms of 2, 3, and 4 comutations, respectively. The x-axis is the antibody disruption count, and the y-axis shows the total BFE change. Figure 3a shows that there are 82 RBD 2 comutations that have BFE changes in the range of [0.600, 0.799] kcal/mol and will disrupt 40 to 49 antibodies. According to Figure 3b, there are 170 unique 3 comutations that have large BFE changes of S protein and ACE2 in the range of [1.500, 1.999] kcal/mol. In Figure 3c, it is seen that almost all of the 4 comutations on RBD have BFE changes greater than 0.5 kcal/mol and weaken the binding of S protein with at least 60 antibodies. Figure 3d–f are the histograms of total BFE changes, natural log of frequencies, and antibody disruption counts for RBD 2, 3, and 4 comutations. 
It can be found that most of the 2, 3, and 4 RBD comutations have positive total BFE changes, and the more mutations an RBD comutation set contains, the higher its antibody disruption count tends to be. In summary, comutations with larger antibody disruption counts and high BFE changes will grow faster. We anticipate that when most of the population is vaccinated, vaccine-resistant mutations will become a more viable mechanism for viral evolution.

Figure 3

(a) Two-dimensional histograms of antibody disruption count and total BFE changes for 2 comutations (unit: kcal/mol). (b) Two-dimensional histograms of antibody disruption count and total BFE changes (unit: kcal/mol) for RBD 3 comutations. (c) Two-dimensional histograms of antibody disruption count and total BFE changes (unit: kcal/mol) for RBD 4 comutations. (d) The histograms of total BFE changes (unit: kcal/mol) for RBD comutations. (e) The histograms of the natural log of frequency for RBD comutations. (f) The histograms of antibody disruption count for RBD comutations. In panels a–c, the color bar represents the number of comutations that fall into the restriction of x-axis and y-axis. The reader is referred to the web version of these plots in the Supporting Information S2.2.2 and S2.2.3.
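The two-dimensional histograms in panels a–c amount to counting comutation sets per (antibody disruption count, total BFE change) cell. The sketch below shows one way to do that binning in plain Python. The bin widths (10 antibodies, 0.2 kcal/mol) are assumptions chosen to match the ranges quoted in the text for 2 comutations, and the records are made-up toy values, not data from the paper.

```python
from collections import Counter

def bin_comutations(records, count_width=10, bfe_width_milli=200):
    """Count comutation sets per (disruption-count bin, BFE-change bin).

    records: iterable of (antibody_disruption_count, total_bfe_change_kcal).
    BFE changes are converted to integer milli-kcal/mol before binning so that
    edge values such as 0.600 land in the intended bin despite float rounding.
    """
    histogram = Counter()
    for disruption_count, bfe_change in records:
        count_bin = disruption_count // count_width
        bfe_bin = int(round(bfe_change * 1000)) // bfe_width_milli
        histogram[(count_bin, bfe_bin)] += 1
    return histogram

# Hypothetical toy records: (antibody disruption count, total BFE change).
toy_records = [(45, 0.650), (41, 0.799), (40, 0.600), (52, 0.650)]
histogram = bin_comutations(toy_records)

# Cell (4, 3) covers 40-49 antibodies and [0.600, 0.799] kcal/mol.
print(histogram[(4, 3)], histogram[(5, 3)])
```

A plotting library would normally do this binning internally; the explicit version only makes the cell boundaries visible.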

Emerging Breakthrough Variants in COVID-19 Devastated Countries

Our analysis of RBD mutations reveals the recent global surge of infections due to RBD comutations. However, due to differences in vaccination rates, COVID-19 control and prevention measures, medical infrastructure, population structures, and so forth, each country may have a different pattern of RBD comutations and follow a different trajectory of SARS-CoV-2 transmission and evolution. Therefore, we analyze the RBD 2, 3, and 4 comutations in 20 countries that have a high number of SARS-CoV-2 genome isolates, including the United Kingdom (UK), the United States (US), Denmark (DK), Brazil (BR), Germany (DE), Netherlands (NL), Sweden (SE), Italy (IT), Canada (CA), France (FR), India (IN), and Belgium (BE), as well as Ireland (IE), Spain (ES), Chile (CL), Portugal (PT), Mexico (MX), Singapore (SG), Turkey (TR), and Finland (FL). Figure 4 shows the time evolution of 2, 3, and 4 comutations on the S protein RBD of SARS-CoV-2 from January 01, 2021, to July 31, 2021, in 12 COVID-19 devastated countries. The plots of the other eight countries can be found in the Supporting Information S3. The top five high-frequency comutations in each country are marked by red, blue, green, yellow, and pink lines. The cyan line is for the RBD comutation set [L452Q, F490S] on the Lambda variant, which is more penetrative to vaccines than Delta. Light gray lines mark the other comutations. The RBD comutation set [L452R, T478K] (Delta) with 1.575 kcal/mol BFE change was first found in IN in early January 2021, and the number of this variant increased rapidly around the world in a short period. Later on, in early March 2021, the UK, US, DK, DE, NL, SE, IT, FR, and BE reported the appearance of [L452R, T478K], and eventually [L452R, T478K] became a dominant comutation, which is consistent with the finding that Delta variant remains largely susceptible to infection. 
The comutation set [K417T, E484K, N501Y] (Gamma) with BFE change of 0.656 kcal/mol was first found in Brazil in early January 2021, and then it became the most dominant comutation in Brazil and Canada, and the second dominant comutation in the US, NL, SE, IT, FR, IN, and BE. Notably, comutation set [G446V, L452R, T478K] in the UK with BFE change of 1.733 kcal/mol and 46 antibody disruption counts appears to be a dangerous set of comutations that may affect the infectivity and vaccine/antibody efficacy shortly. Moreover, comutation set [N501Y, A520S] has quickly increased in IN and BE since April 16, 2021. Considering that the BFE change and antibody disruption count of comutation set [N501Y, A520S] are 0.699 kcal/mol and 27, respectively, we suggest monitoring this variant in IN and BE. Furthermore, the comutation set [K417N, T470N, E484K, N501T], which was first found in BR on April 06, 2020 and has a BFE change of 0.625 kcal/mol and an antibody disruption count of 84, is an emerging vaccine breakthrough comutation in Brazil. In addition, comutation set [L452Q, F490S] (cyan lines) on the Lambda variant has recently drawn much attention due to its potential ability to resist vaccines and enhance the infectivity, which is consistent with our predictions that comutation set [L452Q, F490S] has a relatively significant BFE change of S protein and ACE2 (1.421 kcal/mol) and would reduce the RBD binding with 59 antibodies. Lambda has already spread to every country in Figure 4.

Figure 4

Illustration of the time evolution of 2, 3, and 4 comutations on the S protein RBD of SARS-CoV-2 from January 01, 2021, to July 31, 2021, in 12 COVID-19 devastated countries: the United Kingdom (UK), the United States (US), Denmark (DK), Brazil (BR), Germany (DE), Netherlands (NL), Sweden (SE), Italy (IT), Canada (CA), France (FR), India (IN), and Belgium (BE). The y-axis represents the natural log frequency of each RBD comutation. The top five high-frequency comutations in each country are marked by red, blue, green, yellow, and pink lines. The cyan line is for the RBD comutation [L452Q, F490S] on the Lambda variant, and the other comutations are marked by light gray lines. Notably, there are two blue lines in the panel of FR due to the same frequency of [K417N, E484K, N501Y] and [E484K, N501Y]. (Please check the interactive HTML files in the Supporting Information S2.2.1 for a better view of these plots.)

Discussion

Although our predictions correlate highly with experimental data, some existing limitations may hinder us from speeding up the calculation or improving the performance. First, the number of complete SARS-CoV-2 sequences increases rapidly. Usually, it takes a few days to decode SNPs from hundreds of thousands of complete SARS-CoV-2 sequences. Second, we assume that the RBD mutations in our model are independent. Therefore, our predicted BFE changes for multiple RBD mutations are additive. This assumption is a good approximation for a few isolated RBD mutations. Most of the VOCs and VOIs involve no more than three isolated RBD comutations. However, the Omicron variant has 15 RBD comutations, for which the validity of our method was examined elsewhere.[34] Typically, a 3D mutant structure of the binding complex is the key component to further improve the prediction accuracy for spatially correlated multiple comutations.

Methods

In this section, we first introduce the workflow of deep learning-based predictions of mutation-induced BFE changes of protein–protein interactions for the present SARS-CoV-2 variant analysis and prediction. As shown in Figure 5, the workflow includes four steps: (1) data preprocessing; (2) training data preparation; (3) feature generation for protein–protein interaction complexes; (4) prediction of protein–protein interactions by deep neural networks (see Section S5 in Supporting Information). Next, we present the validation of our machine learning-based model, which suggests consistent and reliable results compared with experimental deep mutational data.

Figure 5

(a) Illustration of genome sequence data preprocessing and BFE change predictions. (b) Comparison of experimental CT-P59 IC50 fold change (reduction)[35] and predicted BFE changes induced by mutations L452R and T478K. (c) Comparison of predicted BFE changes and relative luciferase units[25] for pseudovirus infection changes of ACE2 and S protein complex induced by mutations L452R and N501Y.

Data Preprocessing and SNP Genotyping

The first step is to preprocess the original SARS-CoV-2 sequence data. In this step, a total of 1 489 884 complete SARS-CoV-2 genome sequences with high coverage and exact collection dates are downloaded from the GISAID database[36] (https://www.gisaid.org/) as of August 05, 2021. Next, the 1 489 884 complete SARS-CoV-2 genome sequences are rearranged according to the reference genome downloaded from GenBank (NC_045512.2),[37] and multiple sequence alignment (MSA) is applied by using Clustal Omega with default parameters. Then, single nucleotide polymorphism (SNP) genotyping is applied to measure the genetic variations between different isolates of SARS-CoV-2 by analyzing the rearranged sequences,[38,39] which is of paramount importance for tracking the genotype changes during the pandemic. The SNP genotyping captures all of the differences between patients’ sequences and the reference genome, which decodes a total of 28 478 unique single mutations from 1 489 884 complete SARS-CoV-2 genome sequences. Among them, 4653 non-degenerate mutations on the S protein and 683 non-degenerate mutations on the S protein RBD (S protein residues from 329 to 530) are detected. In this work, the comutation analysis is more crucial than the unique single mutation analysis. Therefore, for each SARS-CoV-2 isolate, we extract all of the mutations on the S protein RBD, which together are called the RBD comutation of that isolate. By doing this, a total of 1 139 244 RBD comutations are captured. Notably, the SARS-CoV-2 unique single mutations observed worldwide are available at Mutation Tracker (https://users.math.msu.edu/users/weig/SARS-CoV-2_Mutation_Tracker.html, accessed August 5, 2021). The analysis of RBD mutations is available at Mutation Analyzer (https://weilab.math.msu.edu/MutationAnalyzer/, accessed August 5, 2021).
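The comutation extraction step described above can be sketched as follows, assuming the SNP genotyping has already produced, for each isolate, a list of S protein amino-acid substitutions written like `L452R`. That input format, the function names, and the toy isolates are assumptions made for illustration; the authors' actual pipeline is not shown in this section.

```python
import re
from collections import Counter

# RBD residue range as stated in the text (S protein residues 329 to 530).
RBD_START, RBD_END = 329, 530
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")

def rbd_comutation(spike_mutations):
    """Return the isolate's RBD mutations as a tuple sorted by residue number."""
    on_rbd = []
    for mutation in spike_mutations:
        match = SUBSTITUTION.fullmatch(mutation)
        if match and RBD_START <= int(match.group(2)) <= RBD_END:
            on_rbd.append((int(match.group(2)), mutation))
    return tuple(mutation for _, mutation in sorted(on_rbd))

def comutation_frequencies(isolates):
    """Count how many isolates carry each distinct RBD comutation."""
    frequencies = Counter()
    for spike_mutations in isolates:
        comutation = rbd_comutation(spike_mutations)
        if comutation:               # skip isolates without RBD mutations
            frequencies[comutation] += 1
    return frequencies

# Hypothetical toy isolates (lists of S protein substitutions).
toy_isolates = [
    ["D614G", "L452R", "T478K"],
    ["T478K", "P681R", "L452R"],
    ["N501Y", "D614G"],
    ["D614G"],
]
frequencies = comutation_frequencies(toy_isolates)
print(frequencies.most_common())
```

Sorting by residue number gives each comutation a canonical key, so the same set of mutations is counted once regardless of the order in which the genotyping reported them.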

Methods for BFE Change Predictions

In this section, the process of the machine learning-based BFE change predictions is introduced. Once the data preprocessing and SNP genotyping are carried out, we first prepare the training data, which plays a key role in reliability and accuracy. A library of 130 antibody-RBD complexes as well as an ACE2-RBD complex is obtained from the Protein Data Bank (PDB). RBD mutation-induced BFE changes of these complexes are evaluated by the following machine learning model. Notably, the BFE change is defined as $\Delta\Delta G_{\mathrm{Bind}} = \Delta G_{\mathrm{Bind}}^{\mathrm{WT}} - \Delta G_{\mathrm{Bind}}^{\mathrm{MT}}$, where $\Delta G_{\mathrm{Bind}}^{\mathrm{WT}}$ is the BFE of the wild type (WT) of an S RBD-ACE2 or RBD-antibody complex, and $\Delta G_{\mathrm{Bind}}^{\mathrm{MT}}$ is the BFE of the mutant type (MT) of the same complex. Because of the urgency of the pandemic and the rapid evolution of RNA viruses, large experimental BFE change data sets for SARS-CoV-2 are rare, whereas next-generation sequencing data are relatively easy to collect. In the training process, the mutation-induced BFE changes in the SKEMPI 2.0 data set[40] are used as the basic training set, while next-generation sequencing data sets are added as assistant training sets. SKEMPI 2.0 contains 7085 single- and multipoint mutation entries, of which 4169 entries in 319 different protein complexes are used for training the machine learning model. The mutational scanning data consist of experimental data on ACE2-RBD binding with mutations on ACE2[41] and on the RBD,[42,43] and on CTC-445.2-RBD binding with mutations on both proteins.[43] Next, feature generation for the protein–protein interaction complexes is performed.
The element-specific algebraic topological analysis on complex structures is implemented to generate topological bar codes.[30,44−46] In addition, biochemical and biophysical features such as Coulomb interactions, surface areas, and electrostatics are combined with the topological features.[20] Detailed information about the topology-based models is given in Section 4.3. Lastly, deep neural networks for SARS-CoV-2 are constructed for the BFE change prediction of protein–protein interactions.[30] Detailed descriptions of the data sets and the machine learning model can be found in the literature[19,30,47] and are available at TopNetmAb (https://github.com/WeilabMSU/TopNetmAb, accessed August 5, 2021).[148] Moreover, it is worth noting that the total BFE changes are proportional to the transmissibility/infectivity of a given variant. Although the total BFE changes reported in this work are small (no more than 2 kcal/mol), they substantially affect transmissibility. Generally, antiviral activity is measured by comparing infection levels in untreated cultures with those in antibody-treated cultures and is reported as the IC50 (the half-maximal inhibitory concentration).[48] The IC50 varies depending on the form of infection and the cell lines used, indicating that it can reveal transmissibility. Notably, IC50 is approximately equal to the dissociation constant ($K_D$).[49] In addition, the binding free energy $\Delta G$ is equal to $RT\ln(K_D)$, where $R$ is the gas constant with a value of 1.987 cal K$^{-1}$ mol$^{-1}$ and $T$ is the temperature of the reaction in kelvin.[50] Therefore, if $\Delta G_{\mathrm{Bind}}^{\mathrm{MT}}$ exceeds $\Delta G_{\mathrm{Bind}}^{\mathrm{WT}}$ by $k$ (in the same energy units as $RT$), then the IC50 of the mutant type is $e^{k/(RT)}$ times the IC50 of the wild type. In other words, by this argument the mutant variant is $e^{k/(RT)}$ times more transmissible than the original variant.
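To make the scale of this exponential relationship concrete, the following sketch converts a difference in binding free energy into the corresponding ratio of dissociation constants, using only the relation ΔG = RT ln(KD) and the sign convention ΔΔG = ΔG(WT) − ΔG(MT) stated in the text. The function names and the default temperature of 298.15 K are assumptions made for illustration; the text does not specify a temperature.

```python
import math

# Gas constant from the text, 1.987 cal K^-1 mol^-1, converted to kcal.
R_KCAL = 1.987e-3


def bfe_change(dg_wt, dg_mt):
    """BFE change with the sign convention of the text: WT minus MT."""
    return dg_wt - dg_mt


def kd_fold_change(delta_g_diff, temperature=298.15):
    """Ratio of dissociation constants implied by dG = RT ln(K_D) when two
    complexes differ in binding free energy by delta_g_diff (kcal/mol).
    The default temperature is an assumption, not a value from the text."""
    return math.exp(delta_g_diff / (R_KCAL * temperature))


# A difference of 2 kcal/mol, the upper bound mentioned in the text.
fold_at_2_kcal = kd_fold_change(2.0)
```

At 298.15 K, RT is about 0.59 kcal/mol, so a 2 kcal/mol difference corresponds to a roughly 29-fold change in KD. This is why BFE changes that look small in absolute terms can still matter.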

Feature Generation for Machine Learning Model

Among all features generated for the machine learning prediction, those derived from algebraic topology are the most important to the model. The remaining inputs are called auxiliary features and are described in Section S4 of the Supporting Information. This section gives a brief introduction to the underlying topological theory. Algebraic topology[44,45] has achieved tremendous success in many fields, including the prediction of biochemical and biophysical properties.[46] Biological applications require special treatment to describe element types and amino acids in polypeptides mathematically, which leads to element-specific and site-specific persistent homology.[19,32] To construct the algebraic topological features of a protein–protein interaction model, a series of element subsets of the complex structure is defined: atoms of the mutation site, atoms in the neighborhood of the mutation site within a certain distance, atoms of the antibody binding site, atoms of the antigen binding site, and atoms in the system that belong to one of the element types {C, N, O}. Under the element/site-specific construction, simplicial complexes are constructed on the point clouds formed by atoms. Consider a set of $k+1$ independent points from one element/site-specific set, $U = \{u_0, u_1, \ldots, u_k\}$. The $k$-simplex $\sigma$ is the convex hull of the $k+1$ independent points in $U$, that is, the set of their convex combinations. For example, a 0-simplex is a point and a 1-simplex is an edge. An $m$-face of the $k$-simplex is the convex hull of a subset of $m+1$ of its $k+1$ vertices, with $m < k$. The boundary of a $k$-simplex $\sigma$ is the sum of all its $(k-1)$-faces, $\partial_k \sigma = \sum_{i=0}^{k} (-1)^i \langle u_0, \ldots, \hat{u}_i, \ldots, u_k \rangle$, where $\langle u_0, \ldots, \hat{u}_i, \ldots, u_k \rangle$ consists of all vertices of $\sigma$ excluding $u_i$. A collection of finitely many simplices is a simplicial complex.
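In the model, the Vietoris-Rips (VR) complex (a $k$-simplex is included if and only if $\mathbb{B}(u_j, r) \cap \mathbb{B}(u_{j'}, r) \neq \emptyset$ for all $j, j' \in [0,k]$, where $\mathbb{B}(u, r)$ is the ball of radius $r$ centered at $u$) is used for dimension 0 topology, and the alpha complex (a simplex is included if and only if the radius-$r$ balls centered at its vertices, each restricted to its Voronoi cell, have a common intersection) is used for dimension 1 and 2 topology of the point cloud.[46] A $k$-chain $c$ of a simplicial complex $K$ is a formal sum of the $k$-simplices in $K$, $c = \sum_i \alpha_i \sigma_i$, where the $\alpha_i$ are coefficients drawn from a chosen field (commonly $\mathbb{Z}_2$ in persistent homology). The boundary operator on a $k$-chain $c$ is $\partial_k c = \sum_i \alpha_i \partial_k \sigma_i$, such that $\partial_k: C_k \to C_{k-1}$, where $C_k$ is the group of $k$-chains, and boundaries are boundaryless, $\partial_{k-1} \partial_k = 0$. A chain complex is the sequence of chain groups connected by boundary maps, $\cdots \xrightarrow{\partial_{k+1}} C_k \xrightarrow{\partial_k} C_{k-1} \xrightarrow{\partial_{k-1}} \cdots \xrightarrow{\partial_1} C_0 \xrightarrow{\partial_0} 0$. The Betti numbers are then given as the ranks of the $k$th homology groups, $\beta_k = \mathrm{rank}(H_k)$, where $H_k = Z_k/B_k$ with the $k$-cycle group $Z_k = \ker \partial_k$ and the $k$-boundary group $B_k = \mathrm{im}\, \partial_{k+1}$. The Betti numbers are the key to the topological features: $\beta_0$ gives the number of connected components (for instance, the number of atoms before any are connected), $\beta_1$ is the number of cycles in the complex structure, and $\beta_2$ is the number of cavities. Together they capture abstract properties of the 3D structure. Finally, a single simplicial complex cannot give the whole picture of the protein–protein interaction structure. A filtration of a topological space is needed to extract more properties. A filtration is a nested sequence of subcomplexes $\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K$. Each element of the sequence generates Betti numbers $\{\beta_0, \beta_1, \beta_2\}$; consequently, a series of Betti numbers in three dimensions is constructed and used as the topological fingerprints in Figure a.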
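The chain of definitions above (simplices, boundary operator, Betti numbers, filtration) can be illustrated with a small self-contained Python sketch. It is a simplification under stated assumptions rather than the model's implementation: it uses a Vietoris-Rips complex for every dimension (the text uses an alpha complex for dimensions 1 and 2), it takes coefficients in Z_2, it works on a toy 2D point set instead of atomic coordinates, and all function names are hypothetical. Betti numbers are obtained as β_k = dim C_k − rank ∂_k − rank ∂_{k+1}.

```python
from itertools import combinations
import math


def vietoris_rips(points, radius, max_dim=2):
    """Simplices (sorted index tuples) up to max_dim. A simplex is included
    when the radius-r balls around its vertices pairwise intersect, i.e.
    every pair of vertices is at distance <= 2 * radius."""
    def close(i, j):
        return math.dist(points[i], points[j]) <= 2 * radius

    n = len(points)
    simplices = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        simplices[k] = [
            s for s in combinations(range(n), k + 1)
            if all(close(i, j) for i, j in combinations(s, 2))
        ]
    return simplices


def rank_gf2(rows):
    """Rank over Z_2 of a matrix whose rows are stored as int bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit serves as the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def boundary_rank(simplices, k):
    """Rank of the boundary map from k-chains to (k-1)-chains over Z_2."""
    if k == 0 or not simplices.get(k):
        return 0
    index = {s: i for i, s in enumerate(simplices[k - 1])}
    rows = []
    for s in simplices[k]:
        mask = 0
        for face in combinations(s, k):  # the (k-1)-faces of s
            mask |= 1 << index[face]
        rows.append(mask)
    return rank_gf2(rows)


def betti_numbers(simplices, max_dim):
    """beta_k = dim C_k - rank d_k - rank d_{k+1} for k = 0..max_dim.
    Simplices of dimension max_dim + 1 must be present for a correct top value."""
    return [
        len(simplices.get(k, []))
        - boundary_rank(simplices, k)
        - boundary_rank(simplices, k + 1)
        for k in range(max_dim + 1)
    ]


# Toy point cloud: corners of a unit square; growing radius gives a filtration.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
fingerprint = [betti_numbers(vietoris_rips(square, r), 1) for r in (0.4, 0.6, 0.75)]
```

Along this toy filtration the four isolated points first merge into one component enclosing a single cycle, and the cycle is then filled in once the diagonals connect. Recording how the Betti numbers change with the radius is the idea behind the topological fingerprints described in the text.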

Validations

The validation of our machine learning predictions for mutation-induced BFE changes against experimental data has been demonstrated in recently published papers.[20,30] First, we showed high correlations between experimental deep mutational enrichment data and predictions for the binding complex of SARS-CoV-2 S protein RBD and protein CTC-445.2[20] and for the binding complex of SARS-CoV-2 RBD and ACE2.[30] In comparison with experimental data on the impacts of emerging variants on antibodies in clinical trials, our predictions achieve a Pearson correlation of 0.80.[30] For the BFE changes induced by RBD mutations in the ACE2-RBD complex, predictions for mutations L452R and N501Y follow a trend highly similar to that of the experimental data.[30] Meanwhile, as presented in ref (18), high-frequency mutations all have positive BFE changes. Moreover, in multimutation tests our BFE change predictions show the same pattern as experimental data on the impact of SARS-CoV-2 variants on major antibody therapeutic candidates, where the BFE changes are cumulative for comutations.[30] Recent studies on the potency of mAb CT-P59 in vitro and in vivo against Delta variants[35] show that the neutralization of CT-P59 is reduced by L452R (13.22 ng/mL) and is retained against T478K (0.213 ng/mL). In our predictions,[30] L452R induces a negative BFE change (−2.39 kcal/mol), and T478K produces a positive BFE change (0.36 kcal/mol). In Figure b, the fold changes for the experimental and predicted values are presented. Additionally, Figure c compares the experimental pseudovirus infection changes with the predicted BFE changes of the ACE2 and S protein complex induced by mutations L452R and N501Y, where the experimental data are obtained with reference to D614G and reported in relative luciferase units.[25] This comparison indicates that the binding of RBD and ACE2 dominates the infectivity of SARS-CoV-2. More details can be found in Section S6 of the Supporting Information.
The worldwide SARS-CoV-2 SNP data are available at Mutation Tracker (https://users.math.msu.edu/users/weig/SARS-CoV-2_Mutation_Tracker.html, accessed August 5, 2021). The most frequently observed SARS-CoV-2 RBD mutations are available at Mutation Analyzer (https://weilab.math.msu.edu/MutationAnalyzer/, accessed August 5, 2021). The TopNetTree model is available at TopNetmAb (https://github.com/WeilabMSU/TopNetmAb, accessed August 5, 2021).[148]
