Danilo Rosa Nunes (1), Carla Torres Braconi (1), Louisa F Ludwig-Begall (2), Clarice Weis Arns (3), Ricardo Durães-Carvalho (1)
1. Department of Microbiology, Immunology and Parasitology, Paulista School of Medicine, Federal University of São Paulo, São Paulo, SP, Brazil.
2. Department of Infectious and Parasitic Diseases, Veterinary Virology and Animal Viral Diseases, FARAH Research Centre, Faculty of Veterinary Medicine, University of Liège, Liège, Belgium.
3. Laboratory of Virology, University of Campinas, Campinas, SP, Brazil.
Abstract
Nearly two decades after the last epidemic caused by a severe acute respiratory syndrome coronavirus (SARS-CoV), newly emerged SARS-CoV-2 quickly spread in 2020 and precipitated an ongoing global public health crisis. Both the continuous accumulation of point mutations, owing to the genomic plasticity inherent in SARS-CoV-2 evolutionary processes, and viral spread over time allow this RNA virus to gain new genetic identities, spawn novel variants and enhance its potential for immune evasion. Here, through an in-depth phylogenetic clustering analysis of upwards of 200,000 whole-genome sequences, we reveal the presence of previously unreported mutations and recombination breakpoints in Variants of Concern (VOC) and Variants of Interest (VOI) from Brazil, India (Beta, Eta and Kappa) and the USA (Beta, Eta and Lambda). Additionally, we identify sites with shared mutations under directional evolution in the SARS-CoV-2 Spike-encoding protein of VOC and VOI, tracing a heretofore-undescribed correlation with viral spread in South America, India and the USA. Our analysis provides well-supported evidence of similar pathways of evolution for such mutations in all SARS-CoV-2 variants and sub-lineages. This raises two pivotal points: (i) the co-circulation of variants and sub-lineages in close evolutionary environments, which sheds light on their trajectories into convergent and directional evolution, and (ii) a linear perspective on the prospective vaccine efficacy against different SARS-CoV-2 strains.
In the last two decades, human health has been threatened by the emergence of three important zoonotic and pathogenic betacoronaviruses, namely the severe acute respiratory syndrome coronavirus (SARS-CoV) [1], the Middle East respiratory syndrome coronavirus (MERS-CoV) [2] and, most recently, the causative agent of the Coronavirus Disease 2019 (COVID-19) pandemic, SARS-CoV-2 [3]. Likely originating from bats, pandemic SARS-CoV-2, like other endemic human alpha- (NL63 and 229E) and beta- (OC43 and HKU1) CoVs known for causing upper respiratory tract infections, overcame the interspecies barrier as a result of spillover and/or recombination events, and gained a pervasive ability to rapidly infect and spread around the globe [4-6].
The COVID-19 pandemic precipitated intense genomic surveillance via data repositories and sequencing platforms and led to an unprecedented accumulation of public genomic data concerning a human pathogenic virus [5, 7]. The sheer amount of available sequencing data has the potential to facilitate higher-precision micro-evolutionary analyses that map escape and point mutations in presumed positively selected sites and residues putatively associated with increased virus fitness and pathogenesis, and allows inferences concerning the dynamics of SARS-CoV-2 spread [8, 9].
Although the analysis of micro-evolutionary mechanisms is of paramount importance and may provide powerful information to support predictions about vaccination and the tracing of SARS-CoV-2 epidemiological chains, there is as yet a lack of data-based investigations examining the presence of shared mutations and their evolutionary characteristics in classified SARS-CoV-2 Variants of Concern (VOC) and Variants of Interest (VOI) [10, 11].
Given the importance of monitoring mutations to track the emergence of novel variants, here we investigate the influence of directional selection and the dynamics of SARS-CoV-2 genomic plasticity in VOC and VOI by combining large-scale phylogenetic clustering partition and directional evolution (DEPS) approaches. Additionally, we show the presence of several mutations common to both VOI/VOC and convergently emerged sub-lineages, and provide a perspective on possible effects on vaccination efficacy and the ongoing COVID-19 pandemic.
Methods
Sequence data and filtering strategy
High-coverage and complete HCoV-229E and HCoV-NL63 (alpha-CoVs), HCoV-OC43, HCoV-HKU1, MERS-CoV, SARS-CoV and SARS-CoV-2 VOC and VOI (beta-CoVs) genome sequences (≥ 29,000 bp), sampled from humans, were retrieved from the Global Initiative on Sharing All Influenza Data-EpiCoV (GISAID-EpiCoV) and GenBank databases at three different times, each date representing a snapshot of the COVID-19 spread at that time: February 12th (MERS-CoV, SARS-CoV and SARS-CoV-2), July 12th (HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1 and SARS-CoV-2) and an update on August 26th 2021 (SARS-CoV-2), totalling 238,990 sequences. With regard to SARS-CoV-2, we particularly focused on strains from South American countries, China, India and the United States of America (USA). At the time of analysis, India, the USA and Brazil had reported the largest numbers of cumulative confirmed COVID-19 cases and deaths. This approach was used to compare putative mutual sites and residue changes under directional evolution over time [12].
Subsequently, sequences were filtered via Sequence Cleaner, a Biopython-based program, using the following command: sequence_cleaner -q INPUT_DIRECTORY -o OUTPUT_DIRECTORY -ml 29,000 (MINIMUM_LENGTH) -mn 0 (PERCENTAGE_N) --remove_ambiguous. The outcome was a set of unambiguous sequences of at least 29,000 bp with zero percent unknown nucleotides. Next, the datasets were aligned against the coding sequences of the reference genomes for HCoV-229E (NC_002645.1), HCoV-NL63 (NC_005831.2), HCoV-OC43 (NC_006213.1), HCoV-HKU1 (NC_006577.2), MERS-CoV (NC_038294.1), SARS-CoV (NC_004718.3) and SARS-CoV-2 (NC_045512.2), using the MAFFT v.7 web service for rapid full-length multiple sequence alignment of closely related viral genomes (https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html) with default settings; alignments were edited in UGENE v.38.1 [13].
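The filtering criteria described above (minimum length, zero percent unknown nucleotides, no ambiguous bases) can be illustrated with a minimal pure-Python sketch. This is not the Sequence Cleaner program itself; the function names `read_fasta`, `keep_sequence` and `filter_fasta` are hypothetical, and the sketch assumes that "unambiguous" means only the characters A, C, G and T are present.

```python
from typing import Dict, Iterator, Tuple

# Assumption: an "unambiguous" genome contains only these four characters.
UNAMBIGUOUS = set("ACGT")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def keep_sequence(seq: str, min_length: int = 29000,
                  max_percent_n: float = 0.0) -> bool:
    """Apply the three criteria: length, percentage of N, ambiguity codes."""
    if not seq or len(seq) < min_length:
        return False
    percent_n = 100.0 * seq.count("N") / len(seq)
    if percent_n > max_percent_n:
        return False
    # Reject any remaining IUPAC ambiguity code (R, Y, K, ...), gaps, etc.
    return set(seq) <= UNAMBIGUOUS


def filter_fasta(text: str, min_length: int = 29000,
                 max_percent_n: float = 0.0) -> Dict[str, str]:
    """Return the records of a FASTA text that pass all filters."""
    return {
        header: seq
        for header, seq in read_fasta(text)
        if keep_sequence(seq, min_length, max_percent_n)
    }
```

With the defaults, `filter_fasta(open(path).read())` would retain only records of at least 29,000 unambiguous bases; a smaller `min_length` is convenient for trying the functions on toy input.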
Clustering and sub-clustering analysis
A methodological approach to extract large-scale phylogenetic partitions was applied to identify transmission cluster chains on the large Maximum Likelihood (ML) phylogenetic trees of the SARS-CoV-2 variants circulating in South America, China, India and the USA. The approach is based on a depth-first search algorithm that unifies the evaluation of node reliability, tree topology and patristic distance [14]. Different datasets from each particular SARS-CoV-2 scenario were used to extract the clustering and sub-clustering data. Each ML tree was inferred with FastTree v.2.1.7 using the standard General Time Reversible (GTR) plus CAT implementation with 20 gamma distribution parameters and a mix of Nearest-Neighbor Interchanges (NNI) and Sub-Tree-Prune-Regraft (SPR) moves [15]. Thereafter, to identify SARS-CoV-2 cluster transmission events in datasets comprising more than 100 sequences, we first selected sequences (one per cluster) from nodes/sub-trees with ≥ 2 distinct individuals and ≥ 90% statistical support (Shimodaira-Hasegawa test), initially adjusting the patristic distance to find a representative number of clusters (n = 100, representing 100 sequences) from each large reconstructed ML tree. In addition, a second approach included sub-clustering analysis as an indirect way to infer and investigate the possibility of co-circulating sub-lineages. For this, we selected sequences (two per cluster) with ≥ 95% node support using a threshold of 0.05, corresponding to the 5th percentile of the whole-tree patristic distance distribution.
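To make the logic of the depth-first partition search concrete, the following is a simplified sketch and not the implementation cited in [14]. It assumes that a sub-tree is reported as a cluster when its node support reaches a minimum value, it contains at least two tips, and the median pairwise patristic distance inside it does not exceed a chosen percentile of the whole-tree distribution. The `Node` class, the nearest-rank percentile and the function names are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations
from statistics import median
from typing import List, Tuple


@dataclass
class Node:
    name: str = ""
    length: float = 0.0    # branch length to the parent
    support: float = 0.0   # node support, scaled to [0, 1]
    children: List["Node"] = field(default_factory=list)


def subtree_distances(node: Node) -> Tuple[List[Tuple[str, float]], List[float]]:
    """Return (tip, depth below node) pairs and all pairwise patristic
    distances between tips of the sub-tree rooted at `node`."""
    if not node.children:
        return [(node.name, 0.0)], []
    per_child, pairs = [], []
    for child in node.children:
        child_depths, child_pairs = subtree_distances(child)
        per_child.append([(tip, d + child.length) for tip, d in child_depths])
        pairs.extend(child_pairs)
    # Tips in different child sub-trees are joined through `node`.
    for left, right in combinations(per_child, 2):
        pairs.extend(dl + dr for _, dl in left for _, dr in right)
    depths = [item for group in per_child for item in group]
    return depths, pairs


def percentile(values: List[float], fraction: float) -> float:
    """Nearest-rank percentile (one of several possible definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def pick_clusters(root: Node, min_support: float = 0.90,
                  fraction: float = 0.05) -> List[List[str]]:
    """Depth-first search that stops descending once a sub-tree qualifies."""
    _, all_pairs = subtree_distances(root)
    if not all_pairs:
        return []
    threshold = percentile(all_pairs, fraction)
    clusters, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        depths, pairs = subtree_distances(node)
        if (node.support >= min_support and len(depths) >= 2
                and median(pairs) <= threshold):
            clusters.append(sorted(tip for tip, _ in depths))
        else:
            stack.extend(reversed(node.children))
    return clusters
```

The sketch recomputes distances at every visited node, which is quadratic in the number of tips and only suitable for small trees; choosing one or two representative sequences per reported cluster, as described above, would be a separate step.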
Recombination and directional evolution analyses
Before proceeding to directional evolution analysis, sequence datasets from the output provided by the clustering analyses were submitted to the Genetic Algorithm for Recombination Detection (GARD), a likelihood-based tool to pinpoint recombination breakpoints [16]. To check the outcome of this strategy, an additional test was conducted using the Pairwise Homoplasy Index (PHI; default settings) [17]. A phylogenetic maximum-likelihood analysis was then performed using the Datamonkey web-server and the program HyPhy v.2.5 to track directional selection in amino acid sequences (DEPS) [18]. The DEPS method identifies both the residue and the sites evolving toward it with great accuracy and detects frequency-dependent selection scenarios as well as selective sweeps and convergent evolution that can confound most existing tests [8]. Further, the DEPS method has shown better performance than (traditional) substitution rate-based analyses (dN/dS) in detecting transient and frequency-dependent selection and directionally evolving sites and residues. For the most part, a Beta-Gamma site-to-site rate variation was used to conduct the analysis. The best-fit protein substitution model was chosen according to the corrected Akaike Information Criterion (cAIC). Only target sites and residues with Empirical Bayes Factors in favour of a directional selection model equal to or greater than 100 were considered for further exploration. Certain randomly chosen datasets were run multiple times (more than eight) to confirm the results obtained.
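The reporting threshold described above (Empirical Bayes Factor ≥ 100) amounts to a simple filter over per-site results. The sketch below is illustrative only: the `DepsHit` record and `select_directional_sites` function are hypothetical and do not reflect the actual HyPhy or Datamonkey output format. Three of the example values are taken from Table 1 (K417T, S50L, A688V); the fourth is a made-up sub-threshold entry.

```python
from typing import Iterable, List, NamedTuple


class DepsHit(NamedTuple):
    site: int      # Spike amino acid position
    residue: str   # target residue the site evolves toward
    ebf: float     # Empirical Bayes Factor for directional selection


def select_directional_sites(hits: Iterable[DepsHit],
                             min_ebf: float = 100.0) -> List[DepsHit]:
    """Keep hits whose EBF meets the threshold, ordered by site."""
    kept = (hit for hit in hits if hit.ebf >= min_ebf)
    return sorted(kept, key=lambda hit: (hit.site, hit.residue))


EXAMPLE_HITS = [
    DepsHit(417, "T", 1966.6),  # K417T, value from Table 1
    DepsHit(50, "L", 124.0),    # S50L, value from Table 1
    DepsHit(688, "V", 113.4),   # A688V, value from Table 1
    DepsHit(1, "X", 42.0),      # hypothetical entry below the threshold
]
```

Calling `select_directional_sites(EXAMPLE_HITS)` returns the three Table 1 entries in site order and drops the hypothetical sub-threshold one.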
Statistical analysis
Data pertaining to SARS-CoV and MERS-CoV-related cases and deaths were extracted from the National Health Service (NHS, UK) (https://www.nhs.uk/conditions/sars/) and the European Centre for Disease Prevention and Control (ECDC) (https://www.ecdc.europa.eu/en/publications-data/distribution-confirmed-cases-mers-cov-place-infection-and-month-onset-1), respectively. Information concerning SARS-CoV-2 was collected from the World Health Organization (WHO) (https://covid19.who.int/). Population demographic data were retrieved from the Our World in Data website (https://ourworldindata.org/grapher/world-population-by-world-regions-post-1820?tab=table&country=Oceania~~North+America~Europe~Africa~Asia).
Statistical analyses were performed using one-way analysis of variance (ANOVA) and nonparametric methods followed by post hoc Kruskal-Wallis and Friedman (both with Dunn's Multiple Comparison), and Bartlett's tests (Tukey's, Newman-Keuls and Bonferroni's multiple comparisons). Additionally, Mann-Whitney and Wilcoxon matched-pairs signed-rank tests (T test) and Pearson/Spearman (Correlation), all one-tailed methods with a 99% confidence interval (CI), were run. P-values equal to or less than 0.005 (p ≤ 0.005, SARS-CoV-2 from South America: DEPS [sites and/or residues] vs infections) and 0.05 (p ≤ 0.05, SARS-CoV-2 from Brazil, China, India and the USA: DEPS [residues] vs circulating variants and infections) were considered statistically significant. Data analyses were carried out using GraphPad Prism v. 5.01 (GraphPad Software, San Diego, California, USA). Figures and data visualizations were produced using the ggplot2 v.3.3.5 package in the R (RStudio v.1.4.1717) language environment. Final graphics were edited with the open-source drawing tool Inkscape v.1.0.2.
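As an illustration of the correlation step only, the sketch below computes Pearson and Spearman coefficients in pure Python. The analyses in this study were run in GraphPad Prism; this code is not that procedure, it does not compute p-values, and the function names are hypothetical. Spearman's coefficient is obtained here as the Pearson correlation of average ranks, which handles ties.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In the setting described above, `x` would hold the number of sites and/or residues under DEPS per country and `y` the corresponding infections per 100,000 inhabitants.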
Results and discussion
Recombination is known to be a crucial evolutionary process for many RNA viruses [19-21]; the process is frequently observed in the Coronaviridae family where recombination is likely facilitated by discontinuous transcription involving jumps of the replication-transcription complex during minus strand RNA synthesis. However, the consequences of recombination events occurring in the context of the current SARS-CoV-2 VOC/VOI evolutionary landscape are still speculative [22, 23]. Here, we address this knowledge gap, revealing the presence of recombination and shared mutations in the SARS-CoV-2 Spike-encoding protein, demonstrating them to be under directional and convergent evolution amongst SARS-CoV-2 VOC/VOI and sub-lineages, and tracing an interconnection with viral spread. First, endemic and epidemic human coronaviruses (HCoVs) were compared to identify similar evolutionary patterns that could help clarify the evolution of SARS-CoV-2. An initial recombination breakpoint analysis showed that four of six HCoVs analyzed presented such signals (Fig 1A). Endemic viruses OC43, NL63 and HKU1 also showed a similar pattern of residue accumulation and directional evolution, despite these viruses being subject to differing selective pressures [24].
Fig 1
Directionally-evolving sites and residues in alpha- (229E and NL63) and beta- (OC43, HKU1, SARS-CoV, MERS-CoV and SARS-CoV-2) coronavirus sequences.
(A) Directionally evolving residues in six different endemic and epidemic human coronaviruses (HCoVs). (B) Sites and residues vs infections in epidemic and pandemic coronaviruses (CoVs). (C) Linear regression curve and correlation between the absolute number of sites and/or residues under directional positive selection and SARS-CoV-2 infections per 100,000 people in South American countries and (D) the total number of SARS-CoV-2 infections in Brazil, China, India and the USA. In panels A, B and D, the symbol ≠ represents the presence of recombination breakpoint signals. In panel B, on the upper left, different intensities in grayscale are directly related to the number of infections. In panel C, the number inside the circle represents the number of clusters found in Brazil and Chile. In panel D, n represents the number of SARS-CoV-2 variants and the numbers in parentheses indicate sites under directional and convergent evolution in the Spike-encoding protein. P-values were obtained by considering the DEPS [sites and/or residues] vs infections per 100,000 inhabitants from one-way analysis of variance (ANOVA) and nonparametric methods, as described in the Statistical analysis section. Colors and symbols used in the panels are defined in the legend to the right of the figure.
A subsequent comparison of SARS-CoV-2 to the other two pathogenic HCoVs (SARS-CoV and MERS-CoV) highlighted differences in the number of directionally-evolving sites and residues (Fig 1B). These patterns, putatively reflecting the initial evolutionary paths of the individual viruses, may suggest that SARS-CoV was initially under lower positive evolutionary pressure than MERS-CoV and SARS-CoV-2.
Nevertheless, deletions and mutations acquired by SARS-CoV have been shown to have had an impact on adaptation to human-to-human transmission, modifying both the capacity for viral proliferation and profiles of pathogenesis [25-27].
The evolution of SARS-CoV-2 was initially marked by genetic drift in a typical process of neutral evolution [28, 29]; the virus reached a large number of new and susceptible hosts and, although some mutations appeared along the genome, there was no significant shift [30]. However, as SARS-CoV-2 rapidly spread worldwide [31, 32], fitness changes resulting from mutations in the viral genome as well as the emergence of new variants were increasingly reported [33-35].
The first epidemic wave of SARS-CoV-2 severely affected most countries in South America as a probable result of multiple viral introductions [36]; rapid increases in case numbers were especially reported in Brazil, the largest and most populous country in Latin America [37, 38]. The uncontrolled viral spread created a favorable scenario for the emergence of new variants [39-42]. To identify the impact of sites under directional positive selection on the rate of infections under these particular conditions, we traced the evolutionary scenario of SARS-CoV-2 in South America (via analysis of a significant and representative number of genome sequences).
Remarkably, our data showed that an increase of DEPS was correlated with viral spread dynamics, with Brazil exhibiting a lower proportion of COVID-19 cases (per 100,000 inhabitants) when compared to French Guiana and the same number of SARS-CoV-2 clusters as inferred in Chile (n = 97) (Fig 1C), probably due to a higher diversity of circulating viruses. Our results also highlighted a series of mutations: some have previously been described but had hitherto not been identified in SARS-CoV-2 VOC and VOI, while multiple further mutations are, to our knowledge, identified for the first time in this study (Table 1).
Table 1
Mutational landscape of SARS-CoV-2 Spike protein VOC and VOI based on the WHO label.
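Obs.: new mutations are underlined. Convergent evolution: ●

| Sequences collection date | Inferred substitutions (Spike location) | SARS-CoV-2 carrying this mutation (from WHO) | Additional SARS-CoV-2 variants carrying this mutation (from this study) | Empirical Bayes Factors | Time interval (months) |
|---|---|---|---|---|---|
| Feb 12th 2021 | L18F ● (NTD) | Beta and Gamma | Alpha | >10^5 | |
| Feb 12th 2021 | T20N ● (NTD) | Gamma | Alpha | >10^5 | |
| Feb 12th 2021 | P26S/P26L* ● (NTD) | Gamma | Alpha and Epsilon/Zeta* | 2129.2 | |
| Feb 12th 2021 | D138H/D138Y* ● (NTD) | Gamma | Alpha and Epsilon/Delta* | >10^5 | |
| Feb 12th 2021 | R190S ● (NTD) | Gamma | Delta | 9869.0 | |
| Feb 12th 2021 | K417T ● (RBD) | Gamma | - | 1966.6 | |
| Feb 12th 2021 | E484K/E484Q ● (RBD) | Beta, Gamma, Eta, Iota, Mu, Theta and Zeta | - | >10^5 | |
| Feb 12th 2021 | N501Y ● (RBD) | Alpha, Beta, Gamma, Omicron, Mu and Theta | Eta, Kappa and Lambda | >10^5 | |
| Feb 12th 2021 | T1027I ● (CH) | Gamma | - | >10^5 | 5 |
| Jul 12th 2021 (Obs.: without Delta variant) | S13I ● (SP-NTD) | Epsilon | Alpha | 166.9 | |
| Jul 12th 2021 (Obs.: without Delta variant) | R21I/R21T* ● (NTD) | - | Gamma/Epsilon* | 168.0 | |
| Jul 12th 2021 (Obs.: without Delta variant) | R34L/R34P* (NTD) | - | Unsigned/Eta* | 262.6 | |
| Jul 12th 2021 (Obs.: without Delta variant) | S50L (NTD) | - | unsigned | 124.0 | |
| Jul 12th 2021 (Obs.: without Delta variant) | L54F ● (NTD) | - | Gamma | >10^5 | |
| Jul 12th 2021 (Obs.: without Delta variant) | W152L*/W152C (NTD) | Epsilon | Gamma* | 226.2 | |
| Jul 12th 2021 (Obs.: without Delta variant) | S255F ● (NTD) | - | Gamma, Delta and Kappa | 160.0 | |
| Jul 12th 2021 (Obs.: without Delta variant) | N501Y ● (RBD) | Alpha, Beta, Gamma, Omicron, Mu and Theta | Eta, Kappa and Lambda | 428.8 | |
| Jul 12th 2021 (Obs.: without Delta variant) | A570D ● (CT1) | Alpha | Eta, Kappa and Lambda | >10^5 | |
| Jul 12th 2021 (Obs.: without Delta variant) | P681H* (CT2) | Alpha, Omicron, Mu and Theta | Gamma* and Lambda* | >10^5 | |
| Jul 12th 2021 (Obs.: without Delta variant) | A688V ● (S1/S2) | - | Alpha, Gamma and Zeta | 113.4 | |
| Jul 12th 2021 (Obs.: without Delta variant) | T716I ● (S1/S2) | Alpha | Epsilon | 2054.1 | |
| Jul 12th 2021 (Obs.: without Delta variant) | D1118H/D1118Y* (CD1) | Alpha | Lambda/Zeta* | >10^5 | |
| Jul 12th 2021 (Obs.: without Delta variant) | C1235F ● (CTail) | - | unsigned | 317.0 | 0 |
| Jul 12th 2021 (Obs.: with Delta variant) | S13I ● (SP-NTD) | Epsilon | Alpha | 137.1 | |
| Jul 12th 2021 (Obs.: with Delta variant) | T19R/T19I* ● (NTD) | Delta | Eta* | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | R21I/R21T* ● (NTD) | - | Gamma/Epsilon* | 138.2 | |
| Jul 12th 2021 (Obs.: with Delta variant) | R34L (NTD) | - | unsigned | 241.4 | |
| Jul 12th 2021 (Obs.: with Delta variant) | L54F ● (NTD) | - | Gamma | 327.6 | |
| Jul 12th 2021 (Obs.: with Delta variant) | G142D/G142S* ● (NTD) | Delta and Kappa | Zeta* | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | W152L (NTD) | - | Gamma | 201.2 | |
| Jul 12th 2021 (Obs.: with Delta variant) | R237M ● (NTD) | - | unsigned | 418.7 | |
| Jul 12th 2021 (Obs.: with Delta variant) | L452R/L452Q ● (RBD) | Delta, Epsilon, Iota, Lambda and Kappa | - | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | T478K ● (RBD) | Delta and Omicron | - | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | E484K (RBD) | Beta, Gamma, Eta, Iota, Mu, Theta and Zeta | - | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | N501Y ● (RBD) | Alpha, Beta, Gamma, Omicron, Mu and Theta | Eta, Kappa and Lambda | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | A570D ● (CT1) | Alpha | Eta, Kappa and Lambda | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | D574Y ● (CT1) | - | unsigned | 236.3 | |
| Jul 12th 2021 (Obs.: with Delta variant) | P681R/P681H* ● (CT2) | Alpha, Delta, Omicron, Kappa, Mu and Theta | Gamma* and Lambda* | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | T716I ● (S1/S2) | Alpha | Epsilon | 1013.4 | |
| Jul 12th 2021 (Obs.: with Delta variant) | D936Y ● (HR1) | - | Gamma and Kappa | 236.6 | |
| Jul 12th 2021 (Obs.: with Delta variant) | S982A ● (HR1) | Alpha | - | 9852.5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | D1118H ● (CD1) | Alpha | Lambda | >10^5 | |
| Jul 12th 2021 (Obs.: with Delta variant) | D1163G ● (HR2) | - | Gamma | 237.4 | |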
Obs.: R158- and G142- deletions were also found in the Delta and Theta SARS-CoV-2 variants, respectively. NTD, N-terminal domain; RBD, receptor binding domain; CD1, connector domain 1; CH, center helix; CT1, C-terminal domain 1; CT2, C-terminal domain 2; CTail, cytoplasmic tail; HR1, heptad repeat 1; HR2, heptad repeat 2; S1/S2, cleavage site and SP-NTD, Signal peptide-N-terminal domain. The asterisks (*) represent mutations also found in such particular variant(s). Sources: CDC (https://www.cdc.gov/coronavirus/2019-ncov/variants/variant-info.html), ECDC (https://www.ecdc.europa.eu/en/covid-19/variants-concern) and Stanford Coronavirus Antiviral & Resistance Database (CoVDB) (https://covdb.stanford.edu/page/mutation-viewer/).
Spike mutations such as E484K, N501Y, L452R, S13I and W152C seem to be fundamentally important in the adaptation of SARS-CoV-2 to human hosts, enhancing affinity for the human ACE2 receptor and mediating immune system evasion [43-45]. Our analyses allowed us to follow SARS-CoV-2 spread dynamics over time in Brazil, showing an increasing number of sites under DEPS, primarily in the Spike-encoding protein. Nine sites are highlighted prior to February 2021, followed by fourteen sites until July 2021 (SARS-CoV-2 Delta variant not included). With the introduction of the Delta variant, both the presence of recombination signals and an increase of sites under DEPS were detected (Table 1), allowing for inferences concerning an increase in the SARS-CoV-2 reproductive number. An increase in virus circulation augments the chance of viral coinfection, which in turn (as a prerequisite for recombination) can heighten the risk of emergence of new variants [46, 47].
The Delta variant, first identified in late 2020 in India as B.1.617.2 [48], harbors a constellation of non-synonymous mutations in the Spike protein [49] and had become the leading VOC worldwide by the end of July 2021, when it accounted for 90% of all sequenced samples [50, 51]. Brazil, India and the USA, the countries most severely affected by the pandemic, are now once again threatened by this highly contagious variant. Analysis of the molecular evolution of SARS-CoV-2 taking into account the influence of local demography in these specific scenarios has the potential to generate important insights into the spread and infection dynamics of this pathogen.
Using SARS-CoV-2 sequences from China (the most populated country in the world and also a country where control measures against COVID-19 infections have been deployed very effectively) as reference, we analyzed all datasets from Brazil, India and the USA via a large-scale phylogenetic partition analysis. Increases in SARS-CoV-2 infections were observed to be proportional to locally circulating variants and were not (in the scenarios analyzed) correlated with any particular demography (Fig 1D); this indirectly reinforces the importance of measures implemented to avoid viral propagation. Analysis of phylogenetic partition clusters along the length of the circa 30 kb CoV genome revealed several directionally-evolving sites under convergent evolution. Thus, a possible association between the number of infections from locally circulating SARS-CoV-2 variants carrying distinct residue profiles and sites in the Spike-encoding protein under DEPS may be established (Fig 1D). Interestingly, this supports a hypothesis of convergent evolution due to repeated and multiple site-specific substitutions in distinct SARS-CoV-2 VOC and VOI (see Table 1 and Fig 2).
Fig 2
Compiled data from Table 1 and Fig 1D. Venn diagram showing the shared mutations in distinct SARS-CoV-2 Variants of Concern (VOC) (A) as well as the additional Variants of Interest (VOI) (B) carrying such substitutions. The diagram was created with the VIB web tool (https://bioinformatics.psb.ugent.be/webtools/Venn/).
Additionally, we also inferred the possible appearance of SARS-CoV-2 sub-lineages and traced the influence of an environment favoring directional evolution acting on SARS-CoV-2 variants. We showed different patterns among sites in the VOC and VOI, with a particular emphasis on the Kappa VOI currently circulating in the USA. We further demonstrated recombination among SARS-CoV-2 VOC and VOI from India (Beta, Eta and Kappa) and the USA (Beta, Eta and Lambda) (Fig 3A).
Fig 3
Transmission clustering and sub-clustering analyses on SARS-CoV-2 Variants of Concern (VOC) and Variants of Interest (VOI) sequences.
(A) Directionally evolving sites in the VOC (dashed line) and VOI (unbroken line) sampled from the model-based phylogenetic Maximum Likelihood (ML) method and (B) the number of sub-clusters inferred in strains circulating in Brazil, China, India and the USA. Each color represents a particular country. RBS stands for recombination breakpoint signal (≠) and the scale bar shows the proportion of ten sites under DEPS (A). The numbers in parentheses indicate Spike-encoding protein sites under convergent evolution (B). Obs.: (1) in panel A, the majority of these sites, distributed across the genome, are also under convergent evolution and (2) the data extracted from India's SARS-CoV-2 Eta dataset did not show sites under DEPS, but did exhibit signals of recombination (see Data Availability).
As one of the first countries in the world to develop efficient immunizations and implement a vaccination policy, the USA vaccinated more than 30% of its population by April 2021 [52, 53]. By September 2021, 60% of the booster-immunized population possessed neutralizing antibodies against several viral variants [54]. Similar outcomes were observed following widespread vaccination with various SARS-CoV-2 vaccines (different technologies leveraged for vaccine production) in many other regions, including South America and India [55, 56]. We hypothesize that the site-specific mutations found under convergent evolution at strategic positions in the SARS-CoV-2 VOC/VOI Spike protein targeted by vaccine-induced neutralizing antibodies, as shown by Andreata-Santos et al. (2022) [57], may also help to explain these findings (Fig 4). Nonetheless, viral circulation in the face of incomplete immunization has been described as one of the probable causes of the emergence of new variants [42] (Sabino et al., 2021).
Accordingly, our own analysis identified SARS-CoV-2 VOC and VOI sub-clusters (Fig 3B), thus indicating co-circulation of variants and sub-lineages carrying new mutations under convergent evolution. Surprisingly, the same evolutionary pattern was also observed for other endemic and epidemic CoVs studied (see Data availability).
Fig 4
Structural representation of the SARS-CoV-2 spike glycoprotein (PDB 7A98).
On the left side, different colors represent ACE2, angiotensin-converting enzyme 2 (silver); NTD, N-terminal domain (orange); RBM, receptor binding motif (yellow); RBD, receptor binding domain (red) and S1/S2, cleavage site to S2 (pink). The structures in blue represent the chain B and C, respectively. Colored spheres highlight the mutations mapped in the study. On the right side, our hypotheses about a linear perspective into the prospective vaccine efficacy against different SARS-CoV-2 strains. nAb, neutralizing antibody. Spike protein image was created with the Visual Molecular Dynamics (VMD) v.1.9.3 [58].
Structural representation of the SARS-CoV-2 spike glycoprotein (PDB 7A98).
On the left side, different colors represent ACE2, angiotensin-converting enzyme 2 (silver); NTD, N-terminal domain (orange); RBM, receptor binding motif (yellow); RBD, receptor binding domain (red) and S1/S2, cleavage site to S2 (pink). The structures in blue represent the chain B and C, respectively. Colored spheres highlight the mutations mapped in the study. On the right side, our hypotheses about a linear perspective into the prospective vaccine efficacy against different SARS-CoV-2 strains. nAb, neutralizing antibody. Spike protein image was created with the Visual Molecular Dynamics (VMD) v.1.9.3 [58].This study demonstrates the influence of positive directional evolution on SARS-CoV-2 circulating in South America and in those countries most severely affected by the COVID-19 pandemic. Furthermore, our methodology allowed the identification of distinct transmission sub-clusters and recombination breakpoints in many SARS-CoV-2 variants, to our knowledge, not previously shown so far. We were able to indirectly infer transmission of a viral epidemiological chain and the generation of new variants. Through sequence data collected in February, July and August 2021, we also further identified and classified several convergently emerged shared mutations in different SARS-CoV-2 VOC and VOI. Lastly, we hypothesize that the co-circulation of SARS-CoV-2 variants and their possible sub-lineages takes place within a very close evolutionary environment, which can be translated to a setting of strong convergent and directional evolution. The latter may have affected the number of different lineages within each country, with highly infected countries being impacted the most (as can be seen from comparison against China). This agrees with the general virological perspective that the larger the number of viruses circulating, as well as the number of lineages within a region, the more likely it becomes that sub-lineages will originate and spread. 
Our results confirm the importance of critical assessment, monitoring, and control of SARS-CoV-2 lineages and sub-lineages throughout the pandemic, particularly in countries with large populations, where both recurrent substitutions and recombination events create many opportunities for positive selection.