Jiri Zahradník, Jaroslav Nunvar, Gideon Schreiber.
Abstract
Viruses rapidly co-evolve with their hosts. The 9 million SARS-CoV-2 genomes sequenced by March 2022 provide a detailed account of viral evolution, showing that all amino acids have been mutated many times. However, only a few mutations became prominent in the viral population. Here, we investigated the emergence of the same mutations in unrelated parallel lineages and the extent of such convergent evolution on the molecular level in the spike (S) protein. We found that during the first phase of the pandemic (until mid-2021, before mass vaccination) 31 mutations evolved independently ≥3 times within separate lineages. These included all the key mutations in SARS-CoV-2 variants of concern (VOC) at that time, indicating their fundamental adaptive advantage. The Omicron variant added many more mutations not frequently seen before, which can be attributed to the synergistic nature of these mutations; such combinations are more difficult to evolve. The great majority (24/31) of S-protein mutations under convergent evolution tightly cluster in three functional domains: the N-terminal domain, the receptor-binding domain, and the Furin cleavage site. Furthermore, among the S-protein receptor-binding motif mutations, ACE2 affinity-improving substitutions are favoured. Next, we determined the mutation space in the S protein that has been covered by SARS-CoV-2. We found that all amino acids that are reachable by single nucleotide changes had been probed multiple times by early 2021. The substitutions requiring two nucleotide changes have recently (late 2021) gained momentum and their numbers are increasing rapidly. These provide a large mutation landscape for future SARS-CoV-2 evolution, on which research should focus now.
Keywords: SARS-CoV-2; convergent evolution; mutations; spike (S) protein; virus
Year: 2022 PMID: 35711666 PMCID: PMC9197234 DOI: 10.3389/fcimb.2022.748948
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 6.073
Figure 1. Convergent evolution of SARS-CoV-2 S-protein AA residues among different lineages (autumn 2020 – spring 2021). All S-protein AA mutations present in SARS-CoV-2 VOC lineages (alpha, beta, gamma, delta) were collected. The phylogenetic pattern of mutations at each of these AA positions was assessed visually in the representative global subsample of SARS-CoV-2 genomes (https://nextstrain.org/ncov/global) (Hadfield et al., 2018). NextStrain was scanned repeatedly between March and June 2021; lineages were confirmed to show a recent localized rise to high frequency (indicative of selective advantage). All additional S-protein AA mutations present in these lineages were subjected to the same procedure as above. After several iterations, a final set of S-protein positions that experienced ≥3 independent mutations was established. Information about the distribution and within-lineage frequency of S-protein mutations, and the spatiotemporal characterization of SARS-CoV-2 lineages, was retrieved from (Tsueng et al., 2022) (15 June 2021). The AA parallelism score was established for all convergently mutated AAs by summing their independent emergence events. The lineage parallelism score was calculated by summing the parallelism scores of the S-protein AAs mutated in each individual lineage. *AA substitutions for G142. Mutations inherited from the same progenitor are considered a single evolutionary origin in parallelism score calculations.
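The two scores described in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes the manual phylogenetic assessment has already been done and recorded as (lineage, position, origin) triples, where `origin` is a hypothetical label for one independent emergence event, shared by lineages that inherited the mutation from the same progenitor. The function name, the record layout, and the toy values are illustrative assumptions, not the authors' pipeline or data.

```python
from collections import defaultdict


def parallelism_scores(observations, min_events=3):
    """Return (aa_scores, lineage_scores) from (lineage, position, origin) records.

    `origin` is a hypothetical label for one independent emergence event;
    lineages that inherited a mutation from the same progenitor share it,
    so they are counted as a single evolutionary origin.
    """
    origins_by_position = defaultdict(set)
    positions_by_lineage = defaultdict(set)
    for lineage, position, origin in observations:
        origins_by_position[position].add(origin)
        positions_by_lineage[lineage].add(position)

    # AA parallelism score: number of independent emergence events per position,
    # kept only for positions that reach the convergence threshold.
    aa_scores = {
        pos: len(origins)
        for pos, origins in origins_by_position.items()
        if len(origins) >= min_events
    }
    # Lineage parallelism score: sum of the AA scores of the positions
    # mutated in that lineage (positions below the threshold contribute 0).
    lineage_scores = {
        lineage: sum(aa_scores.get(pos, 0) for pos in positions)
        for lineage, positions in positions_by_lineage.items()
    }
    return aa_scores, lineage_scores


if __name__ == "__main__":
    # Toy records with made-up lineage names and positions (not real data).
    toy = [
        ("lin_A", 10, "o1"), ("lin_B", 10, "o2"), ("lin_C", 10, "o3"),
        ("lin_D", 10, "o3"),  # lin_D inherited position 10 from lin_C's progenitor
        ("lin_A", 20, "o4"), ("lin_B", 20, "o5"), ("lin_D", 20, "o6"),
        ("lin_C", 30, "o7"),  # a single emergence, below the threshold
    ]
    aa, lineages = parallelism_scores(toy)
    print(aa)
    print(lineages)
```

Using a set of origin labels per position is what enforces the rule that inherited mutations count once; the threshold of three matches the ≥3 criterion stated in the caption.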
Figure 2. Localization of convergent mutations in SARS-CoV-2 S-protein structure, breakdown and progression of RBD AA mutations. (A) The S-protein parts missing in the crystal structure (PDB ID: 6zge) were modelled by the Modeller suite implemented in UCSF Chimera 1.13.1 (Webb and Sali, 2016). Residues under convergent evolution are depicted in spheres representation and colored according to their parallelism scores. Inset d) shows the region covering a portion of heptad repeat, central helix, and β-hairpin domains. Green residues in inset c) highlight the Furin cleavage site. (B) The mutational sequence space of RBD binding interface residues 472–505, and its coverage by mutations present in the GISAID database. All possible SNC (single-nucleotide change) AA substitutions are depicted. For TNC (two-nucleotide change) substitutions, only the subset of AA substitutions that were sampled in GISAID are depicted (in background color scale according to the legend), together with substitutions with a positive binding impact (frame). The AA position is invariant to its later occurrence to highlight differences. Deep-mutational scanning Δlog10(K_D,app) values were extracted from https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/ (Starr et al., 2020). ACE2 lane depicts residue distances in Å from the ACE2 receptor.
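The SNC/TNC distinction used in panel (B) can be made concrete with a short Python sketch. It assumes the standard genetic code and DNA-alphabet codons (T instead of U); the function names are hypothetical and the example codon is chosen only for illustration. It is not the authors' analysis code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nucleotide_changes(codon, target_aa):
    """Smallest number of nucleotide changes turning `codon` into any codon for `target_aa`."""
    return min(
        sum(a != b for a, b in zip(codon, candidate))
        for candidate, aa in CODON_TABLE.items()
        if aa == target_aa
    )


def classify_substitution(codon, target_aa):
    """Label a substitution as synonymous, SNC, TNC, or a three-nucleotide change."""
    labels = {0: "synonymous", 1: "SNC", 2: "TNC", 3: "three-nucleotide change"}
    return labels[min_nucleotide_changes(codon, target_aa)]


def reachable_by_snc(codon):
    """Amino acids reachable from `codon` by one nucleotide change (stops and self excluded)."""
    original = CODON_TABLE[codon]
    reachable = set()
    for i, base in product(range(3), BASES):
        if base == codon[i]:
            continue
        aa = CODON_TABLE[codon[:i] + base + codon[i + 1:]]
        if aa not in ("*", original):
            reachable.add(aa)
    return reachable


if __name__ == "__main__":
    # AAT is one of the two asparagine codons; used here purely as an example.
    print(classify_substitution("AAT", "Y"))
    print(sorted(reachable_by_snc("AAT")))
```

Because the answer depends on the starting codon and not only on the starting amino acid, a real analysis would need the reference nucleotide sequence at each position; that input is assumed here and not supplied.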