Shenghui Weng, Hangyu Zhou, Chengyang Ji, Liang Li, Na Han, Rong Yang, Jingzhe Shang, Aiping Wu.
Abstract
SARS-CoV-2 has continued to adapt to human hosts throughout the worldwide pandemic that began in 2019. The virus evolves through multiple mechanisms, such as single nucleotide mutations and structural variations, which has made the prevention and control of COVID-19 considerably more difficult. Structural variations, which involve multiple-nucleotide changes such as insertions and deletions, have a greater impact than single nucleotide mutations on both genome structure and protein function. In this study, we found that deletions occurred frequently not only in SARS-CoV-2 but also in other SARS-related coronaviruses. These deletions showed a clear positional bias and formed 45 recurrent deletion regions in the viral genome. Some of these deletions showed proliferation advantages, including four high-frequency deletions (nsp6 Δ106-109, S Δ69-70, S Δ144, and Δ28271) that were detected in around 50% of SARS-CoV-2 genomes, and 19 other medium-frequency deletions. In addition, analysis of deletions in the WHO-reported SARS-CoV-2 variants of concern (VOC) and variants of interest (VOI) showed that each of these variants had a unique combination of deletion patterns. In the spike (S) protein, SARS-CoV-2 deletions were located mainly in the N-terminal domain. Some of them, such as S Δ144/145 and S Δ243-244, have been confirmed to block the binding sites of neutralizing antibodies. Overall, this study revealed a conserved regional pattern of deletions across the whole SARS-CoV-2 genome and the potential effects of some of these deletions, providing important evidence for epidemic control and vaccine development.
IMPORTANCE
Mutations in SARS-CoV-2 have been studied extensively, whereas previous studies discussed structural variations in detail only for the spike protein. To study the role of structural variations in viral evolution, we described their distribution across the whole genome.
We found conserved deletion patterns among SARS-CoV-2, SARS-CoV-2-like, and SARS-CoV-like viruses. By integrating deleted positions, we identified 45 recurrent deletion regions (RDRs) in SARS-CoV-2. Within these regions, four high-frequency deletions appeared in parallel in multiple strains. Furthermore, deletions in the SARS-CoV-2 spike protein were located mainly in the N-terminal domain, where they block the binding sites of some neutralizing antibodies, whereas structural variations in SARS-related coronaviruses were located mainly in the N-terminal domain and the receptor-binding domain. Because the receptor-binding domain is closely involved in host recognition, deletions in this domain may play a role in host adaptation.
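The abstract states that the 45 RDRs were generated by integrating deleted positions but does not spell out the merging rule. As a minimal sketch, assuming that deletions pooled from many genomes are merged whenever they overlap or are directly adjacent, this is ordinary interval merging. The function name, the `gap` tolerance, and the example coordinates are all hypothetical.

```python
from typing import List, Tuple

# Inclusive (start, end) genome coordinates of one observed deletion.
Interval = Tuple[int, int]


def merge_deleted_positions(deletions: List[Interval], gap: int = 0) -> List[Interval]:
    """Merge deletion intervals into candidate recurrent deletion regions.

    Hypothetical rule: two deletions fall in the same region if they overlap,
    are directly adjacent, or are separated by at most `gap` nucleotides.
    """
    regions: List[Interval] = []
    for start, end in sorted(deletions):
        if regions and start <= regions[-1][1] + 1 + gap:
            # Extend the current region so it also covers this deletion.
            region_start, region_end = regions[-1]
            regions[-1] = (region_start, max(region_end, end))
        else:
            # No overlap with the current region: start a new one.
            regions.append((start, end))
    return regions


observed = [(100, 105), (103, 108), (110, 112), (300, 300)]  # invented coordinates
print(merge_deleted_positions(observed))         # [(100, 108), (110, 112), (300, 300)]
print(merge_deleted_positions(observed, gap=1))  # [(100, 112), (300, 300)]
```

Raising `gap` joins nearby but non-contiguous deletions into one region; whether the study applied any such tolerance is not stated in this record.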
Keywords: SARS-CoV-2; adaptive evolution; mutation; recurrent deletion; structural variation
Year: 2022 PMID: 35254107 PMCID: PMC9045279 DOI: 10.1128/spectrum.02191-21
Source DB: PubMed Journal: Microbiol Spectr ISSN: 2165-0497
FIG 1 The genomic distribution of insertions and deletions in SARS-CoV-2 and SARS-CoV-2-like viruses. (A) Deletions plotted by position in the SARS-CoV-2 genome (x-axis) against time (y-axis). The distributions of insertions (blue) and deletions (red) in SARS-CoV-2-like (B) and SARS-CoV-like (C) viruses are shown in the same way. Three shared high deletion/insertion areas (HDAs 1-3) are highlighted with grey boxes. The reference genomes are EPI_ISL_402124 and NC_004718.3 for SARS-CoV-2 and SARS-CoV, respectively. (D) Context of S Δ69-70 in aligned SARS-CoV-2-like virus genome sequences; S Δ69-70 is outlined by a red rectangle.
FIG 2 Deletion types in the RDRs of the SARS-CoV-2 genome. (A) Distribution of RDRs on the SARS-CoV-2 genome, together with the number of deletion types in each RDR. RDRs are shown as red boxes, and the lengths of RDRs longer than 100 nt are labeled below their boxes. Black points on the red boxes indicate the number of deletion types in each RDR; counts greater than 20 are labeled above the points. (B and C) High-frequency (B) and medium-frequency (C) deletions observed in the SARS-CoV-2 genome. High-frequency deletions occurred more than 600,000 times, and medium-frequency deletions more than 1,000 times. (D) Variants carrying the four high-frequency deletions (red) highlighted on the SARS-CoV-2 phylogenetic tree; all other variants are shown in grey as background.
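The thresholds in panels B and C amount to a simple classification of per-deletion counts. A minimal sketch, assuming each genome is represented as the set of deletions it carries. The deletion labels, the function name, and the `"low"` class name are hypothetical; the caption names only the high- and medium-frequency classes.

```python
from collections import Counter
from typing import Dict, Iterable, Set

HIGH_MIN = 600_000  # high-frequency: observed more than 600,000 times (Fig. 2B)
MEDIUM_MIN = 1_000  # medium-frequency: observed more than 1,000 times (Fig. 2C)


def classify_deletions(
    genomes: Iterable[Set[str]],
    high_min: int = HIGH_MIN,
    medium_min: int = MEDIUM_MIN,
) -> Dict[str, str]:
    """Count how many genomes carry each deletion, then bin the counts."""
    counts = Counter(deletion for genome in genomes for deletion in genome)
    classes: Dict[str, str] = {}
    for deletion, n in counts.items():
        if n > high_min:
            classes[deletion] = "high"
        elif n > medium_min:
            classes[deletion] = "medium"
        else:
            classes[deletion] = "low"  # hypothetical label for everything else
    return classes


# Toy data with scaled-down thresholds so the example stays small.
toy = [{"del_A", "del_B"}, {"del_A"}, {"del_A", "del_C"}]
print(sorted(classify_deletions(toy, high_min=2, medium_min=0).items()))
# [('del_A', 'high'), ('del_B', 'medium'), ('del_C', 'medium')]
```

Note that both thresholds are strict ("more than"), so a deletion seen exactly 1,000 times falls below the medium class in this sketch.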
FIG 3 Deletions in different SARS-CoV-2 variants. (A) Venn diagram of the numbers of SARS-CoV-2 sequences carrying overlapping combinations of the high-frequency deletions. Groups a-o correspond to the individual regions of the Venn diagram. (B) Deletion combinations represented by groups a-o. Color intensity indicates the proportion of each combination within each strain, so each column sums to 1. (C) Deletions and mutations (grey) mapped onto six SARS-CoV-2 VOC and VOI strains; high-frequency and medium-frequency deletions are colored red and purple, respectively. (D) Bar plots of the daily number of samples of VOC and VOI strains.
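With four high-frequency deletions there are 2^4 - 1 = 15 non-empty combinations, which matches the 15 groups a-o in panels A and B. A minimal sketch of the grouping and the per-strain normalization in panel B follows, assuming each genome is given as a strain label plus the set of deletions it carries. The letter order (by combination size), the function names, and the strain names are assumptions and need not match the figure.

```python
from collections import Counter, defaultdict
from itertools import combinations
from string import ascii_lowercase
from typing import DefaultDict, Dict, FrozenSet, Iterable, Set, Tuple

HIGH_FREQ = ("nsp6 Δ106-109", "S Δ69-70", "S Δ144", "Δ28271")


def venn_groups(deletions: Tuple[str, ...] = HIGH_FREQ) -> Dict[FrozenSet[str], str]:
    """Assign a letter to every non-empty combination of the deletions.

    Assumed order: by combination size (singletons a-d, pairs e-j,
    triples k-n, all four o); the figure's own lettering may differ.
    """
    combos = [
        frozenset(combo)
        for size in range(1, len(deletions) + 1)
        for combo in combinations(deletions, size)
    ]
    return dict(zip(combos, ascii_lowercase))


def combination_rates(
    records: Iterable[Tuple[str, Set[str]]],
    groups: Dict[FrozenSet[str], str],
) -> Dict[str, Dict[str, float]]:
    """Fraction of each strain's genomes in each group; each strain sums to 1."""
    tracked = frozenset().union(*groups)  # all deletions that define the groups
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for strain, deletions in records:
        carried = frozenset(deletions) & tracked
        if carried:  # genomes with none of the tracked deletions are skipped
            counts[strain][groups[carried]] += 1
    return {
        strain: {group: n / sum(c.values()) for group, n in c.items()}
        for strain, c in counts.items()
    }


groups = venn_groups()
print(len(groups))  # 15
records = [  # invented genomes for two placeholder strains
    ("strain_1", {"nsp6 Δ106-109", "S Δ69-70", "S Δ144"}),
    ("strain_1", {"nsp6 Δ106-109", "S Δ69-70", "S Δ144"}),
    ("strain_1", {"S Δ69-70"}),
    ("strain_2", set()),
]
rates = combination_rates(records, groups)
print(rates["strain_1"])  # {'k': 0.6666666666666666, 'b': 0.3333333333333333}
```

Here `strain_2` carries none of the four deletions and therefore has no column; how the study treated such genomes is not stated in the caption.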
FIG 4 Relationships between deletions, mutations, and antibody binding regions in the S protein of SARS-CoV-2. (A–D) Distribution of deletions (A), mutations (B), IgA and IgG epitopes (C), and known neutralizing antibodies (D) along the SARS-CoV-2 S protein. (E) RDRs displayed on the 3-D model of the SARS-CoV-2 S protein, with the NTD and RBD rendered as spheres and surfaces. RDRs of SARS-CoV-2 (blue) and SARS-CoV-2-like viruses (red) are colored, and their overlapping areas are shown in yellow.
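Panel E highlights where the two RDR sets overlap. Assuming each set is available as a list of non-overlapping inclusive intervals (residue or nucleotide coordinates; the function name and example coordinates are hypothetical), the shared areas can be found with a two-pointer interval intersection:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) coordinates


def overlapping_regions(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two interval lists, each assumed to be internally non-overlapping."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    shared: List[Interval] = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            shared.append((lo, hi))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


sars_cov_2_rdrs = [(10, 20), (30, 40)]       # invented coordinates
sars_cov_2_like_rdrs = [(15, 35), (50, 60)]  # invented coordinates
print(overlapping_regions(sars_cov_2_rdrs, sars_cov_2_like_rdrs))  # [(15, 20), (30, 35)]
```

If either input list may contain overlapping intervals, merge it first so the two-pointer assumption holds.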