Nariman Shahhosseini, Gary Wong, Gary P Kobinger, Sadegh Chinikar.
Abstract
In late 2019, a novel coronavirus emerged in China. Understanding the factors that modulate cross-species virus transmission is critical to elucidating the nature of virus emergence. Using bioinformatics tools, we mapped the SARS-CoV-2 genome, modeled protein structure, and analyzed the evolutionary origin of SARS-CoV-2 as well as potential recombination events. Phylogenetic tree analysis shows that, at the scale of the complete virus genome, SARS-CoV-2 has the closest evolutionary relationship with Bat-SL-CoV-2 (RaTG13) and less similarity to Pangolin-CoV. However, the receptor binding domain (RBD) of SARS-CoV-2 is almost identical to that of Pangolin-CoV at the amino acid (aa) level, suggesting that spillover transmission probably occurred directly from pangolins rather than bats. Further recombination analysis revealed the pathway for spillover transmission from Bat-SL-CoV-2 and Pangolin-CoV. Here, we provide evidence for a recombination event between Bat-SL-CoV-2 and Pangolin-CoV that resulted in the emergence of SARS-CoV-2. Nevertheless, the role of mutations should be noted as another factor influencing the continuing evolution and resurgence of novel SARS-CoV-2 variants.
Keywords: Bat-SL-CoV-2, Bat SARS like Coronavirus 2; COVID-19, coronavirus disease 2019; CoV, coronavirus; MERS, Middle East Respiratory Syndrome; Mutation; Pandemic; Phylogenetics; RBD, receptor binding domain; Recombination; SARS, severe acute respiratory syndrome; SARS-CoV-2; Virulence; hACE2, human angiotensin-converting enzyme 2
Year: 2021 PMID: 33615041 PMCID: PMC7884226 DOI: 10.1016/j.genrep.2021.101045
Source DB: PubMed Journal: Gene Rep ISSN: 2452-0144
Genomic organization and nucleotide (nt) and amino acid (aa) lengths of each ORF of SARS-CoV-2 compared with genetically relevant Betacoronaviruses.
| Virus species (nt length)/GenBank Acc. No. | Genomic organization / open reading frames | | | | | | | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SARS-CoV-2 | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa |
| | 21,288 | 7096 | 3822 | 1273 | 825 | 275 | 225 | 75 | 666 | 222 | 183 | 61 | 363 | 121 | 129 | 43 | 363 | 121 | 1257 | 419 | 114 | 38 |
| Pangolin-CoV | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | | |
| | 21,288 | 7096 | 3837 | 1279 | 831 | 277 | 225 | 75 | 666 | 222 | 183 | 61 | 364 | 121 | 129 | 43 | 363 | 121 | 1257 | 419 | | |
| Bat-SL-CoV-2 | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | | |
| | 21,285 | 7095 | 3807 | 1269 | 825 | 275 | 225 | 75 | 663 | 221 | 183 | 61 | 363 | 121 | 129 | 43 | 363 | 121 | 1257 | 419 | | |
| Bat-SL-CoV | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa |
| | 21,210 | 7070 | 3735 | 1245 | 825 | 275 | 225 | 75 | 666 | 222 | 183 | 61 | 363 | 121 | 363 | 121 | 1257 | 419 | 291 | 97 | 210 | 70 |
| SARS-CoV | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa |
| | 21,219 | 7073 | 3765 | 1255 | 822 | 274 | 462 | 154 | 228 | 76 | 663 | 221 | 189 | 63 | 366 | 122 | 366 | 122 | 1266 | 422 | 294 | 98 |
The Pangolin-CoV sequence submitted to GenBank (MT084071) contains unread regions. To fill the gaps, a consensus sequence was generated from the Pangolin-CoV metagenome (NCBI BioProject: PRJNA573298).
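The gap-filling step in the note above can be pictured with a small sketch. It assumes the metagenome reads have already been aligned to the same coordinates as the submitted sequence, that unread positions are written as `N`, and that a simple per-column majority vote is an acceptable consensus rule. The source does not state which tool or rule was used, and the function names `majority_consensus` and `fill_gaps` are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads, uninformative="-N"):
    """Per-column majority base across equally long, pre-aligned reads.

    Assumption: gaps ('-') and unread bases ('N') carry no vote.
    """
    if not aligned_reads:
        raise ValueError("need at least one aligned read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        votes = Counter(
            base.upper() for base in column if base.upper() not in uninformative
        )
        # A column with no informative base stays unresolved.
        consensus.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(consensus)


def fill_gaps(reference, consensus):
    """Replace only the unread ('N') positions of reference with consensus bases."""
    if len(reference) != len(consensus):
        raise ValueError("reference and consensus must have the same length")
    return "".join(
        cons if ref.upper() == "N" else ref
        for ref, cons in zip(reference, consensus)
    )


if __name__ == "__main__":
    reads = ["ACGTN", "ACGTA", "ANGTA"]  # toy reads, not real data
    print(fill_gaps("ACNNA", majority_consensus(reads)))
```

Positions that were read in the submitted sequence are left untouched, so the consensus only contributes where the original record has no call.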
Nucleotide and amino acid sequence identities among the genes of SARS-CoV-2 compared with Pangolin-CoV, Bat-SL-CoV-2, Bat-SL-CoV and SARS-CoV.
| Virus strains | Gene regions (identity %) | | | | | | | | | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | ORF1a | | ORF1b | | S | | ORF3 | | E | | M | | ORF6 | | ORF7a | | ORF7b | | ORF8 | | N | | ORF10 | |
| | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa | nt | aa |
| Bat-SL-CoV-2/MN996532 | 96 | 98 | 97.3 | 99.3 | 92.8 | 97.4 | 96.2 | 97.7 | 99.5 | 100 | 95.5 | 98.6 | 98.3 | 100 | 95.5 | 97 | 99.2 | 97.2 | 96.9 | 95 | 96.9 | 99 | 99.1 | 97.3 |
| Pangolin-CoV/MT084071 | 89.3 | 95.8 | 89.6 | 99.2 | 83 | 89.7 | 92.3 | 97 | 99.1 | 100 | 93.3 | 98.6 | 95.5 | 96.4 | 93.3 | 97.4 | 91.4 | 95.2 | 92.1 | 94.2 | 94.6 | 97 | 99.1 | 97.3 |
| Bat-SL-CoV/MG772934 | 90.9 | 95.7 | 86.1 | 95.5 | 74.9 | 79.8 | 88.8 | 91.6 | 98.6 | 100 | 93.4 | 98.6 | 95 | 93.2 | 89.6 | 90 | 95.3 | 92.7 | 88.5 | 94.2 | 91.1 | 94.2 | 100 | 100 |
| SARS-CoV/AY304488 | 76 | 81 | 86.2 | 95.6 | 72.5 | 75.9 | 75.5 | 96.9 | 93.5 | 94.7 | 84.9 | 89.1 | 76.5 | 68.2 | 84.1 | 89 | 86.1 | 83.8 | 40.9 | 16.1 | 88.0 | 90 | 93.1 | 82.4 |
The Pangolin-CoV sequence submitted to GenBank (MT084071) contains unread regions. To fill the gaps, a consensus sequence was generated from the Pangolin-CoV metagenome (NCBI BioProject: PRJNA573298).
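Identity percentages such as those tabulated above are typically derived from a pairwise alignment of each gene region. The following is a minimal sketch, assuming the two sequences are already aligned to equal length and that columns containing a gap in either sequence are excluded from the denominator. The source does not say how gaps were treated, so this convention and the function name `percent_identity` are assumptions.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are skipped entirely.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    matches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted as match or mismatch
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not taken from the paper.
    print(round(percent_identity("ACGTAC", "ACGTTT"), 1))
```

The same function works for nt and aa alignments, which is why each gene region can be reported with both an nt and an aa value.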
Fig. 1. The aa sequence of the spike, colored based on protein structure (A). The predicted 3D structure of the SARS-CoV-2 spike on the surface of membrane protein; the surface of the spike model is colored based on protein structure (B). A representative view of the spike from the anchor side toward host cells (C). The enlarged RBD with the key residues that bind hACE2 (D).
Fig. 2. Amino acid alignment in the RBD (A) and the O-linked glycan (B) of SARS-CoV-2 (MN908947), SARS-CoV-2 VOC-202012-01 (GISAID ID: EPI-ISL-601443), Pangolin-CoV (MT084071), Bat-SL-CoV-2 (MN996532), Bat-SL-CoV (MG772934), and SARS-CoV (AY394996). Key residues in the RBD are marked with black boxes (A). Three residues in the O-linked glycan are marked with black boxes, and the amino acid insertion in SARS-CoV-2 is marked with a red box (B). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. The cladogram of coronaviruses (A) and the neighbor-joining (NJ) phylogenetic tree of Betacoronaviruses (B) were constructed from aligned sequences using tools implemented within the Geneious software. The bootstrap values and number of bootstrap replications were greater than 70% and 1000, respectively. The neighbor-net network based on the full-length sequences of Sarbecoviruses was constructed using the SplitsTree software (C).
Fig. 4. Recombination analysis by RDP4, with SARS-CoV-2 as the query sequence and closely related Sarbecovirus strains, shown as a similarity plot with a window size of 200 nucleotides moving in steps of 20 nt along the alignment (A). The schematic sequence display shows the recombinant region in green. The Bootscan analysis shows the most probable positions of the breakpoint pairs, which are enlarged (B). Phylogenetically plausible alternative parent trees inferred for different genomic regions of the recombinant sequence, based on the major parent (C) and the minor parent (D). MaxChi breakpoint matrix: colors represent chi-squared values for breakpoints, and dark red peaks indicate the most probable positions of the breakpoint pair (E). The number of bootstrap replicates was set to 100 and the cutoff percentage to 70%. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
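The similarity plot in Fig. 4A rests on a sliding-window comparison. The sketch below is not RDP4's implementation. It only illustrates the idea, using the window size (200 nt) and step (20 nt) given in the legend, and it assumes the query and one comparison sequence are already aligned to equal length. The function name `sliding_similarity` is hypothetical.

```python
def sliding_similarity(query, reference, window=200, step=20):
    """Percent similarity per window between two pre-aligned sequences.

    Returns a list of (window midpoint, percent identical positions).
    Gap characters are compared like any other symbol in this sketch.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        same = sum(1 for a, b in zip(q, r) if a == b)
        points.append((start + window // 2, 100.0 * same / window))
    return points


if __name__ == "__main__":
    # Toy alignment: identical first half, fully divergent second half.
    query = "A" * 400
    reference = "A" * 200 + "C" * 200
    for midpoint, similarity in sliding_similarity(query, reference):
        print(midpoint, similarity)
```

Running the comparison against each candidate parent and plotting the curves together is what makes a crossover visible: the sequence with the highest similarity changes at a putative breakpoint.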
Fig. 5. The proposed cycle for the emergence and resurgence of SARS-CoV-2 variants causing the 2019–21 pandemic.