Abozar Ghorbani1, Samira Samarfard2,3, Neda Eskandarzade4, Alireza Afsharifar1, Mohammad Hadi Eskandari5, Ali Niazi6, Keramatollah Izadpanah1, Thomas P Karbanowicz2.
Abstract
Coronavirus disease 2019 has developed into a pandemic with tremendous global impact. The receptor-binding motif (RBM) region of the causative virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), binds to host angiotensin-converting enzyme 2 (ACE2) receptors to initiate infection. As ACE2 receptors are highly conserved among vertebrate species, SARS-CoV-2 can infect significant animal species as well as human populations. SARS-CoV-2 genotypes isolated from humans and significant animal species were analyzed to compare and identify mutation and adaptation patterns across hosts. The phylogenetic data revealed seven distinct clades with no significant relationship between clade and geographical location. A high rate of variation within SARS-CoV-2 mink isolates implies that mink populations were infected before human populations. Most single-nucleotide polymorphisms (SNPs) within the spike (S) protein of SARS-CoV-2 genotypes from the different hosts accumulate in the RBM region, highlighting the pronounced accumulation of variants with RBM mutations in comparison with other variants. These SNPs play a crucial role in viral transmission and pathogenicity and are key to identifying other animal species as potential intermediate hosts of SARS-CoV-2. Their possible roles in the emergence of new viral strains, and the possible implications of these changes in compromising vaccine effectiveness, deserve urgent consideration.
Keywords: SARS-CoV-2; SNP; coronavirus; selective pressure
Year: 2021 PMID: 33885726 PMCID: PMC8083239 DOI: 10.1093/bib/bbab144
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. A full reference phylogenetic tree drawn from 243,779 high-coverage SARS-CoV-2 sequences registered in the GISAID database from all over the world. The tree was constructed using the neighbor-joining (NJ) method. It shows that SARS-CoV-2 is generally divided into seven clades (S, L, V, G, GH, GR and GV). Different clades of the virus could be present in all geographical areas. Color codes indicate continents.
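The neighbor-joining method named in the caption picks, at each step, the pair of taxa that minimizes a Q criterion computed from the distance matrix. The sketch below shows only that first joining step on a toy five-taxon matrix. The taxon names, the distances and the function name `nj_first_join` are illustrative assumptions and are not taken from the paper or from any phylogenetics package.

```python
from itertools import combinations


def nj_first_join(names, dist):
    """Return the first pair joined by neighbor-joining and its branch lengths.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    n = len(names)
    # Total distance from each taxon to all other taxa.
    total = {
        a: sum(dist[frozenset((a, b))] for b in names if b != a)
        for a in names
    }

    # Q criterion: Q(a, b) = (n - 2) * d(a, b) - total(a) - total(b).
    def q(a, b):
        return (n - 2) * dist[frozenset((a, b))] - total[a] - total[b]

    a, b = min(combinations(names, 2), key=lambda pair: q(*pair))
    d_ab = dist[frozenset((a, b))]
    # Branch lengths from each joined taxon to the new internal node.
    len_a = d_ab / 2 + (total[a] - total[b]) / (2 * (n - 2))
    len_b = d_ab - len_a
    return (a, b), (len_a, len_b)


# Toy distances for five hypothetical taxa (not data from the paper).
names = ["a", "b", "c", "d", "e"]
toy_pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy = {frozenset(k): v for k, v in toy_pairs.items()}

pair, lengths = nj_first_join(names, toy)
print(pair, lengths)
```

A full NJ implementation would replace the joined pair with a new node, recompute distances to it and repeat until the tree is resolved. For a dataset of the size described in the caption, a dedicated phylogenetics tool would be the realistic choice.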
Matrix of estimated evolutionary distance between different host groups calculated with maximum likelihood (ML) approach
| | Human | Bat | Dog | Cat | Lion | Mink | Mouse | Pangolin |
|---|---|---|---|---|---|---|---|---|
| Bat | 1.009 | | | | | | | |
| Dog | 0.000 | 1.011 | | | | | | |
| Cat | 0.000 | 1.010 | 0.000 | | | | | |
| Lion | 0.001 | 1.011 | 0.001 | 0.001 | | | | |
| Mink | 2.193 | 3.045 | 2.196 | 2.194 | 2.193 | | | |
| Mouse | 0.001 | 1.006 | 0.001 | 0.001 | 0.001 | 2.187 | | |
| Pangolin | 8.819 | 8.075 | 8.834 | 8.813 | 8.819 | 8.427 | 8.789 | |
| Tiger | 0.000 | 1.010 | 0.000 | 0.000 | 0.001 | 2.194 | 0.001 | 8.814 |
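Because the matrix is lower-triangular, looking up an arbitrary host pair requires treating it as symmetric. The following is a minimal sketch of one way to do that, using the values from the table above. The helper names `build_symmetric`, `distance` and `ranked_from` are hypothetical and only serve the illustration.

```python
HOSTS = ["Human", "Bat", "Dog", "Cat", "Lion", "Mink", "Mouse", "Pangolin", "Tiger"]

# Lower-triangular rows as printed in the table: each row lists distances
# to the hosts that precede it in HOSTS, in column order.
LOWER = {
    "Bat": [1.009],
    "Dog": [0.000, 1.011],
    "Cat": [0.000, 1.010, 0.000],
    "Lion": [0.001, 1.011, 0.001, 0.001],
    "Mink": [2.193, 3.045, 2.196, 2.194, 2.193],
    "Mouse": [0.001, 1.006, 0.001, 0.001, 0.001, 2.187],
    "Pangolin": [8.819, 8.075, 8.834, 8.813, 8.819, 8.427, 8.789],
    "Tiger": [0.000, 1.010, 0.000, 0.000, 0.001, 2.194, 0.001, 8.814],
}


def build_symmetric(lower, hosts):
    """Store each pair once under an unordered key so lookups are symmetric."""
    table = {}
    for row, values in lower.items():
        for col, value in zip(hosts, values):
            table[frozenset((row, col))] = value
    return table


def distance(table, a, b):
    """Distance between two host groups; zero for a group with itself."""
    return 0.0 if a == b else table[frozenset((a, b))]


def ranked_from(table, hosts, ref):
    """Other host groups ordered from nearest to farthest from ref."""
    others = [h for h in hosts if h != ref]
    return sorted(others, key=lambda h: distance(table, ref, h))


table = build_symmetric(LOWER, HOSTS)
print(distance(table, "Bat", "Mink"))      # 3.045
print(ranked_from(table, HOSTS, "Human"))  # Pangolin is last, Mink second to last
```

Ranking the groups from the human isolates this way makes the pattern in the table easy to see: the pangolin and mink groups are the most distant, while the dog, cat and tiger groups are at distance 0.000.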
Average evolutionary divergence and gamma distribution between coronavirus populations within each species based on maximum likelihood (ML) algorithm
| Animal host | Transition/transversion bias | Gamma distribution | Average evolutionary divergence over sequence pairs within groups |
|---|---|---|---|
| Human | 3.56 | 200 | 0.000408 |
| Dog | 1 | 11.2382 | 0.000263 |
| Pangolin | 1.24 | 1.3765 | 3.5921 |
| Cat | 1 | 37.1736 | 0.000105 |
| Lion | 2 765 874.42 | 0.05 | 0.000526 |
| Mink | 1.28 | 0.05 | 3.2147 |
| Tiger | 2 | 66.0911 | 0.000175 |
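The transition/transversion bias column contrasts purine-to-purine or pyrimidine-to-pyrimidine changes (transitions) with purine-to-pyrimidine changes (transversions). The sketch below counts both classes between two aligned nucleotide sequences. It is a naive pairwise count for illustration only, not the maximum likelihood estimate reported in the table, and the sequences and the function name `count_ts_tv` are made up.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Sites with gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        # A<->G and C<->T stay within one chemical class: transition.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Toy aligned sequences (hypothetical, not from the paper).
ts, tv = count_ts_tv("ACGTACGTAC", "GTGTATGTCC")
print(ts, tv)  # 3 1
print(ts / tv if tv else float("inf"))
```

When no transversions are observed in a group, a count-based ratio like this one is undefined, so very large bias values should be read with care.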
Figure 2. Phylogenetic analysis of the SARS-CoV-2 S protein sequence. The radial phylogenetic tree was constructed using the neighbor-joining (NJ) algorithm in the MEGA 7 package to illustrate the evolutionary relationship between different host sets of the virus. The colors used in the tree indicate the hosts from which the sequences were sampled.
Figure 3. Abundance and position of SNPs in the SARS-CoV-2 S protein in different hosts. The S1 subunit (residues 14–685) comprises the NTD (N-terminal domain, residues 14–305) and the RBM (receptor-binding motif, residues 319–541). The S2 subunit (residues 686–1273) comprises the FP (fusion peptide, residues 788–806), HR1 (heptapeptide repeat sequence 1, residues 912–984), HR2 (heptapeptide repeat sequence 2, residues 1163–1213), TM (transmembrane domain, residues 1213–1237) and IC (cytoplasmic domain, residues 1237–1273). Symbols of the same color lined up in a column indicate the same nucleotide position.
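The residue ranges in the caption are enough to assign any S protein position to its subunit and domain. The sketch below is a minimal lookup built from those ranges exactly as stated, including the shared boundary residues 1213 and 1237. The function name `regions_for` and the example positions are assumptions chosen for illustration, not SNPs reported in the source.

```python
from collections import Counter

# (label, first residue, last residue), taken from the figure caption.
REGIONS = [
    ("S1", 14, 685),
    ("NTD", 14, 305),
    ("RBM", 319, 541),
    ("S2", 686, 1273),
    ("FP", 788, 806),
    ("HR1", 912, 984),
    ("HR2", 1163, 1213),
    ("TM", 1213, 1237),
    ("IC", 1237, 1273),
]


def regions_for(residue):
    """Return every region whose inclusive range contains the residue."""
    return [name for name, start, end in REGIONS if start <= residue <= end]


# Illustrative residue positions only (hypothetical input).
positions = [501, 484, 614, 95, 1213]
for pos in positions:
    print(pos, regions_for(pos))

# Tally how many of the example positions fall in each region.
tally = Counter(name for pos in positions for name in regions_for(pos))
print(tally["RBM"])  # 2
```

Returning a list rather than a single label keeps the subunit and domain together and makes the overlapping boundaries in the caption visible instead of silently picking one.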
Figure 4. Frequency and location of SNPs on the 3D structure of the SARS-CoV-2 S protein among all virus isolates from different clades. Green line: human host receptor ACE2; gray line: spike glycoprotein trimer. Marker categories: spike glycoprotein variation occurring >100 times; spike glycoprotein variation occurring 100 times or less; spike glycoprotein variation near the host receptor with effect history; spike glycoprotein variation near the host receptor or other functional annotation; insertion/deletion; spike glycoprotein variation altering potential N-glycosylation sites.