Iman Safari, Kolsoum InanlooRahatloo, Elahe Elahi.
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes serious disease in humans. First identified in November/December 2019 in China, it has rapidly spread worldwide. We analyzed 2790 SARS-CoV-2 genome sequences from 56 countries that were available on April 2, 2020, to assess the evolution of the virus during this early phase of its expansion. We aimed to assess sequence variations that had evolved in virus genomes, giving the greatest attention to the S gene. We also aimed to identify haplotypes that the variations may define and to consider their geographic and chronologic distribution. Variations at 1930 positions that together cause 1203 amino acid changes were identified. The frequencies of changes, normalized to the lengths of genes and encoded proteins, were relatively high in ORF3a and relatively low in M. A variation that causes an Asp614Gly change near the receptor-binding domain of S was found at a high frequency, and we consider that this change may contribute to the rapid spread of viruses that carry it. Our most important findings relate to haplotypes. Sixty-six haplotypes that constitute thirteen haplotype groups (H1-H13) were identified, and 84.6% of the 2790 sequences analyzed were associated with these haplotypes. The majority of the sequences (75.1%) were associated with haplotype groups H1-H3. The distribution pattern of the haplotype groups differed among geographic regions. A few haplotypes were country/territory specific. The location and time of emergence of some haplotypes are discussed. Importantly, nucleotide variations that define the various haplotypes and Tag/signature variations for most of the haplotypes are reported. The practical applications of these variations are discussed.
Keywords: SARS-CoV-2; Tag SNVs; amino acid changes; haplotypes; nucleotide variations
Year: 2020 PMID: 32975856 PMCID: PMC7537300 DOI: 10.1002/jmv.26553
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
Figure 1. Schematic presentation of the severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) genome and the S‐encoded protein (spike). The upper panel shows the SARS‐CoV‐2 genomic regions. The middle panel depicts the S protein, and the lines therein represent positions of non‐synonymous amino acid changes in S found among the sequences analyzed. The region of the receptor‐binding domain (RBD) of the SARS‐CoV‐2 spike protein that includes the amino acids thought to be most important for interaction with the human angiotensin‐converting enzyme 2 (ACE2) receptor is shown in the bottom panel, aligned with the RBDs of four other coronaviruses. The six residues shaded in yellow are the most important SARS‐CoV‐2 interacting amino acids, and the 12 shaded in green and the four shaded in gray are at the next levels of importance. The stars in the bottom panel show the positions of non‐synonymous amino acid changes within the RBD.
Figure 2. Numbers and distribution of nucleotide sequence variations and non‐synonymous amino acid changes observed across the genes and proteins of 2790 severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) genome sequences. A, Numbers of nucleotide variations (green bars) and non‐synonymous amino acid changes (red bars). B, The same numbers normalized to the number of changes per 100 nucleotides (green line) and per 100 amino acids (red line). nsp1 and ORF10, which are positioned near the 5ʹ and 3ʹ termini of the SARS‐CoV‐2 genome, are not included because of the relatively high frequencies of nucleotides reported as N.
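The normalization described for panel B of Figure 2 amounts to dividing a count by a length and scaling to 100 units. The following is a minimal sketch of that arithmetic only; the function name `per_100`, the gene labels, and all counts and lengths are hypothetical placeholders, not values from the paper.

```python
def per_100(count: int, length: int) -> float:
    """Return the number of changes per 100 units of length.

    `length` is in nucleotides for a gene or in amino acids for a protein.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * count / length


# Hypothetical example data (NOT taken from the paper): two made-up genes.
example_genes = {
    "geneA": {"nt_variations": 30, "nt_length": 600, "aa_changes": 18, "aa_length": 200},
    "geneB": {"nt_variations": 12, "nt_length": 1200, "aa_changes": 4, "aa_length": 400},
}

for name, g in example_genes.items():
    nt_rate = per_100(g["nt_variations"], g["nt_length"])
    aa_rate = per_100(g["aa_changes"], g["aa_length"])
    print(f"{name}: {nt_rate:.2f} per 100 nt, {aa_rate:.2f} per 100 aa")
```

Normalizing this way lets a short gene and a long gene be compared on the same scale, which raw counts alone do not allow.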
Figure 3. Haplotype network of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) genome sequences. The nodes that represent the 28 most frequent haplotypes are labeled according to the haplotype designations of Figure 4. The node that represents the reference sequence and two nodes designated by single variations at positions 11083 and 26144 are also labeled.
Figure 4. Sixty‐six most frequently observed haplotypes among 2790 severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) genome sequences. The 13 graphs represent 13 haplotype groups. A presumably ancestral haplotype is described at the apex of each graph, and all its sub‐haplotypes that were each identified in ≥10 sequences are described at lower levels. Haplotypes with lower frequencies are not shown. Each haplotype is defined by its own single‐nucleotide variations (SNVs) and all the SNVs of all its upper‐level haplotypes. The percent of each of the haplotype groups among the 2790 genome sequences is shown in adjacent boxes. The percent of each sub‐haplotype among its immediately upper‐level haplotype is shown on the respective edges. * ≥ 85% of genome sequences with the sequence variation had the predicted haplotype; ** ≥ 95% of genome sequences with the sequence variation had the predicted haplotype. The minor allele (T) at position 241 was assumed for 23 of 1248 sequences with this haplotype wherein the nucleotide was not read well (reported as N). Reference to the specific haplotypes in the text will be preceded by the letter H; H is not included in the nomenclature of the figure because of space limitations.
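Two rules stated in the Figure 4 caption can be illustrated in code: a haplotype is defined by its own SNVs plus the SNVs of all its upper‐level haplotypes, and a variation is starred when at least 85% (*) or 95% (**) of the sequences that carry it have the predicted haplotype. The sketch below is only an illustration of those two rules. The haplotype names, SNV labels, and the helper names `full_snvs` and `tag_level` are hypothetical and do not reproduce the actual haplotypes of the paper.

```python
from typing import Dict, Optional, Set, TypedDict


class Haplotype(TypedDict):
    parent: Optional[str]  # upper-level haplotype, or None at the apex
    snvs: Set[str]         # SNVs that this level adds


# Hypothetical hierarchy (placeholder names and positions).
HAPLOTYPES: Dict[str, Haplotype] = {
    "A": {"parent": None, "snvs": {"pos100:C>T"}},
    "A.1": {"parent": "A", "snvs": {"pos200:G>A"}},
    "A.1.1": {"parent": "A.1", "snvs": {"pos300:A>G"}},
}


def full_snvs(name: str, table: Dict[str, Haplotype] = HAPLOTYPES) -> Set[str]:
    """Collect a haplotype's own SNVs and those of every upper-level haplotype."""
    collected: Set[str] = set()
    node: Optional[str] = name
    while node is not None:
        collected |= table[node]["snvs"]
        node = table[node]["parent"]
    return collected


def tag_level(n_with_snv_and_haplotype: int, n_with_snv: int) -> str:
    """Return '**', '*', or '' following the 95% and 85% thresholds of the caption."""
    if n_with_snv == 0:
        return ""
    fraction = n_with_snv_and_haplotype / n_with_snv
    if fraction >= 0.95:
        return "**"
    if fraction >= 0.85:
        return "*"
    return ""


print(sorted(full_snvs("A.1.1")))
print(repr(tag_level(96, 100)), repr(tag_level(90, 100)), repr(tag_level(50, 100)))
```

Under these assumptions, a variation marked `**` would by itself predict the haplotype in at least 95% of the sequences that carry it, which is what makes it usable as a tag.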
Percent of haplotypes 1–13 and respective sub‐haplotypes in various geographic regions and countries/territories
| Geographic regions and countries/territories | No. sequences | H1–H13 and sub‐haplotypes | H1 | H2 | H3 | H4 | H5 | H6 | H7 | H8 | H9 | H10 | H11 | H12 | H13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China: Mainland | 267 | 39.3 | 0 | 31.8 | 0.4 | 0 | 3.4 | 0 | 0 | 0 | 0 | 0 | 0.7 | 3 | 0 |
| China: Hong Kong | 50 | 64 | 2 | 6 | 18 | 38 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| China: Taiwan | 17 | 82.4 | 11.8 | 11.8 | 5.9 | 0 | 47.1 | 0 | 0 | 0 | 0 | 0 | 5.9 | 0 | 0 |
| Japan | 95 | 40 | 19 | 4.2 | 0 | 0 | 0 | 0 | 0 | 0 | 14.7 | 0 | 0 | 2.1 | 0 |
| Malaysia | 7 | 57 | 0 | 28.5 | 0 | 0 | 0 | 0 | 0 | 28.5 | 0 | 0 | 0 | 0 | 0 |
| Singapore | 24 | 50 | 0 | 8.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 41.7 |
| South Korea | 13 | 69.2 | 0 | 69.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Vietnam | 8 | 75 | 37.5 | 37.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Other | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | | | | | | | | | |
| Australia | 67 | 93.6 | 17.9 | 16.4 | 12 | 0 | 31 | 1.5 | 1.5 | 9 | 0 | 0 | 3 | 1.5 | 0 |
| New Zealand | 5 | 100 | 20 | 40 | 0 | 0 | 20 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | | | | | | | | | |
| Brazil | 18 | 94.4 | 77.9 | 5.6 | 11.2 | 0 | 0 | 5.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Chile | 7 | 100 | 14.3 | 85.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Other | 5 | 100 | 80 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | | | | | | | | | |
| Canada | 111 | 95.9 | 45.9 | 35.6 | 0.9 | 0 | 9 | 1.8 | 0 | 1.8 | 0 | 0 | 0.9 | 0 | 0 |
| USA | 741 | 91.1 | 28.9 | 59.3 | 1.8 | 0 | 0.3 | 0.2 | 0 | 0.3 | 0.3 | 0 | 0 | 0 | 0 |
| | | | | | | | | | | | | | | | |
| Congo | 19 | 94.7 | 89.5 | 0 | 5.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Senegal | 11 | 100 | 90.9 | 9.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Other | 3 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | | | | | | | | | |
| Belgium | 103 | 98.1 | 88 | 1 | 3.9 | 0 | 0 | 4.8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Denmark | 9 | 100 | 84.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Finland | 33 | 97 | 84.8 | 0 | 6 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| France | 135 | 91.9 | 88.1 | 1.5 | 2.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Georgia | 10 | 100 | 50 | 10 | 30 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Germany | 39 | 89.7 | 51.3 | 2.6 | 0 | 0 | 2.6 | 33.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Iceland | 295 | 94.2 | 76.5 | 3.7 | 12.7 | 0 | 0 | 0.6 | 0 | 0 | 0 | 0.7 | 0.3 | 0 | 0 |
| Ireland | 11 | 81.8 | 63.6 | 0 | 18.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Italy | 23 | 95.6 | 86.9 | 0 | 8.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Luxembourg | 46 | 100 | 93.4 | 0 | 6.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Netherlands | 164 | 95.7 | 59.4 | 2.4 | 9.1 | 0 | 0.6 | 2.4 | 21.9 | 0 | 0 | 0 | 0 | 0 | 0 |
| Norway | 6 | 100 | 16.7 | 0 | 66.7 | 0 | 16.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Portugal | 46 | 100 | 82.6 | 2.2 | 10.9 | 0 | 0 | 0 | 2.2 | 0 | 0 | 2.2 | 0 | 0 | 0 |
| Spain | 47 | 91.5 | 38.8 | 47 | 6.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Switzerland | 35 | 100 | 96.7 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| UK: England | 201 | 85.1 | 43.3 | 1 | 33 | 0 | 0.5 | 0.5 | 1.5 | 0 | 4.5 | 0.5 | 0 | 0 | 0 |
| UK: Scotland | 6 | 100 | 66.7 | 0 | 33.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| UK: Wales | 80 | 91.3 | 36.3 | 1.3 | 7.5 | 0 | 0 | 63.5 | 0 | 0 | 0 | 2.5 | 0 | 0 | 0 |
| Other | 17 | 94.3 | 76.6 | 5.9 | 0 | 0 | 0 | 5.9 | 5.9 | 0 | 0 | 0 | 0 | 0 | 0 |
Cambodia and Thailand.
Colombia, Mexico, Panama and Peru.
Algeria and South Africa.
Georgia included in Europe because of proximity to countries of Eastern Europe.
Czech Republic, Greece, Hungary, Lithuania, Poland, Russia, Slovakia, and Sweden.
Israel, Kuwait, and Saudi Arabia.
India, Nepal, and Pakistan.
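In most rows of the table, the value in the "H1–H13 and sub‐haplotypes" column equals the sum of the thirteen per‐group percentages that follow it. The sketch below checks that relationship for a few rows copied from the table. It assumes that the total is meant to be the sum of the H1 to H13 columns and uses a tolerance of 0.5 percentage points for rounding; the function name `rows_not_adding_up` and the tolerance are illustrative choices, not part of the paper.

```python
from typing import Dict, List, Tuple

# (reported H1-H13 total, [H1, H2, ..., H13]) for selected rows of the table.
ROWS: Dict[str, Tuple[float, List[float]]] = {
    "China: Mainland": (39.3, [0, 31.8, 0.4, 0, 3.4, 0, 0, 0, 0, 0, 0.7, 3, 0]),
    "Japan": (40, [19, 4.2, 0, 0, 0, 0, 0, 0, 14.7, 0, 0, 2.1, 0]),
    "USA": (91.1, [28.9, 59.3, 1.8, 0, 0.3, 0.2, 0, 0.3, 0.3, 0, 0, 0, 0]),
    "Denmark": (100, [84.6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    "UK: Wales": (91.3, [36.3, 1.3, 7.5, 0, 0, 63.5, 0, 0, 0, 2.5, 0, 0, 0]),
}


def rows_not_adding_up(
    rows: Dict[str, Tuple[float, List[float]]], tolerance: float = 0.5
) -> List[str]:
    """Return the rows whose H1..H13 values do not sum to the reported total.

    `tolerance` (an assumed allowance for rounding) is in percentage points.
    """
    flagged = []
    for name, (total, per_group) in rows.items():
        if abs(sum(per_group) - total) > tolerance:
            flagged.append(name)
    return flagged


print(rows_not_adding_up(ROWS))
```

Rows returned by this check are worth comparing against the published table before the values are reused, since a mismatch may come from rounding, typesetting, or text extraction.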
Figure 5. Frequency of haplotypes H1–H13 and their subhaplotypes among severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) genome sequences of various geographic regions. The number (n) of sequences from each region is indicated, and the percent of the haplotypes in each region is written within the colored bars that represent the various haplotype groups. The haplotypes of the few samples from the Middle East and West Asia are not shown.