
The global population of SARS-CoV-2 is composed of six major subtypes.

Ivair José Morais¹, Richard Costa Polveiro², Gabriel Medeiros Souza³, Daniel Inserra Bortolin³, Flávio Tetsuo Sassaki⁴, Alison Talis Martins Lima⁵.

Abstract

The World Health Organization characterized COVID-19 as a pandemic in March 2020, the second pandemic of the twenty-first century. Expanding virus populations, such as that of SARS-CoV-2, accumulate a number of narrowly shared polymorphisms, imposing a confounding effect on traditional clustering methods. In this context, approaches that reduce the complexity of the sequence space occupied by the SARS-CoV-2 population are necessary for robust clustering. Here, we propose subdividing the global SARS-CoV-2 population into six well-defined subtypes and 10 poorly represented genotypes named tentative subtypes by focusing on the widely shared polymorphisms in nonstructural (nsp3, nsp4, nsp6, nsp12, nsp13 and nsp14) cistrons and structural (spike and nucleocapsid) and accessory (ORF8) genes. The six subtypes and the additional genotypes showed amino acid replacements that might have phenotypic implications. Notably, three mutations (one of them in the Spike protein) were responsible for the geographical segregation of subtypes. We hypothesize that the virus subtypes detected in this study are records of the early stages of SARS-CoV-2 diversification that were randomly sampled to compose the virus populations around the world. The genetic structure determined for the SARS-CoV-2 population provides substantial guidelines for maximizing the effectiveness of trials for testing candidate vaccines or drugs.


Year: 2020 PMID: 33106569 PMCID: PMC7588421 DOI: 10.1038/s41598-020-74050-8

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Introduction

In December 2019, a local pneumonia outbreak of initially unknown aetiology was detected in Wuhan (Hubei, China) and quickly determined to be caused by a novel coronavirus[1], named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)[2], with the disease referred to as COVID-19[3]. SARS-CoV-2 belongs to the family Coronaviridae, genus Betacoronavirus, which comprises enveloped, positive-stranded RNA viruses of vertebrates[2]. Two-thirds of SARS-CoV genomes are covered by ORF1ab, which encodes a large polypeptide that is cleaved into 16 nonstructural proteins (NSPs) involved in replication-transcription in vesicles from endoplasmic reticulum (ER)-derived membranes[4,5]. The last third of the virus genome encodes four essential structural proteins, namely, Spike (S), Envelope (E), Membrane (M), and Nucleocapsid (N), and several accessory proteins that interfere with the host innate immune response[6].

Populations of RNA viruses evolve rapidly due to their large sizes, short generation times, and high mutation rates, the last of which is a consequence of RNA-dependent RNA polymerase (RdRP), which lacks proofreading activity[7]. In fact, virus populations are composed of a broad spectrum of closely related genetic variants resembling one or more master sequences[8-10]. Mutation rates inferred for SARS-CoVs are considered moderate[11,12] due to independent proofreading activity[13]. However, the large SARS-CoV genomes (from 27 to 31 kb)[14] allow efficient exploration of the sequence space[15]. To better understand the diversification of SARS-CoV-2 genomes during the pandemic (from December 2019 to March 25, 2020), we applied a simple but robust approach to reduce the complexity of the sequence space occupied by the virus population by detecting its widely shared polymorphisms.

A total of 767 SARS-CoV-2 genomes with high sequencing coverage obtained from GISAID (https://www.gisaid.org/) and GenBank were clustered into 593 haplotypes (Table S1). We conducted a fine-scale sequence variation analysis of the 593 genome-containing alignment by calculating nucleotide diversity (π) using sliding window and step sizes of 300 and 20 nucleotides, respectively (multiple sequence alignments generated in this study are available from the authors upon request). Such an approach allows the identification of genomic regions with increased genetic variation from polymorphic sites harbouring two or more distinct nucleotide bases. Noticeably, one or more large clusters of closely related sequences, when analysed by this approach, show locally increased nucleotide diversity. We observed contrasting distributions of genetic variation across the full-length genomes of SARS-CoV-2 (Fig. 1), with eight segments (S) showing increased genetic variation, arbitrarily defined as nucleotide (nt) segments with π ≥ 0.001. Seven out of eight segments were approximately 280 nt in length, corresponding approximately to the size of a single sliding window, except S10, whose length was equivalent to two sliding windows (600 nt). To further investigate the diversification of segments with contrasting degrees of genetic variability, we constructed maximum likelihood (ML) phylogenetic trees and analysed the diversification patterns of eight segments with higher genetic variation (S2, 4, 6, 8, 10, 12, 14 and 16) and nine with lower genetic variation (S1, 3, 5, 7, 9, 11, 13, 15 and 17).
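
As a rough illustration of the sliding-window calculation described above, the following Python sketch computes π per window from a list of aligned, equal-length sequences. It is not the DnaSP implementation: the function names (`column_pi`, `sliding_pi`) and the choice to skip every column that contains a gap character are assumptions made here for illustration only.

```python
from collections import Counter


def column_pi(column):
    """Fraction of sequence pairs that differ at one alignment column."""
    n = len(column)
    counts = Counter(column)
    # Ordered pairs carrying the same base, out of n * (n - 1) ordered pairs.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


def sliding_pi(alignment, window=300, step=20):
    """Return (start, end, pi) per window; coordinates are 1-based.

    Assumption: columns containing '-' are skipped, and pi is averaged
    over the remaining columns of each window.
    """
    length = len(alignment[0])
    per_site = []
    for i in range(length):
        col = [seq[i] for seq in alignment]
        per_site.append(None if "-" in col else column_pi(col))

    results = []
    for start in range(0, length - window + 1, step):
        values = [v for v in per_site[start:start + window] if v is not None]
        pi = sum(values) / len(values) if values else float("nan")
        results.append((start + 1, start + window, pi))
    return results
```

Under these assumptions, a single site split 184/409 among 593 sequences (the counts reported for nsp3-[3,037] in Table 1) contributes about 0.43 to the window sum, or roughly 0.0014 when averaged over 300 sites. This is consistent with one widely shared polymorphism being enough to lift a window above the π ≥ 0.001 threshold.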

Figure 1

Mean pairwise number of nucleotide differences per site (nucleotide diversity, π) calculated using a sliding window of 300 nucleotides across the multiple sequence alignment for full-length genomes of SARS-CoV-2. The red dashed line at π = 0.001 represents an arbitrary threshold used to subdivide the segments (S) with higher (S2, 4, 6, 8, 10, 12, 14 and 16) and lower (S1, 3, 5, 7, 9, 11, 13, 15 and 17) levels of genetic variation. The SARS-CoV-2 genome organization is represented on top of the plot.

Although the data set was composed of hundreds of SARS-CoV-2 genomes sampled from around the world, in the S2-based tree, we observed two clusters (Fig. S1a). Notably, each cluster was composed of very closely related, if not identical, sequences. Therefore, the increased degree of genetic variation at S2 was a result of inter-cluster sequence comparisons. Similar results were obtained for the other seven ML trees based on segments with increased genetic variation (Fig. S1b–h). In contrast, the ML trees based on segments with lower genetic variation did not show a consistent number of well-defined clusters (Fig. S2).

We mapped the polymorphic sites in segments with increased genetic variation responsible for the segregation of ML trees into two well-defined clusters (Table 1). Only a few (from one to three) nt positions with polymorphisms shared by a number of SARS-CoV-2 genomes could be identified within each segment with increased genetic variation. These polymorphisms were henceforth referred to as ‘widely shared polymorphisms’ (WSPs), while the remaining nt positions in virus genomes were designated as ‘non-widely shared polymorphisms’ (nWSPs).

Table 1

Characterization of the WSPs detected in genomes of SARS-CoV-2.

Segment ID	Segment position^a (begin–end)	WSPs^b	nt mutation (# isolates)	Position in the codon	#codon	Amino acid
S2	2899–3179	nsp3-[3,037]	U (184)/C (409)	Third	106	Phenylalanine/Phenylalanine
S4	8639–8919	nsp4-[8,782]	U (183)/C (410)	Third	76	Serine/Serine
S6	10,959–11,219	nsp6-[11,083]	C (1)/U (99)/G (493)	Third	37	Phenylalanine/Phenylalanine/Leucine
S8	14,259–14,539	nsp12-[14,408]	U (184)/C (409)	Second	323	Leucine/Proline
S10	17,600–18,200	nsp13-[17,747]	U (101)/C (492)	Second	504	Leucine/Proline
		nsp13-[17,858]	G (101)/A (492)	Second	541	Cysteine/Tyrosine
		nsp14-[18,060]	U (105)/C (488)	Third	7	Leucine/Leucine
S12	23,270–23,550	S-[23,403]	G (185)/A (408)	Second	614	Glycine/Aspartate
S14	28,004–28,285	ORF8-[28,144]	C (184)/U (409)	Second	84	Serine/Leucine
S16	28,745–29,025	N-[28,881]	A (60)/G (533)	Second	203	Lysine/Arginine
		N-[28,882]	A (60)/G (533)	Third	203	Lysine/Arginine
		N-[28,883]	C (60)/G (533)	First	204	Glycine/Arginine

aRelative to the multiple sequence alignment constructed for full-length genomes.

bWidely shared polymorphism (WSP) positions are relative to the reference genome (GISAID accession ID: EPI_ISL_402124).
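
The synonymous and non-synonymous calls in Table 1 can be reproduced with the standard genetic code. The sketch below is illustrative only: Table 1 does not list codon sequences, so the `UUU` codon used in the usage note is an inference from the standard code (phenylalanine is encoded only by UUU and UUC, and a third-position G gives leucine), and the helper name `classify` is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered U, C, A, G at each position.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify(codon, position, new_base):
    """Classify a single-base change at codon position 1, 2 or 3.

    Returns (amino acid before, amino acid after, label).
    """
    mutated = codon[:position - 1] + new_base + codon[position:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    label = "synonymous" if before == after else "non-synonymous"
    return before, after, label
```

For the nsp6-[11,083] row, `classify("UUU", 3, "C")` keeps phenylalanine, whereas `classify("UUU", 3, "G")` yields leucine, matching the Phenylalanine/Phenylalanine/Leucine entry for the C/U/G alleles at that third codon position.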

We compared the topologies of the seventeen ML trees (Figs. S1 and S2) by computing their pairwise distances followed by a multivariate analysis to group similar trees (Fig. 2). The seventeen trees were subdivided into seven groups, with the largest including nWSP-containing segment-based trees (S1, 3, 5, 7, 9, 11, 13, 15 and 17; Fig. 2, Group 7). Given the low genetic variation in these segments, the resulting trees were poorly resolved, suggesting that such regions represent a wide mutant spectrum of narrowly shared polymorphisms. It is important to note that there are minor clusters in nWSP-containing segment-based ML trees, e.g., in those for S1, S13 and S17. This is a consequence of our conservative threshold, as we focused on segments with π ≥ 0.001. S1, S13 and S17 also show locally increased genetic variation with π values higher than 0.0005 but lower than 0.001, for example, stretches 916–1196, 1436–1536 (within S1), 25,430–25,720 (S13), 29,565–29,637 (S17) (Fig. 1).

Figure 2

Multidimensional scaling (MDS) visualization of tree distances based on the Kendall-Colijn metric (λ = 0). The seventeen ML trees (each with 593 tips) are represented as dots, and groups of trees showing similar topologies are indicated by the same colour. The WSP-containing segment-based trees formed six groups: the first group comprised S2, S8 and S12 (indicated in blue), while the other five were represented by single trees (groups 2–6 indicated in red, green, orange, purple and brown, respectively). All nWSP-containing segment-based ML trees formed a single group, indicated in pink.

The S2-, S8- and S12-based ML trees (Fig. 2, Group 1) were considerably congruent, and the nucleotides at their WSPs tended to co-segregate (UUG or CCA, Table 1), which resulted in two major subtypes of SARS-CoV-2 (Figs. S1a, S1d and S1f). Conversely, the incongruency among the S4-, 6-, 10-, 14- and 16-based trees (Fig. 2, Groups 2–6) suggests the segregation of nucleotides at their WSPs, which increases the possible combinations of virus genotypes.

Therefore, our approach reduced the complexity of the sequence space occupied by the SARS-CoV-2 genomes and provided a robust clustering solution based on the combination of 12 WSPs (Table 1) to barcode the major viral genotypes spread worldwide (Table 2 and Table S2). The global population of SARS-CoV-2 is structured into six major subtypes (I–VI), comprising 578 of 593 (approximately 97.5%) isolates analysed in this study. Subtype I (N = 132) was represented by the combination of the most frequent nucleotides at all WSPs, i.e., the canonical genotype CCGCCACAUGGG. The SARS-CoV-2 reference genome (GISAID accession ID: EPI_ISL_402124, GenBank accession: MN908947) is a representative member of this subtype.
Subtype IV (N = 91) was represented by the combination of the most frequent nucleotides at eleven of 12 WSPs (U; the most frequent nucleotides at each WSP are highlighted in bold and underlined). Subtypes V (N = 74, UC), II (N = 122, UUG), III (N = 101, UUGUC) and VI (N = 58, UUGAAC) were represented by the combination of the most frequent nucleotides at ten, nine, seven and six of 12 WSPs, respectively.
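
The 12-WSP barcode can be read directly from a genome once it is expressed in reference coordinates. The sketch below is a minimal illustration, assuming the input string has no insertions or deletions relative to the reference genome; the positions come from Table 1 and the canonical barcode from the text, while the function names `barcode` and `differences` are hypothetical.

```python
# WSP positions (1-based, reference coordinates) as listed in Table 1.
WSP_POSITIONS = [3037, 8782, 11083, 14408, 17747, 17858,
                 18060, 23403, 28144, 28881, 28882, 28883]

# Canonical genotype of Subtype I as given in the text.
CANONICAL = "CCGCCACAUGGG"


def barcode(genome):
    """Extract the 12-base barcode from a genome in reference coordinates.

    Assumption: the string is indexed like the reference (no indels).
    DNA-style 'T' is reported as 'U' to match the notation of the paper.
    """
    bases = "".join(genome[pos - 1] for pos in WSP_POSITIONS)
    return bases.upper().replace("T", "U")


def differences(code, reference=CANONICAL):
    """List (position, canonical base, observed base) where code departs."""
    return [
        (pos, ref, obs)
        for pos, ref, obs in zip(WSP_POSITIONS, reference, code)
        if ref != obs
    ]
```

With this sketch, `12 - len(differences(code))` gives the number of WSPs carrying the most frequent nucleotide, the quantity used in the text to describe each subtype (for instance, eleven of 12 for Subtype IV).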

Table 2

Unique genotypes of SARS-CoV-2 based on 12 WSPs and their associated amino acid replacements.

aSample size. bSegment containing the WSP. cNucleotide position relative to the reference genome (GISAID accession ID: EPI_ISL_402124). dNucleotide base and the encoded amino acid residue.

It is important to note the intrinsic wide geographical coverage of these subtypes, since they were sampled from distinct countries or even continents, which describes the viral spread at a global scale (Fig. 3). A dynamic map of the spatial–temporal spread of isolates of the six subtypes of SARS-CoV-2 is available as a Microreact project (https://microreact.org/project/f25A3jAvE5TjzxAf38UCEq). Another important feature is that they are predominantly composed of genomes sequenced from original samples, minimizing any mutational bias due to in vitro virus replication (Fig. 4). Studies on the mutational dynamics of SARS-CoV-2 in cell culture have not been conducted thus far; however, previous studies on the mutational dynamics of SARS-CoV indicated a negligible mutation frequency after five serial Vero-E6 cell passages[16].

Figure 3

Geographical distribution of six subtypes of SARS-CoV-2 around the world. The genomic data set comprised isolates sampled from 40 distinct countries from December 24, 2019 to March 20, 2020. The pie charts show the proportion of each subtype of SARS-CoV-2 according to a colour key in the figure bottom. For more detailed information on virus spread, a dynamic map is available at https://microreact.org/project/f25A3jAvE5TjzxAf38UCEq (accessible via the QR code in the bottom left corner of the map).

Figure 4

Maximum likelihood phylogenetic tree based on 12 WSPs detected across the SARS-CoV-2 genomes. The background colour of the tips indicates the subtype (I–VI) or tentative subtype (VII–XVI). An outer strip indicates the geographic origin (Western or Eastern Hemisphere) and whether each isolate was subjected to intermediate cell culture passages before genome sequencing.

Ten additional viral genotypes were poorly represented and, therefore, referred to as tentative subtypes (Table 2). This category of tentative subtypes would be useful due to the continuous addition of genomes to public databases, where more representative members might be sampled from a wider geographical context. This conservative proposal would keep the inclusive nature of our clustering method, being able to incorporate a large fraction of the SARS-CoV-2 genetic variation at a global scale.

The WSP-based phylogenetic tree depicting all 593 SARS-CoV-2 haplotypes (Fig. 4) showed some geographical structure with two clusters: a smaller cluster comprising isolates mostly sampled from the Western Hemisphere (Subtypes II and VI; tentative Subtypes IX, X and XI) and a larger cluster comprising isolates sampled from the Western and Eastern Hemispheres (Subtypes I, III, IV and V; tentative Subtypes VII, VIII, XII–XVI). The co-segregation of nucleotides at WSPs nsp3-[3,037], nsp12-[14,408] and S-[23,403] (Fig. 2) was responsible for the geographical structure in our ML tree (Fig. 4). The mutation nsp3 U3,037C led to synonymous codons for phenylalanine in haplotypes from both clades, while the mutation nsp12 U14,408C was non-synonymous, leading to leucine and proline in haplotypes from Western and Western/Eastern Hemisphere clades, respectively (Table 2). The S G23,403A mutation led to non-synonymous codons for glycine and aspartate in haplotypes from Western and Western/Eastern Hemisphere clades, respectively. Together, these results suggest the predominance of NSP12 L323 and Spike G614 in the Western Hemisphere.

The WSP-based tree (Fig. 4) included only 12 nt sites, which represented 0.04% of the full-length multiple sequence alignment (29,412 nt after trimming the poor-quality 5′ and 3′ untranslated regions). Notably, the topologies of the WSP-based tree and the tree based on full-length genomes (Fig. S3) were highly similar, indicating a parsimonious approach for directly identifying the most informative sites in these viral genomes.

Virus barcoding and phylogenetic approaches have been conducted for SARS-CoV[17] and Middle East respiratory syndrome coronavirus (MERS-CoV)[18], respectively. Due to the limited geographical coverage of the MERS-CoV epidemic, a study focusing on a region of Saudi Arabia tracked and distinguished two main genotypes of circulating MERS-CoV. On the other hand, 174 polymorphic loci in 101 complete genomes and 44 partial sequences of SARS-CoV allowed its population to be further subdivided into two previously identified genotypes (named C and T)[19] and an additional eight “subgenotypes” (C1–C4 and T1–T4) due to 10 special loci or informative sites[17].
Interestingly, genotype C was compatible with the virus isolated during regional transmission and the early and intermediate phases of the SARS-CoV epidemic in the years 2003–2004, while genotype T was compatible with a viral type of international transmission from the late stage of virus spread[20,21]. These results are similar to those of our study, which subdivided the genomes into major lineages of SARS-CoV-2. Therefore, the epidemic subdivisions of SARS-CoV and SARS-CoV-2 are apparent and similar because of possible viral adaptation as the virus spreads throughout the world.

Studies have been conducted to discriminate closely related bacterial taxa using Shannon entropy as a metric of sequence information content[22]. A similar approach was recently applied to track the geographical and temporal dynamics of SARS-CoV-2[23]. In the latter study, analogous to our 12 WSP-based genotypes, 17 informative subtype markers (ISMs) were employed and revealed nine subtypes as the most represented in the overall virus population. All 12 WSPs detected in our study were identified as ISM sites, which indicates that nucleotide diversity is also an informative metric with which to search for subtype signatures. Notably, the ISMs were proposed to track the virus spread within and between countries and/or continents, while our WSP-based approach was focused on highlighting the potential biological implications of such founding mutations, since nine of 12 lead to non-synonymous replacements at the protein level.

It is important to note that the mutations at WSPs nsp13-[17,747 and 17,858] were both responsible for the segregation of Subtype III; that is, they were redundant, and only one would be sufficient to reproduce the segregation into six major and 10 tentative subtypes. As a consequence, the set of WSPs necessary for subdivision identical to that shown in this study might be reduced to 11 nucleotide bases.
However, we kept such informative sites according to the proposed threshold in our fine-scale genetic variation analyses, and they might be useful in a scenario where novel subtypes (partially similar to the genotype of Subtype III) are included due to the combined efforts of many countries to sequence thousands of SARS-CoV-2 genomes.

We hypothesize that our clustering method for the SARS-CoV-2 population could involve a biological context to some extent. The WSP nsp6-[11,083] in Subtype IV of SARS-CoV-2 led to phenylalanine at aa residue #37 of the protein and leucine in five other subtypes (Table 2). NSP6 is an integral membrane protein that interferes with autophagosome formation during SARS-CoV infection. Additionally, in yeast two-hybrid experiments[24], NSP6 has been shown to interact with NSP3. Some evidence demonstrates that NSP6 protein limits the expansion of autophagosomes or, alternatively, might remove host proteins involved in the inhibition of viral replication by activating autophagy from the ER[25].

The WSP nsp12-[14,408] resulted in proline in four subtypes of SARS-CoV-2 and leucine in two other subtypes at aa residue #323 of the NSP12 (RNA-dependent RNA polymerase, RdRP) protein. This WSP is located at the interface domain of RdRP of SARS-CoV-2, which is responsible for the connection between the nidovirus RdRP-associated nucleotidyltransferase domain (NiRAN) and the “right hand” polymerase domain[26].

The S protein mediates viral entry into host cells by first binding to a receptor, angiotensin-converting enzyme 2 (ACE2), through the receptor-binding domain (RBD) in the S1 subunit and then fusing the viral and host membranes through the S2 subunit[27-30]. Glycosylation sites are important for S protein folding[31], affect priming by host proteases[32] and might modulate antibody recognition[33,34].
The WSP S-[23,403] resulted in glycine and aspartate at aa residue #614 of the S protein in two and four subtypes of SARS-CoV-2, respectively. The replacement was mapped to the intermediate region between the S1 and S2 subunits. This WSP is near a glycosylation site (N616CT)[35].

The WSP ORF8-[28,144] involved a non-synonymous mutation at codon #84 encoding leucine and serine in four and two subtypes, respectively. SARS-CoV ORF8 encodes an ER-associated protein that induces Activation of Transcription Factor 6 (ATF6), which is an ER stress-regulated transcription factor that stimulates the production of chaperones[36]. The ORF8 protein has also been demonstrated to induce apoptosis[37]. In the SARS epidemic, ORF8 was targeted by a number of mutations and recombination events during transmission from non-human animals to humans[38].

Three consecutive WSPs mapped in the N gene led to two amino acid replacements at residues #203 and #204. The multifunctional N protein is composed of three domains[39], two of which are structurally independent: the N-terminal domain (NTD) and the C-terminal domain (CTD). Both amino acid replacements were mapped to an intermediary domain referred to as the linker region (LKR), a positively charged serine-arginine-rich region. As an intrinsically disordered region (IDR), it allows the independent folding of the NTD and CTD[40] and is also functionally implicated in RNA binding activity[39]. Key determinants of the interaction between the N and NSP3 proteins were also mapped to the LKR[41]. The SARS-CoV N protein is also responsible for an antigenic response in humans predominantly involving immunoglobulin G[42]. Although the host biological factors involved in the response to SARS-CoV-2 infection are still poorly known, the existence of distinct virus subtypes, all of them exhibiting amino acid replacements, could affect important aspects of COVID-19.

We hypothesized that in the early stages of the SARS-CoV-2 epidemic, due to rapid virus population expansion, a number of genetic variants might have arisen, followed by their spread to other countries and continents. We argue that the virus subtypes and their associated WSPs detected in this study could serve as records of diversification in these early stages of the epidemic after transmission from non-human animals to humans. After virus introduction to a given geographic region, a number of unique or narrowly shared mutations accumulate; however, most of them reduce fitness and are removed by purifying selection on a medium- to long-term evolutionary scale, tending to decrease genetic variability[8].

Therefore, we propose classifying SARS-CoV-2 into at least six distinct subtypes accounting for more than 97% of the isolates sampled from around the world. Such classification might guide the validation of candidate vaccines or drugs for the widest range of virus subtypes. In this context, our clustering solution provides a robust approach for effectively reducing the complexity of the mutant spectrum involving closely related SARS-CoV-2 genomes and a focus on WSPs. Additionally, through exhaustive sequencing, it would be possible to change the tentative status of the ten genotypes described in this study or even identify novel virus subtypes and follow the evolutionary dynamics of the SARS-CoV-2 population during the adaptation process imposed by the human host.

Methods

A total of 1,137 full-length genomes of SARS-CoV-2 were obtained from GenBank[43] and GISAID[44] (Table S1) on March 25, 2020, and comprised virus isolates sampled from December 24, 2019, to March 20, 2020. Only genomes with high sequencing coverage, intact ORFs (no frameshifts, except that of the nsp12 cistron) and no indeterminate nucleotide bases (indicated by ‘N’s or ambiguous codes), totalling 767 high-quality full-length sequences, were effectively analysed in this study. We wish to acknowledge all researchers who deposited the SARS-CoV-2 genomes in the GISAID and/or GenBank database. The genomic data set was aligned using MAFFT-FFT-NS-2[45]. The calculation of the average number of nucleotide differences per site (nucleotide diversity, π) was conducted in DnaSP v.6[46] using sliding window and step sizes of 300 and 20 nucleotides, respectively. Sites with alignment gaps were not considered in the analysis. Maximum likelihood (ML) phylogenetic trees were constructed using RAxML[47] under the general time-reversible with gamma distribution (GTRGAMMA) nucleotide substitution model. The branch support for ML trees based on 300 nucleotides and larger segments was assessed with 1000 and 5000 bootstrap replicates, respectively. The ML tree for full-length genomes was based on a multiple alignment whose 5′ and 3′ untranslated regions were trimmed. ML trees were used in this study essentially as a clustering method due to the weak phylogenetic signal in the data set. All phylogenetic trees were edited using iTOL[48]. To assess the similarity among ML-tree topologies, we computed all possible pairwise distances by applying the Kendall–Colijn metric[49], followed by principal coordinate analysis (PCoA), using the package treespace[50] in R[51]. The detection of polymorphic sites was conducted using PAUP* v. 4.0[52] and MEGA X[53].
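
To make the tree-comparison step more concrete, the following Python sketch computes the topology-only (λ = 0) form of the Kendall–Colijn distance between two rooted trees. It is an illustration, not the treespace code used in this study: trees are represented as nested tuples of tip labels, a hypothetical encoding chosen here for brevity, and the function names are likewise assumptions. For λ = 0, each tree is summarised by the number of edges between the root and the most recent common ancestor (MRCA) of every pair of tips, and the distance is the Euclidean distance between those vectors.

```python
from itertools import combinations
from math import sqrt


def mrca_depths(tree):
    """Map each pair of tips to the edge count from the root to their MRCA.

    A tree is a nested tuple; any non-tuple value is treated as a tip label.
    """
    depths = {}

    def visit(node, depth):
        if not isinstance(node, tuple):
            return [node]
        child_tips = [visit(child, depth + 1) for child in node]
        # Tips under different children of this node have this node as MRCA.
        for left, right in combinations(child_tips, 2):
            for a in left:
                for b in right:
                    depths[frozenset((a, b))] = depth
        return [tip for tips in child_tips for tip in tips]

    visit(tree, 0)
    return depths


def kc_distance(tree_a, tree_b):
    """Kendall-Colijn distance at lambda = 0 (topology only)."""
    depths_a, depths_b = mrca_depths(tree_a), mrca_depths(tree_b)
    if depths_a.keys() != depths_b.keys():
        raise ValueError("trees must share the same set of tip labels")
    return sqrt(sum((depths_a[k] - depths_b[k]) ** 2 for k in depths_a))
```

For example, `kc_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")))` compares two four-tip trees that pair the tips differently. A matrix of such distances over all tree pairs is the kind of input that an ordination method such as PCoA takes.
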
The sites responsible for the segregation of the isolates into two clusters in the ML trees were referred to as “widely shared polymorphisms” (WSPs), while the remaining nt positions in the virus genomes were designated as “non-widely shared polymorphisms” (nWSPs). The WSP positions were relative to the reference genome (GISAID accession ID: EPI_ISL_402124). A Microreact project v70.0.0 was created for the metadata in a dynamic user interface[54]. Interactive visualization makes it possible to track virus sampling from a spatial–temporal perspective. A QR code for the interactive map was generated using the R package qrcode[55].

Supplementary Information: Supplementary Figures 1a–h, 2a–i and 3; Supplementary Tables 1 and 2.

46 in total

1. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.

Authors: Kazutaka Katoh; Kazuharu Misawa; Kei-ichi Kuma; Takashi Miyata
Journal: Nucleic Acids Res Date: 2002-07-15 Impact factor: 16.971

2. Open reading frame 8a of the human severe acute respiratory syndrome coronavirus not only promotes viral replication but also induces apoptosis.

Authors: Chia-Yen Chen; Yueh-Hsin Ping; Hsin-Chen Lee; Kuan-Hsuan Chen; Yuan-Ming Lee; Yu-Juin Chan; Te-Cheng Lien; Tjin-Shing Jap; Chi-Hung Lin; Lung-Sen Kao; Yi-Ming Arthur Chen
Journal: J Infect Dis Date: 2007-06-19 Impact factor: 5.226

3. Severe acute respiratory syndrome-associated coronavirus genotype and its characterization.

Authors: Lanjuan Li; Zhigang Wang; Yiyu Lu; Qiyu Bao; Suhong Chen; Nanping Wu; Suyun Cheng; Jingqing Weng; Yanjun Zhang; Juying Yan; Lingling Mei; Xiaomeng Wang; Hanping Zhu; Yingpu Yu; Minli Zhang; Minhong Li; Jun Yao; Qunying Lu; Pingping Yao; Xiaochen Bo; Jianer Wo; Shengqi Wang; Songnian Hu
Journal: Chin Med J (Engl) Date: 2003-09 Impact factor: 2.628

4. Phylogeny of SARS-CoV as inferred from complete genome comparison.

Authors: Zhen Qi; Yu Hu; Wei Li; Yanjun Chen; Zhihua Zhang; Shiwei Sun; Hongchao Lu; Jingfen Zhang; Dongbo Bu; Lunjiang Ling; Runsheng Chen
Journal: Chin Sci Bull Date: 2003

Review 5. Coronavirus transcription: a perspective.

Authors: S G Sawicki; D L Sawicki
Journal: Curr Top Microbiol Immunol Date: 2005 Impact factor: 4.291

6. A new coronavirus associated with human respiratory disease in China.

Authors: Fan Wu; Su Zhao; Bin Yu; Yan-Mei Chen; Wen Wang; Zhi-Gang Song; Yi Hu; Zhao-Wu Tao; Jun-Hua Tian; Yuan-Yuan Pei; Ming-Li Yuan; Yu-Ling Zhang; Fa-Hui Dai; Yi Liu; Qi-Min Wang; Jiao-Jiao Zheng; Lin Xu; Edward C Holmes; Yong-Zhen Zhang
Journal: Nature Date: 2020-02-03 Impact factor: 49.962

7. Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003.

Authors: Vinsensius B Vega; Yijun Ruan; Jianjun Liu; Wah Heng Lee; Chia Lin Wei; Su Yun Se-Thoe; Kin Fai Tang; Tao Zhang; Prasanna R Kolatkar; Eng Eong Ooi; Ai Ee Ling; Lawrence W Stanton; Philip M Long; Edison T Liu
Journal: BMC Infect Dis Date: 2004-09-06 Impact factor: 3.090

8. Severe acute respiratory syndrome coronavirus nonstructural proteins 3, 4, and 6 induce double-membrane vesicles.

Authors: Megan M Angelini; Marzieh Akhlaghpour; Benjamin W Neuman; Michael J Buchmeier
Journal: mBio Date: 2013-08-13 Impact factor: 7.867

9. GenBank.

Authors: Eric W Sayers; Mark Cavanaugh; Karen Clark; James Ostell; Kim D Pruitt; Ilene Karsch-Mizrachi
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

10. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor.

Authors: Markus Hoffmann; Hannah Kleine-Weber; Simon Schroeder; Nadine Krüger; Tanja Herrler; Sandra Erichsen; Tobias S Schiergens; Georg Herrler; Nai-Huei Wu; Andreas Nitsche; Marcel A Müller; Christian Drosten; Stefan Pöhlmann
Journal: Cell Date: 2020-03-05 Impact factor: 41.582
