| Literature DB >> 32814791 |
M Rafiul Islam1, M Nazmul Hoque1,2, M Shaminur Rahman1, A S M Rubayet Ul Alam3, Masuda Akther1, J Akter Puspo1, Salma Akter1,4, Munawar Sultana1, Keith A Crandall5, M Anwar Hossain6,7.
Abstract
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), a novel evolutionary divergent RNA virus, is responsible for the present devastating COVID-19 pandemic. To explore the genomic signatures, we comprehensively analyzed 2,492 complete and/or near-complete genome sequences of SARS-CoV-2 strains reported from across the globe to the GISAID database up to 30 March 2020. Genome-wide annotations revealed 1,516 nucleotide-level variations at different positions throughout the entire genome of SARS-CoV-2. Moreover, nucleotide (nt) deletion analysis found twelve deletion sites throughout the genome other than previously reported deletions at coding sequence of the ORF8 (open reading frame), spike, and ORF7a proteins, specifically in polyprotein ORF1ab (n = 9), ORF10 (n = 1), and 3´-UTR (n = 2). Evidence from the systematic gene-level mutational and protein profile analyses revealed a large number of amino acid (aa) substitutions (n = 744), demonstrating the viral proteins heterogeneous. Notably, residues of receptor-binding domain (RBD) showing crucial interactions with angiotensin-converting enzyme 2 (ACE2) and cross-reacting neutralizing antibody were found to be conserved among the analyzed virus strains, except for replacement of lysine with arginine at 378th position of the cryptic epitope of a Shanghai isolate, hCoV-19/Shanghai/SH0007/2020 (EPI_ISL_416320). Furthermore, our results of the preliminary epidemiological data on SARS-CoV-2 infections revealed that frequency of aa mutations were relatively higher in the SARS-CoV-2 genome sequences of Europe (43.07%) followed by Asia (38.09%), and North America (29.64%) while case fatality rates remained higher in the European temperate countries, such as Italy, Spain, Netherlands, France, England and Belgium. Thus, the present method of genome annotation employed at this early pandemic stage could be a promising tool for monitoring and tracking the continuously evolving pandemic situation, the associated genetic variants, and their implications for the development of effective control and prophylaxis strategies.Entities:
Mesh:
Substances:
Year: 2020 PMID: 32814791 PMCID: PMC7438523 DOI: 10.1038/s41598-020-70812-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1Genomic deletion analysis of SARS-CoV-2. Genomic deletion analysis of SARS-CoV-2 strains identified (a) 24 (nt) deletions in NSP1 in a Japanese strain; (b) 15-nt deletions in NSP1 of viral strains from USA, Japan and the Netherlands alongside three-nt deletions in USA and Netherlands; (c) three-nt deletion in NSP1 of American strains and very adjacent to that, nine-nucleotide deletion of strains from the USA, England and Canada and Iceland; (d) three-nucleotide deletions in NSP2 were observed in 99 strains from Netherlands, England, Portugal, Slovakia, Iceland, Wales, France and New Zealand (representatives from each countries were shown); (e) NSP8 undergoes three-nt deletion in Netherlandian/dutch/hollanders strains; (f) three-nucleotide deletion in NSP15 of USA strain; (g-g1) 35nt deletion, including start codon position of ORF10 of Spain strains, and the start codon in spacer position, has been used for ORF10 coding; and as a result, (g-g2) five aa residues deletion in those strains starting from position 1 to 5. Deletion of (h) 29-nt reported from Wuhan, and (i) 10-nt in 3′-UTR of strains belonged to Australia. The position of nt represents the starting position from each ORFs, for instance, position of ORF1ab was considered for the NSPs. MAFFT online tool[36] was used for alignment, and Unipro-UGENE[40] used for visualization.
Figure 2Amino-acid (aa) residues heterogeneity in (a) S, (b) E and (c) M proteins of SARS-CoV-2. The Fingerprint[39] protein analysis showed that aa residues in S, M and E proteins varied due to change and/or substitutions in their positions.
Figure 3The frequency spectra of amino-acid mutations in 2,492 SARS-CoV-2 complete genome sequences, and its impact on mortality rates. Amino-acid (aa) mutations found in the open reading frames (ORFs) of the SARS-CoV-2 genomes according to (a) geographic areas and (b) different climate zones. We found six core shared mutations (at residue position of R203K, G204R, G251V, L3606F, P4714L, D614G) in Asia, Europe, Africa, Australia, North America, and South America, and four core shared mutations (at residue position of Q57H, D614G, L3606F, and P4714L) in continental, diverse, dry, tropical and temperate conditions. In both cases (a and b), yellow circles represent frequency of aa substitutions shared by all categories, and the frequency of aa substitutions shared by at least two continents/climate zones are shown in green circle, while the pink colored ribbons indicate unique aa mutations in each individual regional and climate zone. (c) Estimated case fatality rates of SARS-CoV infections in 45 countries of the globe according to climatic variations. Numbers in parentheses indicate total confirmed cases/total deaths according to the World Health Organization (WHO) as of 30th March, 2020. Organization (WHO) as of 30th March, 2020. Data were finally visualized using custom Venn diagrams online tool (https://bioinformatics.psb.ugent.be/webtools/Venn/) along with illustrator CC 2019 trial version.