Intikhab Alam, Allan A Kamau, Maxat Kulmanov, Łukasz Jaremko, Stefan T Arold, Arnab Pain, Takashi Gojobori, Carlos M Duarte.
Abstract
The spread of the novel coronavirus (
Entities:
Keywords: ARDS (acute respiratory distress syndrome); COVID-19; E protein Inhibitors; SARS-CoV-2; envelope protein (E); therapeutic targets
Year: 2020 PMID: 32850499 PMCID: PMC7396417 DOI: 10.3389/fcimb.2020.00405
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 5.293
Reference genomes from the genus Betacoronavirus included in this study.
| Genome (accession), location | Host | Taxon ID | Year |
| --- | --- | --- | --- |
| Bovine coronavirus (AF391541) | Bovine | 11128 | 2002 |
| Murine hepatitis virus (AY700211) | Rat | 11138 | 2006 |
| Murine hepatitis virus (AF029248) | Rat | 11138 | 2000 |
| Human coronavirus OC43 (AY585228) USA | Human | 31631 | 2004 |
| Human coronavirus HKU1 (AY597011) | Human | 290028 | 2006 |
| Bat coronavirus HKU4-1 (EF065505) China | Bat | 424359 | 2007 |
| Bat coronavirus HKU5-1 (EF065509) China | Bat | 424363 | 2007 |
| Bat coronavirus HKU9-1 (EF065513) China | Bat | 424367 | 2007 |
| Rat coronavirus Parker (FJ938068) | Rat | 502102 | 2009 |
| SARS (AY274119) Canada: Toronto | Human | 694009 | 2017 |
| Rabbit coronavirus HKU14 (JN874559) China | Rabbit | 1160968 | 2012 |
| MERS (JX869059) | Human | 1235996 | 2012 |
| MERS (KC164505) United Kingdom | Human | 1263720 | 2013 |
| MERS (KF600630) Saudi Arabia | Human | 1335626 | 2014 |
| | Hedgehog | 1385427 | 2014 |
| | Hedgehog | 1385427 | 2014 |
| Bat SARS-like coronavirus (MG772933) China | Bat | 1508227 | 2018 |
| Bat Hp- | Bat | 1541205 | 2017 |
| | Rat | 1590370 | 2015 |
| Rousettus bat coronavirus (KU762338) China | Bat | 1892416 | 2016 |
| SARS-CoV-2 (MN988713) USA | Human | 2697049 | 2020 |
| SARS-CoV-2 (MN985325) USA | Human | 2697049 | 2020 |
| SARS-CoV-2 (MN975262) China | Human | 2697049 | 2020 |
| SARS-CoV-2 (MN938384) China | Human | 2697049 | 2020 |
Genomes with the same taxon ID are highlighted in yellow.
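Since the yellow highlighting does not survive in plain text, the same grouping can be recovered from the taxon ID column. The following is a minimal Python sketch, assuming the rows have been transcribed by hand as (label, taxon ID) pairs; the function name `shared_taxon_groups` is hypothetical and only a subset of the table rows is used.

```python
from collections import defaultdict

# A few rows copied from the table above: (genome label, taxon ID).
rows = [
    ("Murine hepatitis virus (AY700211)", 11138),
    ("Murine hepatitis virus (AF029248)", 11138),
    ("Human coronavirus OC43 (AY585228)", 31631),
    ("SARS-CoV-2 (MN988713)", 2697049),
    ("SARS-CoV-2 (MN985325)", 2697049),
    ("SARS-CoV-2 (MN975262)", 2697049),
    ("SARS-CoV-2 (MN938384)", 2697049),
]


def shared_taxon_groups(rows):
    """Return {taxon_id: [labels]} for taxon IDs used by more than one genome."""
    groups = defaultdict(list)
    for label, taxon_id in rows:
        groups[taxon_id].append(label)
    # Keep only the taxon IDs that occur at least twice.
    return {t: labels for t, labels in groups.items() if len(labels) > 1}


shared = shared_taxon_groups(rows)
for taxon_id, labels in sorted(shared.items()):
    print(taxon_id, labels)
```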
Figure 1. Pangenome core and accessory gene clusters based on sequence comparison of genes from genomes of the genus Betacoronavirus. Core clusters appear in all genomes; accessory clusters appear in a subset of genomes. Additional annotations, such as the Protein Family (PFAM) IDs shown for SARS-CoV-2-related clusters, were obtained using the Automatic Annotation of Microbial Genomes (AAMG) pipeline (see Methods). (A) Thirty-seven gene clusters are shown as a binary heatmap representing the presence (black) or absence (white) of genes in Betacoronavirus clades 1, 2, 3, and 4. Clade 2 is expanded to show the presence or absence of genes in its members, which include SARS-CoV-2, SARS, and two bat coronaviruses (MG772933 and KF636752). One gene cluster, orf10 (marked in yellow), is an annotation artifact: according to GenBank annotations it appears to be unique to SARS-CoV-2 and is not predicted in any other genome, but a TBLASTN search of this protein against NCBI's Nucleotide database (NT) shows sequence matches with 100% coverage in other SARS and SARS-like coronaviruses. (B) Phylogenetic tree based on SNPs from the core gene clusters (M, N, orf1ab, and S). The tree is labeled with clade numbers to distinguish SARS-like from other coronaviruses and is colored according to host.
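The core/accessory distinction in the caption can be illustrated with a small presence/absence example. This is a minimal Python sketch, not the pipeline used in the study: the genome labels (`genome_a` and so on) and the presence sets are invented for illustration, and `classify_clusters` is a hypothetical helper name.

```python
def classify_clusters(presence, genomes):
    """Split clusters into core (present in every genome) and accessory.

    presence: dict mapping cluster name -> set of genomes containing it.
    genomes:  the full set of genomes under comparison.
    """
    core = sorted(c for c, members in presence.items() if members >= genomes)
    accessory = sorted(c for c, members in presence.items() if not members >= genomes)
    return core, accessory


# Hypothetical toy data: three genomes and three clusters.
genomes = {"genome_a", "genome_b", "genome_c"}
presence = {
    "M": {"genome_a", "genome_b", "genome_c"},
    "N": {"genome_a", "genome_b", "genome_c"},
    "orf10": {"genome_a"},  # annotated in a single genome only
}

core, accessory = classify_clusters(presence, genomes)
print("core:", core)
print("accessory:", accessory)
```

A cluster that looks accessory only because a gene was not annotated, as the caption describes for orf10, would still be classified as accessory here; a check of this kind works on annotations and cannot detect missing gene predictions.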
Figure 2. Pangenome analysis of the three clusters related to the E protein. (A) The first E cluster groups highly similar E proteins from SARS and SARS-like genomes, highlighted in the species tree (left panel) alongside a gene tree (right panel). (B) Protein alignment of the SARS and SARS-like E protein cluster. It includes SARS (AY274119), SARS-CoV-2 (MN985325 and other isolates), and two bat coronaviruses (MG772933 and KF636752). The two important features are the ion-channel-forming amino acids (of which N15 and V25 were shown to be key for function) and the PBM class II motif (DLLV). Both features are completely conserved in SARS and SARS-CoV-2. (C) Protein alignment of a second E cluster, which groups E sequences from clades 3 and 4, including MERS and coronaviruses from other animals. (D) Protein alignment of the third E cluster, which groups sequences from clade 1, related to two bat coronaviruses. (E) Theoretical model of the SARS-nCoV-1 protein E pentamer. Left panel: side view. The membrane is illustrated in pale orange, and the membrane (MEM), luminal side (LUM), and cytoplasmic side (CYT) are labeled. Right panel: view from the cytoplasm. One chain (blue) highlights protein features. N, C: locations of the N and C termini, respectively. Yellow: IC; orange: DLLV (side chains are shown as stick models). The locations of the residues changed in SARS-CoV-2 E are labeled and shown in cyan. The protein structure was modeled on the NMR model with PDB ID 5X29 and completed using our in-house modeling program (to be published). The position of the region C-terminal to K53 was adjusted relative to the NMR model (see Supplementary Figure S6) to avoid placing it entirely within the membrane, which appears unlikely given its amino acid composition (in particular R61, K63, and N64).
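The two features named in panel (B) can be checked directly on an ungapped protein sequence. The sketch below is an illustration under stated assumptions, not the authors' alignment workflow: the SARS-CoV-2 E sequence is typed in from the public reference record and should be verified before reuse, positions are taken as 1-based as in the caption, and `check_e_features` is a hypothetical helper name.

```python
# SARS-CoV-2 E protein (75 residues). Assumption: typed from the public
# reference record; verify against the database entry before reuse.
SARS_COV_2_E = (
    "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV"
)


def check_e_features(seq, key_residues=(("N", 15), ("V", 25)), motif="DLLV"):
    """Report whether the key residues and the C-terminal motif are present.

    seq must be ungapped; positions are 1-based, as in the caption (e.g. N15).
    """
    result = {}
    for amino_acid, position in key_residues:
        in_range = len(seq) >= position
        result[f"{amino_acid}{position}"] = in_range and seq[position - 1] == amino_acid
    # The PBM motif is expected at the very end of the protein.
    result[motif] = seq.endswith(motif)
    return result


features = check_e_features(SARS_COV_2_E)
print(features)
```

For aligned sequences containing gap characters, the gaps would need to be removed (or positions remapped) before calling a check like this, since a gap shifts the residue numbering.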
Figure 3. Prevalent mutations detected by comparing over 2,000 high-quality SARS-CoV-2 isolates. Mutations are labeled as the gene name followed by the potential amino-acid mutation and the nucleotide mutation, joined by a colon (":"). The majority of prevalent mutations were detected in the gene orf1ab. Among non-synonymous mutations, two (one in orf1ab and one in the Spike protein, S) appear in over 1,000 isolates. The full list of detected mutations is given in Supplementary Table 1.
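The colon-joined label format described in the caption is straightforward to split back into its parts. The following Python sketch assumes exactly three colon-separated fields per label; the example labels are illustrative of the format only and are not copied from the figure, and the names `Mutation` and `parse_label` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Mutation(NamedTuple):
    gene: str
    amino_acid_change: str
    nucleotide_change: str


def parse_label(label: str) -> Mutation:
    """Split a 'gene:amino-acid mutation:nucleotide mutation' label."""
    parts = label.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated fields, got {len(parts)}: {label!r}")
    return Mutation(*parts)


# Illustrative labels in the assumed format (not taken from the figure).
labels = ["S:D614G:A23403G", "orf1ab:P4715L:C14408T", "orf1ab:F924F:C3037T"]
mutations = [parse_label(label) for label in labels]

# Count labels per gene, e.g. to see which gene carries most mutations.
per_gene = Counter(m.gene for m in mutations)
print(per_gene.most_common())
```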