Jasper Fuk-Woo Chan, Kin-Hang Kok, Zheng Zhu, Hin Chu, Kelvin Kai-Wang To, Shuofeng Yuan, Kwok-Yung Yuen.
Abstract
A mysterious outbreak
Keywords: Coronavirus; SARS; Wuhan; bioinformatics; emerging; genome; respiratory; virus
Year: 2020 PMID: 31987001 PMCID: PMC7067204 DOI: 10.1080/22221751.2020.1719902
Source DB: PubMed Journal: Emerg Microbes Infect ISSN: 2222-1751 Impact factor: 7.163
List of coronaviruses used in this study.
| Accession number | Name displayed on the tree | Name of full-length genome | Year |
|---|---|---|---|
| AY274119 | Human SARS-CoV Tor2 2003 | SARS-related coronavirus isolate Tor2 | 2003 |
| AY278488 | Human SARS-CoV BJ01 2003 | SARS coronavirus BJ01 | 2003 |
| AY278491 | SARS coronavirus HKU-39849 2003 | SARS coronavirus HKU-39849 2003 | 2003 |
| AY390556 | Human SARS-CoV GZ02 2003 | SARS coronavirus GZ02 | 2003 |
| AY391777 | Human CoV OC43 2003 | Human coronavirus OC43 | 2003 |
| AY515512 | Paguma SARS CoV HC/SZ/61/03 2003 | SARS coronavirus HC/SZ/61/03 (paguma SARS) | 2018 |
| EF065513 | Bat CoV HKU9-1 2006 | Bat coronavirus HKU9-1 | 2006 |
| FJ588686 | Bat SL-CoV Rs672 2006 | Bat SARS CoV Rs672/2006 | 2006 |
| KC881005 | Bat SL-CoV RsSHC014 2013 | Bat SARS-like coronavirus RsSHC014 | 2013 |
| KC881006 | Bat SL-CoV Rs3367 2013 | Bat SARS-like coronavirus Rs3367 | 2013 |
| KY417146 | Bat SL-CoV Rs4231 2016 | Bat SARS-like coronavirus isolate Rs4231 | 2016 |
| KY417149 | Bat SL-CoV Rs4255 2016 | Bat SARS-like coronavirus isolate Rs4255 | 2016 |
| MG772933 | Bat SL-CoV ZC45 2018 | Bat SARS-like coronavirus isolate bat-SL-CoVZC45 | 2018 |
| MG772934 | Bat SL-CoV ZXC21 2018 | Bat SARS-like coronavirus isolate bat-SL-CoVZXC21 | 2018 |
| MK211377 | Bat CoV YN2018C 2018 | Coronavirus BtRs-BetaCoV/YN2018C | 2018 |
| MK211378 | Bat CoV YN2018D 2018 | Coronavirus BtRs-BetaCoV/YN2018Dᵃ | 2018 |
| MN975262 | HKU-SZ-005b | Human 2019-nCoV HKU-SZ-005b | 2020 |
| NC002645 | Human CoV 229E 2000 | Human coronavirus 229E | 2000 |
| NC006577 | Human CoV HKU1 2004 | Human coronavirus HKU1 | 2004 |
| NC009019 | Bat CoV HKU4-1 2006 | Bat coronavirus HKU4-1 | 2006 |
| NC009020 | Bat CoV HKU5-1 2006 | Bat coronavirus HKU5-1 | 2006 |
| NC014470 | Bat SARS-related CoV BM48-31 2009 | Bat coronavirus BM48-31/BGR/2008 | 2008 |
| NC019843 | Human MERS-CoV 2012 | Middle East respiratory syndrome coronavirus | 2012 |
ᵃ One nucleotide was added within the M gene to keep the sequence in frame.
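Keeping a coding sequence "in frame", as the footnote describes, can be checked mechanically: the length must be a multiple of three, and no stop codon may occur before the final codon. The sketch below assumes the standard genetic code's stop codons (TAA, TAG, TGA); `codons`, `is_in_frame` and the toy sequences are hypothetical illustrations, not the study's data or pipeline.

```python
# Stop codons of the standard genetic code (assumption: translation table 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a nucleotide sequence into consecutive codons in frame 0,
    dropping any incomplete trailing codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def is_in_frame(cds: str) -> bool:
    """True if `cds` is an uninterrupted coding sequence: its length is a
    multiple of 3, it ends in a stop codon, and it has no premature stop."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return False
    cs = codons(cds)
    if not cs or cs[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in cs[:-1])


# Hypothetical toy sequence: ATG GCT GAA CGT TAA
intact = "ATGGCTGAACGTTAA"
broken = intact[:5] + intact[6:]          # 1-nt deletion shifts the frame
repaired = broken[:5] + "T" + broken[5:]  # 1-nt insertion restores it

print(is_in_frame(intact), is_in_frame(broken), is_in_frame(repaired))
```

A single-nucleotide deletion shifts every downstream codon, so a one-nucleotide correction at the right position restores the reading frame; where to place that nucleotide in real data has to be justified from the sequences themselves, which this toy check does not attempt.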
Figure 1. Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR); open reading frame (orf) 1a/b (yellow box), encoding the non-structural proteins (nsp) for replication; genes encoding the structural proteins spike (blue box), envelope (orange box), membrane (red box) and nucleocapsid (cyan box); genes encoding accessory proteins (purple boxes), such as orf 3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome; and the 3'-untranslated region (3'-UTR). Examples of lineage A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of nsps and orfs are not drawn to scale.
Putative functions and proteolytic cleavage sites of 16 nonstructural proteins in orf1a/b as predicted by bioinformatics.
| NSP | Putative function/domain | Amino acid position | Putative cleavage site |
|---|---|---|---|
| nsp1 | suppress antiviral host response | M1 – G180 | (LNGG'AYTR) |
| nsp2 | unknown | A181 – G818 | (LKGG'APTK) |
| nsp3 | putative PL-pro domain | A819 – G2763 | (LKGG'KIVN) |
| nsp4 | complex with nsp3 and 6: DMV formation | K2764 – Q3263 | (AVLQ'SGFR) |
| nsp5 | 3CL-pro domain | S3264 – Q3569 | (VTFQ'SAVK) |
| nsp6 | complex with nsp3 and 4: DMV formation | S3570 – Q3859 | (ATVQ'SKMS) |
| nsp7 | complex with nsp8: primase | S3860 – Q3942 | (ATLQ'AIAS) |
| nsp8 | complex with nsp7: primase | A3943 – Q4140 | (VKLQ'NNEL) |
| nsp9 | RNA/DNA binding activity | N4141 – Q4253 | (VRLQ'AGNA) |
| nsp10 | complex with nsp14: replication fidelity | A4254 – Q4392 | (PMLQ'SADA) |
| nsp11 | short peptide at the end of orf1a | S4393 – V4405 | (end of orf1a) |
| nsp12 | RNA-dependent RNA polymerase | S4393 – Q5324 | (TVLQ'AVGA) |
| nsp13 | helicase | A5325 – Q5925 | (ATLQ'AENV) |
| nsp14 | ExoN: 3′–5′ exonuclease | A5926 – Q6452 | (TRLQ'SLEN) |
| nsp15 | XendoU: poly(U)-specific endoribonuclease | S6453 – Q6798 | (PKLQ'SSQA) |
| nsp16 | 2'-O-MT: 2'-O-ribose methyltransferase | S6799 – N7096 | (end of orf1b) |
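The amino acid ranges in the table can be turned into protein lengths, and the cleavage sites can be grouped by the protease expected to process them. The sketch below uses a simplified rule, an assumption rather than the paper's method: sites with L-X-G-G at positions P4 to P1 are assigned to the papain-like protease (PLpro, in nsp3), and sites with glutamine (Q) at P1 to the 3C-like protease (3CLpro, nsp5). Names such as `protease_for` and `NSPS` are hypothetical.

```python
import re

# (nsp, first residue, last residue, predicted cleavage site) transcribed from
# the table above. The apostrophe marks the predicted scissile bond; None means
# the protein ends at the end of orf1a or orf1b rather than at a cleavage site.
NSPS = [
    ("nsp1", 1, 180, "LNGG'AYTR"),
    ("nsp2", 181, 818, "LKGG'APTK"),
    ("nsp3", 819, 2763, "LKGG'KIVN"),
    ("nsp4", 2764, 3263, "AVLQ'SGFR"),
    ("nsp5", 3264, 3569, "VTFQ'SAVK"),
    ("nsp6", 3570, 3859, "ATVQ'SKMS"),
    ("nsp7", 3860, 3942, "ATLQ'AIAS"),
    ("nsp8", 3943, 4140, "VKLQ'NNEL"),
    ("nsp9", 4141, 4253, "VRLQ'AGNA"),
    ("nsp10", 4254, 4392, "PMLQ'SADA"),
    ("nsp11", 4393, 4405, None),
    ("nsp12", 4393, 5324, "TVLQ'AVGA"),
    ("nsp13", 5325, 5925, "ATLQ'AENV"),
    ("nsp14", 5926, 6452, "TRLQ'SLEN"),
    ("nsp15", 6453, 6798, "PKLQ'SSQA"),
    ("nsp16", 6799, 7096, None),
]

# Simplified rule (assumption): PLpro sites carry L-X-G-G at P4-P1,
# 3CLpro sites carry Q at P1.
PLPRO_MOTIF = re.compile(r"L.GG")


def protease_for(site: str) -> str:
    """Assign a cleavage site to PLpro or 3CLpro using the simplified rule."""
    n_side = site.split("'")[0]  # residues P4..P1, left of the scissile bond
    if PLPRO_MOTIF.fullmatch(n_side):
        return "PLpro"
    if n_side.endswith("Q"):
        return "3CLpro"
    return "unassigned"


# Residue ranges are inclusive, so length = last - first + 1.
lengths = {name: last - first + 1 for name, first, last, _ in NSPS}
assignments = {name: protease_for(site) for name, _, _, site in NSPS if site}

if __name__ == "__main__":
    for name, *_ in NSPS:
        print(f"{name:6} {lengths[name]:5} aa  {assignments.get(name, '(chain end)')}")
```

With this rule, the three N-terminal sites fall to PLpro and the remaining eleven to 3CLpro. nsp11 and nsp12 share the start residue S4393 because orf1b is reached through a ribosomal frameshift near the end of orf1a, so their lengths overlap and should not be summed.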
Amino acid identity between the 2019 novel coronavirus and bat SARS-like coronavirus or human SARS-CoV.
| Protein | 2019-nCoV vs. bat-SL-CoVZXC21 (% aa identity) | 2019-nCoV vs. SARS-CoV (% aa identity) |
|---|---|---|
| NSP1 | 96 | 84 |
| NSP2 | 96 | 68 |
| NSP3 | 93 | 76 |
| NSP4 | 96 | 80 |
| NSP5 | 99 | 96 |
| NSP6 | 98 | 88 |
| NSP7 | 99 | 99 |
| NSP8 | 96 | 97 |
| NSP9 | 96 | 97 |
| NSP10 | 98 | 97 |
| NSP11 | 85 | 85 |
| NSP12 | 96 | 96 |
| NSP13 | 99 | 100 |
| NSP14 | 95 | 95 |
| NSP15 | 88 | 89 |
| NSP16 | 98 | 93 |
| Spike | 80 | 76 |
| Orf3a | 92 | 72 |
| Orf3b | 32 | 32 |
| Envelope | 100 | 95 |
| Membrane | 99 | 91 |
| Orf6 | 94 | 69 |
| Orf7a | 89 | 85 |
| Orf7b | 93 | 81 |
| Orf8/Orf8b | 94 | 40 |
| Nucleoprotein | 94 | 94 |
| Orf9b | 73 | 73 |
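Percent amino acid identity, as tabulated above, is computed from an alignment as the share of aligned positions carrying the same residue. The denominator used for this table is not stated here, so the sketch below assumes one common convention: identical positions divided by columns in which neither sequence has a gap. The function `percent_identity` and the aligned fragments are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: identical residues / columns where neither sequence
    has a gap. Dividing by alignment length or by the shorter sequence length
    instead gives different numbers for gapped alignments.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


# Hypothetical aligned fragments (not real coronavirus sequences).
print(round(percent_identity("MKV-LLAGT", "MKVALIAGS"), 1))  # 75.0
```

The table reports whole-number percentages, so values computed this way would be rounded before comparison.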
Figure 2. Comparison of Spike stalk S2 subunit protein sequences. Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC004718) were aligned with CLUSTAL 2.1 and displayed with BOXSHADE 3.21. Black boxes indicate residues identical across the four sequences; grey boxes indicate similar residues.
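The identity and similarity shading in the alignment figure can be approximated column by column: a column is marked identical when all sequences share one residue, and similar when all residues fall in one physicochemical group. BOXSHADE uses configurable thresholds and its own similarity table, so the residue groups and the all-sequences rule below are simplifying assumptions; `shade_columns` and the toy alignment are hypothetical.

```python
# Hypothetical similarity groups (a simplification, not BOXSHADE's table).
SIMILAR_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"),
                  set("ST"), set("NQ"), set("AG")]


def shade_columns(alignment: list[str]) -> str:
    """Return one symbol per alignment column:
    '*' all residues identical, ':' all residues in one similarity group,
    ' ' otherwise. Columns containing a gap ('-') are never shaded."""
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for col in zip(*alignment):
        residues = {c.upper() for c in col}
        if "-" in residues:
            marks.append(" ")
        elif len(residues) == 1:
            marks.append("*")
        elif any(residues <= group for group in SIMILAR_GROUPS):
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


# Hypothetical toy alignment of four short fragments.
seqs = ["SFIEDLLF", "SFIEDLLY", "TFIEDLLF", "SFVEDL-F"]
for s in seqs:
    print(s)
print(shade_columns(seqs))
```

Requiring every sequence to agree is the strictest choice; a threshold such as "at least half the sequences" would shade more columns.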
Figure 4. Analysis of orf3b. (A) Multiple alignment of the orf3b protein sequences of 2019-nCoV (HKU-SZ-005b), SARS-CoV and SARS-related CoV. (B) A novel putative short protein found in orf3b.
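Spotting a putative short protein inside a region such as orf3b typically starts with scanning the nucleotide sequence for open reading frames: an ATG start codon followed in the same frame by a stop codon. The sketch below scans the three forward frames only, assumes the standard genetic code's stop codons, and uses a hypothetical minimum length and toy sequence; `find_orfs` is an illustrative name, not the authors' pipeline.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `start` is the 0-based index of the A in ATG; `end` is the exclusive index
    just past the stop codon. ORFs with fewer than `min_codons` codons before
    the stop (ATG included) are skipped, ATGs nested inside an open ORF are
    not reported separately, and ORFs lacking a stop codon are dropped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Hypothetical toy sequence with one ORF in frame 0 and one in frame 1.
toy = "ATGAAATAGCATGCCCGGGTGA"
for frame, start, end in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

A real scan would also cover the reverse complement and apply a biologically motivated length cutoff; the low default here only keeps the toy example small.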
Figure 5. Analysis of orf8 showing a novel putative protein. (A) Phylogenetic analysis of the orf8 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number AY274119), performed using the neighbour-joining method with 1,000 bootstrap replicates. Evolutionary distances were calculated using the JTT matrix-based method. (B) Multiple alignment performed with CLUSTAL 2.1 and displayed with BOXSHADE 3.21. A black background indicates identical residues and a grey background indicates similar residues. (C) Secondary structure of Orf8 predicted with PSI-blast-based secondary structure PREDiction (PSIPRED). Predicted helices (h) and strands (s) are boxed in red and yellow, respectively.
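The neighbour-joining step behind a tree like the one in panel (A) can be sketched from a precomputed distance matrix. JTT distance estimation and bootstrapping are omitted, and the five taxa `a` to `e` with their distances form a textbook-style toy matrix, not the Orf8 data; `neighbor_joining` is a hypothetical name.

```python
from itertools import combinations


def neighbor_joining(labels: list[str], dist: dict[frozenset, float]) -> str:
    """Build an unrooted tree by neighbour-joining and return it as Newick.

    `dist` maps frozenset({x, y}) to the distance between taxa x and y.
    Ties in the Q criterion go to the first pair in iteration order.
    """
    newick = {name: name for name in labels}  # cluster id -> Newick subtree
    dm = dict(dist)                            # working distance matrix
    active = list(labels)
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # r[i]: total distance from cluster i to all other active clusters.
        r = {i: sum(dm[frozenset((i, k))] for k in active if k != i) for i in active}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * dm[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = dm[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch i -> new node
        lj = dij - li                                  # branch j -> new node
        u = f"node{next_id}"
        next_id += 1
        newick[u] = f"({newick[i]}:{li:.3f},{newick[j]}:{lj:.3f})"
        # Distance from the new node to every remaining cluster.
        for k in active:
            if k not in (i, j):
                dm[frozenset((u, k))] = 0.5 * (dm[frozenset((i, k))]
                                               + dm[frozenset((j, k))] - dij)
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    return f"({newick[a]}:{dm[frozenset((a, b))]:.3f},{newick[b]});"


# Made-up 5-taxon distance matrix used only to exercise the code.
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(p): float(v) for p, v in pairs.items()}
print(neighbor_joining(list("abcde"), dist))
# ((c:4.000,(a:2.000,b:3.000):3.000):2.000,(d:2.000,e:1.000));
```

Bootstrapping, as used for the figure, would repeat the whole procedure on alignments built from resampled columns and record how often each clade recurs.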