Sander van Boheemen, Miranda de Graaf, Chris Lauber, Theo M Bestebroer, V Stalin Raj, Ali Moh Zaki, Albert D M E Osterhaus, Bart L Haagmans, Alexander E Gorbalenya, Eric J Snijder, Ron A M Fouchier.
Abstract
A novel human coronavirus (HCoV-EMC/2012) was isolated from a man with acute pneumonia and renal failure in June 2012. This report describes the complete genome sequence, genome organization, and expression strategy of HCoV-EMC/2012 and its relation with known coronaviruses. The genome comprises 30,119 nucleotides and contains at least 10 predicted open reading frames, 9 of which are predicted to be expressed from a nested set of seven subgenomic mRNAs. Phylogenetic analysis of the replicase gene of coronaviruses with completely sequenced genomes showed that HCoV-EMC/2012 is most closely related to Tylonycteris bat coronavirus HKU4 (BtCoV-HKU4) and Pipistrellus bat coronavirus HKU5 (BtCoV-HKU5), which prototype two species in lineage C of the genus Betacoronavirus. In accordance with the guidelines of the International Committee on Taxonomy of Viruses, and in view of the 75% and 77% amino acid sequence identity in 7 conserved replicase domains with BtCoV-HKU4 and BtCoV-HKU5, respectively, we propose that HCoV-EMC/2012 prototypes a novel species in the genus Betacoronavirus. HCoV-EMC/2012 may be most closely related to a coronavirus detected in Pipistrellus pipistrellus in The Netherlands, but because only a short sequence from the most conserved part of the RNA-dependent RNA polymerase-encoding region of the genome was reported for this bat virus, its genetic distance from HCoV-EMC/2012 remains uncertain. HCoV-EMC/2012 is the sixth coronavirus known to infect humans and the first human virus within betacoronavirus lineage C.
IMPORTANCE: Coronaviruses are capable of infecting humans and many animal species. Most infections caused by human coronaviruses are relatively mild. However, the outbreak of severe acute respiratory syndrome (SARS) caused by SARS-CoV in 2002 to 2003 and the fatal infection of a human by HCoV-EMC/2012 in 2012 show that coronaviruses are able to cause severe, sometimes fatal disease in humans. We have determined the complete genome of HCoV-EMC/2012 using an unbiased virus discovery approach involving next-generation sequencing techniques, which enabled subsequent state-of-the-art bioinformatics, phylogenetics, and taxonomic analyses. By establishing its complete genome sequence, HCoV-EMC/2012 was characterized as a new genotype which is closely related to bat coronaviruses that are distant from SARS-CoV. We expect that this information will be vital to rapid advancement of both clinical and vital research on this emerging pathogen.
Year: 2012 PMID: 23170002 PMCID: PMC3509437 DOI: 10.1128/mBio.00473-12
Source DB: PubMed Journal: MBio Impact factor: 7.867
FIG 1 Genome organization and expression of HCoV-EMC/2012. (A) The coding part of the genome and terminal untranslated regions are depicted, respectively, by a gray background and horizontal lines. Rectangles indicate ORFs and their locations in three reading frames. The dashed lines in ORF1a and ORF5 indicate base ambiguities observed during sequencing. Triangles represent sites in the replicase polyproteins pp1a and pp1ab that are predicted to be cleaved by papain-like proteinases (gray) or the 3C-like cysteine proteinase (black). Cleavage products are numbered nsp1 to nsp16, according to the convention established for other coronaviruses (23). The −1 ribosomal frameshift site (RFS) in the ORF1a/ORF1b overlap region is indicated. The locations of the leader transcription-regulatory sequence (TRS) (L) and seven body TRSs (numbered) are highlighted by black dots. All coordinates correspond to the scale shown at the bottom. (B) Sequence comparison of the leader TRS region and seven body TRSs. The fully conserved TRS core sequence AACGAA is highlighted. Nucleotides in the body TRSs are written in uppercase letters if the complementary nucleotide can base pair with the corresponding residue in the leader TRS region (including G-U base pairs). TRS starting coordinates in the HCoV-EMC/2012 genome are shown at the left; for the body TRSs, the numbers of (potential) base pairs with the leader TRS region are shown at the right.
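The base-pair counting rule described for panel B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences in the usage note are made-up toy strings built around the AACGAA core, not the actual HCoV-EMC/2012 TRS regions, and the function names are hypothetical. A body TRS position is counted when its complement could pair with the leader residue, which is the case when both nucleotides are identical (Watson-Crick) or when the leader/body combination is G/A or U/C (G-U wobble).

```python
# Leader/body nucleotide combinations that count as a potential base pair
# in addition to identical nucleotides. The complement of the body TRS
# nucleotide pairs with the leader TRS residue, so:
#   leader G pairs with U, the complement of a body A;
#   leader U pairs with G, the complement of a body C.
WOBBLE = {("G", "A"), ("U", "C")}


def can_pair(leader_nt: str, body_nt: str) -> bool:
    """True if the complement of body_nt can pair with leader_nt."""
    return leader_nt == body_nt or (leader_nt, body_nt) in WOBBLE


def count_potential_pairs(leader: str, body: str) -> int:
    """Count aligned positions with a potential base pair (G-U included)."""
    leader, body = leader.upper(), body.upper()
    if len(leader) != len(body):
        raise ValueError("leader and body TRS regions must be aligned to equal length")
    return sum(can_pair(l, b) for l, b in zip(leader, body))


def mark_case(leader: str, body: str) -> str:
    """Write body nucleotides in uppercase where pairing is possible, else lowercase."""
    leader, body = leader.upper(), body.upper()
    return "".join(b if can_pair(l, b) else b.lower() for l, b in zip(leader, body))
```

With the toy inputs `leader = "GAACGAAUUU"` and `body = "AAACGAACUG"`, `count_potential_pairs` returns 9 and `mark_case` returns `"AAACGAACUg"`; only the last position cannot pair.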
TABLE 1 Cleavage products of the replicase polyproteins of HCoV-EMC/2012
| Cleavage product | Position in polyprotein | Protein size (aa) | Putative functional domain(s) |
|---|---|---|---|
| nsp1 | Met1-Gly193 | 193 | |
| nsp2 | Asp194-Gly853 | 660 | |
| nsp3 | Ala854-Gly2740 | 1887 | ADRP, PL2pro, TM-1 |
| nsp4 | Ala2741-Gln3247 | 507 | TM-2 |
| nsp5 | Ser3248-Gln3553 | 306 | 3CLpro |
| nsp6 | Ser3554-Gln3845 | 292 | TM-3 |
| nsp7 | Ser3846-Gln3928 | 83 | |
| nsp8 | Ala3929-Gln4127 | 199 | Putative primase |
| nsp9 | Asn4128-Gln4237 | 110 | |
| nsp10 | Ala4238-Gln4377 | 140 | |
| nsp11 | Ser4378-Leu4391 | 14 | |
| nsp12 | Ser4378-Gln5310 | 933 | RdRp |
| nsp13 | Ala5311-Gln5908 | 598 | ZD, HEL1 |
| nsp14 | Ser5909-Gln6432 | 524 | ExoN, NMT |
| nsp15 | Gly6433-Gln6775 | 343 | NendoU |
| nsp16 | Ala6776-Arg7078 | 303 | OMT |
Amino acids of the replicase proteins pp1a and pp1ab were numbered with the assumption that a −1 ribosomal frameshift occurs to express ORF1b, as in other coronaviruses (see text); the use of the slippery sequence UUUAAAC is predicted to result in a peptide bond between Asn4385 and Arg4386 in pp1ab.
The major transmembrane domains and a selection of the most conserved domains with enzymatic activities that have been characterized functionally and/or structurally in coronaviruses are listed. Abbreviations: PL2pro, papain-like proteinase 2; ADRP, ADP-ribose 1″-phosphatase; TM, transmembrane domain; 3CLpro, 3C-like cysteine proteinase; RdRp, RNA-dependent RNA polymerase; ZD, putative zinc-binding domain; HEL1, superfamily 1 helicase; ExoN, 3′-to-5′ exonuclease; NMT, N7-methyltransferase; NendoU, nidoviral endoribonuclease specific for U; OMT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase.
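The positions and sizes in the cleavage product table can be cross-checked with a short script. The sketch below is illustrative only: the coordinates are transcribed from the table, while the list layout and function names are hypothetical. It checks that each size equals last residue minus first residue plus one, and that each product starts directly after the previous one. The single expected exception is nsp12, which shares its first residue (Ser4378) with nsp11 because the two are alternative products of pp1a and of the frameshifted pp1ab.

```python
# (name, first residue, last residue, size in amino acids), from the table.
PRODUCTS = [
    ("nsp1", 1, 193, 193),
    ("nsp2", 194, 853, 660),
    ("nsp3", 854, 2740, 1887),
    ("nsp4", 2741, 3247, 507),
    ("nsp5", 3248, 3553, 306),
    ("nsp6", 3554, 3845, 292),
    ("nsp7", 3846, 3928, 83),
    ("nsp8", 3929, 4127, 199),
    ("nsp9", 4128, 4237, 110),
    ("nsp10", 4238, 4377, 140),
    ("nsp11", 4378, 4391, 14),
    ("nsp12", 4378, 5310, 933),
    ("nsp13", 5311, 5908, 598),
    ("nsp14", 5909, 6432, 524),
    ("nsp15", 6433, 6775, 343),
    ("nsp16", 6776, 7078, 303),
]


def size_mismatches(products):
    """Names of products whose listed size differs from last - first + 1."""
    return [name for name, first, last, size in products
            if last - first + 1 != size]


def non_contiguous(products):
    """Names of products that do not start right after the preceding product."""
    flagged = []
    for (_, _, prev_last, _), (name, first, _, _) in zip(products, products[1:]):
        if first != prev_last + 1:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print("size mismatches:", size_mismatches(PRODUCTS))
    print("not contiguous with previous product:", non_contiguous(PRODUCTS))
```

Run as a script, this reports no size mismatches and flags only nsp12 as not contiguous with the preceding row.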
FIG 2 Phylogenetic trees for HCoV-EMC/2012 and selected other coronaviruses. Unrooted maximum likelihood phylogenies inferred from the nucleotide sequences of full-length ORF1ab (A) or a 332-nt fragment from the RdRp-encoding domain of ORF1b (B) are shown. HCoV-EMC/2012 and 20 viruses representing the recognized species diversity of coronaviruses were included, with bat-derived isolate VM314/2008 also included in the analysis presented in panel B (31). The viruses and corresponding species used are Alphacoronavirus 1 (Alpha-CoV1), Human coronavirus 229E (HCoV-229E), Human coronavirus NL63 (HCoV-NL63), Miniopterus bat coronavirus 1 (BtCoV-1AB), Miniopterus bat coronavirus HKU8 (BtCoV-HKU8), Porcine epidemic diarrhea virus (PED), Rhinolophus bat coronavirus HKU2 (BtCoV-HKU2), Scotophilus bat coronavirus 512 (BtCoV-512), Betacoronavirus 1 (Beta-CoV1), Human coronavirus HKU1 (HCoV-HKU1), Murine coronavirus (MHV), Tylonycteris bat coronavirus HKU4 (BtCoV-HKU4), Pipistrellus bat coronavirus HKU5 (BtCoV-HKU5), Rousettus bat coronavirus HKU9 (BtCoV-HKU9), Severe acute respiratory syndrome-related coronavirus (SARS-CoV), Avian coronavirus (IBV), Beluga whale coronavirus SW1 (BWCoV-SW1), Bulbul coronavirus HKU11 (ACoV-HKU11), Thrush coronavirus HKU12 (ACoV-HKU12), and Munia coronavirus HKU13 (ACoV-HKU13). Bootstrap values above 50 are shown. Arcs and symbols indicate the four coronavirus genera. The scale bar represents the number of nucleotide substitutions per site.
FIG 3 Phylogenetic trees for HCoV-EMC/2012 and selected other coronaviruses. Unrooted maximum likelihood phylogenies based on coronavirus-wide conserved protein domains in replicase pp1ab (A) or on the conserved parts of structural proteins S2, E, M, and N (B) for HCoV-EMC/2012 and 20 viruses representing the recognized species diversity of coronaviruses are shown (see Fig. 2 legend for names and abbreviations). Branch support values are based on the Shimodaira-Hasegawa-like procedure and are in the range of zero to one; only nonoptimal values smaller than one are shown. Arcs and symbols indicate the four coronavirus genera. The scale bars represent average numbers of substitutions per amino acid position.
Percent amino acid sequence identity between conserved domains of the replicase polyprotein of HCoV-EMC/2012 and established betacoronaviruses
Values are % amino acid sequence identity with the indicated conserved domain of the HCoV-EMC/2012 replicase polyprotein.
| Virus strain | ADRP | 3CLpro | RdRp | Hel | ExoN | NendoU | O-MT | All domains |
|---|---|---|---|---|---|---|---|---|
| BtCoV-HKU4.1 | 57.4 | 81.0 | 90.0 | 92.1 | 85.4 | 72.6 | 83.4 | 75.1 |
| BtCoV-HKU4.2 | 57.5 | 81.0 | 90.0 | 92.1 | 85.4 | 72.6 | 83.4 | 75.1 |
| BtCoV-HKU4.3 | 57.4 | 81.0 | 90.0 | 92.1 | 85.4 | 72.6 | 83.4 | 75.1 |
| BtCoV-HKU4.4 | 57.5 | 81.0 | 89.9 | 92.1 | 85.4 | 72.6 | 83.4 | 74.9 |
| BtCoV/133/2005 | 57.6 | 80.7 | 89.9 | 91.6 | 86.4 | 72.0 | 83.4 | 74.9 |
| BtCoV-HKU5.1 | 57.6 | 82.6 | 92.1 | 93.8 | 91.7 | 79.7 | 85.3 | 76.7 |
| BtCoV-HKU5.2 | 57.6 | 82.0 | 92.2 | 93.8 | 91.7 | 80.0 | 85.3 | 76.7 |
| BtCoV-HKU5.3 | 57.2 | 82.0 | 92.2 | 93.8 | 91.7 | 80.0 | 85.3 | 76.7 |
| BtCoV-HKU5.5 | 57.3 | 82.0 | 92.2 | 93.8 | 91.7 | 80.0 | 85.3 | 76.7 |
Accession numbers used are as follows: for BtCoV-HKU4 strains, EF065505, EF065506, EF065507, and EF065508; for BtCoV/133/2005, DQ648794; and for BtCoV-HKU5 strains, EF065509, EF065510, EF065511, and EF065512.
For abbreviations, see Table 1.
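For readers unfamiliar with how a percent identity value of this kind is obtained, here is a minimal Python sketch. It is not the procedure used for the table: the alignment method and the treatment of gaps are not stated in this record, so the choice to skip alignment columns that contain a gap in either sequence is an assumption, and the function name and the toy sequences in the usage note are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: alignment columns with a gap in either sequence are
    excluded from both the match count and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped alignment columns to compare")
    matches = sum(a == b for a, b in columns)
    return round(100.0 * matches / len(columns), 1)
```

For the toy aligned pair `"MKV-LSTA"` and `"MRVQLS-A"`, six columns are compared and five match, so the function returns 83.3. Other conventions, such as dividing by the full alignment length, give different values for gapped alignments.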
Percent identity between open reading frames of coronavirus HCoV-EMC/2012 and coronaviruses BtCoV-HKU4 and BtCoV-HKU5 at the nucleotide and amino acid levels
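| Annotation in HCoV-EMC/2012 | Annotation in BtCoV-HKU4 and BtCoV-HKU5 | % nt identity with BtCoV-HKU4 | % aa identity with BtCoV-HKU4 | % nt identity with BtCoV-HKU5 | % aa identity with BtCoV-HKU5 |
|---|---|---|---|---|---|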
| ORF1ab | ORF1ab | 70.6 | 72.2 | 70.7 | 73.8 |
| S | S | 66.3 | 66.1 | 63.8 | 63.5 |
| ORF3 | NS3a | 46.4 | 34.9 | 46.0 | 31.4 |
| ORF4a | NS3b | 51.5 | 37.5 | 47.8 | 38.0 |
| ORF4b | NS3c | 35.1 | 23.5 | 45.2 | 25.9 |
| ORF5 | NS3d | 56.6 | 46.9 | 58.1 | 54.2 |
| E | E | 74.6 | 69.5 | 75.1 | 68.2 |
| M | M | 72.8 | 82.6 | 73.0 | 82.2 |
| N | N | 67.2 | 71.8 | 66.7 | 67.8 |
| ORF8b | Undescribed | 45.3 | 32.1 | 48.0 | 33.8 |
Annotations used for HCoV-EMC/2012 differ from those used for BtCoV-HKU4 and BtCoV-HKU5 (10).
Accession numbers used for BtCoV-HKU4 and BtCoV-HKU5 were EF065505 and EF065509.