Eric J Snijder, Peter J Bredenbeek, Jessika C Dobbe, Volker Thiel, John Ziebuhr, Leo L M Poon, Yi Guan, Mikhail Rozanov, Willy J M Spaan, Alexander E Gorbalenya.
Abstract
The genome organization and expression strategy of the newly identified severe acute respiratory syndrome coronavirus (SARS-CoV) were predicted using recently published genome sequences. Fourteen putative open reading frames were identified, 12 of which were predicted to be expressed from a nested set of eight subgenomic mRNAs. The synthesis of these mRNAs in SARS-CoV-infected cells was confirmed experimentally. The 4382- and 7073-amino acid residue SARS-CoV replicase polyproteins are predicted to be cleaved into 16 subunits by two viral proteinases (bringing the total number of SARS-CoV proteins to 28). A phylogenetic analysis of the replicase gene, using a distantly related torovirus as an outgroup, demonstrated that, despite a number of unique features, SARS-CoV is most closely related to group 2 coronaviruses. Distant homologs of cellular RNA processing enzymes were identified in group 2 coronaviruses, with four of them being conserved in SARS-CoV. These newly recognized viral enzymes place the mechanism of coronavirus RNA synthesis in a completely new perspective. Furthermore, together with previously described viral enzymes, they will be important targets for the design of antiviral strategies aimed at controlling the further spread of SARS-CoV.
Year: 2003 PMID: 12927536 PMCID: PMC7159028 DOI: 10.1016/s0022-2836(03)00865-9
Source DB: PubMed Journal: J Mol Biol ISSN: 0022-2836 Impact factor: 5.469
Figure 1. Overview of the SARS-CoV genome organization and expression. Comparison of the genome organizations of SARS-CoV and bovine coronavirus (BCoV). The replicase genes are depicted, with ORF1a, ORF1b, and ribosomal frameshift site indicated. Arrows represent sites in the corresponding replicase polyproteins that are cleaved by papain-like proteinases (orange) or the 3C-like cysteine proteinase (blue). Cleavage products are provisionally numbered nsp1–nsp16 (see also Table 1). In the 3′-terminal part of the genomes, homologous structural protein genes are indicated in matching colors. Close-ups of two regions with major differences are shown (and see the text). In the N-terminal half of replicase ORF1a, SARS-CoV lacks one of the PLpro domains (indicated in orange/green in BCoV) and contains a unique insertion (SUD). In the region with structural and accessory protein genes, the locations of the body TRSs involved in subgenomic RNA synthesis are indicated with red boxes (see Figure 3 and Hofmann et al.). The bottom part of the Figure illustrates which parts of the genome are conserved in the genus Coronavirus and in the order Nidovirales (the ORF1a sequence of toroviruses, which largely remains to be sequenced, could not be included). Furthermore, it is indicated for which domains homologs have been identified in other RNA viruses and the cellular world. Enzymes for which structural data are available are shown in blue. SUD, SARS-CoV unique domain; PLpro, papain-like cysteine proteinase; 3CLpro, 3C-like cysteine proteinase; TM, transmembrane domain; ADRP, adenosine diphosphate-ribose 1″-phosphatase; ExoN, 3′-to-5′ exonuclease; CLpro, chymotrypsin-like proteinase; RdRp, RNA-dependent RNA polymerase; HEL1, superfamily 1 helicase; XendoU, (homolog of) poly(U)-specific endoribonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase; CPD, cyclic phosphodiesterase. Domains Ac, X, and Y are described by Ziebuhr et al. and Gorbalenya et al.
Predicted SARS-CoV replicase cleavage products and their mode of expression
| Protein order | Position in polyproteins pp1a/pp1ab (amino acid residues) | Protein size (amino acid residues) | Associated putative functional domain(s) | Predicted mode of expression and release from polyproteins |
|---|---|---|---|---|
| nsp1-pp1a/pp1ab | 1Met-Gly180 | 180 | ? | TI+PL2pro |
| nsp2-pp1a/pp1ab | 181Ala-Gly818 | 638 | ? | PL2pro |
| nsp3 | 819Ala-Gly2740 | 1922 | Ac, X, PL2pro, Y (TM1), ADRP | PL2pro |
| nsp4-pp1a/pp1ab | 2741Lys-Gln3240 | 500 | TM2 | PL2pro+3CLpro |
| nsp5-pp1a/pp1ab | 3241Ser-Gln3546 | 306 | 3CLpro | 3CLpro |
| nsp6-pp1a/pp1ab | 3547Gly-Gln3836 | 290 | TM3 | 3CLpro |
| nsp7-pp1a/pp1ab | 3837Ser-Gln3919 | 83 | ? | 3CLpro |
| nsp8-pp1a/pp1ab | 3920Ala-Gln4117 | 198 | ? | 3CLpro |
| nsp9-pp1a/pp1ab | 4118Asn-Gln4230 | 113 | ? | 3CLpro |
| nsp10-pp1a/pp1ab | 4231Ala-Gln4369 | 139 | GFL | 3CLpro |
| nsp11-pp1a | 4370Ser-Val4382 | 13 | ? | 3CLpro+TT |
| nsp12-pp1ab | 4370Ser-Gln5301 | 932 | RdRp | RFS+3CLpro |
| nsp13-pp1ab | 5302Ala-Gln5902 | 601 | ZD, NTPase, HEL1 | RFS+3CLpro |
| nsp14-pp1ab | 5903Ala-Gln6429 | 527 | Exonuclease (ExoN homolog) | RFS+3CLpro |
| nsp15-pp1ab | 6430Ser-Gln6775 | 346 | NTD, endoRNase (XendoU homolog) | RFS+3CLpro |
| nsp16-pp1ab | 6776Ala-Asn7073 | 298 | 2′-O-MT | RFS+3CLpro+TT |
Predictions are based on the SARS-CoV sequences published by Michael Smith Genome Sciences Centre (Vancouver, Canada; Entrez Genomes accession number NC_004718 (AY274119)) and the Centers for Disease Control and Prevention (Atlanta, USA; GenBank accession number AY278741) and an alignment of SARS-CoV with previously characterized coronavirus sequences as summarized in Refs. 11, 18, 32.
For convenience, replicase cleavage products were provisionally numbered non-structural protein (nsp) 1–16 according to their position in the polyproteins.
Amino acids of replicase proteins pp1a and pp1ab were numbered assuming that, as in other coronaviruses, a −1 ribosomal frameshift occurs; use of the slippery sequence UUUAAAC is predicted to yield a peptide bond between Asn4378 and Arg4379 in pp1ab.
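The −1 frameshift numbering described above can be illustrated with a small sketch. It assumes a simplified model in which the last codon of the slippery heptamer (AAC, Asn) is decoded in the zero frame, after which the ribosome slips back one nucleotide and re-reads the final C as the first base of the next codon. The sequence `TOY_MRNA` is a hypothetical toy construct, not the SARS-CoV sequence; the nucleotides after the heptamer were chosen only so that the junction reads Asn-Arg, as in the footnote.

```python
# Standard genetic code, built from the usual UCAG-ordered amino acid string.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SLIPPERY = "UUUAAAC"  # slippery sequence named in the footnote

# Hypothetical toy mRNA (assumption): AUG GCU UUA AAC GGG UUU UAA GCUGA
TOY_MRNA = "AUGGCUUUAAACGGGUUUUAAGCUGA"


def translate(rna, start=0):
    """Translate from `start` until a stop codon or the end of the RNA."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def translate_with_frameshift(rna):
    """Translate an ORF starting at index 0 with a -1 shift after the heptamer."""
    site = rna.find(SLIPPERY)
    # The heptamer must sit as U UUA AAC relative to the zero frame.
    if site < 0 or (site + 1) % 3 != 0:
        raise ValueError("no in-frame slippery sequence found")
    end_of_heptamer = site + len(SLIPPERY)
    upstream = translate(rna[:end_of_heptamer])            # ... Leu Asn
    downstream = translate(rna, start=end_of_heptamer - 1)  # re-reads the C
    return upstream + downstream


print(translate(TOY_MRNA))                  # zero-frame product (pp1a-like)
print(translate_with_frameshift(TOY_MRNA))  # frameshifted product (pp1ab-like)
```

In this toy case the zero-frame product terminates shortly after the heptamer, while the frameshifted product continues in the −1 frame, which mirrors the relationship between pp1a and pp1ab.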
Abbreviations: PL2pro, papain-like proteinase 2; ADRP, adenosine diphosphate-ribose 1″-phosphatase; TM, transmembrane domain; 3CLpro, 3C-like cysteine proteinase; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; ZD, putative zinc-binding domain; HEL1, superfamily 1 helicase; NTD, nidovirus conserved domain; ExoN, 3′-to-5′ exonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase. Domains Ac, X, and Y are described in Refs. 32, 47.
Indicated are the SARS-CoV proteinases predicted to be involved in cleavage of the N- and/or C-termini of the cleavage products; TI, translation initiation; TT, translation termination; RFS, ORF1a/ORF1b ribosomal frameshift.
Compared to the corresponding cleavage product of BCoV (see Figure 1), nsp3 lacks PL1pro and contains a ∼375 amino acid insertion between the X and PL2pro domains that is unique to SARS-CoV.
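As a quick consistency check on the table above, the following sketch recomputes each protein size from its start and end positions and verifies that the cleavage products tile pp1a (residues 1 to 4382) and pp1ab (residues 1 to 7073) without gaps. The positions and sizes are copied from the table; the variable and function names are illustrative only.

```python
# (name, first residue, last residue, size as listed in the table)
NSPS = [
    ("nsp1", 1, 180, 180),
    ("nsp2", 181, 818, 638),
    ("nsp3", 819, 2740, 1922),
    ("nsp4", 2741, 3240, 500),
    ("nsp5", 3241, 3546, 306),
    ("nsp6", 3547, 3836, 290),
    ("nsp7", 3837, 3919, 83),
    ("nsp8", 3920, 4117, 198),
    ("nsp9", 4118, 4230, 113),
    ("nsp10", 4231, 4369, 139),
    ("nsp11", 4370, 4382, 13),    # pp1a only
    ("nsp12", 4370, 5301, 932),   # pp1ab only, shares its start with nsp11
    ("nsp13", 5302, 5902, 601),
    ("nsp14", 5903, 6429, 527),
    ("nsp15", 6430, 6775, 346),
    ("nsp16", 6776, 7073, 298),
]

PP1A = NSPS[:11]               # nsp1 to nsp11
PP1AB = NSPS[:10] + NSPS[11:]  # nsp1 to nsp10, then nsp12 to nsp16


def size_mismatches(rows):
    """Names of rows whose listed size differs from end - start + 1."""
    return [name for name, start, end, size in rows if end - start + 1 != size]


def is_contiguous(rows):
    """True if the first product starts at 1 and each one follows the last."""
    expected_start = 1
    for _, start, end, _ in rows:
        if start != expected_start:
            return False
        expected_start = end + 1
    return True


print(size_mismatches(NSPS))                      # empty list if consistent
print(is_contiguous(PP1A), PP1A[-1][2])           # pp1a ends at 4382
print(is_contiguous(PP1AB), PP1AB[-1][2])         # pp1ab ends at 7073
```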
Predicted SARS-CoV proteins expressed from subgenomic mRNAs 2 to 9
| ORF number | Protein size (amino acid residues) | Subgenomic mRNA predicted to be used for expression | Protein name/function |
|---|---|---|---|
| 2 | 1255 | 2 | Spike (S) protein |
| 3a | 274 | 3 | ? |
| 3b | 154 | 3 | ? |
| 4 | 76 | 4 | Envelope (E) protein |
| 5 | 221 | 5 | Membrane (M) protein |
| 6 | 63 | 6 | ? |
| 7a | 122 | 7 | ? |
| 7b | 44 | 7 | ? |
| 8a | 39 | 8 | ? |
| 8b | 84 | 8 | ? |
| 9a | 422 | 9 | Nucleocapsid (N) protein |
| 9b | 98 | 9 | ? |
Predictions are based on the SARS-CoV sequences published by Michael Smith Genome Sciences Centre (Vancouver, Canada; Entrez Genomes accession number NC_004718 (AY274119)) and the Centers for Disease Control and Prevention (Atlanta, USA; GenBank accession number AY278741).
See also Figures 1 and 3.
ORF3b (462 nucleotides) overlaps with the 3′ half of ORF3a, the RNA4 body TRS and the 5′ end of ORF4. It is the fifth largest reading frame downstream of ORF1b (after ORFs 2, 3a, 5 and 9a) making it a likely candidate to be expressed. Since its translation initiation codon is the 13th AUG codon in mRNA3, ORF3b expression should involve a mechanism like internal ribosomal entry (as previously suggested for some other coronavirus ORFs; Ref. 78) or the synthesis of an as yet undetected additional subgenomic mRNA.
The translation termination codon of ORF7a and translation initiation codon of ORF7b overlap. The absence of any other upstream AUG codons (with the exception of that of ORF7a) and good context for translation initiation of the ORF7b AUG codon suggest that ORF7b may be expressed from subgenomic RNA7 by “leaky scanning” of ribosomes.
The putative ORF8a start codon is in a good context for translation initiation and immediately follows the body TRS involved in mRNA8 transcription, making it likely that ORF8a is expressed from mRNA8. The mechanism used to express the larger downstream ORF8b is more puzzling, since its (putative) translation initiation codon appears to have a poor context for translation initiation and two additional AUG codons are present in the region between the putative start codons of ORFs 8a and 8b. Recently, some SARS-CoV isolates of human and civet cat origin (L.L.M.P. and Y.G., unpublished results) were reported to contain a 29-nucleotide insertion that results in the in-frame fusion of ORFs 8a and 8b. Consequently, ORF8b in the Frankfurt-1 and HKU-39849 isolates used in this study may be translationally silent.
A functional “internal” open reading frame, overlapping with the N protein gene, has been described for other group 2 coronaviruses, e.g. BCoV; ORF9b appears to occupy a corresponding position and may be expressed following “leaky scanning” by ribosomes.
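The arguments in these notes rest on two simple sequence properties: how many AUG codons lie upstream of a candidate start codon, and whether that codon sits in a good or poor initiation context. The sketch below shows one way such checks could be made. It assumes the commonly used Kozak rule (a purine at position −3 and a G at position +4, counting the A of AUG as +1); the notes do not say which criterion the authors applied. `TOY_MRNA` is a hypothetical sequence, not a SARS-CoV mRNA.

```python
# Hypothetical toy mRNA (assumption) with two AUG codons, at indices 9 and 21.
TOY_MRNA = "GGAGCCACCAUGGCAUUUCUUAUGUAA"


def find_augs(mrna):
    """Indices of every AUG triplet, in any reading frame."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


def upstream_augs(mrna, start):
    """AUG triplets that a scanning ribosome meets before index `start`."""
    return [i for i in find_augs(mrna) if i < start]


def kozak_context(mrna, start):
    """Classify the context of the AUG at `start` by the -3/+4 rule."""
    minus3 = mrna[start - 3] if start >= 3 else None
    plus4 = mrna[start + 3] if start + 3 < len(mrna) else None
    hits = (minus3 in ("A", "G")) + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


for position in find_augs(TOY_MRNA):
    print(position, len(upstream_augs(TOY_MRNA, position)),
          kozak_context(TOY_MRNA, position))
```

Under leaky scanning, a downstream AUG is more plausibly used when few AUGs precede it and the preceding ones are in a weak context, which is the reasoning applied to ORF7b and ORF9b above.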
Figure 3. SARS-CoV subgenomic mRNA synthesis. (A) Organization of ORFs in the 3′ end of the SARS-CoV genome with predicted leader and body TRSs indicated by small boxes. The subgenomic mRNAs resulting from the use of these TRSs for leader-to-body fusion are depicted below, with mRNAs predicted to be functionally bicistronic indicated with an asterisk (∗). (B) Hybridization analysis of intracellular viral RNA from Vero cells infected with SARS-CoV, Frankfurt-1 (Fr) and HKU-39849 (HK) isolates. See Materials and Methods for technical details. Oligonucleotides complementary to sequences from the SARS-CoV leader sequence and to a region in the genomic 3′ end both recognized a set of nine RNA species (the genome (RNA1) and eight subgenomic RNAs), confirming the presence of common 5′ and 3′ sequences. RNA from Vero cells infected with avian infectious bronchitis virus (IBV), which produces only five subgenomic mRNAs of known sizes, was run in the same gel and used as a size marker. (C) Model for nidovirus subgenomic RNA synthesis by discontinuous extension of minus strands (Refs. 19, 39). Whereas genome replication relies on continuous minus strand synthesis (antigenome), subgenomic minus strands would be produced by attenuation of nascent strand synthesis at a body TRS (red bar), followed by translocation of the nascent strand to the leader TRS in the genomic template. Following base-pairing between the body TRS complement at the 3′ end of the minus strand and the leader TRS, RNA synthesis would resume to complete the subgenomic minus strand that would then serve as template for the transcription of subgenomic mRNAs.
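The outcome of leader-to-body fusion, a nested set of mRNAs sharing the same 5′ leader and 3′ end, can be sketched in a few lines. The code below models only the resulting plus-strand sequences, not the minus-strand mechanism of panel (C). The genome string and the TRS motif are hypothetical placeholders chosen for illustration; they are not taken from the SARS-CoV sequence.

```python
TRS = "ACGAAC"  # placeholder TRS motif (assumption for this sketch)

# Hypothetical toy genome: leader, leader TRS, replicase stand-in,
# then two body TRSs each followed by a stand-in ORF, and a 3' end.
TOY_GENOME = (
    "AUAUUAGG" + TRS + "GGCCGGCC"
    + TRS + "UUGGUUGG"
    + TRS + "CCUUCCUU"
    + "AAAA"
)


def find_sites(genome, motif):
    """Start indices of every occurrence of `motif` in `genome`."""
    return [i for i in range(len(genome) - len(motif) + 1)
            if genome.startswith(motif, i)]


def subgenomic_mrnas(genome, trs):
    """Fuse the 5' leader to the genome from each body TRS to the 3' end."""
    sites = find_sites(genome, trs)
    leader_site, body_sites = sites[0], sites[1:]
    leader = genome[:leader_site]
    return [leader + genome[site:] for site in body_sites]


for number, mrna in enumerate(subgenomic_mrnas(TOY_GENOME, TRS), start=2):
    print(f"RNA{number}: {len(mrna)} nt  {mrna}")
```

Every product begins with the leader and ends with the genomic 3′ end, which is why probes against either region detect the whole set in panel (B).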
Figure 2. Phylogenetic analysis of coronavirus replicase genes. SARS-CoV replicase ORF1b amino acid sequences (Entrez Genomes accession number NC_004718 (AY274119)) were compared with those from viruses representing the three coronavirus subgroups and the genus Torovirus. Group 1: transmissible gastroenteritis virus (TGEV), NC_002306; human coronavirus 229E (HCoV-229E), NC_002645; porcine epidemic diarrhea virus (PEDV), NC_003436. Group 2: mouse hepatitis virus A59 (MHV-A59), NC_001846; bovine coronavirus (BCoV-Lun), AF391542. Group 3: infectious bronchitis virus (IBV), strains Beaudette (NC_001451) and LX4 (AY223860). Torovirus: equine torovirus (EToV), X52374. A multiple protein alignment of these sequences was generated with the help of the ClustalX 1.82 program and was adjusted manually. Two regions of poor conservation were removed from the alignment, which was converted subsequently into the nucleotide form. All columns containing gaps were removed. The resulting alignment contains the following SARS-CoV sequences fused: 13,623–13,859, 14,310–18,857 and 20,076–21,482. It included 5487 characters, 3207 of them parsimony-informative. Using the PAUP program (version 4.0.0d55) and the parsimony criterion, an exhaustive tree search of the 135,135 evaluated trees identified the best tree having a score of 10,927 and the second best tree having a score of 10,964; the worst tree had a score of 13,611. A total of 1000 bootstrap trials were conducted using the parsimony criterion and a branch-and-bound search to generate a bootstrap 50% majority-rule consensus tree. The frequency of occurrence of particular bifurcations in bootstraps is indicated at the nodes. Similar trees, with similarly high bootstrap support (above 960), were obtained using the neighbor-joining (NJ) method applied to distance matrices obtained for either nucleotide or amino acid alignments (not shown).
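The figure of 135,135 trees follows from the number of taxa. The caption lists nine sequences (SARS-CoV, three group 1 viruses, two group 2 viruses, two IBV strains, and EToV), and the number of distinct unrooted bifurcating trees for n taxa is the double factorial (2n − 5)!!. A minimal sketch of that count, with an illustrative function name:

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating trees: (2n - 5)!! for n >= 3 taxa."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


print(unrooted_tree_count(9))  # 135135
```

The count grows very quickly with n, which is why an exhaustive search is feasible for nine taxa while the bootstrap replicates relied on branch-and-bound.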
Figure 4. Sequence alignments of protein families that include cellular enzymes involved in RNA processing and their nidovirus homologs. Our in-depth comparative sequence analysis (see Materials and Methods) revealed a statistically significant relationship between functionally uncharacterized proteins (domains) of nidoviruses, including SARS-CoV, and five protein families that include enzymes involved in two nuclear RNA processing pathways: intron excision to produce mature tRNA and the production of intron-encoded box C/D small nucleolar RNA (snoRNA) from its host pre-mRNA (Figure 5). Shown are alignments for key regions of a few selected members of the following groups of enzymes: (A) XendoU family; (B) ExoN family; (C) 2′-O-MT family; (D) CPD family; and (E) ADRP family. These protein families may be known also under other names. Cellular homologs, not necessarily including proteins involved in the discussed RNA processing pathways, are listed in the top segment of each alignment and nidovirus proteins in the bottom segment. In the CPD family, along with group 2 coronavirus representatives, proteins of two rotaviruses (double-stranded RNA viruses), which were identified in this study, are listed. In both segments, residues are highlighted independently: black for absolutely conserved residues and different shades of grey to indicate different levels of conservation; amino acid similarity groups used were: (i) D, E, N, Q; (ii) S, T; (iii) K, R; (iv) F, W, Y; and (v) I, L, M, V. Positions occupied by identical or similar residues in all proteins under comparison are indicated with an asterisk (∗) and colon (:), respectively, in the inter-segment row. For the ExoN family, three motifs conserved in the DEDD superfamily and Zn-finger unique for the ExoN family are indicated.
Database accession numbers for nidovirus genome sequences: SARS-CoV, Entrez Genomes accession number NC_004718 (AY274119); MHV-A59, NC_001846; BCoV-Lun, AF391542; HCoV-229E, NC_002645; IBV-B, NC_001451; PEDV, NC_003436; TGEV, NC_002306; equine torovirus (EToV), X52374; equine arteritis virus (EAV), X53459; porcine reproductive and respiratory syndrome virus (PRRSV), M96262; gill-associated virus (GAV), AF227196. Abbreviations and NCBI protein database ID number or SwissProt names of the remaining protein sequences are: (A) Npun 0562, hypothetical protein of Nostoc punctiforme, ZP_00106190; Poliv smB, pancreatic protein of Paralichthys olivaceus, BAA88246; Celeg Pp11, placental protein 11-like precursor of Caenorhabditis elegans, NP_492590; Xlaev endoU, endoU protein of Xenopus laevis, CAD45344; pp1b, ORF1b-encoded part of nidovirus replicase polyprotein 1ab. (B) Yeast PAN2, PAB-dependent poly(A)-specific ribonuclease subunit PAN2 of Saccharomyces cerevisiae, P53010; Mycge DPO3, DNA polymerase III polC-type, containing exonuclease domain, of Mycoplasma genitalium, P47277; Bacsu DING, probable ATP-dependent helicase dinG homolog, containing exonuclease domain, of Bacillus subtilis, P54394; Ecoli DP3E, DNA polymerase III, epsilon chain, containing exonuclease domain, of Escherichia coli, P03007 (PDB: 1J53 and 1J54); Ecoli RNT, exoribonuclease T of Escherichia coli, P30014. (C) Hsap AKA, A-kinase anchoring protein 18 gamma of Homo sapiens, AAF28106; Athal CPD1, putative CPD1 of Arabidopsis thaliana, CAA16750; Athal CPD2, putative CPD2 of Arabidopsis thaliana, CAA16751; yeast YG59, hypothetical 26.7 kDa protein of yeast, P53314; Ecoli LIGT, 2′-5′ RNA ligase of Escherichia coli, P37025; ns2, non-structural protein (ORF2-encoded) of the coronaviruses HCoV-OC43 (AAA74377), BCoV-Quebec (P18517), and MHV-A59 (P19738); EToV pp1a, C-terminal fragment of EToV pp1a, S11237; HRoV VP3, VP3 of human rotavirus, BAA84964; ARoV VP3, VP3 of avian rotavirus PO-13, BAA24128.
(D) Ecoli o177, putative polyprotein of Escherichia coli, AAC74129; Hsap Y1268a, KIAA1268 protein of Homo sapiens, BAA86582; Hsap H2A1.1, histone macroH2A1.1 of Homo sapiens, AAC33434; yeast YMX7, hypothetical 32.1 kDa protein of yeast, Q04299; yeast YBN2, hypothetical 19.9 kDa protein of yeast, P38218. (E) Yeast YBR1, putative ribosomal RNA methyltransferase (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P38238; yeast SPB1, putative rRNA methyltransferase SPB1 of yeast, P25582; yeast YGN6, putative ribosomal RNA methyltransferase YGL136c (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P53123; Ecoli FTSJ, cell division protein of Escherichia coli, NP_417646.
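The asterisk and colon marks in the inter-segment row follow a simple rule that can be written down directly from the similarity groups given in the caption. The sketch below is one possible reading of that rule: a column gets an asterisk if all residues are identical, a colon if all residues fall within a single similarity group, and a blank otherwise. Treating gap characters as non-conserved is an assumption of this sketch, and the toy alignment is hypothetical.

```python
# Similarity groups (i) to (v) as listed in the caption.
SIMILARITY_GROUPS = [set("DENQ"), set("ST"), set("KR"), set("FWY"), set("ILMV")]


def column_symbol(column):
    """Return '*' for identical, ':' for similar, ' ' otherwise."""
    residues = set(column)
    if "-" in residues:  # assumption: any gap makes the column non-conserved
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= group for group in SIMILARITY_GROUPS):
        return ":"
    return " "


def conservation_row(alignment):
    """Build the symbol row for equal-length aligned sequences."""
    return "".join(column_symbol(column) for column in zip(*alignment))


# Hypothetical toy alignment, not taken from the figure.
toy_alignment = ["DKSF", "DRTW", "DKSA"]
print(repr(conservation_row(toy_alignment)))  # '*:: '
```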
Figure 5. Nidoviruses encode homologs of cellular enzymes involved in RNA processing. (A) The cellular pathways for processing of pre-U16 snoRNA and pre-tRNA splicing are summarized, with relevant enzymatic activities indicated. For details, see the text. Homologs of the highlighted enzymes have been identified in nidoviruses (see also Figure 1 and the text). (B) Table summarizing the conservation of homologs of the cellular enzymes presumably involved in RNA processing in SARS-CoV and different nidovirus groups.