Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage.

Eric J Snijder, Peter J Bredenbeek, Jessika C Dobbe, Volker Thiel, John Ziebuhr, Leo L M Poon, Yi Guan, Mikhail Rozanov, Willy J M Spaan, Alexander E Gorbalenya.

Abstract

The genome organization and expression strategy of the newly identified severe acute respiratory syndrome coronavirus (SARS-CoV) were predicted using recently published genome sequences. Fourteen putative open reading frames were identified, 12 of which were predicted to be expressed from a nested set of eight subgenomic mRNAs. The synthesis of these mRNAs in SARS-CoV-infected cells was confirmed experimentally. The 4382- and 7073 amino acid residue SARS-CoV replicase polyproteins are predicted to be cleaved into 16 subunits by two viral proteinases (bringing the total number of SARS-CoV proteins to 28). A phylogenetic analysis of the replicase gene, using a distantly related torovirus as an outgroup, demonstrated that, despite a number of unique features, SARS-CoV is most closely related to group 2 coronaviruses. Distant homologs of cellular RNA processing enzymes were identified in group 2 coronaviruses, with four of them being conserved in SARS-CoV. These newly recognized viral enzymes place the mechanism of coronavirus RNA synthesis in a completely new perspective. Furthermore, together with previously described viral enzymes, they will be important targets for the design of antiviral strategies aimed at controlling the further spread of SARS-CoV.

Year: 2003 PMID: 12927536 PMCID: PMC7159028 DOI: 10.1016/s0022-2836(03)00865-9

Source DB: PubMed Journal: J Mol Biol ISSN: 0022-2836 Impact factor: 5.469

Introduction

Severe acute respiratory syndrome (SARS) is a life-threatening form of atypical pneumonia that recently emerged in Guangdong Province, China. A previously unknown coronavirus was isolated from SARS patients1., 2., 3. and is considered the cause of this emerging respiratory disease. In an extraordinary effort, the full-length genome sequence of the SARS-coronavirus (SARS-CoV) was elucidated within weeks after the identification of this novel pathogen and published by the Michael Smith Genome Sciences Centre (Vancouver, Canada, Entrez Genomes accession number NC_004718 (AY274119)), the Centers for Disease Control and Prevention (Atlanta, USA, GenBank accession number AY278741), and others. The SARS-CoV genome is ∼29.7 kb long and contains 14 open reading frames (ORFs) flanked by 5′- and 3′-untranslated regions of 265 and 342 nucleotides, respectively (Figure 1). Homologs of proteins conserved in all coronaviruses are encoded by the overlapping ORFs 1a and 1b, and by ORFs 2, 4, 5, 6 and 9a (Figure 1; Table 1, Table 2).

Figure 1

Overview of the SARS-CoV genome organization and expression. Comparison of the genome organizations of SARS-CoV and bovine coronavirus (BCoV). The replicase genes are depicted, with ORF1a, ORF1b, and ribosomal frameshift site indicated. Arrows represent sites in the corresponding replicase polyproteins that are cleaved by papain-like proteinases (orange) or the 3C-like cysteine proteinase (blue). Cleavage products are provisionally numbered nsp1–nsp16 (see also Table 1). In the 3′-terminal part of the genomes, homologous structural protein genes are indicated in matching colors. Close-ups of two regions with major differences are shown (and see the text). In the N-terminal half of replicase ORF1a, SARS-CoV lacks one of the PLpro domains (indicated in orange/green in BCoV) and contains a unique insertion (SUD). In the region with structural and accessory protein genes, the location of the body TRSs involved in subgenomic RNA synthesis are indicated with red boxes (see Figure 3 and Hofmann et al.). The bottom part of the Figure illustrates which parts of the genome are conserved in the genus Coronavirus and in the order Nidovirales (the ORF1a sequence of toroviruses, which largely remains to be sequenced, could not be included). Furthermore, it is indicated for which domains homologs have been identified in other RNA viruses and the cellular world. Enzymes for which structural data are available are shown in blue. SUD, SARS-CoV unique domain; PLpro, papainlike cysteine proteinase; 3CLpro, 3C-like cysteine proteinase; TM, transmembrane domain; ADRP, adenosine diphosphate-ribose 1″-phosphatase; ExoN, 3′-to-5′ exonuclease; CLpro, chymotrypsin-like proteinase; RdRp, RNA-dependent RNA polymerase; HEL1, superfamily 1 helicase; XendoU, (homolog of) poly(U)-specific endoribonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase; CPD, cyclic phosphodiesterase. Domains Ac, X, and Y are described by Ziebuhr et al. and Gorbalenya et al.

Table 1

Predicted SARS-CoV replicase cleavage products and their mode of expression

Protein order^a in polyproteins pp1a/pp1ab	Position in polyproteins pp1a/pp1ab (amino acid residues)^b	Protein size (amino acid residues)	Associated putative functional domain(s)^c	Predicted mode of expression and release from polyproteins^d
nsp1-pp1a/pp1ab	Met1-Gly180	180	?	TI+PL2^pro
nsp2-pp1a/pp1ab	Ala181-Gly818	638	?	PL2^pro
nsp3^e-pp1a/pp1ab	Ala819-Gly2740	1922	Ac, X, PL2^pro, Y (TM1), ADRP	PL2^pro
nsp4-pp1a/pp1ab	Lys2741-Gln3240	500	TM2	PL2^pro+3CL^pro
nsp5-pp1a/pp1ab	Ser3241-Gln3546	306	3CL^pro	3CL^pro
nsp6-pp1a/pp1ab	Gly3547-Gln3836	290	TM3	3CL^pro
nsp7-pp1a/pp1ab	Ser3837-Gln3919	83	?	3CL^pro
nsp8-pp1a/pp1ab	Ala3920-Gln4117	198	?	3CL^pro
nsp9-pp1a/pp1ab	Asn4118-Gln4230	113	?	3CL^pro
nsp10-pp1a/pp1ab	Ala4231-Gln4369	139	GFL	3CL^pro
nsp11-pp1a	Ser4370-Val4382	13	?	3CL^pro+TT
nsp12-pp1ab	Ser4370-Gln5301	932	RdRp	RFS+3CL^pro
nsp13-pp1ab	Ala5302-Gln5902	601	ZD, NTPase, HEL1	RFS+3CL^pro
nsp14-pp1ab	Ala5903-Gln6429	527	Exonuclease (ExoN homolog)	RFS+3CL^pro
nsp15-pp1ab	Ser6430-Gln6775	346	NTD, endoRNase (XendoU homolog)	RFS+3CL^pro
nsp16-pp1ab	Ala6776-Asn7073	298	2′-O-MT	RFS+3CL^pro+TT

Predictions are based on the SARS-CoV sequences published by Michael Smith Genome Sciences Centre (Vancouver, Canada; Entrez Genomes accession number NC_004718 (AY274119)) and the Centers for Disease Control and Prevention (Atlanta, USA; GenBank accession number AY278741) and an alignment of SARS-CoV with previously characterized coronavirus sequences as summarized in Refs. 11., 18., 32..

^a For convenience, replicase cleavage products were provisionally numbered non-structural protein (nsp) 1–16 according to their position in the polyproteins.

^b Amino acids of replicase proteins pp1a and pp1ab were numbered assuming that, as in other coronaviruses, a −1 ribosomal frameshift occurs; use of the slippery sequence UUUAAAC is predicted to yield a peptide bond between Asn4378 and Arg4379 in pp1ab.

^c Abbreviations: PL2pro, papain-like proteinase 2; ADRP, adenosine diphosphate-ribose 1″-phosphatase; TM, transmembrane domain; 3CLpro, 3C-like cysteine proteinase; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; ZD, putative zinc-binding domain; HEL1, superfamily 1 helicase; NTD, nidovirus conserved domain; ExoN, 3′-to-5′ exonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase. Domains Ac, X, and Y are described in Refs 32., 47..

^d Indicated are the SARS-CoV proteinases predicted to be involved in cleavage of the N- and/or C-termini of the cleavage products; TI, translation initiation; TT, translation termination; RFS, ORF1a/ORF1b ribosomal frameshift.

^e Compared to the corresponding cleavage product of BCoV (see Figure 1), nsp3 lacks PL1pro and contains a ∼375 amino acid insertion between the X and PL2pro domains which is unique for SARS-CoV (see also Figure 1).
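
The coordinates in Table 1 can be cross-checked mechanically. The following Python sketch is not part of the original analysis. It assumes only that each protein size equals the last residue number minus the first plus one, and that pp1a comprises nsp1 to nsp11 whereas pp1ab comprises nsp1 to nsp10 followed by nsp12 to nsp16. The names TABLE1 and span_size are illustrative.

```python
# Coordinates copied from Table 1: (product, first residue, last residue, listed size).
TABLE1 = [
    ("nsp1", 1, 180, 180),
    ("nsp2", 181, 818, 638),
    ("nsp3", 819, 2740, 1922),
    ("nsp4", 2741, 3240, 500),
    ("nsp5", 3241, 3546, 306),
    ("nsp6", 3547, 3836, 290),
    ("nsp7", 3837, 3919, 83),
    ("nsp8", 3920, 4117, 198),
    ("nsp9", 4118, 4230, 113),
    ("nsp10", 4231, 4369, 139),
    ("nsp11", 4370, 4382, 13),    # pp1a only
    ("nsp12", 4370, 5301, 932),   # pp1ab only (after the ribosomal frameshift)
    ("nsp13", 5302, 5902, 601),
    ("nsp14", 5903, 6429, 527),
    ("nsp15", 6430, 6775, 346),
    ("nsp16", 6776, 7073, 298),
]


def span_size(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


# Products whose listed size disagrees with their coordinates.
mismatches = [name for name, first, last, listed in TABLE1
              if span_size(first, last) != listed]

# Polyprotein lengths implied by the cleavage products.
PP1AB_ONLY = {"nsp12", "nsp13", "nsp14", "nsp15", "nsp16"}
pp1a_length = sum(size for name, _, _, size in TABLE1 if name not in PP1AB_ONLY)
pp1ab_length = sum(size for name, _, _, size in TABLE1 if name != "nsp11")

print("mismatches:", mismatches)
print("pp1a length:", pp1a_length)
print("pp1ab length:", pp1ab_length)
```

Under these assumptions the listed sizes agree with the coordinates, and the summed products reproduce the 4382- and 7073-residue polyprotein lengths quoted in the Abstract.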

Table 2

Predicted SARS-CoV proteins expressed from subgenomic mRNAs 2 to 9

ORF number^a	Protein size (amino acid residues)	Subgenomic mRNA predicted to be used for expression^a	Protein name/function
2	1255	2	Spike (S) protein
3a	274	3	?
3b^b	154	3	?
4	76	4	Envelope (E) protein
5	221	5	Membrane (M) protein
6	63	6	?
7a	122	7	?
7b^c	44	7	?
8a^d	39	8	?
8b	84	8	?
9a	422	9	Nucleocapsid (N) protein
9b^e	98	9	?

Results and Discussion

SARS-CoV represents a lineage that has split off from the group 2 branch relatively late in coronavirus evolution

To optimize our understanding of the SARS-CoV genome, we sought to infer the phylogenetic position of the novel agent relative to known coronaviruses. Recent phylogenetic analyses of different SARS-CoV proteins using unrooted trees consistently showed that SARS-CoV does not segregate into any of the three currently established coronavirus groups.4., 5. These results were interpreted as support for the classification of SARS-CoV as the prototype of a novel, fourth group of coronaviruses.4., 5. However, in our opinion, the evidence supporting this conclusion was inconclusive, and alternative interpretations, with SARS-CoV being an outlier in one of the established groups, remained possible. This uncertainty can be resolved only through the reconstruction of coronavirus evolution from its origin using a rooted phylogenetic tree, which is most reliable when an outgroup is included in the analysis. The closest known outgroup for coronaviruses is formed by the toroviruses, which constitute a separate genus in the same virus family.8., 25. The ORF1b part of the replicase and the two virion proteins S and M are homologous in coronaviruses and toroviruses.26., 27., 28. Unfortunately, however, the level of conservation of the S and M protein genes is so low that we consider only the phylogenetic analysis of replicase ORF1b to be truly informative. Consequently, to resolve the phylogenetic position of SARS-CoV, the equine torovirus (EToV) was included in our analysis, which was limited to replicase ORF1b, the most conserved part of the genome. It should be noted, however, that the size of this genome segment (∼5500 nucleotides) approximates the combined size of the genes encoding the four virion-associated proteins S, M, E, and N. A fully resolved tree was obtained, with all branches supported in more than 960 out of 1000 bootstrap trials (Figure 2).
The topology of this tree suggests strongly that the SARS-CoV lineage was an early split-off from the group 2 branch, which occurred after the two bifurcations that gave rise to the three major coronavirus groups (Figure 2). Accordingly, in two regions of the replicase ORF1a polyprotein, nsp1 and one of the nsp3 domains, which differentiate the three coronavirus groups, SARS-CoV contains orthologs of domains that are unique for group 2 coronaviruses (see Figure S1 of the Supplementary Material). The published unrooted trees for the virion proteins and 3CLpro are also compatible with this phylogeny,4., 5. although formally we cannot exclude the occurrence of recombination with other coronaviruses in very limited regions. In this respect, we would like to stress that the differences in the composition and arrangement of ORFs in the 3′-proximal region of the genome (downstream of ORF1b; see Figure 1) between SARS-CoV and established group 2 coronaviruses do not contradict the above results. Group 1 coronaviruses also differ in this region through the presence of unique so-called “accessory non-structural protein genes”.6., 7. Some of these genes have been found to be dispensable for virus reproduction in tissue culture and/or animals.6., 7., 29. The fact that, apparently, they can be acquired or lost easily in the course of evolution indicates that these genes cannot be considered reliable group markers.

Figure 2

Phylogenetic analysis of coronavirus replicase genes. SARS-CoV replicase ORF1b amino acid sequences (Entrez Genomes accession number NC_004718 (AY274119)) were compared with those from viruses representing the three coronavirus subgroups and the genus Torovirus. Group 1: transmissible gastroenteritis virus (TGEV), NC_002306; human coronavirus 229E (HCoV-229E), NC_002645; porcine epidemic diarrhea virus (PEDV), NC_003436. Group 2: mouse hepatitis virus A59 (MHV-A59), NC_001846; bovine coronavirus (BCoV-Lun) AF391542. Group 3: infectious bronchitis virus (IBV), strains Beaudette (NC_001451) and LX4 (AY223860). Torovirus: equine torovirus (EToV), X52374. A multiple protein alignment of these sequences was generated with the help of the ClustalX1.82 program and was adjusted manually. Two regions of poor conservation were removed from the alignment, which was converted subsequently into the nucleotide form. All columns containing gaps were removed. The resulting alignment contains the following SARS-CoV sequences fused: 13,623–13,859, 14,310–18,857 and 20,076–21,482. It included 5487 characters with 3207 of them being parsimony-informative. Using the PAUP program (version 4.0.0d55) and parsimony criterion, an exhaustive tree search of the 135,135 evaluated trees identified the best tree having a score of 10,927 and the second best tree having a score of 10,964; the worst tree had a score of 13,611. A total of 1000 bootstrap trials were conducted using the parsimony criterion and a branch-and-bound search to generate a bootstrap 50% majority-rule consensus tree. The frequency of occurrence of particular bifurcations in bootstraps is indicated at the nodes. Similar trees with similar high bootstrap support above 960 were obtained using the NJ method that was applied to distance matrices obtained for either nucleotide or amino acid alignments (not shown). 
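
Two of the numbers in this legend can be reproduced with a short calculation. The Python sketch below is an illustration, not the authors' procedure: it uses the standard count of distinct unrooted, fully bifurcating trees for n labelled taxa, (2n − 5)!!, applied to the nine sequences listed above, and it sums the three SARS-CoV genome segments as inclusive nucleotide ranges. The function and variable names are hypothetical.

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted bifurcating trees for n labelled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# The nine ORF1b sequences named in the legend.
TAXA = ["SARS-CoV", "TGEV", "HCoV-229E", "PEDV", "MHV-A59",
        "BCoV-Lun", "IBV-Beaudette", "IBV-LX4", "EToV"]

# SARS-CoV segments retained in the alignment (inclusive nucleotide coordinates).
SEGMENTS = [(13623, 13859), (14310, 18857), (20076, 21482)]
segment_lengths = [end - start + 1 for start, end in SEGMENTS]

n_trees = unrooted_tree_count(len(TAXA))
print("trees to evaluate:", n_trees)
print("segment lengths:", segment_lengths, "total:", sum(segment_lengths))
print("whole codons per segment:", all(length % 3 == 0 for length in segment_lengths))
```

For nine taxa the count equals the 135,135 trees evaluated in the exhaustive search. The three segments span more nucleotides than the final 5487-character alignment, which is expected because all gap-containing columns were removed.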
In conclusion, SARS-CoV is distantly related to established group 2 coronaviruses, a relationship comparable to that observed in group 1 between porcine epidemic diarrhoea coronavirus (PEDV) and human coronavirus 229E (HCoV-229E) on the one hand, and transmissible gastroenteritis coronavirus (TGEV) and related viruses on the other hand (Figure 2). Accordingly, the lack of antigenic cross-reactivity observed between distant group-mates in group 1 may be observed between SARS-CoV and the established group 2 viruses. Thus, SARS-CoV may be the first identified representative of a larger cluster that could be called subgroup 2b, if the established group 2 coronaviruses were referred to as subgroup 2a. The 2b cluster should include the immediate ancestor of SARS-CoV, which may circulate in the field. If close relatives of SARS-CoV were to be identified in animal hosts, the virus would represent the second example of a group 2 coronavirus that may have crossed the animal–human barrier. The first putative case is that of the bovine coronavirus (BCoV) and human coronavirus OC43 (HCoV-OC43), two viruses that are so closely related at the genetic level30., 31. that they can be considered to be the same virus species.

Two proteinases are predicted to cleave the SARS-CoV replicase polyproteins into 16 subunits, the largest of these having a unique domain organization

A detailed comparison of the SARS-CoV replicase with that of its closest known relatives in group 2, mouse hepatitis coronavirus (MHV) and BCoV (Figure 1), revealed a replicase proteolytic processing scheme and domain organization that, with some notable exceptions (see below), proved to be typical for group 2 viruses.11., 32. Using the conserved signatures of the cleavage sites recognized by coronavirus proteinases11., 12., 33., 34. and their flanking sequences, we predict the generation of 16 replicase subunits through proteolysis mediated by 3CLpro (11 cleavages) and PL2pro (three cleavages) (Figure 1 and Table 1). The most conspicuous differences between known group 2 coronaviruses and SARS-CoV were identified in nsp3, the largest replicase subunit that is encoded by ORF1a (Table 1). Unlike all other coronaviruses, SARS-CoV does not have an ortholog of papain-like proteinase 1 (PL1pro; see close-up in Figure 1),13., 35. which was probably lost during evolution of this lineage. This observation implies that the three cleavages in the N-terminal half of pp1a must all be performed by the conserved PL2pro,36., 47. a downstream-located paralog of PL1pro. The ortholog of this proteinase appears to dominate over PL1pro in HCoV-229E, and is the only active PLpro in avian infectious bronchitis coronavirus (IBV).32., 37. Immediately upstream of PL2pro, we identified a 375 amino acid residue “orphan domain” in SARS-CoV (called SUD for SARS-CoV unique domain; Figure 1), which is not present in other coronaviruses. The corresponding ORF1a region differs profoundly among group 1 coronaviruses. In one of these viruses (TGEV), and in the group 3 IBV, this region contains just a few amino acid residues, essentially fusing PL2pro to the upstream X domain. In contrast, HCoV-229E and PEDV share a conserved domain in this position. 
Interestingly, nsp3 also was the main site of replicase differences between BCoV variants isolated from respiratory and intestinal samples from an animal that had died during an outbreak of fatal shipping pneumonia. Due to the plausible multifunctionality of nsp3, which may be involved in the control of subgenomic mRNA synthesis,13., 38. the gross internal rearrangements and point mutations in this protein may have pleiotropic effect(s) on SARS-CoV properties, including its pathogenic potential.
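
The split of 11 cleavages by 3CLpro and three by PL2pro can be recovered from the C-terminal residues listed in Table 1. The Python sketch below is a deliberate simplification, not the prediction method used here: it assumes that a C-terminal Gly marks a PL2pro site and a C-terminal Gln marks a 3CLpro site, and that any other C-terminal residue (nsp11, nsp16) reflects translation termination rather than proteolysis. The mapping P1_TO_PROTEINASE is an illustrative name.

```python
from collections import Counter

# (cleavage product, C-terminal residue) as listed in Table 1.
C_TERMINI = [
    ("nsp1", "Gly"), ("nsp2", "Gly"), ("nsp3", "Gly"),
    ("nsp4", "Gln"), ("nsp5", "Gln"), ("nsp6", "Gln"), ("nsp7", "Gln"),
    ("nsp8", "Gln"), ("nsp9", "Gln"), ("nsp10", "Gln"),
    ("nsp11", "Val"),  # end of pp1a (translation termination)
    ("nsp12", "Gln"), ("nsp13", "Gln"), ("nsp14", "Gln"), ("nsp15", "Gln"),
    ("nsp16", "Asn"),  # end of pp1ab (translation termination)
]

# Assumed rule of thumb: the residue preceding the scissile bond identifies the proteinase.
P1_TO_PROTEINASE = {"Gly": "PL2pro", "Gln": "3CLpro"}


def cleavages_by_proteinase(c_termini):
    """Count predicted cleavages per proteinase from C-terminal residues."""
    counts = Counter()
    for _, residue in c_termini:
        proteinase = P1_TO_PROTEINASE.get(residue)
        if proteinase is not None:  # termination sites are not cleavage sites
            counts[proteinase] += 1
    return counts


counts = cleavages_by_proteinase(C_TERMINI)
print(dict(counts))
```

With this rule the tally matches the numbers given in the text, and the 14 cleavages together with the two translation termination events account for all 16 products.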

SARS-CoV produces eight subgenomic mRNAs to express the ORFs located in the 3′-proximal part of the genome

In a striking parallel with the unique features of nsp3, the 3′-proximal part of the SARS-CoV genome contains five ORFs (6, 7a, 7b, 8a and 8b) that are not present in established group 2 coronaviruses and for which no obvious homologs could be identified upon sequence comparison. Furthermore, SARS-CoV lacks counterparts for two genes inserted between replicase ORF1b and the S gene in subgroup 2a viruses (see the close-up in Figure 1).6., 7. All these ORFs (from 2 to 9b) are predicted to be expressed from sg mRNAs in SARS-CoV. In members of the genus Coronavirus and the related family Arteriviridae, all sg mRNAs are 3′-coterminal with the viral genome, and contain a common 5′ leader sequence that is identical with that of the genome.6., 7., 9., 39. The fusion of the leader to the coding part (or “body”) of each of the sg RNAs involves a discontinuous step in RNA synthesis, which is currently believed to occur during minus strand synthesis, thus producing composite subgenomic negative-stranded templates for sg mRNA synthesis (Figure 3(C)).19., 39., 40. Leader-to-body joining is guided by a base-pairing interaction involving conserved transcription-regulating sequences (TRSs; also previously termed “intergenic sequences (IGSs)” in coronaviruses), which are found at the 3′ end of the genomic leader (leader TRS) and at the 5′ end of each of the sg RNA bodies (body TRSs), often located exactly between two genes, but sometimes located within the coding sequence of an upstream gene (Figure 1, Figure 3).

Figure 3

SARS-CoV subgenomic mRNA synthesis. (A) Organization of ORFs in the 3′ end of the SARS-CoV genome with predicted leader and body TRSs indicated by small boxes. The subgenomic mRNAs resulting from the use of these TRSs for leader-to-body fusion are depicted below, with mRNAs predicted to be functionally bicistronic indicated with an asterisk (∗). (B) Hybridization analysis of intracellular viral RNA from Vero cells infected with SARS-CoV, Frankfurt-1 (Fr) and HKU-39849 (HK) isolates. See Materials and Methods for technical details. Oligonucleotides complementary to sequences from the SARS-CoV leader sequence and to a region in the genomic 3′ end both recognized a set of nine RNA species (the genome (RNA1) and eight subgenomic RNAs) confirming the presence of common 5′ and 3′ sequences. RNA from Vero cells infected with avian infectious bronchitis virus (IBV), which produces only five subgenomic mRNAs of known sizes, was run in the same gel and used as a size marker. (C) Model for nidovirus subgenomic RNA synthesis by discontinuous extension of minus strands.19., 39. Whereas genome replication relies on continuous minus strand synthesis (antigenome), subgenomic minus strands would be produced by attenuation of nascent strand synthesis at a body TRS (red bar), followed by translocation of the nascent strand to the leader TRS in the genomic template. Following base-pairing between the body TRS complement at the 3′ end of the minus strand and the leader TRS, RNA synthesis would resume to complete the subgenomic minus strand that would then serve as template for the transcription of subgenomic mRNAs.

In the SARS-CoV genome we readily identified a potential leader TRS (5′-CUAAACGAACUUU-3′) that has a 6–11 nucleotides match with a number of sequences in the 3′ end of the genome, many of which are positioned immediately upstream of viral genes (Figure 3(A)). As recognized also by others,4., 5., 34. the sequence 5′-ACGAAC-3′ is absolutely conserved and can be considered the core of the SARS-CoV TRS. Based on the SARS-CoV sequence with the largest 5′-terminal segment (accession number AY278741), the SARS-CoV leader sequence is (at least) 72 nucleotides long, similar to e.g. that of BCoV, with which it has a striking 20 out of 21 nucleotides match immediately upstream of the leader TRS (5′-GAUCUCUUGUAGAUCUGUUC-3′).
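
The search for candidate body TRSs described above can be pictured as a simple sequence scan. The Python sketch below is only an illustration under stated assumptions: it looks for exact copies of the core 5′-ACGAAC-3′ and then measures how far the match to the leader TRS 5′-CUAAACGAACUUU-3′ extends contiguously on either side. The sequence TOY_RNA is invented for demonstration and is not taken from the SARS-CoV genome, and the contiguous-extension criterion may differ from the way the 6–11 nucleotide matches were scored in this study.

```python
CORE_TRS = "ACGAAC"            # conserved core reported in the text
LEADER_TRS = "CUAAACGAACUUU"   # potential leader TRS reported in the text


def find_core_trs(rna, core=CORE_TRS):
    """Return 0-based start positions of every (possibly overlapping) core match."""
    positions = []
    start = rna.find(core)
    while start != -1:
        positions.append(start)
        start = rna.find(core, start + 1)
    return positions


def contiguous_match_length(rna, core_start, leader=LEADER_TRS, core=CORE_TRS):
    """Length of the contiguous leader-TRS match that contains the core at core_start."""
    offset = leader.find(core)  # position of the core inside the leader TRS
    left = 0
    while (core_start - 1 - left >= 0 and offset - 1 - left >= 0
           and rna[core_start - 1 - left] == leader[offset - 1 - left]):
        left += 1
    right = 0
    rna_next = core_start + len(core)
    leader_next = offset + len(core)
    while (rna_next + right < len(rna) and leader_next + right < len(leader)
           and rna[rna_next + right] == leader[leader_next + right]):
        right += 1
    return left + len(core) + right


# Hypothetical sequence, for demonstration only.
TOY_RNA = "GGUCUAAACGAACAUGGG"
for pos in find_core_trs(TOY_RNA):
    print(pos, contiguous_match_length(TOY_RNA, pos))
```

Applied to a real genome sequence written in RNA letters, the same scan would list every core occurrence, which would still have to be filtered by position relative to the downstream ORFs.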
On the basis of the location of putative body TRSs, the synthesis of nine mRNAs by SARS-CoV was expected: the genomic mRNA (RNA1) and eight subgenomic mRNAs with sizes of approximately 8.4, 4.6, 3.8, 3.5, 3.0, 2.6, 2.1 and 1.8 kb (including 5′ leader and 3′ poly(A)-tail). However, in the first published experimental analysis of the SARS-CoV-specific mRNAs generated in infected Vero cells, the synthesis of only five viral mRNAs could be confirmed. To investigate SARS-CoV RNA synthesis in more detail, Vero cells were infected with SARS-CoV isolates Frankfurt-1 and HKU-39849, and intracellular RNA was analyzed by hybridization with oligonucleotide probes complementary to a part of the 5′ leader sequence and a sequence just upstream of the 3′ poly(A) tail. The coronavirus IBV, which also replicates in Vero cells, was used as control and size marker. As illustrated in Figure 3(B), the genomic RNA and all eight predicted subgenomic transcripts were detected with both SARS-CoV probes, confirming the fact that these RNAs contain both common 5′-terminal and common 3′-terminal sequences. Remarkably, a slight mobility shift was observed for RNAs 7 and larger of the Frankfurt-1 isolate. The subsequent sequence analysis of this virus revealed that this was due to a 45 nt in-frame deletion in ORF7b, probably the first documented example of SARS-CoV genetic adaptation to cell culture conditions. The confirmation of leader-body fusion sites of the SARS-CoV subgenomic mRNAs will be published elsewhere. Remarkably, up to four of the eight SARS-CoV subgenomic mRNAs (3, 7, 8, and 9) may be functionally bicistronic (Table 2), as observed occasionally for other coronavirus subgenomic mRNAs.6., 7.
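
The statement that up to four of the eight subgenomic mRNAs may be bicistronic follows directly from the ORF-to-mRNA assignments in Table 2. The Python sketch below simply regroups that table; the names TABLE2 and orfs_by_mrna are illustrative, and the final check that a 45 nt deletion preserves the reading frame is elementary arithmetic rather than a result of this study.

```python
from collections import defaultdict

# (ORF, protein size in residues, subgenomic mRNA) copied from Table 2.
TABLE2 = [
    ("2", 1255, 2), ("3a", 274, 3), ("3b", 154, 3), ("4", 76, 4),
    ("5", 221, 5), ("6", 63, 6), ("7a", 122, 7), ("7b", 44, 7),
    ("8a", 39, 8), ("8b", 84, 8), ("9a", 422, 9), ("9b", 98, 9),
]

# Group ORFs by the mRNA predicted to express them.
orfs_by_mrna = defaultdict(list)
for orf, _, mrna in TABLE2:
    orfs_by_mrna[mrna].append(orf)

# mRNAs carrying more than one predicted ORF are candidate bicistronic messages.
bicistronic = sorted(mrna for mrna, orfs in orfs_by_mrna.items() if len(orfs) > 1)

print("ORFs:", len(TABLE2), "mRNAs:", len(orfs_by_mrna))
print("candidate bicistronic mRNAs:", bicistronic)

# A 45 nt deletion removes whole codons only, so the reading frame is kept.
DELETION_NT = 45
print("in-frame:", DELETION_NT % 3 == 0, "codons removed:", DELETION_NT // 3)
```

The grouping yields 12 ORFs on eight mRNAs, with mRNAs 3, 7, 8 and 9 each carrying two ORFs, in line with the text.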

The replicase of coronaviruses includes a variety of putative RNA-processing enzymes

The production of a complex and diverse set of RNA molecules by nidoviruses (including SARS-CoV) is linked to an unparalleled complexity of their giant replicase, which contains a variety of (putative) enzymatic functions and a number of completely uncharacterized domains (Figure 1). We have initiated the characterization of coronavirus replicase by comparative genomics, and have regularly updated this analysis through recent years.18., 32. Our continuing analysis has now identified distant coronavirus homologs of no fewer than five cellular enzymes that are associated with RNA processing (Figure 4): poly(U)-specific endoribonuclease (XendoU), a 3′-to-5′ exonuclease (ExoN) that belongs to the DEDD superfamily, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase (2′-O-MT) of the RrmJ family, adenosine diphosphate-ribose 1″-phosphatase (ADRP), and cyclic phosphodiesterase (CPD).45., 46. In the SARS-CoV proteome, conserved domains presumably associated with these activities were mapped (from the N to C terminus) to the X domain of nsp3 (ADRP), the N-terminal domain of nsp14 (ExoN), a “nidovirus-specific” replicase domain26., 48. in the C-terminal part of nsp15 (XendoU), and nsp16 (2′-O-MT). The CPD-related domain is not conserved in SARS-CoV, but was identified in the product of ORF2 of established group 2 coronaviruses, and in the very C-terminal domain of the torovirus ORF1a polyprotein, as well as in some double-stranded RNA rotaviruses.

Figure 4

Sequence alignments of protein families that include cellular enzymes involved in RNA processing and their nidovirus homologs. Our in-depth comparative sequence analysis (see Materials and Methods) revealed a statistically significant relationship between functionally uncharacterized proteins (domains) of nidoviruses, including SARS-CoV, and five protein families that include enzymes involved in two nuclear RNA processing pathways: intron excision to produce mature tRNA and the production of intron-encoded box C/D small nucleolar RNA (snoRNA) from its host pre-mRNA (Figure 5). Shown are alignments for key regions of a few selected members of the following groups of enzymes: (A) XendoU family; (B) ExoN family; (C) 2′-O-MT family; (D) CPD family; and (E) ADRP family. These protein families may be known also under other names. Cellular homologs, not necessarily including proteins involved in the discussed RNA processing pathways, are listed in the top segment of each alignment and nidovirus proteins in the bottom segment. In the CPD family, along with group 2 coronavirus representatives, proteins of two rotaviruses (double-stranded RNA viruses), which were identified in this study, are listed. In both segments, residues are highlighted independently: black for absolutely conserved residues and different shades of grey to indicate different levels of conservation; amino acid similarity groups used were: (i) D, E, N, Q; (ii) S, T; (iii) K, R; (iv) F, W, Y; and (v) I, L, M, V. Positions occupied by identical or similar residues in all proteins under comparison are indicated with an asterisk (∗) and colon (:), respectively, in the inter-segment row. For the ExoN family, three motifs conserved in the DEDD superfamily and Zn-finger unique for the ExoN family are indicated. 
Database accession numbers for nidovirus genome sequences: SARS-CoV, Entrez Genomes accession number NC_004718 (AY274119); MHV-A59, NC_001846; BCoV-Lun, AF391542; HCoV-229E, NC_002645; IBV-B, NC_001451; PEDV, NC_003436; TGEV, NC_002306; equine torovirus (EToV), X52374; equine arteritis virus (EAV), X53459; porcine reproductive and respiratory syndrome virus (PRRSV), M96262; gill-associated virus (GAV), AF227196. Abbreviations and NCBI protein database ID number or SwissProt names of the remaining protein sequences are: (A) Npun 0562, hypothetical protein of Nostoc punctiforme, ZP_00106190; Poliv smB, pancreatic protein of Paralichthys olivaceus, BAA88246; Celeg Pp11, placental protein 11-like precursor of Caenorhabditis elegans, NP_492590; Xlaev endoU, endoU protein of Xenopus laevis, CAD45344; pp1b, ORF1b-encoded part of nidovirus replicase polyprotein 1ab. (B) Yeast PAN2, PAB-dependent poly(A)-specific ribonuclease subunit PAN2 of Saccharomyces cerevisiae, P53010; Mycge DPO3, DNA polymerase III polC-type, containing exonuclease domain, of Mycoplasma genitalium, P47277; Bacsu DING, probable ATP-dependent helicase dinG homolog, containing exonuclease domain, of Bacillus subtilis, P54394; Ecoli DP3E, DNA polymerase III, epsilon chain, containing exonuclease domain, of Escherichia coli, P03007 (PDB: 1J53 and 1J54); Ecoli RNT, exoribonuclease T of Escherichia coli, P30014. (C) Hsap AKA, A-kinase anchoring protein 18 gamma of Homo sapiens, AAF28106; Athal CPD1, putative CPD1 of Arabidopsis thaliana, CAA16750; Athal CPD2, putative CPD2 of Arabidopsis thaliana, CAA16751; yeast YG59, hypothetical 26.7 kDa protein of yeast, P53314; Ecoli LIGT, 2′-5′ RNA ligase of Escherichia coli, P37025; ns2, non-structural protein (ORF2-encoded) of the coronaviruses HCoV-OC43 (AAA74377), BCoV-Quebec (P18517), and MHV-A59 (P19738); EToV pp1a, C-terminal fragment of EToV pp1a, S11237; HRoV VP3, VP3 of human rotavirus, BAA84964; ARoV VP3, VP3 of avian rotavirus PO-13, BAA24128.
(D) Ecoli o177, putative polyprotein of Escherichia coli, AAC74129; Hsap Y1268a, KIAA1268 protein of Homo sapiens, BAA86582; Hsap H2A1.1, histone macroH2A1.1 of Homo sapiens, AAC33434; yeast YMX7, hypothetical 32.1 kDa protein of yeast, Q04299; yeast YBN2, hypothetical 19.9 kDa protein of yeast, P38218. (E) Yeast YBR1, putative ribosomal RNA methyltransferase (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P38238; yeast SPB1, putative rRNA methyltransferase SPB1 of yeast, P25582; yeast YGN6, putative ribosomal RNA methyltransferase YGL136c (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P53123; Ecoli FTSJ, cell division protein of Escherichia coli, NP_417646.

Figure 5

Nidoviruses encode homologs of cellular enzymes involved in RNA processing. (A) The cellular pathways for processing of pre-U16 snoRNA and pre-tRNA splicing are summarized, with relevant enzymatic activities indicated. For details, see the text. Homologs of the highlighted enzymes have been identified in nidoviruses (see also Figure 1 and the text). (B) Table summarizing the conservation of homologs of the cellular enzymes presumably involved in RNA processing in SARS-CoV and different nidovirus groups.

The conservation in the ExoN, 2′-O-MT and CPD-related domains of nidoviruses includes the catalytic and other active-site residues identified in the prototype cellular enzymes. Although the active-site residues of the ADRP and XendoU families are yet to be characterized, the most conserved amino acids of these families are found in their putative nidovirus homologs. Some of the nidovirus domains may contain unique and conserved additional domains. For instance, we noted that the nidovirus ExoN homologs contain an additional conserved domain resembling a mononuclear Zn-finger (Figure 4(B)) between the universally conserved blocks I and II, which include the catalytic residues (two Asp and one Glu). Another Zn-finger-like module has been inserted between blocks II and III in the ExoN homolog of roniviruses, a subset of nidoviruses (data not shown). Our combined observations indicate that the nidovirus homologs of these cellular RNA processing enzymes must be enzymatically active, although they may have evolved to act on specific (and unique) substrates or have additional unique components. The newly predicted enzymes could be involved in the metabolism of virus and/or cellular RNAs. For instance, the 2′-O-MT activity could be used to produce the 5′-cap of viral mRNAs, as was demonstrated for a homologous flavivirus enzyme. Based on a parallel with some cellular DNA-processing homologs, like exonuclease I and the exonuclease domain of DNA polymerases, it is tempting to speculate on a link between the ExoN activity and RNA proofreading, repair, and/or recombination. The first two activities are not known in RNA viruses, and recombination commonly proceeds through the copy-choice mechanism with RdRp switching templates to produce chimeric nascent chains. 
However, due to the extreme sizes of their giant genomes, coronaviruses may differ from other RNA viruses and share an unprecedented similarity with DNA-based life-forms in the mechanisms of genome biosynthesis and maintenance. If confirmed, these unusual properties would explain the preliminary reports on the resistance of SARS-CoV to ribavirin, a drug that was shown to force other RNA viruses into “error catastrophe”. The experimental verification of these predictions will be an important step in increasing our understanding of the functional roles these putative enzymes play in the replicative cycle of SARS-CoV and related viruses. Extensive attempts to demonstrate the 2′-O-MT activity of several coronaviruses (which was also recently predicted by others) in a 5′-RNA-capping reaction have not produced conclusive evidence so far (J.Z. and A.E.G., unpublished results). This development indicates that, as before with other distant nidovirus homologs (e.g. the helicase), the translation of bioinformatics predictions into a functional description is likely to be a laborious and time-consuming process, involving mainly the identification of virus-specific substrates and proper assay conditions. In this respect, we have made an observation that both provides additional support for the provisional assignments made above and may help in the experimental verification of the predicted activities. When the five enzyme families listed (Figure 4) above were analyzed as a single dataset, it became apparent that representatives of these families cooperate in two nuclear intron RNA processing pathways. These pathways are functionally antagonistic: intron excision aimed at the synthesis of mature tRNA and the production of intron-encoded box C/D small nucleolar RNA (snoRNA) from its host pre-mRNA (Figure 5(A)). 
In the first pathway, XendoU initiates a cascade of poorly characterized endo- and exonuclease reactions that may involve ExoN, a homolog of the yeast Rrp6p exosome component, ultimately leading to the production of mature U16 and U86 snoRNAs. Subsequently, these snoRNAs may be utilized in diverse rRNA processing events involving nucleotide methylation by fibrillarin, a 2′-O-MT, and assisted by helicase(s). Strikingly, the homologs of three cellular enzymes from this pathway, encoded in the replicases of all nidoviruses except for arteriviruses, are genetically clustered in a single protein block (nsp14–nsp16) immediately downstream of the RNA-helicase (nsp13) (Figure 1, Figure 4). Because of the proximity of these four domains to each other, their expression must be tightly coordinated at the level of 3CLpro proteolysis and by the upstream ORF1a/ORF1b ribosomal frameshift signal.

Figure 5. Nidoviruses encode homologs of cellular enzymes involved in RNA processing. (A) The cellular pathways for processing of pre-U16 snoRNA and pre-tRNA splicing are summarized, with relevant enzymatic activities indicated. For details, see the text. Homologs of the highlighted enzymes have been identified in nidoviruses (see also Figure 1 and the text). (B) Table summarizing the conservation of homologs of the cellular enzymes presumably involved in RNA processing in SARS-CoV and different nidovirus groups.

In the other pathway, which involves tRNA-processing, the utilization of a 2′-phosphate group of a splicing intermediate involves the conversion of adenosine diphosphate ribose 1″-2″ cyclic phosphate (Appr>p) by CPD into adenosine diphosphate ribose 1″-phosphate (Appr-1″-p), of which the phosphate group may be further processed by an ADRP. Both these activities may drive the production of mature tRNA. 
Although the nidovirus homologs of CPD and ADRP remain to be characterized, they are not under the control of the ORF1a/ORF1b ribosomal frameshift signal (Figure 1) and may thus, unlike the ORF1b-encoded enzymes, be produced in larger quantities. The nidovirus homologs of the five RNA processing enzymes discussed above may interfere with these or similar cellular RNA processing pathways to reprogram the cell for the benefit of virus reproduction. It seems even more conceivable that they, alone or in concert with other enzymes like the RdRp or helicase, are involved directly in viral RNA synthesis, particularly in transcription, which, in an apparent parallel with snoRNA-driven processes, is guided by conserved oligonucleotide base-pairing interactions (Figure 3(C)). The viral enzymes, like their cellular counterparts, might be part of separate pathways or, alternatively, cooperate in a single pathway in which the XendoU, ExoN and 2′-O-MT homologs provide RNA specificity, and the CPD and ADRP homologs modulate the pace through processing of compound(s) containing 2′-phosphate groups. In this respect, we note that both the XendoU/ExoN/2′-O-MT and CPD/ADRP cellular pathways start with an endoribonuclease-mediated cleavage to produce molecule(s) with 2′-3′-cyclic phosphate termini (Figure 5), indicating the structural basis for possible cooperation of the coronavirus homologs of these enzymes in a single pathway. The expected functional hierarchy of the five putative nidovirus enzymes (Figure 5(A)) is supported by their corresponding evolutionary conservation, with the XendoU homolog being absolutely conserved and the CPD homolog being least conserved among nidoviruses (Figure 5(B)).

Concluding Remarks

The availability and comparative analysis of the SARS-CoV genome and proteome set the stage for the extensive biological characterization of this emerging pathogen and the development of anti-SARS-CoV strategies. Our conclusion that SARS-CoV is distantly related to group 2 coronaviruses (Figure 2) implies that viruses from this group, in particular the extensively studied mouse hepatitis virus and its derivatives lacking non-essential CPD-like and HE genes, may be the best available models for both in vitro and in vivo studies, especially where the synthesis of viral macromolecules and the structure and function of the replication complex are concerned. A detailed comparative characterization of the BCoV/HCoV-OC43 pair may provide invaluable insights into the processes of adaptation of a non-human coronavirus to a human host, which should be highly relevant to understanding the emergence of SARS-CoV. The SARS-CoV genome (Figure 1) lacks genes that are common in group 2 viruses, like PL1pro and the CPD-like and HE genes, but encodes a number of unique protein sequences, underlining the capacity of coronaviruses for major evolutionary change. The comparative studies presented here have tentatively identified both known and novel viral enzymes (Figure 1, Figure 5), most of which may be involved in RNA processing and have homologs whose tertiary structure has been solved (Figure 1). Intriguing parallels have been drawn between these putative viral enzymes and characterized but distant cellular homologs; these parallels will guide the functional dissection of the replicases of SARS-CoV and related viruses and may put the mechanism of coronavirus RNA synthesis in a completely new perspective. The newly described putative enzymes of SARS-CoV double the list of potential targets for the design of antiviral strategies aimed at controlling this emerging virus infection.33,34

Materials and Methods

Analysis of intracellular SARS-CoV RNA

Vero cells were infected with SARS-CoV (Frankfurt 1 or HKU-39849) at an MOI of 0.01 or were mock infected. At the onset of cytopathogenic effect (approximately 40 hours post infection), intracellular RNA was isolated by cell lysis for ten minutes at room temperature with 5% (w/v) lithium dodecyl sulfate in LET buffer (10 mM Tris–HCl (pH 7.4), 100 mM LiCl, 1 mM EDTA), containing 20 μg/ml of proteinase K. After shearing of the cellular DNA using a syringe, lysates were incubated at 42 °C for 15 minutes, extracted with phenol (pH 4.0) and chloroform, and RNA was ethanol-precipitated. The RNAs were separated in denaturing 1% (w/v) agarose gels containing 2.2 M formaldehyde and Mops buffer (10 mM Mops (sodium salt) (pH 7), 5 mM sodium acetate, 1 mM EDTA). Dried gels were used for direct hybridization with ³²P-labeled oligonucleotides SARSV001 (5′-CGAGGTTGGTTGGCTTTTCCTG-3′) and SARSV002 (5′-CACATGGGGATAGCACTAC-3′), which are complementary to sequences in the SARS-CoV leader sequence and the genomic 3′ end, respectively. After hybridization, gels were analyzed using a Personal FX Molecular Imager and Quantity One software (both from Bio-Rad).
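
Because both probes are described as complementary to viral sequences, the positive-strand target that each one anneals to can be recovered by taking the reverse complement of the oligonucleotide. The following is a minimal sketch in Python; the names `reverse_complement` and `PROBES` are illustrative and not part of the original protocol.

```python
# Illustrative helper (not from the original protocol): derive the
# positive-strand target sequence recognized by each hybridization probe.

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Probe sequences as given in the text, written 5' to 3'.
PROBES = {
    "SARSV001": "CGAGGTTGGTTGGCTTTTCCTG",  # complementary to the leader sequence
    "SARSV002": "CACATGGGGATAGCACTAC",     # complementary to the genomic 3' end
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, probe in PROBES.items():
        target = reverse_complement(probe)
        print(f"{name}: probe 5'-{probe}-3' anneals to 5'-{target}-3'")
```

The target strings produced this way can be searched for in a genome sequence such as AY274119 to locate where each probe is expected to hybridize.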

Methods for bioinformatics

Genpeptides, Conserved domain (CD) and protein family (Pfam) databases were used in this study. Amino acid sequence alignments were generated using ClustalX1.81 and Dialign2 programs assisted by Blosum position-specific matrices, and were processed for presentation using GeneDoc. Multiple sequence alignments were converted into hidden Markov model (HMM) profiles using HMMER2.01 software. Sequence databases were searched in default mode, unless stated otherwise, using the HMMER2.01 package64,69 and a family of Blast programs. The expectation values of similarity (E) of 0.05 or lower for Blast searches and 0.1 or lower for HMMER-mediated searches were considered to be statistically significant. Database searches with nidovirus proteins (Table 1, Table 2) and their alignments were conducted in an iterative mode until no new homologs were identified. Also, sequences that were identified below the threshold during the last iteration were used to initiate reciprocal searches that might have resulted in new significant matches. This approach worked for all protein families described here, except for the identification of the relationship between the nidovirus ExoN family and cellular DEDD superfamily, which is known to be extremely diverse. In this latter case, using the MAST program, we found a strong match (p = 3 × 10⁻¹⁰) between the most conserved motif III of a DEDD protein and a conserved block of the ExoN family that facilitated the identification of the two other motifs in the nidovirus proteins having a non-typical intermotif spacing partially occupied by Zn-finger(s) (see the text and Figure 4). Furthermore, we observed an approximately 30 times selective increase of the global similarity between the ExoN family and DEDD proteins, after the coronavirus sequences were modified artificially by removing putative Zn-fingers that are not present in the DEDD proteins. In the HMMER-mediated searches of >10⁶ sequences using this Zn-finger-deficient ExoN family as a query, numerous DEDD proteins were retrieved immediately after the nidovirus proteins, starting with E=0.81. The relatively poor statistics of these hits were due to the failure by HMMER to align all three motifs.
Cluster phylogenetic trees were reconstructed using the neighbour-joining algorithm described by Saitou & Nei with the Kimura correction, and were evaluated with 1000 bootstrap trials, as implemented in the ClustalX1.81 program. Parsimonious trees were generated using exhaustive search and evaluated with bootstrap branch-and-bound search using a UNIX version of the PAUP* 4.0.0d55 program that is included in the GCG-Wisconsin Package programs. The resulting trees were visualized using the TreeView program.
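
The iterative database search described above amounts to a fixed-point loop: search, keep the hits at or below the significance threshold, add them to the query set, and repeat until nothing new appears. The sketch below illustrates that logic only. The names `iterative_search` and `toy_search` and the `TOY_HITS` table are hypothetical stand-ins, and the code does not call HMMER or Blast. The two thresholds are the ones stated in the text.

```python
from typing import Callable, Dict, Iterable, Set

# Significance thresholds stated in the text (E-value at or below the cutoff).
E_THRESHOLD = {"blast": 0.05, "hmmer": 0.1}

# A search takes the current family members and returns {sequence_id: E-value}.
SearchFn = Callable[[Set[str]], Dict[str, float]]


def iterative_search(seeds: Iterable[str], search: SearchFn, threshold: float) -> Set[str]:
    """Repeat the search until no new significant homologs are found."""
    family = set(seeds)
    while True:
        hits = search(family)
        # Keep only significant hits that are not already in the family.
        new = {sid for sid, e in hits.items() if e <= threshold} - family
        if not new:
            return family
        family |= new


# Toy stand-in for a database search (hypothetical data, illustration only):
# each sequence "finds" a fixed set of neighbours with fixed E-values.
TOY_HITS = {
    "seqA": {"seqB": 1e-20, "seqX": 0.5},
    "seqB": {"seqC": 0.03},
    "seqC": {"seqD": 0.09},
    "seqD": {},
}


def toy_search(family: Set[str]) -> Dict[str, float]:
    """Return the best (lowest) E-value for every sequence hit by any member."""
    best: Dict[str, float] = {}
    for member in family:
        for sid, e in TOY_HITS.get(member, {}).items():
            best[sid] = min(e, best.get(sid, float("inf")))
    return best


if __name__ == "__main__":
    print(sorted(iterative_search({"seqA"}, toy_search, E_THRESHOLD["hmmer"])))
    print(sorted(iterative_search({"seqA"}, toy_search, E_THRESHOLD["blast"])))
```

In this toy example the looser HMMER cutoff admits one more sequence than the Blast cutoff. The sketch omits the reciprocal searches started from sub-threshold hits, which the text describes as a separate follow-up step.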

72 in total

Review 1. Evolution of domain families.

Authors: C P Ponting; J Schultz; R R Copley; M A Andrade; P Bork
Journal: Adv Protein Chem Date: 2000

2. Structure of Escherichia coli exonuclease I suggests how processivity is achieved.

Authors: W A Breyer; B W Matthews
Journal: Nat Struct Biol Date: 2000-12

3. Crystal structure of a fibrillarin homologue from Methanococcus jannaschii, a hyperthermophile, at 1.6 A resolution.

Authors: H Wang; D Boisvert; K K Kim; R Kim; S H Kim
Journal: EMBO J Date: 2000-02-01 Impact factor: 11.598

4. Combining evidence using p-values: application to sequence homology searches.

Authors: T L Bailey; M Gribskov
Journal: Bioinformatics Date: 1998 Impact factor: 6.937

Review 5. Hidden Markov models.

Authors: S R Eddy
Journal: Curr Opin Struct Biol Date: 1996-06 Impact factor: 6.809

Review 6. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

Review 7. New insights into the mechanisms of RNA recombination.

Authors: P D Nagy; A E Simon
Journal: Virology Date: 1997-08-18 Impact factor: 3.616

8. Functions of the exosome in rRNA, snoRNA and snRNA synthesis.

Authors: C Allmang; J Kufel; G Chanfreau; P Mitchell; E Petfalski; D Tollervey
Journal: EMBO J Date: 1999-10-01 Impact factor: 11.598

9. An RNA cap (nucleoside-2'-O-)-methyltransferase in the flavivirus RNA polymerase NS5: crystal structure and functional characterization.

Authors: Marie-Pierre Egloff; Delphine Benarroch; Barbara Selisko; Jean-Louis Romette; Bruno Canard
Journal: EMBO J Date: 2002-06-03 Impact factor: 11.598

10. Sequence of the 3'-terminal end (8.1 kb) of the genome of porcine haemagglutinating encephalomyelitis virus: comparison with other haemagglutinating coronaviruses.

Authors: A Marie-Josée Sasseville; Martine Boutin; Anne-Marie Gélinas; Serge Dea
Journal: J Gen Virol Date: 2002-10 Impact factor: 3.891
