Viruses in the family Coronaviridae have attracted renewed interest following the outbreaks caused by SARS-HCoV in 2003 and the recent discovery of a new human coronavirus, HCoV-NL63. The genus Torovirus, within the family Coronaviridae, is less well characterized, in part because toroviruses cannot yet be grown in cell culture (with the exception of the Berne virus). In this study, we determined the sequence of the complete genome of Breda-1 (BoTV-1), a bovine torovirus. This is the first complete torovirus genome sequence to be reported. BoTV-1 RNA was amplified using long RT-PCR and the amplicons were sequenced. The genome is 28.475 kb long and consists mainly of the replicase gene (approximately 20.2 kb), which contains two large overlapping ORFs, ORF1a and ORF1b, encoding polyproteins pp1a and pp1b, respectively. Sequence analysis identified conserved domains within the predicted sequences of pp1a and pp1b. Sequence alignments and protein secondary structure prediction data suggest the presence of a 3C-like serine protease domain with similarity to the arterivirus 3C-like serine protease, and of a single papain-like cysteine protease domain with similarity to the picornavirus leader protease. The ADRP (APPR-1″) domain, unique to the Coronaviridae, was also located in BoTV pp1a. In addition, several hydrophobic domains typical of a nidovirus replicase were identified. Within the pp1b sequence, the polymerase and helicase domains were identified, as well as sequences predicted to be involved in ribosomal frameshifting, including the conserved slippery sequence UUUAAAC and two potential pseudoknot structures.
Toroviruses are enveloped, single-stranded, positive-sense RNA viruses with a pleomorphic virion morphology. The first report on toroviruses described an unclassified virus from diarrheic calves (Woode et al., 1982), now designated the Breda virus (BRV), or bovine torovirus (BoTV), of which two serotypes have been identified (BoTV-1 and BoTV-2). The virus is endemic in cattle herds, and asymptomatic cows may act as reservoirs. Newborn calves generally develop diarrhea, typically lasting 2–13 days (Koopmans et al., 1990, Koopmans et al., 1991), and BoTV has commonly been recovered from calves with diarrhea (Duckmanton et al., 1998a, Duckmanton et al., 1998b, Hoet et al., 2003a, Hoet et al., 2003b). Infection with BoTV has also been conclusively associated with diarrhea in veal calves (Hoet et al., 2003a, Hoet et al., 2003b). Interestingly, BoTV has also been reported in nasal secretions and may therefore be able to infect both the respiratory and the gastrointestinal tract (Hoet et al., 2002), as is the case for some coronaviruses (Lai and Holmes, 2001), including the SARS coronavirus (Leung et al., 2003). The equine torovirus (EqTV), or Berne virus (BEV), has been designated the genus prototype; unlike BoTV, it can be grown in cell culture and is consequently better characterized. Despite the presence of anti-EqTV antibodies in horses, no disease has been firmly associated with EqTV infection. Toroviruses have also been reported in humans with gastroenteritis (Beards et al., 1984, Duckmanton et al., 1997, Jamieson et al., 1998) and in pigs (Kroneman et al., 1998, Smits et al., 2003).
Antibodies reacting with BoTV and EqTV antigens have been detected in various other mammals, suggesting that toroviruses may be widespread (Brown et al., 1987, Weiss et al., 1984).
In vitro studies on EqTV grown in cell culture demonstrated the generation of subgenomic RNAs (sgRNAs) for translation of the ORFs coding for the structural proteins. Approximately 16 kb of sequence from the 3′ end and approximately 1.5 kb from the 5′ end of the EqTV genome have been determined. These sequence data revealed the basic torovirus genome organization and the presence of conserved replicase domains (Snijder and Horzinek, 1993). These findings were instrumental in the revision of the Coronaviridae taxon to include the genus Torovirus, and in the creation of the order Nidovirales, which currently includes the families Coronaviridae (comprising two genera, Coronavirus and Torovirus), Arteriviridae and Roniviridae (Cavanagh, 1997, Cavanagh et al., 1994, Cavanagh and Horzinek, 1993, González et al., 2003). The complete genome sequence of EqTV has yet to be published, although Smits et al. (2003) reported its completion in a recent study of torovirus field variants. The published sequence of BoTV is limited to the structural genes, which account for approximately 7.5 kb at the 3′ end of the genome (Cornelissen et al., 1997, Duckmanton et al., 1998a, Duckmanton et al., 1998b). Consequently, the genus Torovirus remains the only genus within the order Nidovirales without a representative complete genome sequence.
In this study, we sequenced the entire genome of the bovine torovirus BoTV-1 using long RT-PCR followed by sequencing of the amplicons. Assembly of the sequences yielded a genome of 28,475 nucleotides. Sequence analysis revealed a nidovirus-like replicase gene with conserved domains.
Materials and methods
Source of BoTV-1
BoTV-1 was obtained from Dr. Gerald Woode, as described (Duckmanton et al., 1998a, Duckmanton et al., 1998b). The virus stock, consisting of stool specimens from gnotobiotic calves infected with the original Breda-1 virus, was aliquoted and stored at −80 °C.
Extraction of viral RNA
BoTV-1 RNA was extracted from fecal specimens as described (Duckmanton et al., 1998a, Duckmanton et al., 1998b) using TRIzol Reagent (Invitrogen, Burlington, Ont., Canada). The RNA was resuspended in 10 μl of ddH2O containing 10% 100 mM dithiothreitol (Invitrogen) and 5% 20–40 U/μl RNasin (Promega, Mississauga, Ont.), and stored at −80 °C.
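The percentages in the resuspension buffer can be converted into final concentrations with one line of arithmetic, C_final = C_stock × (V_stock / V_total). The sketch below assumes (our reading, not stated in the text) that "10%" and "5%" are volume fractions of the 100 mM DTT and 20–40 U/μl RNasin stocks within the 10 μl total.

```python
def added_volume(total_ul: float, fraction: float) -> float:
    """Volume of stock that makes up `fraction` (v/v, assumed) of the total."""
    return total_ul * fraction


def final_concentration(stock: float, fraction: float) -> float:
    """C_final = C_stock * V_stock / V_total, with V_stock / V_total = fraction."""
    return stock * fraction


TOTAL_UL = 10.0  # resuspension volume given in the text

dtt_ul = added_volume(TOTAL_UL, 0.10)        # 1 μl of the 100 mM DTT stock
dtt_mM = final_concentration(100.0, 0.10)    # 10 mM DTT in the resuspended RNA
rnasin_ul = added_volume(TOTAL_UL, 0.05)     # 0.5 μl of the RNasin stock
rnasin_u_per_ul = (
    final_concentration(20.0, 0.05),         # 1 U/μl if the stock is 20 U/μl
    final_concentration(40.0, 0.05),         # 2 U/μl if the stock is 40 U/μl
)

print("DTT: %g uL stock -> %g mM" % (dtt_ul, dtt_mM))
print("RNasin: %g uL stock -> %g-%g U/uL" % (rnasin_ul, *rnasin_u_per_ul))
```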
Primers
Primers used in RT-PCR (synthesized by Invitrogen) were designed using Gene Runner 3.05 (Hastings Software, Inc.), based on the available BoTV-1 sequence and EqTV sequence (GenBank accession nos. X52374 and X56016). Primer names, sequences, and positions within the BoTV-1 genome are given in Table 1. One primer, APPR-RS, was designed within a conserved region of the coronavirus open reading frame (ORF) 1a of the replicase gene. This region was first identified by ORF1a amino acid alignment, followed by alignment of the corresponding nucleotide sequences of infectious bronchitis virus (IBV), GenBank accession no. NC001451, nucleotides (nts) 3646–3699; human coronavirus 229E (HCoV 229E), NC002645, nts 4193–4246; murine hepatitis virus (MHV) A59, NC001846, nts 4239–4292; and transmissible gastroenteritis virus (TGEV), AJ271965, nts 4365–4418. Alignments were done using ClustalX 1.8 (Thompson et al., 1997). The sequence of the APPR-RS primer was based on the consensus sequence of this nucleotide alignment. The EEAT7 primer used in the amplification of the 3′ non-coding region (3′ NC) is a coxsackie virus B6-specific primer (Martino et al., 1999). The complementary sequence of the first 21 nucleotides (3′–5′) of this primer was incorporated into the BRE-3′NC-CVB6 primer used in the reverse transcription reaction. Lastly, the AAP-CVB6 primer used in the 5′ RACE PCR reaction was designed to bind within the poly(A) tail of the cDNA transcripts, and contained sequence complementary to the 3′ end of the EEAT7 primer, allowing EEAT7 to be used, if needed, in a reamplification of 1st round PCR products.
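The consensus step can be illustrated with a minimal majority-rule sketch. The toy alignment is hypothetical (it is not the IBV, HCoV 229E, MHV or TGEV sequence), tied columns are reported as N, and practical primer design often uses IUPAC degenerate codes instead. We also assume, from the naming convention only, that an "-RS" (reverse-sense) primer corresponds to the reverse complement of the sense-strand consensus; the text does not state this.

```python
from collections import Counter
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def majority_consensus(aligned: List[str], gap: str = "-") -> str:
    """Column-wise majority-rule consensus of equal-length aligned DNA sequences.

    Gaps are ignored when counting; gap-only columns are dropped, and columns
    whose two most frequent bases are tied are reported as 'N'.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(base for base in column if base != gap).most_common()
        if not counts:
            continue  # every sequence has a gap in this column
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus.append("N")
        else:
            consensus.append(counts[0][0])
    return "".join(consensus)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Hypothetical aligned nucleotide windows, one per virus.
toy_alignment = ["ACGTAC", "ACGAAC", "ACCTAC", "ACGTAG"]
sense = majority_consensus(toy_alignment)
antisense = reverse_complement(sense)
```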
Table 1
RT-PCR primers used for the amplification of the BoTV-1 genome
Position in the BoTV-1 genome refers to the binding site of the primer. Where three or more primers are listed in a primer set, the first primer listed was the primer used in the RT reaction: BRE-1A-RS was used as the RT primer in production of template for PCR reactions with primer sets 6, 7 and 8, and GSP1 was used as the RT primer in production of template for PCR reactions with primer sets 10–12. Primer 1ASEQ-RS1 was used with BRE-1A-S, and 1ASEQ-S2 with BRE-1A-NRS, to generate amplicons H1 and H2, respectively. The amplicon names given in the “Primer Set” column refer to the amplicons generated with the given primer set; amplicons are displayed schematically, relative to the BoTV-1 genome, in Fig. 1. “*” indicates that the primer was designed using EqTV sequence. Primer EEAT7 is a coxsackie virus B6-specific primer designed by Martino et al. (1999). The 3′ end of this primer was incorporated into the BRE-3′NC-CVB6 RT primer, allowing EEAT7 to be used in PCR. Primer 1A-APPR-RS was designed using coronavirus consensus sequence in a conserved domain within the 5′ end of POL 1a (see Section 2). The abridged anchor primer was supplied with the Invitrogen 5′ RACE kit and binds within the poly(C) tail, whereas the AAP-CVB6 primer was designed to bind to the poly(A) tail of cDNA transcripts produced in the 5′ RACE reactions (see Section 2 for details).
Fig. 1
Schematic illustration of the BoTV-1 genome and of the strategy used for amplification by long RT-PCR. Panels A and B show the strategy used to amplify the BoTV-1 genome for sequencing. Amplicons A–J are named according to the chronological order in which the reactions were performed. Arrows indicate the primers used in PCR (RT primers are not shown). Fig. 1B is an enlargement of the 5′ non-coding region and part of the 5′ end of ORF1a, showing the two amplicons obtained to sequence most of the 5′ non-coding region; 5′ RACE products are not depicted. The sizes of the amplicons (in base pairs) are: A, 7546; B, 5086; C, 2530; D, 2580; E, 10,164; F, 8587; G, 5857; H1, 2582; H2, 3114; I, 825; J, 464. ORF1a: replicase ORF1a; ORF1b: replicase ORF1b; S: spike (peplomer); M: membrane; HE: hemagglutinin esterase; N: nucleocapsid; A: poly(A) tail. The 5′ and 3′ non-coding regions are not labeled in A.
Long RT-PCR
The long RT-PCR was done essentially as described (Tellier et al., 1996a, Tellier et al., 1996b, Tellier et al., 2003). Briefly, purified viral RNA was thawed on ice, incubated at 65 °C for 2 min, and then placed back on ice. To the RNA was added 10 μl of a master mix composed of 4 μl of 5× 1st Strand Synthesis Buffer (Invitrogen), 0.5 μl of RNasin (20–40 U/μl) (Promega), 1 μl of 100 mM dithiothreitol (Invitrogen), 1 μl of a 10 mM solution of deoxynucleotide triphosphates (dNTPs) (Pharmacia, Piscataway, NJ), 2.5 μl of a 10 μM solution of antisense primer, and 1 μl of Superscript II reverse transcriptase (Invitrogen). The reaction mixture was incubated at 42 °C for 1 h, after which 1 μl each of RNase H (1–4 U/μl) and RNase T1 (900–3000 U/μl) (Invitrogen) was added and the mixture was incubated at 37 °C for 20 min. Ten-fold dilutions of the newly synthesized cDNA were then prepared in ddH2O, and the neat and diluted cDNA samples were kept on ice for use in PCR or stored at −80 °C.
The PCR master mix was composed of 5 μl of 10× Advantage 2 PCR Reaction Buffer (Clontech, Palo Alto, CA), 1.25 μl of a 10 mM dNTP mixture (Amersham Biosciences, Piscataway), 1 μl each of a sense and an antisense primer (10 μM solution), 1 μl of Advantage 2 DNA polymerase (Clontech), and 38.75 μl of ddH2O for each 50 μl reaction. The mixture was aliquoted into thin-walled PCR tubes (Stratagene, La Jolla, CA) and overlaid with 40 μl of molecular grade mineral oil (Sigma). The master mix was transferred to a room dedicated to cDNA synthesis, where 2 μl of the RT (or dilution) was added under the oil, and the reaction tube was then transferred to a Robocycler thermal cycler (Stratagene) in another room. Unless indicated otherwise, all PCR reactions were carried out as follows: 35 cycles of denaturation at 99 °C for 35 s, annealing at 67 °C for 30 s, and elongation at 68 °C for a time specific to the target amplicon (Table 2).
Fig. 1 schematically illustrates the approach used to amplify the BoTV-1 genome by long RT-PCR. Several reactions required modifications to the protocol given above. For 2nd round amplifications in which 5 μl of unpurified 1st round PCR product was used as template (e.g., amplicon J), the PCR master mix was adjusted accordingly: 4.5 μl of buffer was used to account for the buffer already present in the template, and the volume of ddH2O was adjusted to bring the reaction volume to 50 μl. For 2nd round PCR in which the template was an amplicon purified from an agarose gel, only the ddH2O was adjusted, to bring the final reaction volume to 50 μl. The RT protocol was modified for amplicon J, using the RT primer listed in Table 1, primer set 10: after thawing on ice, the 2.5 μl aliquot of primer was added to the RNA and the mixture was heated at 99 °C for 5 min (Kostouros et al., 2003); the remainder of the reaction was carried out according to the protocol above.
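The master-mix volumes can be checked with simple arithmetic. The sketch below (component labels are ours) confirms that the RT master mix totals 10 μl, that the 4 μl of 5× buffer is 1× in a 20 μl RT (assuming the full 10 μl of RNA is used), and that 38.75 μl of ddH2O brings a PCR with 2 μl of template to 50 μl. For the 2nd round case, we assume only the buffer (cut to 4.5 μl, because 5 μl of 1× first-round product already carries the equivalent of 0.5 μl of 10× buffer) and the water change; under that assumption the water works out to 36.25 μl, a figure the text does not state.

```python
from typing import Dict

RT_MIX_UL: Dict[str, float] = {
    "5x first-strand buffer": 4.0,
    "RNasin (20-40 U/ul)": 0.5,
    "100 mM DTT": 1.0,
    "10 mM dNTPs": 1.0,
    "10 uM antisense primer": 2.5,
    "SuperScript II": 1.0,
}

# Per 50 ul PCR, excluding ddH2O and template.
PCR_MIX_UL: Dict[str, float] = {
    "10x Advantage 2 buffer": 5.0,
    "10 mM dNTPs": 1.25,
    "10 uM sense primer": 1.0,
    "10 uM antisense primer": 1.0,
    "Advantage 2 polymerase": 1.0,
}


def water_to_add(components: Dict[str, float], template_ul: float,
                 final_ul: float = 50.0) -> float:
    """ddH2O needed so that components + template + water = final_ul."""
    water = final_ul - sum(components.values()) - template_ul
    if water < 0:
        raise ValueError("components and template exceed the final volume")
    return water


rt_total = sum(RT_MIX_UL.values())                 # 10 ul of RT master mix
rt_reaction_ul = 10.0 + rt_total                   # assumes all 10 ul of RNA is used
buffer_strength = 5 * RT_MIX_UL["5x first-strand buffer"] / rt_reaction_ul  # 1x

standard_water = water_to_add(PCR_MIX_UL, template_ul=2.0)   # 38.75 ul

# 2nd round with 5 ul of unpurified 1st-round product: buffer reduced to 4.5 ul.
second_round = {**PCR_MIX_UL, "10x Advantage 2 buffer": 4.5}
second_round_water = water_to_add(second_round, template_ul=5.0)  # 36.25 ul
```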
Table 2
PCR specifications for BoTV-1 amplicons used for genome sequencing
| PCR | 1st round primers | Template | Elongation time(s) | 2nd round primers | Template | Elongation time(s) |
|---|---|---|---|---|---|---|
| A | BRE-XMA-S, EEAT7 | 10⁻¹ RT | 8 min × 25; 12 min × 10 | BRE-XMA-S, EEAT7 | 2 μl Z (1) | 8 min × 25; 12 min × 5 |
| B | BRE-1b(2)-S, BRE-1b-RS | 10⁻² RT | 4 min 30 s × 15; 5 min × 10; 6 min × 10 | N/A | N/A | N/A |
| C | BRE-1b-S, POL1B-RS | 10⁻² RT | 5 min × 35 | N/A | N/A | N/A |
| D | BRE-5′, APPR-RS | 10⁻² RT | 4 min × 25; 5 min × 10 | N/A | N/A | N/A |
| E | BRE-1A-S, BRE-1A-NRS | 10⁻¹ RT | 9 min 45 s × 15; 11 min × 10; 13 min × 10 | N/A | N/A | N/A |
| F | BRE-1A-NS, BRE-1A-NNRS | 10⁻¹ RT | 8 min × 25; 12 min × 10 | N/A | N/A | N/A |
| G | BRE-1A-NNS, BRE-1A-N3RS | 10⁻² RT | 4 min 30 s × 15; 5 min × 10; 6 min × 10 | N/A | N/A | N/A |
| H1 | BRE-1A-S, 1ASEQ-RS1 | 10⁻² RT | 4 min × 35 | N/A | N/A | N/A |
| H2 | 1ASEQ-S2, BRE-1A-NRS | 10⁻² RT | 4 min × 35 | N/A | N/A | N/A |
| I | 5′NC-BEV, GSP2 | Neat RT; 10⁻¹ RT | 3 min × 35 (2) | 5′NC-BEV, GSP4 | 1 μl Y | 3 min × 35 |
| J | BRV-END, GSP5 | Neat RT | 3 min × 35 (3) | BRV-END, GSP5 | 1 μl Z (4) | 3 min × 25 |
| K* | AAP, GSP6 | Neat TdT | 2 min × 35 | N/A | N/A | N/A |
| L* | AAP-CVB6, GSP6 | Neat TdT | 2 min × 35 | N/A | N/A | N/A |
For PCR with multiple elongation times, the times and cycles are given in chronological order. The letters in the “PCR” column refer to the amplicons shown in Fig. 1. The “Template” column gives the dilution of the RT or TdT (terminal deoxynucleotidyl transferase reaction of the 5′ RACE) template used in PCR. See Table 1 for primer details. Z: the 1st round PCR product was purified from an agarose gel and used as template in 2nd round PCR. Y: 1 μl of the 1st round PCR was used as template in a hemi-nested PCR reaction. 1: 10 μl of the first round PCR product was purified from an agarose gel; a 10⁻¹ dilution of the purified DNA was used in 2nd round PCR. 2: The annealing temperature used for both 1st and 2nd round PCR was 65 °C × 30 s. 3: The annealing temperature used in both 1st and 2nd round PCR was 50 °C × 1 min. 4: 40 μl of 1st round PCR product was purified from an agarose gel; undiluted and 10⁻¹ to 10⁻³ dilutions of the purified DNA were used as template in 2nd round PCR. *: Amplicons K and L were obtained via 5′ RACE reactions (see Section 2 for details).
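The multi-step elongation programs in Table 2 can be summarized programmatically. The sketch below parses the "time × cycles" notation and totals cycles and hold times, assuming the standard denaturation (35 s) and annealing (30 s) steps from the protocol text; ramp times, and any initial denaturation or final extension (not described in the text), are ignored, so the totals are lower bounds on run time. Amplicons I and J used different annealing steps, which the `anneal_s` parameter can reflect.

```python
import re
from typing import List, Tuple

# One step of an elongation program, e.g. "4 min 30 s × 15" (time × cycles).
STEP = re.compile(r"(?:(\d+)\s*min)?\s*(?:(\d+)\s*s)?\s*×\s*(\d+)")


def parse_schedule(text: str) -> List[Tuple[int, int]]:
    """Parse 'a; b; c' program steps into (elongation_seconds, cycles) tuples."""
    steps = []
    for part in text.split(";"):
        match = STEP.search(part)
        if match is None:
            raise ValueError("cannot parse step: %r" % part)
        minutes, seconds, cycles = match.groups()
        steps.append((int(minutes or 0) * 60 + int(seconds or 0), int(cycles)))
    return steps


def summarize(text: str, denature_s: int = 35, anneal_s: int = 30) -> Tuple[int, int, float]:
    """Total cycles, total elongation seconds, and total hold time in minutes.

    Hold time = elongation + cycles * (denaturation + annealing); ramp times and
    any initial denaturation or final extension are ignored.
    """
    steps = parse_schedule(text)
    cycles = sum(n for _, n in steps)
    elongation_s = sum(t * n for t, n in steps)
    hold_min = (elongation_s + cycles * (denature_s + anneal_s)) / 60
    return cycles, elongation_s, hold_min


# Amplicon B, 1st round (Table 2).
b_cycles, b_elongation_s, b_hold_min = summarize("4 min 30 s × 15; 5 min × 10; 6 min × 10")
```

Note that the stepped programs in Table 2 still add up to 35 cycles in total for amplicons A, B, D, E, F and G, matching the default cycle number in the protocol.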
5′ RACE
The 5′ RACE kit (Invitrogen) was used to amplify the extreme 5′ end of the BoTV-1 genome, according to the manufacturer's protocol with modifications. The RT was performed using the protocol above, with Superscript II and the GSP1 primer, and with the RNA denaturation step changed to 99 °C for 5 min as described above. The cDNA was purified using the QIAquick PCR clean-up kit (Qiagen, Mississauga, Ont., Canada) according to the manufacturer's protocol and eluted in 30 μl of the supplied elution buffer. The cDNA tailing reaction was performed with the terminal deoxynucleotidyl transferase supplied with the kit, according to the manufacturer's protocol, except that the incubation at the protocol-specified 94 °C was extended to 5 min. Tailing was performed in the presence of either dCTP (provided) or dATP (Pharmacia). PCR was performed with Advantage 2 DNA polymerase using the protocol above, with modifications: 5 μl of the tailing reaction was used as template, and 2 μl of each primer was used (instead of 1 μl); the volume of ddH2O was adjusted accordingly to bring the final reaction volume to 50 μl.
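The text does not explain why two tailing reactions (dCTP and dATP) were used. One common rationale, offered here only as our interpretation, is that a homopolymer tail hides the first nucleotide of the genome whenever that nucleotide matches the tail: in the sense strand a dC tail reads as a run of G and a dA tail as a run of T, and no genome can begin with both. The sketch assumes sense-orientation reads already trimmed of the anchor primer and ending at the same downstream position; the example genome ends are hypothetical.

```python
def strip_leading_run(read: str, base: str) -> str:
    """Remove the leading homopolymer run of `base` from a sense-orientation read."""
    i = 0
    while i < len(read) and read[i] == base:
        i += 1
    return read[i:]


def infer_genome_5prime(read_dc_tailed: str, read_da_tailed: str) -> str:
    """Combine reads from dC- and dA-tailed cDNA to recover the genomic 5' end.

    In sense orientation a dC tail reads as a G run and a dA tail as a T run.
    Trimming over-trims only when the genome itself starts with that base, and a
    genome cannot start with both G and T, so the longer trimmed read keeps the
    true first nucleotide.
    """
    from_dc = strip_leading_run(read_dc_tailed.upper(), "G")
    from_da = strip_leading_run(read_da_tailed.upper(), "T")
    return max(from_dc, from_da, key=len)


# Hypothetical genome 5' end beginning with G: the dC-tailed read alone loses it.
genome_start = "GCATT"
recovered = infer_genome_5prime("GGGGG" + genome_start, "TTTTT" + genome_start)
```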
Sequencing
Purified amplicons were submitted for automated sequencing of both strands, which was carried out by the DNA Sequencing Facility, Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ont., Canada. Amplicons were excised from agarose gels with a disposable scalpel under direct visualization on a Dark Reader transilluminator (Clare Chemical Research Inc., Dolores, CO), and the DNA was recovered in 40–50 μl of the supplied elution buffer using the Clontech Nucleospin Gel Extraction Kit according to the manufacturer's instructions. The initial sequencing reactions used the PCR primers. The resulting sequence then allowed the design of sequencing-specific primers. Sequencing continued in this step-wise fashion until the top- and bottom-strand sequences overlapped, at which point the remaining sequencing primers could be designed in a single step.
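The step-wise design described above is a form of primer walking. The sketch below picks the next forward sequencing primer from the 3′ portion of the latest trusted read; the primer length, the back-off from the read end, the GC window, and the 3′ G/C clamp are hypothetical parameters for illustration, not the criteria used in this study. Primers for the bottom strand would be chosen the same way on the reverse complement of the read.

```python
from typing import Optional, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def next_walking_primer(read: str, length: int = 20, back_off: int = 80,
                        gc_min: float = 0.40, gc_max: float = 0.60) -> Optional[Tuple[int, str]]:
    """Pick a forward primer ending at least `back_off` nt before the read's 3' end.

    Scans from the most downstream allowed position towards the 5' end and returns
    the first window with acceptable GC content and a G/C at its 3' end, as a
    (0-based start, primer sequence) tuple, or None if no window qualifies.
    """
    read = read.upper()
    latest_start = len(read) - back_off - length
    for start in range(latest_start, -1, -1):
        candidate = read[start:start + length]
        if gc_min <= gc_fraction(candidate) <= gc_max and candidate[-1] in "GC":
            return start, candidate
    return None


# Hypothetical 170-nt read with a single acceptable primer site at position 50.
toy_read = "A" * 50 + "ATGC" * 5 + "A" * 100
primer = next_walking_primer(toy_read)
```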
Sequence assembly and analysis
The individual sequence fragments were aligned and assembled using Gene Runner version 3.05. The fully assembled BoTV-1 genome sequence was analyzed using Gene Runner; multiple alignments were performed with ClustalX 1.8 (Thompson et al., 1997), and similarity searches with the BLAST and PSI-BLAST programs (Altschul et al., 1997); secondary structure and protein fold prediction used the 3D-PSSM Protein Fold Recognition (Threading) Server (Fischer et al., 1999, Kelley et al., 1999, Kelley et al., 2000) (http://www.sbg.bio.ic.ac.uk/~3dpssm/); and RNA pseudoknot secondary structures were identified using MFOLD version 3.1 (Zuker, 2003) (http://www.bioinfo.rpi.edu/applications/mfold/old/rna/).
The Breda-1 torovirus sequence was entered into the Virus Orthologous Clusters (VOCs) database for analysis and comparison using the complete coronavirus database and suite of tools (www.sarsresearch.ca).
Composition vector trees (CVTrees) were generated from the complete genome sequences of several representative viruses in the order Nidovirales, including members of all genera, using the CVTree program (Qi et al., 2004a, Qi et al., 2004b) (http://cvtree.cbi.pku.edu.cn) with K = 5.
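The core of the composition vector method can be sketched as follows, based on our understanding of Qi et al. (2004): K-word frequencies f(w) are compared with a (K−2)-order Markov prediction f0(w) = f(w[:-1])·f(w[1:]) / f(w[1:-1]), the residuals a(w) = (f − f0)/f0 form the composition vector, and two vectors are compared by their cosine correlation C with distance D = (1 − C)/2. This is an illustrative re-implementation, not the CVTree server code; the text does not say whether nucleotide or translated protein sequences were used, so the sketch assumes nucleotide strings over ACGT. Tree building (e.g., neighbor joining) on the resulting distance matrix is not shown.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict


def word_freqs(seq: str, k: int) -> Dict[str, float]:
    """Overlapping k-word frequencies, normalized by the number of k-words."""
    n = len(seq) - k + 1
    if n <= 0:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def composition_vector(seq: str, k: int = 5, alphabet: str = "ACGT") -> Dict[str, float]:
    """a(w) = (f(w) - f0(w)) / f0(w) for every k-word w; a(w) = 0 when f0(w) = 0."""
    if k < 3:
        raise ValueError("k must be at least 3")
    seq = seq.upper()
    fk, fk1, fk2 = word_freqs(seq, k), word_freqs(seq, k - 1), word_freqs(seq, k - 2)
    vector = {}
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        middle = fk2.get(w[1:-1], 0.0)
        f0 = fk1.get(w[:-1], 0.0) * fk1.get(w[1:], 0.0) / middle if middle else 0.0
        vector[w] = (fk.get(w, 0.0) - f0) / f0 if f0 else 0.0
    return vector


def cv_distance(seq_a: str, seq_b: str, k: int = 5) -> float:
    """D = (1 - C) / 2, where C is the cosine correlation of the composition vectors."""
    va, vb = composition_vector(seq_a, k), composition_vector(seq_b, k)
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(x * x for x in va.values()) * sum(x * x for x in vb.values()))
    correlation = dot / norm if norm else 0.0
    return (1.0 - correlation) / 2.0


# Hypothetical short sequences; real use would pass complete genomes with k = 5.
s1 = "ATGCGTACGTTAGCCGATAGGCTTACGATCGGATCCA" * 3
s2 = "TTGACCATGGCATGCAATCGTAGCTAGGCATCGATC" * 3
d12 = cv_distance(s1, s2, k=3)
```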
Results
Amplification and sequence analysis of the BoTV-1 genome
The BoTV-1 genome was reverse transcribed and amplified in 8 RT-PCR reactions, yielding 8 overlapping products (Fig. 1): amplicons A–E and I–K (amplicon K is a 5′ RACE product and is not shown in Fig. 1). Four additional amplicons, F–H2, which overlapped the large 10,164 bp amplicon E, were obtained to facilitate sequencing. Lastly, the 5′ end of the genome was amplified for sequencing using two 5′ RACE reactions.
The complete BoTV-1 sequence was assembled from the sequences of the overlapping amplicons. The genome comprises 28,475 nts and has a G + C content of 38%. The genome sequence has been deposited in GenBank under accession no. AY427798.
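The assembly itself was done in Gene Runner. Purely to illustrate the idea, overlapping amplicon sequences can be joined on an exact suffix/prefix overlap and the G + C content computed from the result. The minimum overlap and the toy amplicons are hypothetical, and a real assembly must also tolerate sequencing discrepancies, which this sketch does not.

```python
from functools import reduce


def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two contigs on the longest exact suffix(left)/prefix(right) overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no exact overlap of at least min_overlap nt")


def gc_content(seq: str) -> float:
    """Fraction of G and C in the sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical amplicons, ordered 5'->3' along the genome.
amplicons = ["AAAACCCCGGGG", "CCGGGGTTTTAC", "TTACGCGC"]
genome = reduce(lambda a, b: merge_overlapping(a, b, min_overlap=4), amplicons)
```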
Analysis of newly identified sequence of BoTV-1
Sequencing of the genome upstream of the structural genes yielded 20,920 nts of novel BoTV-1 sequence. Although all amplicons were sequenced on both strands, at four positions two different nucleotides were equally credible based on the chromatograms of both strands. These ambiguities were considered genuine; only two of them result in an amino acid change (nt positions 1895 and 9470).
Analysis of this novel sequence identified two large ORFs that overlap by 13 nts. This arrangement is typical of the nidovirus replicase gene, and the two ORFs were identified as ORF1a and ORF1b (Snijder and Horzinek, 1993). The 5′ end of the genome begins with a non-coding region of 858 nts. The torovirus 5′ non-coding region is substantially larger than those of the coronaviruses, the largest of which are those of IBV (528 nts) and TGEV (314 nts). Given the difficulty of amplifying this region of the genome (the full-length 464 bp RACE amplicon, amplicon J in Fig. 1, was obtained only when the time and temperature of RNA denaturation were increased), we suspect that it contains strong secondary structures.
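Deciding whether a two-nucleotide ambiguity changes the encoded amino acid requires mapping the genome position to a codon of the relevant ORF and translating both alternatives. The sketch below uses the standard genetic code and the ORF1a start from Table 3 (nt 859); it assumes nts 1895 and 9470 lie in ORF1a, as the coordinates imply, and the example codons are hypothetical because the ambiguous bases are not given in the text.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_of(genome_pos: int, orf_start: int) -> Tuple[int, int]:
    """1-based (codon number, position within codon) of a 1-based genome position."""
    offset = genome_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


def changes_amino_acid(codon: str, pos_in_codon: int, alt_base: str) -> bool:
    """True if substituting alt_base at pos_in_codon (1-3) alters the encoded residue."""
    codon = codon.upper().replace("U", "T")
    alt = codon[:pos_in_codon - 1] + alt_base.upper().replace("U", "T") + codon[pos_in_codon:]
    return CODON_TABLE[codon] != CODON_TABLE[alt]


ORF1A_START = 859  # Table 3
ambiguous_sites = {pos: codon_of(pos, ORF1A_START) for pos in (1895, 9470)}

# Hypothetical codons: GCT->GCC is silent (Ala), AAA->AGA is not (Lys -> Arg).
silent = changes_amino_acid("GCT", 3, "C")
missense = changes_amino_acid("AAA", 2, "G")
```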
Comparison of the 3′ end of BoTV-1 sequence with that previously reported
To confirm the previously reported sequence of BoTV-1 (approximately 7.5 kb comprising the 3′ end of the genome) (Duckmanton et al., 1998a, Duckmanton et al., 1998b), this region was re-sequenced. Comparison of the sequences showed 97% identity at the nucleotide and amino acid levels for the M, HE, and N genes and for the 3′ non-coding region. The S gene sequence determined in this study was 93.1% identical at the nucleotide level and 91.6% identical at the amino acid level to that reported by Duckmanton et al. (1998a, 1998b). Differences were scattered, except for a cluster of discrepancies in the small region 21,773–21,985 of the sequence reported here, which displays only 57.7% identity with the corresponding region of the previously reported sequence. Comparison with the S gene sequence of the B145 BoTV isolate published by Smits et al. (2003) revealed 95.5% identity at the nucleotide level and 97.2% identity at the amino acid level.
The HE gene was 9 nts longer than previously reported for BoTV-1 and BoTV-2 (Cornelissen et al., 1997, Duckmanton et al., 1998a, Duckmanton et al., 1998b), extending the protein sequence by 3 amino acids. This difference is due to a stretch of seven thymidine residues (27,715–27,721), as opposed to eight in the previously reported sequence; the resulting shift of the reading frame extends the gene and protein sequence. The sequences of several BoTV isolates characterized by Smits et al. (2003) agree with the HE gene and protein lengths reported here: of the six isolates sequenced within the structural genes, four had HE genes of 1260 nts, identical to what is reported here.
The other two isolates contained HE genes shorter by only 3 nts, although their structural genes were identified as having undergone recombination with a porcine torovirus, resulting in an HE gene of mixed origin (Smits et al., 2003).
The sequence of the 3′ non-coding region matches the previously reported sequence with only two mismatches. With the primer used in the reverse transcription reaction, a portion of the poly(A) tail was included in the PCR amplicon. Sequencing of this amplicon therefore included poly(A) sequence, confirming that the extreme 3′ end of the genome had been amplified and that the last nucleotide at the 3′ end of the genome had been determined.
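Percent identity depends on how gaps and the denominator are handled, and the text does not specify the method used for the comparisons above. As one common convention (an assumption here), the sketch counts identical residues over aligned columns in which neither sequence has a gap; the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


# Hypothetical aligned windows differing at one of ten positions.
example = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```

Other conventions, such as dividing by the full alignment length including gap columns, give lower values for gapped alignments, so identities computed by different tools are not always directly comparable.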
Analysis of BoTV-1 ORFs
Fig. 2 illustrates the potential ORFs of the BoTV-1 genome. Table 3 lists the gene and protein characteristics of the BoTV-1 genome, based on the currently known genes of the family Coronaviridae.
Fig. 2
Map of the BoTV genome. The light-colored bars at the top show all possible open reading frames. Start and stop codons in all three frames are represented by vertical bars in the middle section of the figure. The annotated top-strand (rightward-transcribed) genes are shown as solid arrows at the bottom, above the genome position numbers.
Table 3
Gene and protein characteristics of the BoTV-1 genome, established with the VOCs software (Upton et al., 2003)
| Gene name | ORF start | ORF stop | Molecular weight (Da) | Adenine + thymine (%) | pI | No. of amino acids |
|---|---|---|---|---|---|---|
| BoTV-1 pp1a | 859 | 14,196 | 505,921 | 62.59 | 5.84 | 4445 |
| BoTV-1 pp1ab | 859 | 21,059 | 765,907 | 62.64 | 6.14 | 6733 |
| BoTV-1 S | 20,975 | 25,729 | 177,876 | 61.18 | 5.87 | 1584 |
| BoTV-1 M | 25,758 | 26,459 | 26,410 | 60.40 | 9.05 | 233 |
| BoTV-1 HE | 26,477 | 27,736 | 46,706 | 61.99 | 5.48 | 419 |
| BoTV-1 N | 27,775 | 28,278 | 18,868 | 54.16 | 11.82 | 167 |
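As a consistency check on Table 3, the amino acid counts follow directly from the ORF coordinates if the stop coordinate includes the stop codon (our reading of the table). The sketch below applies that rule to every single-frame ORF; pp1ab is left out because its −1 ribosomal frameshift breaks the single-frame arithmetic.

```python
from typing import Dict, Tuple

# (ORF start, ORF stop) from Table 3; 1-based, inclusive, stop codon assumed included.
ORFS: Dict[str, Tuple[int, int]] = {
    "pp1a": (859, 14196),
    "S": (20975, 25729),
    "M": (25758, 26459),
    "HE": (26477, 27736),
    "N": (27775, 28278),
}


def protein_length(start: int, stop: int) -> int:
    """Residues encoded by an ORF whose last three nucleotides are the stop codon."""
    n_nt = stop - start + 1
    if n_nt % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return n_nt // 3 - 1


lengths = {name: protein_length(*coords) for name, coords in ORFS.items()}
he_nt = ORFS["HE"][1] - ORFS["HE"][0] + 1  # 1260 nt, the HE gene length discussed above
```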
The putative replicase gene of BoTV-1 is comparable to the coronavirus replicase genes in many respects. Most obvious is its size: at 20,201 nts, it falls within the 20–22 kb range of the coronavirus replicase genes (Lai and Cavanagh, 1997). The putative ORF1a of 13,338 nts and ORF1b of 6876 nts encode an ORF1a/1b fusion polyprotein (pp1a/1b) of 6733 aa. A sequence comparison of ORF1b between BoTV-1 and EqTV, the latter reported previously (Snijder et al., 1990), revealed a high degree of identity: 83.8% at the nucleotide level and 91.0% at the amino acid level. It is therefore not surprising that similar potential secondary structures were identified in the ORF1a/ORF1b overlap, and that the conserved domains of ORF1b identified in EqTV and other nidoviruses were also found in BoTV ORF1b (Snijder et al., 2003, Snijder et al., 1990).
The ORF1b protein product is believed to be translated via a ribosomal frameshifting event that generates the pp1a/1b fusion polyprotein. The two elements required for efficient ribosomal frameshifting, a slippery sequence and an RNA pseudoknot (Brierley, 1995), were demonstrated in EqTV (Snijder et al., 1990). The heptanucleotide slippery sequence found in BoTV, UUUAAAC (nts 14,184–14,190), is identical to the slippery sequence identified in EqTV and the coronaviruses (Lai and Cavanagh, 1997, Snijder et al., 1990).
Secondary structure prediction of the ORF1a/1b overlap region identified two potential RNA pseudoknot structures downstream of the slippery sequence (Fig. 3). Not surprisingly, these two structures are very similar to those predicted for EqTV.
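The coordinates in Table 3 and the 13-nt ORF1a/1b overlap can be cross-checked against the lengths quoted in the text (a 20,201-nt replicase gene, a 6876-nt ORF1b and a 6733-aa pp1a/1b). The sketch also shows how the slippery heptanucleotide is decoded before and after a −1 slip, assuming it occupies nts 14,184–14,190, the first seven positions of the Fig. 3 window; this placement is our reading of the coordinates rather than a statement from the text.

```python
ORF1A_START, ORF1A_STOP = 859, 14196   # Table 3 (stop codon included)
PP1AB_STOP = 21059                      # Table 3
OVERLAP_NT = 13                         # ORF1a/ORF1b overlap reported in the text

orf1b_start = ORF1A_STOP - OVERLAP_NT + 1          # first nt of the overlap
orf1b_length = PP1AB_STOP - orf1b_start + 1        # ORF1b length in nt
replicase_length = PP1AB_STOP - ORF1A_START + 1    # whole replicase gene in nt

# A -1 frameshift makes the ribosome read one nucleotide twice, so the fused
# ORF1a/1b coding path is one nucleotide longer than the genomic span.
pp1ab_codons = (replicase_length + 1) // 3
pp1ab_residues = pp1ab_codons - 1                  # drop the ORF1b stop codon


def phase_in_frame(pos: int, frame_start: int) -> int:
    """0, 1 or 2: position of `pos` within a codon of the frame starting at frame_start."""
    return (pos - frame_start) % 3


# ORF1b starts on the third position of an ORF1a codon, i.e. it lies in the -1 frame.
orf1b_phase = phase_in_frame(orf1b_start, ORF1A_START)

# Assumption: the heptanucleotide occupies nts 14,184-14,190.
SLIP_START, SLIPPERY = 14184, "UUUAAAC"
phase = phase_in_frame(SLIP_START, ORF1A_START)
assert phase == 2, "slices below assume the U at SLIP_START ends an ORF1a codon"
zero_frame_codons = [SLIPPERY[1:4], SLIPPERY[4:7]]   # decoded in ORF1a frame: UUA AAC
minus_one_codons = [SLIPPERY[0:3], SLIPPERY[3:6]]    # re-paired after the slip: UUU AAA
```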
Fig. 3
Predicted pseudoknot structures in the BoTV-1 ORF1a/1b overlap region. A and B represent the two structures predicted using MFOLD software in the analysis of the ORF1a/1b overlap region. Loop 1,2 and stem 1,2 in (A) and (B) refer to structural features of the depicted pseudoknots. Note that the two structures have identical hairpin loops, but differ in the region downstream of the hairpin which folds back to bind to the loop of the hairpin structure, i.e., B has a larger loop 2 and a different stem 2 of the pseudoknot. Also note that loop 2 in B has the potential to form an internal secondary structure. C and D are schematic representations of A and B, respectively. The 5′ and 3′ orientations are given in each figure. The ORF1a stop codon is underlined in A and B, and the slippery sequence, UUUAAAC, is boxed in all four figures. The dashed line in B indicates that the “C” residue can potentially bind to either of the G residues, or both. A and C show BoTV-1 nucleotides 14,184–14,255; B and D show nucleotides 14,184–14,305.
Owing to the relatively high degree of identity between BoTV and EqTV in pp1b, the conserved domains previously identified in the EqTV Berne virus (BEV), including the RNA-dependent RNA polymerase (RdRp) and helicase, as well as domains with homology to cellular proteins involved in RNA processing (Snijder et al., 2003, Snijder et al., 1990), have also been identified in BoTV (Fig. 4B).
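Identity values such as the BoTV/EqTV figures quoted for ORF1b are computed over a pairwise alignment, and the exact number depends on the convention used. The sketch below assumes one common convention (identical columns divided by columns where neither sequence has a gap); the text does not state which convention was used, and the sequences are toy examples, not torovirus data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical columns / columns with no gap in either
    sequence. Other conventions divide by alignment or sequence length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment: 8 identical out of 9 ungapped columns.
print(round(percent_identity("AUGGCU-UAA", "AUGGCAGUAA"), 1))  # 88.9
```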
Fig. 4
Domains of the 1ab polyprotein. (A) Predicted conserved domains of the 1a polyprotein. Gray and white boxes represent the relative positions of the conserved domains of pp1a. ADRP, adenosine diphosphate-ribose 1″-phosphatase (formerly the “X” domain in the coronaviruses) (residues 1657–1781); HD, hydrophobic domain (residues ∼2186–2237, 2930–3042, 3506–3616); 3CLSP, 3C-like serine protease (residues 3129–3393); PLP, papain-like protease (residues 1824–1989); CPD, cyclic phosphodiesterase (residues 4272–4443). The ADRP domain is named after its cellular homologue. (B) Domains of the 1b polyprotein. Shown in gray are the relative positions of the conserved domains identified in pp1b (which is technically produced only as part of the pp1a/1b fusion protein; only the pp1b portion is shown here for simplicity). RdRp, RNA-dependent RNA polymerase (residues 515–744); Zn finger, zinc finger domain containing conserved Cys and His residues, a nucleic acid-binding domain (residues 845–928); Helicase, viral helicase domain (residues 1099–1374); ExoN, 3′-5′ exonuclease (residues 1414–1609); XendoU, homologue of poly(U)-specific endoribonuclease (residues 1918–1974); 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase. The ExoN, XendoU, and 2′-O-MT domains are named after their cellular homologues (Snijder et al., 2003).
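The pp1a legend lists the domains out of positional order. As an illustration, the coordinates can be held in a small table and re-sorted N- to C-terminally, with checks that each domain lies within the 4445-aa pp1a and that no two overlap, reproducing the order drawn in Fig. 4A. The function names, the 60-character track and the first-letter labels are arbitrary choices for this sketch; the hydrophobic-domain boundaries are approximate in the source.

```python
# pp1a domains as (name, first residue, last residue), copied from the
# Fig. 4A legend; hydrophobic-domain (HD) boundaries are approximate.
PP1A_LENGTH = 4445
PP1A_DOMAINS = [
    ("ADRP", 1657, 1781),
    ("HD", 2186, 2237),
    ("HD", 2930, 3042),
    ("HD", 3506, 3616),
    ("3CLSP", 3129, 3393),
    ("PLP", 1824, 1989),
    ("CPD", 4272, 4443),
]


def ordered_domains(domains, length):
    """Sort domains N- to C-terminal and check they fit and do not overlap."""
    ordered = sorted(domains, key=lambda d: d[1])
    for name, start, end in ordered:
        if not 1 <= start <= end <= length:
            raise ValueError(f"{name} ({start}-{end}) lies outside 1-{length}")
    for (name1, _, end1), (name2, start2, _) in zip(ordered, ordered[1:]):
        if start2 <= end1:
            raise ValueError(f"{name1} overlaps {name2}")
    return ordered


def text_map(domains, length, width=60):
    """Crude linear map: each domain marked by the first character of its name."""
    track = ["-"] * width
    for name, start, end in domains:
        first = (start - 1) * width // length
        last = (end - 1) * width // length
        for i in range(first, last + 1):
            track[i] = name[0]
    return "".join(track)


layout = ordered_domains(PP1A_DOMAINS, PP1A_LENGTH)
print([name for name, _, _ in layout])  # ['ADRP', 'PLP', 'HD', 'HD', '3CLSP', 'HD', 'CPD']
print(text_map(layout, PP1A_LENGTH))
```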
Analysis of the BoTV-1 ORF1a revealed several interesting features. This ORF encodes a protein of 4445 aa (pp1a), larger than the pp1a of the coronaviruses, with the sole exception of the 4470-aa MHV pp1a. The pp1a proteins of the nidoviruses also contain conserved domains with functions important to viral replication. All nidoviruses studied to date contain at least one protease domain within pp1a, known as the ‘main’ protease or 3C-like protease (3CLP) after the picornavirus 3C protease, located in the C-terminal region of pp1a (Ziebuhr et al., 2000). 
Although all nidoviruses possess this domain, the protease differs slightly in each family. The coronavirus 3CLP has a cysteine as its catalytic nucleophile but does not appear to contain the acidic residue of the His-Asp(Glu)-Cys catalytic triad typical of this family of proteases (Ziebuhr et al., 2000). The 3CLP of gill-associated virus (GAV; genus Okavirus, family Roniviridae) is also a cysteine protease and, like the coronavirus 3CLP, possesses a Cys-His catalytic dyad (Ziebuhr et al., 2003). Finally, the arterivirus main protease is referred to as a 3C-like serine protease (3CLSP) because its catalytic triad, His-Asp-Ser, has a serine nucleophile, as is more common in cellular homologues such as the pancreatic enzyme chymotrypsin (Snijder et al., 1996). It was therefore surprising to find that the main protease of BoTV appears to be a serine protease, similar to the arterivirus 3CLSP. Repeated attempts to align the BoTV sequence with the coronavirus 3CLP failed to produce a convincing alignment. The assignment presented here is supported by PSI-BLAST analysis, by the protein sequence alignment shown in Fig. 5A, and by secondary structure prediction with the 3D-PSSM protein fold recognition (threading) server, which identified trypsin-like or chymotrypsin-like protease folds in the BoTV main protease sequence.
Fig. 5
(A) Alignment of the putative BoTV-1 3C-like serine protease with the arterivirus 3C-like serine protease. In bold, the residues of the catalytic triad, H–E–S, and the threonine and histidine residues believed to be involved in substrate recognition; underlined, the glycine-X-histidine substrate-binding pocket ‘core’ motif, where X is any amino acid. “*” indicates fully conserved residues; “:” indicates that one of the following ‘strong’ groups is fully conserved: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW. “.” indicates that one of the following ‘weaker’ groups is fully conserved: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, HFY. LDV, lactate dehydrogenase-elevating virus, strain neuro-virulent type C, GenBank accession no. NC002534; PRRSV, porcine reproductive and respiratory syndrome virus, GenBank accession no. NC00196; EAV, equine arteritis virus, strain Bucyrus, GenBank accession no. X53459. For each viral sequence the positions of the N- and C-terminal residues, within ORF1a, are given. (B) Alignment of the putative BoTV-1 papain-like protease with the picornavirus leader protease. In bold, the residues of the catalytic dyad, C–H. “*” indicates fully conserved residues; “:” indicates that one of the following ‘strong’ groups is fully conserved: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW. “.” indicates that one of the following ‘weaker’ groups is fully conserved: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, HFY. FMDV, foot and mouth disease virus C, GenBank accession no. NC002554; ERAV, equine rhinitis A virus, GenBank accession no. NC003982. For each viral sequence, the positions of the N- and C-terminal residues, within ORF1a, are given.
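The conservation symbols in the Fig. 5 legend follow the style of Clustal-family alignment output. A minimal sketch that applies the rules exactly as the legend states them, with the strong and weak group lists copied from it, is shown below. Leaving gapped columns unmarked is an assumption of this sketch, the function names are hypothetical, and the alignment is a toy example rather than the data in Fig. 5.

```python
# Residue groups copied from the Fig. 5 legend.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "HFY"]


def conservation_symbol(column: str) -> str:
    """'*' fully conserved, ':' within one strong group, '.' within one weak group."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: columns containing a gap are left unmarked
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(alignment: list[str]) -> str:
    """Symbol line for a block of equal-length aligned protein sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        conservation_symbol("".join(seq[i] for seq in alignment)) for i in range(length)
    )


print(repr(conservation_line(["GHSECW", "GHTDSK", "GYANAK"])))  # '*:::. '
```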
In addition to the main protease, the nidoviruses, with the exception of GAV (Cowley et al., 2000), contain one or more ‘accessory’ protease(s) in the N-terminus of pp1a (Lai and Cavanagh, 1997, Snijder and Meulenberg, 1998). 
In both the coronaviruses and the arteriviruses, this protease is a cysteine protease with homology to papain, referred to as the papain-like protease (PLP) (Ziebuhr et al., 2000). As in SARS-CoV and IBV, the torovirus PLP is present as a single domain. Although it has homology with the coronavirus PL2pro, the torovirus PLP is closer in size to, and shares greater identity with, the leader protease (Lpro) of the picornaviruses foot and mouth disease virus (FMDV) and equine rhinitis A virus (ERAV). This is supported by PSI-BLAST analysis, by protein fold predictions suggesting folds similar to those of the FMDV leader protease, and by the alignment shown in Fig. 5B.

Hydrophobic domains are additional conserved domains common to the pp1a of all nidoviruses, where they occupy conserved relative positions. The hydrophobic domains identified in BoTV maintain this configuration (Fig. 4A).

A domain was also identified in BoTV-1 pp1a with homology to the adenosine diphosphate-ribose 1″-phosphatase (ADRP) family of processing enzymes, which remove a phosphate from the substrate adenosine diphosphate-ribose 1″-phosphate in the cellular RNA processing pathway (Martzen et al., 1999). Although this domain is present in other ssRNA viruses, such as the alphaviruses, and in bacterial and eukaryotic proteins, within the Nidovirales it has so far been found only in members of the Coronaviridae (Snijder et al., 2003).

Sequence analysis of pp1a revealed that the C-terminal region of this protein displays similarity to non-structural protein 2a, found in coronaviruses such as MHV and HCoV-OC43. The corresponding region of EqTV pp1a also shows similarity to this protein (Snijder et al., 1991). This domain has recently been identified as a putative cyclic phosphodiesterase (Snijder et al., 2003).
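The text does not state how the hydrophobic domains were located. One common approach, shown here as an assumption rather than the authors' method, is a sliding-window Kyte–Doolittle hydropathy scan. The window of 19 residues and threshold of 1.6 are conventional defaults for membrane-spanning segments, not values from this study, and the function name and test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """1-based inclusive residue ranges covered by windows whose mean
    hydropathy is >= threshold; overlapping or adjacent windows are merged.

    Unknown residue letters score 0.0. Because each window spans `window`
    residues, merged ranges extend somewhat into the flanking sequence.
    """
    seq = protein.upper()
    segments: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        score = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if score >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # extend current segment
            else:
                segments.append((start, end))
    return segments


# Hypothetical test protein: 25 leucines flanked by serines.
print(hydrophobic_segments("S" * 20 + "L" * 25 + "S" * 20))  # [(12, 54)]
```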
Phylogeny of the Nidovirales
Composition vector trees (CVTrees) are a new method of inferring phylogeny from complete genome sequences. A distance matrix is generated by comparing the predicted proteomes of the viruses, based on the frequencies of short amino acid strings of fixed length (K-tuples, where K is a parameter that can be adjusted in each analysis). Phylogenetic trees can then be built from the distance matrix using standard methods such as Neighbor Joining (NJ) or Minimum Evolution (ME) (Chu et al., 2004, Qi et al., 2004a, Qi et al., 2004b). This method circumvents the difficult, and often impossible, task of aligning distantly related sequences, and it analyzes entire genomes rather than sections that may have evolved separately. Because this is the first complete torovirus sequence to be reported, this is the first time such an analysis can include representatives of all genera in the order Nidovirales (Fig. 6). The trees in Fig. 6, calculated with K = 5, illustrate topology only, since calibrating branch lengths in CVTrees presents several complications (Qi et al., 2004a, Qi et al., 2004b). Nonetheless, interesting inferences about evolutionary relatedness can be drawn from the topology alone. Notably, whereas the arteriviruses and the coronaviruses each form a cluster in both trees, BoTV-1 is quite distinct from both clusters.
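The composition-vector distance can be sketched as follows. This follows our reading of the published CVTree approach and is not the CVTree program itself: K-string frequencies are pooled over all proteins of a proteome, an expected frequency is predicted from (K−1)- and (K−2)-string frequencies (a (K−2)-order Markov model), each component is (observed − expected)/expected, and the distance is D = (1 − C)/2, where C is the cosine correlation of two vectors. The normalization details and function names are assumptions. Fig. 6 used K = 5; the toy call below uses k = 3 only because the toy sequences are short.

```python
from collections import Counter, defaultdict
from math import sqrt


def kmer_frequencies(proteins: list[str], k: int) -> dict[str, float]:
    """Frequencies of all length-k strings, pooled over every protein."""
    counts: Counter = Counter()
    for p in proteins:
        for i in range(len(p) - k + 1):
            counts[p[i:i + k]] += 1
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


def composition_vector(proteins: list[str], k: int = 5) -> dict[str, float]:
    """(observed - expected) / expected for every K-string whose Markov
    expectation f(u) * f(v) / f(middle) is nonzero."""
    if k < 3:
        raise ValueError("k must be at least 3")
    fk = kmer_frequencies(proteins, k)
    fk1 = kmer_frequencies(proteins, k - 1)
    fk2 = kmer_frequencies(proteins, k - 2)
    # Join (K-1)-strings u, v that overlap by K-2 letters into K-strings.
    by_prefix = defaultdict(list)
    for v in fk1:
        by_prefix[v[:-1]].append(v)
    vec = {}
    for u in fk1:
        for v in by_prefix.get(u[1:], []):
            s = u + v[-1]
            expected = fk1[u] * fk1[v] / fk2[u[1:]]
            vec[s] = (fk.get(s, 0.0) - expected) / expected
    return vec


def cv_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """D = (1 - C) / 2, with C the cosine correlation of the two vectors."""
    dot = sum(x * b.get(s, 0.0) for s, x in a.items())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    if norm == 0.0:
        raise ValueError("empty or all-zero composition vector")
    return (1.0 - dot / norm) / 2.0


# Hypothetical single-protein "proteomes", not viral data.
va = composition_vector(["MKTAYIAKQRQISFVKSHFSRQ"], k=3)
vb = composition_vector(["MKTAYIAKQRQLSFVKSHFSRE"], k=3)
print(cv_distance(va, vb))
```

A full distance matrix built this way for several proteomes is then the input to a tree-building method such as NJ.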
Fig. 6
CVTrees with representative viruses within the Nidovirales. (A) Tree built with the Neighbor Joining method. (B) Tree built with the Minimum Evolution method (K = 5 for both trees). Torovirus: Breda virus 1 (BoTV-1), accession no. AY427798. Coronavirus: SARS-associated coronavirus (SARS-HCoV), NC_004718; human coronavirus OC43 (HCoV-OC43), NC_005147; human coronavirus NL63 (HCoV-NL63), AY567487; transmissible gastroenteritis virus (TGEV), NC_002306; infectious bronchitis virus (IBV), NC_001451. Arterivirus: equine arteritis virus (EAV), NC_002532; porcine reproductive and respiratory syndrome virus (PRRSV), NC_001961; simian hemorrhagic fever virus (SHFV), NC_003092; lactate dehydrogenase-elevating virus (LDV), NC_002534. Roniviridae: gill-associated virus (GAV), genus Okavirus, AY039647 and AF227196.
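The Neighbor Joining step that turns a distance matrix into a tree can be sketched in a few lines. This is a minimal version of the standard Saitou–Nei algorithm that returns an unrooted topology in Newick format without branch lengths, matching the topology-only presentation of Fig. 6; the Minimum Evolution method is not shown. The function name is hypothetical, and the five-taxon matrix is a textbook toy example, not data from this study.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Unrooted neighbor-joining topology as a Newick string (no branch lengths).

    names: taxon labels; dist: symmetric distance matrix in the same order.
    """
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]  # NJ Q-criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new internal node joining i and j to the rest.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]},{nodes[j]})"]
    return "(" + ",".join(nodes) + ");"


toy = [[0, 5, 9, 9, 8],
       [5, 0, 10, 10, 9],
       [9, 10, 0, 8, 7],
       [9, 10, 8, 0, 3],
       [8, 9, 7, 3, 0]]
print(neighbor_joining(["a", "b", "c", "d", "e"], toy))  # (d,e,(c,(a,b)));
```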
Discussion
Despite evidence that toroviruses infect many different species, most of our understanding of these viruses at the biochemical and antigenic levels comes from studies of EqTV and BoTV (Koopmans and Horzinek, 1994). Our knowledge of their molecular biology has come mainly from studies of EqTV grown in cell culture. However, neither a complete torovirus genome sequence nor a complete ORF1a sequence had yet been reported, which has hampered detailed characterization of this genus in terms of molecular biology, evolution, and taxonomy.

Long RT-PCR followed by direct sequencing of the amplicons had previously been used to amplify and sequence the 3′-terminal 7.5 kb of the BoTV-1 genome, encompassing the structural genes and the 3′ non-coding region (Duckmanton et al., 1998a, Duckmanton et al., 1998b). The limited BoTV sequence available for primer design posed a challenge, which was overcome by using sequences from related viruses such as EqTV and the coronaviruses. With this approach, almost 21 kb of novel BoTV-1 sequence was obtained without the benefit of cultured virus. With the complete genome sequence of BoTV-1 now available, and more than half of the EqTV genome sequence reported, future studies on the molecular characterization of other toroviruses should be greatly facilitated. For example, the identification and characterization of the human coronavirus responsible for severe acute respiratory syndrome (SARS) succeeded in part because of RT-PCR assays targeting conserved domains within ORF1b of the viral replicase gene (Drosten et al., 2003, Ksiazek et al., 2003, Lee et al., 2003, Poutanen et al., 2003).

The 28,475-nt genome is reported here with confidence that the sequence is complete. The methods used to amplify the extremities of the viral genome ensured that the extreme 5′ and 3′ nucleotides were internal to the primers used in the RT-PCR amplifying the respective regions of the genome. 
The identity of the first nucleotide at the 5′ end of the viral genome was scrutinized further by performing two separate 5′ RACE reactions, each using a different nucleotide in the tailing reaction. Comparing the sequences of the two 5′ RACE amplicons then allowed the first nucleotide of the genome to be distinguished from the homopolymer tail.

The sequence of the structural genes and 3′ non-coding region reported here agrees with that previously reported for BoTV-1 (Duckmanton et al., 1998a, Duckmanton et al., 1998b). The small number of discrepancies may be due to the different sequencing methodologies used (automated sequencing, used in this study, ought to be more accurate). The differences may also point to a quasispecies in BoTV-1, which is not unlikely for an RNA virus; differences in the sequencing data may also arise from the use of different sequencing primers. In that regard, it is remarkable that most differences occur at isolated nucleotides, except for a 217-nt region in the putative bulbous domain (S1) of the spike glycoprotein. This is highly reminiscent of a well-characterized hypervariable region in the S1 domain of the spike glycoprotein of murine hepatitis virus (Lai and Cavanagh, 1997, Rowe et al., 1997).

Comparison of the sequences of the structural genes and 3′ non-coding regions of BoTV and EqTV previously showed that the two viruses share a relatively high degree of sequence identity (Duckmanton et al., 1998a, Duckmanton et al., 1998b). Comparison of the ORF1b sequences of EqTV and BoTV further supports these findings, with 83.3% nucleotide identity, 91% amino acid identity, and an identically sized ORF of 6876 nts. Thus, it was not surprising that the same conserved domains identified in BEV were found in BoTV pp1b, including the newly identified domains with homology to cellular proteins involved in RNA processing and with potential roles in viral replication, described by Snijder et al. (2003).

The identification of two potential pseudoknot structures in the ORF1a/1b overlap region of BoTV that are virtually identical to those identified in EqTV (Snijder et al., 1990) supports the view that one or both of these structures are important for translation of ORF1b, although experimental data will be required to confirm these predictions in BoTV.

BoTV pp1a contains 3C-like Ser protease (3CLSP) and papain-like Cys protease (PLP) motifs that are most likely involved in processing of the viral replicase polyprotein. The finding of a 3C-like protease with a Ser catalytic nucleophile and substrate-binding residues more closely resembling those of the arterivirus 3CLSP (Snijder et al., 1996) was surprising, since toroviruses are classified as members of the family Coronaviridae. Although a PLP motif with similarity to the coronavirus PL2pro domain was identified, sequence alignment and protein structure prediction support the hypothesis that the BoTV PLP is more similar to the leader protease (Lpro) of the picornaviruses FMDV and ERAV. The coronavirus PL2pro is the largest protease in this family of enzymes because of a unique zinc finger structure that connects the N-terminal alpha-helical domain to the C-terminal beta-sheet domain of the protease (Herold et al., 1999). In contrast, the N- and C-terminal domains of the picornavirus protease are joined by a short beta-strand structure (Guarne et al., 1998), which may also be the case for the torovirus protease given the similarity in size.

Consistent with the nidovirus organization of domains within pp1a, several hydrophobic domains were located in the BoTV pp1a sequence. The conservation of these domains within the nidoviruses further emphasizes their significance. 
The hydrophobic domains of the arteriviruses have been shown to alter host-cell membrane architecture dramatically, in addition to mediating the association of the replication complex with cellular membranes (Pedersen et al., 1999, van der Meer et al., 1998). A similar role can be conjectured for the toroviruses.

Within the Nidovirales, the ADRP domain identified in pp1a has so far been found only in viruses of the family Coronaviridae. This domain occupies the same relative position in the torovirus and coronavirus pp1a. This may suggest that an ancestor common to both toroviruses and coronaviruses acquired this domain after the other nidoviruses had split from this lineage, and before the torovirus–coronavirus split.

The presence of a region in the C-terminal part of the torovirus pp1a with sequence similarity to the coronavirus non-structural protein 2a has been interpreted as possible evidence of a nonhomologous recombination event, perhaps between an ancestral torovirus and a coronavirus during co-infection of a cell (Snijder et al., 1991). The finding of the same similarity in BoTV further supports this hypothesis.

Previously published phylogenetic analysis based on the RdRp and helicase domains of several members of the Nidovirales, including one torovirus (EqTV), suggested a rather large evolutionary distance between EqTV and the coronaviruses (González et al., 2003). Taking advantage of the BoTV sequence obtained in this study, a new alignment of the RdRp protein sequences was made (data not shown); the resulting dendrogram was highly similar to that obtained by González et al. (2003), with the two toroviruses, EqTV and BoTV, clustering together. However, the exact boundaries of the RdRp domains were difficult to establish for many of the sequences used in the alignment. The availability of the first complete torovirus sequence enabled the construction of CVTrees with representatives of all genera within the Nidovirales (Fig. 6). 
The topology of the trees shows that whereas the coronaviruses and the arteriviruses form distinct clusters, in both trees the BoTV-1 sequence stands apart from these clusters. Presently, Coronavirus and Torovirus are two genera within the family Coronaviridae. Viral taxonomy currently uses a polythetic approach that considers numerous properties simultaneously (Condit, 2001). Based on their phylogenetic analysis, González et al. (2003) proposed that the differences between coronaviruses and toroviruses are large enough for the two genera to be elevated to subfamilies, or even to families, within the Nidovirales. The data presented here support further taxonomic separation of the genera Coronavirus and Torovirus.

The completion of the genome sequence of BoTV-1 will contribute to our understanding of the toroviruses at the molecular level. This sequence, combined with findings from studies of EqTV, is expected to aid the development of sensitive molecular diagnostic assays, such as RT-PCR targeting conserved regions of the torovirus genome. With such tests, the less well-characterized toroviruses, such as the human torovirus, as well as any newly identified toroviruses, may be more readily identified and better characterized.
Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman Journal: Nucleic Acids Res Date: 1997-09-01
Authors: Nelson Lee; David Hui; Alan Wu; Paul Chan; Peter Cameron; Gavin M Joynt; Anil Ahuja; Man Yee Yung; C B Leung; K F To; S F Lui; C C Szeto; Sydney Chung; Joseph J Y Sung Journal: N Engl J Med Date: 2003-04-07
Authors: Armando E Hoet; Kyoung-Oh Cho; Kyeong-Ok Chang; Steven C Loerch; Thomas E Wittum; Linda J Saif Journal: Am J Vet Res Date: 2002-03
Authors: Armando E Hoet; Jeffrey Smiley; Christopher Thomas; Paul R Nielsen; Thomas E Wittum; Linda J Saif Journal: Am J Vet Res Date: 2003-04
Authors: Jessica K Roth-Cross; Helen Stokes; Guohui Chang; Ming Ming Chua; Volker Thiel; Susan R Weiss; Alexander E Gorbalenya; Stuart G Siddell Journal: J Virol Date: 2009-01-28
Authors: Rafal Tokarz; Stephen Sameroff; Richard A Hesse; Ben M Hause; Aaloki Desai; Komal Jain; W Ian Lipkin Journal: J Gen Virol Date: 2015-04-27