Phylogenetic analysis of SARS coronavirus (SARS-CoV) isolates, based on spike gene and protein sequences and using Neighbor-Joining, maximum likelihood and Bayesian inference methods, indicated that a recent human SARS-CoV isolate was closer to some human SARS-CoV isolates from the early phase of the previous epidemic than to the SARS-CoV-like viruses isolated from wild animals during that epidemic. As a reasonable judgment based on phylogenetic relationships and sequence variations, it is likely that the recent human SARS-CoV isolate is closer to an unknown SARS-CoV predecessor.
SARS coronavirus (SARS-CoV) phylogeny and genotyping studies have progressed since the emergence of SARS (Ruan et al., 2003, Tsui et al., 2003, Zhao et al., 2004). Genotypes C and T were first suggested at the end of May 2003 (Li et al., 2003), and were later refined and named the Yexin genotype and the Xiaohong genotype (Wang et al., 2004). The SARS virus was supposed to have been transmitted from wild animals to humans. This hypothesis was then supported by the identification of a SARS-CoV-like virus in wild animals, such as palm civets, sold in markets in south China; the virus had more than 99% sequence identity to SARS-CoV (Guan et al., 2003), indicating that it could have recently switched hosts from animals to humans. However, recent reports indicated that SARS-CoV was distinct from the virus in palm civets, and there is so far no direct evidence to show whether the palm civet virus was the origin of SARS-CoV or whether palm civets were themselves infected from another species (Stadler et al., 2003). Although unlikely, the possibility that humans infected these SARS-CoV-positive animals cannot be formally excluded, and there was a report of SARS-CoV transmitted from a human to a pig (Chen et al., 2005). Where is the SARS-CoV-like virus of palm civets in the chain? Are palm civets getting it from another animal? Are palm civets infecting rodents as well as humans? These questions have not yet been answered. Here, we analyze the evolutionary relationships among the SARS-CoVs of the previous epidemic, the newly occurring SARS-CoV (at the end of 2003; WHO, 2004), and the SARS-CoV-like viruses of animal origin from the previous epidemic.
Materials and methods
The complete spike glycoprotein gene sequences of SARS-CoVs and SARS-CoV-like viruses were downloaded from the NCBI GenBank database (for GenBank accession numbers, see Table 1). Multiple nucleotide or amino acid sequences were aligned with ClustalW 1.83 (Thompson et al., 1994). Phylogenetic trees were constructed with MEGA 3.1 (Kumar et al., 2004) (Neighbor-Joining (NJ) for gene and protein sequences), PAUP* 4.0b2 (Swofford, 2002) (maximum likelihood for gene sequences) and MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001) (Bayesian inference for gene and protein sequences). FIPV-X06170 (feline infectious peritonitis virus) was used as the outgroup in the spike gene data set. SARS-CoV and FIPV are known to be highly identical throughout the spike gene sequence (Stavrinides and Guttman, 2004).
Table 1
The variant locations and substitution types in the spike protein sequences
Results
Neighbor-Joining (NJ) trees of the spike gene sequences (Fig. 1) indicated that the SARS-CoV of the newly occurring case (GD03T0013) is closer to the human SARS-CoVs detected in the early phase of the previous epidemic (such as GZ02, GZ-B and BJ02) than to the palm civet or raccoon dog SARS-CoV-like viruses (SZ1, SZ3, SZ13 and SZ16) detected in the previous epidemic. The p-distances of the new isolate (GD03T0013) to the GZ02 and GZ-B isolates are smaller than its p-distance to the SZ3 isolate (Fig. 1). Similar results for the gene sequences were obtained with other phylogenetic methods, namely Bayesian inference and maximum likelihood (Fig. 2). The phylogenetic trees of the spike protein sequences also showed similar characteristics (Fig. 3). Compared with the new isolate, the variant locations in the spike protein and the substitution types differed markedly between the human SARS-CoVs and the animal-origin SARS-CoV-like viruses: the former have 5–7 mutual variant locations and the latter 8–9 (Table 1).
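The comparison of variant locations can be illustrated with a minimal Python sketch. It is not the procedure used to build Table 1; it only assumes that two protein sequences have already been aligned to equal length, and it lists the positions where they differ together with the substitution. The function name `variant_positions` and the toy fragments are hypothetical and are not real spike sequences.

```python
from typing import List, Tuple


def variant_positions(reference: str, other: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference residue, other residue) for each mismatch.

    Assumption: both sequences come from the same alignment, so they have
    equal length and gaps are written as '-'.
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for pos, (a, b) in enumerate(zip(reference, other), start=1):
        # Gapped columns are skipped so that only substitutions are reported.
        if a == "-" or b == "-":
            continue
        if a != b:
            variants.append((pos, a, b))
    return variants


# Toy aligned fragments (hypothetical, for illustration only).
new_isolate = "MFIFLLFLTL"
human_like = "MFIFLLFLTS"
animal_like = "MFLFLLFRTS"

print(variant_positions(new_isolate, human_like))
print(variant_positions(new_isolate, animal_like))
```

In this toy case the animal-like fragment differs from the new isolate at more positions than the human-like fragment does, which is the kind of count summarized in Table 1.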
Fig. 1
NJ trees of the newly occurring, animal-origin and previous epidemic SARS-CoVs. Red hollow circle: genotype T isolate; red solid circle: genotype C isolate; both from humans in the previous epidemic. Yellow solid triangle: animal-origin virus from the previous epidemic. Blue solid circle: newly occurring virus. Purple hollow triangle: outgroup. Left tree: constructed using the p-distance of nucleotide differences; bootstrap = 5000; branch length indicates the number of nucleotide differences per site of the spike gene. Right tree: topology-only tree constructed using the p-distance of nucleotide differences; bootstrap = 5000. Table below the trees: *the genetic distance between SARS-CoV GD03T0013 and the three groups. Here, genotype C comprises 43 isolates, genotype T comprises 76 isolates, and the animal-origin group comprises SARS-CoV SZ1, SZ3, SZ13 and SZ16.
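The p-distance used for these trees is the proportion of aligned sites at which two sequences differ. The following Python sketch shows that calculation and an average distance from one isolate to a group of isolates. It is an illustration rather than the MEGA implementation: ignoring gapped sites pair by pair, taking the plain arithmetic mean over group members, and the names `p_distance` and `mean_distance_to_group` are all assumptions, and the sequences are toy data.

```python
from statistics import mean
from typing import Sequence


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap in either sequence are ignored.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_distance_to_group(query: str, group: Sequence[str]) -> float:
    """Average p-distance between one sequence and every member of a group."""
    return mean([p_distance(query, member) for member in group])


# Toy nucleotide fragments (hypothetical, not real spike gene sequences).
query = "ATGTTTATTT"
human_group = ["ATGTTTATTC", "ATGTTTATTT"]
animal_group = ["ATGTTCATTC", "ATGTTCATCC"]

print(mean_distance_to_group(query, human_group))
print(mean_distance_to_group(query, animal_group))
```

A smaller mean distance to one group than to another is what the table under Fig. 1 compares for GD03T0013.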
Fig. 2
Phylogenetic trees of the spike protein gene sequences built with other methods. The Bayesian inference (Bayes) tree was constructed with MrBayes 3.1.2 using the GTR model, with gamma-distributed rate variation across sites and a proportion of invariable sites, for 60,000 generations, sampling trees every 10th generation and calculating a consensus tree with 25% of all trees; the probability of each partition is indicated at the corresponding branch. For the maximum likelihood (ML) tree, MODELTEST 3.7 (Posada and Crandall, 1998) was used to find the best model and parameters, which PAUP* 4.0b2 then used to build the tree; branch lengths indicate substitutions/site. Confidence in the ML tree was determined by analyzing 300 bootstrap replicates.
Fig. 3
Phylogenetic trees of the spike protein sequences. The Neighbor-Joining (NJ) tree was constructed with MEGA 3.1 using the JTT model; confidence in the NJ tree was determined by analyzing 1000 bootstrap replicates, and branch lengths indicate substitutions/site. The Bayesian inference (Bayes) tree was constructed with MrBayes 3.1.2 using the JTT model for 50,000 generations, sampling trees every 10th generation and calculating a consensus tree with 25% of all trees; branch lengths indicate substitutions/site, and the probability of each partition is indicated at the corresponding branch.
Discussion
Our result differs markedly from the conclusion of a previous report in Science (Zhao et al., 2004), which claimed that phylogenetic analysis of this S gene sequence together with those from human SARS-CoV and palm civet SARS-like coronavirus indicated that this most recent case of SARS-CoV (GD03T0013) is much closer to the palm civet SARS-like coronavirus than to any human SARS-CoV detected in the previous epidemic. That conclusion has been cited in later reports (Song et al., 2005, Wu et al., 2004). In our opinion, the phylogenetic relationship should be interpreted cautiously. In this context, as a reasonable judgment based on phylogenetic relationships and sequence variations, it is likely that the recent human SARS-CoV isolate is closer to an unknown SARS-CoV predecessor than the SARS-CoVs from humans or the SARS-CoV-like viruses from palm civets, both detected in the previous epidemic.
Authors: Y Guan; B J Zheng; Y Q He; X L Liu; Z X Zhuang; C L Cheung; S W Luo; P H Li; L J Zhang; Y J Guan; K M Butt; K L Wong; K W Chan; W Lim; K F Shortridge; K Y Yuen; J S M Peiris; L L M Poon Journal: Science Date: 2003-09-04 Impact factor: 47.728