
Evidence from the evolutionary analysis of nucleotide sequences for a recombinant history of SARS-CoV.

Michael J Stanhope, James R Brown, Heather Amrine-Madsen.

Abstract

The origins and evolutionary history of the Severe Acute Respiratory Syndrome (SARS) coronavirus (SARS-CoV) remain an issue of uncertainty and debate. Based on evolutionary analyses of coronavirus DNA sequences, encompassing an approximately 13 kb stretch of the SARS-TOR2 genome, we provide evidence that SARS-CoV has a recombinant history with lineages of types I and III coronavirus. We identified a minimum of five recombinant regions ranging from 83 to 863 bp in length and including the polymerase, nsp9, nsp10, and nsp14. Our results are consistent with a hypothesis of viral host jumping events, concomitant with the reassortment of bird and mammalian coronaviruses, a scenario analogous to earlier outbreaks of influenza.

Year:  2004        PMID: 15019585      PMCID: PMC7128439          DOI: 10.1016/j.meegid.2003.10.001

Source DB:  PubMed          Journal:  Infect Genet Evol        ISSN: 1567-1348            Impact factor:   3.342


Introduction

Recently, healthcare institutions around the world, particularly in Asia and Canada, have been forcibly challenged to respond to sudden outbreaks of Severe Acute Respiratory Syndrome (SARS). SARS is a highly communicable, and often lethal, illness thought to be caused by a novel coronavirus (Fouchier et al., 2003, Ksiazek et al., 2003, Kuiken et al., 2003), one of a group of positive-sense, single-stranded RNA viruses known to infect domestic birds and mammals, including humans. The origin of the SARS coronavirus (SARS-CoV) has been the subject of much speculation. One of the leading hypotheses is that SARS-CoV is a hybrid strain (Enserink, 2003), since there are reports of recombination in avian coronaviruses (Lee and Jackwood, 2000). However, until a recent report in this journal (Rest and Mindell, 2003), there was no evidence that SARS-CoV is a recombinant. Our analysis of this question, completed at the time of publication of the Rest and Mindell paper, differs from their work in the choice of methods, the extent of the genome analyzed, taxon sampling, and in the analysis of nucleotides rather than amino acids. Our results both corroborate and extend their findings: they add further support to the idea that SARS-CoV has had a recombinant history involving different coronavirus lineages, and they suggest that the genome could have arisen through a combination of host jumping and recombination events, in a manner analogous to previous outbreaks of influenza (Gregory et al., 2003, Zhou et al., 1999).

Materials and methods

DNA sequence alignments

Many of the molecular evolutionary methods for detection of recombination events involve the analysis of multiple DNA sequence alignments. In choosing coronavirus sequences for our analyses, we sought to maximize both the genetic diversity of the coronavirus variants and the length of contiguous comparative data (i.e. in excess of 20 kb). We aligned (ClustalW; Thompson et al., 1994) a large portion of the SARS virus TOR2 strain, at the DNA sequence level, between positions 7349 and 20,969, to other coronaviruses from previously designated groups I, II, and III (Ksiazek et al., 2003, Marra et al., 2003, Rota et al., 2003). At the time of manuscript submission, there were 36 complete, or nearly complete, genomes of SARS virus available, all of which were highly similar at the DNA sequence level, so the choice of strain does not affect the results of our analyses. The DNA sequence alignments within this region had a few segments that could not be reliably aligned, and these were excluded from our analyses. This resulted in 13 separate DNA alignments, which ranged in length from 245 to 3785 bp. Within each of these sub-alignments, any further ambiguous regions were deleted before recombination detection analyses. This was performed in a highly conservative manner: not only did we remove all remotely ambiguous gaps, but we also excluded the regions surrounding each gap, up to areas of clearly anchored sequence alignment (identical or virtually identical stretches of sequence) flanking either side of the gap (alignments available upon request).
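The conservative trimming described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which may well have been carried out by hand; the function names, the `anchor_len` parameter, and the definition of an "anchor" as a run of fully identical, gap-free columns are all assumptions made for illustration.

```python
def conserved_mask(seqs):
    """True for each alignment column that is gap-free and identical in all sequences."""
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*seqs)]


def trim_around_gaps(seqs, anchor_len=3):
    """Drop every gap column plus its flanks, out to the nearest anchor on each side.

    An anchor (hypothetical definition) is a run of at least `anchor_len`
    consecutive conserved columns.
    """
    n = len(seqs[0])
    cons = conserved_mask(seqs)

    # Mark columns that belong to a sufficiently long conserved run.
    anchored = [False] * n
    i = 0
    while i < n:
        if cons[i]:
            j = i
            while j < n and cons[j]:
                j += 1
            if j - i >= anchor_len:
                for k in range(i, j):
                    anchored[k] = True
            i = j
        else:
            i += 1

    # Starting from each gap column, discard columns outward until an anchor is hit.
    keep = [True] * n
    for g in range(n):
        if any(s[g] == "-" for s in seqs):
            k = g
            while k >= 0 and not anchored[k]:
                keep[k] = False
                k -= 1
            k = g
            while k < n and not anchored[k]:
                keep[k] = False
                k += 1

    return ["".join(c for c, kp in zip(s, keep) if kp) for s in seqs]


toy = ["ACGTAC-GTACGT",
       "ACGTTCAGTACGT",
       "ACGTACAGTACGT"]
trimmed = trim_around_gaps(toy, anchor_len=3)
```

In this toy alignment the gap column is removed together with the two unanchored columns to its left, including a variable but gap-free column. That is the sense in which such trimming is conservative: it discards some alignable data in exchange for higher confidence in what remains.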

Recombination detection

We used the recombination detection program PLATO (Grassly and Holmes, 1997), which employs a maximum likelihood (ML) approach to demarcate the boundaries of anomalously evolving regions in a DNA sequence alignment, with statistical measures of confidence. PLATO has a phylogenetic basis, and such methods have been shown to be somewhat less powerful than substitution distribution methods, in the sense that they are less able to identify more subtle examples of recombination (Posada et al., 2002, Posada and Crandall, 2001). However, this in turn means that such approaches are also more conservative in their overall assessment, and indeed phylogenetic methods can only detect recombination events that change the topology (Posada et al., 2002, Posada and Crandall, 2001). Importantly, the propensity for most recombination detection programs, including PLATO, to detect false positives appears to be low (Posada et al., 2002, Posada and Crandall, 2001). PLATO was used to assess possible recombinant regions for each of the 13 alignments, employing parameters of an HKY model of sequence evolution, five steps for the sliding window, and 1000 replications of Monte Carlo simulation. To add a further level of conservative assessment to our recombination detection, phylogenetic analyses were performed on all partitions identified by PLATO, the putative non-recombinant portions of such alignments, as well as all the remaining alignments. For all of these phylogenetic analyses, the best fitting model of sequence evolution and the corresponding values for the rate matrix, shape of the gamma distribution, and proportion of invariant sites were estimated by the program MODELTEST (Posada and Crandall, 1998). The evolutionary history of each region was compared to the control phylogeny, which was based on a concatenation of the 13 alignments. This control topology was the same as that derived from the concatenated non-recombinant sequence portions. 
A region was judged to be a SARS-CoV recombinant when all, or at least the majority (for shorter sequences), of phylogenetic methods agreed in their convincing placement of SARS-CoV in an alternative position to that of the control phylogeny. Phylogenies were reconstructed using Bayesian (Huelsenbeck and Ronquist, 2001), maximum likelihood, neighbor joining (NJ, log det distances) and maximum parsimony methods, implemented in PAUP* 4.0b (Swofford, 2002). For ML, starting trees were obtained via neighbor joining and for parsimony analyses addition sequence was employed with 10 random input orders. Tree-bisection-reconnection (TBR) was the branch-swapping algorithm used in all analyses. Gaps were coded as missing data in all analyses. Bootstrap support values were obtained with 1000 replicates for maximum parsimony and neighbor joining analyses and 100 replicates for ML. Bayesian analyses were performed using MrBayes (Huelsenbeck and Ronquist, 2001) with 500,000 generations, sampling every 100 generations, four Markov chains, random starting trees, and a burn-in of 100,000 generations. The PLATO results were corroborated using split decomposition analysis (program SplitsTree; Huson, 1998) and bootscanning (Salminen et al., 1995) (program BOOTSCAN within the SimPlot package). Instances identified by PLATO as possible SARS-CoV recombinants were similarly identified by SplitsTree and bootscanning.
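As a quick check on the Bayesian settings stated above, the following Python lines work out how many sampled trees would contribute to the posterior probabilities. The variable names are illustrative, and the count assumes one sample per 100 generations, ignoring any sample taken at the starting state.

```python
# MCMC settings as stated in the text.
generations = 500_000
sample_every = 100
burnin_generations = 100_000

total_samples = generations // sample_every             # trees sampled per run
discarded_samples = burnin_generations // sample_every  # trees dropped as burn-in
retained_samples = total_samples - discarded_samples    # trees summarized

print(total_samples, discarded_samples, retained_samples)
```

Under these assumptions, 5000 trees are sampled, the first 1000 are discarded as burn-in, and 4000 remain, so the burn-in covers the first 20% of the chain.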

Results and discussion

In the unrooted control phylogeny, SARS-CoV branches, with convincing support, along the lineage leading to group II coronaviruses (Fig. 1a), which is in agreement with previous reports (Ksiazek et al., 2003, Marra et al., 2003, Rota et al., 2003). The long branch separating SARS-TOR2 from the group II coronaviruses, in comparison to the branch lengths separating the various group II representatives, is in general agreement with earlier proposals that SARS-CoV represents a new, fourth group of coronaviruses (Marra et al., 2003, Rota et al., 2003), and contrary to Snijder et al. (2003), who suggest, based on analysis of replicase ORF1b, that SARS-CoV is more aptly considered a distant member of group II. For the individual alignments the models of sequence evolution identified by MODELTEST were GTR+gamma (alignments corresponding with TOR2 coordinates: 10,645–10,902; 12,613–13,344; 13,725–14,147; 20,100–20,984; and recombinant regions: 15,259–15,342; 19,577–19,862), GTR+invariants (9982–10,125; 13,392–13,610), GTR+gamma+invariants (7366–7710; 10,147–10,626; 11,554–11,973; 11,989–12,516; 18,117–18,980; 14,172–17,936; 19,065–19,871), or HKY+gamma (recombinant region: 15,974–16,108).
Fig. 1

Examples and summary of recombinational analyses. Sequence identifications are as follows: 229E (human): AF304460; PEDV, porcine epidemic diarrhea virus: AF353511; TGEV, porcine transmissible gastroenteritis virus: AJ271965; AIBV-LX4, Avian infectious bronchitis virus-LX4: AY223860; AIBV, Avian infectious bronchitis virus: M95169; SARS-TOR2: AY274119; MHV-ML10, Murine hepatitis virus-ML-10: AF208067; MHV, Murine hepatitis virus: M55148; MHV-2, Murine hepatitis virus strain 2: AF201929; BCov-Quebec, Bovine coronavirus Quebec: AF220295; BCov-LUN, Bovine coronavirus-LUN: AF391542. (a) The control topology with Bayesian (Huelsenbeck and Ronquist, 2001) posterior probabilities (1.0 for all nodes) and ML (Swofford, 2002) bootstrap values; branch lengths drawn proportional to the amount of sequence change. (b) A tree resulting from one of the PLATO-detected anomalous zones, implicating a recombination event involving SARS-CoV and the group III lineage; ML bootstrap and Bayesian posterior probabilities are indicated for both recombination events involving SARS-CoV with the group III lineage, corresponding with the red numbers in (d). (c) A tree resulting from a recombinational zone implicating genetic exchange involving SARS-CoV and the group I lineage; ML bootstrap and Bayesian posterior probabilities are indicated for all three recombination events involving SARS-CoV with the group I lineage, corresponding with the blue numbers in (d). (d) A schematic of the recombination and non-recombination events identified in the SARS-TOR2 genome between positions 7349 and 20,969.

Under our recombination criteria, several regions of recombination were evident, involving two alternative positions of SARS-CoV (Fig. 1b and c). These two branching arrangements were SARS-CoV on the branch leading to group III viruses (avian) or as sister lineage to the group I clade (porcine, human, etc.). 
The anomalous regions identified by PLATO were 15,259–15,342 (Z value of 5.0666; Z values greater than 3.8896 judged to be significant), 15,974–16,108 (Z value of 4.3997; Z values greater than 3.8896 judged to be significant), and 19,577–19,862 (Z value of 6.1619; Z values greater than 3.6471 judged to be significant). Phylogenetic analysis of 15,259–15,342 supported SARS-CoV with group III (Fig. 1b), whereas 15,974–16,108 supported SARS-CoV with group I (Fig. 1c). Phylogenetic analysis of the third putative recombinant region identified by PLATO (i.e. 19,577–19,862; Fig. 1d) proved inconclusive, with ML and Bayes supporting SARS-CoV with group I, and parsimony and NJ yielding the control topology (bootstrap support under 60%, and Bayesian posterior probability less than 0.50). Three further recombinant regions, which did not yield significant PLATO results, were identified by phylogenetic analysis; this is simply because the entire (or very nearly the entire) alignment appears to represent a recombinant zone (i.e. there is nothing for PLATO to identify as anomalous; Fig. 1d). Mutational saturation at synonymous positions of codons can be ruled out as a possible explanation for the alternative branching arrangements of these five (possibly six) recombinant zones, because phylogenies for these same regions based on alignments that exclude third codon positions, as well as amino acid sequences, yielded identical topologies. The resulting genomic picture suggests a complex evolutionary history of recombination involving SARS-CoV (Fig. 1d). The placement of SARS-CoV on the branches leading to groups I or III and not nested within these groups indicates that either the recombination events are ancient in nature or the donor species are not present in currently available sequence data. 
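The two-stage screen applied to the PLATO regions can be restated as a small Python sketch using only the values reported above. The helper `call_region` and its strict-majority rule are one hypothetical reading of the "all, or at least the majority" criterion, not the authors' implementation.

```python
from collections import Counter

# (TOR2 coordinates, Z value, significance threshold) as reported in the text.
plato_regions = [
    ("15259-15342", 5.0666, 3.8896),
    ("15974-16108", 4.3997, 3.8896),
    ("19577-19862", 6.1619, 3.6471),
]

# Stage 1: a region is anomalous when its Z value exceeds its threshold.
significant = [name for name, z, cutoff in plato_regions if z > cutoff]


def call_region(placements, control="control"):
    """Stage 2: return the alternative placement backed by a strict majority
    of methods, 'control' if no method departs from the control topology,
    otherwise 'inconclusive'."""
    alternatives = Counter(p for p in placements.values() if p != control)
    if not alternatives:
        return control
    group, votes = alternatives.most_common(1)[0]
    return group if votes > len(placements) / 2 else "inconclusive"


# Per-method placement of SARS-CoV for 19,577-19,862, as described in the text.
third_region = {
    "ML": "group I",
    "Bayesian": "group I",
    "parsimony": "control",
    "NJ": "control",
}
verdict = call_region(third_region)
```

All three regions pass the Z-value screen, but the third splits two methods against two, so under this reading it falls short of a majority and is classed as inconclusive, matching the "five (possibly six)" wording.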
The inclusion of greater host species representation, which is presently possible for a few regions of the genome, such as a 922 bp region of polymerase (for which there are additional GenBank sequences from cat, dog—group I; turkey—group III; human OC43, porcine—group II) (Stephensen et al., 1999), did not allow a more specific identification of the possible species involved, and implicated the same recombination event between positions 15,259–15,342 (Fig. 1d). Two recent reports regarding the SARS genome suggest, based on analysis of amino acid sequences, that there is either no evidence for recombination (Rota et al., 2003) or no evidence for recent recombination involving other coronaviruses (Marra et al., 2003). Although the methodological details regarding recombination detection are scant in both these reports, we gather that in the one case they came to this conclusion by comparing branching arrangements between gene trees (Marra et al., 2003), and in the other case by performing an amino acid similarity plot (Rota et al., 2003). In the first case, a comparison of gene trees would not pick up recombination events that crossed gene boundaries, or which involved relatively short stretches of sequence within a gene. In the second instance, similarity plots will only tend to pick up recombination events in comparisons that involved the actual donor, a close relative to the donor, and/or a recent event. In contrast, our analysis agrees with Rest and Mindell (2003) in identifying recombination in RDRP (RNA dependent RNA polymerase), although our approach tends to suggest more specific break-points, and a larger number of smaller recombinant regions than does their analysis (three regions in RDRP: 13,392–13,610; 15,259–15,342; 15,974–16,108, based on TOR2 coordinates). 
We also identified several additional recombinant regions in the SARS-CoV genome, encompassing regions not analyzed by Rest and Mindell: 12,613–13,344, which includes all of nsp9 and most of nsp10, and 18,117–18,980 in nsp14. Analysis of currently available coronavirus sequences indicates that group III is composed exclusively of avian coronaviruses, while groups I and II contain viruses isolated from pig, human, murine rodents, cat, dog and bovine. Our results indicate that SARS-CoV recombined with a member of the group III lineage, suggesting that an avian coronavirus was involved, a further point of general agreement between our results and those of Rest and Mindell (2003). Other recombination events evident from our analysis involve the branch leading to group I, which encompasses viruses from several mammalian taxa, including two very divergent strains of porcine coronaviruses. Thus, our analyses indicate that human SARS-CoV has a history of recombination with coronaviruses hosted in distinct animal groups. Mixed animal husbandry practices, in proximity to human populations, could have led to the evolution of the SARS coronavirus and facilitated its progression as an infectious disease in humans. Novel human influenza viruses are thought to have arisen from the reassortment, within porcine hosts, of avian, swine, and human influenza viruses (Gregory et al., 2003, Zhou et al., 1999). We suggest that our recombination results for SARS-CoV implicate a suspiciously analogous history. More specifically, SARS-CoV could have arisen from a combination of host jumping and recombinational events, involving as yet unidentified strains of avian coronavirus group III and mammalian (possibly pig) coronavirus group I. Rest and Mindell (2003) suggested that host-species shifts have been relatively common in the diversification of coronavirus lineages, a result consistent with our hypothesis for SARS-CoV. 
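The five recombinant regions named above can be checked against the 83 to 863 bp range quoted in the abstract with a few lines of Python. The dictionary layout is illustrative, and taking a region's length as end minus start is an assumption; an inclusive count would give values one base larger.

```python
# TOR2 coordinates of the five recombinant regions named in the text.
recombinant_regions = {
    "12613-13344 (nsp9/nsp10)": (12613, 13344),
    "13392-13610 (RDRP)": (13392, 13610),
    "15259-15342 (RDRP)": (15259, 15342),
    "15974-16108 (RDRP)": (15974, 16108),
    "18117-18980 (nsp14)": (18117, 18980),
}

# Length taken as end - start (assumed convention).
lengths = {name: end - start for name, (start, end) in recombinant_regions.items()}

shortest = min(lengths.values())
longest = max(lengths.values())
print(lengths, shortest, longest)
```

With this convention the shortest region is 83 bp and the longest 863 bp, which agrees with the range given in the abstract.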
Critical to determination of the evolutionary origin of SARS-CoV are expanded epidemiological surveys of wild and domestic animals, including, in particular, additional avian species. Understanding the origin and evolutionary history of SARS-CoV is important to proper vaccine development as well as the epidemiological modeling of future outbreaks. Current perception of the SARS-CoV genome is one of relative genetic stability (Brown and Tetro, 2003; Ruan et al., 2003). However, our analyses indicate that SARS-CoV has a complex history of recombination, suggesting that the genome may not be as stable as previously thought. We propose that future epidemiological modeling efforts and vaccine development take this new evidence into account.
  22 in total

1.  MRBAYES: Bayesian inference of phylogenetic trees.

Authors:  J P Huelsenbeck; F Ronquist
Journal:  Bioinformatics       Date:  2001-08       Impact factor: 6.937

2.  MODELTEST: testing the model of DNA substitution.

Authors:  D Posada; K A Crandall
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

3.  A likelihood method for the detection of selection and recombination using nucleotide sequences.

Authors:  N C Grassly; E C Holmes
Journal:  Mol Biol Evol       Date:  1997-03       Impact factor: 16.240

4.  Human infection by a swine influenza A (H1N1) virus in Switzerland.

Authors:  V Gregory; M Bennett; Y Thomas; L Kaiser; W Wunderli; H Matter; A Hay; Y P Lin
Journal:  Arch Virol       Date:  2003-04       Impact factor: 2.574

5.  Genetic reassortment of avian, swine, and human influenza A viruses in American pigs.

Authors:  N N Zhou; D A Senne; J S Landgraf; S L Swenson; G Erickson; K Rossow; L Liu; K j Yoon; S Krauss; R G Webster
Journal:  J Virol       Date:  1999-10       Impact factor: 5.103

6.  Phylogenetic analysis of a highly conserved region of the polymerase gene from 11 coronaviruses and development of a consensus polymerase chain reaction assay.

Authors:  C B Stephensen; D B Casebolt; N N Gangopadhyay
Journal:  Virus Res       Date:  1999-04       Impact factor: 3.303

7.  Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage.

Authors:  Eric J Snijder; Peter J Bredenbeek; Jessika C Dobbe; Volker Thiel; John Ziebuhr; Leo L M Poon; Yi Guan; Mikhail Rozanov; Willy J M Spaan; Alexander E Gorbalenya
Journal:  J Mol Biol       Date:  2003-08-29       Impact factor: 5.469

8.  SARS associated coronavirus has a recombinant polymerase and coronaviruses have a history of host-shifting.

Authors:  Joshua S Rest; David P Mindell
Journal:  Infect Genet Evol       Date:  2003-09       Impact factor: 3.342

9.  Comparative analysis of the SARS coronavirus genome: a good start to a long journey.

Authors:  Earl G Brown; Jason A Tetro
Journal:  Lancet       Date:  2003-05-24       Impact factor: 79.321

10.  Aetiology: Koch's postulates fulfilled for SARS virus.

Authors:  Ron A M Fouchier; Thijs Kuiken; Martin Schutten; Geert van Amerongen; Gerard J J van Doornum; Bernadette G van den Hoogen; Malik Peiris; Wilina Lim; Klaus Stöhr; Albert D M E Osterhaus
Journal:  Nature       Date:  2003-05-15       Impact factor: 49.962
