
Date of origin of the SARS coronavirus strains.

Hongchao Lu, Yi Zhao, Jingfen Zhang, Yuelan Wang, Wei Li, Xiaopeng Zhu, Shiwei Sun, Jingyi Xu, Lunjiang Ling, Lun Cai, Dongbo Bu, Runsheng Chen.

Abstract

BACKGROUND: A new respiratory infectious epidemic, severe acute respiratory syndrome (SARS), broke out and spread throughout the world. The putative pathogen of SARS has been identified as a new coronavirus, a single positive-strand RNA virus. RNA viruses commonly have a high rate of genetic mutation, so it is important to know the mutation rate of the SARS coronavirus as it spreads through the population. Moreover, dating the last common ancestor of the SARS coronavirus strains would help in understanding the circumstances surrounding the emergence of the SARS pandemic and the rate at which SARS coronavirus strains diverge.
METHODS: We propose a mathematical model to estimate the evolution rate of the SARS coronavirus genome and the time of the last common ancestor of the sequenced SARS strains. Under some common assumptions and justifiable simplifications, a few simple equations incorporating the evolution rate (K) and the time of the last common ancestor of the strains (T0) can be deduced. We then applied the least squares method to estimate K and T0 from the dataset of sequences and corresponding times. Monte Carlo simulation was employed to assess the results.
RESULTS: Based on 6 strains with accurate dates of host death, we estimated the time of the last common ancestor to be about August or September 2002, and the evolution rate to be about 0.16 base/day; that is, the SARS coronavirus genome would on average change by one base roughly every six to seven days. We validated our method by dividing the strains into two groups, which coincided with the results from comparative genomics.
CONCLUSION: The applied method is simple to implement and avoids the difficulty and subjectivity of choosing the root of the phylogenetic tree. Based on 6 strains with accurate dates of host death, we estimated a time of the last common ancestor that is consistent with epidemic investigations, and an evolution rate in the same range as that reported for the HIV-1 virus.

Year:  2004        PMID: 15028113      PMCID: PMC516801          DOI: 10.1186/1471-2334-4-3

Source DB:  PubMed          Journal:  BMC Infect Dis        ISSN: 1471-2334            Impact factor:   3.090


Background

A new respiratory infectious epidemic, severe acute respiratory syndrome (SARS), broke out and spread throughout the world, affecting over 8,000 individuals in 32 countries [1,2]. In response to this outbreak, the World Health Organization (WHO) immediately sponsored and established a global network of collaborating laboratories to facilitate the identification of the causative agent of SARS. The putative pathogen of SARS has been identified, by experimental proof and by Koch's postulates, as a new coronavirus, a single positive-strand RNA virus [3-5]. The whole genome of the SARS coronavirus was first sequenced by the British Columbia Centre for Disease Control (CDC) in Canada on 23 April 2003 [6], and subsequently a total of 16 SARS coronavirus strains isolated from Hanoi, mainland China, Hong Kong, Singapore, and Taiwan were sequenced within a short time [7,8]. Phylogenetic analysis and comparative genomic studies based on these genomic sequences indicate that the SARS coronavirus is distinct from any of the previously characterized coronaviruses. Epidemiological investigations further indicate that the SARS coronavirus strains may be divided into two different genotypes [9].

RNA viruses commonly have a high rate of genetic mutation, by which the viruses escape from host defence and evolve into novel viral strains. It is therefore important to know the mutation rate of the SARS coronavirus as it spreads through the population. Moreover, dating the last common ancestor of the SARS coronavirus strains would help in understanding the circumstances surrounding the emergence of the SARS pandemic and the rate at which SARS coronavirus strains diverge. Many attempts have been made to extrapolate the age of the common ancestor of sequenced genomes. Most of them are based on accurate phylogenetic tree reconstructions, which demand a large amount of computation because they apply the maximum likelihood strategy. A common feature of these methods is that the choice of a sequence as the root of the phylogenetic tree is critical. Korber et al. [10] implemented a parsimonious strategy, which used the consensus sequence, made up of the most common base at each position across the strains, as the ancestral sequence.

Methods

Among the 16 full-length SARS coronavirus genomes, we selected the 6 strains for which the accurate date of host death is known, and based our modelling on them. Our model performs its calculation under two hypotheses, which are commonly adopted and have led to accurate predictions in the study of the HIV-1 virus [10]: first, the nucleotide variation of these strains arose by independent mutations at random positions in a single ancestral sequence; second, there exists a molecular clock with a constant rate of evolution. In addition, we simplified the calculation by neglecting the small non-linear effect of multiple mutations at the same base, i.e. we assume that any specific position has mutated at most once across all the sequences during the period of SARS infection. This simplification is justified by further discussion (see Additional file 1).

For an ancestor sequence S0 of a strain S, we can deduce from the assumptions above that E(D(S0, S)) ≈ K(T - T0), where D(S0, S) is the difference between the two sequences (measured by the Hamming distance), T0 is the date of the ancestor, T is the date of host death (as an estimate of the sampling date), and K is the evolution rate constant. The formula states that the expected sequence difference is proportional to the time of evolution. If S0 is the last common ancestor of S and S', then we have E(D(S,S')) = E(D(S0,S)) + E(D(S0,S')) (Fig. 1(a)). The equation takes this form under the simplification that, along the combined infection paths of the two sequences, a mutation at any specific position can occur at most once. Thus E(D(S,S')) ≈ K(T - T0) + K(T' - T0) = K(T + T' - 2T0).
Figure 1

Phylogenetic tree. a) For two strains; b) for several strains, which can be divided into two groups descending from the last common ancestor.
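The two quantities in the model can be made concrete with a short sketch. The function names and the toy sequence fragments below are illustrative assumptions, not part of the original study; the numeric arguments of the second call are the BJ01/BJ02 values that appear in Tables 1 and 2.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def expected_pair_distance(k: float, t: float, t_prime: float, t0: float) -> float:
    """Model prediction E(D(S, S')) ~ K(T + T' - 2*T0) for a shared ancestor at T0."""
    return k * (t + t_prime - 2.0 * t0)


# Toy aligned fragments (made up for illustration); they differ at 2 positions.
print(hamming_distance("ATGGCTAGCA", "ATGACTAGCT"))

# Two strains both dated day 13 whose pairwise ancestor is dated day -37,
# with K = 0.16 base/day; Table 2 reports an observed distance of 16 for this pair.
print(expected_pair_distance(0.16, 13, 13, -37))
```

Because the model is linear in (T + T'), a straight-line fit of observed distances against (T + T') is enough to recover both K (the slope) and T0 (from the intercept, -2·K·T0).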

The last common ancestor S0 of all the sequences is the root of the hidden phylogenetic tree that has the strains as nodes. From the time T0 onwards, the sequences must have evolved along at least two different routes. Therefore, there should be a partition of the strains into two groups B and B' such that every pair of strains S ∈ B and S' ∈ B' shares the root of the tree as its last common ancestor (Fig. 1(b)); i.e., for each such pair (S, S'), E(D(S,S')) should be linear in (T + T') with the same parameter T0. We can therefore apply the least squares method to estimate K and T0 from the dataset. Since the real partition cannot be known in advance, we carried out the calculation for all possible partitions of these 6 strains. For each division we used the estimated K to calculate the possible T0(S,S') of each sequence pair. The division with the minimum variance of T0(S,S') was taken as the best solution to the problem, and the corresponding K as the estimate of the mutation rate.

To analyze how the parameters affect the results and to support our fitting method, the Monte Carlo method was employed. First, we set up a phylogenetic tree (see Fig. 3(c)) and a table of parameters (see Table 3) including the evolution rate and the times of the sequences. Starting from the last common ancestor S0, every base of a sequence can mutate over time with a probability determined by the given evolution rate. The other sequences, comprising intermediate sequences (I) and final sequences (F), are thus obtained step by step in the simulation according to the given phylogenetic tree and time parameters. After the sequences were obtained, we applied our fitting method to estimate the evolution rate without using the hidden parameters. By analyzing the K estimated from the simulated data, we can see how the parameters affect the fitting results and assess the quality of our method.
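As a rough illustration of what "all possible partitions" means for 6 strains, the sketch below enumerates every split into two non-empty groups and lists the cross-group pairs that would enter each least squares fit. The helper names are hypothetical, and the sketch stops short of scoring the divisions, since the exact scoring details are not spelled out in the text.

```python
from itertools import combinations

STRAINS = ["BJ01", "BJ02", "GZ01", "SIN2500", "TOR2", "US"]


def bipartitions(items):
    """Yield every split of items into two non-empty groups (B, B')."""
    first, rest = items[0], items[1:]
    # Pin the first item in B so that (B, B') and (B', B) are not both produced.
    for size in range(len(rest)):  # B' must keep at least one item
        for chosen in combinations(rest, size):
            group_b = [first, *chosen]
            group_b_prime = [s for s in rest if s not in chosen]
            yield group_b, group_b_prime


def cross_pairs(group_b, group_b_prime):
    """Pairs (S, S') with S in B and S' in B'; these share the root as ancestor."""
    return [(s, s_prime) for s in group_b for s_prime in group_b_prime]


partitions = list(bipartitions(STRAINS))
print(len(partitions), "candidate divisions")
for group_b, group_b_prime in partitions[:3]:
    print(group_b, group_b_prime, len(cross_pairs(group_b, group_b_prime)), "cross pairs")
```

For 6 strains there are 2^5 - 1 = 31 such divisions, with 5, 8 or 9 cross pairs each, so an exhaustive search is cheap.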
Figure 3

Estimated K for the Monte Carlo simulation. The distribution of the estimated K is shown in a) and b): a) Model 1; b) Model 2. The common phylogenetic tree is shown in c).

Table 3

Parameters in the Monte Carlo simulation

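| | K | T0 | TI1 | TI2 | TI3 | TI4 | TF1 | TF2 | TF3 | TF4 | TF5 | TF6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unit | base/day | day | day | day | day | day | day | day | day | day | day | day |
| Model 1 | 0.2 | -123 | -100 | -30 | -60 | -40 | 19 | 10 | 34 | 13 | 13 | -13 |
| Model 2 | 0.2 | -180 | -120 | 0 | -90 | -60 | 60 | 30 | 0 | -30 | -60 | -90 |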
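The effect of the sampling window on the estimate can be illustrated with a minimal simulation sketch. It is not the authors' program. As simplifying assumptions it uses a star-shaped tree in which every final sequence descends independently from S0 (the intermediate times TI1 to TI4 and the tree of Fig. 3(c) are ignored), a genome length of 29,751 bases (not given in the text), Poisson-distributed mutation counts, and hypothetical function names. Only K, T0 and the final times TF1 to TF6 are taken from Table 3.

```python
import math
import random
from itertools import combinations
from statistics import mean, stdev

GENOME_LENGTH = 29_751  # assumption: approximate genome length, not stated in the text


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutated_sites(t0, t, k, rng):
    """Positions changed on one lineage between t0 and t (independent lineages)."""
    n_mutations = poisson(k * (t - t0), rng)
    return {rng.randrange(GENOME_LENGTH) for _ in range(n_mutations)}


def fit_rate(times, site_sets):
    """Least squares slope of D(S, S') against (T + T') over all pairs."""
    xs, ys = [], []
    for i, j in combinations(range(len(times)), 2):
        xs.append(times[i] + times[j])
        # Sites changed in exactly one lineage approximate the Hamming distance.
        ys.append(len(site_sets[i] ^ site_sets[j]))
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def run_model(t0, final_times, k=0.2, replicates=200, seed=0):
    """Return mean and standard deviation of the fitted K over many replicates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        sets = [mutated_sites(t0, t, k, rng) for t in final_times]
        estimates.append(fit_rate(final_times, sets))
    return mean(estimates), stdev(estimates)


model1 = run_model(-123, [19, 10, 34, 13, 13, -13])    # narrow window
model2 = run_model(-180, [60, 30, 0, -30, -60, -90])   # wide window
print("Model 1: mean K = %.3f, sd = %.3f" % model1)
print("Model 2: mean K = %.3f, sd = %.3f" % model2)
```

With a fixed seed the run is reproducible. Under these assumptions the mean estimate should sit near the set value of 0.2 in both cases, while the spread should be clearly larger for the narrow Model 1 window than for the wide Model 2 window.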

Results

Of the 16 SARS coronavirus strains submitted to GenBank before June 2003, 6 had an accurate date of host death recorded. We chose these 6 to estimate the last common ancestor and the mutation rate of the SARS coronavirus (Table 1). We performed the calculation, and the fitting result for the best division (see Table 2) is shown in Fig. 2, which plots the differences between sequences D(S,S') against the time factor (T + T'). The evolution rate K was estimated to be 0.16 base/day, which is similar to the reported evolution rate of the HIV-1 virus [10]. The date of the last common ancestor T0 was found to be about August or September 2002, which is in accordance with the epidemic investigations, in which the first verifiable SARS case was reported as early as November 11, 2002.
Table 1

Dates of hosts' death

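| ID | Strain | Date of host death (MM-DD-YYYY) | Days from Feb. 22 |
|---|---|---|---|
| 1 | BJ01 | 03-08-2003 | 13 |
| 2 | BJ02 | 03-08-2003 | 13 |
| 3 | GZ01 | 02-10-2003 | -13 |
| 4 | SIN2500 | 03-14-2003 | 19 |
| 5 | TOR2 | 03-05-2003 | 10 |
| 6 | US | 03-29-2003 | 34 |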
Table 2

Grouping of the strains

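| Si | Sj | D(Si, Sj) | T0*(i, j) | Annotation |
|---|---|---|---|---|
| GZ01 | BJ02 | 55 | -172 | Best Division, G1 |
| GZ01 | TOR2 | 53 | -167 | Best Division, G1 |
| GZ01 | US | 56 | -165 | Best Division, G1 |
| GZ01 | SIN2500 | 53 | -163 | Best Division, G1 |
| GZ01 | BJ01 | 49 | -153 | Best Division, G1 |
| BJ02 | TOR2 | 24 | -64 | G1 |
| BJ02 | US | 27 | -61 | G1 |
| BJ02 | SIN2500 | 24 | -59 | G1 |
| BJ01 | BJ02 | 16 | -37 | G1 |
| BJ01 | TOR2 | 14 | -32 | G1 |
| BJ01 | US | 17 | -30 | G1 |
| BJ01 | SIN2500 | 14 | -28 | G1 |
| TOR2 | US | 7 | 0 | G2 |
| SIN2500 | TOR2 | 4 | 2 | G2 |
| SIN2500 | US | 7 | 5 | G2 |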

Note: The best division is shown at the top, where one group includes GZ01 and the other group includes the remaining strains. From the time of the pairwise last common ancestor T0*(i, j), the strains can be classified into G1 = {GZ01, BJ01, BJ02} and G2 = {TOR2, US, SIN2500}.

Figure 2

The linear relation between D(S,S') and (T+T'). The parameters were estimated from the best division of the 6 strains, where K is the evolution rate (base/day) and T0 is the time (day) of the last common ancestor.
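The fit for the best division can be reproduced approximately from the published tables. The sketch below is an assumption about how such a fit might be coded, not the authors' implementation: it runs an ordinary least squares line through the five GZ01 pairs of Table 2, reads K from the slope and T0 from the intercept (intercept = -2·K·T0), and converts T0 to a calendar date by taking day 0 as Feb. 22, 2003, following the Table 1 header. Variable and function names are hypothetical.

```python
from datetime import date, timedelta

# Days of host death relative to Feb. 22, 2003 (Table 1).
T = {"BJ01": 13, "BJ02": 13, "GZ01": -13, "SIN2500": 19, "TOR2": 10, "US": 34}
# D(GZ01, Sj) for the best division (Table 2).
D_GZ01 = {"BJ02": 55, "TOR2": 53, "US": 56, "SIN2500": 53, "BJ01": 49}


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


xs = [T["GZ01"] + T[s] for s in D_GZ01]   # T + T' for each cross pair
ys = [D_GZ01[s] for s in D_GZ01]          # observed D(S, S')

slope, intercept = least_squares_line(xs, ys)
k_hat = slope                             # evolution rate, base/day
t0_hat = -intercept / (2.0 * slope)       # day of the last common ancestor

ancestor_date = date(2003, 2, 22) + timedelta(days=round(t0_hat))
print(f"K  = {k_hat:.3f} base/day")
print(f"T0 = {t0_hat:.1f} days, i.e. around {ancestor_date.isoformat()}")
```

On these five points the slope comes out close to the reported 0.16 base/day and the ancestor date falls in early September 2002, in line with the stated range of August or September 2002.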

We validated our estimate of the evolution rate by grouping the strains according to the estimated date of their pairwise last common ancestor. Applying the estimated K = 0.16, we can determine a date T0*(i, j) of the last common ancestor for each pair from E(D(S,S')) = K(T + T' - 2T0*(i, j)) (Table 2), and then divide the 6 strains into two groups, G1 = {BJ01, BJ02, GZ01} and G2 = {TOR2, SIN2500, US}. In Table 2, every pair within G2 has a last common ancestor with a date T0*(i, j) ≥ 0, while every pair that involves a member of G1 has T0*(i, j) < 0. This implies that the strains in G2 share a more recent last common ancestor than those in G1. This partition of the strains is supported by Ruan et al. [9].
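Solving the pairwise relation for the ancestor date gives T0*(i, j) = (T + T' - D/K) / 2. The following sketch recomputes that column of Table 2 from the Table 1 dates with the rounded K = 0.16. The function name is hypothetical, and small differences from the published values are expected because the authors presumably used an unrounded K.

```python
K = 0.16  # estimated evolution rate, base/day

# Days of host death relative to Feb. 22, 2003 (Table 1).
T = {"BJ01": 13, "BJ02": 13, "GZ01": -13, "SIN2500": 19, "TOR2": 10, "US": 34}

# (Si, Sj, D(Si, Sj), reported T0*(i, j)) from Table 2.
TABLE2 = [
    ("GZ01", "BJ02", 55, -172), ("GZ01", "TOR2", 53, -167), ("GZ01", "US", 56, -165),
    ("GZ01", "SIN2500", 53, -163), ("GZ01", "BJ01", 49, -153),
    ("BJ02", "TOR2", 24, -64), ("BJ02", "US", 27, -61), ("BJ02", "SIN2500", 24, -59),
    ("BJ01", "BJ02", 16, -37), ("BJ01", "TOR2", 14, -32), ("BJ01", "US", 17, -30),
    ("BJ01", "SIN2500", 14, -28),
    ("TOR2", "US", 7, 0), ("SIN2500", "TOR2", 4, 2), ("SIN2500", "US", 7, 5),
]


def pair_ancestor_day(distance, t, t_prime, k=K):
    """Solve D = K(T + T' - 2*T0) for T0."""
    return (t + t_prime - distance / k) / 2.0


estimates = {}
for si, sj, d, reported in TABLE2:
    estimates[(si, sj)] = pair_ancestor_day(d, T[si], T[sj])
    print(f"{si:8s}{sj:8s} D={d:3d}  T0*={estimates[(si, sj)]:8.2f}  (Table 2: {reported})")
```

Each recomputed value agrees with the tabulated one to within a day, which is a quick consistency check on the reconstructed tables.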

Discussion

Monte Carlo analysis was employed to test our fitting method and to explain why the errors of the estimated evolution rate and time of the last common ancestor were so large. In a simulation of the simplified evolution model, sequences were generated according to a given phylogenetic tree, with parameters including the evolution rate and the time of each sequence. Two sets of parameters were used for a common phylogenetic tree; the evolution rate was kept constant while the time parameters differed (see Fig. 3(c) and Table 3). In Model 1 the final sequences fall within a narrow time window of about two months, while in Model 2 they span a wider window of five months. Hundreds of sets of sequence data were generated by the simulation for each set of parameters, and for each set we obtained estimated parameters with our fitting method. The distribution of the estimated K (Fig. 3) supports our fitting method, as in both models the estimates of the evolution rate converged on the set value (0.2). Model 2, with the wide time window, had a narrower distribution of K, which indicates that the fitted parameter has a smaller error. The difference between the two models suggests that the narrow sampling time window partially explains the large error of the estimated K for the real data.

Ideally, an estimate of the evolution rate and the date of the last common ancestor of the SARS coronavirus should be based on sampling dates, with possible adjustments for culturing time and conditions. As such data were neither included in the submissions to GenBank nor obtainable by direct contact with the sequencing labs, we were left to choose between less ideal age estimates for the strains, such as the date of host death, the sequencing date, or the date of submission to GenBank. Sequencing dates were no more available than sampling dates, and for some groups several sequences were submitted to GenBank on the same date. In addition, a large part of the GenBank sequences were submitted long after June 2003, when no or very few SARS patients were available for sampling, which also makes the submission date an inaccurate estimate of strain time. This left us with little choice other than to accept the date of host death as the most accurate available estimate of the age of each strain. Assuming that in most cases samples were taken between a few days before and just after the death of the host, we think these dates represent acceptable, though not ideal, estimates of the endpoint of strain time.

In summary, certain inherent features of the situation around the SARS epidemic prevented our method from yielding more accurate estimates. First, as national and international efforts fortunately succeeded in stemming the spread of the fledgling epidemic by summer 2003, all the samples used to obtain the 16 sequences were collected within a relatively short period of time (two months), which makes the error of D(S0, S) relatively large. Second, because the date of host death is not a good reflection of the real time of the sequences, the error in time is quite large. Third, as useful time data for the submitted sequences were scarce, the subset of sequences available for modelling was too small. Finally, as data on pre-sequencing culturing times and conditions have not been made available, differences in evolution rates between in vivo and in vitro conditions cannot be estimated, and the basic assumption of a single constant evolution rate may not be completely valid. A model with two evolution rate parameters may produce a more accurate estimate, particularly on a larger dataset with accurate sampling and sequencing times.

Conclusions

We have proposed a mathematical model to estimate the evolution rate of the SARS coronavirus genome as well as the time of the last common ancestor of the various SARS coronavirus strains. The method is simple to implement and avoids the difficulty and subjectivity of choosing the root of the phylogenetic tree. Based on 6 strains with accurate dates of host death, we estimated a time of the last common ancestor that is consistent with epidemic investigations, and an evolution rate in the same range as that reported for the HIV-1 virus.

Competing interests

None declared.

Authors' contributions

Lu and Bu built the model, including proposing the assumptions and deriving the system of equations, and programmed and analyzed the data. Zhao, Wang, Li, Zhu, Sun and Cai collected and analyzed data. Zhang and Xu programmed and prepared the paper. Chen, Bu and Ling led the group in completing the work related to the paper.

References (8 in total)

1.  Timing the ancestor of the HIV-1 pandemic strains.

Authors:  B Korber; M Muldoon; J Theiler; F Gao; R Gupta; A Lapedes; B H Hahn; S Wolinsky; T Bhattacharya
Journal:  Science       Date:  2000-06-09       Impact factor: 47.728

2.  Characterization of a novel coronavirus associated with severe acute respiratory syndrome.

Authors:  Paul A Rota; M Steven Oberste; Stephan S Monroe; W Allan Nix; Ray Campagnoli; Joseph P Icenogle; Silvia Peñaranda; Bettina Bankamp; Kaija Maher; Min-Hsin Chen; Suxiong Tong; Azaibi Tamin; Luis Lowe; Michael Frace; Joseph L DeRisi; Qi Chen; David Wang; Dean D Erdman; Teresa C T Peret; Cara Burns; Thomas G Ksiazek; Pierre E Rollin; Anthony Sanchez; Stephanie Liffick; Brian Holloway; Josef Limor; Karen McCaustland; Melissa Olsen-Rasmussen; Ron Fouchier; Stephan Günther; Albert D M E Osterhaus; Christian Drosten; Mark A Pallansch; Larry J Anderson; William J Bellini
Journal:  Science       Date:  2003-05-01       Impact factor: 47.728

3.  Identification of a novel coronavirus in patients with severe acute respiratory syndrome.

Authors:  Christian Drosten; Stephan Günther; Wolfgang Preiser; Sylvie van der Werf; Hans-Reinhard Brodt; Stephan Becker; Holger Rabenau; Marcus Panning; Larissa Kolesnikova; Ron A M Fouchier; Annemarie Berger; Ana-Maria Burguière; Jindrich Cinatl; Markus Eickmann; Nicolas Escriou; Klaus Grywna; Stefanie Kramme; Jean-Claude Manuguerra; Stefanie Müller; Volker Rickerts; Martin Stürmer; Simon Vieth; Hans-Dieter Klenk; Albert D M E Osterhaus; Herbert Schmitz; Hans Wilhelm Doerr
Journal:  N Engl J Med       Date:  2003-04-10       Impact factor: 91.245

4.  A cluster of cases of severe acute respiratory syndrome in Hong Kong.

Authors:  Kenneth W Tsang; Pak L Ho; Gaik C Ooi; Wilson K Yee; Teresa Wang; Moira Chan-Yeung; Wah K Lam; Wing H Seto; Loretta Y Yam; Thomas M Cheung; Poon C Wong; Bing Lam; Mary S Ip; Jane Chan; Kwok Y Yuen; Kar N Lai
Journal:  N Engl J Med       Date:  2003-03-31       Impact factor: 91.245

5.  The Genome sequence of the SARS-associated coronavirus.

Authors:  Marco A Marra; Steven J M Jones; Caroline R Astell; Robert A Holt; Angela Brooks-Wilson; Yaron S N Butterfield; Jaswinder Khattra; Jennifer K Asano; Sarah A Barber; Susanna Y Chan; Alison Cloutier; Shaun M Coughlin; Doug Freeman; Noreen Girn; Obi L Griffith; Stephen R Leach; Michael Mayo; Helen McDonald; Stephen B Montgomery; Pawan K Pandoh; Anca S Petrescu; A Gordon Robertson; Jacqueline E Schein; Asim Siddiqui; Duane E Smailus; Jeff M Stott; George S Yang; Francis Plummer; Anton Andonov; Harvey Artsob; Nathalie Bastien; Kathy Bernard; Timothy F Booth; Donnie Bowness; Martin Czub; Michael Drebot; Lisa Fernando; Ramon Flick; Michael Garbutt; Michael Gray; Allen Grolla; Steven Jones; Heinz Feldmann; Adrienne Meyers; Amin Kabani; Yan Li; Susan Normand; Ute Stroher; Graham A Tipples; Shaun Tyler; Robert Vogrig; Diane Ward; Brynn Watson; Robert C Brunham; Mel Krajden; Martin Petric; Danuta M Skowronski; Chris Upton; Rachel L Roper
Journal:  Science       Date:  2003-05-01       Impact factor: 47.728

6.  A novel coronavirus associated with severe acute respiratory syndrome.

Authors:  Thomas G Ksiazek; Dean Erdman; Cynthia S Goldsmith; Sherif R Zaki; Teresa Peret; Shannon Emery; Suxiang Tong; Carlo Urbani; James A Comer; Wilina Lim; Pierre E Rollin; Scott F Dowell; Ai-Ee Ling; Charles D Humphrey; Wun-Ju Shieh; Jeannette Guarner; Christopher D Paddock; Paul Rota; Barry Fields; Joseph DeRisi; Jyh-Yuan Yang; Nancy Cox; James M Hughes; James W LeDuc; William J Bellini; Larry J Anderson
Journal:  N Engl J Med       Date:  2003-04-10       Impact factor: 91.245

7.  Coronavirus as a possible cause of severe acute respiratory syndrome.

Authors:  J S M Peiris; S T Lai; L L M Poon; Y Guan; L Y C Yam; W Lim; J Nicholls; W K S Yee; W W Yan; M T Cheung; V C C Cheng; K H Chan; D N C Tsang; R W H Yung; T K Ng; K Y Yuen
Journal:  Lancet       Date:  2003-04-19       Impact factor: 79.321

8.  Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection.

Authors:  Yi Jun Ruan; Chia Lin Wei; Ai Ling Ee; Vinsensius B Vega; Herve Thoreau; Se Thoe Yun Su; Jer-Ming Chia; Patrick Ng; Kuo Ping Chiu; Landri Lim; Tao Zhang; Chan Kwai Peng; Ean Oon Lynette Lin; Ng Mah Lee; Sin Leo Yee; Lisa F P Ng; Ren Ee Chee; Lawrence W Stanton; Philip M Long; Edison T Liu
Journal:  Lancet       Date:  2003-05-24       Impact factor: 79.321
