Characterizing 56 complete SARS-CoV S-gene sequences from Hong Kong.

Julian W Tang, Jo L K Cheung, Ida M T Chu, Margaret Ip, Mamie Hui, Malik Peiris, Paul K S Chan.

Abstract

BACKGROUND: The spike glycoprotein (S) gene of the severe acute respiratory syndrome-associated coronavirus (SARS-CoV) has been useful in analyzing the molecular epidemiology of the 2003 SARS outbreaks.
OBJECTIVES: To characterize complete SARS-CoV S-gene sequences from Hong Kong.
STUDY DESIGN: Fifty-six SARS-CoV S-gene sequences, obtained from patients who presented with SARS to the Prince of Wales Hospital during March–May 2003, were analysed using a maximum likelihood (ML) approach, together with 138 other (both human and animal) S-gene sequences downloaded from GenBank.
RESULTS: The ML trees showed little evolution occurring within these 56 sequences. Analysis with the other sequences showed three distinct SARS clusters, closely correlated with the previously defined early, middle and late phases of the 2003 international SARS outbreaks. In addition, two new single nucleotide variations (SNVs), T21615A and T21901A, not previously reported elsewhere, were discovered.
CONCLUSIONS: The ML approach to the reconstruction of tree phylogenies is known to be superior to the more popular, less computationally and time-demanding neighbour-joining (NJ) approach. The ML analysis in this study confirms the previously reported SARS epidemiology analysed mostly using the NJ approach. The two new SNVs reported here are most likely due to the tissue-culture passaging of the clinical samples.

Year:  2006        PMID: 17112780      PMCID: PMC7108452          DOI: 10.1016/j.jcv.2006.10.001

Source DB:  PubMed          Journal:  J Clin Virol        ISSN: 1386-6532            Impact factor:   3.168


Introduction

Since the worldwide severe acute respiratory syndrome (SARS) epidemic of 2003, many researchers have attempted to determine the natural reservoir of the SARS-associated coronavirus (SARS-CoV). Studies on the possible animal source of SARS-CoV have mainly focused on the Himalayan palm civet (Paguma larvata), though other animals (e.g. the raccoon dog, Nyctereutes procyonoides) have been shown to carry coronaviruses closely related to SARS-CoV. Of note, these related animal coronaviruses possess a 29 base-pair sequence (position 27,869–27,897) in the putative open reading frame (ORF) 11 (Marra et al., 2003) that is absent in most human SARS-CoV isolates (Guan et al., 2003, Song et al., 2005). A more recent study on SARS-CoV-like coronaviruses from palm civets and raccoon dogs from various live markets in China identified a series of single nucleotide variations (SNVs) in the SARS-CoV spike (S) glycoprotein gene (Kan et al., 2005). The authors suggested that these SNVs marked the transmission of SARS-CoV-like viruses from these animals into humans during various phases of the SARS epidemic. Most recently, SARS-like CoVs have been found in bats, though the S-gene homology with SARS-CoV is low (around 80%), and it is still uncertain whether bats are the true natural reservoir for SARS-CoV (Lau et al., 2005, Li et al., 2005, Poon et al., 2005a, Poon et al., 2005b).
Guan et al. (2004) described the genetic variation of the first 2149 bp of the S-gene, mainly focusing on the S1 (amino) region. That study compared local SARS-CoV S1 gene sequences collected from 137 Hong Kong SARS patients from February to April 2003 with 27 other sequences then available from GenBank, using neighbour-joining (NJ) phylogenetic analysis. The authors concluded that the international SARS epidemics were caused by closely related SARS-CoVs. However, some strains isolated from Hong Kong in the early phase of the epidemic were distinct from the international outbreak strain, and may have represented transitory strains of SARS-CoV (Chim et al., 2004). Furthermore, analysis of SARS-CoV strains subsequently isolated from China suggested that this virus may have been circulating there, unidentified, for some time before the international epidemic occurred (Chinese SARS Molecular Epidemiology Consortium, CSMEC, 2004, Zhao et al., 2004).
In order to further characterize the evolution of SARS-CoV, S-gene sequences of isolates from patients in Hong Kong were analysed using a maximum likelihood (ML) method, and any new SNVs were documented.

Materials and methods

Patients, sample collection and processing

Patients with laboratory-confirmed SARS-CoV infection admitted to the New Territories East Cluster Hospitals in Hong Kong were included (Chan et al., 2004). For each patient, a stored original clinical sample (CS), or its primary isolate from tissue culture (TC) when the original sample had been exhausted, was retrieved for this study. The nucleotide sequence of the whole S-gene was obtained by direct sequencing. Briefly, the extracted RNA was amplified with two sets of overlapping primers using the Superscript™ III One-Step RT-PCR System with Platinum Taq DNA polymerase (Invitrogen, Life Technology, Carlsbad, CA). Purified PCR products were sequenced with six internal sequencing primers and the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA). The sequences were edited and aligned using the Seqscape v2.1.1 software (Applied Biosystems, Foster City, CA). All 56 Hong Kong S-gene sequences obtained in this study were deposited in the National Center for Biotechnology Information Genome Database (GenBank, accession numbers: DQ412574–DQ412629).

SARS-CoV S-gene phylogenetic analysis

The S-gene sequences were aligned using the Clustal X multiple alignment algorithm, then manually checked and edited using Bioedit v7.0.4.1 (Tippmann, 2004). After alignment and editing, the total length of the S-gene sequences was 3759 bp. Maximum likelihood (ML) trees were then constructed using the program PAUP* v4.0b10 (Swofford, 2001) under an optimum model of evolution as selected by the program Modeltest v3.7 (Posada and Crandall, 1998). The robustness of the trees’ topology was statistically assessed by bootstrap analysis, with a minimum of 1000 rounds of replication. To put these 56 Chinese University of Hong Kong (CUHK) S-gene sequences in context, 138 other S-gene sequences were downloaded from GenBank, and a tree of all 194 S-gene sequences was constructed using the same methods. Both trees were rooted against a civet cat sequence (CIV007, AY572034), and plotted using NJPlot (Perrière and Gouy, 1996).
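The bootstrap assessment described above rests on resampling alignment columns with replacement and rebuilding a tree from each pseudo-replicate. The following is a minimal sketch of the resampling step only, in plain Python. The function names, the seed and the toy alignment are illustrative assumptions; the tree search itself, which the authors ran in PAUP*, is not reproduced here.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one pseudo-replicate built from columns drawn with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw as many column indices as there are sites; repeats are expected.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str],
                         n_replicates: int = 1000,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Generate n_replicates resampled alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (hypothetical sequences, not data from the study).
toy = {
    "sampleA": "ATGTTTATTT",
    "sampleB": "ATGTTTATTA",
    "root": "ATGCTTATTA",
}
replicates = bootstrap_replicates(toy, n_replicates=1000)
```

Each replicate would then be passed to the tree-building step, and the bootstrap value of a branch is the percentage of replicate trees in which that branch appears.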

Results

A total of 56 SARS patients were included in this study (mean age 54.8, S.D. 21.9 years, 21 males). These patients all became ill during March–May 2003. Samples for analysis were collected a mean of 5.9 (S.D. 4.8) days after illness onset. Table 1 shows the specimen types and numbers used in this study: 11 were original clinical samples (CS) and 45 were primary tissue culture isolates (TC).
Table 1

Types of specimens used for S-gene sequencing

| Specimen type | No. of specimens tested | Clinical sample (CS) | Tissue culture isolate (TC) |
|---|---|---|---|
| Respiratory (a) | 44 | 4 | 40 |
| Stool | 5 | 5 | 0 |
| Urine | 2 | 2 | 0 |
| Tissue (b) | 5 | 0 | 5 |
| Total | 56 | 11 | 45 |

(a) Includes nasopharyngeal aspirate, throat and nasal swabs.
(b) Includes four lung and one terminal ileum biopsy samples.

The ML trees in Fig. 1 and Fig. 2 were both constructed using a Kimura three-parameter model of base substitution with unequal base frequencies, a proportion of invariant sites and a gamma-distributed site substitution rate (K81uf + I + G), as selected by Modeltest under the Akaike Information Criterion, using a nearest-neighbor interchange heuristic search strategy in PAUP*.
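Model selection under the Akaike Information Criterion, as mentioned above, amounts to scoring each candidate substitution model as AIC = 2k − 2 ln L (k free parameters, ln L the maximised log-likelihood) and keeping the lowest score. The sketch below illustrates that comparison in Python. The log-likelihood values are invented for illustration and are not results from this study, the parameter counts exclude branch lengths, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the name of the model with the smallest AIC."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: model name -> (ln L, number of free model parameters).
# These numbers are made up for the example; they are NOT from the paper.
fits = {
    "JC": (-6120.4, 0),
    "HKY+G": (-6015.2, 5),
    "K81uf+I+G": (-6001.7, 7),
    "GTR+I+G": (-6000.9, 10),
}

for name, (lnl, k) in fits.items():
    print(f"{name:10s} lnL={lnl:9.1f} k={k:2d} AIC={aic(lnl, k):9.1f}")
print("selected:", best_model(fits))
```

With these invented values the richest model (GTR+I+G) has the highest likelihood, but its extra parameters are penalised, so the intermediate model scores best. That trade-off is the point of using an information criterion rather than likelihood alone.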
Fig. 1

Maximum likelihood phylogram of the SARS-CoV S-gene from 56 Hong Kong patients. Note: Constructed under a Kimura three-parameters with unequal base frequencies (K81uf) model of evolution with invariable (I) sites and a gamma (G) distributed rate of substitution (i.e. K81uf + I + G), as selected by Modeltest (v3.7) under the Akaike Information Criteria (AIC), using a nearest-neighbor interchange (NNI) heuristic search strategy in PAUP*. The dates in the sample names are dates of fever onset for that patient. The scale for the branch lengths is indicated (number of nucleotide substitutions per site). There were no branches with bootstrap values >70. GenBank accession numbers for the 56 S-gene sequences: DQ412574–DQ412629. CUHK: Chinese University of Hong Kong; TC: tissue cultured isolate; CS: clinical sample; NP: nasopharyngeal aspirate; UR: urine; NS: nasal swab; SP: sputum; ST: stool; TS: throat swab; RS: rectal swab; TG: throat gargle; TI: terminal ileal biopsy; RL: right lung biopsy; L: lung biopsy.

Fig. 2

Maximum likelihood phylogram of the SARS-CoV S-gene from 56 Hong Kong samples and 138 downloaded S-gene sequences from GenBank. Note: Constructed under a Kimura three-parameters with unequal base frequencies (K81uf) model of evolution with invariable (I) sites and a gamma (G) distributed rate of substitution (i.e. K81uf + I + G), as selected by Modeltest (v3.7) under the Akaike Information Criteria (AIC), using a nearest-neighbor interchange (NNI) heuristic search strategy in PAUP*. The dates in the Hong Kong sample names are dates of fever onset for that patient, from whom the sample was taken. The scale for the branch lengths is indicated (number of nucleotide substitutions per site). The boxes highlight branches containing samples showing significant differences in their S-gene sequences from the rest of the group. Cluster 1: ‘mainland’ (solid-line box); cluster 2: ‘Guangdong’ (long-dashed line box); cluster 3: ‘worldwide’ (short-dotted line box). CUHK: Chinese University of Hong Kong; TC: tissue cultured isolate; CS: clinical sample; NP: nasopharyngeal aspirate; UR: urine; NS: nasal swab; SP: sputum; ST: stool; TS: throat swab; RS: rectal swab; TG: throat gargle; TI: terminal ileal biopsy; RL: right lung biopsy; L: lung biopsy. The names of the downloaded sequences from GenBank have been retained as far as possible, though some truncation has been applied in some cases.

Fig. 1 shows the S-gene sequences obtained from the 56 Hong Kong patients, rooted against CIV007. The dates in the sample names are dates of fever onset for that patient. Despite the apparent multiple branching topology, there are no significant branches (all bootstrap values < 70). The homology of these 56 CUHK S-gene sequences was 99–100%.
Fig. 2 shows the position of these 56 sequences within the other 138 S-gene sequences downloaded from GenBank. Only bootstrap values > 70 are shown. Three distinct clusters can be seen, the sequences of which have been characterized and discussed elsewhere (CSMEC, 2004, Kan et al., 2005, Lan et al., 2005, Song et al., 2005, Wang et al., 2005, Yeh et al., 2004). All the CUHK S-gene sequences lie within cluster 3 ‘worldwide’, together with other S-gene sequences from Hong Kong (CUHK-W1, AY278554; CUHK-AG01, AY345986; CUHK-AG02, AY345987; CUHK-AG03, AY345988; CUHK-Su10, AY282752; HKU-39849, AY278491).
When aligned and compared with the TOR-2 reference strain (AY274119), four single nucleotide variations (SNVs) were found in a large proportion of the CUHK S-gene sequences, as shown in Table 2.
Table 2

Single nucleotide variations (SNVs) present in the 56 CUHK SARS-CoV S-gene sequences

| SNV position | Base change | Amino acid change | Total no. of samples with SNV (%) | No. of CS samples containing SNV (%) | No. of TC samples containing SNV (%) |
|---|---|---|---|---|---|
| 21,615 | T to A | Tyrosine to stop | 21 (37.5) | 2 (9.5) | 19 (90.5) |
| 21,901 | T to A | Phenylalanine to isoleucine | 18 (32.1) | 0 (0) | 18 (100) |
| 23,220 | G to T | Alanine to serine | 56 (100) | 11 (100) | 45 (100) |
| 25,114 | C to T | Alanine to valine | 17 (30.4) | 7 (63.6) | 10 (22.2) |
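As a rough illustration of how variations like those in Table 2 can be identified, the sketch below compares an aligned sample with a reference, reports each difference in genome coordinates and translates the affected codon. It is not the authors' workflow. The sequences are toy examples, and `genome_start` (the genome coordinate of the first base of the coding sequence) and all function names are hypothetical. Only the standard genetic code table is taken as given.

```python
from typing import List, NamedTuple

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


class SNV(NamedTuple):
    position: int  # 1-based genome coordinate
    ref_base: str
    alt_base: str
    ref_aa: str    # amino acid encoded by the reference codon
    alt_aa: str    # amino acid encoded by the sample codon ('*' = stop)


def call_snvs(reference: str, sample: str, genome_start: int) -> List[SNV]:
    """Compare two aligned, gap-free, in-frame coding sequences."""
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snvs = []
    for i, (r, s) in enumerate(zip(reference, sample)):
        if r == s:
            continue
        codon_start = i - (i % 3)  # first base of the codon containing site i
        ref_codon = reference[codon_start:codon_start + 3]
        alt_codon = sample[codon_start:codon_start + 3]
        snvs.append(SNV(genome_start + i, r, s,
                        CODON_TABLE.get(ref_codon, "?"),
                        CODON_TABLE.get(alt_codon, "?")))
    return snvs


# Toy example: codons ATG TAT GCT encode Met, Tyr, Ala.
reference = "ATGTATGCT"
stop_variant = "ATGTAAGCT"    # TAT -> TAA: tyrosine to stop
serine_variant = "ATGTATTCT"  # GCT -> TCT: alanine to serine

print(call_snvs(reference, stop_variant, genome_start=100))
print(call_snvs(reference, serine_variant, genome_start=100))
```

The two toy variants mirror the kinds of change listed in the table: a T to A substitution that turns a tyrosine codon into a stop codon, and a G to T substitution that changes alanine to serine.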

Discussion

Since the end of the 2003 SARS epidemic, SARS-CoV has reappeared only a few times, sporadically, affecting only a few people (WHO, 2003, Lim et al., 2004, WHO, 2004a, WHO, 2004b). No further SARS epidemics have been identified. Therefore, the 56 samples analysed in this study, collected over a 3-month period during the SARS epidemic in Hong Kong (March–June 2003), represent one of the longest continuous periods over which SARS-CoV has been investigated for any viral mutation and evolution within a human population.
As shown in Fig. 1, these 56 CUHK sequences are all very similar, and they lie within the cluster 3 ‘worldwide’ S-gene sequences of Fig. 2. The pattern in Fig. 2 shows that SARS-CoV S-gene sequences isolated from some Guangdong patients and animals from Beijing and Guangdong cluster together (cluster 1 ‘mainland’). This correlates well with the early phase of the SARS epidemic, as has been defined previously (CSMEC, 2004, Song et al., 2005). Next, there is a transitional region, most likely representing the spread of the SARS epidemic within Guangdong, then out to Beijing and Hong Kong, and then to the rest of the world (cluster 2 ‘Guangdong’). This corresponds well with the late-early and middle phases of the SARS epidemic (CSMEC, 2004, Ruan et al., 2003, Song et al., 2005). Finally, the S-gene sequences from this study, together with those obtained from outside Hong Kong and China (cluster 3 ‘worldwide’), are representative of the middle-to-late phase of the SARS epidemics (CSMEC, 2004).
It has been well established for almost 20 years that the ML approach is more accurate for reconstructing tree phylogenies, though the NJ approach is still widely used as it is much less demanding in terms of computational power and time (Hillis et al., 1994, Huelsenbeck and Hillis, 1993, Page and Holmes, 1998, Saitou, 1988, Saitou and Imanishi, 1989). Therefore, it is reassuring that this more complex and demanding ML approach, using optimal models of evolution as described here, with these complete S-gene sequences, independently supports previous NJ analysis of the epidemiology of the 2003 SARS outbreaks (e.g. the Kimura two-parameter distance method used by the CSMEC, 2004, Fig. 6). Due to the availability of more data, Fig. 2 contains more sequences spread over a longer time-frame, and the clusters appear more clearly defined than in the similar figure presented by the CSMEC (2004, Fig. 6).
It is possible that using tissue cultured (TC) isolates for S-gene sequence analysis may have introduced some new mutations not otherwise present in the original clinical samples (CS). Poon et al. (2005a, 2005b) reported non-synonymous mutations occurring in the S, membrane and ORF 8a protein coding genes during the passaging of SARS-CoV through non-human primate cell lines (FRhK4 and Vero E6). Within the S-gene region they found mutations at the following positions (with respect to TOR-2): C23412T, A23473G, C23518T, C23632T, C24864T, and A24978G. Tong et al. (2004) also reported two tissue-culture (Vero E6)-associated mutations (with reference to the SARS-CoV Urbani strain): T21938C and T24872C. In this study, we did not find any of these previously reported mutations. However, there were mutations found mainly in the TC samples (T21615A and T21901A) that have not been previously reported elsewhere. These are most likely to be tissue-culture associated mutations. It is interesting that the T21615A mutation results in a stop codon, yet still manages to produce a viable virus in tissue culture.
Several studies have documented various SNVs in the SARS-CoV genome, with some even proposing them as the basis for a genotypic classification (Poon et al., 2005a, Poon et al., 2005b, Ruan et al., 2003, Tsui et al., 2003, Wang et al., 2005, Yeh et al., 2004).
In this study, the SNV G23220T that was found in all of the patients (56/56, 100%) has been reported elsewhere (Yeh et al., 2004), though the functional significance of this is presently uncertain. The 100% prevalence of this SNV in these samples strongly suggests that a non-TOR-2-like strain of SARS-CoV has infected these Hong Kong patients, rather than this being an example of natural selection or of SARS-CoV behaving as a quasispecies. The second, less frequently found SNV (C25114T) present in 17/56 (30.4%) of these S-gene sequences has also been found in four sequences from Guangdong (GUAN_LC2, AY394999; GUAN_LC3, AY395000; GUAN_LC4, AY395001; GUAN_LC5, AY395002, but not in GUAN_LC1, AY394998), of which GUAN_LC3 and GUAN_LC5 have been previously reported by Wang et al. (2005). The 17 CUHK S-gene sequences containing this polymorphism all occur in samples collected towards the end of the SARS outbreaks in Hong Kong (from 4 April to 31 May 2003). Unfortunately, we do not know on what date the GUAN_LC2/LC3/LC4/LC5 samples were taken (they are unpublished sequences that were submitted to GenBank on 19 September 2003), and it is therefore difficult to ascertain whether this SNV represents ongoing host-adaptive evolution. All of these 56 CUHK samples came from patients admitted to Hong Kong's New Territories East hospital cluster during this period. The 17/56 patients carrying this C25114T mutation may have acquired their virus via a chain of contacts infected with this particular virus, or this mutation occurred in all of these viruses independently, perhaps in response to some common, unknown selection pressure. Therefore, without a more accurate epidemic history for this particular virus, it is hard to say whether this is a founder effect or natural selection acting upon the virus towards the end of the Hong Kong 2003 SARS outbreaks. 
In summary, the analysis of 56 complete SARS-CoV S-gene sequences from Hong Kong in this study using ML analysis confirms the previous epidemiology of SARS-CoV obtained using mostly NJ analysis. It also describes two new SNVs that are most likely to be associated with the tissue-culture passage of SARS-CoV.
References (first 10 of 25)

1.  Coronavirus genomic-sequence variations and the epidemiology of the severe acute respiratory syndrome.

Authors:  Stephen K W Tsui; Stephen S C Chim; Y M Dennis Lo
Journal:  N Engl J Med       Date:  2003-07-10       Impact factor: 91.245

2.  Analysis for free: comparing programs for sequence analysis.

Authors:  Helge-Friedrich Tippmann
Journal:  Brief Bioinform       Date:  2004-03       Impact factor: 11.622

3.  Laboratory-acquired severe acute respiratory syndrome.

Authors:  Poh Lian Lim; Asok Kurup; Gowri Gopalakrishna; Kwai Peng Chan; Christopher W Wong; Lee Ching Ng; Su Yun Se-Thoe; Lynette Oon; Xinlai Bai; Lawrence W Stanton; Yijun Ruan; Lance D Miller; Vinsensius B Vega; Lyn James; Peng Lim Ooi; Chew Suok Kai; Sonja J Olsen; Brenda Ang; Yee-Sin Leo
Journal:  N Engl J Med       Date:  2004-04-22       Impact factor: 91.245

4.  MODELTEST: testing the model of DNA substitution.

Authors:  D Posada; K A Crandall
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

5.  Property and efficiency of the maximum likelihood method for molecular phylogeny.

Authors:  N Saitou
Journal:  J Mol Evol       Date:  1988       Impact factor: 2.395

6.  Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China.

Authors:  Y Guan; B J Zheng; Y Q He; X L Liu; Z X Zhuang; C L Cheung; S W Luo; P H Li; L J Zhang; Y J Guan; K M Butt; K L Wong; K W Chan; W Lim; K F Shortridge; K Y Yuen; J S M Peiris; L L M Poon
Journal:  Science       Date:  2003-09-04       Impact factor: 47.728

7.  Molecular evolution and multilocus sequence typing of 145 strains of SARS-CoV.

Authors:  Zhi-Gang Wang; Zhi-Hua Zheng; Lei Shang; Lan-Juan Li; Li-Ming Cong; Ming-Guang Feng; Yun Luo; Su-Yun Cheng; Yan-Jun Zhang; Miao-Gui Ru; Zan-Xin Wang; Qi-Yu Bao
Journal:  FEBS Lett       Date:  2005-09-12       Impact factor: 4.124

8.  Laboratory diagnosis of SARS.

Authors:  Paul K S Chan; Wing-Kin To; King-Cheung Ng; Rebecca K Y Lam; Tak-Keung Ng; Rickjason C W Chan; Alan Wu; Wai-Cho Yu; Nelson Lee; David S C Hui; Sik-To Lai; Ellis K L Hon; Chi-Kong Li; Joseph J Y Sung; John S Tam
Journal:  Emerg Infect Dis       Date:  2004-05       Impact factor: 6.883

9.  Molecular epidemiology of the novel coronavirus that causes severe acute respiratory syndrome.

Authors:  Y Guan; J S M Peiris; B Zheng; L L M Poon; K H Chan; F Y Zeng; C W M Chan; M N Chan; J D Chen; K Y C Chow; C C Hon; K H Hui; J Li; V Y Y Li; Y Wang; S W Leung; K Y Yuen; F C Leung
Journal:  Lancet       Date:  2004-01-10       Impact factor: 79.321

10.  Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection.

Authors:  Yi Jun Ruan; Chia Lin Wei; Ai Ling Ee; Vinsensius B Vega; Herve Thoreau; Se Thoe Yun Su; Jer-Ming Chia; Patrick Ng; Kuo Ping Chiu; Landri Lim; Tao Zhang; Chan Kwai Peng; Ean Oon Lynette Lin; Ng Mah Lee; Sin Leo Yee; Lisa F P Ng; Ren Ee Chee; Lawrence W Stanton; Philip M Long; Edison T Liu
Journal:  Lancet       Date:  2003-05-24       Impact factor: 79.321
