
Mass spectrometric characterization of proteins from the SARS virus: a preliminary report.

Oleg Krokhin, Yan Li, Anton Andonov, Heinz Feldmann, Ramon Flick, Steven Jones, Ute Stroeher, Nathalie Bastien, Kumar V N Dasuri, Keding Cheng, J Neil Simonsen, Hélène Perreault, John Wilkins, Werner Ens, Frank Plummer, Kenneth G Standing.

Abstract

A new coronavirus has been implicated as the causative agent of severe acute respiratory syndrome (SARS). We have used convalescent sera from several SARS patients to detect proteins in the culture supernatants from cells exposed to lavage from another SARS patient. The most prominent protein in the supernatant was identified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) as an approximately 46-kDa species. This was found to be a novel nucleocapsid protein that matched almost exactly one predicted by an open reading frame in the recently published nucleotide sequence of the same virus isolate (>96% coverage). A second viral protein corresponding to the predicted approximately 139-kDa spike glycoprotein has also been examined by MALDI-TOF MS (42% coverage). After peptide N-glycosidase F digestion, 12 glycosylation sites in this protein were confirmed. The sugars attached to four of the sites were also identified. These results suggest that the nucleocapsid protein is a major immunogen that may be useful for early diagnostics, and that the spike glycoprotein may present a particularly attractive target for prophylactic intervention in combating SARS.

Year:  2003        PMID: 12775768      PMCID: PMC7780042          DOI: 10.1074/mcp.M300048-MCP200

Source DB:  PubMed          Journal:  Mol Cell Proteomics        ISSN: 1535-9476            Impact factor:   5.911


The recent clinical identification of a novel type of atypical pneumonia without a clearly defined etiology, together with epidemiological evidence of high transmissibility, has prompted the World Health Organization to issue a rare travel advisory. The new entity has been called severe acute respiratory syndrome (SARS); it apparently began in Guangdong province in China in November of 2002 and has since spread to Hong Kong, Singapore, Vietnam, Canada, the U.S., Taiwan, and several European countries. The outbreak in Canada began in late February 2003 in a traveler returning from Hong Kong whose exposure was to the index case in the Hong Kong epidemic (a physician who had cared for SARS cases in Guangdong province in the People's Republic of China). The Canadian index case died 9 days after the disease onset, and a 43-year-old male relative became ill 2 days after exposure and died of the adult respiratory distress syndrome 15 days after the illness began (1). Subsequently, Canada has faced the largest SARS outbreak outside of Asia, with at least 351 probable or suspected cases and 27 deaths, mostly in the Toronto area (2, 3). Samples from patients with suspected or probable SARS in Canada have been referred to the National Microbiology Laboratory (NML), Health Canada, for laboratory diagnostics. This laboratory, part of the Canadian Science Centre for Human and Animal Health, is Canada's national reference center for infectious diseases and houses the only Class 4 containment facilities in the country. NML has played an active role in an intensive international collaborative effort among 11 laboratories around the world that suggested a distinct coronavirus may be etiologically involved. In particular, the laboratory prepared the nucleotide samples for the first successful effort to determine the genome sequence for the coronavirus (4), a result soon confirmed by several other laboratories (see, for example, Ref. 5).
Nevertheless, the genome sequence merely provides a template for the construction of the viral proteins. Thus, an alternative strategy is to examine the proteins themselves, and mass spectrometry has proved to be an efficient tool for this purpose (6). The University of Manitoba time-of-flight mass spectrometry laboratory has already been active in characterizing viral proteins (7, 8, 9, 10, 11), so it was natural for NML to enlist the university laboratory (late in March) as a collaborator in the analysis of the SARS proteins. The first results of this collaboration are described below.

EXPERIMENTAL PROCEDURES

Preparation of the Primary Material at NML—

Clinical specimens obtained from the original case cluster were extensively investigated for the presence of bacterial and viral pathogens (1). Nasopharyngeal swab and bronchoalveolar lavage fluids from several of these patients were found to be positive by reverse transcription-PCR for human metapneumovirus and the novel coronavirus (1). Inoculation of the bronchoalveolar lavage fluid from the 43-year-old male patient into Vero E6 cells produced a strong cytopathic effect on day 4 after infection. The second passage of this viral isolate was further used to produce large quantities of the virus. Initially, this virus material was used to assess its antigenicity with convalescent serum samples from SARS patients. The convalescent sera that were previously found to be positive for antibodies to the virus by indirect immunofluorescence assay reacted strongly in Western blot with a ∼46-kDa protein (Fig. 1 A) similar in size to the nucleocapsid protein of coronaviruses (12).
Fig. 1. Analysis of proteins from a new coronavirus associated with SARS. A, Viral particles pelleted from the supernatant of Vero E6 cells exposed to samples derived from patients with SARS (lane 1) or from mock-infected cells processed in a similar fashion (lane 2) were separated by SDS-PAGE and analyzed by Western blot with convalescent sera from SARS patients (Tor 2, Tor 3, Tor 4, and BC 1) or a control serum from a noninfected donor (NML). The sera from the patients, but not the control, reacted with a 44- to 48-kDa species present in the supernatants from the infected but not the mock-treated cultures (indicated by arrowhead). B, A virus sample similar to that described in A was fractionated on a sucrose gradient. The fraction containing immunoreactive material was separated on a 4–12% bis acrylamide gradient gel and stained with colloidal Coomassie blue. A prominent band with an apparent molecular mass of 44–48 kDa was observed along with a much less intense band at ∼180 kDa (indicated by arrowheads). These bands were excised and used for the mass spectrometric studies described in this report.

In order to prepare this protein (and perhaps other SARS-related proteins) for proteolytic digestion, the virus was purified on a 20–60% linear sucrose gradient. Western blotting of the gradient fractions showed that fraction 4 (density, 1.18 g/cm³) reacted strongly with a convalescent serum from a SARS patient. This fraction was run on a Novex 4–12% Bis-Tris gel in 4-morpholinepropanesulfonic acid running buffer (Invitrogen), and stained with Coomassie blue (Fig. 1 B). Two bands were then excised from the gel (indicated by arrowheads), one containing the prominent ∼46-kDa protein and the other containing a much weaker protein band with an apparent mass of ∼180 kDa. These were transferred to the university laboratories for in-gel digestion with various proteolytic enzymes.

Proteolytic Digestions—

The excised protein bands were in-gel digested with one of three different enzymes (sequencing grade-modified trypsin (Promega, Madison, WI), Lys-C, or Asp-N (both from Roche Molecular Biochemicals)). Digestions were performed according to the procedure described by Shevchenko et al. (13) either in ordinary water or else in a 1:1 H₂¹⁶O:H₂¹⁸O mixture (14, 15, 16) prepared from 98% H₂¹⁸O (Isotec, Miamisburg, OH) and ordinary water. Unless otherwise noted, all other chemicals were purchased from Sigma. The extracts containing the peptide mixture were lyophilized and resuspended in 5.5 μl of 0.5% trifluoroacetic acid in water, then 0.5 μl of the resulting sample was mixed 1:1 with 2,5-dihydroxybenzoic acid (150 mg/ml in water:acetonitrile 1:1) matrix solution and deposited on the gold surface of a matrix-assisted laser desorption/ionization (MALDI) target. The remaining 5 μl was separated into fractions by micro-high-performance liquid chromatography (μHPLC), and the individual fractions were deposited on a target for subsequent mass spectrometric analysis.

Chromatography—

Chromatographic separations were performed using an Agilent 1100 Series system (Agilent Technologies, Wilmington, DE). Deionized (18 MΩ) water and HPLC-grade acetonitrile were used for the preparation of eluents. Samples (5 μl) were injected onto a 150 μm × 150 mm column (Vydac 218 TP C18, 5 μm; Grace Vydac, Hesperia, CA) and eluted with a linear gradient of 1–80% acetonitrile (0.1% trifluoroacetic acid) in 60 min. The column effluent (4 μl/min) was mixed on-line with dihydroxybenzoic acid matrix solution (0.5 μl/min) and deposited by a small computer-controlled robot onto a movable gold target at 1-min intervals (17). The vast majority of the tryptic fragments were eluted within 40 min under the HPLC conditions used, so 40 fractions were normally collected.
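The fraction-collection arithmetic above can be sketched as follows. This is only an illustration: it assumes an ideal linear gradient with no dwell or dead volume (the text does not state these), so the percentages are programmed values rather than actual elution compositions, and all function names are hypothetical.

```python
from typing import Tuple

# Nominal values taken from the text; the idealized gradient is an assumption.
GRADIENT_START_PCT = 1.0    # % acetonitrile at t = 0
GRADIENT_END_PCT = 80.0     # % acetonitrile at t = 60 min
GRADIENT_MINUTES = 60.0
COLUMN_FLOW_UL_MIN = 4.0    # column effluent
MATRIX_FLOW_UL_MIN = 0.5    # matrix solution added on-line
FRACTION_MINUTES = 1.0      # one spot deposited per minute


def programmed_acetonitrile_pct(t_min: float) -> float:
    """Programmed % acetonitrile at time t (linear gradient, no delay volume)."""
    slope = (GRADIENT_END_PCT - GRADIENT_START_PCT) / GRADIENT_MINUTES
    return GRADIENT_START_PCT + slope * t_min


def fraction_window_pct(fraction: int) -> Tuple[float, float]:
    """Programmed composition at the start and end of a 1-based fraction number."""
    start = (fraction - 1) * FRACTION_MINUTES
    return (programmed_acetonitrile_pct(start),
            programmed_acetonitrile_pct(start + FRACTION_MINUTES))


def spot_volume_ul() -> float:
    """Liquid deposited per spot: effluent plus matrix over one fraction interval."""
    return (COLUMN_FLOW_UL_MIN + MATRIX_FLOW_UL_MIN) * FRACTION_MINUTES
```

Under these assumptions each spot receives 4.5 μl of liquid, and the 40 collected fractions span roughly the first two-thirds of the programmed gradient.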

Glycoprotein Analysis—

Our original intention was to postpone any detailed analysis of the higher mass protein to a subsequent investigation. Later, when we decided to include this effort in the present measurements, the only materials that we had available were two lyophilized samples from digests of the larger protein (∼180-kDa band), one from a tryptic digest and one from a Lys-C digest. The sample from the tryptic digest was separated by HPLC and used for analysis of the glycosylated peptides. The sample from the Lys-C digest was digested twice more, first by peptide N-glycosidase F (PNGase F; Roche Molecular Biochemicals) to remove the asparagine-linked glycosylation (18), then by trypsin to produce smaller fragments (both digestions in ordinary water).

TOF Mass Spectrometry—

The spots on the gold targets were analyzed individually, both by single mass spectrometry (MS) and by tandem mass spectrometry (MS/MS) in the Manitoba/Sciex prototype quadrupole/TOF (QqTOF) mass spectrometer (subsequent commercial model sold as QSTAR by Applied Biosystems/MDS Sciex, Foster City, CA) (19). In this instrument, ions are produced by irradiation of the target with photon pulses from a 20-Hz nitrogen laser (VSL 337ND, Spectra-Physics, Mountain View, CA) with 300 μJ energy per pulse. Orthogonal injection of ions from the quadrupole into the TOF section normally produces a mass resolving power of 10,000 (full width at half-maximum) and an accuracy within a few millidaltons in the TOF spectra in both MS and MS/MS modes, as long as the ion peak is reasonably intense.

RESULTS

Mass Spectra from Proteolytic Digests of the ∼46-kDa Protein—

Fig. 2 A shows the m/z spectrum of the mixture of peptides resulting from tryptic digestion of the ∼46-kDa protein in ordinary water, before HPLC fractionation. Note that Tx–y indicates a tryptic fragment containing amino acid residues x to y in Fig. 2 and in subsequent tables and discussion. A small region of this spectrum is expanded in Fig. 2 B, and an HPLC fraction containing some of the same ions is shown in Fig. 2 C. Here, the most intense ion in Fig. 2 B has moved to a different fraction, but some of the weaker ions are much more prominent. It is clear that individual peptide peaks are considerably easier to distinguish after HPLC separation; spectra of the fractions are dramatically simpler and have a signal-to-noise ratio improved by a factor of ∼10 or more.
Fig. 2. A, Single MS MALDI-QqTOF spectrum of the peptide mixture obtained from tryptic digestion of the 46-kDa protein prior to HPLC fractionation. B, Expanded view of a small section of the MALDI mass spectrum in A. C, The same section of the spectrum obtained from fraction 23, after HPLC separation of the mixture. The peak labels indicate the residue numbers corresponding to the intact protein; in one case loss of 64 Da is indicated. The intense peak corresponding to T277–293 in the mixture is absent in fraction 23 (it elutes in fraction 21), but several weaker peaks that are present, like T389–405, are significantly enhanced by the HPLC. The improvement helps to identify them and is essential for high mass accuracy and for subsequent MS/MS analysis. Measured and predicted masses for all the tryptic peptides can be found in Table II; Δm is less than 10 mDa in nearly every case.
Table II. Measured m/z and calculated MH+

M* represents oxidized methionine residues; Ac-SDN... is the acetylated N-terminal of the protein; Q** represents the N-terminal Gln residues converted into pyro-Glu; and N*** represents an Asn residue converted into Asp due to deamidation.

m/z measured  MH+ calculated (Da)  Δm (mDa)  Residues start-end  Peptide sequence
601.303       601.305              −3        204–209             GNSPAR
601.322       601.331              1         103–107             ELSPR
698.357       698.358              −1        144–149             DHIGTR
708.330       708.335              −5        96–102              GGDGKM*K
711.333       711.331              2         294–299             QGTDYK
746.387       746.383              4         356–361             HIDAYK
749.354       749.354              0         178–185             GGSQASSR
805.378       805.380              −2        196–203             NSTPGSSR
831.462       831.457              5         227–233             LNQLESK
876.452       876.461              −9        101–107             M*KELSPR
886.401       886.406              −5        170–177             GFYAEGSR
916.478       916.478              0         362–369             TFPPTEPK
928.543       928.546              −3        348–355             DNVILLNK
946.513       946.511              2         62–68               EELRFPR
1105.548      1105.553             −5        339–347             LDDKDPQFK
1144.493      1144.499             −6        1–10                Ac-SDNGPQSNQR
1154.579      1154.580             −1        376–385             TDEAQPLPQR
1166.557      1166.559             −2        267–276             Q**YNVTQAFGR
1183.589      1183.586             3         267–276             QYNVTQAFGR
1202.610      1202.613             −3        238–248             GQQQQGQTVTK
1282.678      1282.675             3         375–385             KTDEAQPLPQR
1330.698      1330.708             −10       238–249             GQQQQGQTVTKK
1410.774      1410.771             3         376–387             KKTDEAQPLPQR
1611.698      1611.692             6         406–421             QLQNSM*SGASADSTQA
1684.895      1684.891             4         128–143             EGIVWVATEGALNTPK
1687.898      1687.905             −7        210–226             MASGGGETALALLLLDR
1703.897      1703.900             −3        210–226             M*ASGGGETALALLLLDR
1774.838      1774.836             2         278–293             GPEQTQGNFGDQDLIR
1850.833      1850.827             6         15–32               ITFGGPTDSTDNNQNGGR
1851.814      1850.811             3         15–32               ITFGGPTDSTDNNQN***GGR
1875.879      1875.879             0         389–405             Q**PTVTLLPAADM*DDFSR
1892.905      1892.906             −1        389–405             QPTVTLLPAADM*DDFSR
1930.944      1930.937             7         277–293             RGPEQTQGNFGDQDLIR
2005.008      2005.006             2         388–405             KQPTVTLLPAADMDDFSR
2015.081      2015.081             0         339–355             LDDKDPQFKDNVILLNK
2021.005      2021.001             4         388–405             KQPTVTLLPAADM*DDFSR
2077.048      2077.043             5         320–338             IGM*EVTPSGTWLTYHGAIK
2091.126      2091.120             −2        150–169             NPNNNAATVLQLPQGTTLPK
2151.002      2151.010             6         69–88               GQGVPINTNSGPDDQIGYYR
2252.062      2252.071             −9        300–319             HWPQIAQFAPSASAFFGMSR
2297.078      2297.092             −14       108–127             WYFYYLGTGPEASLPYGANK
2307.128      2307.111             17        69–89               GQGVPINTNSGPDDQIGYYRR
2324.187      2324.190             −3        41–61               RPQGLPNNTASWFTALTQHGK
2516.325      2516.339             −14       210–233             M*ASGGGETALALLLLDRLNQLESK
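The calculated MH+ and Δm columns of this table can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' software: it uses standard monoisotopic residue masses rounded to five decimals, and the function names and the `mod_delta` parameter are hypothetical.

```python
# Standard monoisotopic amino acid residue masses (Da), rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # H2O added for the free N and C termini
PROTON = 1.00728      # charge carrier for a singly protonated ion
OXIDATION = 15.99491  # one oxygen, e.g. methionine oxidation (M* in the table)


def mh_plus(sequence: str, mod_delta: float = 0.0) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide, plus an optional mass delta."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON + mod_delta


def delta_mda(measured: float, calculated: float) -> int:
    """Mass error (measured minus calculated) in millidaltons, rounded."""
    return round((measured - calculated) * 1000)
```

For instance, `mh_plus("DHIGTR")` agrees with the tabulated 698.358 to within about 1 mDa, and `mh_plus("MKELSPR", mod_delta=OXIDATION)` agrees with the 876.461 listed for M*KELSPR. Small residual differences in the last digit depend on rounding and on the constants used.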

Initial efforts to identify the protein (based on data base searching against the peptide fingerprint) failed to yield any significant matches, suggesting that it was a novel protein. De novo peptide sequencing was therefore undertaken in order to characterize it. For this purpose, samples were digested in the presence of a 50/50 mixture of ordinary water and H₂¹⁸O, as described above, because the addition of either ¹⁸O or ¹⁶O during enzymatic cleavage yields spectra containing both species and thus distinguishes fragments containing the C terminus from those containing the N terminus by their distinctive isotopic patterns (14, 15, 16). In order to determine the amino acid sequence of the proteolytic fragments, each clearly observed peptide ion was selected in turn as a parent by the mass-selecting quadrupole of the QqTOF instrument and subjected to collisionally induced dissociation in the collision cell. For example, the resulting daughter ion spectrum from the m/z = 2297 parent is shown in Fig. 3, where the advantages of the ¹⁶O/¹⁸O addition for distinguishing the C- and N-terminal ions are clearly evident. The y ions, which contain the C terminus, all show the doublet structure superimposed on the usual isotopic pattern, whereas the b ions, containing the N terminus, have a normal pattern. A comparison between the measured m/z values and the masses calculated from the deduced sequence is given in Table I.
Fig. 3. MS/MS spectrum of tryptic fragment of […]. The complete spectrum is shown in A, with the amino acid sequence indicated between the y-series fragments. An example of a b-series fragment is shown in B, and of a y-series fragment in C. The signature isotopic pattern of fragments containing the C terminus is visible in C. The measured and predicted masses for all identified peaks are shown in Table I.

Table I. Calculated and measured masses for b and y ions from MS/MS measurements of the 2297.092-Da tryptic fragment

y-ion  m/z found  MH+ calculated  Δm (mDa)  Residue  Δm (mDa)  m/z found  MH+ calculated  b-ion
y1     -          147.113         -         K        2         2279.083   2279.081        b20
y2     261.138    261.156         −18       N        -         -          2150.986        b19
y3     332.193    332.193         0         A        -         -          2036.943        b18
y4     389.197    389.215         −18       G        -         -          1965.906        b17
y5     552.283    552.278         5         Y        0         1908.885   1908.885        b16
y6     649.328    649.331         −3        P        −13       1745.809   1745.822        b15
y7     762.413    762.415         −2        L        21        1648.790   1648.769        b14
y8     849.438    849.447         −9        S        8         1535.694   1535.685        b13
y9     920.478    920.484         −6        A        −15       1448.638   1448.653        b12
y10    1049.522   1049.526        −4        E        19        1377.637   1377.616        b11
y11    1146.571   1146.579        −8        P        -         -          1248.573        b10
y12    1203.592   1203.601        −9        G        −4        1151.516   1151.520        b9
y13    1304.636   1304.648        −8        T        −5        1094.494   1094.499        b8
y14    1361.654   1361.670        −16       G        1         993.452    993.451         b7
y15    1474.739   1474.754        −15       L        −21       936.409    936.430         b6
y16    1637.818   1637.817        1         Y        11        823.357    823.346         b5
y17    1800.884   1800.881        3         Y        9         660.291    660.282         b4
y18    1947.970   1947.949        21        F        1         497.220    497.219         b3
y19    2111.022   2111.012        10        Y        −5        350.145    350.150         b2
y20    2297.086   2297.092        −6        W        -         -          187.087         b1
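The calculated columns of this table follow from the deduced sequence by simple cumulative sums. The sketch below is illustrative only: it assumes singly protonated fragments, standard monoisotopic residue masses, and incorporation of a single ¹⁸O at the C terminus for the labeled y ions. The function names are hypothetical.

```python
from itertools import accumulate

# Standard monoisotopic amino acid residue masses (Da), rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
O18_SHIFT = 2.00425  # mass difference between 18O and 16O


def b_ions(seq):
    """Singly protonated b1..bn: N-terminal fragments (residues + proton)."""
    return [m + PROTON for m in accumulate(RESIDUE_MASS[a] for a in seq)]


def y_ions(seq):
    """Singly protonated y1..yn: C-terminal fragments (residues + water + proton)."""
    return [m + WATER + PROTON
            for m in accumulate(RESIDUE_MASS[a] for a in reversed(seq))]


def y_doublets(seq):
    """(16O, 18O) m/z pairs expected for y ions after digestion in the 1:1
    water mixture, assuming one 18O atom is incorporated at the C terminus."""
    return [(m, m + O18_SHIFT) for m in y_ions(seq)]
```

With `seq = "WYFYYLGTGPEASLPYGANK"`, `y_ions(seq)[0]` and `b_ions(seq)[0]` reproduce the tabulated 147.113 and 187.087 to within about 1 mDa, and the final y ion equals the MH+ of the intact peptide. Only the y series carries the ~2.004-Da doublet, which is what distinguishes the two ion series in the spectra.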

Further examples are provided in the supplemental material. Fig. S1 shows the daughter ions from dissociation of the 1144-Da N-terminal peptide, indicating deletion of the N-terminal methionine and acetylation of the resulting N-terminal serine. Fig. S2 shows a comparison between HPLC-separated ions from tryptic and Lys-C digestions, respectively, showing alternate cleavages at adjacent lysines. Fig. S3 shows a spectrum of the parent ion containing the C terminus, the one C-terminal peptide that shows no doublet structure. A comparison of experimental m/z values and masses calculated for the deduced sequences of all the peptides observed in tryptic digests is given in Table II. In both Tables I and II, most observed m/z values and the masses calculated for the deduced amino acid sequences agree within ∼10 mDa, lending credibility to the assignments; the anomalously high values observed for a few ions in Table I correspond to peaks of very low intensity. The MS and MS/MS measurements just described were applied first to the peptides resulting from tryptic digests of the gel band, listed in Table I, and then to the products of a Lys-C digest.
BLAST searching (20, 21) of the total GenBank™ protein data base with these peptides was then undertaken in order to search for homology. The most definitive example was provided by the 2297-Da tryptic peptide. In that case, the highest rated results of the BLAST search are shown in Fig. 4; all are coronavirus nucleocapsid proteins, and all yield BLAST scores of 40 to 41, with E values of 0.003. Moreover, the highest rated hit in the BLAST search that is not a coronavirus protein (a bacterial protein, in this case) had a score of only 29 and a high E value of 9.4. Thus, the ∼46-kDa protein is clearly a coronavirus nucleocapsid protein; indeed, there is complete agreement between the first 10 residues and those found by BLAST in a region of the coronaviruses that is highly conserved. On the other hand, only three out of the next nine residues agree with any of the other viruses, so the SARS virus is significantly different from any of the other coronaviruses. BLAST searches with the other peptides led to similar conclusions; in particular, they strengthened the evidence for significant differences between the SARS coronavirus and any other coronavirus in the data base.
Fig. 4. Comparison of the amino acid sequence of the 2297-Da peptide (deduced […]). The shaded regions indicate areas of identity.

By April 12, these measurements had been carried out and most of them analyzed, yielding almost complete sequence information on the individual peptides, as summarized in Table II. The task of fitting together the peptides was not yet done, however, because there were still a number of ambiguities in their order. To sort out this problem, an Asp-N digestion had also been carried out (but not yet separated on the HPLC), and Glu-C and perhaps Arg-C digestions were planned as soon as sufficient material was available. However, these measurements turned out to be unnecessary, because at that stage a nucleotide sequence of infectious material (also prepared by NML) was obtained by a group at the Michael Smith Genome Centre in Vancouver (4) (GenBank™ accession number AY274119), soon followed by similar results from several other laboratories (see for example Ref. 5). It soon became clear to us that the open reading frame identified by the Vancouver group as specifying the coronavirus nucleocapsid protein did in fact predict the amino acid sequence of the ∼46-kDa protein that we were analyzing, as might be expected from the BLAST homology reported above. Consequently, we were able to remove the remaining ambiguities in ordering the proteolytic fragments listed in Table II. A comparison of our results with the predicted sequence is shown in Fig. S4A; the mass spectral data cover more than 96% of the predicted sequence and include both C and N termini. The mass spectra also indicate removal of the N-terminal methionine and oxidation of all other methionines, as well as acetylation of the resulting N-terminal serine, as shown in Fig. S1. The N-terminal deletion and acetylation presumably occur as a result of post-translational modifications (22), which of course could not be predicted by the nucleotide data.
Otherwise, our results confirm the predicted sequence (GenBank™ accession number AY274119), a result consistent with the samples being derived from the same infectious source at NML.
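Sequence coverage of the kind quoted above is the fraction of residues in the predicted sequence that fall inside at least one observed peptide. A minimal sketch follows, assuming 1-based inclusive residue ranges such as those in the "Residues start-end" column of Table II. The protein length must be taken from the predicted sequence, which is not reproduced here, so the values in the usage note are purely hypothetical.

```python
def sequence_coverage(ranges, protein_length):
    """Fraction of residues 1..protein_length covered by at least one peptide.

    ranges: iterable of (start, end) residue numbers, 1-based and inclusive.
    Overlapping peptides are counted once.
    """
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    # Ignore any residue numbers that fall outside the protein.
    covered &= set(range(1, protein_length + 1))
    return len(covered) / protein_length
```

As a toy example, `sequence_coverage([(1, 10), (5, 20)], 40)` gives 0.5, because the two overlapping peptides together cover residues 1 to 20 of a 40-residue sequence.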

Mass Spectra from Proteolytic Digests of the Spike Protein—

In addition to the almost completely defined ∼46-kDa protein, we have partially characterized a protein that appeared as a very weak band at an apparent mass of ∼180 kDa in the gel separation (Fig. 1 B). Despite the low intensity, 39 peptides in the initial tryptic digest were found to belong to the ∼139-kDa “spike protein” predicted by the nucleotide sequence (GenBank™ accession number AY274119), and 36 of these were sufficiently intense for MS/MS measurements, which confirmed the identification (30% coverage). A summary of the data and the coverage for this protein is given in Table S1. This protein is homologous to spike proteins in other coronaviruses, which contain a large number of potential glycosylation sites (NXT or NXS). Thus, they are usually assumed to be extensively glycosylated and to act as attachment proteins. Indeed, the predicted sequence of the spike protein of the SARS coronavirus contains 23 of these potential N-glycosylation sites, of which 17 are identified as likely sites by the NetNGlyc 1.0 server (available at www.cbs.dtu.dk/services/NetNGlyc). (O-glycosylation may also be possible, but has not been examined here.) To investigate glycosylation in the spike protein, a tryptic digest was treated with PNGase F to remove the glycans, as described above. This step converts asparagine residues to aspartic acids, thus specifying the corresponding deglycosylated peptides through observation of their mass differences of 0.984 Da per deglycosylated site from the values calculated from the predicted amino acid sequence. This procedure identified nine glycopeptides from observation of their deglycosylated products (Table III) and raised the sequence coverage to 42%. MS/MS measurements on the deglycosylated peptides confirmed the predicted single N-glycosylation sites and showed that T111–126, T316–333, and T1140–1163 had two glycosylation sites each (Table III).
For example, PNGase digestion produced two distinct deglycosylated peptides for T111–126, with molecular ions at m/z 1758 and 1759. MS/MS measurements on the m/z 1758 ion revealed that the parent was glycosylated on Asn119 only, but similar measurements on the m/z 1759 ion showed that both Asn118 and Asn119 were glycosylated in this parent. Another example is presented in Fig. S5, which shows MS and MS/MS spectra of the deglycosylated peptide T222–232, although this experiment did not produce details on the exact nature and composition of the N-glycans. We note that in these measurements no peptides were observed containing possible sites that were not glycosylated, suggesting that some of the other sites may also be modified by glycosylation.
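Potential N-glycosylation sites of the NXT/NXS type can be located with a short pattern search. The sketch below is illustrative: the function name is hypothetical, and excluding proline at the X position is a common convention that is assumed here rather than stated in the text. It is a plain motif scan and makes no attempt to predict which sites are actually occupied, as NetNGlyc does.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. N-N-S-T) are all found.
# Assumption: X may be any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence, first_residue=1):
    """Residue numbers of Asn in N-X-S/T motifs.

    first_residue is the protein residue number of sequence[0], so that a
    peptide such as T111-126 reports positions in whole-protein numbering.
    """
    return [first_residue + m.start() for m in SEQUON.finditer(sequence)]
```

Applied to the T111–126 peptide, `find_sequons("SQSVIIINNSTNVVIR", 111)` returns `[118, 119]`, the two adjacent asparagines discussed above.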
Table III. Deglycosylated peptides found in PNGase F-treated tryptic digest of 139-kDa spike protein

Bold N symbols represent deglycosylated sites.

Peptide     Sequence                  Calc. [M+H]+ with no glycosylation  m/z measured after PNGase  Δm     No. of sites
T111–126    SQSVIIINNSTNVVIR          1756.992                            1757.973                   0.981  1
T111–126    SQSVIIINNSTNVVIR          1756.992                            1758.963                   1.971  2
T222–232    LPLGINITNFR               1257.732                            1258.715                   0.983  1
T226–287    YDENGTITDAVDCSQNPLAELK    2453.114                            2454.101                   0.987  1
T316–333    FPNITNLCPFGEVFNATK        2069.017                            2071.001                   1.984  2
T778–796    YFGGFNFSQILPDPLKPTK       2169.138                            2170.121                   0.983  1
T1056–1068  NFTTAPAICHEGK             1445.685                            1446.672                   0.987  1
T1074–1089  EGVFVFNGTSWFITQR          1887.939                            1888.931                   0.992  1
T1140–1163  NHTSPDVDLGDISGINASVVNIQK  2493.259                            2495.23                    1.971  2
T1174–1187  NLNESLIDLQELGK            1585.844                            1586.823                   0.979  1
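The "No. of sites" column follows from dividing the observed mass increase by the Asn-to-Asp shift of about 0.984 Da. A minimal sketch of that arithmetic is given below. The function name and the 0.03-Da tolerance are hypothetical choices for illustration, not values taken from the study.

```python
ASN_TO_ASP = 0.98402  # monoisotopic mass gain when Asn is converted to Asp (Da)


def deglycosylated_sites(calc_unmodified, measured, tolerance=0.03):
    """Number of Asn->Asp conversions implied by a measured [M+H]+.

    calc_unmodified: [M+H]+ calculated from the predicted sequence.
    measured: m/z observed after PNGase F treatment.
    Raises ValueError if the shift is not close to a whole number of sites.
    """
    delta = measured - calc_unmodified
    n = round(delta / ASN_TO_ASP)
    if n < 0 or abs(delta - n * ASN_TO_ASP) > tolerance:
        raise ValueError(f"mass shift {delta:.3f} Da does not match {n} site(s)")
    return n
```

For the two T111–126 products, `deglycosylated_sites(1756.992, 1757.973)` gives 1 and `deglycosylated_sites(1756.992, 1758.963)` gives 2, matching the table. The mass shift alone gives the number of sites; locating them on the peptide still requires MS/MS.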

The tryptic digest of the spike protein without PNGase F deglycosylation yielded spectra of four of these glycopeptides that were intense enough (barely) for detailed analysis, as summarized in Table IV. Here the relatively large Δm values for the carbohydrate residues originate from peak distortions due to the low intensity. Problems in T226–287 were especially bothersome; there, several glycoforms (indicated by asterisks in Table IV) were detected but could not be measured accurately because of a combination of weak signal and overlap with the ¹⁸O peak resulting from the previous labeling of the C terminus (see “Experimental Procedures”). Microheterogeneity was observed in all four glycopeptides, and each glycoform was assigned an N-linked glycan composition. Two peptides, T222–232 and T778–796, were found to have high-mannose substitution MxNy, whereas the other two (T111–126 and T226–287) showed complex glycan structures MxNyGzF (M, mannose; N, N-acetylglucosamine; G, galactose; F, fucose), similar to the pattern observed in the bovine coronavirus hemagglutinin protein (23). These compositions each encompass more than one isomeric structure; some examples are given in Fig. 5 D. We note that some observed glycoforms may result from in-source fragmentation, which could be reduced by the use of electrospray ionization on the QqTOF instrument (24, 25) rather than MALDI, although the results suggest that such fragmentation is not an important factor in the present case (see below).
Table IV. Glycosylated peptides found in tryptic digest of 139-kDa spike protein without treatment with PNGase F

Peptide symbols: N, first glycosylation site; N, second glycosylation site. Carbohydrate composition symbols: M, mannose; F, fucose; G, galactose; N, N-acetylglucosamine. *, m/z values could not be determined accurately (see text). **, Δm, measured - calculated mass.

Peptide   Sequence                Calc. [M+H]+ m/z  m/z measured glycopeptide  Residual carbohydrate mass  Carbohydrate composition  Calculated carbohydrate mass  Δm**    MS/MS glycopeptide
T111–126  SQSVIIINNSTNVVIR        1756.992          3201.514                   1444.522                    M3N4F                     1444.534                      −0.012
                                                    3404.572                   1647.580                    M3N5F                     1647.613                      −0.033
                                                    3525.669                   1768.677                    M3N4G2F                   1768.640                      0.037
                                                    3566.670                   1809.678                    M3N5GF                    1809.666                      0.012
                                                    3607.682                   1850.690                    M3N6F                     1850.693                      −0.003
                                                    3728.687                   1971.695                    M3N5G2F                   1971.719                      −0.024
                                                    3769.678                   2012.686                    M3N6GF                    2012.745                      −0.059
                                                    3890.761                   2133.769                    M3N5G3F                   2133.772                      −0.003
                                                    3931.793                   2174.801                    M3N6G2F                   2174.798                      0.030
                                                    4093.818                   2336.826                    M3N6G3F                   2336.859                      −0.025
                                                    4255.844                   2498.852                    M3N6G4F                   2498.904                      0.052
                                                    4296.905                   2539.913                    M3N7G3F                   2539.930                      −0.017
T222–232  LPLGINITNFR             1257.732          2636.215                   1378.483                    M6N2                      1378.476                      0.007
                                                    2798.26                    1540.528                    M7N2                      1540.528                      0.000
                                                    2960.306                   1702.574                    M8N2                      1702.581                      −0.007
                                                    3122.373                   1864.641                    M9N2                      1864.634                      0.007
T226–287  YDENGTITDAVDCSQNPLAELK  2453.114          4100.666                   1647.552                    M3N5F                     1647.613                      −0.061
                                                    4303.735                   1850.621                    M3N6F                     1850.693                      0.072
                                                    4427.1*                    1974.0                      M3N5FG2                   1971.719
                                                    4468.9*                    2015.8                      M3N6FG                    2012.745
                                                    4589.1*                    2136.0                      M3N5G3F                   2133.772
                                                    4792.2*                    2339.1                      M3N6G3F                   2336.859
                                                    4953.9*                    2500.8                      M3N6G4F                   2498.904
T778–796  YFGGFNFSQILPDPLKPTK     2169.138          3385.59                    1216.452                    M5N2                      1216.423                      −0.029
                                                    3547.644                   1378.506                    M6N2                      1378.476                      −0.030
                                                    3709.738                   1540.600                    M7N2                      1540.529                      −0.071
                                                    3871.702                   1702.564                    M8N2                      1702.581                      0.017
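The "Calculated carbohydrate mass" column can be reproduced by summing monosaccharide residue masses for each composition string. The following is a minimal sketch under stated assumptions: residue (anhydro) monoisotopic masses for hexose, N-acetylhexosamine, and deoxyhexose are used, the one-letter symbols are those of the table footnote, and the function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da): each sugar minus one water.
MONOSACCHARIDE_RESIDUE = {
    "M": 162.05282,  # mannose (hexose)
    "G": 162.05282,  # galactose (hexose)
    "N": 203.07937,  # N-acetylglucosamine
    "F": 146.05791,  # fucose (deoxyhexose)
}


def glycan_residue_mass(composition):
    """Mass added to a peptide by a glycan written as e.g. 'M3N4F' or 'M9N2'.

    A symbol without a number counts once; symbol order does not matter.
    """
    total = 0.0
    for symbol, count in re.findall(r"([MNGF])(\d*)", composition):
        total += MONOSACCHARIDE_RESIDUE[symbol] * (int(count) if count else 1)
    return total


def glycopeptide_mh_plus(peptide_mh_plus, composition):
    """Expected [M+H]+ of a glycopeptide from the peptide [M+H]+ and composition."""
    return peptide_mh_plus + glycan_residue_mass(composition)
```

For example, `glycan_residue_mass("M3N4F")` and `glycan_residue_mass("M9N2")` agree with the tabulated 1444.534 and 1864.634 to within about 1 mDa. Because mannose and galactose are isomeric hexoses, a mass alone cannot distinguish them; the M/G split in a composition reflects an interpretation of the glycan type.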

Fig. 5. Single MS (A and B) and MS/MS (C) detection of glycosylated tryptic peptides of the 139-kDa spike protein. A, MS spectrum of HPLC fraction 21 of tryptic digest of the 139-kDa band. Labeled peaks are the monoisotopic [M+H]+ ions of glycosylated forms of T111–126 (see Table III). B, MS spectrum of HPLC fraction 25 showing glycosylated forms of T222–232. C, MS/MS spectra of the 3122.373-Da peak from B. D, Suggested high-mannose and complex N-glycan structures, emphasizing possible diglycosylation of T111–126.

Fig. 5 A shows the MALDI mass spectrum of the HPLC fraction containing glycosylated T111–126. This spectrum is interpreted in Table III as containing either one or two glycosylation sites; peaks between m/z 3000 and 3800 show one possible glycosylation site, because compositions allow only one trimannosyl core, but those higher than m/z 3800 possibly correspond to diglycosylated T111–126. Here it is likely that both Asn119 and Asn118 are glycosylated with complex structures, with fucosylation on only one of them, as illustrated by an example in Fig. 5 D. In Fig. 5 B, MS of the glycosylated T222–232 HPLC fraction clearly highlights the presence of high-mannose structures. Here the predominance of (Man)9(GlcNAc)2, the highest possible form of N-linked high-mannose oligosaccharide, suggests that there is little in-source fragmentation. Fig. 5 C is the tandem mass spectrum of T222–232 with a (Man)9(GlcNAc)2 attachment.
This spectrum shows losses of one to five mannose residues, loss of the whole oligosaccharide moiety (m/z 1257.733), loss of the whole moiety minus one GlcNAc (m/z 1460.808), and loss of the carbohydrate residue via a cross-ring cleavage (m/z 1340.768). MS/MS analysis of m/z 3404.572, 3607.682, and 4093.818 ions confirmed the presence of complex glycan structures from observed losses of Gal-GlcNAc moieties (data not shown). MS/MS was also performed on glycoforms of T226–287 and T778–796, with results consistent with the suggested complex glycan compositions in Table IV (spectra not shown). All MS/MS spectra recorded in this study showed that the preferred fragmentation mode was loss of the entire oligosaccharide rather than loss of one residue at a time, which again argues against extensive in-source fragmentation.

Other candidate peptides were sought in their possible glycosylated forms but were not detected, perhaps because of the low amount of sample. Alternatively, they may not be detectable as positive ions at these low sample levels because of the presence of negatively charged sialic acids; it has been shown that sialylation has a detrimental effect on positive-mode ionization, at least in the case of free N-linked oligosaccharides (26, 27, 28). Indeed, the several galactosylated complex oligosaccharide compositions found in this study suggest the undetected presence of sialic acid, because the latter compound attaches to terminal galactose in such structures.

The glycosylation study conducted here is only preliminary and will be followed by more detailed structural analyses involving glycan release, labeling, and MALDI and electrospray MS. A complementary experiment could also involve exoglycosidase digestions of HPLC fractions containing the glycopeptides. Stimson et al. (24) have already shown that detailed structural analysis of glycans may be conducted on low femtomole amounts of glycopeptides from murine prion proteins by a combination of exoglycosidases and electrospray MS.
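The fragment assignments quoted above for T222–232 can be checked with simple mass arithmetic. The sketch below is illustrative only and is not the authors' software: it assumes standard monoisotopic residue masses for hexose and N-acetylhexosamine, takes the reported m/z 1257.733 as the [M+H]+ of the bare peptide, and adds nine mannose and two GlcNAc residues to reproduce the precursor. The function and variable names are hypothetical.

```python
# Monoisotopic residue masses (Da) of monosaccharides within a glycan chain.
# These are standard textbook values, not taken from the paper.
HEX = 162.052824     # hexose residue (mannose or galactose)
HEXNAC = 203.079373  # N-acetylhexosamine residue (GlcNAc)


def glycopeptide_mh(peptide_mh: float, n_hex: int, n_hexnac: int) -> float:
    """Return the [M+H]+ of a peptide carrying the given glycan composition."""
    return peptide_mh + n_hex * HEX + n_hexnac * HEXNAC


# [M+H]+ of bare T222-232, i.e. the ion left after loss of the whole glycan.
PEPTIDE_MH = 1257.733

# Precursor carrying nine mannose and two GlcNAc residues.
intact = glycopeptide_mh(PEPTIDE_MH, n_hex=9, n_hexnac=2)

# Fragment retaining only the innermost GlcNAc.
one_glcnac = glycopeptide_mh(PEPTIDE_MH, n_hex=0, n_hexnac=1)

# Fragments produced by sequential loss of one to five mannose residues.
mannose_losses = [round(intact - n * HEX, 3) for n in range(1, 6)]

print(f"intact glycopeptide: {intact:.3f}")
print(f"peptide + GlcNAc:    {one_glcnac:.3f}")
print("mannose losses:     ", mannose_losses)
```

Under these assumptions the computed precursor and peptide + GlcNAc values fall within 0.01 Da of the reported 3122.373 and 1460.808, which is consistent with the assignments in the text.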

DISCUSSION

The Nucleocapsid Protein N—

Comparison of deduced amino acid sequences of different coronavirus N proteins revealed only ∼32% identity between the SARS-related coronavirus and known viruses from the three coronavirus clusters. Correspondingly, the phylogenetic tree (29) of the N protein-deduced amino acid sequences indicated that the SARS-related virus is only distantly related to any of the other clusters (Fig. S4B). The evolutionary distance between the viruses, based on this phylogenetic tree analysis, makes it difficult to speculate about the origin of the SARS virus, although recent reports in the media have implicated various wild animals that are used for food in Guangdong, particularly the civet cat, whose genome was not in the database. Despite the striking heterogeneity of the SARS coronavirus N protein when compared with other coronavirus nucleoproteins, certain domains seem functionally conserved (30). The SR-rich region of SARS N protein resembles that of murine and bovine coronaviruses; in a short stretch of 36 residues (amino acids 176–212) it contains 14 serines and 7 arginines. The amino acid sequence in this region is highly variable among coronaviruses except for a core motif SRXX, for which double or triple repeats are a distinguishing feature of all coronavirus N proteins. This region has been mapped as the RNA binding domain of the N protein. An intriguing feature of SARS N protein is that it contains five SRXX motifs (see shaded amino acid sequence in Fig. S4A); whether that will translate into much higher RNA binding activity remains to be seen. However, this finding supports the concept of a conserved function within the SR-rich domain. A shorter form of the N protein (∼2 to 5 kDa smaller) has been observed late in infection in cell culture with transmissible gastroenteritis virus, mouse hepatitis virus, feline infectious peritonitis virus, bovine coronavirus, avian infectious bronchitis virus, and turkey coronavirus.
It has been demonstrated that host cell caspases, which are activated during coronavirus infection, are responsible for this cleavage (31). A common caspase cleavage motif is present in all of the mentioned coronavirus N proteins. Furthermore, the accumulation of the shortened N protein form was correlated with a reduction in virus production by a factor of ∼100. These observations suggest that cleavage of viral nucleocapsid protein by host cell caspases could be a general mechanism by which infected cells eliminate coronaviruses. Interestingly, no caspase cleavage motif is present in the SARS-related coronavirus N protein.
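The motif counting described above for the SR-rich region can be sketched in a few lines. This is an illustration only: it assumes that SRXX means Ser-Arg followed by any two residues and that overlapping occurrences count separately, and the toy sequence is invented for the example, not taken from the SARS N protein. The function names are hypothetical.

```python
import re


def count_srxx(seq: str) -> int:
    """Count SRXX motifs (Ser-Arg plus any two residues), overlaps allowed."""
    # A zero-width lookahead lets adjacent motifs such as SRSRGN count twice.
    return len(re.findall(r"(?=SR..)", seq))


def ser_arg_composition(seq: str, start: int, end: int) -> tuple[int, int]:
    """Count Ser and Arg in residues start..end (1-based, inclusive)."""
    window = seq[start - 1:end]
    return window.count("S"), window.count("R")


# Made-up sequence used only to exercise the functions.
TOY = "MASRGGAAASRKLTTSRPQ"

print(count_srxx(TOY))                         # motifs in the toy sequence
print(ser_arg_composition(TOY, 1, len(TOY)))   # (serines, arginines)
```

Applied to a real N protein sequence, the same two functions would give the motif count and the Ser/Arg composition of any chosen residue window.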

The Spike Protein S—

The spike protein is a major target of the cellular immune response to coronaviruses and plays an important role in the initial stages of infection. It mediates the attachment of the virus to the cell surface receptors and induces the fusion of the viral and cellular membranes. The importance of N-glycosylation of attachment proteins in virus-receptor interactions has often been highlighted for several types of virus. In myxoviruses (influenza C and A viruses), Rosenthal et al. showed that N-glycosylation of the hemagglutinin-esterase-fusion proteins can have dramatic effects on immune escape, virulence, and interactions with cellular receptors (32). The hemagglutinin components have been shown to interact with sialic acid moieties on the receptors, and it is known that neuraminidase inhibitors inhibit the replication of influenza viruses A and B (33). Hepatitis viruses have been shown to bud into the endoplasmic reticulum and depend on N-glycosylation of coat proteins to form infectious virus particles (34, 35). Dwek and his colleagues investigated the effect of deoxynojirimycin, an alpha-glucosidase inhibitor, and found that it blocks oligosaccharide processing after monoglucosylation of Asn sites. The glucosylated proteins were shown to misfold. Even at very low inhibitor concentration, viral titers dropped by nearly 100-fold (36). Studies in animals showed that deoxynojirimycin had a negligible effect on host glycosylation (37), and thus drugs such as this alpha-glucosidase inhibitor are seen as good candidates for treatment of hepatitis B. Hepatitis C virus may also respond to these inhibitors (38). However, similar studies using sugar inhibitors on HIV, which has many N-linked sites, showed less sensitivity to misglycosylation (39). Rossen and coworkers modified the N-glycosylation characteristics of coronavirus spike proteins in cultured epithelial cells and found that N-glycans had an important impact on virus formation and behavior. For example, inhibition of spike N-glycosylation by tunicamycin, which inhibits the synthesis of N-glycans, resulted in the synthesis of spikeless virions (40). The same authors also discussed the implications of N-glycosylation of hemagglutinin proteins of epithelial cell coronaviruses (40). We note, however, that the SARS-associated coronavirus genome sequence does not contain a gene encoding hemagglutinin or large genes derived from another virus or host cell (4), although it is believed that host range, tissue tropism, and virulence of animal coronaviruses can be changed by mutating the S gene, thus modifying the S proteins (12, 41). It has been shown that sialic acid plays important roles in host-receptor interactions. We therefore plan to study the exact compositions of spike N-linked glycans after detachment by PNGase, because the sialic acid content of a glycoprotein can be determined in an isolated oligosaccharide pool. The study of N-linked glycan structures by MS is well documented (see, for example, Ref. 42), and established methods are available to conduct such analyses.
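Candidate N-glycosylation sites of the kind discussed above are conventionally predicted from the protein sequence as N-X-S/T sequons, where X is any residue except proline. The sketch below shows one way to list such sites; it is not the procedure used in this study, the toy sequence is invented, and the function name is hypothetical. It also shows that two adjacent asparagines can both qualify when their sequons overlap.

```python
import re

# N-X-S/T sequon with X != Pro; the lookahead permits overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


# Made-up sequence: contains overlapping sequons (NNST), a proline-blocked
# site (NPS), and an isolated sequon (NGT).
TOY = "MKNNSTLPNPSAANGT"

print(find_sequons(TOY))
```

A predicted sequon only marks a potential site; whether it is actually occupied has to be established experimentally, for example by the PNGase F and MS/MS approach described in this report.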

Possible Therapeutic Applications—

The present studies provide the first description of the actual proteins derived from the novel coronavirus thought to be the etiologic agent of SARS. Similar to the pattern observed with animal coronaviruses (43), the 46-kDa nucleoprotein appears to be the major immunogenic antigen, as it was the only viral protein recognized by acute and early convalescent sera from several patients recovering from SARS. While the immune response to the nucleoprotein could serve as an early diagnostic marker for infection, it is unlikely that an immune response to this protein offers protection, because it is an internal protein and neutralizing antibodies are more likely to target the surface proteins (12). However, it has been shown for other coronaviruses that some antigenic peptides of the N protein can be recognized on the surface of infected cells by T cells (12). The spike glycoprotein is certainly a surface protein, so it may offer an attractive target. Although no drugs with proven efficacy against coronaviruses are currently approved, potential targets exist for new drugs. For example, protease inhibitors could prevent processing of the RNA polymerase or cleavage of the viral S glycoprotein. Finding antibodies against the S glycoprotein or against the unidentified SARS coronavirus receptor is another possible route, and the use of glycosylation inhibitors that have minimal effects on host cells would be an interesting approach (36). Very recently, an important contribution by Hilgenfeld et al. outlined a plan for drug design based on inhibition of the viral main proteinase, called Mpro or 3CLpro, which controls the activities of the coronavirus replication complex (44). Ideas for development of vaccines against SARS also include the use of killed or subunit vaccines containing the spike glycoprotein together with other viral proteins.

Why Analyze the Proteins?—

The application of de novo sequencing by MS provides an alternative to the usual genomic approach for protein identification. It has the advantage of distinguishing the proteins actually expressed from those that are simply hypothesized or predicted from the nucleotide sequence. It is also worth noting that questions of homology can be investigated by examining proteolytic fragments of a protein even in the complete absence of genome information. Indeed, the results of the BLAST search (Fig. 4), and the conclusion that the 47-kDa protein that NML had isolated was a nucleocapsid protein belonging to an extensively modified coronavirus, were reported at a meeting of the local participants in this investigation on April 3, 2003, more than a week before the nucleotide sequence became available. Even when the nucleotide sequence is available, analysis of the proteins (which is much easier in that case) provides significant complementary information, particularly on post-translational modifications (22). The relatively minor modifications observed in the nucleocapsid protein are not particularly newsworthy, but we nevertheless believe that the result is useful because it probably rules out such modifications as an explanation of the unusual properties of the virus. However, glycosylation in the SARS spike protein, first investigated here, is more exciting; it is likely to play a key role in attachment of the virus to cell surface receptors, and therefore may have important therapeutic applications, as pointed out above.
REFERENCES (10 of 40 shown)

1.  Mass spectrometry in viral proteomics.

Authors:  J J Thomas; R Bakhtiar; G Siuzdak
Journal:  Acc Chem Res       Date:  2000-03       Impact factor: 22.384

2.  Identification of novel sites of O-N-acetylglucosamine modification of serum response factor using quadrupole time-of-flight mass spectrometry.

Authors:  Robert J Chalkley; A L Burlingame
Journal:  Mol Cell Proteomics       Date:  2003-04-08       Impact factor: 5.911

3.  The role of N-linked glycosylation in the secretion of hepatitis B virus.

Authors:  A Mehta; T M Block; R A Dwek
Journal:  Adv Exp Med Biol       Date:  1998       Impact factor: 2.622

4.  Identification of severe acute respiratory syndrome in Canada.

Authors:  Susan M Poutanen; Donald E Low; Bonnie Henry; Sandy Finkelstein; David Rose; Karen Green; Raymond Tellier; Ryan Draker; Dena Adachi; Melissa Ayers; Adrienne K Chan; Danuta M Skowronski; Irving Salit; Andrew E Simor; Arthur S Slutsky; Patrick W Doyle; Mel Krajden; Martin Petric; Robert C Brunham; Allison J McGeer
Journal:  N Engl J Med       Date:  2003-03-31       Impact factor: 91.245

5.  A study of immunoglobulin G glycosylation in monoclonal and polyclonal species by electrospray and matrix-assisted laser desorption/ionization mass spectrometry.

Authors:  Julian A Saba; Jeremy P Kunkel; David C H Jan; Werner E Ens; Kenneth G Standing; Michael Butler; James C Jamieson; Hélène Perreault
Journal:  Anal Biochem       Date:  2002-06-01       Impact factor: 3.365

Review 6.  Balancing N-linked glycosylation to avoid disease.

Authors:  H H Freeze; V Westphal
Journal:  Biochimie       Date:  2001-08       Impact factor: 4.079

7.  The viral spike protein is not involved in the polarized sorting of coronaviruses in epithelial cells.

Authors:  J W Rossen; R de Beer; G J Godeke; M J Raamsman; M C Horzinek; H Vennema; P J Rottier
Journal:  J Virol       Date:  1998-01       Impact factor: 5.103

8.  Clinical features and short-term outcomes of 144 patients with SARS in the greater Toronto area.

Authors:  Christopher M Booth; Larissa M Matukas; George A Tomlinson; Anita R Rachlis; David B Rose; Hy A Dwosh; Sharon L Walmsley; Tony Mazzulli; Monica Avendano; Peter Derkach; Issa E Ephtimios; Ian Kitai; Barbara D Mederski; Steven B Shadowitz; Wayne L Gold; Laura A Hawryluck; Elizabeth Rea; Jordan S Chenkin; David W Cescon; Susan M Poutanen; Allan S Detsky
Journal:  JAMA       Date:  2003-05-06       Impact factor: 56.272

9.  Evaluation of antibody response to canine coronavirus infection in dogs by Western Blotting analysis.

Authors:  G Elia; N Decaro; A Tinelli; V Martella; A Pratelli; C Buonavoglia
Journal:  New Microbiol       Date:  2002-07       Impact factor: 2.479

10.  Structure of the haemagglutinin-esterase-fusion glycoprotein of influenza C virus.

Authors:  P B Rosenthal; X Zhang; F Formanowski; W Fitz; C H Wong; H Meier-Ewert; J J Skehel; D C Wiley
Journal:  Nature       Date:  1998-11-05       Impact factor: 49.962

