Luis Ariel Espinosa1, Yassel Ramos1, Ivan Andújar1, Enso Onill Torres1, Gleysin Cabrera1, Alejandro Martín1, Diamilé Roche1, Glay Chinea1, Mónica Becquet1, Isabel González1, Camila Canaán-Haden1, Elías Nelson1, Gertrudis Rojas2, Beatriz Pérez-Massón2, Dayana Pérez-Martínez2, Tamy Boggiano2, Julio Palacio2, Sum Lai Lozada Chang2, Lourdes Hernández2, Kathya Rashida de la Luz Hernández2, Saloheimo Markku3, Marika Vitikainen3, Yury Valdés-Balbín4, Darielys Santana-Medero4, Daniel G Rivera5, Vicente Vérez-Bencomo4, Mark Emalfarb6, Ronen Tchelet6, Gerardo Guillén1, Miladys Limonta1, Eulogio Pimentel1, Marta Ayala1, Vladimir Besada1, Luis Javier González7.
Abstract
Subunit vaccines based on the receptor-binding domain (RBD) of the spike protein of SARS-CoV-2 provide one of the most promising strategies to fight the COVID-19 pandemic. The detailed characterization of the protein primary structure by mass spectrometry (MS) is mandatory, as described in the ICH Q6B guidelines. In this work, several recombinant RBD proteins produced in five expression systems were characterized using a non-conventional protocol known as in-solution buffer-free digestion (BFD). In a single ESI-MS spectrum, BFD allowed very high sequence coverage (≥ 99%) and the detection of highly hydrophilic regions, including very short and hydrophilic peptides (2-8 amino acids), and the His6-tagged C-terminal peptide carrying several post-translational modifications (PTMs) at Cys538 such as cysteinylation, homocysteinylation, glutathionylation, truncated glutathionylation, and cyanylation, among others. The analysis using the conventional digestion protocol allowed lower sequence coverage (80-90%) and did not detect peptides carrying most of the above-mentioned PTMs. The two C-terminal peptides of a dimer [RBD(319-541)-(His)6]2 linked by an intermolecular disulfide bond (Cys538-Cys538) with twelve histidine residues were only detected by BFD. This protocol allows the detection of the four disulfide bonds present in the native RBD, low-abundance scrambling variants, free cysteine residues, O-glycoforms, and incomplete processing of the N-terminal end, if present. Artifacts generated by the in-solution BFD protocol were also characterized. BFD can be easily implemented; it has been applied to the characterization of the active pharmaceutical ingredient of two RBD-based vaccines, and we foresee that it can also be helpful for the characterization of mutated RBDs.
Keywords: Buffer-free digestion; Hydrophilic peptides; Modified cysteine; RBD; SARS-CoV-2
Year: 2021 PMID: 34739558 PMCID: PMC8569510 DOI: 10.1007/s00216-021-03721-w
Source DB: PubMed Journal: Anal Bioanal Chem ISSN: 1618-2642 Impact factor: 4.478
Sequences of the recombinant receptor-binding domain of SARS-CoV-2 characterized in this work
a)The numbers between parentheses correspond to the amino acid positions of the RBD of SARS-CoV-2 (UniProtKB accession: P0DTC2)
b)The sequences written in bold correspond to the cloned regions of the RBD of SARS-CoV-2. Sequences written in italics indicate other sequence segments not related to RBD but added to the N- and/or C-terminal end of the protein during the cloning strategy. Cysteines are highlighted in red
c)Two molecules of RBD-CHO are linked by an intermolecular disulfide bond between Cys538-Cys538
Fig. 1 A comparison between the in-solution standard digestion (a) and buffer-free digestion [20] (b) protocols for the ESI–MS analysis of the tryptic digests. Black rectangles at the left and right sides of the figure indicate the time required for the individual steps in each protocol. Square boxes at the bottom-left and bottom-right in the figure indicate the total time consumed for each protocol. NEM and IAA mean N-ethylmaleimide and iodoacetamide, respectively
Fig. 2 a SDS-PAGE analysis under reducing and non-reducing conditions of N-glycosylated and deglycosylated RBD-HEK_A, detected with silver staining. Lane 1: Low-range molecular weight markers from 31 to 97 kDa (Bio-Rad). Lanes 2–3: N-glycosylated and deglycosylated protein in non-reducing conditions detecting the monomer and a low-abundance (13%) dimer species of RBD-HEK_A. Lane 4: Control of PNGase F used in the N-deglycosylation. Lanes 5–6: N-glycosylated and deglycosylated protein under reducing conditions. b ESI–MS analysis of the RBD-HEK_A deglycosylated with PNGase F. c Resultant ESI–MS spectrum after deconvolution with MaxEnt v 1.0 software. The inset shown in (c) corresponds to the expanded ESI–MS spectrum in the range shown by a broken line rectangle. The masses between parentheses indicate the expected molecular masses of the detected species. A detailed assignment of this ESI–MS spectrum is shown in Table 2. The ESI–MS spectra shown in (d) and (e) correspond to the ESI–MS analysis of the resultant tryptic peptides of RBD-HEK_A digested with trypsin following the SD and in-solution BFD (with ethanol precipitation) protocols shown in Fig. 1(a) and (b), respectively. Asterisks in (d) correspond to background signals, not assigned to tryptic peptides. The inset shown in (e) corresponds to an expanded region where the O-glycosylated N-terminal end peptide (Val320-Arg328 + [HexNAc:Hex:NeuAc2])2+ and two disulfide-bonded peptides, assigned as (S-S391-525)4+, were detected. Monosaccharide symbols follow the SNFG system [60] and the O-glycan structures are as previously reported [33]. The upper and lower mass spectra shown in (f), (g), and (h) correspond to expanded regions of the ESI–MS spectra shown in (d) and (e), respectively. A detailed assignment for all tryptic peptides in this figure is summarized in Table 3
Summary of the ESI–MS analysis for the SD and the in-solution BFD protocols and sequence coverage of RBD proteins characterized in this work
| Protein | Exp. mass (Da) | Theor. mass (Da) | Error (ppm) | Sequence assignment a) | Sequence coverage, SD (%) b) | Sequence coverage, BFD (%) b) |
|---|---|---|---|---|---|---|
| | 27,163.79 | - | - | RBD + HexNAc:Hex:NeuAc2 + 87 Da | 82 | 100 |
| | 27,181.18 | - | - | RBD + HexNAc:Hex:NeuAc2 + 106 Da | | |
| | 27,195.08 | 27,195.46 | −13.97 | RBD + HexNAc:Hex:NeuAc2 + Cys | | |
| | 27,209.29 | - | - | RBD + HexNAc:Hex:NeuAc2 + hCys | | |
| | 27,308.95 | - | - | RBD + HexNAc:Hex:NeuAc2 + 232 Da | | |
| | 27,381.17 | 27,381.62 | −16.43 | RBD + HexNAc:Hex:NeuAc2 + ECG | | |
| | 27,560.04 | 27,560.69 | −23.58 | RBD + HexNAc2:Hex2:NeuAc2 + Cys | | |
| | 27,746.39 | 27,746.96 | −20.54 | RBD + HexNAc2:Hex2:NeuAc2 + ECG | | |
| | 26,982.06 | 26,982.22 | −5.93 | RBD + HexNAc:Hex:NeuAc2 + Cys | 85 | 100 |
| | 26,995.61 | - | - | RBD + HexNAc:Hex:NeuAc2 + hCys | | |
| | 27,009.65 | - | - | RBD + HexNAc:Hex:NeuAc2 + 147 Da | | |
| | 27,053.73 | - | - | RBD + HexNAc:Hex:NeuAc2 + 191 Da | | |
| | 27,095.12 | - | - | RBD + HexNAc:Hex:NeuAc2 + 232 Da | | |
| | 27,166.19 | 27,168.38 | −80.6 | RBD + HexNAc:Hex:NeuAc2 + ECG | | |
| | 27,181.66 | - | - | RBD + HexNAc2:Hex2:NeuAc2 − 47 Da | | |
| | 27,196.27 | - | - | RBD + HexNAc2:Hex2:NeuAc2 − 32 Da | | |
| | 27,347.31 | 27,347.55 | −8.78 | RBD + HexNAc2:Hex2:NeuAc2 + Cys | | |
| | 27,476.17 | 27,476.67 | −18.19 | RBD + HexNAc2:Hex2:NeuAc2 + EC | | |
| | 53,141.06 | 53,141.62 | −10.54 | (RBD + HexNAc:Hex:NeuAc)2 | 80.6 | 100 |
| | 53,433.40 | 53,432.87 | −9.92 | (RBD)2 + HexNAc:Hex:NeuAc + HexNAc:Hex:NeuAc2 | | |
| | 53,724.72 | 53,724.14 | −10.79 | (RBD + HexNAc:Hex:NeuAc2)2 | | |
| | 25,117.44 | 25,117.14 | −11.94 | RBD reduced and carbamidomethylated c) | - | 99 |
| | 22,590.33 | 22,590.26 | −3.09 | RBD | - | 100 |
| | 23,481.58 | 23,482.09 | −21.92 | RBD + M3 | - | 100 |
| | 23,683.30 | 23,685.29 | −84.91 | RBD + M3A1 | | |
| | 23,644.03 | 23,644.23 | −8.45 | RBD + M4 | | |
| | 23,847.28 | 23,847.43 | −6.29 | RBD + M4A1 | | |
| | 24,009.12 | 24,009.57 | −18.74 | RBD + M5A1 | | |
| | 23,969.29 | 23,968.52 | +32.12 | RBD + M6 | | |
| | 24,172.71 | 24,171.71 | +41.37 | RBD + M6A1 | | |
| | 24,130.26 | 24,130.66 | −16.57 | RBD + M7 | | |
| | 24,333.80 | 24,333.86 | −2.46 | RBD + M7A1 | | |
| | 24,292.36 | 24,292.81 | −18.52 | RBD + M8 | | |
| | 24,495.52 | 24,496.00 | −19.59 | RBD + M8A1 | | |
| | 24,455.07 | 24,454.95 | +4.90 | RBD + M9 | | |
| | 24,658.32 | 24,658.14 | +7.29 | RBD + M9A1 | | |
| | 24,819.64 | 24,820.29 | −26.19 | RBD + M10A1 | | |
| | 25,835.29 | 25,434.41 | - | RBD + 400 Da | - | 99 |
a)HexNAc: N-acetyl hexosamine; Hex: hexose; SA: sialic acid; M: mannose; GlcNAc: N-acetylglucosamine; ECG: glutathione; Cys: cysteine; hCys: homocysteine. Glycan structures were represented according to GlycoStore nomenclature
b)Expressed in % of the sequences provided in Table 1. SD and BFD mean that the RBD molecule was characterized by in-solution SD and BFD protocols, respectively
c)Non-reduced molecular mass of RBD-Ec was estimated by SDS-PAGE analysis and observed between the stacking and separating gel (> 97,000 Da) in Fig. 5a
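The error column of the table above expresses the difference between experimental and theoretical mass in parts per million. The following Python sketch shows that calculation for a few mass pairs copied from the table. The function name is hypothetical, and the sign convention (experimental minus theoretical) is an assumption that is consistent with rows such as 27,195.08 vs 27,195.46 Da.

```python
def ppm_error(experimental: float, theoretical: float) -> float:
    """Relative mass error in ppm, assuming (experimental - theoretical) / theoretical."""
    return (experimental - theoretical) / theoretical * 1e6


# (experimental Da, theoretical Da) pairs copied from the summary table
pairs = [
    (27195.08, 27195.46),  # RBD + HexNAc:Hex:NeuAc2 + Cys
    (26982.06, 26982.22),  # RBD + HexNAc:Hex:NeuAc2 + Cys
    (23482.09 - 0.51, 23482.09),  # RBD + M3, written as theoretical minus 0.51 Da
]

for exp, theor in pairs:
    print(f"{exp:.2f} Da vs {theor:.2f} Da: {ppm_error(exp, theor):+.2f} ppm")
```

Because the tabulated masses are rounded to two decimals, values recomputed this way can differ from the printed errors in the last digit.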
Summary of the 100% sequence coverage assignment by ESI–MS of the tryptic digestion using the in-solution buffer-free (BFD) and 82% by the standard digestion (SD) protocol of RBD-HEK_A expressed in HEK293T
a)The three alanine and six histidine residues located at the C-terminal end (residues 542–550) of the protein do not correspond to the RBD and were inserted in the cloning stage to facilitate the purification of the recombinant protein by IMAC. The superscript numbers indicate the location of the tryptic peptides within the analyzed protein. A brief description of the PTMs linked to the corresponding peptides is included. NEM, N-ethylmaleimide; CNEM, cysteine alkylated with N-ethylmaleimide at the thiol group. Nt and Ct indicate the N- and C-terminal ends, respectively. The residues indicated as D correspond to potential N-glycosylation sites located at Asn331 and Asn343 that were transformed into Asp by PNGase F
b)Monosaccharide symbols follow the SNFG system [60] and the O-glycan structures are as previously reported [33]
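The coverage values quoted for the two protocols (100% for BFD versus 82% for SD) are the percentage of residues in the expressed sequence that fall inside at least one assigned peptide. A minimal Python sketch of that calculation is given below. The function name and the toy residue ranges are hypothetical and are not taken from the tables.

```python
def sequence_coverage(protein_start, protein_end, peptides):
    """Percentage of residues covered by at least one peptide.

    protein_start / protein_end: first and last residue number (inclusive).
    peptides: iterable of (start, end) residue ranges, inclusive.
    """
    covered = set()
    for start, end in peptides:
        # Clip each peptide to the protein boundaries before counting residues
        lo = max(start, protein_start)
        hi = min(end, protein_end)
        covered.update(range(lo, hi + 1))
    total = protein_end - protein_start + 1
    return 100.0 * len(covered) / total


# Toy example: residues 61-70 are not covered by any assigned peptide
toy_peptides = [(1, 40), (30, 60), (71, 100)]
print(sequence_coverage(1, 100, toy_peptides))  # 90.0
```

Overlapping peptides are counted once, so short hydrophilic peptides only raise the coverage when they span residues that no other peptide reaches.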
Fig. 5 a SDS-PAGE analysis under reducing and non-reducing conditions of N-glycosylated and deglycosylated (RBD-CHO), detected with silver staining. Lane 1: Low-range molecular weight markers from 31 to 97 kDa (Bio-Rad). Lanes 2–3: N-glycosylated and deglycosylated protein under reducing conditions detecting the reduced monomer. Lanes 4–5: N-glycosylated and deglycosylated protein in non-reducing conditions detecting the dimer species [(RBD-CHO)]. b ESI–MS spectrum of a dimeric RBD deglycosylated with PNGase F. c Deconvolution of the ESI–MS spectrum shown in (b) reveals the presence of the three major O-glycoforms of (RBD-CHO). The expected molecular masses of the different O-glycoforms are shown between parentheses. (RBD)2 represents an abbreviated form for referring to the (RBD-CHO) molecule. Monosaccharide symbols follow the SNFG system [60] and the O-glycan structures are as previously reported [33]. The ESI–MS spectra shown in (d) and (e) correspond to the (RBD-CHO) digested with trypsin following the SD and in-solution BFD (precipitated with acetone) protocols, respectively. Asterisks in (d) correspond to background signals, not assigned to tryptic peptides, and (S–S)n+ to peptides containing a disulfide bond between the described cysteines. The insets shown in (d) and (e) correspond to the expanded regions of the mass spectra (m/z 981.5–995.5) shown by rectangles with broken lines, showing the O-glycosylated peptides and two disulfide-bonded peptides, assigned as (S-S391-525)4+ and (S-S379-432)4+. The upper and lower mass spectra shown in (f) and (g) correspond to two expanded regions (m/z 520.4–524.1 and m/z 650.5–655.2) of the ESI–MS spectra shown in (d) and (e), respectively. A detailed assignment for all tryptic peptides in this figure is summarized in Table S4
Fig. 3 ESI–MS/MS spectra of C-terminal peptides (538CVNF541-AAAHHHHHH) of RBD-HEK_A containing C538 modified by a cyanylation, b glutathionylation, c cysteinylation, and d homocysteinylation
Fig. 4 MS/MS spectrum of two copies of the C-terminal peptide (538CVNF541-AAAHHHHHH) of RBD-HEK_A linked by an intermolecular disulfide bond between two Cys538. The nomenclature of fragment ions is in agreement with that proposed by Mormann et al. [61]
Fig. 6 a SDS-PAGE analysis of the recombinant RBD-Ec analyzed under reducing (Lane 2) and non-reducing (Lane 3) conditions and detected with Coomassie staining. Lane 1 corresponds to the low-range molecular weight markers from 14 to 97 kDa (Bio-Rad). b Deconvoluted ESI–MS spectrum of the reduced and S-carbamidomethylated protein. The expected molecular mass is indicated in parentheses. c ESI–MS analysis of the recombinant protein expressed in E. coli and digested with trypsin using the in-solution BFD protocol. Signals assigned as (S–S)n+ correspond to the peptides containing disulfide bonds between the cysteines that are described. The signals labeled with (Nt-His6)n+ correspond to the N-terminal peptide containing a His6 tag in its amino acid sequence. A detailed assignment for all tryptic peptides in this figure is summarized in Table S5
Fig. 7 a NP-HPLC profile (upper chromatogram) of the 2AB-N-glycans released by PNGase F treatment of the recombinant RBD-C1 and corresponding dextran ladder (lower chromatogram) used to calculate the GU indexes for all 2AB-N-glycans and to perform the structural assignment. The asterisks correspond to non-assigned glycoforms. The numbers above peaks in the dextran ladder indicate the corresponding glucose units. The nomenclature used in the structural assignment of the 2AB-N-glycans agrees with that proposed by the SNFG system [60]. The deconvoluted ESI–MS spectrum shown in (b) corresponds to the intact protein with the potential N-glycosylation site located at Asn343 occupied by several glycoforms. A magnification of 10 × is shown in the low molecular mass region of (b). The ESI–MS spectrum shown in (c) corresponds to the RBD-C1 treated with PNGase F and digested following the in-solution BFD protocol shown in Fig. 1b. The ESI–MS spectrum shown in (d) corresponds to the reduced and S-alkylated glycosylated RBD-C1. Signals assigned as (C# + cam)n+ correspond to tryptic peptides containing carbamidomethyl cysteine residues at position #. The inset shown in (d) corresponds to an expanded region (m/z 1237–1662) showing the presence of several signals assigned to the N-terminal end glycopeptides (T333-R346) with several N-glycans linked to the glycosylated Asn343. The signal assigned as (C480/488 + cam)3+ corresponds to the peptide D467-R509 containing Cys480 and Cys488 S-alkylated with iodoacetamide. A detailed assignment for all tryptic peptides in this figure is summarized in Table S6
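Glycoforms that differ by whole monosaccharide residues appear in a deconvoluted spectrum as a ladder of masses with regular spacing. The Python sketch below checks that spacing on theoretical masses copied from the RBD + M3 … RBD + M9 rows of the summary table. The residue masses are standard average values (hexose C6H10O5, N-acetylhexosamine C8H13NO5). The function name, the tolerance, and the choice of rows are assumptions made for illustration only.

```python
HEX = 162.1406     # average residue mass of a hexose (C6H10O5), Da
HEXNAC = 203.1925  # average residue mass of an N-acetylhexosamine (C8H13NO5), Da


def residue_steps(mass_low, mass_high, residue_mass, tol=0.05):
    """Return how many residues of `residue_mass` separate two masses,
    or None if the difference is not a whole multiple within `tol` Da."""
    delta = mass_high - mass_low
    n = round(delta / residue_mass)
    if n < 1 or abs(delta - n * residue_mass) > tol:
        return None
    return n


# Theoretical masses (Da) copied from the summary table
theoretical = {
    "M3": 23482.09, "M4": 23644.23, "M6": 23968.52,
    "M7": 24130.66, "M8": 24292.81, "M9": 24454.95,
    "M3A1": 23685.29,
}

ladder = ["M3", "M4", "M6", "M7", "M8", "M9"]
for low, high in zip(ladder, ladder[1:]):
    n = residue_steps(theoretical[low], theoretical[high], HEX)
    print(f"{low} -> {high}: {n} hexose residue(s)")

# The A1 variants differ from their parent glycoform by one HexNAc residue
print("M3 -> M3A1:", residue_steps(theoretical["M3"], theoretical["M3A1"], HEXNAC), "HexNAc residue(s)")
```

The same check could be applied to experimental masses, with a wider tolerance chosen to match the mass accuracy of the measurement.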
Fig. 8 a ESI–MS analysis of the deglycosylated RBD-cmyc-Pp expressed in P. pastoris. b Deconvoluted ESI–MS spectrum. The expected mass of the N-deglycosylated protein is shown in parentheses. c ESI–MS analysis of the in-solution BFD trypsin digestion of the N-deglycosylated RBD-cmyc-Pp. The inset shows the isotopic ion distribution of a 4+ ion corresponding to peptides [Leu387-Arg403]-S–S-[Val510-Lys528] linked by a disulfide bond between C391 and C525. A summary of the above results is shown in Tables 2–3 and the detailed assignment for all signals in (c) is shown in Table S7. d ESI–MS/MS spectrum of peptides [EAEAEFS-Asn331-Arg346]-S–S-[Ile358-Lys378] linked by a disulfide bond between C336 and C361. This species contains an extension of seven amino acids (EAEAEFS-) added to the expected N-terminal end [Asn331-Arg346] due to incomplete processing of the propeptide (alpha mating factor) during protein expression. Asn331 and Asn343 are transformed into Asp residues due to the action of PNGase F. The nomenclature for the fragment ions observed in the MS/MS spectrum agrees with that proposed by Mormann et al. [61]
Fig. 9 The ESI–MS spectra shown in (a) and (b) correspond to expanded regions of the tryptic peptides derived from RBD-HEK_A digested by the in-solution BFD protocol after precipitation with acetone and ethanol, respectively. The signals assigned in (b) as (C538 + ECG)3+ and (C538 + 374 Da)3+ correspond to the C-terminal peptide 538CVNF541-AAAHHHHHH with the C538 modified with glutathione and with a modification of unknown chemical nature that increased its molecular mass by 374 Da, respectively. The MS/MS spectra shown in (c) and (d) correspond to the internal non-modified Val445-Arg454 peptide (m/z 609.80, 2+) and the same peptide with a modification that increased its molecular mass by 40.02 Da (m/z 629.81, 2+), respectively. This chemical modification, introduced in the precipitation step with acetone, is located alternatively at the N-terminal end (V + 40) or at the glycine in the second position (G + 40). The MS/MS spectrum shown in (e) corresponds to the cysteinylated C-terminal end peptide (538CVNF541-AAAHHHHHH) with the C538 linked by a disulfide bond (-S–S-) to a Cys residue (C–OH) modified at the N-terminal end with an N-ethylmaleimide group (NEM-) introduced during the sample processing. Peptide and C–OH have been assigned as P1 and P2, respectively. The nomenclature of fragment ions is in agreement with that proposed by Mormann et al. [61]
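The m/z values quoted for the Val445-Arg454 peptide follow from the usual relation between neutral mass and the m/z of a multiply protonated ion: a mass increase of 40.02 Da shifts a doubly charged ion by 40.02 / 2 = 20.01 m/z units. A short Python sketch of that arithmetic is shown below. The function names are hypothetical; the proton mass is the standard value.

```python
PROTON = 1.007276  # mass of a proton, Da


def neutral_mass(mz_value: float, charge: int) -> float:
    """Neutral mass of a species observed as an [M + zH]z+ ion."""
    return charge * (mz_value - PROTON)


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a given neutral mass."""
    return (mass + charge * PROTON) / charge


# Unmodified Val445-Arg454 peptide observed at m/z 609.80 (2+)
unmodified = neutral_mass(609.80, 2)

# Same peptide carrying the +40.02 Da modification, still doubly charged
modified_mz = mz(unmodified + 40.02, 2)
print(f"{modified_mz:.2f}")  # 629.81
```

The computed value matches the m/z 629.81 reported in the caption for the modified peptide.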