Parsimonious Charge Deconvolution for Native Mass Spectrometry.

Marshall Bern¹, Tomislav Caval², Yong J Kil¹, Wilfred Tang¹, Christopher Becker¹, Eric Carlson¹, Doron Kletter¹, K Ilker Sen¹, Nicolas Galy², Dominique Hagemans², Vojtech Franc², Albert J R Heck².

Abstract

Charge deconvolution infers the mass from mass over charge (m/z) measurements in electrospray ionization mass spectra. When applied over a wide input m/z or broad target mass range, charge-deconvolution algorithms can produce artifacts, such as false masses at one-half or one-third of the correct mass. Indeed, a maximum entropy term in the objective function of MaxEnt, the most commonly used charge deconvolution algorithm, favors a deconvolved spectrum with many peaks over one with fewer peaks. Here we describe a new "parsimonious" charge deconvolution algorithm that produces fewer artifacts. The algorithm is especially well-suited to high-resolution native mass spectrometry of intact glycoproteins and protein complexes. Deconvolution of native mass spectra poses special challenges due to salt and small molecule adducts, multimers, wide mass ranges, and fewer and lower charge states. We demonstrate the performance of the new deconvolution algorithm on a range of samples. On the heavily glycosylated plasma properdin glycoprotein, the new algorithm could deconvolve monomer and dimer simultaneously and, when focused on the m/z range of the monomer, gave accurate and interpretable masses for glycoforms that had previously been analyzed manually using m/z peaks rather than deconvolved masses. On therapeutic antibodies, the new algorithm facilitated the analysis of extensions, truncations, and Fab glycosylation. The algorithm facilitates the use of native mass spectrometry for the qualitative and quantitative analysis of protein and protein assemblies.

Keywords: algorithm; cetuximab; daclizumab; factor P; glycoprotein; high-resolution native mass spectrometry; infliximab; intact mass; maximum entropy; monoclonal antibody; parsimony; properdin

Year: 2018 PMID: 29376659 PMCID: PMC5838638 DOI: 10.1021/acs.jproteome.7b00839

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

Electrospray ionization mass spectra of biological macromolecules and protein complexes contain series of ion signals corresponding to the same chemical species in a sequence of charge states. The masses and intensities (ion currents) of the analyzed chemical species, as represented by an entire neutral-mass spectrum, can be inferred from the mass over charge (m/z) measurements by computational deconvolution. All charge deconvolution algorithms in use today are iterative algorithms that converge to a deconvolved neutral mass spectrum, along with charge distributions for the neutral masses, that together explain the observed m/z spectrum. The most widely used deconvolution algorithm, with implementations called MaxEnt and ReSpect, was developed about 25 years ago[1,2] and licensed to most mass spectrometry (MS) instrument manufacturers. This algorithm converges to a deconvolved neutral mass spectrum that optimizes an objective function measuring the quality of the result by criteria such as fit to the observed data, peak width, correlation between neighboring charge states, and, as its defining characteristic, the Shannon entropy of the neutral-mass spectrum. A more recent algorithm, UniDec,[3] leaves out the entropy term and builds in the expected correlation between neighboring charge states by blending them with a smoothing filter. UniDec also includes specific support for ion mobility data and nanodisc analysis. Other recent work has focused on peak enhancement of m/z spectra[4] to improve the performance of maximum entropy charge deconvolution for native mass spectrometry. Regardless of the algorithmic details, the deconvolution iteration generally converges to a local rather than a global optimum. Two important user-controlled parameters for deconvolution are the input m/z range and the output mass range.
Deconvolution algorithms usually assume that all of the ions in the input range (except perhaps some low-charge m/z peaks, recognizable by resolved isotopes) represent chemical species in the target mass range. This assumption allows deconvolution of lower signal-to-noise spectra by limiting the number of masses and charges that the algorithm must consider, but it runs the risk that chemical species outside the mass range go undetected or produce false additional masses within the user-set target mass range. A practical solution is to first deconvolve the full m/z range onto a wide mass range to survey the masses, and then deconvolve selected m/z ranges onto narrow mass ranges to capture more detailed information. With a wide target mass range, deconvolution can produce “harmonic” artifacts, for example, false mass peaks at one-half or twice the true mass, caused by coincidences between the m/z series of masses related by small integer ratios. Even with relatively narrow target mass ranges, off-by-one charge assignments produce another type of artifact: side lobes on either side of the true masses. Because reassigning a peak at m/z x from charge z to z ± 1 shifts the inferred mass by about x Da, these side lobes lie, for example, about 3000 Da below and above the true mass if the strongest m/z signal is around m/z 3000. Both harmonic and off-by-one artifacts increase the entropy of the deconvolution, so the entropy term in the objective function, which helps the algorithm resolve closely spaced masses, has the undesired side effect of promoting artifacts.
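The arithmetic behind both artifact types, and the claim that artifacts raise the entropy of the deconvolved spectrum, can be checked with a short Python sketch. It assumes positive-mode protonation only (no salt adducts) and an illustrative 150 kDa species observed at charge 50; the helper names `mz_of`, `mass_of`, and `shannon_entropy` are hypothetical and not part of any deconvolution package.

```python
import math

PROTON = 1.00728  # proton mass in Da, as given in the text


def mz_of(mass, z):
    """m/z of a neutral mass carrying z protons (positive mode, no adducts)."""
    return (mass + z * PROTON) / z


def mass_of(mz, z):
    """Neutral mass inferred when an m/z peak is assigned charge z."""
    return z * (mz - PROTON)


def shannon_entropy(intensities):
    """Shannon entropy in bits of a spectrum normalized to a probability distribution."""
    total = sum(intensities)
    return sum(-(y / total) * math.log2(y / total) for y in intensities if y > 0)


true_mass = 150_000.0
x = mz_of(true_mass, 50)                 # about m/z 3001
print(mass_of(x, 25))                    # harmonic: read at half the charge -> half the mass
print(mass_of(x, 49), mass_of(x, 51))    # off-by-one side lobes, about x Da below and above

# Splitting one true peak into the true mass plus artifact peaks raises the entropy.
print(shannon_entropy([1.0]), shannon_entropy([0.8, 0.1, 0.05, 0.05]))
```

Because every even-charge peak of the true mass is also explained by a species at half the mass, a harmonic artifact costs nothing in goodness of fit, and an entropy bonus can then actively favor it.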
Artifacts are a minor problem in some scenarios, but they can be quite misleading in other practical applications: (1) automated workflows that forgo expert human inspection; (2) analysis of antibodies, including bispecifics, where harmonic artifacts may be mistaken for half-mAbs, aggregates, or mispairings; (3) antibody–drug conjugates (ADCs), where off-by-one artifacts may bias quantitation of drug loading; and (4) heavily glycosylated or other highly modified proteins. To be fair, MS automation, bispecifics, and ADCs barely existed when the maximum entropy algorithm was developed in the early 1990s. Perhaps the most important development in intact MS since then is native MS,[5−9] made possible by methods and instrument innovations including the introduction of Orbitrap mass analyzers optimized for the transmission of high m/z ions.[10,11] Native MS has enabled the measurement of micro- and macro-heterogeneity in proteins and complexes bound to multiple cofactors[12] or harboring multiple post-translational modifications (PTMs),[13−18] and in large endogenous protein assemblies, such as the ribosome[11] and intact viruses.[13,19] Complex native MS spectra, sometimes exhibiting ion signals of several hundred species of different molecular weight, require sophisticated algorithms and software to extract qualitative and quantitative information on co-occurring proteoforms or protein–ligand stoichiometries. To address these issues, we present an improved charge deconvolution algorithm that divides the process into two stages: charge inference and peak sharpening. The charge inference stage aims for an artifact-free neutral mass spectrum with a “parsimonious” set of mass peaks that explains the observed m/z spectrum. The optional peak sharpening stage applies point-spread-function deconvolution to the neutral mass spectrum to resolve closely spaced peaks.
Post-deconvolution peak sharpening on the neutral mass spectrum has practical advantages over coupled charge inference and peak sharpening, including processing speed, visual inspection of before and after spectra, and compatibility with a variety of well-developed super-resolution algorithms, such as Richardson–Lucy,[20,21] maximum entropy,[22] and convolutional neural networks. This design choice restricts the super-resolution algorithm’s underlying physical model: the point-spread function may depend upon mass (for example, broadening at higher mass) but not upon charge or m/z. We focus on the charge inference stage because charge inference is central to, and unique to, ESI mass spectrometry, and it is also the source of the most misleading deconvolution artifacts, that is, false masses far removed from all true masses. (The super-resolution stage can produce minor artifacts such as “ringing” around true masses.) We demonstrate parsimonious charge inference on complex glycosylated therapeutic antibodies and a heavily glycosylated plasma glycoprotein, all analyzed under native conditions. On several therapeutic antibodies we reveal a variety of interesting causes of species microheterogeneity,[23] including N-terminal extensions and truncations, abundant C-terminal lysine retention, and multiple glycosylation sites. We argue that this improved parsimonious charge deconvolution tool will benefit the qualitative and quantitative analysis of protein therapeutics, including biosimilar testing, drug load quantification in ADCs, and glycoproteoform analysis.

Materials and Methods

Chemicals and Materials

The three therapeutic mAbs used in this work, namely, cetuximab (lot number 7663503, expiration date 3/2010), daclizumab (lot number B0035, expiration date unknown), and infliximab/Remicade (lot number and expiration date unknown), are all commercially available and were kind gifts from Genmab (Utrecht, The Netherlands). All mAb samples given to us likely represent expired batches (see Table S2). Properdin, also known as Factor P (Uniprot code: P27918), purified from human blood plasma, was obtained from Complement Technology (Tyler, TX). We obtained amino acid sequences from the literature[24] and Web searches (www.commonchemistry.org). All amino acid sequences lacked the N-terminal signal peptides, except for daclizumab, for which we used the sequence with signal peptide from its European patent application (EP 2 527 429 A2). Specifications of the samples are listed in Tables S1 and S2. Dithiothreitol (DTT), iodoacetamide (IAA), and ammonium acetate (AMAC) were purchased from Sigma-Aldrich (Steinheim, Germany). Phosphate buffer was from Lonza (Verviers, Belgium). Formic acid (FA) was from Merck (Darmstadt, Germany). Acetonitrile (ACN) was purchased from Biosolve (Valkenswaard, The Netherlands). Sequencing-grade trypsin was obtained from Promega (Madison, WI). Lys-C, Glu-C, and Asp-N were obtained from Roche (Indianapolis, IN). PNGase F was obtained from Asparia Glycomics (San Sebastian, Spain). The IdeS enzyme for cetuximab digestion was purchased from Genovis (Lund, Sweden).

Sample Preparation for Native MS

The therapeutic mAb powders were reconstituted in Milli-Q water. The aqueous mAb samples and the unprocessed protein solution (phosphate buffer at pH 7.2) containing ∼30–40 μg of properdin were buffer-exchanged into 150 mM aqueous AMAC (pH 7.5) by centrifugation using a 10 kDa cutoff filter (Merck Millipore, Germany). The resulting protein concentration was measured by UV absorbance at 280 nm and adjusted to 2–3 μM prior to native MS analysis. PNGase F was used to cleave the N-glycans of the mAbs and properdin following previously described protocols.[25] Cetuximab was used to demonstrate the processing of native spectra of a mAb treated with IdeS enzyme. Aqueous cetuximab (30 μg) was incubated with IdeS enzyme (30 units) in phosphate buffer at pH 7.5 for 30 min at 37 °C. This sample was either submitted directly to native MS measurements or further treated with 20 mM DTT and incubated for 30 min at 37 °C. All samples were buffer-exchanged into 150 mM AMAC (pH 7.5) prior to native MS measurements.

Native MS Analysis

Samples were analyzed on a modified Exactive Plus Orbitrap instrument with extended mass range (EMR) (Thermo Fisher Scientific, Bremen) using a standard m/z range of 500–10 000, as previously described in detail.[25] The voltage offsets on the transport multipoles and ion lenses were manually tuned to achieve optimal transmission of protein ions at elevated m/z. Nitrogen was used in the higher energy collisional dissociation (HCD) cell at a gas pressure of (6 to 8) × 10^−10 bar. MS parameters were as follows: spray voltage, 1.2 to 1.3 kV; source temperature, 250 °C; in-source fragmentation and collision energy, varied from 30 to 100 V; resolution (at m/z 200), 35 000 for properdin and 70 000 for mAbs. The instrument was mass calibrated using a solution of CsI, as previously described.[25]

Proteolytic Digestion for Bottom-up Proteomics

The mAb daclizumab (5 μg) was reduced using 10 mM DTT at 56 °C for 30 min and alkylated with 30 mM IAA at room temperature for 30 min in the dark. Excess IAA was quenched with 10 mM DTT. The protein solution was first digested with Lys-C (or Asp-N or Glu-C) at an enzyme-to-protein ratio of 1:50 (w/w) for 4 h at 37 °C and then overnight with trypsin at an enzyme-to-protein ratio of 1:100 (w/w) at 37 °C. The proteolytic digest was desalted with an Oasis μElution plate,[26] dried, and dissolved in 40 μL of 0.1% FA prior to liquid chromatography (LC)–MS and MS/MS analysis.

LC–MS and MS/MS Analysis

Proteolytic peptides from daclizumab (typically 300 fmol) were separated and analyzed using an Agilent 1290 Infinity HPLC system (Agilent Technologies, Waldbronn, Germany) coupled online to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). Reversed-phase separation was accomplished using a 100 μm inner diameter 2 cm trap column (in-house packed with ReproSil-Pur C18-AQ, 3 μm) (Dr. Maisch, Ammerbuch-Entringen, Germany) coupled to a 50 μm inner diameter 50 cm analytical column (in-house packed with Poroshell 120 EC-C18, 2.7 μm) (Agilent Technologies, Amstelveen, The Netherlands). Mobile-phase solvent A consisted of 0.1% FA in water, and mobile-phase solvent B consisted of 0.1% FA in ACN. The flow rate was set to 300 nL/min. A 45 min gradient was used as follows: 0–10 min, 100% solvent A; 10.1–35 min, 10% solvent B; 35–38 min, 45% solvent B; 38–40 min, 100% solvent B; 40–45 min, 100% solvent A. Nanospray was achieved using a coated fused silica emitter (New Objective, Cambridge, MA) (outer diameter, 360 μm; inner diameter, 20 μm; tip inner diameter, 10 μm) biased to 2 kV. The mass spectrometer was operated in positive ion mode, and spectra were acquired in data-dependent acquisition mode. For the MS scans, the scan range was set from 300 to 2000 m/z at a resolution of 60 000, and the AGC target was set to 4 × 10^5. For the MS/MS measurements, HCD and electron-transfer and higher-energy collision dissociation (EThcD) were used. HCD was performed with a normalized collision energy of 35%. A supplementary activation energy of 20% was used for EThcD. For the MS/MS scans, the scan range was set from 100 to 2000 m/z, the resolution to 30 000, the AGC target to 5 × 10^5, the precursor isolation width to 1.6 Th, and the maximum injection time to 300 ms.

LC–MS/MS Data Analysis

Raw LC–MS/MS data on the digest of daclizumab were interpreted using Byonic software (Protein Metrics).[27] The following parameters were used for data searches: precursor ion mass tolerance, 10 ppm; product ion mass tolerance, 20 ppm; fixed modification, Cys carbamidomethyl; variable modification, Met oxidation. A semitryptic specificity search was chosen for all samples. The protein database contained the daclizumab protein amino acid sequence (Table S1).

Description of Algorithm

An m/z spectrum is a sequence of pairs $m_i = (x_i, y_i)$, where $x_i$ is the m/z value and $y_i$ is the intensity value. Most often the intensity $y_i$ represents a single species of ions, but, in general, it represents a mix of ions of various charges, and we let $c_k(m_i)$ denote the fraction of the intensity that has charge $k$ for $k = 1, 2, \ldots$, up to some maximum charge. For each $i$, the $c_k(m_i)$ values sum to one over all $k$. The $c_k(m_i)$ values are initially unknown and set to be equal, but the algorithm iteratively learns these values as it learns the neutral mass spectrum. An observed m/z point $m_i$ maps to a sequence of neutral masses $k x_i - k \cdot 1.00728$, with intensities $c_k(m_i)\, y_i$, for $k = 1, 2, \ldots$, where 1.00728 is the mass of a proton in daltons. Here we assume positive-mode MS; for negative mode the neutral mass is $k x_i + k \cdot 1.00728$. We can compute a full neutral mass spectrum by accumulating, over all $i$, the intensities $c_k(m_i)\, y_i$ into a vector at the appropriate mass values $k x_i - k \cdot 1.00728$ (or $k x_i + k \cdot 1.00728$ for negative mode). The result of this m/z-to-mass “backward” mapping is a sequence of points $M_j = (X_j, Y_j)$. For each point $M_j$ in the neutral mass spectrum, we can also record the intensity contributions $C_k(M_j)$ from each charge $k$ and normalize them so that, for each $j$, the $C_k(M_j)$ values sum to one. The $M_j$ points and $C_k(M_j)$ values can then be used in a mass-to-m/z “forward” mapping to give a modeled m/z spectrum. Alternating backward and forward mappings improves the estimates of the unobserved $c_k(m_i)$, $C_k(M_j)$, and $Y_j$ variables. The computation stops after a predefined number of iterations or when the neutral mass spectrum converges, that is, changes very little between iterations. The quality of a deconvolution can be evaluated by various criteria, and deconvolution algorithms either implicitly or explicitly aim to optimize an objective function that combines these criteria.
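The backward mapping and the alternation described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the PMI Intact implementation (whose objective function is not given here): masses are binned on a uniform grid, positive mode is the default, and instead of an explicit forward mapping with a fitted objective, each point's charge fractions (the c values in the text) are reassigned in proportion to the intensity currently held by its candidate masses, an EM-style update chosen for illustration. The function names `candidate_bins`, `backward`, and `deconvolve` are hypothetical.

```python
import numpy as np

PROTON = 1.00728  # proton mass in Da, as given in the text


def candidate_bins(mz, max_charge, mass_min, spacing, n_bins, positive=True):
    """Mass-grid bin of k*x_i - k*PROTON (or + for negative mode) for every point i and charge k."""
    charges = np.arange(1, max_charge + 1)
    shift = -PROTON if positive else PROTON
    masses = charges[None, :] * (mz[:, None] + shift)   # shape (n_points, max_charge)
    bins = np.rint((masses - mass_min) / spacing).astype(int)
    valid = (bins >= 0) & (bins < n_bins)              # candidate mass lies in the target range
    return np.clip(bins, 0, n_bins - 1), valid


def backward(c, intensity, bins, valid, n_bins):
    """m/z-to-mass mapping: accumulate c_k(m_i) * y_i into the neutral-mass vector Y."""
    Y = np.zeros(n_bins)
    np.add.at(Y, bins[valid], (c * intensity[:, None])[valid])
    return Y


def deconvolve(mz, intensity, max_charge, mass_min, mass_max,
               spacing=1.0, n_iter=100, tol=1e-9, positive=True):
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(round((mass_max - mass_min) / spacing)) + 1
    bins, valid = candidate_bins(mz, max_charge, mass_min, spacing, n_bins, positive)

    # Start with equal charge fractions over the charges whose masses fall in range.
    counts = valid.sum(axis=1, keepdims=True)
    c = np.divide(valid.astype(float), counts, out=np.zeros(valid.shape), where=counts > 0)

    for _ in range(n_iter):
        Y = backward(c, intensity, bins, valid, n_bins)
        # Simplified stand-in for the forward step: reweight each point's charges by
        # how much intensity its candidate masses currently hold (EM-style update).
        support = np.where(valid, Y[bins], 0.0)
        total = support.sum(axis=1, keepdims=True)
        c_new = np.divide(support, total, out=np.zeros(support.shape), where=total > 0)
        done = np.max(np.abs(c_new - c)) < tol
        c = c_new
        if done:
            break

    Y = backward(c, intensity, bins, valid, n_bins)
    return mass_min + spacing * np.arange(n_bins), Y, c


# Synthetic example: one 10 000 Da species observed at charges 5 to 9.
z = np.arange(5, 10)
mz = 10_000.0 / z + PROTON
mass_axis, Y, c = deconvolve(mz, np.ones_like(mz), max_charge=12,
                             mass_min=1_000, mass_max=20_000)
print(mass_axis[np.argmax(Y)])  # 10000.0
```

The true mass collects intensity from every charge state, whereas each harmonic candidate (for example 5000 or 20 000 Da) is supported by only a few points, so the reweighting concentrates each point on a single charge. This "rich get richer" behavior loosely mirrors the parsimony idea, but it is only an illustration.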
To our knowledge, none of the maximum entropy algorithms discloses its objective function or optimization algorithm; however, the primary criterion is always goodness of fit, which can be measured by forward mapping the neutral mass spectrum to an m/z spectrum and then evaluating, for example, the sum of the squares of the differences between the observed and computed values. A second criterion is smoothness of the charge distributions $C_k(M_j)$. Maximum entropy methods add into the objective function a weighting factor times the Shannon entropy of the neutral mass spectrum regarded as a probability distribution, that is, $-\sum_j \dot{Y}_j \log_2 \dot{Y}_j$, where $\dot{Y}_j = Y_j / \sum_{j'} Y_{j'}$. The entropy criterion tends to split broad peaks into multiple sharper peaks. In the algorithm used here, we introduce a new criterion based on the assumption that m/z coincidences are rare, especially in highly resolved mass spectra, so that for each $i$ the intensity at m/z point $m_i$ is more likely to derive from a single mass value than from two masses, more likely to derive from two masses than from three, and so forth. This criterion tends to drive the iteration to a “parsimonious” neutral mass spectrum that contains a minimal set of mass peaks to explain the m/z spectrum. Notice that if the sample does contain a problematic pair of masses, say a monomer and a dimer, then each point $m_i$ may still be fairly pure if there is some separation in m/z, for example, if the dimer cannot carry twice the charge of the monomer. Separation in m/z is less reliable in mass spectra taken under “standard” denaturing conditions than in native mass spectra, in which different oligomers tend to occupy distinct m/z ranges. If there is no separation in m/z, then the dimer explains every m/z peak explained by the monomer, and the only evidence of the monomer is taller m/z peaks at even charges of the dimer.
In this case, the monomer’s intensity in the computed neutral mass spectrum depends on the relative weighting of the parsimony and charge smoothness criteria. We implemented the new charge inference algorithm in C++ in a commercial product called Protein Metrics Intact (PMI Intact), shown in Figure 1. Input data from almost any type of MS instrument can be sliced by elution time into any number of possibly overlapping time windows, and the summed mass spectrum for each time window can be further sliced by m/z for separate deconvolution. Both m/z and mass point spacing are user-controllable; mass spacing below ∼0.2 Da preserves isotope resolution. We also implemented Richardson–Lucy point spread deconvolution, which we call “peak sharpening” to avoid confusion. This iterative algorithm takes as input a 1D or 2D signal (such as a time series, mass spectrum, or image), along with a point spread function F, and computes an output whose convolution with F gives a result close to the observed input. Our current version of the software (v2.15, released in December 2017) lets the user define point spread functions with Gaussian or Lorentzian, possibly asymmetric, tails. Gaussian tails approximate isotope distributions and measurement inaccuracy; heavy Lorentzian tails may approximate adducts. PMI Intact also includes interactive visualization. Peaks in the deconvolved mass spectrum may be selected interactively, and the software marks the selected peaks and the m/z points that map to these peaks with matching colored dots for human inspection and validation. The software also enables automatic peak assignment from protein sequences, masses, or mass deltas, as well as automatic graphical report generation.
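The peak-sharpening step can be illustrated with a textbook one-dimensional Richardson–Lucy iteration in Python with NumPy. This sketch is not PMI Intact's code, and it rests on several assumptions: the point spread function is discretized on a uniform mass grid, it has independent widths for the low-mass and high-mass tails (one simple way to model the asymmetric Gaussian or Lorentzian tails mentioned above), and it is the same at every mass, whereas the text allows it to vary with mass. The names `asymmetric_psf` and `richardson_lucy` are hypothetical.

```python
import numpy as np


def asymmetric_psf(half_width, left_scale, right_scale, shape="gaussian"):
    """Hypothetical discrete PSF with separate widths for the low- and high-mass tails."""
    offsets = np.arange(-half_width, half_width + 1, dtype=float)  # in grid points
    scale = np.where(offsets < 0, left_scale, right_scale)
    if shape == "gaussian":
        psf = np.exp(-0.5 * (offsets / scale) ** 2)
    elif shape == "lorentzian":
        psf = 1.0 / (1.0 + (offsets / scale) ** 2)
    else:
        raise ValueError(f"unknown PSF shape: {shape}")
    return psf / psf.sum()  # normalize so total intensity is preserved


def richardson_lucy(observed, psf, n_iter=200, eps=1e-12):
    """Textbook 1D Richardson-Lucy deconvolution of a nonnegative spectrum."""
    observed = np.asarray(observed, dtype=float)
    estimate = np.full_like(observed, observed.mean())  # flat, positive start
    psf_adjoint = psf[::-1]  # convolving with the mirrored PSF applies the transpose
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, eps)
        estimate *= np.convolve(ratio, psf_adjoint, mode="same")
    return estimate


# Blur a single sharp mass peak, then sharpen it again.
psf = asymmetric_psf(half_width=10, left_scale=2.0, right_scale=2.0)
truth = np.zeros(101)
truth[50] = 1.0
observed = np.convolve(truth, psf, mode="same")
restored = richardson_lucy(observed, psf)
print(observed.max(), restored.max())
```

Because the multiplicative update keeps the estimate nonnegative and, with a normalized PSF, preserves total intensity, it suits intensity spectra; running many iterations on noisy data is what produces the "ringing" artifacts mentioned in the Introduction.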

Figure 1

Screenshot of Protein Metrics Intact software interface. The software applied to the mAb cetuximab provides tables of input files, elution peaks, and detected masses (upper left); total ion chromatogram (upper right); m/z spectra (lower left) summed over the selected time window; and deconvolved neutral mass spectrum (lower right). Mass peaks are interactively connected to m/z peaks by colored dots. The mass peak at 152 354 Da is a good match for the calculated average isotope mass of 152 356 Da for cetuximab with G2FGal2 on its Fab glycosylation sites and G0F on its Fc glycosylation sites. The peaks at 152 515 and 152 676 Da match G2FGal2 Fab glycosylation with one and two G1F Fc glycans, respectively.

PMI Intact is currently in use for a diverse set of applications, including analysis of reduced and intact monoclonal antibodies, IdeS-digested and intact bispecific antibodies, antibody–drug conjugates,[28] DNA oligos, heavily glycosylated glycoproteins, protein–ligand binding, and noncovalently bound protein complexes up to 1 MDa or more.

Software Tests

We tested PMI Intact on data from properdin and the three antibodies daclizumab, infliximab, and cetuximab. High-resolution native MS data on properdin from our laboratory had already been published[16] and therefore provided an ideal test case to demonstrate the power of the new algorithm. The three antibodies were chosen because they presented interesting analytical challenges due to their complex glycosylation profiles or extensive protein processing. We benchmarked PMI Intact against Protein Deconvolution 4.0 (Thermo Fisher Scientific) on the properdin data using identical m/z and mass ranges for the two programs. For PTM composition analysis, data were interpreted manually, and glycan structures were deduced based on known biosynthetic pathways. Average masses were used for the PTM assignments, including hexose/mannose/galactose (Hex/Man/Gal, 162.1424 Da), N-acetylhexosamine/N-acetylglucosamine/N-acetylgalactosamine (HexNAc/GlcNAc/GalNAc, 203.1950 Da), and N-acetylneuraminic acid (NeuAc, 291.2579 Da). All symbols and text nomenclature follow the recommendations of the Consortium for Functional Glycomics.
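As a worked example of these composition-based assignments, the Python sketch below sums average residue masses for a glycan composition. The Hex, HexNAc, and NeuAc values are those listed above; the fucose (deoxyhexose) value of 146.1430 Da is an assumption, since it is not given in the text, and `glycan_mass` is a hypothetical helper name. For G0F, HexNAc(4)Hex(3)Fuc(1), two copies add about 2891 Da, the value used for infliximab in the Results.

```python
# Average residue masses (Da). Hex, HexNAc, and NeuAc are the values listed in the text;
# the Fuc (deoxyhexose) value is an assumption, since the text does not give it.
AVERAGE_RESIDUE_MASS = {
    "Hex": 162.1424,
    "HexNAc": 203.1950,
    "NeuAc": 291.2579,
    "Fuc": 146.1430,
}


def glycan_mass(**composition):
    """Average mass added by a glycan, e.g. glycan_mass(HexNAc=4, Hex=3, Fuc=1)."""
    return sum(AVERAGE_RESIDUE_MASS[name] * count for name, count in composition.items())


g0f = glycan_mass(HexNAc=4, Hex=3, Fuc=1)   # G0F
man5 = glycan_mass(HexNAc=2, Hex=5)         # Man5
print(f"G0F {g0f:.2f} Da, two G0F {2 * g0f:.0f} Da, Man5 {man5:.2f} Da")
```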

Results

As a first demonstration of the value of parsimony in the deconvolution of ESI mass spectra, we reanalyzed published high-resolution mass spectra of the plasma protein properdin. This protein may exist in various oligomeric states and harbors diverse modifications on various sites, including N- and O-glycosylation as well as C-mannosylation, making properdin a challenging target for structural analysis. Our initial native MS measurements revealed properdin monomer and dimer. We first tested whether Protein Metrics Intact and Thermo Protein Deconvolution 4.0 could find both monomer and dimer using m/z and mass ranges large enough to accommodate both forms; this is a challenging problem for charge deconvolution algorithms due to coincidences of m/z peaks. As seen in Figure S1 of the Supporting Information, Protein Metrics Intact gives an accurate deconvolution, but depending upon input parameter settings, Thermo Protein Deconvolution 4.0 either gives numerous large artifact peaks or loses the dimer form altogether, and we could not find a setting that gave an accurate deconvolution. Figure 2 shows a more detailed comparison of Protein Metrics and Thermo deconvolutions, alongside the major charge state from the m/z spectrum previously used for manual analysis.[16] When deconvolved with wide m/z and mass ranges, the Thermo software, besides losing the dimer form, loses many of the medium-abundance monomer proteoforms yet finds some of the lower-abundance proteoforms, possibly because they are at half the mass of dimer forms. Thermo also gives highly variable peak widths. A wide mass peak in a deconvolved mass spectrum generally indicates mass uncertainty, caused by m/z peaks with different charges mapping to slightly different masses, but in this case the wide mass peaks at 53 866 and 54 176 Da seem to be caused by dimer m/z peaks mistaken for monomer.
PMI Intact returns a deconvolution in good visual agreement with the major charge states of the m/z spectrum, with mass agreement within ±2 Da of the calculated masses of correct proteoform assignments.[16] The previous work made assignments by manual inspection of individual m/z peaks, which have poorer mass accuracy than the deconvolved mass spectrum peaks. The previous analysis also made several assignment errors that are now apparent from the improved resolution and mass accuracy of Intact’s deconvolved spectrum. PMI Intact gave about 25 interpretable species in this analysis (Supporting Information Figure S2). PMI Intact also revealed a relatively high abundance of salt adducts (i.e., Na+ and K+) on some of the ion species. On the basis of this knowledge, we also analyzed a more extensively desalted properdin sample by native MS, for which we obtained spectra nearly free of salt adducts, enabling us to find evidence of a low abundance of triantennary N-glycans (Figure S5), whose assignments could be confirmed by bottom-up glycopeptide analysis (Figure S6). Interestingly, the triantennary N-glycans were found on proteoforms with 15 C-mannosylations but not on those with fewer C-mannosylations, not even those with 14 C-mannosylations, which are the most abundant in this sample. This is evidence of whole-protein correlation between PTMs that could not easily be obtained from bottom-up, middle-down, or top-down fragmentation spectra of a 54 kDa protein with 20 labile PTMs.

Figure 2

Proteoform profile of monomeric properdin. (a) Zoom of the 3800–4000 m/z range of the properdin monomer mass spectrum. (b) Zoom of PMI Intact’s deconvolution computed on m/z range 3000–6500 and mass range 10 000–160 000. (c) Zoom of the Thermo deconvolution computed on the same m/z and mass ranges. The Thermo deconvolution misses a number of proteoforms, including abundant forms at 53 380, 54 304, and 54 466 Da, most likely due to interference from the dimer.

As a further demonstration of the utility of the new deconvolution algorithm for protein therapeutics, we analyzed three clinically approved mAbs. As a first example, Figure 3 shows results on the PNGase F-treated, deglycosylated mAb daclizumab. Somewhat surprisingly, we observed three quite distinct masses in the deconvolved spectrum, namely, at 11 057, 132 792, and 143 831 Da, along with +340 Da masses for each of these peaks and a +2 × 340 Da mass for the 143 831 Da species only. The calculated mass for deglycosylated daclizumab is 143 832 Da = 2 × 48 717 (heavy chain) + 2 × 23 215 (light chain) – 32 (for the 16 disulfide bonds). See Supporting Information S1 for protein sequences. The 11 057 Da species is an exact integer match to the average isotope mass of the heavy chain initial sequence QVQLVQSGAEVKKPGSSVKVSCKASGYTFTSYRMHWVRQAPGQGLEWIGYINPSTGYTEYNQKFKDKATITADESTNTAYMELSSLRSEDTAVYYCARGG.G (where the period denotes the cleaved bond) with N-terminal pyro-Glu and the expected single disulfide bond. G.G is a well-known clipping site for monoclonal antibodies,[29] attributed to the flexibility of GG, and in this case the even more flexible GGG sequence occurs in the heavy chain CDR3, making it solvent-accessible. The mass 132 792 Da corresponds to the full-length mAb minus the initial sequence ending in GG.
The summed mass of the two fragments exceeds the intact mAb mass by (132 792 + 11 057) – 143 831 = 18 Da, the mass of water, which reveals that the cleavage is caused by hydrolysis rather than by gas-phase fragmentation inside the mass analyzer. The extra +340 Da peaks are consistent with an N-terminal extension of VHS (part of the signal peptide). In the isotope-resolved Figure 3c, a small peak with a measured mass delta of 104.041 Da supports this interpretation: it matches an N-terminal S alone (87.032 Da for S plus 17.027 Da for the NH3 retained because the Q can no longer form pyro-Glu) to within 0.02 Da, or about 2 ppm. FWHM (full width at half-maximum) peak widths at m/z 1900 are ∼0.08, sufficient to resolve the isotopes of 11 kDa masses. FWHM of the full mAb peaks at m/z 6000 are ∼0.9, limited by the isotope distribution of the molecule (calculated FWHM of 1 at m/z 6000) rather than by instrument resolution, which should be below 0.2 at 6000 m/z as Orbitrap resolution decreases with the square root of m/z.
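The mass bookkeeping behind these daclizumab assignments is simple enough to check in a few lines of Python. The chain, fragment, and peak masses are those quoted in the text; the average residue masses of Val, His, and Ser and the average mass of NH3 used to check the +340 Da extension are standard values added here as an assumption, not taken from the paper.

```python
# Masses (Da) quoted in the text for deglycosylated daclizumab.
HEAVY, LIGHT, N_DISULFIDES = 48_717, 23_215, 16
INTACT_OBS, FRAG_LARGE, FRAG_SMALL = 143_831, 132_792, 11_057

# Each disulfide bond removes two hydrogens (about 2 Da).
intact_calc = 2 * HEAVY + 2 * LIGHT - 2 * N_DISULFIDES
print(intact_calc)  # 143832

# Fragments sum to more than the intact mAb by one water: hydrolysis, not gas-phase loss.
print(FRAG_LARGE + FRAG_SMALL - INTACT_OBS)  # 18

# +340 Da extension: average residue masses of V, H, S plus the NH3 kept because the
# N-terminal Q can no longer cyclize to pyro-Glu (standard values, assumed here).
vhs_extension = 99.1326 + 137.1411 + 87.0782 + 17.0305
print(round(vhs_extension))  # 340

# Isotope-resolved S-only peak: measured delta vs. monoisotopic S residue + NH3.
measured = 11_161.451 - 11_057.410
expected = 87.032 + 17.027
ppm = abs(measured - expected) / 11_161.451 * 1e6
print(f"{measured:.3f} vs {expected:.3f} Da, {ppm:.1f} ppm")
```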

Figure 3

Full range m/z and deconvolved native ESI mass spectra of the deglycosylated mAb daclizumab. The m/z spectrum (a) shows three distinct charge series. In the mass spectrum, the peak at 143 831 Da represents the mass of the full mAb without glycans or C-terminal Lys. The peaks at 11 057 and 132 792 Da (which sum to 143 849 Da) reveal two fragments formed via a GG clip from the heavy chain N-terminus. The 143 831 Da peak is accompanied by two smaller peaks at ΔMw of +340 and +680 Da, whereas the 132 792 and 11 057 Da fragments each have only one +340 Da peak. These species originate from N-terminal extensions by the amino acid residues VHS (part of the signal peptide). (c) An isotope-resolved deconvolved mass spectrum. The small peak at 11 161.451 (≈ 11 057.410 + 87.032 + 17.027) fits the GG clip along with an N-terminal S, which prevents pyro-Glu formation at the Q that is N-terminal in the most abundant form. Thus three distinct N-termini coexist in this mAb product: the most abundant is pyroQVQLV..., the next is VHSQVQLV..., and the least abundant is SQVQLV....

We based the interpretation of GG clipping and VHS extension only on the deconvolved mass spectra and the protein sequence; this inference would be difficult without high-resolution mass spectrometry and accurate, artifact-free deconvolution. The information from the native MS data prompted us to look for these features in the LC–MS/MS peptide data: we searched our bottom-up proteomics data for nonspecific peptides and peptides with N-terminal extensions, and the search results confirmed our interpretation (Supporting Information Figures S5 and S6). Next, we targeted the mAb infliximab. We first analyzed deglycosylated infliximab, because the spectrum of the deglycosylated antibody, displayed in Figure 4, helps to interpret the more complicated spectrum of nondeglycosylated infliximab.
The peak at 145 623 Da is an exact match for the calculated deglycosylated mass of 145 623 Da, and the successive +128 Da mass deltas to the other two large peaks in deglycosylated infliximab are exact integer matches for one and two C-terminal lysines, a variant known to occur frequently in recombinant mAbs. The presence of this triplet of mAb species harboring zero, one, and two C-terminal lysines leads to a denser and more complicated spectrum for nondeglycosylated infliximab. The peaks at 148 511, 148 638, and 148 768 Da in the glycosylated infliximab spectrum can be assigned to proteoforms with two G0F N-glycans (G0F = HexNAc(4)Hex(3)Fuc(1); the two glycans add an average-isotope mass of 2891 Da) along with zero, one, and two C-terminal lysines.
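The triplet assignment amounts to a small matching step: add the two-G0F glycan mass and zero, one, or two lysine residues to the calculated deglycosylated mass, then look for observed peaks within a tolerance. In this Python sketch the ±5 Da tolerance and the average Lys residue mass are assumptions chosen for illustration, and `assign_lysine_variants` is a hypothetical name; the base mass, glycan increment, and observed peaks are from the text.

```python
LYS_RESIDUE = 128.1741  # average Lys residue mass (Da); standard value, assumed here
TWO_G0F = 2891          # average mass added by two G0F glycans, from the text
BASE = 145_623          # calculated deglycosylated infliximab mass, from the text


def assign_lysine_variants(observed, base, glycan_mass, max_lys=2, tol=5.0):
    """Map 0..max_lys C-terminal lysines to observed peaks within tol Da (tol is hypothetical)."""
    assignments = {}
    for n_lys in range(max_lys + 1):
        expected = base + glycan_mass + n_lys * LYS_RESIDUE
        assignments[n_lys] = [m for m in observed if abs(m - expected) <= tol]
    return assignments


print(assign_lysine_variants([148_511, 148_638, 148_768], BASE, TWO_G0F))
# {0: [148511], 1: [148638], 2: [148768]}
```

Passing a glycan mass of 0 applies the same check to the deglycosylated triplet.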

Figure 4

Deconvolved high-resolution native mass spectra of the deglycosylated and glycosylated mAb infliximab. Deglycosylated infliximab (a) shows three abundant species with masses in agreement with the amino acid sequence of the full mAb, along with species from which one or two C-terminal lysines had been clipped. The small peaks at 14 042 and 146 837 Da most likely represent, respectively, glycation on 146 042 and Man5 on 145 623. In the deconvolved mass spectrum of the glycosylated infliximab, the marked peaks exhibit the same triplets originating from the mAb with zero, one, or two C-terminal lysines, along with two N-glycans with G0F (= HexNAc(4)Hex(3)Fuc(1)). Each marked peak begins a chain of peaks with ∼162 Da spacing, showing glycosylation heterogeneity. For example, the peaks at masses 148 511, 148 673, 148 838, 149 091, 149 256, and 149 416 Da correspond to the mAb with no C-terminal lysine and zero to five Gal monosaccharides.

Extending the complexity of the targeted mAb still further, we next analyzed cetuximab, as far as we know the only therapeutic antibody in current clinical use that has, along with the usual Fc glycosylation site, an additional glycosylation site in the Fab region. We therefore digested cetuximab with IdeS[30] to separate Fab and Fc. IdeS digestion produces an F(ab′)2 component. Reduction with DTT then reduces the F(ab′)2 into Fd subunits, that is, the heavy chain from the N-terminus up to ...PAPELLG, but often leaves disulfide bonds within subunits intact. After IdeS digestion, the Fc may appear either as a ∼50 kDa Fc species held together noncovalently or as a ∼25 kDa Fc/2.
High-resolution native MS data were acquired for this whole mixture of species, that is, the light chain LC (∼23 kDa), the glycosylated Fc/2 (∼25 kDa), the glycosylated Fd (∼27 kDa), and the glycosylated Fc (∼50 kDa), and processed by Protein Metrics Intact deconvolution. The results agree closely with a previous detailed analysis of cetuximab Fab glycosylation,[31] except that we note that the previous analysis misidentified the peaks at 27 688, 27 832, and 28 216 Da as glycans with somewhat unusual GlcNAc-Gal-GlcNAc antennas. These misidentifications may stem from arithmetic mistakes, as the masses implied by those assignments are each off by ∼100 Da. We interpret the peak at 27 688 Da as Fd carrying HexNAc(4)Hex(7)Fuc(2), that is, a glycan with an antennal Fuc; this composition gives an exact mass match to the closest integer and connects biosynthetically to the most abundant glycoform in the deconvolved spectrum, HexNAc(4)Hex(7)Fuc(1). Figure includes small unlabeled peaks at 27 834 and 28 215 Da, which are within 2 Da of the misidentified peaks in the previous analysis and also within 2 Da of the theoretical masses of the Fd with HexNAc(4)Hex(7)Fuc(3) and HexNAc(5)Hex(9)Fuc(2), respectively. As shown in Table S3 in the Supporting Information, the deconvolved spectrum includes at least 14 recognizable Fd glycoproteoforms over a 100-fold dynamic range. In native MS of intact proteins, glycoproteoforms with and without sialic acids have similar ionization propensities and gas-phase stabilities, so peak intensities in the deconvolved mass spectrum should give accurate relative quantification.[25]
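The Fd glycoform assignments can be checked the same way. In this hedged sketch the Fd protein mass is backed out from the theoretical 27 543 Da for Fd + G2FGal(2) (= HexNAc(4)Hex(7)Fuc(1)). The residue masses and helper names are assumptions. Because the anchor value is rounded, agreement is expected only to within about 2 Da.

```python
HEXNAC, HEX, FUC = 203.1925, 162.1406, 146.1412  # average residue masses (Da), assumed


def glycan_mass(hexnac: int, hexose: int, fuc: int) -> float:
    """Average mass added by one glycan of the given composition."""
    return hexnac * HEXNAC + hexose * HEX + fuc * FUC


# Back out the Fd protein mass from the theoretical Fd + HexNAc(4)Hex(7)Fuc(1) mass
FD_PROTEIN = 27_543 - glycan_mass(4, 7, 1)

# Observed peaks and the compositions proposed for them in the text
assignments = {(4, 7, 2): 27_688, (4, 7, 3): 27_834, (5, 9, 2): 28_215}

deltas = {}
for (hexnac, hexose, fuc), observed in assignments.items():
    theoretical = FD_PROTEIN + glycan_mass(hexnac, hexose, fuc)
    deltas[(hexnac, hexose, fuc)] = observed - theoretical
    print(f"Fd + HexNAc({hexnac})Hex({hexose})Fuc({fuc}): theoretical {theoretical:.1f} Da, "
          f"observed {observed} Da, delta {observed - theoretical:+.1f} Da")
```

Each proposed composition differs from its peak by roughly 1 to 2 Da. A GlcNAc-Gal-GlcNAc assignment off by ∼100 Da would stand out immediately in such a table.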

Figure 5

High-resolution native mass spectra and deconvolved masses of the IdeS-digested and reduced mAb cetuximab. Deconvolution of the full m/z range (a) of cetuximab shows mass clusters (b) at about 23.4, 25.4, 27.5, and 50.5 kDa, corresponding to the light chain LC, the glycosylated Fc/2, the glycosylated Fd, and the glycosylated Fc, respectively. A zoom of the 23–28 kDa range (c) shows good agreement with the theoretical masses of 23 423 Da for the LC with intrachain disulfide bonds, 25 233 Da for Fc/2 + G0F, and 27 543 Da for Fd + G2FGal(2). A further zoom of the 26–29 kDa range (d) shows the more complicated Fab-arm/Fd glycosylation, including Gal-α-Gal and antennal fucosylation.

On the basis of the detailed analysis of the IdeS-induced fragments of cetuximab, we were also able to annotate many of the abundant ion signals in the complicated intact cetuximab spectrum. Summing 23 422 Da (LC), 27 543 Da (Fd + G2FGal(2)), and 25 232 Da (Fc/2 + G0F) from Figure c, multiplying by 2, and then subtracting 36 Da for the two water molecules gained in IdeS digestion and 4 Da for the interchain disulfide bonds gives a mAb proteoform at 152 354 Da, a perfect match for the peak with the orange dot in Figure . The peak at 152 515 Da then represents a proteoform with G1F on one of the Fc sites. This peak is taller than the one at 152 354 Da because G1F is almost as abundant as G0F in Figure b,c, and there are two Fc sites at which the extra Gal can occur. The peaks at 151 866, 152 027, 152 189, and 152 676 Da are interpretable as proteoforms differing in the number of galactose monosaccharides. The peaks at 152 808 and 152 961 Da probably contain unresolved proteoforms, including ones with multiple fucosylation on the Fd.
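The subunit-to-intact bookkeeping for cetuximab and the galactose ladder around 152 354 Da can be sketched as follows. The 36 Da and 4 Da corrections are taken directly from the text. The Gal residue mass (average hexose, 162.14 Da) is an assumed standard value. The last lines use hypothetical equal abundances to show why one extra Gal on either of two Fc sites doubles the combinatorial weight.

```python
GAL = 162.1406  # average hexose (Gal) residue mass in Da; assumed standard value

# Subunit masses read from the 23-28 kDa zoom (panel c)
LC = 23_422          # light chain
FD_G2FGAL2 = 27_543  # Fd + G2FGal(2)
FC2_G0F = 25_232     # Fc/2 + G0F
IDES_WATER = 36      # two H2O gained from the two IdeS cleavages
DISULFIDE = 4        # interchain disulfide correction used in the text

intact = 2 * (LC + FD_G2FGAL2 + FC2_G0F) - IDES_WATER - DISULFIDE
print(intact)  # 152354

# Galactose ladder: three fewer to two more Gal than the 152 354 Da proteoform
observed = [151_866, 152_027, 152_189, 152_354, 152_515, 152_676]
ladder = [intact + k * GAL for k in range(-3, 3)]
for k, obs, pred in zip(range(-3, 3), observed, ladder):
    print(f"{k:+d} Gal: predicted {pred:.0f} Da, observed {obs} Da, delta {obs - pred:+.1f} Da")

# Hypothetical equal abundances of G0F and G1F at each of the two Fc sites:
# the G0F/G1F combination arises in two ways, G0F/G0F in only one.
p_g0f = p_g1f = 0.4
ratio = (2 * p_g0f * p_g1f) / (p_g0f * p_g0f)
print(ratio)
```

Every ladder member lies within 3 Da of a labeled peak. The ratio of 2 matches the observation that the single-extra-Gal peak at 152 515 Da is the taller one.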

Discussion

For the past 25 years, charge deconvolution of protein ESI–MS data has almost exclusively been performed by some implementation of the maximum entropy algorithm. Over this period, MS instruments and associated technologies such as chromatography and sample handling have improved in speed, resolution, and sensitivity. Partly as a consequence, the variety, complexity, and masses of target molecules for intact and native MS have increased significantly. High-resolution native MS is now widely used by the pharmaceutical industry to characterize some of its most important protein therapeutics, such as the mAbs analyzed here. These developments call for accurate, automated, and user-friendly deconvolution programs that can handle more difficult data with less user intervention and validation. A primary contribution of the work presented here is the use of parsimony in charge deconvolution. Parsimony is a guiding principle in other inverse problems arising in bioinformatics, including phylogeny reconstruction from genomic data and protein inference from proteomics data. Because of its use of parsimony, Protein Metrics Intact gave fewer and smaller artifact peaks than Protein Deconvolution 4.0 on a complicated monomer/dimer example. Artifact reduction is important whenever the sample contains, or could contain, molecules spanning a wide mass range, for example, light and heavy chains, monomers and dimers, or full proteins and clips. Another contribution of the work presented here may seem obvious and unimportant, but we believe it is fundamental and far-reaching: the “factorization” of charge deconvolution into two subproblems, charge inference and super-resolution. The two subproblems are not closely connected, even though both can be solved by iterative algorithms. 
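To make the idea of parsimony concrete, here is a toy sketch. It is not the Protein Metrics Intact algorithm. It is a greedy set cover that keeps adding the candidate mass that explains the most still-unexplained m/z peaks. Every even-charge peak of a mass M is also explained by M/2 at half the charge, since (M/2 + (z/2)m_H)/(z/2) = (M + z m_H)/z, so M/2 is consistent with part of the data. A parsimonious explanation still needs only M. The proton mass, the charge range 5 to 30, the 0.5 m/z tolerance, and all function names are illustrative assumptions. The bounded charge range also excludes the analogous 2M explanation at doubled charges; a realistic scorer would instead penalize 2M for its missing odd charge states.

```python
PROTON = 1.00728  # proton mass in Da (assumed constant)


def mz(mass: float, z: int) -> float:
    """m/z of a neutral mass carrying z protons."""
    return (mass + z * PROTON) / z


def neutral_mass(mz_value: float, z: int) -> float:
    """Neutral mass implied by an m/z value at charge z."""
    return z * (mz_value - PROTON)


def explained_by(mass, peaks, charges, tol):
    """Peaks lying within tol (m/z units) of some member of mass's charge series."""
    return {p for p in peaks if any(abs(mz(mass, z) - p) < tol for z in charges)}


def parsimonious_masses(peaks, charges=range(5, 31), tol=0.5):
    """Toy greedy set cover (hypothetical, not the Protein Metrics Intact algorithm).

    Repeatedly picks the candidate mass whose charge series explains the most
    still-unexplained peaks, until every peak is explained."""
    candidates = sorted({round(neutral_mass(p, z)) for p in peaks for z in charges})
    unexplained = set(peaks)
    chosen = []
    while unexplained:
        best = max(candidates,
                   key=lambda m: len(explained_by(m, unexplained, charges, tol)))
        covered = explained_by(best, unexplained, charges, tol)
        if not covered:  # safety guard against an infinite loop
            break
        chosen.append(best)
        unexplained -= covered
    return chosen


# Synthetic charge series of a 150 kDa species at charges 20 to 25
peaks = [mz(150_000, z) for z in range(20, 26)]
print(parsimonious_masses(peaks))                           # [150000]
print(len(explained_by(75_000, peaks, range(5, 31), 0.5)))  # 3 (charges 20, 22, 24)
```

The half-mass candidate fits three peaks perfectly. It is never reported because, once 150 000 Da is chosen, no peak remains for it to explain.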
In the case of maximum entropy methods, the two subproblems are actually antagonistic, as accurate charge inference tends to decrease entropy, whereas super-resolution explicitly aims to increase entropy. Decoupling the two problems will enable mass spectrometrists to work on charge inference, a problem unique to the field, while borrowing and adapting well-developed super-resolution algorithms from astronomy, geophysics, and other fields. Although we chose the samples primarily as demonstrations of the new algorithm, our studies did reveal some unexpected characteristics of the targeted mAbs and properdin. For properdin, we identified several novel low-abundance proteoforms harboring triantennary N-glycans, seemingly exclusively on proteoforms with 15 C-mannosylations. These proteoforms went unnoticed in our previous work because of salt adducts and the lack of a charge deconvolution program that could handle such difficult data. For daclizumab, we found both an N-terminal extension and GG clipping, which, to our knowledge, have not been previously published. Such information is important to drug manufacturers because the clipped proteoform may have therapeutic effects quite different from those of the intact monoclonal antibody.[29] Our daclizumab sample, however, was already quite old and possibly past its expiration date, so the clipping may be due to extended storage. For cetuximab, we analyzed a mAb with both Fc and Fd glycosylation. We combined native MS and the new deconvolution algorithm with IdeS digestion to separate subunits and with bottom-up proteomics to confirm the identified glycoforms, including glycans with antennal fucose on the Fd site. As demonstrated in other studies,[16−18,25,32,33] native MS is advantageous for the analysis of mAbs and plasma glycoproteins. 
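A toy calculation shows why an entropy term pulls against accurate charge inference. The sketch compares two hypothetical deconvolved spectra with the same total intensity. In one, all intensity sits at the true mass. In the other, intensity is spread over half-mass and third-mass artifacts. It computes only the Shannon entropy. A real MaxEnt method maximizes entropy subject to fitting the observed m/z data, and that fit term is omitted here. When two candidate spectra fit the data comparably well, the entropy term favors the one with more peaks.

```python
import math


def shannon_entropy(intensities):
    """Shannon entropy (in nats) of a spectrum after normalizing to unit total intensity."""
    total = sum(intensities)
    probs = [x / total for x in intensities if x > 0]
    return sum(-p * math.log(p) for p in probs)


# Hypothetical deconvolved spectra with the same total intensity
parsimonious = [1.0]                # all intensity at the true mass M
with_artifacts = [0.6, 0.25, 0.15]  # intensity split over M, M/2, and M/3

h_pars = shannon_entropy(parsimonious)
h_art = shannon_entropy(with_artifacts)
# The artifact-laden spectrum has the higher entropy
print(f"parsimonious: {h_pars:.3f} nats, with artifacts: {h_art:.3f} nats")
```

A single peak has zero entropy, and any redistribution of intensity over more masses raises it. This is the antagonism between the two subproblems described above.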
Native MS gives greater separation between charge states.[34] Without this separation, properdin and cetuximab would most likely give overlapping m/z states, which would seriously hamper deconvolution and visual validation. Another advantage of native MS for these target molecules is improved dynamic range: fewer charge states and lower charges mean that more trap capacity is available for minor species, such as the clipped N-terminal sequence in daclizumab. On the other hand, native MS generally requires more starting material than intact MS on denatured proteins, and it can lose resolution on FTICR and Orbitrap instruments by shifting the signal to higher m/z. Neither of these disadvantages, however, applies to typical analyses of therapeutic mAbs, because sample is usually abundant and FTMS resolution is more often limited by isotopic spread or adducts than by instrument resolution. Finally, intact MS under either native or denaturing conditions provides a clear qualitative and quantitative survey of all proteoforms distinguishable by mass, thereby helping to identify which modifications should be sought in complementary bottom-up or middle-down data. The future analysis of protein therapeutics and plasma proteins is likely to rely on hybrid MS methods, complemented by advanced bioinformatics methods that analyze and integrate the data from each information channel. We look forward to equally rapid progress in bioinformatics to keep pace with the rapid development of instruments and experimental methods.
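The greater charge-state separation in native MS follows from simple arithmetic. With m/z = M/z + m_H, adjacent charge states z and z + 1 are separated by M/(z(z + 1)) in m/z, so halving the charge roughly quadruples the gap. The sketch below uses an approximate 150 kDa IgG and illustrative charges of 25 (native) and 50 (denatured). These values are assumptions chosen for illustration, not measurements from this study.

```python
def adjacent_charge_spacing(mass_da: float, z: int) -> float:
    """m/z gap between charge states z and z + 1 of a species of neutral mass mass_da.

    With m/z = M/z + m_H, the proton term cancels: M/z - M/(z + 1) = M / (z (z + 1)).
    """
    return mass_da / (z * (z + 1))


MAB = 150_000  # approximate intact IgG mass in Da
# Illustrative charge states (assumed): native ~25+, denatured ~50+
for label, z in [("native", 25), ("denatured", 50)]:
    print(f"{label}: z = {z}, m/z ~ {MAB / z:.0f}, "
          f"gap to z + 1 = {adjacent_charge_spacing(MAB, z):.0f}")
```

A gap of about 231 m/z units under native conditions, compared with about 59 under denaturing conditions, leaves much more room for the charge-state envelopes of heterogeneous species such as properdin and cetuximab to stay apart.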

27 in total

1. Fragmentation of monoclonal antibodies.

Authors: Josef Vlasak; Roxana Ionescu
Journal: MAbs Date: 2011-05-01

2. Fast analysis of recombinant monoclonal antibodies using IdeS proteolytic digestion and electrospray mass spectrometry.

Authors: Guillaume Chevreux; Nolwenn Tilly; Nicolas Bihoreau
Journal: Anal Biochem Date: 2011-04-27

3. Native-MS Analysis of Monoclonal Antibody Conjugates by Fourier Transform Ion Cyclotron Resonance Mass Spectrometry.

Authors: Iain D G Campuzano; Chawita Netirojjanakul; Michael Nshanian; Jennifer L Lippens; David P A Kilgour; Steve Van Orden; Joseph A Loo
Journal: Anal Chem Date: 2017-12-18

4. Glycoproteomics: A Balance between High-Throughput and In-Depth Analysis.

Authors: Yang Yang; Vojtech Franc; Albert J R Heck
Journal: Trends Biotechnol Date: 2017-05-17

5. Native mass spectrometry and ion mobility characterization of trastuzumab emtansine, a lysine-linked antibody drug conjugate.

Authors: Julien Marcoux; Thierry Champion; Olivier Colas; Elsa Wagner-Rousset; Nathalie Corvaïa; Alain Van Dorsselaer; Alain Beck; Sarah Cianférani
Journal: Protein Sci Date: 2015-03-31

6. Byonic: advanced peptide and protein identification software.

Authors: Marshall Bern; Yong J Kil; Christopher Becker
Journal: Curr Protoc Bioinformatics Date: 2012-12

7. Improved Peak Detection and Deconvolution of Native Electrospray Mass Spectra from Large Protein Complexes.

Authors: Jonathan Lu; Michael J Trnka; Soung-Hun Roh; Philip J J Robinson; Carrie Shiau; Danica Galonic Fujimori; Wah Chiu; Alma L Burlingame; Shenheng Guan
Journal: J Am Soc Mass Spectrom Date: 2015-09-01

8. Hybrid mass spectrometry approaches in glycoprotein analysis and their usage in scoring biosimilarity.

Authors: Yang Yang; Fan Liu; Vojtech Franc; Liem Andhyk Halim; Huub Schellekens; Albert J R Heck
Journal: Nat Commun Date: 2016-11-08

9. Proteoform Profile Mapping of the Human Serum Complement Component C9 Revealing Unexpected New Features of N-, O-, and C-Glycosylation.

Authors: Vojtech Franc; Yang Yang; Albert J R Heck
Journal: Anal Chem Date: 2017-03-07

10. Correct primary structure assessment and extensive glyco-profiling of cetuximab by a combination of intact, middle-up, middle-down and bottom-up ESI and MALDI mass spectrometry techniques.

Authors: Daniel Ayoub; Wolfgang Jabs; Anja Resemann; Waltraud Evers; Catherine Evans; Laura Main; Carsten Baessmann; Elsa Wagner-Rousset; Detlev Suckau; Alain Beck
Journal: MAbs Date: 2013-06-20
