We describe the creation of a mass spectral library composed of all identifiable spectra derived from the tryptic digest of the NISTmAb IgG1κ. The library is a unique reference spectral collection developed from over six million peptide-spectrum matches acquired by liquid chromatography-mass spectrometry (LC-MS) over a wide range of collision energy. Conventional one-dimensional (1D) LC-MS was used for various digestion conditions and 20- and 24-fraction two-dimensional (2D) LC-MS studies permitted in-depth analyses of single digests. Computer methods were developed for automated analysis of LC-MS isotopic clusters to determine the attributes for all ions detected in the 1D and 2D studies. The library contains a selection of over 12,600 high-quality tandem spectra of more than 3,300 peptide ions identified and validated by accurate mass, differential elution pattern, and expected peptide classes in peptide map experiments. These include a variety of biologically modified peptide spectra involving glycosylated, oxidized, deamidated, glycated, and N/C-terminal modified peptides, as well as artifacts. A complete glycation profile was obtained for the NISTmAb with spectra for 58% and 100% of all possible glycation sites in the heavy and light chains, respectively. The site-specific quantification of methionine oxidation in the protein is described. The utility of this reference library is demonstrated by the analysis of a commercial monoclonal antibody (adalimumab, Humira®), where 691 peptide ion spectra are identifiable in the constant regions, accounting for 60% coverage for both heavy and light chains. The NIST reference library platform may be used as a tool for facile identification of the primary sequence and post-translational modifications, as well as the recognition of LC-MS method-induced artifacts for human and recombinant IgG antibodies. Its development also provides a general method for creating comprehensive peptide libraries of individual proteins.
Since their first approval by the US Food and Drug Administration in 1986, monoclonal antibodies (mAbs) have emerged as one of the fastest growing classes of protein therapeutics in the treatment of various human diseases. This success has evolved into new generations of products, such as bispecific mAbs, mAb fragments, antibody-drug conjugates, and other derivatives. However, because of their large size and the high degree of heterogeneity that may arise from various post-translational and chemical modifications during cell culture, purification, formulation, and storage, the characterization of these proteins continues to present a substantial challenge. In an effort to support and promote high-quality measurements, the National Institute of Standards and Technology (NIST) issued a humanized IgG1κ mAb reference material, RM 8671 (NISTmAb). This reference mAb embodies the quality and characteristics of a donated biopharmaceutical product and has been characterized by NIST researchers and by collaborators in industry and academia. As a part of the NISTmAb project to provide reference data, we report here a high-resolution spectral library of peptides derived from this molecule. Because all IgG antibodies have very similar constant regions, this library should be a useful tool for analyzing these regions for all human IgG and IgG-based biotherapeutics.
Over the past decade, untargeted bottom-up LC-MS has become the method of choice for mAb characterization at the peptide level. This approach typically involves the digestion of denatured proteins by trypsin or other proteases, followed by the separation of peptides by LC-MS, and their identification by sequence search methods from tandem mass spectra. Despite this progress, the ability of an LC-MS analysis to determine all modifications on each residue of mAb therapeutics has not been demonstrated.
Here, we report the development of an alternative peptide identification method based on matching experimentally derived spectra in a comprehensive library. Special strategies were developed and applied for identifying all major modifications at each residue level because conventional peptide identification methods have been developed for mixtures of thousands of proteins containing a limited number of targeted modifications. We have previously built a peptide spectral library for human serum albumin (HSA), and, more recently, a glycopeptide library of high-resolution spectra was derived for the NISTmAb. These glycopeptides are invisible to most commonly used database identification programs since they fragment primarily by the loss of sugar units and reveal little of the peptide backbone sequence under commonly used collision energy conditions. In that library, a total of 60 different glycans identified from more than 200 different glycopeptides were represented for the NISTmAb. In this work, we extended the prior work in order to construct an extensive library of all identifiable peptides from the NISTmAb. A wide range of digestion conditions was used, as well as more comprehensive in-depth results from 24-fraction 2D LC separations, to acquire as many relevant peptide ion spectra as possible, including many short and long peptides and charge states. We show that, in addition to the known advantages of speed and sensitivity when using a library search, the NISTmAb library-based identification of modified peptides is advantageous over the database search tools for readily and reliably determining identities of low levels of uncommon modified species in antibody drugs. The detection of these modifications, challenging for search engine analyses, can be achieved without using special settings or procedures.
Results
Overview of MS/MS spectra of the library
A high-resolution mass spectral library of all identified peptides produced in the tryptic digests of the NISTmAb sample was created for comprehensive analysis of mAbs. This library is composed of over 12,600 high-quality tandem spectra of more than 3,300 peptide ions. The 12 varieties of spectra included in the library are shown in Table 1. The second variety consists of 1,700 spectra of glycopeptides presented in a published paper that can be used together with this library. Varying collision energies of higher-energy collisional dissociation (HCD) were used, and spectra were generated by 1D LC for 13 different digestion conditions as well as by 2D LC (20- or 24-fraction) studies of single digests (Materials and Methods and Supplementary Material Table S-1). The 1D studies (264 runs) cover a wide range of common digestion conditions, in order to generate the widest possible variety of peptides of analytical concern. No attempts were made to optimize digestion; instead, we employed distinct but complementary protocols to achieve a broad coverage of peptides. A study of optimization of the NISTmAb tryptic digestion has been reported. The 2D studies (553 runs) increased the depth of coverage and quality of these peptide spectra. The library of annotated mass spectra of all NISTmAb peptide ions acquired in this work, along with the library search software, is freely available at https://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:lib:human_igg1k_mab_drugs.
Table 1.
Twelve varieties of spectra in the NISTmAb library.

| Variety | Spectra varieties | # ions | # spectra |
|---|---|---|---|
| 1 | Unmodified major class peptides (a) | 378 | 2239 |
| 2 | N-linked glycosylated peptides (b) | 247 | 1700 |
| 3 | oxidized peptides | 668 | 1945 |
| 4 | deamidated peptides | 245 | 996 |
| 5 | N-/C-terminal peptides | 137 | 547 |
| 6 | glycated peptides | 93 | 656 |
| 7 | formylated peptides (c) | 117 | 483 |
| 8 | adducted peptides | 125 | 191 |
| 9 | overalkylated peptides | 30 | 105 |
| 10 | other types of modified peptides (d) | 206 | 642 |
| 11 | diagnostic peptides (e) | 466 | 2027 |
| 12 | in-source or in-solution semi-tryptic peptides (f) | 1007 | 3189 |

Note:
(a) Variety 1 (e.g., no missed cleavages or expected missed cleavages).
(b) Variety 2 contains glycopeptides from the NISTmAb glycopeptide library.
(c) Formylated because formic acid was used in the protocol.
(d) Variety 10 includes transpeptidation, dehydration, gln->pyro-glu, glu->pyro-glu, ammonia-loss, and others.
(e) Variety 11 contains miscleaved and under/over alkylated peptides.
(f) Semi-tryptic peptides are produced in ESI (in-source) or in digestion (in-solution).
Peptide classification
Table 2 gives the numbers of distinct entries in the library for the full antibody chains and for their constant regions. These numbers include distinct library sequences, peptides (sequence + modification), peptide ions (sequence + modification + charge state), spectra (sequence + modification + charge state + collision energy), and the total number of underlying peptide-spectrum matches (PSMs).
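The counting levels defined above can be illustrated with a minimal Python sketch. The `PSM` record, its field names, the modification label format, and the example entries are hypothetical and chosen only for illustration; they are not the library's actual data format.

```python
from typing import NamedTuple

class PSM(NamedTuple):
    """Hypothetical peptide-spectrum match record (illustrative fields only)."""
    sequence: str
    modification: str        # empty string when unmodified
    charge: int
    collision_energy: float

def count_levels(psms):
    """Count distinct entries at each level defined in the text."""
    sequences = {p.sequence for p in psms}
    peptides = {(p.sequence, p.modification) for p in psms}
    ions = {(p.sequence, p.modification, p.charge) for p in psms}
    spectra = {(p.sequence, p.modification, p.charge, p.collision_energy)
               for p in psms}
    return {
        "sequences": len(sequences),
        "peptides": len(peptides),
        "ions": len(ions),
        "spectra": len(spectra),
        "psms": len(psms),
    }

# Made-up PSMs for two NISTmAb sequences (not measured data).
example_psms = [
    PSM("DTLMISR", "", 2, 20.0),
    PSM("DTLMISR", "", 2, 20.0),              # repeated match, same key
    PSM("DTLMISR", "", 2, 30.0),
    PSM("DTLMISR", "Oxidation@M4", 2, 20.0),
    PSM("DTLMISR", "Oxidation@M4", 1, 20.0),
    PSM("ALPAPIEK", "", 2, 20.0),
]
counts = count_levels(example_psms)
print(counts)
```

Each level adds one attribute to the key, so the counts can only grow or stay equal from sequences to spectra, and the PSM total is an upper bound on all of them.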
Table 2.
Number of distinct sequences, peptides, ions, and spectra for the heavy and light chains selected in the library that are derived from over six million peptide-spectrum matches (PSMs).
Note: peptide = sequence + modification, peptide ion = sequence + modification + charge state, spectra = sequence + modification + charge state + collision energy.
This work employed a peptide classification scheme developed earlier for building spectral libraries of tryptic peptides for a single protein, HSA. This scheme divided all peptides into the following six classes.
Class 1. No missed tryptic-cleavage (includes K/R at the N-terminal resulting from cleavage between adjacent cleavable residues, and K/R followed by proline)
Class 2. Expected missed tryptic-cleavage (D, E, K, or R within 1 residue of the missed cleavage site)
Class 3. Unexpected missed tryptic-cleavage (not in Class 2)
Class 4. Missed alkylation on cysteine or alkylation on other amino acids
Class 5. Modification (a. common, and b. other post-translational or method-induced)
Class 6. Semi-tryptic peptides with non-specific cleavage at one terminal, including the cleavage of K/R followed by proline (a. in-source and b. in-solution).
As developed in the earlier work, peptides with properties of two or more classes 3, 4, 5b, or 6b are generally rejected as improbable peptide features (see examples in Discussion). In this scheme, multiple unexpected miscleavages are allowed, but multiple, unexpected modifications are not. An ideal digestion profile from an analyst's perspective would be one where all the expected tryptic peptides (Class 1 or Class 2) were abundant, while the others (Class 3, 4, 5, or 6) were not. In the following sections, this classification scheme is applied to nine experiments in which each sample prepared using Protocol 4 was digested for 0.25 h, 2 h, and 18 h, with LC-MS analysis in triplicate (see GuanRT/TCEP in Supplementary Material Table S-1 and Figure S-1). The comparison of numbers and abundances of peptides (e.g., median values from triplicate experiments of the same digestion) in each class is given in Figure 1.
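The missed-cleavage part of the scheme (Classes 1 to 3) can be sketched in a few lines of Python. This is one possible reading of the class definitions, not the authors' software: it assumes the input is an unmodified peptide with tryptic termini, treats a K/R at the first position as Class 1 behaviour, and interprets "within 1 residue" as the residue immediately before or after the missed site. The function names are hypothetical.

```python
def missed_cleavage_sites(peptide):
    """Return indices of internal K/R residues that trypsin could have cleaved.

    Assumptions of this sketch: index 0 (an N-terminal K/R) and the
    C-terminal residue are not counted, and K/R followed by P is not a site.
    """
    sites = []
    for i in range(1, len(peptide) - 1):
        if peptide[i] in "KR" and peptide[i + 1] != "P":
            sites.append(i)
    return sites

def classify(peptide):
    """Return 1, 2, or 3 for an unmodified peptide with tryptic termini."""
    sites = missed_cleavage_sites(peptide)
    if not sites:
        return 1  # no missed cleavage

    def expected(i):
        # "Expected" if D, E, K, or R is adjacent to the missed site.
        return peptide[i - 1] in "DEKR" or peptide[i + 1] in "DEKR"

    # Any unexpected missed cleavage places the peptide in Class 3.
    return 2 if all(expected(i) for i in sites) else 3

for seq in ["DTLMISR", "VGYMHWYQQKPGK", "LTVDKSR", "LLIYDTSKLASGVPSR"]:
    print(seq, classify(seq))
```

Classes 4 to 6 depend on modification and terminus information that a bare sequence does not carry, so they are left out of this sketch.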
Figure 1.
Summed peptide identifications and abundances in six classes obtained by median results of triplicate analyses from three separate tryptic digests at (0.25, 2 and 18) h. In all digests, the sample was denatured by guanidine at room temperature and reduced by TCEP (GuanRT/TCEP).
Class 1 fully tryptic peptides
These peptides are products of simple tryptic cleavage, and therefore expected to represent the bulk of the ion intensity in tryptic digests of proteins. Other identified peptides often originate from modifications to these peptides. Figure 1A shows that 93, 118, and 122 peptides of Class 1 were detected in three triplicate analyses at the digestion time points (0.25, 2 and 18 h), respectively. All major fully cleaved tryptic peptide ions were observed at short digestion times, with significantly higher signal intensity for many at longer times, yielding about 39%, 54% and 74% of total ion intensity at 0.25, 2 and 18 h, respectively, as illustrated in Figure 1B.
Class 2 peptides with expected missed-cleavage
Tryptic cleavage rates can be reduced by the presence of adjacent cleavage sites or acidic residues. Depending on the degree of digestion, the resulting peptides can be found at high relative abundances, even though they are not generally targets in peptide mapping studies. At short digestion times (0.25 and 2 h), Class 2 accounts for 9% and 7% of identified peptides (Figure 1A), and 25% and 21% of total intensity (Figure 1B), respectively. Approximately 30% of the 20 top-ranking peptides identified belonged to Class 2. Such peptides are frequently observed with trypsin digestion at shorter times. Longer Class 2 peptides can be beneficial in achieving full sequence coverage and can assist in the determination of protein sequence and modification.
Class 3 peptides with unexpected missed cleavage sites
Class 3 peptides generally result from incomplete digestion, probably because of insufficient denaturation, which left certain peptide bonds less accessible to digestion. As seen from Figure 1A, Class 3 comprises fewer identifications than Classes 1 and 2 peptides. Intensities of these peptides are low compared to those of Classes 1 and 2 (Figure 1B), representing 8%, 3% and 3% of total intensities at 0.25 h, 2 h, and 18 h, respectively. Class 3 peptides can be used to assess digestion efficiency of different sample preparation methods.
Class 4 peptides with under/over alkylation
Class 4 measures undesirable misalkylation of cysteine residues or alkylation of other residues in a protein digest with iodoacetamide.
Figure 1A shows that Class 4 peptides make up less than 5% of all identified peptides. Their summed abundances are typically below 3% of the total intensity identified in our triplicate experiments. Misalkylated peptides were generally found to elute 1 – 4 min after their alkylated counterpart. It should be noted that about 30 peptides were alkylated at a site other than a cysteine residue, the most significant of which are shown in Table 3. Methionine was found to be the most frequent target of carbamidomethylation, as has been previously reported. Other overalkylated sites include serine and the N-terminal amino group.
Table 3.
Representative overalkylated peptides detected by 1D LC-MS/MS of the NISTmAb using Protocol 4 GuanRT/TCEP with 18 h of digestion.
non-overalkylated
overalkylated

| peptide (a) | z | site | RT min | log(int) | RTdiff | % int |
|---|---|---|---|---|---|---|
| WQQGNVFSCSVMHEALHNHYTQK | 3, 4, 5 | Met | 47.88 | 11.95 | −13.44 | 4.2 |
| DIQMTQSPSTLSASVGDR | 2, 3 | Met | 48.04 | 11.32 | −13.16 | 9.3 |
| ESGPALVKPTQTLTLTCTFSGFSLSTAGMSVGWIR | 3 | Met | 77.42 | 10.96 | −8.35 | 11.7 |
| DTLMISR | 1, 2 | Met | 34.01 | 11.50 | −9.90 | 2.5 |
| VGYMHWYQQKPGK | 2, 3, 4 | Met | 34.22 | 11.55 | −11.56 | 1.9 |
| VTNMDPADTATYYCAR | 2, 3 | Met | 44.57 | 10.85 | −12.91 | 6.2 |
| VYACEVTHQGLSSPVTK | 2, 3 | N-term | 35.80 | 11.76 | 1.03 | 0.3 |
| FNWYVDGVEVHNAK | 2, 3 | N-term | 48.81 | 11.82 | 1.45 | 0.2 |
| STSGGTAALGCLVK | 2 | N-term | 40.49 | 11.66 | 0.86 | 0.1 |
| EPQVYTLPPSREEMTK | 3 | Met | 38.72 | 11.22 | −9.67 | 0.3 |
| VDNALQSGNSQESVTEQDSK | 2, 3 | N-term | 31.62 | 11.64 | 0.68 | 0.1 |
| HKVYACEVTHQGLSSPVTK | 3, 4 | N-term | 28.98 | 10.89 | 1.50 | 0.3 |
| TPEVTCVVVDVSHEDPEVK | 3 | Ser | 50.25 | 11.29 | −0.49 | 0.1 |

Note:
(a) All cysteines were alkylated. RT min: retention time in min. log(int): log of summed peak area of all charge states. RTdiff: retention time difference (in min) of overalkylated and non-overalkylated peptides. % int: percentage of total intensity of overalkylated and non-overalkylated peptides. Median RT, log(int), RTdiff, and % int were calculated from the triplicate analysis.
Class 5 modifications
Class 5 peptides include biological post-translational modifications (PTMs) present in the original sample as well as analytical artifacts produced during liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis. The former are often an important target of proteomics, whereas the latter are a nuisance, especially when impeding the analysis of the former. As seen in Figure 1A, between 665 and 744 different modified peptide ions were identified in a triplicate experiment at different digestion times, constituting 49% to 55% of identified product ions. The ion intensity for this class accounted for 15% to 18% of the total identified ion intensity (Figure 1B), making it the third most abundant class. Among all Class 5 modifications, Met/Trp oxidation, N-terminal Cys or Gln loss of ammonia, and water loss from N-terminal Glu are commonly observed in proteomic experiments, while other modifications are dependent on the target of analytical interest or specific experimental conditions (see Peptide Classification Scheme in Materials and Methods). These distinct types of modifications are further divided into subclasses 5a (common) and 5b (rare) for identifying false positives during the library development (see Discussion).
Class 6 semi-tryptic peptides
Class 6 peptides have one terminus resulting from a non-tryptic cleavage. From the results of triplicate analyses by three separate tryptic digests at 0.25 h, 2 h and 18 h, a total of 219 (18%), 293 (22%) and 415 (28%) different semi-tryptic peptide ions were detected, respectively (Figure 1A). However, these are low abundance peptides accounting for only 4% to 7% of the total abundance identified. These peptides are separated into two subclasses.
In-source fragments. These are well-known artifacts easily identified by their coelution with their more abundant presumed precursor tryptic peptide. Their precursor m/z must match major fragment ions in the spectrum of the parent ion under lower collisional energies. In a single experiment, typically 30% of all identified semi-tryptic peptides originate from in-source fragmentation. For example, a prominent in-source peptide PAPIEK arose from ALPAPIEK, where the PAPIEK fragment was dominant, consistent with the known preference for N-terminal cleavage of proline and the facile formation of b2-ions from short peptides.
“In-solution” fragments. These peptides arise from irregular cleavage during sample preparation or digestion. They presumably result from inaccurate trypsin cleavage due to incomplete denaturation or related anomalies. Most identified semi-tryptic peptide ions belong to this subclass. Of the total identified semi-tryptic peptides, 157, 210, and 298 semi-tryptic peptides were found for 0.25 h, 2 h and 18 h digestions, respectively, all with elution times different from those of their potential parent tryptic peptides.
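The coelution test that separates the two subclasses can be sketched as follows. This is a simplified illustration, not the procedure used to build the library: the function names, the retention-time tolerance `rt_tol`, and the retention times in the example are assumptions, and the additional requirement that the fragment's precursor m/z match a major fragment ion of the parent spectrum is not implemented.

```python
def is_terminal_fragment(fragment, parent):
    """True if `fragment` is a proper N- or C-terminal piece of `parent`."""
    return fragment != parent and (
        parent.startswith(fragment) or parent.endswith(fragment)
    )

def semi_tryptic_subclass(fragment, fragment_rt, tryptic_parents, rt_tol=0.1):
    """Assign subclass 6a (in-source) or 6b (in-solution).

    tryptic_parents: list of (sequence, retention_time_min) tuples.
    rt_tol: coelution tolerance in minutes (hypothetical value).
    """
    for parent_seq, parent_rt in tryptic_parents:
        coelutes = abs(fragment_rt - parent_rt) <= rt_tol
        if is_terminal_fragment(fragment, parent_seq) and coelutes:
            return "6a in-source"
    return "6b in-solution"

# Retention times below are made up for illustration.
parents = [("ALPAPIEK", 30.02)]
print(semi_tryptic_subclass("PAPIEK", 30.00, parents))
print(semi_tryptic_subclass("PAPIEK", 25.00, parents))
```

A fragment that shares a terminus with a tryptic peptide but elutes at a different time falls through to subclass 6b, which mirrors the observation that in-solution fragments have elution times different from their potential parents.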
Primary sequence peptides
The representative tryptic peptides derived from the NISTmAb are shown in Table 4, acquired from the same triplicate experiment with 18 h of digestion that was used for reporting peptide classes (see Section 1.2). Most are Class 1 (fully tryptic, unmodified peptides), but a few are Class 5 peptides, representing major protein modifications: N-terminal pyroglutamic acids (H1), N-linked glycopeptides (H22 and H23), and C-terminal lysine clip (H38). Fifty-eight of 62 possible fully-cleaved peptides by trypsin were detected in the 1D LC-MS/MS analyses, and four possible dipeptides were not detected (Table 4). The LC-MS attributes (theoretical precursor m/z, absolute and relative abundance, and retention time (RT)) of each peptide ion were extracted from raw data and determined in their most abundant charge state. The median values of these attributes were calculated across triplicate analysis of the same NISTmAb digest. The mass accuracy of all identified peptides is within 5 ppm.
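For readers who want to reproduce the theoretical precursor m/z and the 5 ppm criterion, the arithmetic can be sketched as below. The residue masses are standard monoisotopic values; the helper names and the observed m/z in the example are hypothetical, and the cysteine entry is the unmodified residue, so alkylated cysteines would need the carbamidomethyl mass added through `mod_delta`.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def peptide_mass(sequence, mod_delta=0.0):
    """Neutral monoisotopic mass; mod_delta is the summed modification mass."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + mod_delta

def precursor_mz(mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

theo = precursor_mz(peptide_mass("DTLMISR"), 2)
theo_ox = precursor_mz(peptide_mass("DTLMISR", 15.9949), 2)  # Met oxidation
observed = 418.2215  # hypothetical measured m/z, not from the paper
print(round(theo, 4), round(theo_ox, 4), round(ppm_error(observed, theo), 2))
```

Because the error is relative, a fixed 5 ppm window corresponds to a smaller absolute m/z tolerance for low-m/z ions than for high-m/z ions.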
Table 4.
Detailed analysis of 1D LC-MS peptide mapping data determined for the NISTmAb heavy and light chains. Median values in Columns 4 – 8 were calculated across triplicate injections for each peptide under Protocol 4 GuanRT/TCEP with 18 h of digestion.
Note:
pQ = protein N-terminal pyroglutamic acid.
pQ = peptide N-terminal pyroglutamic acid.
summed values of N-linked glycopeptides carrying asialylated, biantennary core-fucosylated glycans.
values obtained from 2 h of digestion.
the C-terminal Lys clip.
most abundant charge state.
all cysteine residues are alkylated.
four possible dipeptides were not detected, i.e., DR (heavy chain 67–68), AK (heavy chain 342–343), SR (heavy chain 418–419), HK (light chain 187–188), but were present in peptides with miscleavages.
Pep: tryptic peptide number. : theoretical value. Relab: ratio of the ion abundance relative to the maximum ion of the run. stdev: standard deviations of relative abundances. H22, L18, L21: these peptides, which represent expected miscleavages, are included to complement the primary structure.
Early-eluting peptides
Peptides having five or fewer amino acids account for 40% of all theoretical peptides and 14% of the amino acid residues in NISTmAb. Because short peptides elute early and are generally poorly resolved, they are often challenging to find; they were therefore identified by a targeted analysis like that developed for glycation in the next section. As shown in Table 4, most short peptides eluted in the solvent front along with digest buffer components, where their signals are reduced due to ion suppression and chromatographic peak broadening (data not shown). This explains both their lower detection rate and variable abundance compared to the better-resolved peptides.
Modified peptides with complex fragmentation characteristics
A total of 24 different modification categories were clearly identified (Table 5) and the probable origins of each category are noted in the last column. All modified peptide spectra are included in the library. Among them, some common modifications such as deamidation, oxidation, or formylation can be readily identified from their tandem spectra by mass shifts in precursor and product ions. However, others such as glycosylation, glycation, alkylation of methionine, or metal-ion adduction can greatly influence peptide fragmentation. For these latter modifications, special methods are needed for their identification and are discussed below.
Table 5.
Library summary of a total of 24 modification categories detected in the NISTmAb tryptic peptides by combined results of all 13 sample preparation protocols. Glycosylation was previously reported in Reference 16.

| Modification | Mass difference (Da) | Site (a) | # Ions | # Spectra | Origin |
|---|---|---|---|---|---|
| Oxidation | 15.9949 | M, W, K, H, P | 668 | 2482 | in-sample, in-source, in-column or in-digestion |
| Deamidated | 0.9840 | N, Q | 245 | 996 | in-sample or in-digestion |
| Formyl | 27.9949 | N-terminal, K, S, T | 117 | 483 | in-digestion |
| Hex | 162.0528 | N-terminal, K | 93 | 656 | in-sample or in-digestion |
| Gln->pyro-Glu | −17.0265 | Q at N-terminal | 70 | 291 | in-sample or in-digestion |
| Cation:Fe[III] | 52.9115 | any | 42 | 60 | in-source |
| Glu->pyro-Glu | −18.0106 | E at N-terminal | 33 | 86 | in-digestion |
| Carbamidomethyl | 57.0215 | M, N-terminal | 30 | 105 | in-digestion |
| Cation:Fe[II] | 53.9193 | any | 29 | 43 | in-source |
| Carbamyl | 43.0058 | N-terminal | 28 | 66 | in-digestion |
| Lys (transpeptidation) | 128.0950 | N-terminal | 25 | 70 | in-digestion |
| Cation:Na | 21.9819 | any | 22 | 37 | in-source |
| Lys loss (heavy chain) | −128.0950 | C-terminal | 17 | 72 | in-sample |
| Arg (transpeptidation) | 156.1011 | N-terminal | 17 | 49 | in-digestion |
| Cation:Ca[II] | 37.9469 | any | 17 | 29 | in-source |
| Cation:2Na | 43.9639 | any | 15 | 22 | in-source |
| Carboxy | 43.9898 | M, N-terminal | 13 | 23 | in-digestion |
| Dehydrated | −18.0106 | N-terminal | 12 | 36 | in-source or in-digestion |
| Pyro-carbamidomethyl | 39.9949 | C at N-terminal | 5 | 17 | in-digestion |
| Ammonia-loss | −17.0265 | N | 5 | 15 | in-sample or in-digestion |
| Nitro | 44.9851 | Y | 4 | 9 | in-source |
| Acetyl | 42.0106 | H | 4 | 5 | in-sample or in-digestion |
| Dioxidation | 31.9898 | W | 3 | 9 | in-sample or in-source |
| Trioxidation | 47.9847 | W | 1 | 1 | in-sample or in-source |

Note:
(a) An N-terminal modification that occurred during the digestion process refers to peptides, while an N-terminal modification detected in the sample refers to the protein. For example, Gln->pyro-Glu and Hex were found either at the protein N-terminal (in-sample) or at the peptide N-terminal (in-digestion), while other N-terminal modifications only occurred at the peptide level.
Glycated peptides
Protein glycation is the non-enzymatic adduction of a sugar molecule to the ε-amino group of lysine or to a protein N-terminus. It most commonly leads to a mass increase of 162 Da by a hexose sugar. Glycation presents a special challenge because the fragmentation of glycated peptides involves the breakdown of the sugar, generating abundant product ions corresponding to various characteristic neutral losses, but limited peptide sequence ions. The localization of glycation on lysine residues impedes trypsin cleavage and, combined with the low abundance of glycated peptides, adds to identification difficulties. Most conventional sequence search engines usually miss these modifications, and in view of their significance in mAb quality measurement, a special workflow involving both 1D and 2D LC-MS/MS analyses directed at their identification was developed. This is shown in Figure 2 and summarized below.
Figure 2.
Workflow for identifying peptides and sites of glycation in the NISTmAb.
1. A list was first created of m/z, charge state, and RT of all library-identified, unmodified peptides (Classes 1, 2, and 3) containing either lysine or the protein N-terminus and containing up to three missed cleavages.
2. For each ion in the above list, all precursor ions with masses greater by a hexose unit (within 5 ppm) and within a 10-min retention window were marked as possible glycated peptides.
3. Identifications were validated by ensuring a constant shift in RT for a given glycated peptide ion relative to its non-glycated analog. It was found that virtually all glycated peptides eluted within 3 min of their unmodified analog. Consequently, spectra of components eluting outside this range were rejected.
4. Candidate spectra obtained in Step 3 were confirmed if all major fragment ions were consistent with the peptide and they contained no major unexplained fragments (>20% of the base peak). Specifically, various characteristic neutral losses were associated with glycation, four of which were frequently observed: 3 H2O (54.03 Da), 3 H2O + HCHO (84.04 Da), 4 H2O (72.04 Da), and C6H10O5 (162.05 Da), from the precursor and from the y and b ion series. The filters in Steps 3 and 4 led to rejection of approximately 10% of the candidate spectra.
Using the above workflow, a total of 276 spectra of 58 glycated peptide ions were acquired by 1D LC-MS/MS analysis. Additionally, 590 spectra were identified for 92 glycated peptide ions in the 2D LC-MS/MS fractionation runs, covering a range of different collisional energies. Table 6 summarizes the results of the glycated peptides identified in at least two runs of the 1D or 2D studies. These data include peptide sequence, charge states, ion intensity (total peak area summed over charge states), as well as the observed RT (min) and the RT difference between glycated and non-glycated peptides. All observed glycated peptides and their relative RT in the 1D analyses were confirmed in the 2D analyses.
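The precursor-matching and retention-time filters of the workflow, together with the four neutral-loss masses it cites, can be sketched in Python. This is an illustration under stated assumptions rather than the authors' code: the function and variable names are hypothetical, the 10-min window is interpreted as ±10 min around the unmodified peptide, the spectrum-level confirmation of the final step is not implemented, and all m/z and RT values in the example are made up.

```python
# Monoisotopic element masses (Da).
H, C, O = 1.007825, 12.0, 15.994915
PROTON = 1.007276
WATER = 2 * H + O
HEXOSE = 6 * C + 10 * H + 5 * O  # C6H10O5 added on glycation

# Characteristic neutral losses named in the text.
NEUTRAL_LOSSES = {
    "3 H2O": 3 * WATER,
    "3 H2O + HCHO": 3 * WATER + (C + 2 * H + O),
    "4 H2O": 4 * WATER,
    "C6H10O5": HEXOSE,
}

def neutral_mass(mz, z):
    """Neutral monoisotopic mass from precursor m/z and charge."""
    return (mz - PROTON) * z

def glycation_candidates(unmodified, precursors,
                         ppm_tol=5.0, rt_window=10.0, rt_shift_max=3.0):
    """Mass/RT screening of possible glycated precursors.

    unmodified: list of (name, mz, z, rt) for unmodified library peptides.
    precursors: list of (mz, z, rt) for all detected precursor ions.
    Returns (name, mz, z, rt, passes_rt_shift) for each candidate.
    """
    hits = []
    for name, mz_u, z_u, rt_u in unmodified:
        target = neutral_mass(mz_u, z_u) + HEXOSE
        for mz_p, z_p, rt_p in precursors:
            ppm = (neutral_mass(mz_p, z_p) - target) / target * 1e6
            rt_diff = abs(rt_p - rt_u)
            if abs(ppm) <= ppm_tol and rt_diff <= rt_window:
                hits.append((name, mz_p, z_p, rt_p, rt_diff <= rt_shift_max))
    return hits

# Hypothetical ions for illustration only.
unmodified = [("peptide_A", 500.0000, 2, 30.0)]
precursors = [
    (581.0264, 2, 30.1),   # +hexose, close in RT
    (581.0264, 2, 35.0),   # +hexose, inside window but shifted > 3 min
    (581.0264, 2, 50.0),   # +hexose, outside the RT window
    (581.0400, 2, 30.2),   # wrong mass
]
hits = glycation_candidates(unmodified, precursors)
for hit in hits:
    print(hit)
```

Comparing neutral masses rather than m/z values lets a glycated ion be matched even when it is observed in a different charge state from its unmodified analog.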
Table 6.
Abundances and retention times of glycated peptides obtained from 1D and 2D LC-MS/MS analyses of 2 h NISTmAb digests. The 1D study includes 10 protocols excluding Protocols 3, 12, and 13 (see Method), while 2D study uses only Protocol 2 GuanRT/DTT.
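
| region | site | peptide sequence | z (c) | 2D log(int) | 2D RT min | 2D RT diff | 2D # runs | 2D stdev | 1D log(int) | 1D RT min | 1D RT diff | 1D # runs | 1D stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Heavy chain: V and CDR2 (58, 66) | 58 | ALEWLADIWWDDKK | 2,3 | 8.85 | 81.66 | −0.46 | 13 | 0.20 | 7.32 | 70.96 | −0.75 | 7 | 0.03 |
| | 66 | HYNPSLKDR | 2,3 | 9.29 | 26.33 | 0.17 | 14 | 0.08 | | | | | |
| | 73 | LTISK (a) | 2 | 8.21 | 30.14 | 2.48 | 13 | 0.33 | | | | | |
| | | LTISKDTSKNQVVLK | 2,3,4 | 9.75 | 41.28 | −0.06 | 18 | 0.02 | 7.90 | 28.33 | −0.18 | 12 | 0.04 |
| | | LTISKDTSK | 2,3 | 9.68 | 26.44 | 0.24 | 13 | 0.13 | 7.89 | 15.05 | 0.37 | 9 | 0.10 |
| | 77 | DTSKNQVVLK | 2,3 | 8.85 | 30.55 | 0.61 | 17 | 0.06 | 7.56 | 17.43 | 0.24 | 9 | 0.03 |
| | 83 | NQVVLKVTNMDPADTATYYCAR | 2,3,4 | 7.99 | 62.11 | −0.16 | 8 | 0.05 | | | | | |
| Heavy chain: CH1 | 136 | GPSVFPLAPSSK (a) | 2 | 8.01 | 54.17 | −0.43 | 17 | 0.08 | | | | | |
| | | GPSVFPLAPSSKSTSGGTAALGCLVK | 2,3 | 8.82 | 65.90 | −0.48 | 18 | 0.02 | 8.15 | 53.78 | −0.47 | 11 | 0.04 |
| hinge | 225/249 | SCDKTHTCPPCPAPELLGGPSVFLFPPKPK | 3,4,5 | 9.57 | 64.02 | −1.12 | 17 | 0.23 | 8.81 | 54.69 | −0.35 | 21 | 0.09 |
| Heavy chain: CH2 | 251 | PKDTLMISR | 2,3 | 7.91 | 38.46 | −0.03 | 18 | 0.03 | 7.32 | 26.15 | −0.01 | 3 | 0.05 |
| | 291/293 | FNWYVDGVEVHNAKTKPR | 3,4,5 | 9.05 | 50.59 | −0.18 | 17 | 0.05 | | | | | |
| | 320 | VVSVLTVLHQDWLNGKEYK | 2,3,4 | 8.59 | 70.02 | −0.41 | 18 | 0.12 | 8.19 | 59.47 | −0.31 | 25 | 0.05 |
| | 323 | VVSVLTVLHQDWLNGKEYKCK | 3,4,5 | 9.13 | 66.65 | −0.18 | 18 | 0.02 | 8.23 | 54.18 | −0.20 | 3 | 0.02 |
| | 325 | CKVSNKALPAPIEK | 3,4 | 8.84 | 37.01 | 0.01 | 18 | 0.03 | 7.79 | 23.33 | 0.04 | 10 | 0.04 |
| | 329 | VSNKALPAPIEK | 2,3 | 10.51 | 38.63 | −0.02 | 18 | 0.01 | 8.76 | 24.94 | −0.04 | 25 | 0.02 |
| | 337 | ALPAPIEK (a) | 2 | 8.79 | 40.37 | 0.98 | 18 | 0.25 | | | | | |
| | | ALPAPIEKTISK | 2,3 | 9.17 | 46.92 | −0.13 | 17 | 0.05 | 7.92 | 33.67 | −0.10 | 5 | 0.07 |
| Heavy chain: CH3 | 363 | EEMTKNQVSLTCLVK | 2,3 | 8.42 | 54.25 | −0.19 | 16 | 0.03 | | | | | |
| | 395 | GFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSK | 3,4 | 7.82 | 84.09 | −0.32 | 3 | 0.01 | | | | | |
| | 417 | LTVDKSR | 2,3 | 8.36 | 24.62 | 0.70 | 15 | 0.16 | | | | | |
| | 442 | WQQGNVFSCSVMHEALHNHYTQKSLSLSPG | 3,4,5 | 8.92 | 65.55 | −0.19 | 17 | 0.08 | | | | | |
| | 450 | SLSLSPGK | 2 | 8.41 | 38.96 | 0.17 | 17 | 0.03 | | | | | |
| light chain: N-terminus | 1 | DIQMTQSPSTLSASVGDR | 2,3 | 8.71 | 59.08 | 0.73 | 18 | 0.12 | | | | | |
| light chain: V and CDR2 (52) | 19 | VTITCSASSR (b) | 2 | 7.75 | 36.21 | 0.52 | 18 | 0.09 | | | | | |
| | 38/41 | VGYMHWYQQKPGK | 2,3,4 | 8.75 | 45.13 | −0.30 | 18 | 0.10 | | | | | |
| | | VGYMHWYQQKPGKAPK | 3,4,5 | 9.41 | 42.02 | −0.21 | 18 | 0.06 | 8.31 | 28.92 | −0.14 | 14 | 0.07 |
| | 44 | APKLLIYDTSK | 3 | 7.81 | 48.11 | −0.21 | 17 | 0.04 | | | | | |
| | 52 | LLIYDTSK (a) | 2 | 7.93 | 47.95 | −0.49 | 18 | 0.10 | | | | | |
| | | LLIYDTSKLASGVPSR | 2,3 | 9.13 | 59.98 | −0.19 | 18 | 0.03 | 8.40 | 47.96 | −0.21 | 19 | 0.03 |
| | 53 | LASGVPSR (b) | 2 | 7.81 | 33.79 | 1.65 | 18 | 0.20 | | | | | |
| | 102 | YCFQGSGYPFTFGGGTKVEIKR | 3,4 | 8.30 | 63.28 | −0.18 | 16 | 0.05 | | | | | |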
106
VEIKR
1,2
7.96
23.02
0.56
8
0.24
light chain: C
125
TVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPR
3,4
9.42
94.81
−0.10
17
0.10
126
SGTASVVCLLNNFYPRb
2,3
8.14
79.24
−0.98
2
0.01
144
EAKVQWK
2,3
8.88
30.05
0.69
16
0.13
148
VQWKVDNALQSGNSQESVTEQDSK
2,3,4
9.53
53.52
−0.01
18
0.04
7.64
40.98
−0.03
18
0.03
VQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSK
4
8.75
66.83
−0.09
18
0.04
168
VDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSK
3,4
8.97
63.28
−0.50
17
0.14
VDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEK
3,4
9.98
62.44
−0.31
18
0.02
8.51
50.22
−0.39
15
0.05
VDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHK
4,5,6
9.27
58.00
−0.23
18
0.03
8.17
45.14
−0.17
9
0.06
182
DSTYSLSSTLTLSKADYEK
3
8.64
62.27
−0.38
15
0.05
187
ADYEKHKVYACEVTHQGLSSPVTK
2,3,4,5,6
10.48
43.95
−0.02
18
0.01
8.83
30.62
−0.03
18
0.03
189
HKVYACEVTHQGLSSPVTK
3,4
9.14
39.30
−0.13
9
0.09
8.10
26.75
−0.10
19
0.04
206
VYACEVTHQGLSSPVTKa
3
8.05
45.93
−0.54
18
0.03
VYACEVTHQGLSSPVTKSFNRGEC
3,4
8.83
52.21
−0.08
18
0.03
7.84
39.29
−0.07
3
0.03
Note:
Glycation was detected on the peptide C-terminal lysine residue.
Glycation was detected on the peptide N-terminus. Both cases (a and b) were suspected to occur during or after digestion.
All charge states detected for this peptide with the most abundant charge shown in boldface. All cysteines were alkylated. region: V – variable domains, C – constant domains, CH, CH, CH – constant domains 1, 2 and 3, CDR2 – complementarity-determining region 2; log(int): median absolute ion intensity (total peak area) of the glycated peptide based on summing up all charge states; RT: median retention time (min); RTdiff: glycated RT – unglycated RT; stdev: the standard deviation of RTdiff (min);
The median RT values from high-performance liquid chromatography (HPLC) for glycated peptides, and their RT differences from the unmodified forms in Table 6, reveal that most glycated peptides exhibit a consistent, small RT shift within ± 1 min of their unmodified forms. The exceptions are several short peptides that are shifted by as much as 3 min. Figure 3 shows the retention shifts of glycated peptides. Generally, glycated peptides elute after their unmodified forms when RT < 40 min and before their unmodified forms when RT > 40 min.
Figure 3.
The LC-MS elution characteristics of glycated versus non-glycated peptides. Red dots indicate glycated peptides eluting after the unmodified form (most with fewer than 10 residues), while blue dots indicate those eluting before the unmodified form.
The 2D separations also found several unexpected glycation sites. As noted in Table 6, glycation was found at the C-terminal lysine residue in five peptides and at the N-terminus of three peptides. These identifications were carefully confirmed and validated by manual inspection of the data from full MS scan (MS1) and tandem spectra (MS2). Examples are included in Supplementary Material Figure S-2. We suspect that these peptides may have been glycated during sample processing.
Glycation is known to alter peptide fragmentation due to the ease of fragmenting the sugar group, generally resulting in few sequence-specific product ions, thereby hindering their automated identification. This is illustrated in Figure 4, which compares spectra of a glycated to a non-glycated peptide. As shown here, the mass spectrum of a glycated peptide is typically dominated by characteristic, abundant fragment ions corresponding to the losses of water molecules from the labile hexose sugar. All high-quality glycation spectra were annotated and added to the spectral library.
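The characteristic glycation neutral losses quoted in the workflow (54.03, 72.04, 84.04, and 162.05 Da) follow directly from elemental composition. The sketch below recomputes them from monoisotopic atomic masses and derives where the corresponding loss peaks would fall for a given ion; the function names are illustrative, and the assumption that the charge state is unchanged by the neutral loss is ours.

```python
# Monoisotopic atomic masses (Da)
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def formula_mass(counts):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MASS[el] * n for el, n in counts.items())


H2O = formula_mass({"H": 2, "O": 1})
HCHO = formula_mass({"C": 1, "H": 2, "O": 1})

# The four frequently observed neutral losses named in the text.
GLYCATION_LOSSES = {
    "3 H2O": 3 * H2O,
    "4 H2O": 4 * H2O,
    "3 H2O + HCHO": 3 * H2O + HCHO,
    "C6H10O5": formula_mass({"C": 6, "H": 10, "O": 5}),
}


def expected_loss_peaks(ion_mz, z):
    """m/z of each neutral-loss peak for an ion of charge z (charge assumed retained)."""
    return {name: ion_mz - mass / z for name, mass in GLYCATION_LOSSES.items()}


# Hypothetical triply-charged precursor at m/z 800.0
peaks = expected_loss_peaks(800.0, 3)
```

For a triply-charged ion each loss appears at one third of its mass on the m/z axis, which is why the water-loss peaks cluster just below the precursor.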
Figure 4.
Tandem mass spectra of (A) triply-charged glycated peptide ion at 21 eV, DSTYSLSSTLTLSKADYEK, compared with (B) triply-charged unmodified form of the same peptide sequence at 29 eV.
Metallated peptides
Peptides containing metal ion adducts are common low-abundance artifacts in electrospray ionization (ESI). A total of 191 such spectra with sodium, calcium, or iron were identified by MS-GF+, half of them with low scores. The most common adduct was sodium; 59 sodiated ions were identified. Examination of adduct fragmentation behavior showed that the metal atom did not generally appear to be localized at specific amino acids as suggested by Unimod. Rather, metal atoms appear to be bound to multiple amino acids in a peptide sequence, effectively impeding fragmentation in the relevant region and reducing the scores assigned by sequence search methods. Furthermore, the formation of a unique characteristic product ion was frequently observed in singly-charged sodiated peptides via rearrangement involving the loss of the C-terminal residue upon fragmentation.
Fragmentation characteristics of sodiated peptides are examined and compared to their non-sodiated counterparts in Figure 5. The spectrum without sodium in Figure 5B exhibits uniform dissociation, yielding both b and y product ion series. Many of these fragment ions are absent in the sodiated spectrum in Figure 5A. The sodiated spectrum primarily contains abundant singly-charged, sodium-containing fragment ions (y7 to y9 and b10) and low-abundance doubly-charged, sodium-containing y ions (y9 to y12). As shown in Figure 5A, the impeded fragmentation region (indicated by a red line) in the sodiated peptide is consistent with sodium bound to multiple amino acid residues in that region. Note also the fragment at m/z 758.3365, corresponding to a truncated 2+ ion, VGYMHWYQQKPG, apparently arising from a rearrangement involving the loss of the C-terminal Lys, a known diagnostic ion for sodiated peptides.
Figure 5.
Tandem mass spectra of (A) doubly-charged sodiated peptide ion, VGYMHWYQQKPGK compared with (B) doubly-charged non-sodiated form of the same sequence.
Because the fragmentation of metal adducts typically produces less sequence information than that of the corresponding non-adducted peptide ions, their confident identification required confirmation by co-elution with the non-metallated precursor. Additionally, as a signal-to-noise criterion, at least 15% of the adduct-containing peaks were required to have intensities above 10% of the base peak.
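The two acceptance criteria for metal adduct spectra can be expressed as a short filter. This sketch reflects one reading of the intensity rule (the fraction is taken over annotated adduct-containing fragment peaks), and the 0.5-min co-elution tolerance is a hypothetical placeholder, since the text does not state one.

```python
def adduct_peak_filter(peaks, min_fraction=0.15, rel_intensity=0.10):
    """peaks: list of (intensity, contains_adduct) tuples for one MS2 spectrum.

    Passes when at least min_fraction of the adduct-containing peaks exceed
    rel_intensity times the base peak.
    """
    if not peaks:
        return False
    base = max(intensity for intensity, _ in peaks)
    adduct = [intensity for intensity, has_adduct in peaks if has_adduct]
    if not adduct:
        return False
    strong = sum(1 for intensity in adduct if intensity > rel_intensity * base)
    return strong / len(adduct) >= min_fraction


def coelutes(rt_adduct, rt_parent, tol_min=0.5):
    """Co-elution test; tol_min is an assumed tolerance, not from the text."""
    return abs(rt_adduct - rt_parent) <= tol_min


def accept_adduct_spectrum(peaks, rt_adduct, rt_parent):
    """Both criteria must hold for the adduct identification to be kept."""
    return coelutes(rt_adduct, rt_parent) and adduct_peak_filter(peaks)


# Hypothetical spectra: one with a strong adduct-containing peak, one without.
good = [(100.0, False), (50.0, True), (5.0, True), (4.0, True), (3.0, True)]
weak = [(100.0, False)] + [(5.0, True)] * 10
```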
Alkylated methionine
As illustrated in Figure 6, the HCD product ions of methionine-alkylated peptides show significant non-sequence-specific cleavage, with an abundant neutral loss of C3H7NOS, making them hard to identify by sequence search engines. These peptides were identified using a workflow similar to that developed for glycation, summarized in Figure 2 and the four steps of the previous section. Specifically, all library-identified, unmodified peptide ions (Classes 1 and 2) were collected in the first step, and precursor ions whose masses were greater by one carbamidomethyl group (within 5 ppm) were considered possible overalkylated peptides in Step 2. The HCD fragmentation analysis included checking for the abundant neutral loss of C3H7NOS that is characteristic of methionine alkylation.
Figure 6.
Tandem mass spectra of triply-charged peptide ion with methionine alkylation, WQQGNVFSC(CAM)SVM(CAM)HEALHNHYTQK, at m/z 953.4344. Abundant peaks in the high m/z region arise from a neutral loss of C3H7NOS (105 Da) from parent and y ion series.
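As a worked illustration of the mass relationships used for methionine alkylation, the sketch below tests a precursor for a carbamidomethyl mass increment and computes where the C3H7NOS loss peak of the Figure 6 precursor would appear. The function names are illustrative, and the charge state is assumed to be retained after the neutral loss.

```python
CARBAMIDOMETHYL = 57.02146   # C2H3NO, monoisotopic mass (Da)
C3H7NOS = 105.02483          # neutral loss from alkylated Met (Da)


def ppm_error(observed, expected):
    """Signed relative error in parts per million."""
    return (observed - expected) / expected * 1e6


def is_overalkylation_candidate(unmodified_mass, precursor_mass, ppm_tol=5.0):
    """True if the precursor is one carbamidomethyl heavier than the peptide."""
    expected = unmodified_mass + CARBAMIDOMETHYL
    return abs(ppm_error(precursor_mass, expected)) <= ppm_tol


def loss_peak_mz(precursor_mz, z):
    """m/z of the [M - C3H7NOS] peak, assuming the charge z is retained."""
    return precursor_mz - C3H7NOS / z


# Precursor from Figure 6: triply-charged ion at m/z 953.4344
loss_mz = loss_peak_mz(953.4344, 3)   # about 918.43
```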
Applications of the NISTmAb spectral library to structure and PTM characterization
Confirmation of NISTmAb primary sequence
Peptides in the library cover 99% of the NISTmAb sequence, representing 211 of 213 light chain residues and 444 of 450 heavy chain residues. Though most PTM species of IgG1-based antibodies have very low abundances, three of them have relatively intense signals, namely N-terminal pyro-Glu, clipped C-terminal Lys, and N-linked glycosylation (Table 4). As reported previously, the N-linked glycosylation site was approximately 99% occupied. The N-terminal glutamine residue was almost completely converted to pyroglutamic acid, and approximately 90% of the C-terminal Lys was truncated. These observations obtained in the present 1D study agree with previous NISTmAb analysis reports.
Determination of glycation sites on NISTmAb
Glycation of mAbs can occur at low levels during biomanufacturing where sugars are used as energy and carbon sources. In principle, glycation could occur on every available free amino site across the entire recombinant antibody. However, only a limited number of glycated sites on the heavy and light chains have been reported so far using standard analytical approaches. Their low abundance and interference with tryptic cleavage have made these modifications difficult to identify and quantify.
NISTmAb has a total of 51 potential glycation sites, with an N-terminal amine for each antibody chain, and 35 and 14 lysine residues in the heavy chain and light chain, respectively. From the present 1D LC-MS/MS runs, glycation was found on 11 heavy and 7 light chain lysine residues. From 2D LC-MS/MS runs, a total of 21 (58% of possible) glycation sites were detected in the heavy chain and 15 (100%) in the light chain, or nearly 3/4 of all possible sites. The glycation sites detected in the present 1D study are generally in agreement with previous reports, while an additional 16 glycated sites were detected by the 2D study.
Potential glycation sites on lysine residues are distributed across the entire NISTmAb, as shown in Columns 1 and 2 of Table 6. There are 16 in the constant domains (CH1, CH2, and CH3) of the heavy chain and 8 in the constant region of the light chain, all of which should be applicable to other IgG1 antibodies. Lysine residues 58 and 66 in the heavy chain and lysine 52 in the light chain lie within complementarity-determining region (CDR) 2, and thus they could be critical sites to monitor because of the known role of the CDRs in antigen binding. The C-terminal lysine of the heavy chain was also glycated at a low level. Additionally, glycation was observed on the free N-terminus of the light chain in 2D LC-MS/MS runs, although the level was very low compared to that of lysine residues, consistent with the observation by Gadgil et al. No glycation was detected on the heavy chain N-terminus by either 1D or 2D runs, probably due to the conversion of nearly all of the terminal glutamine residue to pyroglutamic acid.
To summarize, glycated peptide spectra were identified for most lysine residues in the NISTmAb. The light chain potential sites are documented for the first time to be fully accessible to glycation. However, all identified glycated peptides are minor species, with the relatively intense glycated ions typically having around 0.3% of the intensity of the most abundant ion in the same 1D LC-MS run.
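The site-coverage percentages quoted above can be checked with a few lines of arithmetic. The sketch assumes, as the reported 58% and 100% imply, that each chain's denominator counts its lysine residues plus its N-terminal amine; the dictionary layout is illustrative only.

```python
# Potential and detected (2D) glycation sites per chain, as stated in the text.
sites = {
    "heavy": {"lysines": 35, "n_terminus": 1, "detected_2d": 21},
    "light": {"lysines": 14, "n_terminus": 1, "detected_2d": 15},
}


def possible(chain):
    """Lysines plus the chain N-terminal amine."""
    return sites[chain]["lysines"] + sites[chain]["n_terminus"]


def coverage(chain):
    """Fraction of possible sites detected in the 2D study."""
    return sites[chain]["detected_2d"] / possible(chain)


total_possible = sum(possible(c) for c in sites)              # 51
total_detected = sum(s["detected_2d"] for s in sites.values())  # 36
overall = total_detected / total_possible                     # nearly 3/4
```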
Quantification of methionine oxidation in NISTmAb
We found that the peptides containing oxidized methionine showed complex elution behavior. This has been reported to complicate accurate quantification of site-specific methionine oxidation in antibodies. To assess the problem in the present work, all eight methionine residues in the NISTmAb were examined. Six of the Met residues are part of the heavy chain with three in the Fv domain (Met 34, Met 87, and Met 101) and three in the Fc domain (Met 255, Met 361, and Met 431), and the other two are part of the light chain Fab domain (Met 4 and Met 32). A more detailed discussion of these findings follows.
Observation of in-sample, in-column and in-source oxidation
We observed that methionine oxidation often produced two closely eluting chromatographic peaks connected by an ion signal of lower and variable abundance. The three selected ion chromatograms of oxidized peptides in Figure 7 illustrate this “bridge-like” feature. As can be seen in a1, b1, and c1 of Figure 7, unoxidized peptides elute at 31.3, 35.9, and 44.5 min. All oxidation artifacts generated by in-source oxidation coelute with their parent peptides, as shown in a2, b2, and c2. The in-sample oxidized peptides elute earlier than their unoxidized counterparts. Note the significant continuous signal between the in-sample oxidized and in-source oxidized peaks. The MS1 peaks shown in Figure 7 exhibit similar patterns along this “bridge”, and MS2 spectra from this retention interval were further identified to contain oxidized methionine residues. We postulate that the signal observed between the in-sample and in-source oxidized chromatographic peaks is generated by in-column oxidation of unoxidized peptides containing a Met residue during their passage through the column. The origin of the signal between in-sample and in-source peaks was confirmed by data from 2D studies where native and oxidized peptides were separated by the first LC into different fractions (data not shown). In summary, peptides containing a methionine residue are susceptible to oxidation in the column, leading to the generation of complex chromatographic peaks from pre-column, in-column, and in-source (post-column) oxidation.
Figure 7.
Selected ion chromatograms of oxidized methionine-containing peptides illustrating the “bridge-like” peak connecting in-source and in-sample formed peptides. Unoxidized methionine residues (Met255, Met361, Met431) are shown in a1, b1, and c1, while oxidized methionine residues (MetOX255, MetOX361, MetOX431) are shown in a2, b2, and c2.
Site specific quantification
Oxidized forms of all eight methionine residues were identified, and spectra for 298 peptide ions are included in the current library. We inspected four 1D LC-MS analyses, performed on the same day and showing minimal oxidation artifacts, to manually separate the effects of in-sample from in-column and in-source oxidation and quantify the relative percentage of all oxidized methionine residues. The resulting levels (as well as standard deviations) of in-sample oxidation in the peptides containing Met sulfoxide are in good agreement with the similar manual analysis reported by an interlaboratory study on identification and quantification of NISTmAb methionine oxidation (Figure 8). Consistent with that study, Met 255 and Met 431 are the two most oxidized sites, with 3.4% and 3.0% abundance relative to the corresponding unoxidized sites, respectively. Next are Met 34 of the heavy chain and Met 32 of the light chain, with approximately 2% oxidation. Met 361, Met 87, and Met 4 exhibit low levels of observed oxidation, at 0.7%, 1.3%, and 1.4%, respectively. As demonstrated in both studies of the NISTmAb, accurate methionine quantification requires careful separation of native and artifact oxidized peptides.
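The site-specific percentages above are expressed relative to the unoxidized peptide. A minimal sketch of that calculation follows, assuming that only the manually separated in-sample oxidized peak area enters the numerator and that in-column and in-source signal is excluded; the peak areas and replicate values are hypothetical and the function names are ours.

```python
import statistics


def percent_in_sample_oxidation(in_sample_area, unoxidized_area):
    """In-sample oxidized peak area as a percentage of the unoxidized peak area."""
    return 100.0 * in_sample_area / unoxidized_area


def summarize(percentages):
    """Mean and sample standard deviation over replicate runs."""
    return statistics.mean(percentages), statistics.stdev(percentages)


# Hypothetical single-run areas (summed over charge states) for one Met site
example = percent_in_sample_oxidation(3.4e6, 1.0e8)

# Hypothetical percentages from four same-day 1D runs
mean_pct, sd_pct = summarize([3.0, 3.5, 3.5, 4.0])
```

Whether artifact signal should instead be added back to the unoxidized area is a design choice the text does not settle, so the sketch leaves it out.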
Figure 8.
Comparison of relative abundance of each oxidized Met residue in the NISTmAb reported by this study and an interlaboratory study of identification and quantification of NISTmAb methionine oxidation. NIST, this study; LAB1 – LAB3, Laboratory 1 to Laboratory 3 in Reference 36. LAB3 conducted a manual quantification of methionine oxidation. H, heavy chain. L, light chain.
Discussion
False positive identifications
The NISTmAb spectral library is intended to contain all identifiable peptides derived from a tryptic digest. Because of the wide variety of peptides and modifications, and their wide distribution of abundances, it is not possible to derive statistically meaningful false discovery rates, as is widespread practice in global proteomics where thousands of proteins containing a very limited number of modifications are analyzed.
In this work, all identifications were made using sequence engines, targeted analyses, and a variety of peptide classification rules. One of the most significant rules is not allowing two or more rare modifications/cleavages on a single peptide (e.g., two iron atoms in a single peptide). Rare features include unexpected miscleavages, under- or over-alkylation, uncommon and low abundance modifications, and semi-tryptic peptides (Classes 3, 4, 5b, and 6b), and were generally considered improbable peptide features. In practice, while the assignment of two or more rare modifications would trigger spectrum rejection, multiple missed-cleavage or alkylation sites were allowed, as they were frequently observed.
We illustrate two examples of spectra rejected using these rules. The first rejected identification is shown in Figure 9A. In this example, a spectrum was initially identified as non-alkylated formylSCDhexKTHTCPPCPAPELLGGPSVFLFPPKPK (heavy chain 222–251) at m/z 839.1618, which is within 4 ppm of its theoretical value at charge state 4+. This false identification exhibits a combination of uncommon features: 1) misalkylation of all three cysteine residues, 2) N-terminal formylation, and 3) glycation of lysine. In other respects, it satisfied the criteria for acceptance. We determined that this case might be better explained by the unintended sampling of the 13C isotope of a tryptic peptide, specifically the mono-oxidized heavy chain 222–251 peptide. Reexamination of the raw data supported the latter explanation. The second example is the rejected false identification of the triply-charged, modified light chain N-terminal peptide, as DdehydratedIcation:2Ca[II]QMTQSPSTLSASVGDR at m/z 650.933 (Figure 9B). Although 63% of total product ion abundance was assignable to the dominant y series ions, two postulated N-terminal modifications led to the exclusion of this identification. The more probable assignment resulted from a manual inspection, which found that 75% of the total number of peaks (ranging from 2% to 23% of the base peak) corresponded to b series ions arising from the loss of C3H7NOS from the methionine-alkylated light chain L1 peptide at m/z 650.645. It was also confirmed by MS1 analysis of the ion isotopic m/z values.
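The first rejected example hinges on a precursor that was actually the 13C isotope peak of another peptide. A check of this kind can be sketched as below; the function name, the 5 ppm tolerance, and the monoisotopic m/z used in the example are hypothetical and chosen only to illustrate the arithmetic.

```python
C13_SPACING = 1.003355   # mass difference between 13C and 12C (Da)


def matching_isotope_index(observed_mz, mono_mz, z, max_isotope=2, ppm_tol=5.0):
    """Return k if observed_mz matches the M+k isotope peak of an ion with
    monoisotopic m/z mono_mz and charge z, otherwise None."""
    for k in range(max_isotope + 1):
        expected = mono_mz + k * C13_SPACING / z
        if abs(observed_mz - expected) / expected * 1e6 <= ppm_tol:
            return k
    return None


# Hypothetical 4+ peptide with monoisotopic m/z 838.9110: its M+1 isotope
# peak falls at about 839.1618, the precursor m/z of the rejected spectrum.
k = matching_isotope_index(839.1618, 838.9110, 4)
```

A return value of 1 or more flags the selected precursor as an isotope peak of a known ion, which prompts re-examination of the MS1 data before any modified-peptide assignment is accepted.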
Figure 9.
Examples of rejected peptide spectra. (A) Rejected MS/MS spectrum identified as formylSCDhexKTHTCPPCPAPELLGGPSVFLFPPKPK at m/z 839.1618. (B) Rejected MS/MS spectrum identified as triply-charged tryptic peptide, DdehydratedIcation:2Ca[II]QMTQSPSTLSASVGDR at m/z 650.933.
In another test of the false positive discriminating potential of this library, we used the NISTmAb library to match spectra from an extensive global proteomic tryptic digest of Jurkat cells. As an immortalized line of human T lymphocyte cells, Jurkat cells are not expected to express antibodies. The library search was performed with the NIST MSQC pipeline, a software platform for analysis of shotgun proteomics, and the identification score threshold was set to 450 (maximum 1000). This search did not yield any identifications above this threshold. In contrast, other NIST human libraries identified 25,000 peptide ions in this dataset, an indication of the specificity of the NISTmAb library. Clearly, for the present library, false positive identifications from proteins unrelated to an IgG sample appear very unlikely.
Because the constant regions of the NISTmAb are very similar to those of other IgG antibodies, the present library is suitable for identifying peptides and modifications originating from these regions. In Figure 10, we demonstrate this using LC-MS/MS results for Humira®, a mAb drug commercially produced in CHO cells rather than in a mouse cell line, which was used for production of the NISTmAb. Because the principal glycosylation site is in the conserved region, the previously published NISTmAb glycopeptide library is directly applicable to all such IgGs (N.B. the glycopeptide library includes sialylated glycans containing Neu5Ac and Neu5Gc). This analysis was carried out using the NIST MSQC pipeline. It identified 691 peptide spectra using a match score threshold of 450, of which 497 are from the heavy chain constant regions and 194 from the light chain constant region. These include 165 major tryptic peptides (Classes 1 and 2), 80 glycopeptides representing 25 different glycan structures (including three sialylated glycans with Neu5Ac), 83 oxidized peptides containing oxidation on Met, Trp, His, and Pro, 15 glycated peptides identifying 10 main glycation sites, and 141 deamidated peptides from Asn or Gln. The other peptides, shown in gray, dark blue, and light blue in Figure 10, included many analytical artifacts such as sodium adducts and formylation, as well as semi-tryptic and miscleaved peptides. This example demonstrates the facile re-identification of major peptide classes and known PTMs by a NISTmAb library search. It also shows that artifact peptides that might otherwise lead to false positive identifications can be recognized, and that their identities can be used for assessing the quality of the sample preparation process.
Figure 10.
Peptide identifications in a Humira® digest using the library developed in this work. Humira® was digested for 2 h after denaturing in 6 M guanidine at room temperature. Major, Classes 1 and 2 peptides; Glyco, N-linked glycopeptides; Oxid, Oxidized peptides; Deam, deamidated peptides; Glyca, Glycated peptides; Other, many analytical artifacts such as sodium adducts and formyl, and very long peptides; Semi, semi-tryptic peptides; Diag, miscleaved, and missed/over alkylated peptides.
Materials and methods
Materials
The Sample 8670 NISTmAb (an interim material), lot 3f1b, is an in-house IgG1κ mAb derived from a separate production lot of NISTmAb 8671. It was expressed by NS0, a model cell line from murine myeloma, and obtained from the Bioanalytical Science Group at NIST. Digestion reagents guanidine hydrochloride, urea, dithiothreitol (DTT), iodoacetamide (IAA), and Sigma trypsin T1426 were purchased from Sigma-Aldrich (St. Louis, MO). Sequencing-grade trypsin was purchased from Promega (Madison, WI). RapiGest, the brand name for sodium 3-[(2-methyl-2-undecyl-1,3-dioxolan-4-yl)methoxy]-1-propanesulfonate, was purchased from Waters (Milford, MA); Zeba spin columns (7K molecular weight cutoff (MWCO)) were purchased from Thermo Fisher Scientific (Waltham, MA). Chromatographic separations were performed on an Acclaim pepmap100 nano column (150 mm × 75 μm, C18, 3 μm particle size, 100 Å pore size, Dionex, Sunnyvale, CA). Humira pen was purchased from Abbott Laboratories (North Chicago, IL).
Tryptic digestion
General procedure
500 μg of NISTmAb was denatured for 10 min using various chemicals and solvents under different conditions in 50 μL of 100 mmol/L Tris HCl (Tris(hydroxymethyl)aminomethane hydrochloride) buffer at either room temperature or a high temperature of 85°C (HT). Reduction was performed by adding 2.5 μL of 200 mmol/L DTT to the above denatured mixture and incubating at room temperature for 1 h, followed by alkylation with 10 μL of 200 mmol/L IAA at room temperature in the dark for 1 h. Next, the excess IAA was quenched by adding 10 μL of 200 mmol/L DTT solution and incubating at room temperature for 1 h. The resulting mixture was subjected to digestion by sequencing-grade trypsin at 37°C for 15 min, 2 h, and 18 h for the 1D studies. Samples for 2D studies were denatured in guanidine at high temperature (85°C) or room temperature for 10 min prior to reduction and alkylation, and digested for 2 h and 18 h. The reaction was quenched by adding 50% (vol/vol) formic acid.
To generate a wide variety of tryptic peptides, 13 protocol variations were used for denaturing, reduction, and alkylation of the NISTmAb. Each sample was digested with trypsin for 0.25 h, 2 h, and 18 h. The protocols are summarized here and in Supplementary Material Table S-1.
(1) UreaRT/DTT: 6 mol/L urea in Tris HCl buffer was used to denature the sample at room temperature. The alkylated protein solution was subjected to desalting by Zeba spin column (7K MWCO) prior to adding 10 μg of Promega trypsin.
(2) GuanRT/DTT: 6 mol/L guanidine hydrochloride in Tris HCl buffer was used to denature the sample at room temperature. The alkylated protein solution was subjected to desalting by Zeba spin column (7K MWCO) prior to adding 10 μg of Promega trypsin.
(3) GuanRT/No2DTT: 6 mol/L guanidine hydrochloride in Tris HCl buffer was used to denature the sample at room temperature, without quenching the excess IAA with the second DTT addition. The alkylated protein solution was subjected to desalting by Zeba spin column (7K MWCO) prior to adding 10 μg of Promega trypsin.
(4) GuanRT/TCEP: 6 mol/L guanidine hydrochloride in Tris HCl buffer was used to denature the sample at room temperature. Tris(2-carboxyethyl)phosphine (TCEP) was used as a reducing reagent instead of DTT. The alkylated protein solution was subjected to desalting by Zeba spin column (7K MWCO) prior to adding 10 μg of Promega trypsin.
(5) GuanHT/DTT: The procedure is the same as Protocol 2 with the exception that the denaturation temperature was set to 85°C.
(6) HT/DTT: The sample in Tris HCl buffer was denatured by heating at 85°C for 1 h. 10 μg of Promega trypsin was used to digest the protein.
(7) RapiGestHT/DTT: 0.2% RapiGest (by volume) in Tris HCl buffer was used to denature the sample at 85°C. 10 μg of Promega trypsin was used to digest the protein.
(8) RapiGestRT/DTT: 0.2% RapiGest (by volume) in Tris HCl buffer was used to denature the sample at room temperature. 10 μg of Promega trypsin was used to digest the protein.
(9) MeOHRT/DTT: 20% methanol (by volume) in Tris HCl buffer was used to denature the sample at room temperature. 10 μg of Promega trypsin was used to digest the protein.
(10) MeCNRT/DTT: 20% acetonitrile in Tris HCl buffer was used to denature the sample at room temperature. 10 μg of Promega trypsin was used to digest the protein.
(11) TfeRT/DTT: 50% TFE (2,2,2-trifluoroethanol) in Tris HCl buffer was used to denature the sample at room temperature. 10 μg of Promega trypsin was used to digest the protein.
(12) GuanRT/DTT/CheapTrypsin10: The same procedure as Protocol 2 with the exception that 10 μg of Sigma T1426 trypsin was used to digest the protein.
(13) GuanRT/DTT/CheapTrypsin50: The same procedure as Protocol 12 with the exception that 50 μg of Sigma T1426 trypsin was used to digest the protein.
LC-MS/MS analysis
1D Analysis
The above digests (0.2 μg) were analyzed on a Dionex Ultimate 3000 RSLC (rapid separation liquid chromatography) Nano LC with an Acclaim pepmap100 column with a nanospray source connected to a Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific, Waltham, MA) in the positive ion mode. The experimental conditions used were the same as previously reported.
2D Analysis
The first dimension of 2D LC analysis was performed by fractionating 1 mg of digests via basic (pH 10) reverse-phase HPLC on an Agilent ZORBAX capillary column with a Thermo Dionex Ultimate 3000 HPLC system. The second dimension of 2D LC-MS/MS analysis was conducted after combining fractions as previously described.
Initial peptide identifications
Initial identifications were made from HCD spectra derived from tryptic digests using the MS-GF+ search engine against a FASTA file containing the NISTmAb sequence (see Supplementary Material Document S-1). A precursor m/z tolerance of 20 ppm was used, and product ion tolerances were those set by MS-GF+ for HCD fragmentation. In order to reliably identify both long, highly-charged peptides containing multiple missed cleavages and peptides containing a wide range of modifications, including semi-tryptic peptides, two separate search engine setting protocols were used. Otherwise, it was found that high-scoring, false candidate semi-tryptic peptides with unusual modifications could suppress correct, lower-scoring assignments of conventional tryptic peptides. The first search protocol allowed only tryptic peptides and included a list of 24 IgG and method-induced modification categories (see Table 5 and Section 1.4). The second protocol allowed semi-tryptic peptides and permitted only five common modifications (variable cysteine alkylation, methionine oxidation, ammonia loss of N-terminal Gln and Carbamidomethyl-Cys, and water loss from N-terminal Glu). Results of these two searches were merged, and identifications with a false discovery rate < 1% were collected. Spectra of tryptic peptides with unassigned total abundance greater than 30% were rejected, while this threshold was reduced to 20% for semi-tryptic peptides. In case of multiple identifications of a single peptide ion, the spectrum with the best score for each HCD energy was selected, annotated, and added provisionally to the library for later stringent analysis. Note that MS-GF+ did not limit the number of allowed missed cleavages.
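The merging and filtering of the two search result sets can be sketched as follows. The `PSM` record and its field names are hypothetical and do not correspond to MS-GF+ output columns; the sketch also assumes a score for which higher is better and a per-PSM q-value as the FDR estimate.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str                 # modified sequence
    charge: int
    hcd_energy: float            # collision energy setting of the spectrum
    score: float                 # assumed: higher is better
    q_value: float               # assumed FDR estimate for this match
    unassigned_fraction: float   # unassigned share of total fragment abundance
    semi_tryptic: bool


def select_library_spectra(psms, fdr=0.01,
                           max_unassigned_tryptic=0.30,
                           max_unassigned_semi=0.20):
    """Keep PSMs passing FDR and unassigned-abundance limits, then keep the
    best-scoring spectrum per (peptide ion, HCD energy)."""
    best = {}
    for psm in psms:
        if psm.q_value >= fdr:
            continue
        limit = max_unassigned_semi if psm.semi_tryptic else max_unassigned_tryptic
        if psm.unassigned_fraction > limit:
            continue
        key = (psm.peptide, psm.charge, psm.hcd_energy)
        if key not in best or psm.score > best[key].score:
            best[key] = psm
    return list(best.values())


# Hypothetical merged results from the tryptic and semi-tryptic searches
merged = [
    PSM("SLSLSPGK", 2, 21.0, 80.0, 0.001, 0.10, False),
    PSM("SLSLSPGK", 2, 21.0, 95.0, 0.001, 0.12, False),   # best at 21 eV
    PSM("SLSLSPGK", 2, 29.0, 70.0, 0.001, 0.25, False),   # kept at 29 eV
    PSM("LSLSPGK", 2, 21.0, 60.0, 0.001, 0.25, True),     # >20% unassigned
    PSM("DTLMISR", 2, 21.0, 50.0, 0.050, 0.05, False),    # fails FDR
]
selected = select_library_spectra(merged)
```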
Data analysis
MS1 information for all detectable ions
The detailed analysis of peptide digestion products was done with an in-house program for processing MS1 information (NIST ProMS). The program analyzes isotope clusters of each observed species to determine charge states, RT, monoisotopic m/z, and signal intensity (peak areas derived from all observable isotope peaks) for each ion detected in all LC-MS/MS runs.
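As a rough illustration of how charge state, monoisotopic m/z, and summed intensity can be read from an isotope cluster, the sketch below uses the standard relation that adjacent isotope peaks of an ion with charge z are separated by about 1.00335/z in m/z. This is not the NIST ProMS algorithm, whose internals are not described here; the function name, the input layout, and the assumption that the lowest-m/z peak of the cluster is the monoisotopic peak are all hypothetical simplifications.

```python
from statistics import median

ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference, Da
PROTON_MASS = 1.007276     # Da


def summarize_cluster(peaks):
    """Summarize one isotope cluster given as a list of (mz, intensity) pairs.

    Assumes the peaks belong to a single ion and that the lowest-m/z peak
    is the monoisotopic peak (a simplification for illustration).
    """
    peaks = sorted(peaks)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks to infer charge")
    # Spacing between adjacent isotope peaks is roughly 1.00335 / z.
    spacings = [b[0] - a[0] for a, b in zip(peaks, peaks[1:])]
    charge = round(ISOTOPE_SPACING / median(spacings))
    mono_mz = peaks[0][0]
    return {
        "charge": charge,
        "monoisotopic_mz": mono_mz,
        "neutral_mass": (mono_mz - PROTON_MASS) * charge,
        # Intensity summed over all observed isotope peaks.
        "intensity": sum(intensity for _, intensity in peaks),
    }
```

For example, a cluster with peaks near m/z 500.0000, 500.5017, and 501.0034 has a spacing of about 0.5017, which corresponds to a doubly charged ion.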
Peptide classification scheme
Peptides are separated into six classes (as discussed in Peptide Classification) and fall into two broad categories: common and rare. This is done with the intention of rejecting peptides that contain two or more “rare” features. Common peptides are those expected from digestion and most frequently sought in sequence identification searching, such as fully tryptic peptides or tryptic peptides with unexceptional missed cleavages (near acidic groups or a terminus). They also include other varieties of peptides found in peptide mapping experiments, for example, those with commonly observed modifications (Met/Trp oxidation, water loss from N-terminal Glu, and ammonia loss from N-terminal Cys or Gln), as well as in-source products formed during electrospray ionization. Less commonly observed peptide groups (e.g., missed/over alkylation, miscleaved peptides, semi-tryptic peptides, or uncommon modifications) are classified as rare. The presence of combined rare classes, or of two or more uncommon characteristics in a single peptide, triggers rejection of the identification. However, frequent and repeated identifications of such peptide ions are manually examined to ensure that no unexpected peptides are lost in the analysis.
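The rejection rule can be sketched as a count of rare features per peptide identification. The feature labels below are hypothetical names chosen to mirror the rare groups listed above; they are not the six classes of the paper's scheme, and the manual review of frequently rejected ions is not modeled.

```python
# Hypothetical labels for the rare peptide features named in the text.
RARE_FEATURES = {
    "missed_alkylation",
    "over_alkylation",
    "miscleavage",
    "semi_tryptic",
    "uncommon_modification",
}


def categorize(features):
    """Return 'common', 'rare', or 'rejected' for one peptide identification.

    `features` is a list of labels; a label may repeat (e.g. two uncommon
    modifications on the same peptide), and each occurrence is counted.
    """
    rare_count = sum(1 for feature in features if feature in RARE_FEATURES)
    if rare_count >= 2:
        return "rejected"
    return "rare" if rare_count == 1 else "common"
```

A list is used rather than a set so that two uncommon modifications on one peptide count as two rare features, in line with the "two or more uncommon characteristics" wording.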