Optimized Fragmentation Regime for Diazirine Photo-Cross-Linked Peptides.

Sven H Giese^1,2, Adam Belsom², Juri Rappsilber^1,2.

Abstract

Cross-linking/mass spectrometry has evolved into a robust technology that reveals structural insights into proteins and protein complexes. We leverage a new tribrid instrument with improved fragmentation capacities in a systematic comparison to identify which fragmentation method would be best for the identification of cross-linked peptides. Specifically, we explored three fragmentation methods and two combinations: collision-induced dissociation (CID), beam-type CID (HCD), electron-transfer dissociation (ETD), ETciD, and EThcD. Trypsin-digested, SDA-cross-linked human serum albumin (HSA) served as a test sample, yielding over all methods and in triplicate analysis in total 2602 matched PSMs and 1390 linked residue pairs at 5% false discovery rate, as confirmed by the crystal structure. HCD wins in number of matched peptide-spectrum-matches (958 PSMs) and identified links (446). CID is most complementary, increasing the number of identified links by 13% (58 links). HCD wins together with EThcD in cross-link site calling precision, with approximately 62% of sites having adjacent backbone cleavages that unambiguously locate the link in both peptides, without assuming any cross-linker preference for amino acids. Overall quality of spectra, as judged by sequence coverage of both peptides, is best for EThcD for the majority of peptides. Sequence coverage might be of particular importance for complex samples, for which we propose a data dependent decision tree, else HCD is the method of choice. The mass spectrometric raw data has been deposited in PRIDE (PXD003737).

Year: 2016 PMID: 27454319 PMCID: PMC5047652 DOI: 10.1021/acs.analchem.6b02082

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Current methods of structural biology have left a systematic and large gap in our knowledge of protein structures.[1] Cross-linking/mass spectrometry (CLMS) is an emerging tool that helps to gain structural information for challenging proteins and protein complexes. In CLMS experiments, protein complexes are chemically cross-linked, digested into peptides, and then analyzed via mass spectrometry and bioinformatics.[2−5] Identifying a cross-linked peptide pair or the linked residues within, defines their maximal distance in the folded protein. The derived distance constraints can then be used to determine the low-resolution arrangement of protein complexes[4,6,7] or even the high-resolution structure of a protein by the help of computational modeling.[8] To identify cross-linked peptides, fragmentation spectra have to be matched with peptide sequences by database search. For this purpose, a number of tools have been developed,[9,10] for example, pLINK,[11] Protein Prospector,[12,13] StavroX,[14] xQuest,[15] Kojak,[16] Xi,[6,17] or even search engines[18] based on linear peptide identification search paradigms such as Mascot.[19] One of the challenges in identifying cross-linked peptides is the unequal fragmentation of the two linked peptides,[13,17] that is, often one of the two peptides is better fragmented and thus also better characterized by fragment ions. Under collision-induced dissociation (CID) conditions this has been investigated in more detail, revealing that the intensity of observed fragment ions is also affected.[17] This is important for the scoring of cross-linked peptides since in general the number of identified fragment ions and their intensity is used for spectra matching. 
Despite the obvious disadvantage of the unequal fragmentation, scoring mechanisms managed to successfully exploit this fact: To judge the complete cross-linked peptide-spectrum match (PSM), the two individual peptide scores are weighted differently.[13,16] However, this should only be an ad hoc solution; ideally the experimental setup can be changed in such a way that the sequence coverage for both peptides is increased. It is plausible that one of the available fragmentation methods performs better than the others, and a comparative analysis into the behavior of cross-linked peptides might reveal options for a refined acquisition strategy. Throughout the manuscript we use CID for resonant excitation CID in the linear ion trap and HCD as the abbreviation for beam-type CID (HCD is also often referred to as higher-energy collisional dissociation). CID is one of the standard methods of fragmenting peptides in proteomics and has been used in many CLMS studies.[6,20−24] The details of CID of cross-linked peptides have recently been systematically assessed,[17] but a systematic comparison to other fragmentation methods such as HCD is lacking. HCD has also been used in many CLMS studies.[11,13,25,26] Neither a systematic analysis of cross-linked peptides under HCD exists nor under electron-transfer dissociation (ETD). ETD-based fragmentation, that is, ETD with and without supplemental activation of CID (ETciD) or HCD (EThcD)[27] has neither routinely been applied to cross-linked peptides nor investigated in much detail. A sequential fragmentation scheme of CID and ETD is reported to increase the identification and confidence levels of cross-linked peptides.[28] Another study acquired sequential CID and ETD fragmentation spectra as an optimized method for CID cleavable cross-linkers with signature peaks. 
Both spectra are then matched with their appropriate ion types and scored together, yielding an improved sequence coverage compared to CID alone.[29] Search strategies for noncleavable cross-linkers, however, do not rely on the detection of signature peaks, and thus, the time for the reisolation of the precursor can be saved by simply using ETD with supplemental activation. It was also shown that ETD alone can generate good ion coverage for both peptides using a novel cross-linker,[30] although the effect on peptides cross-linked with another cross-linker remains to be investigated. In contrast, ETD has been used frequently for complete proteins[31] or to characterize post-translational modifications (since it leaves the often labile peptide modifications intact[32]). Earlier studies stated that ETD fragments peptides with charge states higher than two more extensively than CID.[33] However, the underlying effect seems to correlate with the mass-to-charge ratio (m/z) of the precursors.[34] For cross-linked peptides, we expect highly charged precursors[6,15,18] and, thus, potentially well-suited targets for ETD. High sequence coverage is important to ensure selectivity during database search when trying to identify the two cross-linked peptides from the large choice of alternatives offered by the database. Good backbone fragmentation should also be beneficial to pinpoint the exact location of the cross-link. Although this is the intuitive expectation, sequence coverage and site calling precision do not necessarily have to be linked directly. Properties of the linkage site might direct fragmentation toward neighboring backbone bonds or away from them. Also, for amine-reactive cross-linkers, pinpointing the exact position of the cross-link is assisted by the restricted chemical reactivity toward lysine, serine, threonine, tyrosine, or the protein N-terminus.
Hence, depending on the peptide sequence there might only be a single amino acid amenable to the cross-linker reaction. For highly reactive cross-linkers such as succinimidyl 4,4-azipentanoate (SDA) each residue in a peptide needs to be considered when locating the linkage site. Therefore, pinpointing the cross-link sites potentially requires more complete backbone fragmentation than for more specific cross-linkers. In this study we compared three different fragmentation techniques and two combined fragmentation schemes available on a novel tribrid mass spectrometer (Orbitrap Fusion Lumos, Thermo Fisher Scientific), CID, HCD, ETD, ETciD, and EThcD, on cross-linked peptides obtained by tryptic cleavage of SDA-cross-linked human serum albumin (HSA). The three-dimensional structure of HSA has been resolved by X-ray crystallography[35] and is used as ground-truth to evaluate the identification results. The right choice of fragmentation method allows the number of identified linkage sites to be increased; increasing the sequence coverage of both linked peptides boosts the confidence of the matches and also the correct localization of the cross-link site.

Methods

Sample Preparation

Purified HSA (Sigma-Aldrich, St. Louis, MO) was cross-linked using different cross-linker-to-protein, weight-to-weight (w/w) ratios: 0.152:1, 0.203:1, 0.303:1, 0.406:1, 0.606:1, 0.811:1, 1.21:1, and 1.62:1. Aliquots of purified HSA (15 μg, 0.75 mg/mL) in cross-linking buffer (20 mM HEPES–OH, 20 mM NaCl, 5 mM MgCl2, pH 7.8) were mixed with sulfo-SDA (Thermo Scientific Pierce, Rockford, IL) to initiate incomplete reaction of the protein with the sulfo-NHS ester component of the cross-linker. Human blood serum from a healthy donor (20 μg, 1.0 mg/mL) was cross-linked in a similar manner, using cross-linker-to-protein ratios (w/w) of 0.5:1, 1:1, 2:1, and 4:1. Total reaction volume in each case was 30 μL. For the second step of the cross-linking procedure, photoactivation of the diazirine group was carried out using UV irradiation from a UVP CL-1000 UV Cross-linker (UVP Inc.). Samples were irradiated for either 25 or 50 min for purified HSA samples, and either 10, 20, 40, or 60 min in the case of blood serum samples and separated using gel electrophoresis. Bands corresponding to monomeric HSA were excised from gels and the proteins reduced with DTT, alkylated using IAA, and digested using trypsin following standard protocols.[18] Peptides were loaded onto self-made C18 StageTips[36] and eluted using 80% acetonitrile and 20%, 0.1% TFA in water. The eluates from blood serum HSA and purified HSA digests were mixed 0.33:1 as a master mix to be used throughout this study. The two samples originally used in our structural analysis of HSA[8] were mixed here to gain enough material to perform the experiments of this study in triplicates.

Data Acquisition

Peptides were loaded directly (2% B, 500 nL/min) onto a spray emitter analytical column (75 μm inner diameter, 8 μm opening, 250 mm length; New Objectives) packed with C18 material (ReproSil-Pur C18-AQ 3 μm; Dr Maisch GmbH, Ammerbuch-Entringen, Germany) using an air pressure pump (Proxeon Biosystems).[37] Mobile phase A was 0.1% formic acid and mobile phase B was 0.1% formic acid/80% acetonitrile. Peptides were eluted (200 nL/min, linear gradient of 2–40% B over 139 min) directly into an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific, San Jose, CA). Survey spectra were recorded in the Orbitrap at 120000 resolution. Spectra for all fragmentation methods were acquired using a scan range of 300–1700 m/z. Precursor ion isolation was performed with the quadrupole and an m/z window of 1.6 Th. The precursor automatic gain control (AGC) target value was 4 × 10^5, maximum injection time 50 ms. For CID only, CID collision energy was set to 30%. For HCD only, HCD collision energy was set to 35%. For ETD only, the option to inject ions for all available parallelizable time was selected (anion AGC 5 × 10^4, 60 ms maximum injection time). Supplemental activation (SA) collision energy was set to 10% for ETciD, and 25% for EThcD.

Data Analysis

Raw files were preprocessed with MaxQuant (v. 1.5.2.8) with “Top MS/MS peaks per 100 Da” set to 100.[38] Resulting peak files (APL format) were subjected to Xi (ERI Edinburgh, v. 1.5.584) and searched with the following settings: MS accuracy, 6 ppm; MS/MS accuracy, 20 ppm; enzyme, trypsin; max. missed cleavages, 4; max. number of modifications, 3; fixed modification, none; variable modifications, carbamidomethylation on cysteine; oxidation on methionine; cross-linker, SDA (mass modification: 109.0396 Da). In addition, variable modifications by the hydrolyzed cross-linker (“SDA-hyd”, mass modification: 82.0413 Da) and loop-links (“SDA-loop”, mass modification: 83.0491 Da) were allowed. SDA cross-link reactions were assumed to connect lysine, serine, threonine, tyrosine, or the protein N-terminus on the one end of the spacer with any other amino acid on the other end. FDR was estimated using XiFDR (v. 1.0.6.14)[39] on a 5% peptide spectrum match (PSM) level and 5% link-level only including unique PSMs. The reference database consisted of a single entry with the protein sequence of HSA (Uniprot: P02768). For further analysis, PSM information (precursor m/z, annotated fragments, score, peptide sequences, etc.) were extracted from a local PostgreSQL database. The annotated spectra are available in the Supporting Information (Figures S4–S8). To derive a decision tree for an optimized fragmentation scheme for cross-linked peptides we divided the acquisition range into a grid of m/z bins of size 200 for each charge state from 3 to 7. After sorting all PSMs into this theoretical grid we assigned each cell the best performing and second best performing fragmentation method. The performance was assessed through the median achieved sequence coverage of the complete cross-linked peptide. Note, sequence coverage does not depend on the possible fragment ions but rather on the actual evidence (fragment ions) for specific n-terminal or c-terminal sequences. 
To decide whether or not a fragmentation method is favorable over another we conducted a simple, one-sided permutation test[40] with label swaps and 10,000 iterations. P values lower than 0.05 were regarded as significant. Permutation tests were only performed if more than 15 observations were in the best performing class. If the best and second best were too similar to give significant results, the best performing method was also compared to all other methods. All raw files are available via the PRIDE repository[41] (PXD: PXD003737) along with PSM results and the reference FASTA.
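
The one-sided permutation test with label swaps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of the difference of medians as test statistic, the add-one correction, and the fixed seed are all assumptions made for the example.

```python
import random
from statistics import median


def permutation_test(best, second, n_iter=10_000, seed=0):
    """One-sided permutation test on the difference of medians (assumed statistic).

    Returns the estimated p value: the share of label shuffles whose median
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = median(best) - median(second)
    pooled = list(best) + list(second)
    n_best = len(best)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # swap labels by reshuffling the pooled values
        diff = median(pooled[:n_best]) - median(pooled[n_best:])
        if diff >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return (hits + 1) / (n_iter + 1)
```

A call such as `permutation_test(coverage_best, coverage_second) < 0.05` would then mark a grid cell as significant, where the two lists hold per-PSM sequence coverages of the best and second best method.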

Results and Discussion

We investigated the impact of five fragmentation techniques (CID, HCD, ETD, EThcD, ETciD) on the analysis of cross-linked peptides using a latest generation Orbitrap mass spectrometer (Orbitrap Fusion Lumos, Thermo Fisher Scientific). HSA was used as a model protein with a known crystal structure. Cross-linking experiments suffer under CID conditions from the underrepresentation of fragment ions from one of the two peptides.[13,17] Here we define the peptide with more intense ions among the ten most intense fragment ions as the α-peptide and the remaining peptide as the β-peptide.[17] Note that the nomenclature for the two peptides in a cross-link is not standardized; other definitions using the achieved search score[13] or the peptide’s chain length or mass[4] are used. We hypothesized that the usage of other fragmentation techniques has an impact on the fragmentation pattern of cross-linked peptides and subsequently on the success rate of identification. In our analysis we applied two different FDR-levels according to the descriptive features that we evaluated.[39] For the evaluation of identification results on the crystal structure, a link-level FDR is used. For the evaluation of PSM properties (e.g., sequence coverage), a regular PSM FDR is used. An overview is available in Table S1.

HCD Fragmentation Gives the Highest Number of Identified Cross-Links

We compared the number of identified cross-links that passed a 5% link-level FDR and a 5% PSM-level FDR to assess which fragmentation approach leads to the highest identification success. The results, accumulating the three technical replicates for all fragmentation techniques, show that HCD (958 PSMs) gives the highest number of identifications followed by CID (604 PSMs, Figure 1A). ETciD fragmentation achieves the lowest number of identified cross-links with 296 PSMs. This order is closely related to the number of acquired spectra in all replicates. While HCD is the fastest acquisition technique, producing ∼109000 MS2 spectra, ETciD and ETD only produce ∼80000 spectra (Table S1). The number of PSMs is only a proxy for the success of CLMS experiments; the true value of CLMS data comes from the corresponding distance constraints. Therefore, for the comparison of cross-linking data it makes sense to compare the results on the link-level. For the comparison on the link-level, only unique links are regarded for further analysis. A unique link is defined by the combination of residues involved in a cross-link, that is, a unique residue pair.
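
Collapsing PSMs to unique residue pairs, as defined above, might look like the following sketch. The `PSM` record and its field names are hypothetical, and residue positions alone are assumed to identify a link because the sample contains a single protein.

```python
from typing import Iterable, NamedTuple


class PSM(NamedTuple):
    method: str  # fragmentation method, e.g. "HCD"
    site_a: int  # linked residue position in the protein (first peptide)
    site_b: int  # linked residue position in the protein (second peptide)


def unique_links(psms: Iterable[PSM]) -> set:
    """Collapse PSMs to unique residue pairs; order within a pair is ignored."""
    return {tuple(sorted((p.site_a, p.site_b))) for p in psms}
```

Two PSMs reporting the same pair in opposite order count as one link under this definition.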

Figure 1

Number of SDA-induced cross-links identified in HSA using different fragmentation techniques. (A) Identified PSMs and links were computed for 5% FDR-level on the respective category. (B) Evaluation of the identified cross-links against the crystal structure of HSA. The light gray distribution reflects the distance measurement between identified residues in a cross-link mapped to the crystal structure (the median is shown above the vertical line). The dark gray distribution reflects all pairwise combinations of cross-linkable residues in the crystal structure. The black vertical line at 25 Å is used to classify cross-links as long distance or not.

As is the case for PSMs, HCD fragmentation also returns the highest number of identified links (Figure 1A). In total 1390 links (972 unique) were identified with the various methods: Of the unique links HCD observed 446 links (46%), CID 297 links (31%), EThcD 240 links (25%), ETciD 205 links (21%), and ETD 202 (21%). Note that the comparison of the links is not straightforward if the cross-link site is ambiguous. We applied a simple heuristic that assigns the linkage site to the C-terminal residue in ambiguous linkage windows. As HSA’s three-dimensional structure has been resolved, it is possible to utilize it as ground truth and further evaluate the quality of the identified links. We used 25 Å here for SDA as the maximal α-carbon distance of two linkable amino acids in the three-dimensional structure. This provides a clear distinction between true positive and false positive identifications. Each identified link that lies within 25 Å in the crystal structure is plausibly a true positive. Accordingly, every link that is further than 25 Å apart is plausibly a false positive. This is a simplified approach, as links shorter than 25 Å will also contain false positives as a result of random matching, and conversely, longer links may be true and result from protein structural flexibility.
Comparing the link information from all five fragmentation techniques shows that the overall quality of the results is comparable across all fragmentation modes and distinctly different from random results. The derived distance distributions have a median of 12–13 Å and are very distinct from the random distance distribution (Figure 1B). In addition, the results are comparable in meeting the approximated 5% FDR. FDR analysis for the HCD data and the ETciD data slightly underestimates the number of false positives by 1% and 2.5%, respectively (Figure S1). This can be partially explained by the definition of the FDR itself, which only gives an approximation of the true false discovery rate. Furthermore, the hard cutoff that was used has a large impact on the computed FDR. For example, the ETciD distances showed a larger peak just to the right of the desired distance cutoff, indicating that a small increase in the maximal allowed distance would give an FDR closer to the desired 5%. The HCD distance distribution looks similar, with a small enrichment of false positives just outside the maximal allowed distance. Thus, accounting for more flexibility would change the FDR and suggests that the different methods lead to data of comparable quality but different quantity. Having a preranking of the individual fragmentation techniques in terms of number of PSMs and unique cross-links is desirable to maximize the information content in a single run. Depending on the peptide properties, some fragmentation methods might be more suited for a certain group of peptides, and thus, using two (or more) orthogonal fragmentation techniques may increase the overall yield in peptide identifications and thus distance constraints. Disregarding the link information to focus first on the identified peptide pairs shows that HCD fragmentation also yields the largest number of unique peptide pairs (Figure 2A).
A total of 43% (201 peptide pairs) are shared between at least two fragmentation techniques. The remaining 57% (269 peptide pairs) are unique to one of the five fragmentation techniques. To maximize the information content, HCD should be combined with CID fragmentation to increase the number of unique links by 58 (Figure 2B). Interestingly, ETD fragmentation can maximally increase the number of unique links by 41 by using EThcD. We suspect that the difference in the number of acquired spectra and actually identified PSMs is the main driver for this effect. We define the identification rate, IR, as $\mathrm{IR} = N_{\mathrm{id}} / N_{\mathrm{acq}}$, where $N_{\mathrm{id}}$ is the number of identified unique PSMs and $N_{\mathrm{acq}}$ is the total number of acquired MS2 spectra (Table S1). The IR reveals that HCD not only acquires most spectra, but also has the highest success rate of 0.88% compared to CID (0.61%), EThcD (0.52%), and ETD/ETciD (0.38%). If speed and reliability of ETD-based fragmentation should change in the future, this order of complementarity may change. In comparison with linear peptide identifications, where the IR reaches up to 54%[42] (depending on the instrumentation), the success rate of cross-link identification is much lower. A contributing factor will be the generally low abundance of cross-linked peptides when compared to linear peptides, which will reduce their frequency of selection for MS2, especially in competition with the linear peptides also present. Other factors will include poorer database matching due to often lower intensity, but also more complex spectra and a larger search space.
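
As a quick arithmetic check of the identification rate (identified unique PSMs divided by acquired MS2 spectra), the rounded HCD figures quoted in the text can be plugged in. The function name is an assumption, and the spectrum count is the approximate value from the text rather than the exact entry of Table S1.

```python
def identification_rate(n_identified: int, n_acquired: int) -> float:
    """Return IR = N_id / N_acq, expressed as a percentage."""
    if n_acquired <= 0:
        raise ValueError("n_acquired must be positive")
    return 100.0 * n_identified / n_acquired


# Rounded values quoted in the text for HCD: 958 PSMs, ~109000 MS2 spectra.
print(f"HCD IR = {identification_rate(958, 109_000):.2f}%")
```

With these rounded inputs the result agrees with the 0.88% reported for HCD.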

Figure 2

Pairwise result overlaps of fragmentation techniques. (A) Overlap of identified peptide pairs (disregarding link-site positions) between fragmentation techniques (Venn diagram generated with Jvenn[49]). (B) Set difference matrix shows the number of uniquely identified peptide pairs (disregarding link-site positions) by one fragmentation technique (y-axis) when compared to another one (x-axis).
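
A set difference matrix of the kind described in panel (B) can be computed in a few lines. The sketch below assumes each method's identifications are available as a set of peptide-pair identifiers; the function name and the string identifiers are illustrative only.

```python
def set_difference_matrix(pairs_by_method: dict) -> dict:
    """matrix[row][col] = number of peptide pairs found by `row` but not by `col`."""
    methods = list(pairs_by_method)
    return {
        row: {
            col: len(pairs_by_method[row] - pairs_by_method[col])
            for col in methods
        }
        for row in methods
    }


# Toy data, not taken from the study.
toy = {"HCD": {"p1", "p2", "p3"}, "CID": {"p2", "p4"}}
matrix = set_difference_matrix(toy)
```

Here `matrix["CID"]["HCD"]` gives the number of pairs that CID would add on top of HCD, which is the quantity used in the text to judge complementarity.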

ETD-Aided Fragmentation Improves the Coverage of the Second Peptide

The identification of cross-linked peptides poses two challenges: First, finding the correct peptide pair, and second, assigning the correct cross-link site. High peptide sequence coverage for both individual peptides should be beneficial to assigning the correct site. Site calling will be especially challenging when considering cross-linkers such as SDA, where the number of cross-link target sites is large. Under HCD conditions the coverage distribution for the α-peptide is the lowest, with a mean coverage around 50% (Figure 3A). The other four fragmentation techniques perform very similarly, with only small improvements in the coverage of the α-peptide with CID or EThcD fragmentation. Interestingly, fragmentation schemes involving ETD do not increase the fragmentation efficiency (measured by the peptide coverage) much for the α-peptide. In fact, the highest coverage values for the α-peptide were observed with CID fragmentation. In contrast, the sequence coverage for the β-peptide largely depends on the fragmentation method (Figure 3B). ETD, ETciD, EThcD, and HCD show a much better fragmentation compared to CID. Previously, ETD was reported to improve the sequence coverage compared to CID.[32,43] We observe here that for cross-linked peptides this effect is very pronounced for the β-peptide, but not for the α-peptide.

Figure 3

Achieved sequence coverage comparison. Coverage distribution of the α-peptide (A; more matches among the 10 most intense fragment ions) and the β-peptide (B). The vertical line in (A)–(C) reflects a reference value of 50% sequence coverage, meaning fragments (b, c, y, or z) match to half of the backbone links between residues along the sequence of the peptide. (C) Coverage distribution for the complete cross-linked peptide. (D) Symmetry (absolute coverage difference between α- and β-peptide) distributions for the different fragmentation techniques. The data in (A)–(D) were analyzed using a 5% PSM FDR.

In general, in cross-linked peptides, one peptide is matched by more, and more intense, fragment ions than the other. All fragmentation methods yield at least an average coverage of around 50% for the α-peptide. For the β-peptide, the average coverage lies between 39% and 50%. CID would be the method of choice for high α-peptide coverage. However, CID systematically disadvantages the β-peptide. For the β-peptide, the other fragmentation methods perform much better: EThcD and HCD almost reach the same fragmentation efficiency as for the α-peptide. In numbers, the largest discrepancy between α- and β-peptide coverage was observed with CID, with a mean coverage difference (MCD) of 19%. EThcD and HCD show the lowest MCD of 8%. The overall best coverage is observed with EThcD fragmentation (Figure 3C). ETciD seems to be less effective, presumably as ETD in the first stage leads to charge reduction, and CID then fragments a single precursor, while HCD fragments all. Nevertheless, ETciD greatly improves the coverage of the second peptide when compared to CID. To compare the fragmentation efficiency on both peptides in a cross-link more systematically, we define the symmetry factor (SF) as $\mathrm{SF} = |\mathrm{cov}_{\alpha} - \mathrm{cov}_{\beta}|$, where $\mathrm{cov}_{\alpha}$ and $\mathrm{cov}_{\beta}$ refer to the sequence coverage of the α- and β-peptide, respectively.
For convenience, we use the negation SF′ of SF, defined as $\mathrm{SF}' = 1 - \mathrm{SF}$. A large SF′ means that α- and β-peptide coverage are very similar and vice versa. CID shows the smallest SF′ among the five fragmentation methods, with a median of ∼0.8. The other four methods perform better than CID, with a median of ∼0.9 (Figure 3D). In addition, ETD, ETciD, EThcD, and HCD have a smaller spread than CID. In summary, CID exacerbates the second peptide problem. Nevertheless, CID still slightly outperforms HCD in overall cross-linked peptide sequence coverage. In order to maximize overall cross-linked peptide coverage, ETD, ETciD, and EThcD are recommended, based on median coverage of the complete cross-linked peptide.
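
Sequence coverage and the symmetry measure can be illustrated with a short sketch. It takes coverage as the fraction of backbone bonds supported by at least one fragment, SF as the absolute coverage difference described in the Figure 3 caption, and SF′ = 1 − SF; the last form is an assumption, chosen to be consistent with the reported medians of about 0.8 to 0.9. The function names and the bond indexing are illustrative.

```python
def sequence_coverage(length: int, cleaved_bonds: set) -> float:
    """Fraction of the (length - 1) backbone bonds supported by a fragment.

    Bond i (1-based) lies between residues i and i + 1.
    """
    n_bonds = length - 1
    if n_bonds <= 0:
        raise ValueError("peptide must have at least two residues")
    observed = {b for b in cleaved_bonds if 1 <= b <= n_bonds}
    return len(observed) / n_bonds


def symmetry(cov_alpha: float, cov_beta: float) -> tuple:
    """Return (SF, SF') with SF = |cov_alpha - cov_beta| and SF' = 1 - SF."""
    sf = abs(cov_alpha - cov_beta)
    return sf, 1.0 - sf
```

For an 11-residue α-peptide with five of its ten bonds observed, `sequence_coverage(11, {1, 2, 3, 4, 5})` gives 0.5, the reference line drawn in Figure 3.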

Precursor m/z Has a Large Effect on the Efficiency of the Fragmentation

To follow up on the different fragmentation behavior of cross-linked peptides we investigated how the precursor properties influence the fragmentation efficiency. We first divided the m/z acquisition range into bins of m/z 150 (starting from m/z 550). For each bin we then collected the peptide identifications of all different fragmentation methods and investigated the sequence coverage based on the m/z of the precursor. ETD and EThcD lead to the highest sequence coverage between m/z 500–800 (Figure 4A,B). However, ETD efficiency decreases steeply with higher m/z, making HCD and CID the better choice for precursors larger than m/z 1000. The same trend is observed for all ETD-based methods. These differences are more pronounced on the individual α- and β-peptides. When the complete peptide coverage is compared (Figure 4C), the methods lie closer together, but EThcD and ETD still outperform all other methods for precursors smaller than m/z 850. In higher m/z areas, only CID and HCD are able to still produce enough peptide identifications (data in the figure were limited to m/z bins with at least five observations).

Figure 4

Sequence coverage depending on precursor m/z and charge. The average coverage values from (A) α-peptides, (B) β-peptides, and (C) the complete cross-linked peptide are plotted vs the precursor m/z. Each dot represents the median of all identified peptides in a window of m/z 150. Error bars show the standard deviation. (D) Decision surface to optimize the sequence coverage of cross-linked peptide. The acquisition range was divided into bins of 200 m/z per charge state. In each bin the best performing fragmentation method (judged by median achieved sequence coverage) is used to color that particular bin. The “*” denotes a significant improvement in sequence coverage by using the best performing fragmentation method over the second best. Areas with fewer than 15 observations are colored in light red, falling back to HCD as standard fragmentation technique. Gray annotations show areas where no significant improvement could be obtained by choosing one method over the others.

As demonstrated in the sections above, there are differences in the efficiency of the fragmentation of cross-linked peptides. In a more detailed comparison, we divided the acquisition range into a grid made of charge bins of size one and m/z bins of size 200. In each of these cells we then tested how well the five different fragmentation methods performed. The performance was evaluated on the cross-linked peptide sequence coverage. For the majority of peptides, EThcD achieved significantly higher sequence coverage (Figure 4D) than the second best method between 600 and 800 m/z (precursor charge 3–6). In addition, the m/z cells 400–600 (z = 4) and 800–1000 (z = 5) are also favored by EThcD fragmentation. Since the majority of cross-linked peptides (71%) lie within 600–1000 m/z, the most important area is dominated by EThcD fragmentation. However, evaluated by pure numbers of identifications, EThcD is not the best performing method.
On average, ∼35 PSMs are missed if EThcD is chosen over the method that achieves the highest number of identifications. If the evaluation metric is changed to the highest number of identifications, HCD outperforms the other fragmentation methods for all m/z bins (Figure S3). Therefore, HCD was selected as the default method for regions where no significant improvement could be observed by any of the other methods (Figure 4D, HCD written in gray).
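
The grid-based selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: the input tuple layout and function name are hypothetical, cells whose best method has fewer than 15 observations fall back to HCD, and the significance test between best and second best method is left out.

```python
from collections import defaultdict
from statistics import median


def decision_grid(psms, bin_width=200, min_obs=15, fallback="HCD"):
    """Map (charge, m/z bin start) to the method with the highest median coverage.

    `psms` is an iterable of (method, charge, precursor_mz, coverage) tuples.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for method, charge, mz, coverage in psms:
        bin_start = int(mz // bin_width) * bin_width
        cells[(charge, bin_start)][method].append(coverage)

    grid = {}
    for cell, by_method in cells.items():
        best = max(by_method, key=lambda m: median(by_method[m]))
        # Sparse cells fall back to the default method (no significance test here).
        grid[cell] = best if len(by_method[best]) >= min_obs else fallback
    return grid
```

In a fuller version, each cell's winner would additionally be compared with the runner-up by a permutation test before it replaces the default.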

HCD, EThcD, and ETD Fragmentation Define the Cross-Link Site Most Unambiguously

The overall sequence coverage is a valuable feature to assess the quality of peptide identifications. However, for cross-linked peptides those fragments flanking the cross-linked residues are important to define the linkage site. This resembles the localization of post-translational modifications such as phosphorylation, which greatly benefited from the usage of combined fragmentation methods.[44] Limited information about the cross-link site is available when none of the fragments next to a cross-linked residue are observed; the cross-link site can then only be assigned by prior assumptions or to larger sequence windows, which becomes problematic if the site call is off by ±5 residues (at least in HSA and using current ab initio structure computation).[8] Given the information from correct fragment identifications, a combination of one C-terminal and one N-terminal ion is enough to locate the cross-link site unambiguously. Utilizing the high-resolution/accurate mass measurement in our experimental design, we thus assumed that each assigned fragment is correct for peptides passing the specified FDR. The cross-link site in α-peptides could be assigned to a single residue in ∼65% of all PSMs identified with EThcD or HCD (Figure 5A). The second best performing method was ETD, with approximately 60% of PSMs where the cross-link could be assigned to a single residue. CID and ETciD PSMs show the lowest number of accurate site localizations to a single residue (below 50% of all PSMs). All methods placed the cross-link site on average within the critical 5 residue window for 97.2% ± 1.17 (α-peptides) and 95.6% ± 1.3 (β-peptides) of all PSMs. For the β-peptide, this looks very similar; EThcD and HCD show the best fragmentation behavior to localize the cross-link site (Figure 5B). With approximately 50% of precisely localized cross-links, the link localization is less precise for the β-peptide than for the α-peptide.
However, this is not as pronounced as would be expected from the sequence coverage asymmetry. This is counterintuitive since the coverage distribution for HCD is among the lowest of all five fragmentation techniques for the β-peptide. For EThcD, the results for the determination of the cross-link site are more in line with the observed coverage distributions. Still, the large difference in the coverage distribution of the α- and β-peptides seems not to be as pronounced for the distribution of correct localizations of the cross-link site. One of the possible reasons is that the cleavage of the peptides before and after the cross-link site is preferred. For CID a statistical trend was reported that cross-linked fragments outnumber linear fragments and tend to have a higher intensity.[17] We encounter the opposite for HCD: linear fragments visibly outnumber cross-linked fragments (Figure S8).

Figure 5

Cross-link site localization precision. (A) Cumulative precision curve for the α-peptide. (B) Cumulative precision curve for the β-peptide. With a precision value of one the cross-link site is unambiguously located by adjacent backbone fragments (b, c, y, or z) in the peptide. A value of two limits the cross-link site to two eligible residues.
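The precision value used in Figure 5 can be illustrated with a short sketch. It is not the authors' implementation. It assumes that residues are numbered 1 to n, that a backbone cleavage k lies between residues k and k+1, and that every observed cleavage tells us on which side of it the cross-link sits. The function names `localization_precision` and `fraction_within` are hypothetical.

```python
from typing import Iterable, List


def localization_precision(peptide_length: int, link_site: int,
                           observed_cleavages: Iterable[int]) -> int:
    """Return how many residues remain eligible to carry the cross-link.

    A value of 1 means the site is pinned down by the flanking cleavages;
    a value of 2 means two residues remain eligible, and so on.
    """
    if not 1 <= link_site <= peptide_length:
        raise ValueError("link_site must lie within the peptide")
    # Keep only cleavages that are real backbone positions (1 .. n-1).
    valid = {k for k in observed_cleavages if 1 <= k < peptide_length}
    # Nearest observed cleavage on the N-terminal side of the linked residue.
    left = max((k for k in valid if k < link_site), default=0)
    # Nearest observed cleavage on the C-terminal side of the linked residue.
    right = min((k for k in valid if k >= link_site), default=peptide_length)
    # Eligible residues are left+1 .. right.
    return right - left


def fraction_within(precisions: List[int], window: int) -> float:
    """Fraction of PSMs whose site is localized to at most `window` residues."""
    if not precisions:
        raise ValueError("no precision values given")
    return sum(p <= window for p in precisions) / len(precisions)
```

For a 10-residue peptide linked at residue 4, observed cleavages at positions 3 and 4 give a precision of one, whereas cleavages at 2 and 4 leave residues 3 and 4 eligible. Applying `fraction_within` over increasing window sizes yields one point per window of a cumulative curve of the kind shown in the figure.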

Data-Dependent Decision Tree for Optimized Acquisition of Cross-Linked Peptides

CLMS studies vary in the degree of complexity: single proteins, multiple protein complexes, or complete proteomes can be analyzed to generate protein–protein interaction information or the three-dimensional structure. Depending on the specific case, we propose two different acquisition strategies (Figure 6A). First, for single proteins or small protein complexes, we recommend HCD as the method of choice. Since the complexity of the sample is not very high, cross-linked peptides can often be matched by precursor mass alone. In addition, HCD fragmentation generates enough fragments to precisely localize the cross-link site in the majority of cases. In the second case, that is, complex samples with many proteins, not only the search space becomes an issue but also the associated random matches. A fragmentation scheme that generates highly discriminative scores for target and decoy peptides will identify more peptides under the same FDR threshold. The optimal fragmentation scheme for such an experiment is shown in Figure 6B. Earlier studies on the development of data-dependent decision trees (DDDT) for the acquisition of linear peptides mainly support our conclusions: HCD gives the highest number of identifications, but ETD gives higher search engine scores[45] or, as in our case, higher sequence coverage. Compared to a DDDT for linear peptides, our results are slightly different but still comparable. For example, in linear DDDTs, precursors with charge state 3+ have been analyzed with ETD up to 750 m/z[46] or 650 m/z,[45] whereas we only use ETD from 600–800 m/z. In addition, instead of using ETD alone for 4+ and 5+ precursors below 1000 m/z and 800 m/z, respectively, EThcD is used. In this study we investigated SDA-cross-linked, tryptic peptides. Other cross-linkers or enzymes may lead to peptide populations with distinct fragmentation behavior due to differences in size or amino acid composition.
Note, however, that the proposed fragmentation scheme is similar to the decision tree for linear peptides[45,46] and may therefore be of more general value.
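The argument that more discriminative scores yield more identifications at a fixed FDR can be illustrated with a minimal target-decoy sketch. This is a generic simplification (estimated FDR = decoys / targets above a score cutoff), not the cross-link-specific estimate used in the study; the function name `targets_at_fdr` is hypothetical and the scores are toy values made up for illustration.

```python
from typing import List, Tuple


def targets_at_fdr(matches: List[Tuple[float, bool]],
                   max_fdr: float = 0.05) -> int:
    """Count target matches accepted at the most permissive score cutoff
    whose estimated FDR (decoys / targets) stays at or below max_fdr.

    Each match is a (score, is_decoy) pair.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest target count that still satisfies the FDR.
        if targets and decoys / targets <= max_fdr:
            best = targets
    return best


# Toy data: the same five targets and two decoys, scored by two schemes.
well_separated = [(9, False), (8, False), (7, False), (6, False),
                  (5, False), (2, True), (1, True)]
overlapping = [(9, False), (8, False), (7, True), (6, False),
               (5, False), (4, False), (3, True)]

print(targets_at_fdr(well_separated))  # 5
print(targets_at_fdr(overlapping))     # 2
```

With identical numbers of targets and decoys, the scheme that ranks decoys below all targets passes every target at 5% FDR, while a single high-scoring decoy in the overlapping case cuts the accepted set to two.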

Figure 6

Acquisition strategy for cross-linked peptides. (A) Recommended acquisition scheme for cross-linking samples. (B) Data-dependent decision tree (DDDT) for cross-linked peptides. Depending on the precursor charge state (3+, 4+, 5+, 6+, and other) and the m/z, the appropriate fragmentation technique is selected.
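As a rough illustration of how such a decision tree maps a precursor to a fragmentation method, the sketch below encodes only the branches stated in the text: ETD for 3+ precursors at 600–800 m/z, and EThcD for 4+ precursors below 1000 m/z and 5+ precursors below 800 m/z. Everything else is an assumption: the fallback to HCD, the handling of 6+ and other charge states, and whether the boundaries are inclusive are not given in the text, so the complete tree in Figure 6B may differ. The function name `choose_fragmentation` is hypothetical.

```python
def choose_fragmentation(charge: int, mz: float) -> str:
    """Pick a fragmentation method from precursor charge state and m/z.

    Only the three explicit branches come from the text; the HCD fallback
    and the boundary handling are assumptions made for this sketch.
    """
    if charge == 3 and 600 <= mz <= 800:   # inclusive bounds assumed
        return "ETD"
    if charge == 4 and mz < 1000:
        return "EThcD"
    if charge == 5 and mz < 800:
        return "EThcD"
    # Assumed default for all remaining precursors (including 6+ and other).
    return "HCD"


for charge, mz in [(3, 700.0), (3, 900.0), (4, 850.0), (5, 950.0)]:
    print(charge, mz, choose_fragmentation(charge, mz))
```

Keeping the rule as a pure function of charge and m/z mirrors how a data-dependent method evaluates each precursor independently at acquisition time.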

Conclusion

For the majority of peptides, EThcD is the method of choice to achieve the highest sequence coverage. HCD is an important alternative because of its superior speed, with only somewhat reduced peptide sequence coverage. CID, ETD, and ETciD play only minor roles. We advise adjusting the acquisition scheme to the experimental setup: simple protein samples should be analyzed using only HCD to maximize the number of observed links, which is beginning to have value in protein structure determination.[8,47] For complex samples, we propose a decision tree that is mainly based on EThcD and HCD to maximize search specificity.
