Proteome-wide crosslinking mass spectrometry studies have coincided with the advent of mass spectrometry (MS)-cleavable crosslinkers that can reveal the individual masses of the two crosslinked peptides. However, recently, such studies have also been published with noncleavable crosslinkers, suggesting that MS-cleavability is not essential. We therefore examined in detail the advantages and disadvantages of using the commonly used MS-cleavable crosslinker, disuccinimidyl sulfoxide (DSSO). Indeed, DSSO gave rise to signature peptide fragments with a distinct mass difference (doublet) for nearly all identified crosslinked peptides. Surprisingly, we could show that it was not these peptide masses that proved the main advantage of MS cleavability of the crosslinker, but improved peptide backbone fragmentation which reduces the ambiguity of peptide identifications. This also holds true for another commonly used MS-cleavable crosslinker, DSBU. We show furthermore that the more intricate MS3-based data acquisition approaches lack sensitivity and specificity, causing them to be outperformed by the simpler and faster stepped higher-energy collisional dissociation (HCD) method. This understanding will guide future developments and applications of proteome-wide crosslinking mass spectrometry.
Proteome-wide crosslinking mass spectrometry studies have coincided with the advent of mass spectrometry (MS)-cleavable crosslinkers that can reveal the individual masses of the two crosslinked peptides. However, recently, such studies have also been published with noncleavable crosslinkers, suggesting that MS-cleavability is not essential. We therefore examined in detail the advantages and disadvantages of using the commonly used MS-cleavable crosslinker, disuccinimidyl sulfoxide (DSSO). Indeed, DSSO gave rise to signature peptide fragments with a distinct mass difference (doublet) for nearly all identified crosslinked peptides. Surprisingly, we could show that it was not these peptide masses that proved the main advantage of MS cleavability of the crosslinker, but improved peptide backbone fragmentation which reduces the ambiguity of peptide identifications. This also holds true for another commonly used MS-cleavable crosslinker, DSBU. We show furthermore that the more intricate MS3-based data acquisition approaches lack sensitivity and specificity, causing them to be outperformed by the simpler and faster stepped higher-energy collisional dissociation (HCD) method. This understanding will guide future developments and applications of proteome-wide crosslinking mass spectrometry.
Crosslinking combined
with mass spectrometry (crosslinking MS)
is a powerful tool for detecting protein–protein interactions
and the structural characterization of proteins. Many key advances
have been made in recent years to expand the complexity of the samples
that can be analyzed with this technology. These include the database
search software,[1−3] false discovery rate (FDR) estimation,[4] and the enrichment of crosslinked peptides.[5−7] One of the key problems when identifying crosslinked peptides is
that one must, in principle, identify two peptides from the same MS1
signal. The search space is therefore initially very large, comprising
every pairwise combination of the peptides that are in the database,
i.e., (n2 + n)/2 crosslinked
peptides (n = number of linear peptides in the database).
This large search space can be reduced experimentally by separating
the crosslinked peptides during the measurement by help of an MS-cleavable
crosslinker such as disuccinimidyl sulfoxide (DSSO)[8] or any of its alternatives.[9]The conceptual advantage of MS-cleavable crosslinkers is evident.
The crosslinker readily cleaves upon activation in the mass spectrometer,
releasing the individual peptides and thereby enabling the measurement
of their individual masses. In the case of the most popular MS-cleavable
crosslinker DSSO, the crosslinker cleaves preferentially at two different
sites, leading to different crosslinker remnants (also called stubs)
for each peptide (Figure a). The asymmetric cleavage of this crosslinker produces a
pair of alkene (A) and sulfenic acid (S) stub fragments.[8] The S stub fragment commonly loses water, forming
the unsaturated thiol (T). The two most frequently observed stub peaks
per peptide, the A and the T fragment, form a signature doublet signal
with a distinct mass difference, allowing their detection and subsequent
calculation of peptide masses.[10]
Figure 1
Statistics
on frequency and intensity of peptide doublet peaks.
(a) Illustration of DSSO cleavage and the resulting signature peptide
doublets with the distinct mass difference Δm. Numbers annotate the intensity rank of the peaks, with the rank
of the more intense of the doublet peaks being the rank of the whole
doublet. (b) Ratio of identified target–target (TT) crosslink
spectrum matches (CSMs) (self and heteromeric) that contain one (lighter
color) or both (darker color) peptide doublets in each dataset (5%
CSM-level FDR). Datasets using stepped HCD (sHCD) are shown in orange-red,
while collision-induced dissociation (CID)-MS3-based methods are in
blue. (c) Fraction of detected doublets from (b) passing each intensity
rank cutoff. Shown is the cumulative proportion of CSMs containing
doublets. Datasets are colored as in (b). Synapse (Syn); Ribosome
(Ribo).
Statistics
on frequency and intensity of peptide doublet peaks.
(a) Illustration of DSSO cleavage and the resulting signature peptide
doublets with the distinct mass difference Δm. Numbers annotate the intensity rank of the peaks, with the rank
of the more intense of the doublet peaks being the rank of the whole
doublet. (b) Ratio of identified target–target (TT) crosslink
spectrum matches (CSMs) (self and heteromeric) that contain one (lighter
color) or both (darker color) peptide doublets in each dataset (5%
CSM-level FDR). Datasets using stepped HCD (sHCD) are shown in orange-red,
while collision-induced dissociation (CID)-MS3-based methods are in
blue. (c) Fraction of detected doublets from (b) passing each intensity
rank cutoff. Shown is the cumulative proportion of CSMs containing
doublets. Datasets are colored as in (b). Synapse (Syn); Ribosome
(Ribo).Knowing the individual peptide
masses simplifies the database search,
as it reduces the search space to pairwise combinations of peptides
with these masses. With the individual peptides released in the mass
spectrometer, one can also design more intricate data acquisition
approaches. The two peptides can be fragmented individually using
MS3, which provides separate fragment information of the two—now
linear—peptides. For this, generally the crosslinked peptide
is fragmented with a low-energy CID fragmentation first, to preferentially
cleave the crosslinker instead of the peptide backbone. Then, signature
doublets are selected for MS3. This approach is routinely employed
by studies that use the protein interaction reporter (PIR) crosslinker[7] and DSSO, while some others using DSSO supplement
this with a complementary electron transfer dissociation (ETD) MS2
spectrum.[11]In an alternative acquisition
method, stepped HCD (sHCD), only
a single MS2 spectrum is recorded for each crosslinked peptide pair.
The peptide is subjected to multiple different collision energies,
and the fragments are recorded in a single MS2 spectrum. This spectrum
should contain the signature doublet (from lower fragmentation energies)
as well as additional backbone fragments (from higher fragmentation
energies). These spectra can be searched in most crosslinking search
tools, with optional filtering for spectra containing cleaved signature
peaks during[2] or after[12] search.Despite the clean crosslinker cleavage producing
dominant signature
peaks in proof-of-concept data of either approach, there is a lack
of statistical data of how often this happens in general. It is unclear
how many crosslinked peptides give rise to doublets, how prominent
these doublets are, and how successful doublet selection is at covering
the peptides. It is therefore unknown how many crosslinked spectra
are left unidentified when relying on these doublets. sHCD compared
favorably to CID methods in the number of crosslinks identified,[13] but a methodical analysis comparing the information
contained in their fragmentation spectra is missing and yet is crucial
for future design of crosslinkers and acquisition methods.MS-cleavable
crosslinkers have been the tool of choice in many
proteome-wide crosslinking MS studies, and it has been suggested that
large-scale crosslinking MS depends on MS-cleavable crosslinkers.[14] While conceptually appealing, these advantages
and potential limitations of MS-cleavable crosslinkers have yet to
be analyzed in detail in “real-world” scenarios—some
comparisons exist, but usually only comparing a few crosslink spectrum
matches (CSMs). We systematically investigated the influence of the
popular MS-cleavable crosslinkers DSSO and DSBU on the fragmentation
of crosslinked peptides. We achieve this using crosslinker search
software that does not rely on the cleaved stubs for identification.
This allowed us to clarify how widespread the cleavage of DSSO and
DSBU actually is and to probe the gain of knowing the individual peptide
masses for identifying crosslinks.
Methods
Database Search
and FDR Filtering
Mass spectrometry
raw data were processed using MSconvert[16] (v3.0.11729) to convert to mgf-file format. A linear peptide search
using xiSEARCH was employed to determine median precursor and fragment
mass errors. Peak list files were then recalibrated to account for
mass shifts during measurement prior to analysis using xiSEARCH3 1.7.6.1 with the following settings: MS1 error tolerances
of 3 ppm; MS2 error tolerance of 5 ppm for the Escherichia
coli lysate dataset and 15 ppm for the others; up
to two missing precursor isotope peaks; tryptic digestion specificity
with up to two missed cleavages; modifications: carbamidomethylation
(Cys, + 57.021464 Da) as fixed and oxidation (Met, + 15.994915 Da),
deamidation (Asn and Gln, + 0.984016 Da), methylation (Glu and Asp,
+ 14.015650 Da), amidated crosslinker (Lys and protein N-terminus,
DSSO–NH2: +175.03031 Da; BS3–NH2: 155.09463 Da; DSBU–NH2 + 213.11134 Da), and hydrolyzed
crosslinker (Lys and protein N-terminus, DSSO–OH: +176.01433
Da; BS3–OH: +156.07864 Da; DSBU–OH + 214.095357 Da)
as variable modifications; maximum number of variable modifications
per peptide: 1; losses: −CH3SOH, −H2O, −NH3, and additionally masses for crosslinker-containing
ions were defined accounting for its cleavability (DSSO A: 54.01056
Da, S: 103.99320 Da, T: 85.98264 Da; DSBU A: 85.05276 Da, B: 111.032028
Da). Crosslink sites for both reagents were allowed for side chains
of Lys, Tyr, Ser, Thr, and the protein N-terminus. Note that we included
a “noncovalent” crosslinker with a mass of zero to flag
spectra potentially arising from gas-phase-associated peptides.[17] These spectra were removed prior to false discovery
rate (FDR) estimation. Results were filtered prior to FDR to matches
having a minimum of three matched fragments per peptide, a delta score
of >15% of the match score and a peptide length of at least six
amino
acids. Additionally, identifications of peptide sequences that are
found in two or more proteins were removed. FDR was estimated using
xiFDR[18] (v2.1.2) on a unique CSM level
to 5% grouped by self- and heteromeric matches. Results of the reanalysis
are deposited in PRIDE with the accession number PXD032821.
Data Evaluation
CSMs passing FDR were re-annotated
with pyXiAnnotator v0.3.4 (https://github.com/Rappsilber-Laboratory/pyXiAnnotator/) with peptide, b-, and y-type ions using MS2 tolerances as described
above. The resulting matched fragments were used to check for the
occurrence of DSSO A-T doublets and to calculate fragment sequence
coverages. We calculated the sequence coverage for our CSMs conservatively,
as the ratio of matched N-terminal and C-terminal sequence fragments
to the number of theoretically possible sequence fragments (i.e.,
100% sequence coverage would mean the detection of at least one fragment
from the N-terminal and one from the C-terminal series between all
amino acid residues of a peptide). For the doublet rank evaluation,
the deisotoped ranks from pyXiAnnotator were used. To evaluate the
MS3 triggering behavior, the MS3 precursor m/z was extracted from the scan header and compared with the
fragment annotation result of the corresponding MS2 CSM. If the MS3
precursor matched a crosslinked peptide stub fragment with 20 ppm
error tolerance, it was counted as correctly triggered.
Results
and Discussion
Prevalence of Peptide Doublets in Fragmentation
Spectra of DSSO
Crosslinked Peptides
We analyzed three publicly available
datasets of DSSO crosslinking experiments coming from three different
labs, differing in acquisition method and sample complexity (Table ). The dataset of
crosslinked E. coli lysate was acquired
using sHCD with a low, medium, and high normalized collision energy
for each MS2.[4] sHCD is also one of two
acquisition methods used to record a dataset of crosslinked, purified
70S ribosomes.[13] In addition to this, Stieger
et al. also employed a CID-MS2-HCD-MS3 approach. For this, first,
a low-energy CID-MS2 was acquired. Then, MS3 was triggered when doublets
of the correct mass difference (32 Da for A-T) were detected (Figure a). Finally, the
third dataset called here “Synapse dataset” covered
crosslinked mouse synaptosomes and was acquired with a CID-MS2-MS3
+ ETD-MS2 approach.[15] As in the Ribosome
dataset, a low-energy CID-MS2 was acquired for doublet detection.
Then, MS3 was acquired as described above, supplemented by an additional
ETD-MS2 on the same MS1 precursor.
To assess
the prevalence of doublets in the fragmentation spectra
of crosslinked peptides, we re-searched the datasets using a search
algorithm that does not rely on peptide doublets for crosslink identification.
After database search and filtering to 5% heteromeric (inter protein)
CSM-level FDR,[18] we looked for signature
A and T stub fragment doublet peaks of the identified peptides and
the intensity rank of these doublets in each spectrum (Figure a).Even though we did
not require doublets to identify crosslinked
peptides, they were very common features in our CSMs. We found doublets
frequently for at least one peptide, independent of dataset and acquisition
method (90–98%) (Figure b). The same trend holds true also for DSBU (Figure S1). The CID acquisitions displayed a higher proportion
of CSMs with both peptide doublets detected compared to the sHCD datasets.
If one looks at only the common identifications of CID and HCD to
make up for the difference in number of identifications, the amount
of doublets detected for both peptides increases noticeably for sHCD
(71%), making the difference to CID (81%) less pronounced (Figure b) as does considering
only single stub peaks (Figure S3).We next looked at the intensity of the doublet peaks across these
datasets, as this is important for their use during acquisition and
data analysis (Figures c and S2 for DSBU). In the majority of
the spectra, the more abundant doublet is among the most intense peaks,
independent of the fragmentation method used. In fact, a doublet peak
is frequently the most abundant peak (34–53% of the doublet-containing
spectra). Almost all (94–98%) doublet-containing spectra have
a peak of the more intense peptide doublet among the 20 most intense
peaks (87% for DSBU).Spectra typically displayed in publication
figures suggest that
also the less intense doublet is seen prominently in CID spectra.
However, this was only the case for 10% (Ribosome) or 20% (Synapse)
of the doublet-containing CID spectra of our investigated data. Nevertheless,
it is seen among the top 20 peaks in 78% (Ribosome) or 91% (Synapse)
of the doublet-containing CID spectra. For the sHCD data, the doublet
ranks are lower, yet still approximately 70% of spectra have them
among the 20 most intense peaks (Figure c).In conclusion, the first doublet
is among the most intense peaks
for the majority of CSMs independent of the fragmentation method,
albeit to a lesser extent for DSBU. While the second doublet increases
confidence in doublet calling, only one peptide doublet is necessary
for deriving both peptide masses, given that we know the precursor
mass. The visibility of the second peptide doublet is crucial, however,
for the successful selection of both peptides for MS3. We therefore
investigated how successful selecting doublets from CID-MS2 spectra
for MS3 was at covering one or both crosslinked peptides, and if this
more complex approach produces more confident identifications than
HCD-MS2.
Speed of HCD Outperforms Higher Sequence Coverage of CID + MS3
The ratio of identified doublets and their intensity ranks are
important criteria for selecting peptides for MS3 fragmentation. However,
absolute numbers of crosslink identifications may also be influenced
by other aspects, such as backbone fragmentation and acquisition speed.
We used the Ribosome dataset to compare these aspects, as it uses
both methods on the same sample. Here, sHCD leads to 1.4 times more
residue pairs identified than CID-MS3.[13]When comparing the common CSMs between CID and sHCD, the overall
sequence coverage in sHCD is higher compared to low-energy CID (Figure a). This comes as
no surprise, as low-energy CID is primarily applied to separate the
crosslinked peptides and not for peptide backbone fragmentation. It
is intentionally combined with MS3 scans and ETD fragmentation to
provide additional sequence information. When we include the corresponding
MS3 scans, the sequence coverage increases noticeably compared to
that of low-energy CID alone. The overall coverage from combining
fragments from CID and MS3 surpasses the sHCD coverage. Therefore,
the backbone fragmentation does not explain the higher number of CSMs
for sHCD.
Figure 2
Speed of HCD outperforms higher sequence coverage of CID + MS3
in the Ribosome dataset. (a) Sequence coverage of common CSMs (n = 776) identified in both sHCD and CID. Additionally,
sequence coverage of CID spectra combined with their respective MS3
scans is shown. Sequence coverage differs between the two crosslinked
peptides, which accordingly are defined as better or worse. Boxplots
depict the median (middle line), upper and lower quartiles (boxes),
and 1.5 times the interquartile range (whiskers). Asterisks indicate
significance calculated by a two-sided Wilcoxon signed-rank (p-value > 0.05: n.s., p-value < 0.0001:
****). (b) Number of acquired MS scans per fragmentation method. Error
bars show the 0.95 confidence interval (n = 7). (c)
Number of triggered MS3 scans per MS2 scan, for CSMs, linear peptide
spectrum matches, and crosslinker-modified linear peptide spectrum
matches, respectively. Error bars show the 0.95 confidence interval.
(d) Proportion of common CID CSMs having no doublets, only one, or
both peptide doublets correctly triggered for MS3.
Speed of HCD outperforms higher sequence coverage of CID + MS3
in the Ribosome dataset. (a) Sequence coverage of common CSMs (n = 776) identified in both sHCD and CID. Additionally,
sequence coverage of CID spectra combined with their respective MS3
scans is shown. Sequence coverage differs between the two crosslinked
peptides, which accordingly are defined as better or worse. Boxplots
depict the median (middle line), upper and lower quartiles (boxes),
and 1.5 times the interquartile range (whiskers). Asterisks indicate
significance calculated by a two-sided Wilcoxon signed-rank (p-value > 0.05: n.s., p-value < 0.0001:
****). (b) Number of acquired MS scans per fragmentation method. Error
bars show the 0.95 confidence interval (n = 7). (c)
Number of triggered MS3 scans per MS2 scan, for CSMs, linear peptide
spectrum matches, and crosslinker-modified linear peptide spectrum
matches, respectively. Error bars show the 0.95 confidence interval.
(d) Proportion of common CID CSMs having no doublets, only one, or
both peptide doublets correctly triggered for MS3.MS3 acquisition schemes require multiple scan and fragmentation
events, while sHCD only acquires a single MS2 scan. This difference
in complexity and, more importantly, acquisition speed is reflected
in the number of total MS2 scans acquired, which on average is almost
3 times lower for the CID-MS3 method because a lot of acquisition
time is spent on acquiring the additional MS3 scans (Figure b). The drastically lower sampling
of precursors for fragmentation will consequently lead to the reduced
detection of crosslinked peptides, which subsequently results in a
lower number of crosslink identifications. This is exacerbated by
many MS3 spectra being acquired for crosslinker-modified and even
for unmodified linear peptides (Figure c). Despite this excessive MS3 triggering, for only
41% of the CSMs, MS3 was triggered correctly on both peptide doublets
(Figure d). This is
also reflected in the wider spread of sequence coverage for the worse
fragmented peptide (Figure a), which is crucial for the unambiguous identification of
both linked peptides.[19] Note also that
for this peptide, the sequence coverage is not significantly increased
in CID + MS3 over sHCD.In this dataset, the speed of sHCD compensates
for its slightly
lower sequence coverage. sHCD also shows a more symmetric fragmentation
of both peptides (Figure a), as the MS3 approach is limited by its dependency on triggering
on the correct doublets (Figure d). Further development of MS3 approaches should focus
on a more sensitive and selective MS3 selection, which in part is
governed by the yield of the crosslinker cleavage.
Peptide Doublets
for Quality Control
While some database
search algorithms have been built around inference of peptide masses
from doublets, others have been built without relying on them. Unarguably,
peptide masses are useful information. In an attempt to quantify their
value, we investigated the target–decoy CSMs (as a representation
of the random matches) for the occurrence of peptide doublets. Because
heteromeric CSMs are the focus of most biological research questions
and are also more challenging to identify, we focused on those for
the analysis.A substantial fraction of random matches has matching
peptide doublets (>47% of heteromeric target–decoy CSMs, Figure a). However, their
extent varies considerably between the datasets. The highest proportion
of doublets among target–decoy CSMs is found in the Ribosome
dataset (75 or 70% for sHCD and CID, respectively). The E. coli dataset contains at least one doublet in
66% of the target–decoy CSMs, while this proportion decreases
to 47% for the Synapse dataset. The amount of identified doublets
present in target–decoy matches seems less dependent on the
acquisition method, and more on the sample and database.
Figure 3
Peptide doublets
as a quality control metric for heteromeric identifications.
(a) Percentage of heteromeric target–decoy CSMs that contain
one or two peptide doublets across datasets at 5% CSM-level FDR. (b)
Proportion of heteromeric CSMs at 5% CSM-level FDR when filtering
spectra to contain a peptide doublet compared to unfiltered data.
(c) Score distribution of heteromeric matches in the E. coli dataset. Distribution of targets and target–decoy
matches with and without filtering for peptide doublets. Dashed lines
show the resulting score cutoffs at 5% FDR. (d) FDR (interpolated
values for visualization) of unfiltered and peptide doublet filtered E. coli data. Synapse (Syn); Ribosome (Ribo).
Peptide doublets
as a quality control metric for heteromeric identifications.
(a) Percentage of heteromeric target–decoy CSMs that contain
one or two peptide doublets across datasets at 5% CSM-level FDR. (b)
Proportion of heteromeric CSMs at 5% CSM-level FDR when filtering
spectra to contain a peptide doublet compared to unfiltered data.
(c) Score distribution of heteromeric matches in the E. coli dataset. Distribution of targets and target–decoy
matches with and without filtering for peptide doublets. Dashed lines
show the resulting score cutoffs at 5% FDR. (d) FDR (interpolated
values for visualization) of unfiltered and peptide doublet filtered E. coli data. Synapse (Syn); Ribosome (Ribo).Although heteromeric target–decoy CSMs contain
peptide doublets,
they do so less often than the heteromeric target–target matches
(21–49 percentage points less for a single doublet and 33–50
percentage points less for both doublets, Figure S4). Based on this difference, we investigated the effect of
using this metric as a quality filter. We prefiltered the search results
to those spectra that contain at least one peptide with a detected
doublet and then reestimated 5% CSM-level FDR. The gains using this
approach are very much dependent on the complexity of the dataset
(Figure b). Unsurprisingly,
the Synapse dataset, which had the least target-decoys containing
a matching doublet, shows the largest gains using this approach (19%).
However, the E. coli dataset only gains
5% in heteromeric CSMs, even though there is a large difference in
the proportion of peptide doublets between target and false matches
(97% vs 66%; Figures S4 and 3a). This led us to investigate the score distribution of doublet-containing
matches in more detail (Figure c).The vast majority of high-scoring target–target
CSMs contain
at least one doublet and are therefore not removed, while targets
without a matched peptide doublet tend to have lower scores. In this
lower-scoring region, there is a steep increase in target–decoy
matches, which is only slightly reduced by pre-filtering for a doublet.
The effect becomes more apparent when looking at the FDR at different
score thresholds. While the increase in error is not as steep for
the filtered matches as for the unfiltered, it still grows exponentially
(Figure d). This holds
true also for the Ribosome datasets and to a lesser extent for the
Synapse dataset (Figures S5–S7).The moderate gains of using doublets for post-search filtering
also suggest that using them during the search will offer only moderate
gains. Presumably, spectra of high quality, which contain doublets,
also tend to contain sufficient peptide fragment peaks so that identification
is possible without relying on peptide mass information.
Comparison
of a Cleavable to a Noncleavable Crosslinker
Noncleavable
crosslinkers are widely believed to be unsuitable for
complex samples.[14,20,21] This bases on an assumption: not knowing the individual peptide
masses for a crosslinked peptide would require an exhaustive combination
of all peptides in the database. This would lead to an explosion of
the search space. However, there are multiple large-scale studies
that have successfully employed a noncleavable crosslinker despite
these assumptions.[4,12,22,23] These are based on a detailed understanding
of how crosslinked peptides fragment[24] that
offered a computational solution to knowing the individual peptide
masses which was then implemented in the search algorithm xiSEARCH3. In light of the successful usages of both types of crosslinkers,
we decided to compare their spectral information to understand any
costs and benefits. In addition to DSSO, the published E. coli dataset also contains data from the noncleavable
crosslinker BS3. As the data for both crosslinkers were prepared and
acquired in a very comparable manner, this dataset offers an opportunity
to directly compare the effects of BS3 to DSSO on a complex mixture
analysis. Importantly, because of its size and the high number of
CSMs identified, the dataset is well suited for statistical evaluation.A manual side-by-side comparison of CSMs identified in both datasets
suggests DSSO to have richer spectra with more fragments. Especially,
fragments containing the crosslinking site appear to be more present,
mostly as fragments containing an A/S/T stub of DSSO (Figures a and S8). We then performed a statistical evaluation of this observation
over common CSMs of the two crosslinkers (Figure b). This confirmed that DSSO led indeed to
a significantly higher sequence coverage than BS3. While the coverage
of linear fragments is very similar between the two crosslinkers,
the coverage of link site-containing fragments is significantly higher
for DSSO, especially for the worse fragmenting peptide (Figure S9). Link site-containing fragments contain
the full second peptide (+P) or, additionally for cleavable crosslinkers,
just a cleaved crosslinker stub. Indeed, A/S/T stub fragments are
the major source of link site-containing fragments for DSSO, while
+P coverage is lower than that of BS3. This means that the increased
sequence coverage for DSSO stems exclusively from cleaved crosslinker
fragments.
Figure 4
Comparison of noncleavable crosslinker BS3 to the MS-cleavable
crosslinker DSSO. (a) Example MS2 spectrum of a high-scoring CSM identified
in both datasets. (Top) CSM from the BS3 dataset. (Bottom) The same
peptide m/z-species identified in
the DSSO dataset. Unique fragments are highlighted in bold. (b) Sequence
coverage of all, linear and link site-containing fragments of all
common CSMs (n = 12919). For DSSO, link site-containing
fragments are additionally separated into fragments containing the
full second peptide (+P) or only the cleaved crosslinker stub (A/S/T).
Boxplots depict the median (middle line), upper and lower quartiles
(boxes), and 1.5 times the interquartile range (whiskers). Asterisks
indicate significance calculated by a two-sided Wilcoxon signed rank
(p-value < 0.0001: ****). (c) Target–target
and target–decoy score distributions of heteromeric CSMs for
BS3 and DSSO. Scores were normalized to their respective score cutoff
at 10% FDR. (d) Number of heteromeric CSMs passing 5% CSM-level FDR
for BS3 and DSSO. As a control, DSSO was additionally searched as
a noncleavable crosslinker and also filtered for the presence of peptide
doublets.
Comparison of noncleavable crosslinker BS3 to the MS-cleavable
crosslinker DSSO. (a) Example MS2 spectrum of a high-scoring CSM identified
in both datasets. (Top) CSM from the BS3 dataset. (Bottom) The same
peptide m/z-species identified in
the DSSO dataset. Unique fragments are highlighted in bold. (b) Sequence
coverage of all, linear and link site-containing fragments of all
common CSMs (n = 12919). For DSSO, link site-containing
fragments are additionally separated into fragments containing the
full second peptide (+P) or only the cleaved crosslinker stub (A/S/T).
Boxplots depict the median (middle line), upper and lower quartiles
(boxes), and 1.5 times the interquartile range (whiskers). Asterisks
indicate significance calculated by a two-sided Wilcoxon signed rank
(p-value < 0.0001: ****). (c) Target–target
and target–decoy score distributions of heteromeric CSMs for
BS3 and DSSO. Scores were normalized to their respective score cutoff
at 10% FDR. (d) Number of heteromeric CSMs passing 5% CSM-level FDR
for BS3 and DSSO. As a control, DSSO was additionally searched as
a noncleavable crosslinker and also filtered for the presence of peptide
doublets.The better sequence coverage of
DSSO-linked peptides improves the
separation of true from false CSMs (Figure c). For heteromeric matches, DSSO has a larger
area under the curve, and especially more high-scoring targets, effectively
leading to an increase in heteromeric CSMs. While for BS3 3308 heteromeric
CSMs were identified, the DSSO dataset resulted in more than twice
as many (7316, +121%) (Figure d). For self-CSMs, only 29% more CSMs were identified with
DSSO than with BS3 (Figure S10), indicating
that self-CSMs are approaching exhaustive coverage at the given experimental
detection limit. Similar results were seen when including retention
time data of heteromeric and self-CSMs.[22]To investigate the effect of the cleaved crosslinker fragments
on the overall crosslink search performance, we performed another
search in which the DSSO crosslinker was treated as noncleavable.
In this search, only 1866 heteromeric CSMs were identified (−74%).
Filtering these results for doublet-containing results, as described
before, increased identifications to 3064. This is, however, still
a loss of 58% of CSMs compared to the search considering DSSO as cleavable.
Also in other datasets, crosslinker cleavability was central to search
success (Figure S11). Collectively, these
observations demonstrate that A/S/T stub fragments play a central
role in the success of DSSO for crosslinking mass spectrometry, especially
for more complex samples.
Conclusions
Our
work finds a surprisingly limited value of doublet information
stemming from crosslinker cleavage for the identification of crosslinks.
Nonetheless, we find cleavable crosslinkers to lead to the identification
of substantially more heteromeric CSMs. We pinpoint improved sequence
coverage as the major contributor to this. This has implications for
how to conduct crosslinking studies and the future development of
the methodology. First, as many suspected but possibly not for the
right reasons, cleavable crosslinkers are preferable for crosslink
mixture analyses.Second, sHCD is the recommended acquisition
method as it achieves
almost the same sequence coverage as CID-MS3, but is much faster.
CID-MS3 currently lacks speed, specificity, and sensitivity. Consequently,
future developments of crosslinkers and acquisition methods should
focus primarily on sequence information, without compromising acquisition
speed. Current choices governing acquisition schemes rely on experimental
comparisons, to which we add a methodological understanding of the
key parameters that govern crosslink identification. With this, we
hope to pave the way for simplified, cost-effective, and standardized
workflows that a wider number of labs can use.
Authors: Francis J O'Reilly; Liang Xue; Andrea Graziadei; Ludwig Sinn; Swantje Lenz; Dimitry Tegunov; Cedric Blötz; Neil Singh; Wim J H Hagen; Patrick Cramer; Jörg Stülke; Julia Mahamid; Juri Rappsilber Journal: Science Date: 2020-07-31 Impact factor: 47.728