Sven H Giese1, Adam Belsom1,2, Ludwig Sinn1, Lutz Fischer1,2, Juri Rappsilber1,2. 1. Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany. 2. Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom.
Abstract
Cross-linking mass spectrometry draws structural information from covalently linked peptide pairs. When these links do not match to previous structural models, they may indicate changes in protein conformation. Unfortunately, such links can also be the result of experimental error or artifacts. Here, we describe the observation of noncovalently associated peptides during liquid chromatography-mass spectrometry analysis, which can easily be misidentified as cross-linked. Strikingly, they often mismatch to the protein structure. Noncovalently associated peptides presumably form during ionization and can be distinguished from cross-linked peptides by observing coelution of the corresponding linear peptides in MS1 spectra, as well as the presence of the individual (intact) peptide fragments in MS2 spectra. To suppress noncovalent peptide formations, increasingly disruptive ionization settings can be used, such as in-source fragmentation.
The preservation of noncovalent
associations in electrospray ionization (ESI) has been widely used
in the field of native mass spectrometry to study protein interactions.
Major achievements of native mass spectrometry include analyzing the
topology and stoichiometry of multiprotein complexes and the binding
of small molecules to proteins.[1−3] The key premise of the field is
that the observed noncovalent interactions in the gas phase are based
on biologically relevant interactions in the aqueous phase.[4]
Another mass spectrometric field that investigates
(non)covalent
interactions of proteins is cross-linking mass spectrometry (CLMS).[5−7] Here, spatially close amino acid residues in native proteins are
covalently linked. This preserves spatial information throughout the
subsequent non-native analytical process, comprising trypsin digestion
of the proteins into peptides and their chromatographic separation
for mass spectrometric detection. A key premise of this field is that
the observed peptide interactions in the gas phase are exclusively
based on covalent links. Note that, for synthetic peptides, gas-phase
peptide–peptide complexes have been observed recently,[8] suggesting that not only proteins but also peptides
can remain associated during mass spectrometric analysis.
In
theory, one can construct peptide pairs where mass information
alone cannot differentiate between covalent linkage and noncovalent
association. A peptide pair can reach the same mass either by cross-linking
or by noncovalent association if one of the two peptides carries a
loop-link, that is, the frequent case of a cross-linker reacting with
two amino acid residues so near in sequence that they fall into a
tryptic peptide (Figure S1, Supporting
Information). The concept of mass equivalence between cross-linked
and non-cross-linked peptides has been exploited during data analysis,
when using standard proteomics software for the analysis of cross-linked
peptides, including Mascot[9] to identify
cross-linked peptides[10] and quantitation
software.[11,12] If such noncovalent associations physically
arise, current cross-link analysis could be fooled into misidentifying
analytical artifacts as spatial information.
We observed surprising
differences when comparing the identified
cross-links using data acquired on two different mass spectrometers:
a hybrid linear ion trap-Orbitrap mass spectrometer (LTQ Orbitrap
Velos, Thermo Fisher Scientific) and a hybrid quadrupole-Orbitrap
mass spectrometer (Q Exactive, Thermo Fisher Scientific). This led
us to investigate the formation of noncovalent peptide associations
with and without cross-linking. We analyzed cross-linked human serum
albumin (HSA). Using only the monomeric protein band obtained from
sodium dodecyl sulfate-polyacrylamide gel electrophoresis allowed
identified links to be validated against an available three-dimensional
structural model as “ground truth” to reveal suspicious
peptide pairs for detailed interrogation. We then extended this data
analysis to a four-protein mix without employing cross-linking to
test if the noncovalent association is cross-linker-specific.
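The mass equivalence described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the original analysis: the two peptide masses are hypothetical, the function names are ours, and the linker and loop-link masses (82.04 Da each) are the sulfo-SDA values listed in Materials and Methods.

```python
# Minimal sketch with hypothetical peptide masses (not taken from the data set).
SDA_CROSSLINK_MASS = 82.04  # Da, sulfo-SDA cross-linker mass (Materials and Methods)
SDA_LOOP_MASS = 82.04       # Da, SDA-loop modification (Materials and Methods)
PROTON_MASS = 1.007276      # Da


def crosslinked_mass(mass_alpha, mass_beta):
    """Neutral mass of two peptides covalently joined by one cross-linker."""
    return mass_alpha + mass_beta + SDA_CROSSLINK_MASS


def noncovalent_mass(mass_alpha, mass_beta):
    """Neutral mass of a noncovalent pair in which the alpha peptide is loop-linked."""
    return (mass_alpha + SDA_LOOP_MASS) + mass_beta


def mz(neutral_mass, charge):
    """m/z of a positively charged ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_difference(mz_a, mz_b):
    """Relative difference between two m/z values in parts per million."""
    return (mz_a - mz_b) / mz_b * 1e6


alpha, beta = 1200.60, 950.45  # hypothetical neutral peptide masses in Da
xl = mz(crosslinked_mass(alpha, beta), 3)
nap = mz(noncovalent_mass(alpha, beta), 3)
print(f"cross-linked m/z: {xl:.4f}, noncovalent m/z: {nap:.4f}")
print(f"difference: {abs(ppm_difference(xl, nap)):.3f} ppm")
```

Because both routes add the same 82.04 Da to the summed peptide masses, the precursor m/z alone cannot separate a cross-link from a noncovalent pair carrying a loop-link.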
Materials and Methods
Data Acquisition
HSA Acquisition and Sample Preparation
Human blood
serum (20 μg aliquots, 1 μg/μL) was cross-linked
using cross-linker-to-protein, weight-to-weight (w/w) ratios of 1:1
and 2:1. Aliquots of human serum diluted with cross-linking buffer
(20 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES)–OH,
20 mM NaCl, 5 mM MgCl2, pH 7.8) were incubated with sulfosuccinimidyl
4,4′-azipentanoate (sulfo-SDA) (Thermo Scientific Pierce, Rockford,
IL), in a reaction volume of 30 μL for 1 h at room temperature.
The diazirine group was then photoactivated by UV irradiation, for
either 10, 20, 40, or 60 min using a UVP CL-1000 UV Cross-linker (UVP
Inc.). Cross-linked samples were separated using gel electrophoresis,
with bands corresponding to monomeric HSA excised and then reduced
with dithiothreitol, alkylated with iodoacetamide, and digested using
trypsin following standard protocols.[10] Peptides were then desalted using C18 StageTips[13] and eluted with 80% acetonitrile, 20% water, and 0.1% trifluoroacetic
acid (TFA).
Peptides were analyzed on either a hybrid linear
ion trap/Orbitrap mass spectrometer (LTQ Orbitrap Velos, Thermo Fisher
Scientific) or a hybrid quadrupole/Orbitrap mass spectrometer (Q Exactive,
Thermo Fisher Scientific). In both cases, peptides were loaded directly
onto a spray analytical column (75 μm inner diameter, 8 μm
opening, 250 mm length; New Objectives, Woburn, MA) packed with C18
material (ReproSil-Pur C18-AQ 3 μm; Dr. Maisch GmbH, Ammerbuch-Entringen,
Germany) using an air pressure pump (Proxeon Biosystems).[14]
Orbitrap Velos Analysis
Mobile phase
A consisted of
water and 0.1% formic acid. Peptides were loaded using a flow rate
of 0.7 μL/min and eluted at 0.3 μL/min, using a gradient
with a 1 min linear increase of mobile phase B (acetonitrile and 0.1%
v/v formic acid) from 1% to 9%, increasing linearly to 35% B in 169
min, with a subsequent linear increase to 85% B over 5 min. Eluted
peptides were sprayed directly into the hybrid linear ion trap-Orbitrap
mass spectrometer. MS data were acquired in the data-dependent mode,
detecting in the Orbitrap at 100 000 resolution. The eight
most intense ions in the MS spectrum for each acquisition cycle, with
a precursor charge state of +3 or greater, were isolated with a m/z window of 2 Th and fragmented in the
linear ion trap with collision-induced dissociation (CID) at a normalized
collision energy of 35. Subsequent (MS2) fragmentation spectra were
then recorded in the Orbitrap at a resolution of 7500. Dynamic exclusion
was enabled with single repeat count for 90 s.
Q Exactive Analysis
Mobile phase A consisted of water
and 0.1% formic acid. Mobile phase B consisted of 80% v/v acetonitrile
and 0.1% formic acid. Peptides were loaded at a flow rate of 0.5 μL/min
and eluted at 0.2 μL/min, using a gradient increasing linearly
from 2% B to 40% B in 169 min, with a subsequent linear increase to
95% B over 11 min. Eluted peptides were sprayed directly into the
hybrid quadrupole-Orbitrap mass spectrometer. MS data (400–1600 m/z) were acquired in the data-dependent
mode, detecting in the Orbitrap at 60 000 resolution. The ten
most intense ions in the MS spectrum, with a precursor charge state
of +3 or greater, were isolated with a m/z window of 2 Th and fragmented by higher-energy collision-induced
dissociation (HCD) at a normalized collision energy of 28. Subsequent
(MS2) fragmentation spectra were recorded in the Orbitrap at a resolution
of 30 000. Dynamic exclusion was enabled with a single repeat
count for 60 s.
HSA, equine myoglobin, ovotransferrin from chicken
(all from Sigma-Aldrich,
St. Louis, MO), and creatine kinase from rabbit (Roche, Basel, Switzerland)
were dissolved in 8 M urea with 50 mM ammonium bicarbonate to a concentration
of 2 mg/mL each. The proteins were reduced by adding dithiothreitol
at 2.5 mM followed by an incubation for 30 min at 20 °C. Subsequently,
the samples were derivatized using iodoacetamide at 5 mM concentration
for 20 min in the dark at 20 °C. The samples were diluted 1:5
with 50 mM ammonium bicarbonate and digested with trypsin (Pierce
Biotechnology, Waltham, MA) at a protease-to-protein ratio of 1:100
(w/w) during a 16-h incubation period at 37 °C. Then the digestion
was stopped by adding 10% TFA to a final concentration of 0.5%. The digests
were cleaned up using the StageTip protocol.[13] The samples were eluted from the C18 phase, partially evaporated
using a vacuum concentrator, and resuspended in mobile phase A (0.1%
formic acid). Two micrograms of tryptic digests were loaded directly
onto a 50 cm EASY-Spray column (Thermo Fisher) packed with C18 stationary
phase and equilibrated to 2% of mobile phase B (80% acetonitrile,
0.1% formic acid) running at a flow of 0.3 μL/min. Peptides
were eluted by increasing mobile phase B content from 2 to 37.5% over
120 min, followed by ramping to 45% and to 95% within 5 min each.
After a washing period of 5 min, the column was re-equilibrated to
2% B. The eluting peptides were sprayed into a Q Exactive High-field
(HF) Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fisher Scientific,
Bremen, Germany). The mass spectrometric measurements in data-dependent
mode were acquired as follows: a full scan from 400 to 1600 m/z with a resolution of 120 000
was recorded to find suitable peptide candidates which were subsequently
quadrupole-isolated within a m/z window of 2 Th and fragmented by HCD at a normalized collision energy
of 28, with fragmentation spectra recorded in the Orbitrap at a resolution
of 30 000. Precursors with charge states from 3 to 6 were selected
for isolation. Dynamic exclusion was set to 15 s. Each cycle allowed
up to ten peptides to be fragmented before a new full scan was triggered.
The effect of in-source collisional activation (ISCID) on the formation
of noncovalently bound peptides was investigated by setting voltages
from 0 to 20 eV in 5 eV increments for each individual run. Each value
tested was probed in shuffled triplicates.
Data Processing
Raw files for cross-linking searches
were processed using MaxQuant[15] (v. 1.6.1.0)
to benefit from the implemented precursor m/z and charge correction. Resulting peak files in APL format
were used to identify peptides in Xi[16] (v.
1.6.739). The database search with Xi used the following parameters:
MS tolerance, 6 ppm; MS2 tolerance, 15 ppm; missed cleavages, 3; enzyme,
trypsin; fixed modifications, carbamidomethylation (cm, +57.02 Da);
variable modifications, oxidation methionine (ox, +15.99 Da). For
sulfo-SDA, the cross-linker mass 82.04 Da and the modifications SDA-loop
(+82.04 Da) and SDA-hyd (+100.05 Da) were used.[17] False discovery rate (FDR) estimation was done using xiFDR[18] (v. 1.1.26.58), using either 5% link FDR (without
boosting) or a 5% peptide spectrum match (PSM) FDR. The Euclidean
cross-link distances within HSA were estimated from mapping the peptide
sequences to the three-dimensional structure when possible (PDB: 1AO6[19]).
Searches for noncovalently associated peptides
(NAP) in the absence of cross-linkers were also conducted using Xi
with a feature to search for noncovalently associated peptides. FDR
analysis was done at a 5% PSM level using the formula FDR = (TD –
DD)/TT,[18] after removing all PSMs with
a score less than 1. FDRs were then transformed to q-values, defined as the minimal FDR at which a PSM would pass the
threshold.[20]
Linear peptide identifications
from cross-linked acquisitions were
done using MaxQuant. We added the above-defined SDA-loop and SDA-hyd
modifications to the configuration file and allowed up to five modifications
on a peptide together with a maximum of five missed cleavages. Resulting
peptide identifications were filtered at the default FDR of 1%. Non-cross-linked
acquisitions were searched with default settings treating each replicate
as a different experiment in the experimental design.
RT profiles
for a given m/z were
extracted using the MS1 (peak picked) raw data after conversion to
mzML using msconvert.[21] The postprocessing
was done in Python using pyOpenMS.[22] RT
profiles were defined as intensity values for a given m/z for the monoisotopic peak and two isotope peaks.
During the developed look-up strategy, the precursor m/z of the identified cross-linked peptide, the m/z of the α peptide, and the m/z of the β peptide were searched
in the MS1 data. The precursor mass matches only the sum of the individual
peptides in a noncovalently associated peptide if one of the two peptides
is SDA-loop-modified. Therefore, the MS1 data was screened for m/z traces of the individual peptides with
and without an added SDA-loop modification. Similarly, all charge
states up to the precursor charge were used. The m/z trace with the largest number of peaks was eventually
selected for each individual peptide. The m/z seeds were all treated similarly; in a RT window of 180
s, the given m/z was searched with
a 20 ppm tolerance. If the m/z was
found, the intensity was extracted. Resulting RT profiles were smoothed
by a moving average with 15 points. For further data processing and
visualization, the RT profile with the most peaks (either monoisotopic,
first isotope, or second isotope peak) was selected.
Statistical
analysis and data processing were performed using Python
and the scientific package SciPy.[23] Unless
otherwise noted, we performed significance tests using one-sided Mann–Whitney U tests
with α = 0.05 and continuity correction. We used the following
encoding for p-values: ns, not significant; *, ≤0.05;
**, ≤0.01; ***, ≤0.001. Along with the significance
tests, we provided effect size estimates based on Cohen’s d[24] with pooled standard deviations,
which uses the following classification: small, |d| ≥ 0.2; medium, |d| ≥ 0.5; and large,
|d| ≥ 0.8.
The mass spectrometry raw
files, peak lists, search engine results,
MaxQuant parameter files, and FASTA files have been deposited to the
ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository[25] with
the data set identifier PXD010895.
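The FDR and q-value calculation described above for the noncovalent search can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the xiFDR code: the function name and the toy PSM list are hypothetical, score ties are not treated specially, and negative FDR estimates are clipped to zero.

```python
def q_values(psms, min_score=1.0):
    """Estimate FDR = (TD - DD) / TT along a score-ranked PSM list.

    psms: iterable of (score, label) with label in {"TT", "TD", "DD"}
          (target-target, target-decoy, decoy-decoy).
    Returns a list of (score, label, fdr, q_value), best score first.
    """
    # PSMs with a score below 1 are removed before FDR estimation.
    ranked = sorted((p for p in psms if p[0] >= min_score),
                    key=lambda p: p[0], reverse=True)
    counts = {"TT": 0, "TD": 0, "DD": 0}
    fdrs = []
    for _, label in ranked:
        counts[label] += 1
        if counts["TT"] == 0:
            fdrs.append(1.0)  # assumption: no targets yet, treat as FDR 1
        else:
            fdr = (counts["TD"] - counts["DD"]) / counts["TT"]
            fdrs.append(max(fdr, 0.0))  # assumption: clip negative estimates
    # q-value: the minimal FDR at which a PSM would still pass the threshold,
    # i.e. the running minimum of the FDR from the worst score upward.
    qs = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return [(s, l, f, q) for (s, l), f, q in zip(ranked, fdrs, qs)]


# Hypothetical toy input; the last PSM is dropped by the score filter.
toy = [(10, "TT"), (9, "TT"), (8, "TD"), (7, "TT"),
       (6, "TT"), (5, "TD"), (4, "TT"), (0.5, "TD")]
for score, label, fdr, q in q_values(toy):
    print(f"{score:>4} {label} FDR={fdr:.3f} q={q:.3f}")
```

PSMs would then be accepted when their q-value is at or below the chosen threshold, for example 0.05 for a 5% PSM FDR.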
Results and Discussion
The results are divided into four parts: (1) describes the results
from HSA cross-linking using sulfosuccinimidyl 4,4′-azipentanoate
(sulfo-SDA) and then analysis with a Q Exactive (QE) and LTQ Orbitrap
Velos (Velos) mass spectrometer; (2) describes the MS2 properties
of the detected long-distance links (LDL) with the QE and introduces
the hypothesis of noncovalently associated peptides (NAP) enduring
ESI; (3) summarizes intensity and retention time (RT) properties of
the identified PSMs; and (4) shows that noncovalently associated peptides
also occur in the absence of cross-linking.
Instrument Comparison Revealing a High Number of Suspicious Cross-links in Q Exactive Data
We started by comparing the
results from cross-linking HSA with sulfo-SDA using two different
mass spectrometers: a Velos and a QE. Cross-linked peptides were identified
using Xi with subsequent FDR filtering using xiFDR at a 5% link-level
FDR. To independently assess the quality of the results, we evaluated
how the identified cross-links matched to the available crystal structure
of HSA. At 5% link FDR, we identified 449 (QE) and 240 (Velos) links,
of which 430 and 231 could be mapped to the available sequence in
the structural model, respectively. The distance distributions of
the mapped cross-links looked similar for links below 22 Å (Figure 1a). However, for
long distances, the link distributions looked different. The QE data
shows a much higher percentage of links exceeding the 25 Å cutoff,
which is the empirically defined distance limit of SDA cross-linking.[26] This leads to 18% long-distance links (LDL)
for the QE data compared to 2% for the Velos data (Figure 1a, inset). Since the protein
monomer band was analyzed, the possibility that the LDL were derived
from cross-linked homo-oligomers can be largely neglected. One possibility
is that the deeper analysis on the QE, which is faster and more sensitive
than the Velos, detected a rare protein conformational state. However,
a previous analysis of SDA-cross-linked HSA on the Velos yielded 500
identified links (5% link FDR), with comparatively few LDL (6%).[27] Also, in our hands, data acquired on the much faster and more sensitive
Fusion Lumos did not return such a proportion of LDL (data
not shown). This suggests that the LDL in the QE data do not reflect conformational
flexibility of the protein. Instead, the QE data appears to suffer
from a systematic error that leads to many false identifications.
Importantly, this bias affects only target sequences and is therefore not
controlled by the FDR estimation. If these LDL were indeed based on
false identifications, one could suspect that they were identified
based on weak data and thus derived from low-scoring PSMs. We therefore
compared the highest scoring PSM for each link above and below 25
Å (Figure 1b).
Remarkably, the LDL showed an even higher average score than the within-distance
links. This difference was small and not significant, but it was still
surprising that the two classes had a similar score distribution.
Next, we manually investigated LDL PSMs to identify characteristics
that might lead to a mechanistic explanation of these links.
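The structural validation used above, classifying each mapped link by its Euclidean distance and reporting the share above the 25 Å cutoff, can be sketched as follows. This is a minimal illustration with hypothetical coordinates and residue numbers. It assumes one representative coordinate per residue (for example the Cα atom), which is our assumption and is not specified in the text.

```python
import math

LDL_CUTOFF = 25.0  # Å, empirical distance limit for SDA cross-links


def ldl_fraction(links, coords, cutoff=LDL_CUTOFF):
    """Return (fraction of long-distance links, number of mapped links).

    links:  iterable of (residue_1, residue_2) pairs
    coords: dict mapping residue number -> (x, y, z) in Å
    Links with a residue missing from the structure are skipped.
    """
    mapped = [(a, b) for a, b in links if a in coords and b in coords]
    if not mapped:
        return float("nan"), 0
    n_long = sum(1 for a, b in mapped
                 if math.dist(coords[a], coords[b]) > cutoff)
    return n_long / len(mapped), len(mapped)


# Hypothetical toy structure and links; residue 99 is not in the model.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (30.0, 0.0, 0.0)}
toy_links = [(1, 2), (1, 3), (2, 3), (1, 99)]
fraction, n_mapped = ldl_fraction(toy_links, toy_coords)
print(f"{n_mapped} mapped links, {fraction:.0%} above {LDL_CUTOFF} Å")
```

Reporting the number of mapped links alongside the fraction mirrors the distinction made above between identified links and links that could be mapped to the structural model.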
Figure 1
Quality control
after cross-link identification at a 5% link FDR.
(a) Results from cross-linking HSA with sulfo-SDA acquired on a Q
Exactive and an LTQ Orbitrap Velos mass spectrometer. The line at
25 Å indicates the distance cutoff for links classified as long
distance. The inset shows the fraction of long-distance links (LDL)
in each data set. (b) Score comparison between within-distance links
and LDL. LDL showed no significant (ns) deviation from the within-distance
links (two-sided Mann–Whitney-U-test at α = 0.05).
Long-Distance Links Lacking Support for Being Cross-Linked
After suspecting a systematic
identification error in QE data,
we manually inspected annotated LDL spectra. We noticed that many
spectra contained unexplained fragment peaks of high intensity.
For example, in the displayed spectrum (Figure 2a, upper panel), most of the high-intensity
peaks are explained but not the base peak. This PSM was matched with
a very low precursor error of 0.44 ppm and had a very good sequence
coverage in general. However, while many of the linear fragments were
identified, no cross-linked fragments were matched. While there is
convincing evidence that the two identified peptides are correct,
there is a lack of fragment evidence that these peptides were indeed
cross-linked.
Figure 2
Spectral characteristics of noncovalently associated peptides.
(a)
Comparison of the same scan (scan 34887, raw file *V127_F*) searched
with cross-link settings (upper panel) and searched with a noncovalent
association setting (lower panel). (b) Comparison of the explained
intensity in the MS2 spectrum from all PSMs that passed the 5% link-level
FDR. (c) Comparison of cross-linker-containing fragments and linear
fragments in the same set of PSMs as in (b). Number of observations
for (b) and (c): ≤25 Å 2599 PSMs and >25 Å 326
PSMs.
We tested our manual observations
more systematically by comparing
the explained intensity in the MS2 spectrum across all PSMs that passed
the 5% link FDR (Figure 2b). The median explained intensity
(EI) of the within-distance links (20% EI) is twofold that of the LDL (10% EI). This
difference is supported by a significant Mann–Whitney U test (one-sided, α
= 0.05) and a large Cohen’s d effect size
(d = 0.95). One possible explanation is that the
spectra that yield LDL are simply of poor quality. This can happen
when, for example, peptides of similar m/z were coisolated, the precursor was of low intensity, or
the peptide simply did not fragment or ionize very well. But as shown
in Figure 1b, the search
engine scores of LDL were slightly higher than the scores from within-distance
links. Therefore, poor spectral quality is not a likely reason for
the large proportion of LDL. However, the number of matched cross-linked
and linear fragments was significantly lower for the long-distance
matches compared to that for the within-distance matches (Figure 2c).
Recently,
it has been proposed that SDA-formed bonds are very susceptible
to MS cleavage when involving a carboxylic acid functional group.[28] In these cases, the annotated spectra would
also show a low EI and a low number of cross-linked fragments with
our search settings. However, it is unclear why such a reaction should
preferentially lead to LDLs. Therefore, we hypothesized that the respective
peptide pairs were not actually cross-linked but were noncovalently
associated. Nevertheless, we investigated this in greater detail by
following the approach of Iacobucci et al.[28] and performed a cleavable cross-linker search on the Velos and the
QE acquisitions (Figure S2). A large portion
of the identifications from the cleavable cross-linker search on the
QE (38%) were long-distance links (presumably noncovalent peptide
associations). However, the distribution of links that match the crystal
structure revealed a preference for short distances, thereby indicating
that MS cleavage of the cross-linker can indeed be observed. Our
data therefore support both MS-cleavable SDA links and
noncovalent peptide complexes as parallel processes.
It would be interesting to investigate
sequence determinants of
noncovalent association. Unfortunately, the lack of ground truth and
the low number of observations make it difficult to investigate sequence-specific
features that lead to noncovalent peptide complexes. While cross-links
should preferentially fall below the distance cutoff, noncovalent
peptide associations should distribute randomly across the distance
histogram. Therefore, some links that match the crystal structure
will also arise from noncovalent associations. Those links falling
above the distance cutoff were too low in number for a statistical
enrichment analysis.
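The effect size reported in this section follows the Cohen's d definition with pooled standard deviations given in Materials and Methods. A minimal sketch of that calculation is shown below. The input values are hypothetical, the function names are ours, and the "negligible" label for |d| < 0.2 is an assumption added for completeness. The Mann–Whitney U test itself is not reimplemented here.

```python
import math
import statistics


def cohens_d(x, y):
    """Cohen's d for two samples using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    var_x = statistics.variance(x)  # sample variance (n - 1 denominator)
    var_y = statistics.variance(y)
    pooled_sd = math.sqrt(((nx - 1) * var_x + (ny - 1) * var_y) / (nx + ny - 2))
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd


def effect_label(d):
    """Classify |d| with the thresholds 0.2 (small), 0.5 (medium), 0.8 (large)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumption: label for values below the smallest class


# Hypothetical explained-intensity values (in %), for illustration only.
within_distance = [2.0, 4.0, 6.0]
long_distance = [1.0, 3.0, 5.0]
d = cohens_d(within_distance, long_distance)
print(f"d = {d:.2f} ({effect_label(d)})")
```

Reporting d next to the p-value separates the size of a difference from its statistical significance, which matters when group sizes differ as much as they do between within-distance and long-distance PSMs.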
Low Intense Noncovalently Associated Peptides Arising from Two Coeluting Peptides
As shown above, LDL frequently achieved
high scores and there was good evidence based on the MS2 fragmentation
that the peptides were correctly identified. Had the peptides paired
noncovalently, this could happen either in solution or during the
ESI process. In the latter case, one would expect the individual peptides
to overlap in their chromatographic elution forming a noncovalent
pair during their coelution. In contrast, for cross-linked peptides
one would not expect any systematic coelution. Therefore, we investigated
the elution of the individual peptides for all identifications (5%
PSM FDR) following a look-up strategy that started from the MS2 trigger
time of the cross-linked PSM (Figure 3a; for details, see Materials and Methods).
Figure 3
Analysis strategy and properties of LDL PSMs. (a) Noncovalent peptide
search. On the basis of a cross-linked PSM (1), the individual peptide
sequences are searched in the MS1 such that the summed mass equals
the precursor mass of the identified cross-link (2). (b) Maximum intensity
(along the m/z trace) for the identified
cross-link and the m/z of the two
individual peptides for links ≤25 Å and >25 Å.
(c)
Spearman correlation of intensity profiles of the cross-link and the
two individual peptides based on m/z matching in a RT window. (d, e) Examples of intensity profiles of
two LDL. Filled stars mark the isolation time point of the precursor
that yielded the identified cross-link. Scaling factors for lower
intensity curves are written above the respective curves (e.g., ×10
equals a factor of 10). Additional information about the PSMs can
be retrieved through the uploaded results in PRIDE through the PSMIDs
7678478210 (d) and 7678602613 (e). (f) RT difference comparison of
LDL and within-distance links.
We successfully extracted 1458 mass traces for PSMs of links
within
the distance cutoff and 238 mass traces for PSMs of LDLs. For these
PSMs, we then compared the maximum intensity along the mass trace
for the cross-link m/z and the two
individual peptide m/z values (Figure 3b) within a window
of ±90 s. Interestingly, the MS1 signals of long-distance links
had significantly lower intensities than links fitting to the crystal
structure, albeit with small effect size. In contrast, the MS1 intensities
attributed to the individual peptides of LDL were higher by almost
2 orders of magnitude within the elution window compared to the control
(peptides observed in cross-links). This indicates a preference for
coelution of individual peptides with linked peptide pairs in the
case of LDL but not within-distance links.
The high signal intensity
of individual peptides of LDL around
the elution of the LDL peptide made us wonder if they coelute. We
investigated the correlation of elution profiles more systematically
by computing the Spearman correlation over the extracted ion chromatogram
(XIC). While the absolute correlation is not very high for either the
within-distance links or the LDL, the important feature is the
difference between the two classes (Figure 3c). The correlations of the two single-peptide m/z traces with each other, and
also individually with the cross-linked m/z, are all significantly larger for the long-distance
links compared to those for the within-distance links (p-value ≤0.001). The fact that the absolute value of the correlation
is moderate is not surprising as it would be a precondition of noncovalent
association that the individual peptides elute at an overlapping but
not necessarily identical time, as is also seen from two examples
of coeluting and associating peptides (Figure 3d,e). In the first example, all three m/z species start eluting at a similar
time point. One of them is very abundant (MS1 intensity 1e9), reaching
saturation and showing a long elution tail. This covers the complete
elution time of the second peptide. As expected for an association
product of the two, the LDL peptide then coincides with the elution
of the second peptide. In a second example, the two individual peptides
partially coelute, and the LDL peptide is observed during the time
of their overlapping elution.
To our surprise, some cross-links
that match the protein structure
showed correlating MS1 intensities with their linear counterparts,
despite a narrow matching time window. Retention on a reversed phase
is usually very sensitive such that even peptide pairs with different
cross-link sites show a different elution time.[26] We therefore suspected the coeluting MS1 intensities to
be the baseline signal of our look-up strategy, which is solely based
on m/z values and lacks confirmation
through identification data. Hence, we checked for the RTs from the
individual linear peptides relative to the cross-links based on identifications
instead of m/z matching alone. We
compared the cross-link identifications with the closest RT from the
linear identified peptides (with equal modifications and equal composition).
The absolute difference of the individual RTs was mostly close to
0 min for the LDL PSMs and approximately uniformly distributed for
within-distance PSMs (Figure 3f). The added control (random pairings of RTs from linear
identified peptides that were also part of a cross-linked peptide)
closely resembles the within-distance PSM distribution. However, only
50% of the PSMs have a RT difference smaller than 10 min. The remaining
PSMs have a large RT difference which reduces the possibility of coelution.
Interestingly, PSMs with a RT difference smaller than 10 min have
an average score of 10.0 (n = 32), while the remaining
PSMs (n = 32) have an average score of 6.7. Possibly,
the lower score indicates imprecise peptide identifications and thus
wrong RTs. In addition, matches with large RT differences can
still originate from wrong identifications. Like target-decoy matches
in a cross-link, in a NAP one of the peptides could be correct and
the other might be a random match. In these cases, the RT difference
would also be randomly distributed.
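The look-up strategy used in this section (m/z matching at 20 ppm within a 180 s RT window, smoothing with a 15-point moving average, and Spearman correlation of the resulting profiles) can be sketched in plain Python. This is an illustrative reimplementation, not the authors' pyOpenMS code: the data layout, function names, zero-filling of scans without a match, and the truncated smoothing window at the edges are all assumptions.

```python
import math


def extract_profile(scans, target_mz, rt_center, rt_window=180.0, tol_ppm=20.0):
    """Intensity trace of target_mz in MS1 scans around rt_center.

    scans: list of (rt_seconds, [(mz, intensity), ...]).
    Scans without a matching peak contribute 0.0 (assumption).
    """
    profile = []
    for rt, peaks in scans:
        if abs(rt - rt_center) > rt_window / 2:
            continue
        hits = [i for mz, i in peaks
                if abs(mz - target_mz) / target_mz * 1e6 <= tol_ppm]
        profile.append((rt, max(hits) if hits else 0.0))
    return profile


def moving_average(values, width=15):
    """Centered moving average; the window is truncated at the edges (assumption)."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")
```

In practice, the smoothed profiles of the precursor and the two individual peptides, evaluated on the same scans, would be passed pairwise to `spearman`; SciPy's `scipy.stats.spearmanr` provides an equivalent library routine.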
In-Source Fragmentation Reduction of the Number of Noncovalently Associated Peptides
On the basis of the results above, one
would predict NAPs to form even without prior cross-linking. The phenomenon
should depend only on peptide concentrations and affinities.
We therefore investigated a four-protein mix without any cross-linker
addition and wondered if noncovalently associated peptides could be
identified. Note that here we changed to a Q Exactive HF instrument.
Indeed, we identified 24 noncovalent peptide associations (Figure 4). The formation
of NAPs is thus also observable in linear proteomics experiments that do not involve
any cross-linking chemistry. However, the number of NAP identifications
is low and unlikely to affect linear proteomics.
Figure 4
Noncovalently associated
peptide identifications in non-cross-linked
samples. (a) Number of PSMs after 5% PSM FDR in a noncovalent search
and linear identifications (1% FDR). Peptide m/z fraction refers to occurrences where the individual peptide
or precursor peaks are found in multiple charge states in the MS2
spectrum. (b) Noncovalent peptide identification with charge state
3, individual peptide peaks (P) were identified with charge 2 (822.41 m/z) and charge 1 (1643.82 m/z). (c) MS1-derived peptide feature for the PSM
displayed in (b). Top panel shows the summed intensity over the m/z bins. Bottom panel shows the m/z over the RT color-coded by the intensity.
Right panel shows the summed intensity over the RT.
Since the involved forces leading to an interaction
are expected
to be rather weak, employing in-source collision-induced dissociation
(ISCID) should reduce the number of identified NAPs. Using an ISCID
of 0, 5, 10, 15, and 20 eV, we find 24, 11, 11, 6, and 3 NAP identifications
at 5% PSM FDR (Figure a). Increasing the ISCID from 0 to 20 eV thus results in a decrease of almost 90% (from 24 to 3) in
NAP identifications. As a control, we also investigated how linear
peptide identifications were affected by these ISCID voltages
and observed only a minor detrimental effect. Predominantly, we saw
self-associations of the same peptide with all ISCID settings (88%,
64%, 73%, 33%, and 67% for 0, 5, 10, 15, and 20 eV ISCID). Also,
in cross-linked HSA we saw many self-links of peptides, which initially
perplexed us as these would indicate protein dimerization despite
us having isolated and analyzed the monomer. These apparent cross-links
are now strong candidates for NAPs as well. This indicates that special
care must be taken when homomultimers are investigated via CLMS. Note
that homomultimers are not necessarily identified through cross-links
of the same peptide in both instances of the protein. Cross-links
involving overlapping peptide sequences can also indicate homomultimerization
(see Figure S4).
We noticed a feature
of MS2 spectra of NAPs that may help identify
them in the future. The intact peptide peaks in multiple charge states
up to the NAP’s precursor charge state are frequently observed
and are of high intensity (Figure a,b). We encountered this in 62% of cases for the ISCID
data set of 0 eV. We are unaware of such charge-reduced precursor
ions in HCD fragmentation spectra of linear peptides and do not see
a single occurrence in our linear peptide data. This feature complements
the MS1-level evidence, where NAPs are revealed through their overlapping elution with
the individual linear peptides. It is unclear whether NAPs can be avoided
altogether. However, critical assessment of the ionization settings
appears to be advisable for CLMS analyses.
For the analysis
of proteins via native MS, one should be aware
that these unspecific associations might be possible too, even under
“normal” LC conditions as we have used here. The exact
conditions that support the formation of NAPs are not known.[29] However, previous studies found that electrostatic
interactions increase the stability of noncovalent complexes,[30,31] and that solvent composition and ionization settings[29,32] are also crucial. Likely, any parameter influencing the ionization, such
as instrument architecture or flow rate, plays a role. We therefore
tested the influence of three flow rates on the formation of NAPs
but found no differences within our experimental setup (Figure S3).
For cross-linking mass spectrometry
experiments, NAPs pose a challenge.
Cross-linking experiments using SDA or similar reagents are more susceptible
to NAP identifications since the cross-linker can form loop-links
on lysine residues, resulting in the same modification mass as a cross-linked
peptide pair. However, the formation of NAPs does not depend on the
cross-linker since we also observed their formation in non-cross-linked
samples. Therefore, in theory, NAPs will also occur with other
cross-linkers. A critical assessment of the specific instrument ionization
settings is thus crucial for successful analysis of CLMS experiments.
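The mass ambiguity described above for loop-links can be made concrete with a short calculation. The sketch below is illustrative only: the peptide masses are invented, the function names are hypothetical, and the SDA mass shift of 82.04186 Da (C5H6O) is an assumed value that should be checked against the search-engine configuration actually in use.

```python
SDA_DELTA = 82.04186  # Da; assumed mass added by SDA (cross-link or loop-link)


def crosslinked_pair_mass(m_a, m_b, linker=SDA_DELTA):
    """Neutral mass of two peptides joined covalently by one cross-linker."""
    return m_a + m_b + linker


def nap_mass(m_a, m_b, loop_links_a=0, loop_links_b=0, linker=SDA_DELTA):
    """Neutral mass of a noncovalent pair; each loop-link adds one linker mass."""
    return m_a + m_b + linker * (loop_links_a + loop_links_b)


# Hypothetical peptide masses (Da), for illustration only.
m_a, m_b = 1000.5, 1500.7

covalent = crosslinked_pair_mass(m_a, m_b)
nap_with_loop = nap_mass(m_a, m_b, loop_links_a=1)
nap_unmodified = nap_mass(m_a, m_b)

# A NAP carrying one loop-link is isobaric with the cross-linked pair,
# so precursor mass alone cannot tell the two apart.
print(abs(covalent - nap_with_loop) < 1e-9)
# Without a loop-link, the NAP is lighter by exactly one linker mass.
print(round(covalent - nap_unmodified, 5))
```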
If the possible presence of NAPs is ignored, they will lead to wrong
distance constraints. Even though structural-modeling approaches are
to some extent robust to the number of false positives,[27] the influence of a systematic source of false
positives is unknown. Experiments that aim to reconstruct the rough
topology of protein complexes are at high risk of false conclusions
being drawn from these false “cross-links”. Wrong interprotein links and wrong intraprotein/loop-links
might lead to inconclusive results. Therefore, we strongly suggest
reducing the possibility of NAPs, either by optimizing acquisition
settings or by applying heuristic post-acquisition filters.
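One possible post-acquisition heuristic, based on the intact peptide peaks described above, is sketched below. It is a minimal illustration, not a validated filter: the function names, the 10 ppm tolerance, and the peak-list format are assumptions, and the neutral peptide mass is back-calculated from the rounded m/z values quoted in the caption of Figure 4b (822.41 at charge 2, 1643.82 at charge 1).

```python
PROTON = 1.007276  # Da, mass of a proton


def intact_peptide_mzs(peptide_mass, precursor_charge):
    """Expected m/z of the intact peptide for charges 1..precursor_charge."""
    return {z: (peptide_mass + z * PROTON) / z
            for z in range(1, precursor_charge + 1)}


def observed_intact_charges(peptide_mass, precursor_charge, ms2_mzs, tol_ppm=10.0):
    """Charge states at which the intact peptide is found in the MS2 peak list."""
    hits = []
    for z, target in intact_peptide_mzs(peptide_mass, precursor_charge).items():
        tol = target * tol_ppm * 1e-6  # absolute tolerance in m/z units
        if any(abs(mz - target) <= tol for mz in ms2_mzs):
            hits.append(z)
    return hits


# Neutral mass back-calculated from 822.41 m/z at charge 2 (assumption).
peptide_mass = 2 * 822.41 - 2 * PROTON
# Toy peak list: the two intact-peptide peaks plus an unrelated peak.
ms2_peaks = [500.0, 822.41, 1643.82]

charges = observed_intact_charges(peptide_mass, 3, ms2_peaks)
print(charges)
# Assumed rule of thumb: two or more charge states make a NAP more likely.
print(len(charges) >= 2)
```

Peak intensity is ignored here for brevity, although the intact peptide peaks are reported to be of high intensity; a practical filter would likely also require a minimum relative intensity.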
Significance of Noncovalently Associated Peptides
We observe NAPs here during the analysis of an SDA-cross-linked protein.
While SDA is of central importance to high-density CLMS and the development
of cross-linking for protein structure determination, this is a very
young research area with currently few followers. Nevertheless, NAPs
do not require the presence of SDA as we show by our analysis of a
standard four-protein mix, without any cross-linking. NAPs could have
an impact in several ways, even when only a few are present. First,
self-association of loop-linked peptides would also
occur with cross-linkers such as BS3 or DSS, leading to the possibility
of misidentifying NAPs as cross-links. This would then lead to a false
biological conclusion, namely, that a protein self-associates to form
homodimers. Cleavable cross-linkers have the advantage that if a full
set of signature peaks is observed, NAP formation can be ruled out.
Unfortunately, the set of signature peaks is not always complete.[33] Second, our analysis showed that NAPs yield
excellent spectra, often better than cross-linked peptides. When NAPs
are not considered in the search, these good spectra can match only one of the associated
peptides correctly, while for the second one the mass would be off
by the assumed presence of a cross-link. This can only lead to a false
target–target (TT) hit or a target–decoy (TD) hit. Indeed,
we found in our analysis an example (Figure ) where a high-scoring TD from a cross-link
search matched a TT during a NAP search with improved confidence.
In routine analyses of protein complexes, relatively few cross-links
are detected, so even a few high-scoring TDs may noticeably reduce
the number of identified links. This was not the case in our analysis but should
not be dismissed outright and warrants further attention. Finally,
the presence of biologically nonfunctional peptide–peptide
complexes in the gas phase suggests that the analysis of much
larger proteins with many more interaction possibilities may also lead
to such nonbiological associations. Consequently, native mass spectrometry
may require the development of appropriate controls as has been suggested
before.[4]
Figure 5
Butterfly plot of the same spectrum with
different possible explanations.
Upper panel shows the annotation from a cross-linking search (target–decoy
identification). Lower panel shows the annotation from a noncovalent
search (target–target identification). Q Exactive acquisition:
raw file, *V127_K*; scan, 50038.
Conclusion
Self-association of peptides in solution
has been shown to yield
stable oligomers that endure the ionization process.[32] In addition, the preservation of noncovalent associations
throughout ESI is exploited by native mass spectrometry. Here, we
show that peptides with very similar chromatographic RT behavior can
also remain together during the ionization process under normal liquid
chromatography conditions as they are used in bottom-up proteomics.
This implies that the association process can be unspecific and occur
during normal LC-MS analysis. At the very least, the CLMS field should
be aware of this. By pointing to ionization parameters and post-acquisition
tests, we hope to assist the field in spotting and counteracting this
effect.