Coeluting peptides remain a major challenge for the identification and validation of MS/MS spectra, but they also carry great potential. To tackle these problems, we have developed the CharmeRT workflow presented here, which combines a chimeric spectra identification strategy implemented as part of the MS Amanda algorithm with the validation system Elutator, which incorporates a highly accurate retention time prediction algorithm. For high-resolution data sets, this workflow identifies 38–64% chimeric spectra, which results in up to 63% more unique peptides compared to a conventional single search strategy.
Keywords:
MS/MS; chimeric spectra; database search; mixed spectra; retention time prediction; tandem mass spectrometry; validation
Advancements in mass
spectrometer instrument precision and acquisition
time[1,2] have made mass spectrometry the primary instrument
in proteomics analyses. The interpretation of the measured spectra
is often performed using a database search algorithm.[3−6] Most database search algorithms stick to the “one-spectrum-one-peptide”
paradigm, although the occurrence of coeluting peptides and the accompanied
challenges of chimeric spectra have been widely studied.[7−9] Even though several solutions for processing chimeric spectra already
exist,[10−14] they are still often not used in an everyday proteomics workflow.
In addition, the validation of more than one peptide match per spectrum
(here called mPSM) is an important task,[15] as the confidence score for the most abundant peptide in a spectrum
is not easily comparable to the score of a second coeluting peptide
also present in the spectrum. However, by ignoring this valuable
information, a large number of unique peptides remain unidentified,
as recent studies show that about 50% of all spectra contain more
than one peptide.[7,15]
In general, the dynamic
range of proteins is a big challenge in
proteomics experiments.[16] Detecting highly
abundant proteins is much simpler than identifying the least abundant
part of the proteome.[16,17] Many approaches have been developed
to increase proteome coverage and enable deep proteome analysis,[18−25] although they vary in how straightforward and affordable they are for
an everyday proteomics workflow.
We here propose a combination
of identifying chimeric spectra and
validating detected mPSMs using retention time prediction, jointly
leading to a significant increase in validated unique peptides for
each data set accompanied by higher coverage of low abundant proteins:
the CharmeRT workflow.
Methods
CharmeRT Workflow
The first part of the CharmeRT workflow
identifies chimeric spectra using a second search approach in our
database search engine MS Amanda.[26] The
second part of CharmeRT validates the identified PSMs of first and
second searches using Elutator, a newly developed tool based on the
principles of Percolator,[27] featuring a
new approach for retention time prediction. An overview of the workflow
can be seen in Figure 1.
Figure 1
Overview of the CharmeRT workflow. After a first search round with
MS Amanda, spectra are cleaned, potential interfering precursors are
identified, and spectra are submitted to a second search round. Resulting
PSMs of the first and the second search are validated by Elutator
using a retention time model.
Chimeric Spectra Search in MS Amanda
To identify multiple
peptides per spectrum, a second search approach was implemented in
the database search engine MS Amanda. For each spectrum, all peaks
of the highest scoring peptide identified in the first search are
removed. The removal is based on the fragment ion types selected in the
search; neutral loss ions can additionally be removed. As
interfering peptides may have the same C- or N-terminal amino acid
due to the enzyme used, leading to a shared y1/b1 ion in a mixed spectrum,
y1 ions can optionally be kept, and b1 ions are not considered at
all by MS Amanda. Tests showed that all other potentially shared peaks
can be neglected, as they are very unlikely. We identified an average
overlap of 0.7%, see Supplemental Table S2. Corresponding MS1 spectra are investigated and potential interfering
precursors are determined, optionally performing a preceding deisotoping
of the MS1 spectrum. There are several ways to treat precursor peaks
where the charge state cannot be determined: not considering them,
testing various selectable charge states, or only testing the most
abundant ones of them at different charge states. All spectra are
submitted to a further search round testing each of the identified precursors
with the option to re-search the original precursor. For each spectrum,
multiple second search hits, i.e., the best n PSMs
for the top m precursors, are reported.
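The peak-removal step described above can be sketched as follows. This is a minimal illustration and not the MS Amanda implementation: the function name, the (m/z, intensity) tuple representation, and the absolute tolerance matching are assumptions made for this example, and the peak values are toy data.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def remove_matched_peaks(
    peaks: List[Peak],
    matched_mz: List[float],
    keep_mz: List[float],
    tol_da: float = 0.02,
) -> List[Peak]:
    """Drop peaks explained by the top first-search peptide.

    matched_mz: theoretical fragment m/z values of the first-search hit
                (hypothetical input; it would also contain neutral-loss
                ions if those are to be removed).
    keep_mz:    m/z values that must survive even if matched, e.g. y1.
    """
    def near(mz: float, targets: List[float]) -> bool:
        return any(abs(mz - t) <= tol_da for t in targets)

    return [
        (mz, intensity)
        for mz, intensity in peaks
        if near(mz, keep_mz) or not near(mz, matched_mz)
    ]


# Toy spectrum: the first two peaks are matched by the first-search peptide,
# but the first one is a y1 ion that we choose to keep.
spectrum = [(147.113, 900.0), (244.166, 500.0), (300.500, 120.0)]
cleaned = remove_matched_peaks(
    spectrum, matched_mz=[147.113, 244.166], keep_mz=[147.113]
)
```

The cleaned peak list would then be searched again against each candidate interfering precursor.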
mPSM Validation in Elutator
The second part of the
CharmeRT workflow is realized by Elutator, a new tool for validating
identified mPSMs. Elutator is based on the principles of Percolator[27] and validates mPSMs using a set of features
optimized for the analysis of MS Amanda results. A complete list of
all used features is given in the supplemental data (Supplemental Table S1), including the deviation of an estimated
peptide elution retention time (RT) from the actual value, as well
as recalibrated masses for precursor and fragment ions. The most important
features are explained in the next sections.
Elutator Retention Time Prediction Model
An important
factor in the context of validating mPSMs is the difference between
predicted and measured retention times. Several approaches already
exist to construct RT prediction models.[28−30] However, the
use of these models for validation is often limited due to specific
requirements, such as a significant amount of training data and
correct handling of chemical modifications. We have therefore developed
a new retention time prediction algorithm: Elutator’s RT model
is based on the SSRCalc[30] model and estimates
the hydrophobicity index of peptides based on their sequences and
chemical modifications, which can be linearly mapped to retention
time. It was significantly redesigned and extended for better performance
but preserves most of the features and ideas of the original SSRCalc
algorithm. The features used for predicting the model include peptide
length, certain properties for special amino acids (e.g., Proline),
the isoelectric charge, properties for short peptides, or parameters
for hydrophobic amino acid patterns likely forming helices, and are
similar to the features described by Krokhin.[31]
An important improvement compared to the original model of
SSRCalc for retention time prediction is the consideration of neighboring
effects of amino acid residues that are not restricted to nearest neighbors
only. Experiments showed a statistically significant effect of amino acid
residue interactions even for residues separated by several positions
in the polypeptide chain. A detailed description of how we model these
interactions is given in the Supporting Information.
The described features are used in an optimized nonlinear retention
time model implemented in Elutator. The original formulation of the
model was given by Krokhin for SSRCalc.[31] The parameters (coefficients) of the model are optimized using
Newton’s method to minimize the sum of squares of retention
time deviations for all peptides in the training sets. Detailed
information on the model calculation is given in the supplemental
data. The optimization procedure assumes simultaneous training over
several different data sets measured under similar elution conditions
(gradient duration, chemical composition of eluents, column temperature,
etc.). To avoid overfitting, the retention time model has been trained
using 94 122 highly reliable PSMs (FDR threshold of 0.001) corresponding
to 44 271 unique sequences obtained from in-house measured
data sets of different organisms: trypsin-digested human (HeLa), mouse,
yeast, B. subtilis, and E. coli; phosphorylated peptides
from TiO2-enriched human cell lysate; and a chymotrypsin-digested
human data set. After a preliminary optimization, we removed
0.1% of the outliers, corresponding to the number of expected false
matches, and repeated the optimization. By considering additional
peptide properties, such as the interactions of neighboring amino acid
residues in the peptide chain, we considerably increased the RT prediction
accuracy (Figure 2).
A similar accuracy of retention time prediction was achieved for phosphorylated
and unmodified peptides (see Supplemental Figure S1).
Figure 2
Comparison of elution retention time prediction models: (a) Elutator,
(b) BioLCCC,[29] (c) SSRCalc,[30] and (d) Elude.[28] Depending
upon the model design the output is either an absolute retention time
or a relative hydrophobicity index, which can be linearly mapped to
the retention time in a particular data set. We here compare the correlation
of predicted and measured retention times of data set I, which is
important for validation. R2 is the coefficient
of determination, and σ is the dispersion of the error in minutes.
As Elude cannot be trained on multiple raw files, we here used 50%
randomly chosen PSMs over all raw files for training and the others
for testing.
For practical usage,
the applicability of the trained model on
data sets measured under a different chromatographic setup is of high
interest. Elutator maps the predicted hydrophobicity index to the
observed retention time by applying a linear fitting for all peptides
in a single HPLC run. This allows for an application to data sets
with different setups. We investigated this using a publicly available
externally measured HeLa data set.[32] The
accuracy of the retention time prediction is lower for the external
data set, as can be seen from the coefficient of determination R2. Nevertheless, as demonstrated in Supplemental Figure S2, using retention time
prediction leads to a higher number of PSMs here as well. The smaller retention
time dispersion for the external data set can be explained by the
shorter gradient (90 min versus 180 min for the in-house data set).
The shorter gradient duration leads to a proportional decrease of
retention time deviations. Alternatively, a new model can easily be
trained for specific elution conditions using the Elutator RT Trainer
(see Availability).
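The per-run linear mapping from hydrophobicity index to retention time can be illustrated with an ordinary least-squares fit. The sketch below is not Elutator code: the function names and the toy values are assumptions, and it only shows the idea of fitting one line per HPLC run and reading off the deviation of each peptide from that line.

```python
from typing import List, Sequence, Tuple


def fit_run_mapping(hi: Sequence[float], rt: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line rt ~ slope * hi + intercept for one run."""
    n = len(hi)
    mean_hi = sum(hi) / n
    mean_rt = sum(rt) / n
    sxx = sum((h - mean_hi) ** 2 for h in hi)
    sxy = sum((h - mean_hi) * (t - mean_rt) for h, t in zip(hi, rt))
    slope = sxy / sxx
    return slope, mean_rt - slope * mean_hi


def rt_deviations(
    hi: Sequence[float], rt: Sequence[float], slope: float, intercept: float
) -> List[float]:
    """Observed minus predicted retention time for each peptide."""
    return [t - (slope * h + intercept) for h, t in zip(hi, rt)]


# Toy values: predicted hydrophobicity indices and observed retention
# times (minutes) of confidently identified peptides in a single run.
run_hi = [5.0, 12.0, 20.0, 31.0, 40.0]
run_rt = [20.0, 34.0, 50.0, 72.0, 90.0]  # exactly 2 * hi + 10
slope, intercept = fit_run_mapping(run_hi, run_rt)
deviations = rt_deviations(run_hi, run_rt, slope, intercept)
```

Because slope and intercept are refitted for every run, the same hydrophobicity model can be reused across gradients of different length; only the two mapping coefficients change.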
Combined Retention Time Score
Besides the deviation
of the predicted RT from the measured RT, Elutator also uses the combined
retention time score as a feature for mPSM validation. It includes the
PSM score of the search engine and the retention time deviation obtained
from the retention time model. To calculate a combined score, the
MS Amanda score is recalibrated on the posterior error using linear
regression to define coefficients a and b using the model
where f(A) is the probability for a match with score A to
be false (i.e., local FDR), and A is the MS Amanda
score.
After this calibration, the combined score is calculated
using the following scoring function:
where σ is the dispersion
of the predicted
retention time, calculated considering highly reliable matches (FDR
= 0.001), T is the duration of the linear part of
the gradient, erf is the Gauss error function, and ε is defined
as , where Δt is the
retention time deviation from the predicted value for the scored peptide.
Calibration of Mass Differences
The aim of calibrating
mass differences is to eliminate constant biases in mass measurements
for precursors and fragments and thereby enhance the mass resolution; the recalibrated
masses are included as an additional feature for mPSM validation. In Elutator this
calibration is based on theoretically known masses of highly reliable
matches of the first search (FDR = 0.001, calculated on MS Amanda
score).
Recalibration can be done for measured deviations of m/z values, , as well as for relative mass deviations, Δmppm. Elutator uses the following approximation
of mass deviations over retention time t and m/z to determine the calibration coefficients a, b, and c:
Results of mass recalibration for a human data set[32] are presented in Supplemental Figure S3. This data set was analyzed with lock mass disabled (available
in Q Exactive instruments, Thermo Fisher Scientific). The constant bias
and the variable error seemed to be similar in this case. Activating the
lock mass option partly eliminates the constant bias, but increases
the variable error because it is based on measuring the mass of known
ions present in the spectrum. Therefore, we suggest that disabling
the lock mass is preferable for better mass resolution when PSM validation
by Elutator is used.
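To make the calibration idea concrete, the sketch below fits three coefficients a, b, and c to observed mass deviations by least squares and subtracts the fitted trend. The exact functional form used by Elutator is not reproduced in this text, so the linear form dev ≈ a + b·t + c·(m/z) is purely an assumption for illustration, as are the function names and the toy data.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mass_calibration(
    t: Sequence[float], mz: Sequence[float], dev: Sequence[float]
) -> List[float]:
    """Least-squares fit of dev ~ a + b * t + c * mz (ASSUMED linear form)."""
    rows = [[1.0, ti, mi] for ti, mi in zip(t, mz)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * d for r, d in zip(rows, dev)) for i in range(3)]
    return solve_linear(ata, atb)


def recalibrate(dev, t, mz, coeffs):
    """Subtract the fitted systematic trend from each mass deviation."""
    a, b, c = coeffs
    return [d - (a + b * ti + c * mi) for d, ti, mi in zip(dev, t, mz)]


# Toy data: deviations (ppm) of highly reliable first-search matches that
# follow the assumed trend exactly.
rt_min = [10.0, 30.0, 60.0, 90.0, 120.0, 150.0]
mz_obs = [400.0, 650.0, 500.0, 900.0, 750.0, 1200.0]
dev_ppm = [1.5 + 0.01 * ti - 0.002 * mi for ti, mi in zip(rt_min, mz_obs)]
coeffs = fit_mass_calibration(rt_min, mz_obs, dev_ppm)
residuals = recalibrate(dev_ppm, rt_min, mz_obs, coeffs)
```

In practice the coefficients would be fitted on the highly reliable matches only and then applied to all PSMs of the run.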
Longest Consecutive Series A + B + Y
We introduce a
combined consecutive sequence of N- and C-terminal ions as an additional
feature for validation, namely the sequence of a, b, and y ions, which typically constitute
HCD/CID spectra. PSMs with scores close to the FDR threshold contain
relatively few matched fragment peaks; therefore, y ions alone are often unable to form a consecutive sequence. However,
longer sequences can potentially be constructed by taking into account a and b ions, which fill gaps between y fragments (see Supplemental Figure S4).
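One plausible way to compute such a feature is sketched below. It is an assumption about the definition, not the Elutator source: each backbone cleavage site of a peptide of length n counts as covered if the N-terminal ion a_i or b_i, or the complementary C-terminal ion y_(n-i), was matched, and the feature is the longest run of consecutive covered sites.

```python
from typing import Iterable


def longest_consecutive_aby(
    length: int,
    a_idx: Iterable[int],
    b_idx: Iterable[int],
    y_idx: Iterable[int],
) -> int:
    """Longest run of consecutive cleavage sites covered by a, b, or y ions.

    Ion indices count residues from the respective terminus, so y_j maps to
    cleavage site (length - j) when sites are numbered from the N-terminus.
    """
    covered = set(a_idx) | set(b_idx) | {length - j for j in y_idx}
    best = run = 0
    for site in range(1, length):  # a peptide of n residues has n - 1 sites
        run = run + 1 if site in covered else 0
        best = max(best, run)
    return best


# Toy 10-residue peptide: y2, y3, y5, y6 cover sites 8, 7, 5, 4.
y_only = longest_consecutive_aby(10, [], [], [2, 3, 5, 6])
# A matched b6 ion covers site 6 and joins the two y runs into sites 4..8.
with_b = longest_consecutive_aby(10, [], [6], [2, 3, 5, 6])
```

In this toy case a single b ion raises the series length from 2 to 5, which is the gap-filling effect described above.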
Experiments
In House Data Generation
Samples were reduced and alkylated
using dithiothreitol (1 μg DTT per 20 μg protein) and
iodoacetamide (5 μg per 20 μg protein). Proteins were predigested
with Lys-C at 30 °C for 2 h (1 μg Lys-C per 50 μg
protein in 6 M urea and 12 mM Triethylammonium bicarbonate buffer
(100 mM Ammonium bicarbonate (ABC) buffer for mouse samples)) and
digested overnight with trypsin (Promega, Trypsin Gold, Mass spectrometry
grade) at 37 °C (1 μg trypsin per 30 μg protein,
0.8 M urea in 45 mM Triethylammonium bicarbonate buffer (mouse: 2
M urea with 100 mM ABC buffer)); digestion was stopped by adding concentrated
TFA to a pH of approximately 2. Phosphorylated peptides were enriched
following the in-house TiO2 enrichment protocol,[33] and HeLa peptides were obtained following the in-house
HeLa protocol.[34]
The HPLC system
used was an UltiMate 3000 HPLC RSLC nano system coupled to a Q Exactive
mass spectrometer (Thermo Fisher Scientific, Bremen, Germany), equipped
with a Proxeon nanospray source (Proxeon, Odense, Denmark). Peptides
were loaded onto a trap column (Thermo Fisher Scientific, Bremen,
Germany, PepMap C18, 5 mm × 300 μm ID, 5 μm particles,
100 Å pore size) at a flow rate of 25 μL/min using 0.1%
TFA as mobile phase. After 10 minutes the trap column was switched
in line with the analytical column (Thermo Fisher Scientific, Bremen,
Germany, PepMap C18, 500 mm × 75 μm ID, 3 μm, 100
Å). Peptides were eluted using a flow rate of 230 nL/min. The
eluting peptides were directly analyzed using hybrid quadrupole-orbitrap
mass spectrometers (Q Exactive or Q Exactive Hybrid, Thermo Fisher).
The Q Exactive mass spectrometer was operated in data-dependent mode
using a full scan (m/z range 350–1650 Th,
nominal resolution of 70 000, target value 1E6) followed by MS/MS
scans of the 12 most abundant ions. MS/MS spectra were acquired at
a resolution of 17 500 using a normalized collision energy of 30% and isolation
widths of 2, 4, or 8 m/z, and the target value was set to 5E4. Precursor
ions selected for fragmentation (charge state 2 and higher) were put
on a dynamic exclusion list for 10 s. Additionally, the underfill
ratio was set to 20%, resulting in an intensity threshold of 2E4.
Data Set Description
To assess the quality of the CharmeRT
workflow, we applied it to several different data sets (3 replicates
each, measured on Thermo Q Exactive or Q Exactive Hybrid): several
in-house HeLa tryptic digests with different isolation widths and
different gradient times (data sets A-F, I), an in-house phospho-enriched
HeLa tryptic digest (data set G), and an external HeLa tryptic digest[32] (data set H).HeLa tryptic digest, in-house measurement
(Thermo Q Exactive Hybrid, 1 h gradient (A) and 3 h gradient (B),
2 m/z isolation width, 1 μg, Figure ).
Figure 3
Comparison of identification results of HeLa data sets measured
with various isolation widths and gradient times analyzed with the
CharmeRT workflow. We analyzed triplicates of tryptic HeLa samples
for 2 m/z, 4 m/z, and 8 m/z isolation
width, each either at a gradient time of 1 h or 3 h. Results are given
for 1% FDR calculated at peptide level, showing the (a) number of
identified PSMs in the first and in the second search and (b) number
of unique peptides identified only in the first, only in the second,
and in both searches.
HeLa tryptic digest, in-house measurement
(Thermo Q Exactive Hybrid, 1 h gradient (C) and 3 h gradient (D),
4 m/z isolation width, 1 μg, Figure ).HeLa tryptic digest, in-house measurement
(Thermo Q Exactive Hybrid, 1 h gradient (E) and 3 h gradient (F),
8 m/z isolation width, 1 μg, Figure ).HeLa tryptic digest, in-house measurement,
phospho enrichment (Thermo Q Exactive, 3 h gradient, 2 m/z isolation width, 100 ng, Figure and Figure S1, TiO2 enrichment of phosphorylated peptides).
Figure 4
Comparison of MS Amanda and Elutator with other scoring methods
and validation tools. Comparison was performed using (a) an external
HeLa data set obtained from Michalski et al.[32] (data set H) and (b) an in-house data set of human HeLa after TiO2 enrichment of phosphorylated peptides (data set G). The FDR
threshold of 1% was calculated at PSM level for consistency between
different search tools, which typically operate at PSM level. In cases
where several high confident matches were reported for the same spectrum,
the match with best q-value was selected such that
the number of PSMs corresponds to the number of confidently identified
spectra. Elutator includes features derived from a peptide elution
retention time prediction model. Model training was performed on in–house
data sets, the same model was applied to in-house and external data
sets.
HeLa tryptic digest, external
measurement[32] (Thermo Q Exactive, 90 min
gradient, 4 m/z isolation width,
5 μg, Figure and Figure S2).HeLa tryptic digest, in-house measurement
(Thermo Q Exactive, 3 h gradient, 2 m/z isolation width, 100 ng, Figures and S2)Comparison of identification results of HeLa data sets measured
with various isolation widths and gradient times analyzed with the
CharmeRT workflow. We analyzed triplicates of tryptic HeLa samples
for 2 m/z, 4 m/z, and 8 m/z isolation
width, each either at a gradient time of 1 h or 3 h. Results are given
for 1% FDR calculated at peptide level, showing the (a) number of
identified PSMs in the first and in the second search and (b) number
of unique peptides identified only in the first, only in the second,
and in both searches.Comparison of MS Amanda and Elutator with other scoring methods
and validation tools. Comparison was performed using (a) an external
HeLa data set obtained from Michalski et al.[32] (data set H) and (b) an in-house data set of humanHeLa after TiO2 enrichment of phosphorylated peptides (data set G). The FDR
threshold of 1% was calculated at PSM level for consistency between
different search tools, which typically operate at PSM level. In cases
where several high confident matches were reported for the same spectrum,
the match with best q-value was selected such that
the number of PSMs corresponds to the number of confidently identified
spectra. Elutator includes features derived from a peptide elution
retention time prediction model. Model training was performed on in–house
data sets, the same model was applied to in-house and external data
sets.
Database Search Parameters
When possible, runs were
performed in Proteome Discoverer 1.4, using Mascot version 2.2.7,
MS Amanda v 1.4.14.9288, and Elutator v 1.14.1.236. For results obtained
with pParse, all raw files were preprocessed with pParse version
2.0.8 and the resulting files were submitted to PD 1.4. MaxQuant results were
obtained with version 1.5.5.1, and all settings were set to default
values as this led to the best performance.
The following parameter
settings were used for MS Amanda, Mascot, and MaxQuant: Swiss-Prot
database 2016–06 (human/mouse) including the “cRAP”
contaminants database; trypsin as enzyme; 2 missed cleavages; Carbamidomethyl(C)
as fixed PTM; Oxidation(M) and (for the phosphorylated data set) Phospho(S,T)
as variable modifications. For MS Amanda and Mascot, a 10 ppm precursor
mass tolerance and a 0.02 Da fragment mass tolerance were used.
We applied the following additional settings specific for MS Amanda,
where second search has been enabled: MS1 spectrum deisotoping set
to false; keep y1 ion, remove water losses, remove
ammonia losses, and exclude first precursor set to true; top 5 results
per precursor in Figures 3 and 4/top 10 results per precursor
for Supplemental Figure S7.
For Mascot we set the peptide cutoff score to 0.
The Elutator FDR threshold
was set to 1% on peptide level for results
in Figure 3 and on
PSM level for the experiments in Figure 4. For results in Figure 4, the match with the best q-value was selected when several highly confident matches
were reported for the same spectrum, such that the number of PSMs
corresponds to the number of confidently identified spectra. For all
results obtained using Percolator, numbers were obtained applying
an extra Proteome Discoverer node “Multi-confident PSMs fix”,
available at http://ms.imp.ac.at/?goto=charmert. MaxQuant results were filtered manually.
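The rule of counting only the best match per spectrum at PSM level can be written down in a few lines. The following sketch is illustrative only: the Match record, its field names, and the toy peptides are hypothetical, and it is not the code of the Proteome Discoverer node mentioned above.

```python
from typing import Dict, Iterable, NamedTuple


class Match(NamedTuple):
    spectrum_id: str
    peptide: str
    q_value: float


def best_match_per_spectrum(
    matches: Iterable[Match], threshold: float = 0.01
) -> Dict[str, Match]:
    """Keep, for every spectrum, the confident match with the lowest q-value."""
    best: Dict[str, Match] = {}
    for match in matches:
        if match.q_value > threshold:
            continue  # not confident at the chosen FDR threshold
        current = best.get(match.spectrum_id)
        if current is None or match.q_value < current.q_value:
            best[match.spectrum_id] = match
    return best


# Toy input: scan_1 has two confident matches, scan_3 has none.
toy_matches = [
    Match("scan_1", "PEPTIDEK", 0.004),
    Match("scan_1", "SAMPLER", 0.001),
    Match("scan_2", "ELVISK", 0.009),
    Match("scan_3", "LIVESR", 0.050),
]
selected = best_match_per_spectrum(toy_matches)
```

The number of entries in the result then equals the number of confidently identified spectra, which is the quantity compared between tools at PSM level.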
Results
CharmeRT Performance
To demonstrate the performance
of the CharmeRT workflow, we analyzed HeLa samples using different
isolation widths during acquisition. In standard mass spectrometry
experiments, very narrow isolation widths (≤2 m/z) are applied to decrease the probability of coeluting
peptides. However, being able to reliably identify multiple coeluting
peptides per spectrum opens up new possibilities for peptide identification
and acquisition. By using broader isolation widths, we were able to
considerably increase the numbers of identified peptides at a constant
FDR (Figure 3).
Applying the second search approach increased the number of reliable
identifications for all tested isolation widths and gradient times.
Even for narrow isolation widths (2 m/z) and short gradient times (1 h) we observed a considerable number
of validated chimeric spectra, which increased the number of identified
unique peptides by 41% (5360 unique peptides). As expected, the number
of reliably identified PSMs and peptides in the first search decreases
by 2–15% for broad isolation widths (8 m/z, 14 219 PSMs (1 h)/23 138 PSMs (3 h)) compared to narrow
isolation widths (2 m/z, 14 506
PSMs (1 h)/27 340 PSMs (3 h)), as spectra complexity increases.
This is alleviated by the chimeric approach, which identified almost
the same number of unique peptides (20 438 (1 h)/28 550
(3 h) unique peptides) compared to the 2 m/z isolation width runs (18 566 (1 h)/31 346
(3 h) unique peptides). In our tests an isolation width of 4 m/z combined with a longer gradient resulted
in the highest number of identified peptides (33 138 unique
peptides) and the deepest insight into the investigated sample. This
results not only in further evidence for already identified proteins,
but also in additional proteins unidentified before (Supplemental Figure S6). Similar results can be achieved for
an external data set:[35] analyzing label-free
data acquired at 1.4 m/z isolation
width we see an average increase in PSMs of 75%, whereas for a TMT
data set measured at a very narrow isolation width of 0.4 m/z only a small amount of chimeric spectra
can be identified (see Supplemental Figure S5).
On average, 38% of the reliably identified spectra at 2 m/z isolation width (1 h gradient) were
chimeric spectra (Supplemental Figure S7). This number increases to 53% at an isolation width of 4 m/z (3 h gradient). Additionally, on average,
almost 20% of all reliably identified spectra at 4 m/z contain more than two peptides. Several examples
of randomly drawn identified chimeric spectra of data set D are given
in Supplemental Figures S11–S18.
Comparison to State of the Art Approaches
The combination
of chimeric spectra identification and mPSM validation using the power
of accurate retention time prediction increased the number of identified
PSMs (38 373 PSMs (HeLa)/5463 PSMs (enriched phospho data set)) by
up to 129% and considerably outperformed all other methods (Figure 4, Supplemental Table S3). Compared to the widely used combination
of Mascot and Percolator (17 916 PSMs (HeLa)/4088 PSMs (enriched
phospho data set)), CharmeRT was able to identify 34–114% more
PSMs and 25–62% more unique peptides. Mascot and Percolator
can be additionally improved by using pParse,[36] which enables the detection of mixed spectra (23 841 PSMs
(HeLa)/4488 PSMs (enriched phospho data set)). Still, CharmeRT identified
22–61% more PSMs than this combination.
Compared to a
single search strategy, the CharmeRT approach was able to identify
52–90% more PSMs and 23–45% more peptides. In addition,
29–36% of all validated peptides identified in the first search
could be confirmed using the second search. The efficacy of Elutator
was much higher for matches identified in the second search, as the
spectrum quality for coeluting peptides is lower and therefore the
effect of including auxiliary information used in Elutator is higher:
the increase in PSMs was 17–51% for the first search and 106–149%
for the second search (see Supplemental Table S3 and Supplemental Figure S8).
The overall positive effect of retention time prediction appeared
to be 8–15%. Notably, the RT prediction model was applied to
the externally measured data sets without any additional training.
Only a minor number of mixed spectra can be identified when the
second search approach is used on the phosphorylated sample. The validation
through Elutator leads to 25% additionally identified PSMs in this
case for the conventional single search compared to Mascot + Percolator.
Chemical modifications hamper spectrum identification due to an increased
combinatorial search space. However, only a small number of mixed
spectra is expected in this case, as the enrichment of phosphorylated
peptides with, for example, titanium dioxide (TiO2) reduces
the overall complexity of the sample.
We hypothesized that the
additional peptides identified in the
second search correspond to lower abundant proteins, which typically
are difficult to identify in standard shotgun workflows.[16,17] If this hypothesis could be confirmed, the dynamic range of mass
spectrometry measurements could effectively be expanded. To validate
our assumption, we used publicly available RNA expression profiles
of HeLa proteins.[37] Highly reliable peptides
identified in a single raw file (data set D) with a global peptide
level FDR of 1% from first and second search were used to infer 4696
protein groups (Proteome Discoverer 1.4, no additional filters).
For 4435 (94%) proteins, nonzero HeLa RNA expressions were found.
The remaining proteins mainly correspond to contaminant proteins or
proteins absent in the RNA expression database (Supplemental Table S4). Of the expressed proteins, 885 (20%)
were identified exclusively in the second search. The statistical
distributions of expression levels of proteins identified in the first
search and second search strongly indicate that activating second
search shifts the sensitivity toward lower abundant proteins (Figure 5 and Supplemental Figure S9). As the correlation between
protein and RNA abundance is only about 40%,[38,39] we support this finding by additionally analyzing a publicly available
spike-in data set[40] (see Supplemental Figure S10).
Figure 5
Comparison of protein expression values.
Proteins identified in
the second search (red) correspond in a higher proportion to low abundant
proteins compared to proteins already identified in the first search
(blue). Overall expression values for HeLa cells (gray) have been
taken from ProteinAtlas.[37]
Discussion
We have shown that already
in experiments with narrow isolation
widths (2 m/z, 1 h and 3 h gradient)
a large number of chimeric spectra exists (39%), indicating that coeluting
peptides are a common issue in tandem mass spectra identification.
Still, chimeric spectra generally remain unconsidered, as standard
peptide identification workflows stick to the one-spectrum-one-peptide
approach. By combining chimeric spectra identification and appropriate
validation with retention time prediction, this challenge can be turned
into a major opportunity. We are able to identify up to almost three times
as many PSMs as a standard workflow, leading to an increase
of identified unique peptides of up to 63% at 1% FDR (peptide level).
The CharmeRT workflow allows the use of wider isolation widths, which
enable a deeper insight into measured samples. This indicates a possible
expansion suitable for data-independent measurements (DIA). More importantly,
CharmeRT increases the proteome coverage at unaltered acquisition
time, enabling the identification of low abundant proteins at no extra
cost, except for algorithmic runtime. As proteins with regulatory
functions often occur at low abundance,[41] identifying them is essential for understanding and
investigating cell mechanisms. By applying CharmeRT, we are able to
expand the sensitivity range of mass spectrum analysis.
Availability
CharmeRT is freely available at http://ms.imp.ac.at/?goto=charmert for Proteome Discoverer 1.4 and 2.2. A version for Proteome Discoverer
2.3 and a standalone version are currently in progress and will be
available soon. In addition, a tool for training RT models on user-specific
in-house columns is provided.