Literature DB >> 29863353

CharmeRT: Boosting Peptide Identifications by Chimeric Spectra Identification and Retention Time Prediction.

Viktoria Dorfer¹, Sergey Maltsev, Stephan Winkler¹, Karl Mechtler.

Abstract

Coeluting peptides are still a major challenge for the identification and validation of MS/MS spectra, but carry great potential. To tackle these problems, we have developed the here presented CharmeRT workflow, combining a chimeric spectra identification strategy implemented as part of the MS Amanda algorithm with the validation system Elutator, which incorporates a highly accurate retention time prediction algorithm. For high-resolution data sets this workflow identifies 38-64% chimeric spectra, which results in up to 63% more unique peptides compared to a conventional single search strategy.

Entities: CellLine Chemical Disease Gene Species

Keywords: MS/MS; chimeric spectra; database search; mixed spectra; retention time prediction; tandem mass spectrometry; validation

Mesh：

Substances：
Peptides

Year: 2018 PMID： 29863353 PMCID： PMC6079931 DOI： 10.1021/acs.jproteome.7b00836

Source DB: PubMed Journal: J Proteome Res ISSN： 1535-3893 Impact factor: 4.466

Introduction

Advancements in mass spectrometer instrument precision and acquisition time[1,2] made mass spectrometry the primary instrument in proteomics analyses. The interpretation of the measured spectra is often performed using a database search algorithm.[3−6] Most database search algorithms stick to the “one-spectrum-one-peptide” paradigm, although the occurrence of coeluting peptides and the accompanied challenges of chimeric spectra have been widely studied.[7−9] Even though several solutions for processing chimeric spectra already exist,[10−14] they are still often not used in an everyday proteomics workflow. In addition, the validation of more than one peptide match per spectrum (here called mPSM) is an important task,[15] as the confidence score for the most abundant peptide in a spectrum is not easily comparable to the score of a second coeluting peptide also present in the spectrum. However, through ignoring this valuable information a large amount of unique peptides remains unidentified, as recent studies show that about 50% of all spectra contain more than one peptide.[7,15] In general, the dynamic range of proteins is a big challenge in proteomics experiments.[16] Detecting highly abundant proteins is a lot simpler than identifying the least abundant part of the proteome.[16,17] Many approaches have been conducted to increase proteome coverage and enable deep proteome analysis,[18−25] being more or less straightforward and affordable techniques for an everyday proteomics workflow. We here propose a combination of identifying chimeric spectra and validating detected mPSMs using retention time prediction, jointly leading to a significant increase in validated unique peptides for each data set accompanied by higher coverage of low abundant proteins: the CharmeRT workflow.

Methods

CharmeRT Workflow

The first part of the CharmeRT workflow identifies chimeric spectra using a second search approach in our database search engine MS Amanda.[26] The second part of CharmeRT validates the identified PSMs of first and second searches using Elutator, a newly developed tool based on the principles of Percolator,[27] featuring a new approach for retention time prediction. An overview of the workflow can be seen in Figure .

Figure 1

Overview of the CharmeRT workflow. After a first search round with MS Amanda, spectra are cleaned, potential interfering precursors are identified, and spectra are submitted to a second search round. Resulting PSMs of the first and the second search are validated by Elutator using a retention time model.

Chimeric Spectra Search in MS Amanda

To identify multiple peptides per spectrum, a second search approach was implemented in the database search engine MS Amanda. For each spectrum, all peaks of the highest scoring peptide identified in the first search are removed. Basis of this removal are the selected fragment ions in the search, additionally neutral loss ions can be removed as well. As interfering peptides may have the same c- or n-terminal amino acid due to the used enzyme, leading to a shared y1/b1 ion in a mixed spectrum, y1 ions can optionally be kept, and b1 ions are not considered at all by MS Amanda. Tests showed that all other potentially shared peaks can be neglected, as they are very unlikely. We identified an average overlap of 0.7%, see Supplemental Table S2. Corresponding MS1 spectra are investigated and potential interfering precursors are determined, optionally performing a preceding deisotoping of the MS1 spectrum. There are several ways to treat precursor peaks where the charge state cannot be determined: not considering them, testing various selectable charge states, or only testing the most abundant ones of them at different charge states. All spectra are submitted to a further search lap testing each of the identified precursors with the option to research the original precursor. For each spectrum, multiple second search hits, i.e., the best n PSMs for the top m precursors, are reported.

mPSM Validation in Elutator

The second part of the CharmeRT workflow is realized by Elutator, a new tool for validating identified mPSMs. Elutator is based on the principles of Percolator[27] and validates mPSMs using a set of features optimized for the analysis of MS Amanda results. A complete list of all used features is given in the supplemental data (Supplemental Table S1), including the deviation of an estimated peptide elution retention time (RT) from the actual value, as well as recalibrated masses for precursor and fragment ions. The most important features are explained in the next sections.

Elutator Retention Time Prediction Model

An important factor in the context of validating mPSMs is the difference between predicted and measured retention times. Several approaches already exist to construct RT prediction models.[28−30] However, the use of these models for validation is often limited due to specific requirements, such as, a significant amount of training data and correct handling of chemical modifications. We have therefore developed a new retention time prediction algorithm: Elutator’s RT model is based on the SSRCalc[30] model and estimates the hydrophobicity index of peptides based on their sequences and chemical modifications, which can be linearly mapped to retention time. It was significantly redesigned and extended for better performance but preserves most of the features and ideas of the original SSRCalc algorithm. The features used for predicting the model include peptide length, certain properties for special amino acids (e.g., Proline), the isoelectric charge, properties for short peptides, or parameters for hydrophobic amino acid patterns likely forming helices, and are similar to the features described by Krokhin.[31] An important improvement compared to the original model of SSRCalc for retention time prediction is the consideration of neighboring effects of amino residuals being not restricted to nearest neighbors only. Experiments showed a statistically significant effect of amino residual interactions even for residuals separated by several positions in the polypeptide chain. A detailed description on how we model these interactions is given in the Supporting Information. The described features are used in an optimized nonlinear retention time model implemented in Elutator. The original formulation of the model was given by Krokhin for SSRCalc.[31] The parameters (coefficients) of the used model are optimized using Newton’s method of minimizing the sum of squares of retention time deviations for all peptides in the training sets. A detailed information on the model calculation is given in the supplemental data. The optimization procedure assumes simultaneous training over several different data sets measured under similar elution conditions (gradient duration, chemical composition of eluents, column temperature, etc.). To avoid overfitting, the retention time model has been trained using 94122 highly reliable PSMs (FDR threshold was 0.001) corresponding to 44 271 unique sequences obtained from in-house measured data sets of different organisms: trypsin digested human (HeLa), mouse, yeast, B. subtilis, E. coli, phosphorylated peptides from TiO2 enriched human cell lysate, and chymotrypsin digested human data set. After a preliminary optimization, we removed 0.1% of the outliers, corresponding to the number of expected false matches, and repeated the optimization. By considering additional peptide properties, such as the interactions of neighboring amino residuals in the peptide chain, we considerably increased the RT prediction accuracy (Figure ). A similar accuracy of retention time prediction was achieved for phosphorylated and unmodified peptides (see Supplemental Figure S1).

Figure 2

Comparison of elution retention time prediction models: (a) Elutator, (b) BioLCCC,[29] (c) SSRCalc,[30] and (d) Elude.[28] Depending upon the model design the output is either an absolute retention time or a relative hydrophobicity index, which can be linearly mapped to the retention time in a particular data set. We here compare the correlation of predicted and measured retention times of data set I, which is important for validation. R2 is the coefficient of determination, and is the dispersion of the error in minutes. As Elude cannot be trained on multiple raw files, we here used 50% randomly chosen PSMs over all raw files for training and the others for testing. For practical usage, the applicability of the trained model on data sets measured under a different chromatographic setup is of high interest. Elutator maps the predicted hydrophobicity index to the observed retention time by applying a linear fitting for all peptides in a single HPLC run. This allows for an application to data sets with different setups. We investigated this using a publicly available externally measured HeLa data set.[32] The accuracy of the retention time prediction is lower for the external data set, as can be seen through the correlation coefficient R2. Nevertheless, as demonstrated in Supplemental Figure S2, using retention time prediction also here leads to a higher number of PSMs. Smaller retention time dispersion for the external data set can be explained by the shorter gradient (90 min versus 180 min for the in-house data set). The smaller gradient duration leads to a proportional decrease of retention time deviations. Alternatively, a new model can be easily trained for specific elution conditions using the Elutator RT Trainer (see Availability).

Combined Retention Time Score

Besides the deviation of the predicted RT to the measured RT, Elutator also uses the combined retention time score as feature for mPSM validation. It includes the PSM score of the search engine and the retention time deviation obtained from the retention time model. To calculate a combined score, the MS Amanda score is recalibrated on the posterior error using linear regression to define coefficients a and b using the modelwhere f(A) is the probability for a match with score A to be false (i.e., local FDR), and A is the MS Amanda score. After this calibration, the combined score is calculated using the following scoring function:where σ is the dispersion of the predicted retention time, calculated considering highly reliable matches (FDR = 0.001), T is the duration of the linear part of the gradient, erf is the Gauss error function, and ε is defined as , where Δt is the retention time deviation from the predicted value for the scored peptide.

Calibration of Mass Differences

The aim of calibrating mass differences is to eliminate constant biases in mass measurements for precursors and fragments to enhance the mass resolution and is included as additional feature for mPSM validation. In Elutator this calibration is based on theoretically known masses of highly reliable matches of the first search (FDR = 0.001, calculated on MS Amanda score). Recalibration can be done for measured deviations of m/z values, , as well as for relative mass deviations, Δmppm. Elutator uses the following approximation of mass deviations over retention time t and m/z to determine the calibration coefficients a, b, and c: Results of mass recalibration for a human data set[32] are presented in Supplemental Figure S3. This data set was analyzed with lock mass disabled (available in Q Exactive instruments, Thermo Fisher Scientific). Constant bias and variable error seemed to be similar in this case. Activating the lock mass option partly eliminates a constant bias, but increases a variable error because it is based on measuring the mass of known ions present in the spectrum. Therefore, we suggest that disabling the lock mass is preferable for better mass resolution when PSM validation by Elutator is used.

Longest Consecutive Series A + B + Y

We introduce a combined consecutive sequence of N- and C-terminal ions as additional feature for validation, namely the sequence of a, b, and y ions, which typically constitute HCD/CID spectra. PSMs with scores close to the FDR threshold contain relatively few matched fragment peaks; therefore, y ions are likely not able to form any consecutive sequence. However, longer sequences can be potentially constructed by taking into account a and b ions, which fill gaps between y fragments (see Supplemental Figure S4).

Experiments

In House Data Generation

Samples were reduced and alkylated using dithiothreotiol (1 μg DTT per 20 μg protein) and iodacetamide (5 μg per 20 μg protein). Proteins were predigested with Lys-C at 30 °C for 2 h (1 μg Lys-C per 50 μg protein in 6 M urea and 12 mM Triethylammonium bicarbonate buffer (100 mM Ammonium bicarbonate (ABC) buffer for mouse samples)) and digested overnight with trypsin (Promega, Trypsin Gold, Mass spectrometry grade) at 37 °C (1 μg trypsin per 30 μg protein, 0.8 M urea in 45 mM Triethylammonium bicarbonate buffer (mouse: 2 M urea with 100 mM ABC buffer)); digestion was stopped by adding concentrated TFA to a pH of approximately 2. Phosphorylated peptides were enriched following the in-house TiO2 enrichment protocol,[33] HeLa peptides were obtained following the in-house HeLa protocol.[34] The HPLC system used was an UltiMate 3000 HPLC RSLC nano system coupled to an Q Exactive mass spectrometer (Thermo Fisher Scientific, Bremen, Germany), equipped with a Proxeon nanospray source (Proxeon, Odense, Denmark). Peptides were loaded onto a trap column (Thermo Fisher Scientific, Bremen, Germany, PepMap C18, 5 mm × 300 μm ID, 5 μm particles, 100 Å pore size) at a flow rate of 25 μL/min using 0.1% TFA as mobile phase. After 10 minutes the trap column was switched in line with the analytical column (Thermo Fisher Scientific, Bremen, Germany, PepMap C18, 500 mm × 75 μm ID, 3 μm, 100 Å). Peptides were eluted using a flow rate of 230 nL/min. The eluting peptides were directly analyzed using hybrid quadrupole-orbitrap mass spectrometers (Q Exactive or Q Exactive Hybrid, Thermo Fisher). The Q Exactive mass spectrometer was operated in data-dependent mode using a full scan (m/z range 350–1650Th, nominal resolution of 70 000, target value 1E6) followed by MS/MS scans of the 12 most abundant ions. MS/MS spectra were acquired at a resolution of 17 500 using normalized collision energy 30%, isolation widths of 2, 4, or 8, and the target value was set to 5E4. Precursor ions selected for fragmentation (charge state 2 and higher) were put on a dynamic exclusion list for 10 s. Additionally, the underfill ratio was set to 20%, resulting in an intensity threshold of 2E4.

Data Set Description

To assess the quality of the CharmeRT workflow, we applied it to several different data sets (3 replicates each, measured on Thermo Q Exactive or Q Exactive Hybrid): several in-house HeLa tryptic digests with different isolation widths and different gradient times (data sets A-F, I), an in-house phospho-enriched HeLa tryptic digest (data set G), and an external HeLa tryptic digest[32] (data set H). HeLa tryptic digest, in-house measurement (Thermo Q Exactive Hybrid, 1 h gradient (A) and 3 h gradient (B), 2 m/z isolation width, 1 μg, Figure ).

Figure 3

Comparison of identification results of HeLa data sets measured with various isolation widths and gradient times analyzed with the CharmeRT workflow. We analyzed triplicates of tryptic HeLa samples for 2 m/z, 4 m/z, and 8 m/z isolation width, each either at a gradient time of 1 h or 3 h. Results are given for 1% FDR calculated at peptide level, showing the (a) number of identified PSMs in the first and in the second search and (b) number of unique peptides identified only in the first, only in the second, and in both searches.

HeLa tryptic digest, in-house measurement (Thermo Q Exactive Hybrid, 1 h gradient (C) and 3 h gradient (D), 4 m/z isolation width, 1 μg, Figure ). HeLa tryptic digest, in-house measurement (Thermo Q Exactive Hybrid, 1 h gradient (E) and 3 h gradient (F), 8 m/z isolation width, 1 μg, Figure ). HeLa tryptic digest, in-house measurement, phospho enrichment (Thermo Q Exactive, 3 h gradient, 2 m/z isolation width, 100 ng, Figure and Figure S1, TiO2 enrichment of phosphorylated peptides).

Figure 4

Comparison of MS Amanda and Elutator with other scoring methods and validation tools. Comparison was performed using (a) an external HeLa data set obtained from Michalski et al.[32] (data set H) and (b) an in-house data set of human HeLa after TiO2 enrichment of phosphorylated peptides (data set G). The FDR threshold of 1% was calculated at PSM level for consistency between different search tools, which typically operate at PSM level. In cases where several high confident matches were reported for the same spectrum, the match with best q-value was selected such that the number of PSMs corresponds to the number of confidently identified spectra. Elutator includes features derived from a peptide elution retention time prediction model. Model training was performed on in–house data sets, the same model was applied to in-house and external data sets.

HeLa tryptic digest, external measurement[32] (Thermo Q Exactive, 90 min gradient, 4 m/z isolation width, 5 μg, Figure and Figure S2). HeLa tryptic digest, in-house measurement (Thermo Q Exactive, 3 h gradient, 2 m/z isolation width, 100 ng, Figures and S2) Comparison of identification results of HeLa data sets measured with various isolation widths and gradient times analyzed with the CharmeRT workflow. We analyzed triplicates of tryptic HeLa samples for 2 m/z, 4 m/z, and 8 m/z isolation width, each either at a gradient time of 1 h or 3 h. Results are given for 1% FDR calculated at peptide level, showing the (a) number of identified PSMs in the first and in the second search and (b) number of unique peptides identified only in the first, only in the second, and in both searches. Comparison of MS Amanda and Elutator with other scoring methods and validation tools. Comparison was performed using (a) an external HeLa data set obtained from Michalski et al.[32] (data set H) and (b) an in-house data set of human HeLa after TiO2 enrichment of phosphorylated peptides (data set G). The FDR threshold of 1% was calculated at PSM level for consistency between different search tools, which typically operate at PSM level. In cases where several high confident matches were reported for the same spectrum, the match with best q-value was selected such that the number of PSMs corresponds to the number of confidently identified spectra. Elutator includes features derived from a peptide elution retention time prediction model. Model training was performed on in–house data sets, the same model was applied to in-house and external data sets.

Database Search Parameters

When possible, runs have been performed in Proteome Discoverer 1.4, using Mascot version 2.2.7, MS Amanda v 1.4.14.9288, and Elutator v 1.14.1.236. For results obtained with pParse, all raw files have been preprocessed with pParse version 2.0.8 and resulting files submitted to PD 1.4. MaxQuant results were obtained with version 1.5.5.1, and all settings were set to default values as this lead to the best performance. The following parameter settings have been used for MS Amanda, Mascot, and MaxQuant: swissprot database 2016–06 (human/mouse) including the “cRAP” contaminants database; trypsin as enzyme; 2 missed cleavages; Carbamidomethyl(C) as fixed PTM; Oxidation(M) and (for the phosphorylated data set) Phospho(S,T) as variable modifications. For MS Amanda and Mascot 10 ppm precursor mass tolerance and 0.02 Da fragment mass tolerance were used. We applied the following additional settings specific for MS Amanda, where second search has been enabled: MS1 spectrum deisotoping set to false; keep y1 ion, remove water losses, remove ammonia losses, and exclude first precursor set to true; top 5 results per precursor in Figures and 4/top 10 results per precursor for Supplemental Figure S7. For Mascot we set the peptide cutoff score to 0. The Elutator FDR threshold was set to 1% on peptide level for results in Figure and on PSM level for the experiments in Figure . For results in Figure , the match with the best q-value was selected in a case when several high confident matches were reported for the same spectrum, such that the number of PSMs corresponds to the number of confidently identified spectra. For all results obtained using Percolator, numbers were obtained applying an extra Proteome Discoverer node “Multi-confident PSMs fix”, available at http://ms.imp.ac.at/?goto=charmert. MaxQuant results were filtered manually.

Results

CharmeRT Performance

To demonstrate the performance of the CharmeRT workflow, we analyzed HeLa samples using different isolation widths during acquisition. In standard mass spectrometry experiments, very narrow isolation widths (≤2 m/z) are applied to decrease the probability of coeluting peptides. However, being able to reliably identify multiple coeluting peptides per spectra reveals new possibilities for peptide identification and acquisition. By using broader isolation widths, we were able to considerably increase the numbers of identified peptides at a constant FDR (Figure ). Applying the second search approach increased the number of reliable identifications for all tested isolation widths and gradient times. Even for narrow isolation widths (2 m/z) and small gradient times (1 h) we observed a considerable number of validated chimeric spectra, which increased the number of identified unique peptides by 41% (5360 unique peptides). As expected, the amount of reliably identified PSMs and peptides in the first search decreases by 2–15% for broad isolation widths (8 m/z, 14219 PSMs (1 h)/23138 PSMs (3 h)) compared to narrow isolation widths (2 m/z, 14 506 PSMs (1 h)/27 340 PSMs (3 h)), as spectra complexity increases. This is alleviated by the chimeric approach, which identified almost the same number of unique peptides (20 438 (1 h)/28 550 (3 h) unique peptides) compared to the 2 m/z isolation width runs (18 566 (1 h)/31 346 (3 h) unique peptides). In our tests an isolation width of 4 m/z combined with a longer gradient resulted in the highest number of identified peptides (33 138 unique peptides) and the deepest insight into the investigated sample. This results not only in further evidence for already identified proteins, but also in additional proteins unidentified before (Supplemental Figure S6). Similar results can be achieved for an external data set:[35] analyzing label-free data acquired at 1.4 m/z isolation width we see an average increase in PSMs of 75%, whereas for a TMT data set measured at a very narrow isolation width of 0.4 m/z only a small amount of chimeric spectra can be identified (see Supplemental Figure S5). On average, 38% of the reliably identified spectra at 2 m/z isolation width (1 h gradient) were chimeric spectra (Supplemental Figure S7). This number increases to 53% at an isolation width of 4 m/z (3 h gradient). Additionally, on average, almost 20% of all reliably identified spectra at 4 m/z contain more than two peptides. Several examples of randomly drawn identified chimeric spectra of data set D are given in Supplemental Figures S11–S18.

Comparison to State of the Art Approaches

The combination of chimeric spectra identification and mPSM validation using the power of accurate retention time prediction increased the number of identified PSMs (38373 PSMs (HeLa)/5463 PSMs (enriched phospho data set)) by up to 129% and considerably outperformed all other methods (Figure , Supplemental Table S3). Compared to the widely used combination of Mascot and Percolator (17 916 PSMs (HeLa)/4088 PSMs (enriched phospho data set)), CharmeRT was able to identify 34–114% more PSMs and 25–62% more unique peptides. Mascot and Percolator can be additionally improved by using pParse,[36] which enables the detection of mixed spectra (23 841 PSMs (HeLa)/4488 PSMs (enriched phospho data set)). Still, CharmeRT identified 22–61% more PSMs than this combination. Compared to a single search strategy, the CharmeRT approach was able to identify 52–90% more PSMs and 23–45% more peptides. In addition, 29–36% of all validated peptides identified in the first search could be confirmed using the second search. The efficacy of Elutator was much higher for matches identified in the second search, as the spectrum quality for coeluting peptides is lower and therefore the effect of including auxiliary information used in Elutator is higher: the increase in PSMs was 17–51% for the first search and 106–149% for the second search (see Supplemental Table S3 and Supplemental Figure S8). The overall positive effect of retention time prediction appeared to be 8–15%. Notably, the RT prediction model was applied to the externally measured data sets without any additional training. Only a minor amount of mixed spectra can be identified when the second search approach is used on phosphorylated sample. The validation through Elutator leads to 25% additionally identified PSMs in this case for the conventional single search compared to Mascot + Percolator. Chemical modifications hamper spectrum identification due to an increased combinatorial search space. However, only a small number of mixed spectra is expected in this case, as the enrichment of phosphorylated peptides with, for example, titanium dioxide (TiO2) reduces the overall complexity of the sample. We hypothesized that the additional peptides identified in the second search correspond to lower abundant proteins, which typically are difficult to be identified in standard shotgun workflows.[16,17] If this hypothesis could be confirmed, the dynamic range of mass spectrometry measurements could effectively be expanded. To validate our assumption, we used publicly available RNA expression profiles of HeLa proteins.[37] High reliable peptides identified in a single raw file (data set D) with a global peptide level FDR of 1% from first and second search were used to infer 4696 protein groups (Proteome Discoverer 1.4, no additional filters). For 4435 (94%) proteins, nonzero HeLa RNA expressions were found. The remaining proteins mainly correspond to contaminant proteins or proteins absent in the RNA expression database (Supplemental Table S4). Of the expressed proteins, 885 (20%) were identified exclusively in the second search. The statistical distributions of expression levels of proteins identified in the first search and second search strongly indicate that activating second search shifts the sensitivity toward lower abundant proteins (Figure and Supplemental Figure S9). As the correlation between protein and RNA abundance is only about 40%,[38,39] we support this finding by additionally analyzing a publicly available spike in data set[40] (see Supplemental Figure S10).

Figure 5

Comparison of protein expression values. Proteins identified in the second search (red) correspond in a higher proportion to low abundant proteins compared to proteins already identified in the first search (blue). Overall expression values for HeLa cells (gray) have been taken from ProteinAtlas.[37]

Discussion

We have shown that already in experiments with narrow isolation widths (2 m/z, 1 h and 3 h gradient) a large number of chimeric spectra exists (39%), indicating that coeluting peptides are a common issue in tandem mass spectra identification. Still, chimeric spectra generally remain unconsidered, as standard peptide identification workflows stick to the one-peptide-one-spectrum approach. By combining chimeric spectra identification and appropriate validation with retention time prediction, this challenge can be turned into a major chance. We are able to identify almost up to three-times as many PSMs as compared to a standard workflow, leading to an increase of identified unique peptides of up to 63% at 1% FDR (peptide level). The CharmeRT workflow allows the use of wider isolation widths, which enable a deeper insight into measured samples. This indicates a possible expansion suitable for data-independent measurements (DIA). More importantly, CharmeRT increases the proteome coverage at unaltered acquisition time, enabling the identification of low abundant proteins at no extra cost, except for algorithmic runtime. As proteins with regulatory functions often occur at low abundance,[41] identifying them is essentially important for understanding and investigating cell mechanisms. By applying CharmeRT, we are able to expand the sensitivity range of mass spectrum analysis.

Availability

CharmeRT is freely available at http://ms.imp.ac.at/?goto=charmert for Proteome Discoverer 1.4 and 2.2. A version for Proteome Discoverer 2.3 and a standalone version are currently in progress and will be available soon. In addition, a tool for training RT models on user specific in-house columns is provided.

41 in total

1. pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra.

Authors: Zuo-Fei Yuan; Chao Liu; Hai-Peng Wang; Rui-Xiang Sun; Yan Fu; Jing-Fen Zhang; Le-Heng Wang; Hao Chi; You Li; Li-Yun Xiu; Wen-Ping Wang; Si-Min He
Journal: Proteomics Date: 2011-12-20 Impact factor: 3.984

2. Proteomics. Tissue-based map of the human proteome.

Authors: Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal: Science Date: 2015-01-23 Impact factor: 47.728

Review 3. The challenge of the proteome dynamic range and its implications for in-depth proteomics.

Authors: Roman A Zubarev
Journal: Proteomics Date: 2013-03 Impact factor: 3.984

4. MixGF: spectral probabilities for mixture spectra from more than one peptide.

Authors: Jian Wang; Philip E Bourne; Nuno Bandeira
Journal: Mol Cell Proteomics Date: 2014-09-15 Impact factor: 5.911

5. Mass-spectrometry-based draft of the human proteome.

Authors: Mathias Wilhelm; Judith Schlegl; Hannes Hahne; Amin Moghaddas Gholami; Marcus Lieberenz; Mikhail M Savitski; Emanuel Ziegler; Lars Butzmann; Siegfried Gessulat; Harald Marx; Toby Mathieson; Simone Lemeer; Karsten Schnatbaum; Ulf Reimer; Holger Wenschuh; Martin Mollenhauer; Julia Slotta-Huspenina; Joos-Hendrik Boese; Marcus Bantscheff; Anja Gerstmair; Franz Faerber; Bernhard Kuster
Journal: Nature Date: 2014-05-29 Impact factor: 49.962

6. Quantifying the impact of chimera MS/MS spectra on peptide identification in large-scale proteomics studies.

Authors: Stephane Houel; Robert Abernathy; Kutralanathan Renganathan; Karen Meyer-Arendt; Natalie G Ahn; William M Old
Journal: J Proteome Res Date: 2010-08-06 Impact factor: 4.466

7. An Optimized Shotgun Strategy for the Rapid Generation of Comprehensive Human Proteomes.

Authors: Dorte B Bekker-Jensen; Christian D Kelstrup; Tanveer S Batth; Sara C Larsen; Christa Haldrup; Jesper B Bramsen; Karina D Sørensen; Søren Høyer; Torben F Ørntoft; Claus L Andersen; Michael L Nielsen; Jesper V Olsen
Journal: Cell Syst Date: 2017-06-07 Impact factor: 10.304

8. DeMix workflow for efficient identification of cofragmented peptides in high resolution data-dependent tandem mass spectrometry.

Authors: Bo Zhang; Mohammad Pirmoradian; Alexey Chernobrovkin; Roman A Zubarev
Journal: Mol Cell Proteomics Date: 2014-08-06 Impact factor: 5.911

9. 2016 update of the PRIDE database and its related tools.

Authors: Juan Antonio Vizcaíno; Attila Csordas; Noemi del-Toro; José A Dianes; Johannes Griss; Ilias Lavidas; Gerhard Mayer; Yasset Perez-Riverol; Florian Reisinger; Tobias Ternent; Qing-Wei Xu; Rui Wang; Henning Hermjakob
Journal: Nucleic Acids Res Date: 2015-11-02 Impact factor: 16.971

10. Proteogenomics connects somatic mutations to signalling in breast cancer.

Authors: Philipp Mertins; D R Mani; Kelly V Ruggles; Michael A Gillette; Karl R Clauser; Pei Wang; Xianlong Wang; Jana W Qiao; Song Cao; Francesca Petralia; Emily Kawaler; Filip Mundt; Karsten Krug; Zhidong Tu; Jonathan T Lei; Michael L Gatza; Matthew Wilkerson; Charles M Perou; Venkata Yellapantula; Kuan-lin Huang; Chenwei Lin; Michael D McLellan; Ping Yan; Sherri R Davies; R Reid Townsend; Steven J Skates; Jing Wang; Bing Zhang; Christopher R Kinsinger; Mehdi Mesri; Henry Rodriguez; Li Ding; Amanda G Paulovich; David Fenyö; Matthew J Ellis; Steven A Carr
Journal: Nature Date: 2016-05-25 Impact factor: 49.962

16 in total

1. Bolt: a New Age Peptide Search Engine for Comprehensive MS/MS Sequencing Through Vast Protein Databases in Minutes.

Authors: Amol Prakash; Shadab Ahmad; Swetaketu Majumder; Conor Jenkins; Ben Orsburn
Journal: J Am Soc Mass Spectrom Date: 2019-08-26 Impact factor: 3.109

2. Understanding and Eliminating the Detrimental Effect of Thiamine Deficiency on the Oleaginous Yeast Yarrowia lipolytica.

Authors: Caleb Walker; Seunghyun Ryu; Richard J Giannone; Sergio Garcia; Cong T Trinh
Journal: Appl Environ Microbiol Date: 2020-01-21 Impact factor: 4.792

3. DeepLC can predict retention times for peptides that carry as-yet unseen modifications.

Authors: Robbin Bouwmeester; Ralf Gabriels; Niels Hulstaert; Lennart Martens; Sven Degroeve
Journal: Nat Methods Date: 2021-10-28 Impact factor: 28.547

Review 4. Progress and Challenges in Ocean Metaproteomics and Proposed Best Practices for Data Sharing.

Authors: Mak A Saito; Erin M Bertrand; Megan E Duffy; David A Gaylord; Noelle A Held; William Judson Hervey; Robert L Hettich; Pratik D Jagtap; Michael G Janech; Danie B Kinkade; Dagmar H Leary; Matthew R McIlvin; Eli K Moore; Robert M Morris; Benjamin A Neely; Brook L Nunn; Jaclyn K Saunders; Adam I Shepherd; Nicholas I Symmonds; David A Walsh
Journal: J Proteome Res Date: 2019-03-12 Impact factor: 4.466

5. Global Identification of Post-Translationally Spliced Peptides with Neo-Fusion.

Authors: Zach Rolfs; Stefan K Solntsev; Michael R Shortreed; Brian L Frey; Lloyd M Smith
Journal: J Proteome Res Date: 2018-10-31 Impact factor: 4.466

6. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics.

Authors: Matthew The; Lukas Käll
Journal: Nat Commun Date: 2020-06-26 Impact factor: 14.919

7. DART-ID increases single-cell proteome coverage.

Authors: Albert Tian Chen; Alexander Franks; Nikolai Slavov
Journal: PLoS Comput Biol Date: 2019-07-01 Impact factor: 4.475

8. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis.

Authors: Bo Wen; Kai Li; Yun Zhang; Bing Zhang
Journal: Nat Commun Date: 2020-04-09 Impact factor: 14.919

9. Chimera Spectrum Diagnostics for Peptides Using Two-Dimensional Partial Covariance Mass Spectrometry.

Authors: Taran Driver; Nikhil Bachhawat; Leszek J Frasinski; Jonathan P Marangos; Vitali Averbukh; Marina Edelson-Averbukh
Journal: Molecules Date: 2021-06-18 Impact factor: 4.411

10. Genetic and behavioral adaptation of Candida parapsilosis to the microbiome of hospitalized infants revealed by in situ genomics, transcriptomics, and proteomics.

Authors: Patrick T West; Samantha L Peters; Matthew R Olm; Feiqiao B Yu; Haley Gause; Yue Clare Lou; Brian A Firek; Robyn Baker; Alexander D Johnson; Michael J Morowitz; Robert L Hettich; Jillian F Banfield
Journal: Microbiome Date: 2021-06-21 Impact factor: 14.650