Let me infuse this for you - A way to solve the first YPIC challenge.

Britta Eggers¹, Sandra Pacharra¹, Martin Eisenacher¹, Katrin Marcus¹, Julian Uszkoreit¹.

Abstract

In a typical proteomics analysis today, the origin of the sample in the vial is known, so a database-dependent approach can be used to identify the peptides it contains. The first YPIC challenge, though, provided us with 19 synthetic peptides, which together formed an English sentence. To identify these peptides, we used a de-novo approach, which, together with an internet search engine, led us to the hidden sentence. Having only the sentence was not sufficient for us, however: we also wanted to identify as many of the spectra in our data as possible. Therefore, we derived and refined a database approach from the de-novo results and could finally identify the peptide sentence with good overlap.

Year: 2019 PMID: 31890549 PMCID: PMC6924283 DOI: 10.1016/j.euprot.2019.07.007

Source DB: PubMed Journal: EuPA Open Proteom ISSN: 2212-9685

Introduction

The EuPA (European Proteomics Association) Young Proteomics Investigators Club (YPIC) prepared a challenge for its members. The task sounded very simple at first: you will be provided with a solution of 19 synthetic peptides, which together form an English sentence. The participants were free to choose any mass spectrometric proteomics approach to find this sentence and identify the peptides in the vial. But the devil was in the detail: while a database approach is most commonly used in proteomics to identify peptides, this was not possible here, as we had no known biological species representing “English language” and no hint as to what sentence the peptides might build. Therefore, a less widely used de-novo approach had to be used for the spectrum annotation. Finally, by mixing de-novo and traditional database approaches, we were able to find the hidden sentence from a well-known book, though unfortunately we were not able to identify all peptides with our measurements.

Material and methods

Sample preparation

The provided synthetic peptides were kept in 30% acetonitrile (ACN). According to the accompanying description, the mixture combined 19 peptides in total, at roughly 0.5 nmol per peptide. Prior to MS analysis, the peptide mixture was further diluted to ensure proper measurements and to prevent overloading of the column. Peptide measurements were performed with 15-fold and 30-fold dilutions of the sample, leading to concentrations of approximately 15.83 pmol/μL (15-fold) and 7.9 pmol/μL (30-fold).
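The two reported concentrations can be cross-checked against each other with a few lines of arithmetic. This is only a minimal sketch: the undiluted concentration and the sample volume are not stated in the text, so both are back-calculated here from the reported 15-fold value and should be read as derived assumptions, not as measured quantities.

```python
# Reported in the text
n_peptides = 19
nmol_per_peptide = 0.5          # "roughly 0.5 nmol" per peptide
conc_15_fold = 15.83            # pmol/µL after 15-fold dilution

# Derived (assumption: simple volumetric dilution of one stock solution)
stock_conc = conc_15_fold * 15              # pmol/µL, implied undiluted concentration
conc_30_fold = stock_conc / 30              # pmol/µL, should match the reported ~7.9
total_pmol = n_peptides * nmol_per_peptide * 1000
implied_volume_ul = total_pmol / stock_conc  # µL, not stated in the text

print(f"implied stock concentration: {stock_conc:.2f} pmol/µL")
print(f"30-fold dilution:            {conc_30_fold:.2f} pmol/µL")
print(f"implied stock volume:        {implied_volume_ul:.1f} µL")
```

The 30-fold value is simply half of the 15-fold value, which agrees with the roughly 7.9 pmol/μL given above.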

Label free data dependent acquisition

The nano HPLC analysis was performed on an UltiMate 3000 RSLC nano LC system (Dionex, Idstein, Germany) as described in [1]. The HPLC system was online-coupled to the nano ESI source of a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Germany). Full MS spectra were scanned in a range between 350 and 1,400 m/z with a resolution of 60,000 at 200 m/z for the detection of precursor ions (AGC target 3 × 10⁶, 80 ms maximum injection time). The spray voltage was set to 1,500 V (+), and the capillary temperature to 275 °C. Lock mass polydimethylcyclosiloxane (445.120 m/z) was used for recalibration. The m/z values initiating MS/MS were set on a dynamic exclusion list for 30 s, and the top ten most intense ions (charge state +2, +3, +4) were selected for fragmentation. MS/MS fragments were generated by higher-energy collisional dissociation (HCD) with a normalized collision energy (NCE) of 28%, a fixed first mass of 100.0 m/z and an isolation window of 1.6 m/z. The fragments were analysed in an orbitrap analyser with a resolution of 30,000 at 200 m/z (AGC target 1 × 10⁶, maximum injection time 120 ms). In total, we used only two LC–MS/MS measurements for the analysis of the YPIC samples.

Direct infusion analysis

Samples were loaded into a 250 μL Hamilton syringe, injected by a syringe pump (flow rate 3 μL/min) into the HESI source, and measured for 2.5 min with a full MS dd MS² method on a Q Exactive HF mass spectrometer (Thermo Scientific). In the ESI-MS/MS analysis, full MS spectra were scanned in a range between 350 and 1,400 m/z with a resolution of 60,000 at 200 m/z for the detection of precursor ions (AGC target 3 × 10⁶, 80 ms maximum injection time). The spray voltage was set to 1,500 V (+), and the capillary temperature to 275 °C. Lock mass polydimethylcyclosiloxane (445.120 m/z) was again used for internal recalibration. The m/z values initiating MS/MS were set on a dynamic exclusion list for 5 s, and the top ten most intense ions (all charge states except unassigned and 1) were selected for fragmentation. Fragmentation experiments used an NCE of either 25%, 28% or 30%, as well as a stepped gradient going from 26% through 27% up to 29% NCE. For the YPIC analysis we used five of these direct infusion measurements.

Data analysis

Given the task and the fact that the synthetic peptides were actual English words rather than common peptides, it was clear that we had to use a de novo spectrum identification method instead of the usual database searches. It might have been interesting to create a FASTA file for a complete English dictionary using the given translation code and special modifications for some letters, but in the end we decided to use the free and open source tool DeNovoGUI [2] (version 1.15.11) for the data interpretation. Prior to the actual analysis, the recorded RAW files were converted into mzML and MGF using ProteoWizard’s msConvert [3]. The resulting MGF of one of the intuitively best-looking LC–MS/MS runs was fed into DeNovoGUI using the pNovo [4] and Novor [5] algorithms. As search parameters we initially used the strict default settings for the Q Exactive HF: a parent mass tolerance of 5 ppm and a fragment mass tolerance of 20 mmu. As we were told that there might be some modifications which should also be interpreted as special letters in the final solution, we allowed the variable modifications acetylated lysine, phosphorylated serine and methylated arginine, besides the obligatory oxidation of methionine. With these settings, DeNovoGUI was able to identify 6616 spectra, although most of these identifications were rather poor. Nevertheless, we went further and inspected the best hits visually. Here, it helped to know that we actually had only 19 peptides in the mixture, so in an ideal world we would have had only 19 different parent masses to inspect. Even though there were more masses in the results due to fragments and possibly synthesis artefacts, the number of these spectra was limited and therefore feasible for a first inspection. We also found out relatively quickly that the Novor results seemed to be more accurate than the pNovo results.
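To give a feeling for how strict these tolerances are, the relative parent tolerance can be converted into an absolute m/z window. The following is a minimal sketch, not part of the DeNovoGUI configuration; the function names and the example value of 500 m/z are purely illustrative assumptions. The fragment tolerance of 20 mmu is already absolute (1 mmu = 0.001 Da).

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed_mz, theoretical_mz, tol_ppm):
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tol_ppm


PARENT_TOL_PPM = 5.0        # parent mass tolerance from the text
FRAGMENT_TOL_DA = 20e-3     # 20 mmu expressed in Da

example_mz = 500.0          # hypothetical precursor m/z, for illustration only
low, high = ppm_window(example_mz, PARENT_TOL_PPM)
print(f"5 ppm at {example_mz} m/z: {low:.4f} to {high:.4f}")
```

At a hypothetical precursor of 500 m/z, 5 ppm corresponds to a window of only ±0.0025 m/z, which is why poorly calibrated or low-intensity spectra can easily fall outside it.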
So, after sorting the results by the Novor score, we ended up finding our first peptide: the spectra corresponding to 465.75 m/z had a good identification of the sequence WLTHFAR. As leucine and isoleucine are known to be indistinguishable by LC–MS/MS, some thinking brought us to the sequence WITHFAR. At this point, we set ourselves the goal of identifying at least five sequences by further inspection. For this, the results were sorted by m/z and Novor score and inspected further. After some time, we found four more relatively well annotated sequences: SENSITIVEMkRE, SkEVENTHATkF, THEMETHkDIS and ANYkTHERMETHkD. These five sequences obviously translated to the English phrases “with far”, “sensitive more”, “so even that of”, “the method is” and “any other method”. As we were stuck and our eyes hurt from inspecting spectra, we came up with the idea that five of 19 peptides might be enough to find the actual sentence, if it came from a known source. Therefore, we used a well-known internet search engine, typed in the phrases and found the following sentences: "I feel sure that there are many problems in chemistry, which could be solved with far greater ease by this than any other method. The method is surprisingly sensitive - more so than even that of spectrum analysis, requires an infinitesimal amount of material, and does not require this to be specially purified." This paragraph is from the book “Rays of Positive Electricity and Their Application to Chemical Analyses” by Sir J. J. Thomson [6]. The rest of the analysis was straightforward: taking the sentences, all possible peptides with a length of 5–50 amino acids were created and put into a FASTA file for searching by X!Tandem [7] with the same settings as described for DeNovoGUI. As cleavage enzyme we set trypsin, but with up to 10 allowed missed cleavages. After constraining the peptides and leaving out the parts which we had already identified, we had 377 peptides in our FASTA database.
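As a plausibility check on the first de-novo hit, the theoretical m/z of WITHFAR can be computed from standard monoisotopic residue masses. This is a minimal sketch under the assumption that the 465.75 m/z signal is the doubly protonated, unmodified peptide (the charge state is not stated in the text); the function name `peptide_mz` is illustrative.

```python
# Standard monoisotopic residue masses in Da
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-terminus
PROTON = 1.007276   # mass of one proton


def peptide_mz(sequence, charge):
    """Theoretical m/z of an unmodified peptide at the given charge state."""
    neutral_mass = sum(MONO[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Assumption: the precursor was observed as [M+2H]2+
print(f"WITHFAR 2+: {peptide_mz('WITHFAR', 2):.4f}")
# L and I share the same residue mass, so WLTHFAR is indistinguishable by mass
print(f"WLTHFAR 2+: {peptide_mz('WLTHFAR', 2):.4f}")
```

Under that assumption the computed value lands at roughly 465.75 m/z, and the identical result for WLTHFAR illustrates why the leucine in the de-novo result had to be reinterpreted as isoleucine by reasoning rather than by mass.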
We now searched all our MS/MS files, both the LC-based runs and the direct infusions, with this database and cut all identifications at an X!Tandem expectation score of 0.01. The peptides were further filtered to have at least 10 spectra per sequence. The longest continuous sequences were taken and added to the database, meaning that if “ANALYSISREQ” and “ANALYSISREQRIRES” were both found, only “ANALYSISREQRIRES” was added. With this, the original sentences were digested in silico again, taking the newly found sequences as ground truth. This process was iterated five times, each time enhancing the peptides in the database.
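Two steps of the procedure described above lend themselves to a short illustration: enumerating candidate peptides of a given length range from a text, and keeping only the longest continuous sequences among the identified ones. The sketch below is a simplified assumption of how this could be done and is not the authors' actual script; all function names are hypothetical, and the translation of letters that are not amino acid codes, the enzyme settings and the exclusion of already identified parts are deliberately left out.

```python
def candidate_peptides(text, min_len=5, max_len=50):
    """Enumerate all unique substrings of the letters in `text` within a length range."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    seen = set()
    candidates = []
    for start in range(len(letters)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(letters):
                break
            sub = letters[start:end]
            if sub not in seen:
                seen.add(sub)
                candidates.append(sub)
    return candidates


def keep_longest(sequences):
    """Drop every sequence that is fully contained in another, longer one."""
    unique = set(sequences)
    return sorted(
        s for s in unique
        if not any(s != other and s in other for other in unique)
    )


def to_fasta(peptides, prefix="cand"):
    """Write one FASTA entry per peptide; the header naming is arbitrary."""
    return "".join(f">{prefix}_{i}\n{p}\n" for i, p in enumerate(peptides, 1))


print(len(candidate_peptides("with far", 5, 7)))
print(keep_longest(["ANALYSISREQ", "ANALYSISREQRIRES", "WITHFAR"]))
```

With the example from the text, `keep_longest` retains “ANALYSISREQRIRES” and discards the shorter “ANALYSISREQ”, which is the behaviour described for each iteration of the database refinement.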

Results

With the described method of using X!Tandem and enhancing the database with peptide sequences, we finally ended up with a FASTA file containing 19 entries, one for each expected peptide. Even though we found m/z traces in our data for all of these peptides, we could confidently identify only twelve of them; three could be identified only sparsely after widening the tolerances, and one more only by fragments of the peptide (see Table 1). Three peptides could not be identified at all with our recorded data. Here, other techniques such as PRM or the injection of higher amounts might have helped.

Table 1

Rank (by number of identified spectra)	Peptide	Text in original sentence
1	ANALYSISREQRIRES	ANALYSIS, REQUIRES
2	SENSITIVEMKRE	SENSITIVE - MORE
3	SKEVENTHATKF	SO than EVEN THAT OF
4	THEMETHKDIS	THE METHOD IS
5	WITHFAR	WITH FAR
6	ANDDKESNKTREQRIRE	AND DOES NOT REQUIRE
7	THISTKSE	THIS TO BE
8	PRRIFIED	PURIFIED
9	SYTHISTHAN	BY THIS THAN
10	ANYKTHERMETHKD	ANY OTHER METHOD
11	AMKRNTKFMATERIAL	AMOUNT OF MATERIAL
12	AREMANYPRKSLEMSIN	ARE MANY PROBLEMS IN
13	SRRPRISINGLY	SURPRISINGLY
14	IFEELSRRETHATTHERE	I FEEL SURE THAT THERE
15	SPECIALLY	SPECIALLY

The peptides which could finally be identified by MS/MS spectra. The peptides which were also spotted in the de-novo analysis (ranks 2–5 and 10) are highlighted in bold. The peptides on ranks 13–15 (in italics) could only be identified after widening the parent and fragment mass tolerances. With these identifications we could neither cover the whole sentence completely nor identify all 19 peptides in the provided solution. Still, we are rather sure that the peptides were created to form the preamble of J. J. Thomson’s book.
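The relationship between the two right-hand columns of Table 1 can be reproduced with a simple letter substitution. The mapping used below (O to K, U to R, B to S) is inferred from the table rows and is an assumption, not the official translation code of the challenge; which of the allowed modifications marks which special letter is not stated in the text, so modifications are ignored here. The names `LETTER_TO_RESIDUE` and `translate` are illustrative.

```python
# Assumed mapping, inferred from Table 1: letters without an amino acid code
# are represented by another residue (modifications are not modelled here).
LETTER_TO_RESIDUE = str.maketrans("OUB", "KRS")


def translate(text):
    """Turn an English phrase into the peptide-style letter string of Table 1."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    return letters.translate(LETTER_TO_RESIDUE)


# A few rows of Table 1: (text in original sentence, peptide)
rows = [
    ("ANALYSIS, REQUIRES", "ANALYSISREQRIRES"),
    ("THE METHOD IS", "THEMETHKDIS"),
    ("AND DOES NOT REQUIRE", "ANDDKESNKTREQRIRE"),
    ("BY THIS THAN", "SYTHISTHAN"),
    ("ARE MANY PROBLEMS IN", "AREMANYPRKSLEMSIN"),
    ("I FEEL SURE THAT THERE", "IFEELSRRETHATTHERE"),
]

for text, peptide in rows:
    status = "ok" if translate(text) == peptide else "MISMATCH"
    print(f"{text:25s} -> {translate(text):20s} {status}")
```

A substitution of this kind also explains why a peptide such as PRRIFIED reads as “PURIFIED” once the second letter is interpreted as the special letter.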

Discussion

While we could identify the sentence and most of the peptides once we had the hint and a database, the identification using current de-novo software was rather disappointing. Even a retrospective analysis of the data was rather inconclusive and did not show us all peptides. One thing was striking, though: the best de-novo identified peptides were “tryptic” peptides, meaning the ones ending with an R or K, most prominently the sequence WITHFAR. This was most probably due to the fact that the algorithm assumes a tryptic digestion in the background and needs the resulting b-ion as a starting point. This in turn suggests that the algorithms do not actually perform badly on real-life data, but only had a hard time with the provided synthetic peptides. Another difficult task, protein inference [8,9] from de-novo identified peptides, could not be applied in this challenge, but would also be worth analysing in more depth.

Conclusions

Overall, the task of identifying 19 purified non-tryptic synthetic peptides was not as easy as it seemed. We needed some visual inspection of the data and some refinement of databases to confidently identify only twelve of the peptides, and to weakly identify another three of them. Nevertheless, the task given by the YPIC was a very interesting one, and none of the authors had tried de-novo approaches before. Overall, we were happy to be able to find the hidden sentence in the peptides by applying only open source software and are looking forward to the next challenge.

8 in total

1. TANDEM: matching proteins with tandem mass spectra.

Authors: Robertson Craig; Ronald C Beavis
Journal: Bioinformatics Date: 2004-02-19 Impact factor: 6.937

2. pNovo: de novo peptide sequencing and identification using HCD spectra.

Authors: Hao Chi; Rui-Xiang Sun; Bing Yang; Chun-Qing Song; Le-Heng Wang; Chao Liu; Yan Fu; Zuo-Fei Yuan; Hai-Peng Wang; Si-Min He; Meng-Qiu Dong
Journal: J Proteome Res Date: 2010-05-07 Impact factor: 4.466

3. PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface.

Authors: Julian Uszkoreit; Alexandra Maerkens; Yasset Perez-Riverol; Helmut E Meyer; Katrin Marcus; Christian Stephan; Oliver Kohlbacher; Martin Eisenacher
Journal: J Proteome Res Date: 2015-06-10 Impact factor: 4.466

4. Protein Inference Using PIA Workflows and PSI Standard File Formats.

Authors: Julian Uszkoreit; Yasset Perez-Riverol; Britta Eggers; Katrin Marcus; Martin Eisenacher
Journal: J Proteome Res Date: 2018-12-05 Impact factor: 4.466

5. A cross-platform toolkit for mass spectrometry and proteomics.

Authors: Matthew C Chambers; Brendan Maclean; Robert Burke; Dario Amodei; Daniel L Ruderman; Steffen Neumann; Laurent Gatto; Bernd Fischer; Brian Pratt; Jarrett Egertson; Katherine Hoff; Darren Kessner; Natalie Tasman; Nicholas Shulman; Barbara Frewen; Tahmina A Baker; Mi-Youn Brusniak; Christopher Paulse; David Creasy; Lisa Flashner; Kian Kani; Chris Moulding; Sean L Seymour; Lydia M Nuwaysir; Brent Lefebvre; Frank Kuhlmann; Joe Roark; Paape Rainer; Suckau Detlev; Tina Hemenway; Andreas Huhmer; James Langridge; Brian Connolly; Trey Chadick; Krisztina Holly; Josh Eckels; Eric W Deutsch; Robert L Moritz; Jonathan E Katz; David B Agus; Michael MacCoss; David L Tabb; Parag Mallick
Journal: Nat Biotechnol Date: 2012-10 Impact factor: 54.908

6. Low-bias phosphopeptide enrichment from scarce samples using plastic antibodies.

Authors: Jing Chen; Sudhirkumar Shinde; Markus-Hermann Koch; Martin Eisenacher; Sara Galozzi; Thilo Lerari; Katalin Barkovits; Prabal Subedi; Rejko Krüger; Katja Kuhlmann; Börje Sellergren; Stefan Helling; Katrin Marcus
Journal: Sci Rep Date: 2015-07-01 Impact factor: 4.379

7. Novor: real-time peptide de novo sequencing software.

Authors: Bin Ma
Journal: J Am Soc Mass Spectrom Date: 2015-06-30 Impact factor: 3.109

8. DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.

Authors: Thilo Muth; Lisa Weilnböck; Erdmann Rapp; Christian G Huber; Lennart Martens; Marc Vaudel; Harald Barsnes
Journal: J Proteome Res Date: 2014-01-07 Impact factor: 4.466