
Math, science, history, unraveling the mystery-That all started with de novo!

Ekaterina Ilgisonis¹, Olga Kiseleva¹, Ksenia Kuznetsova¹.

Abstract

This work on decoding words encoded by amino acids in peptides was prompted by the YPIC-EuPA Challenge. We received a dry synthetic peptide sample and performed a mass spectrometric analysis followed by de novo peptide sequencing. As a result, part of "Rays of positive electricity and their application to chemical analyses" by J.J. Thomson was found to be encoded in the peptides of the sample. Words were first read from the peptide sequences and then used in a Google search to find the quotation. The answer was then validated by a standard proteomic search against a database constructed from the quotation found.

Entities: Chemical Species

Keywords: Challenge; De novo sequencing; EuPA; Peptides; YPIC

Year: 2019 PMID: 31890551 PMCID: PMC6924286 DOI: 10.1016/j.euprot.2019.07.011

Source DB: PubMed Journal: EuPA Open Proteom ISSN: 2212-9685

Introduction

Because the variety and complexity of the human proteome have not yet been fully characterized, proteome investigators often need to analyze mixtures of unknown content. In our view, the most efficient approach is de novo sequencing from high-resolution LC–MS/MS data. The YPIC-EuPA Challenge turned out to be a great chance not only to improve our practical deciphering skills, but also to revisit the history of mass spectrometry (Fig. 1).

Fig. 1

Graphical abstract.

Materials and methods

The mixture of 19 synthetic peptides obtained from the Challenge organizing team was analyzed using a Dionex Ultimate 3000 (Thermo Fisher Scientific) connected to a Hybrid Ion Trap-Orbitrap Elite mass spectrometer (Thermo Fisher Scientific) equipped with a nanoelectrospray ion source (Thermo Scientific). Peptides were loaded onto a Zorbax 300SB-C18 trap column (C18, 5 μm, 0.3 mm inner diameter, 5 mm length; Agilent Technologies, USA) and washed for 5 min at a flow rate of 10 μl/min. Peptide separation was performed on an RP-HPLC Zorbax 300SB-C18 column (C18, 3.5 μm, 75 μm inner diameter, 150 mm length; Agilent Technologies, USA) using a linear gradient from 5% to 60% solvent B (0.1% formic acid, 80% acetonitrile) over 30 min at a flow rate of 0.4 μl/min. CID was used as the fragmentation method. Both MS and MS/MS spectra were acquired in the Orbitrap analyzer. Resolution was set to 60,000 (at m/z 400) for MS scans and 15,000 (at m/z 400) for MS/MS scans. The mass spectra were analyzed using the trial version of PEAKS (Bioinformatics Solutions Inc.) [1] and SearchGUI [2] with the parameters described in the next section.

Results

For de novo sequencing we used the trial version of PEAKS; all results were exported to a CSV file and transferred into an MS Excel table (Supplementary 1). We replaced all identified PTMs mentioned in the description of the Challenge with the letters they encode:

Methylation of R (R(+14.02)) → U
Acetylation of K (K(+42.01)) → O
Phosphorylation of S (S(+79.97)) → B

After that we changed all Ls to Is, because they are hard to distinguish by mass spectrometry [3]. All results were then processed manually, because the results of de novo sequencing may be jumbled [4] and the human brain is able to read jumbled words [5]. Thus, we looked for real words among the sequences of the identified peptides. All decoded words are highlighted in green in Supplementary 1. The discovered words were nonspecific, but a Google search revealed a J.J. Thomson quotation (see Fig. 2) from the book “Rays of positive electricity and their application to chemical analyses”.
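The letter substitution described above was done in a spreadsheet; as an illustration only, it could be sketched in Python as follows. The function name decode_peptide and the example peptide strings are hypothetical, and discarding any other modification annotation is an assumption rather than something stated in the text.

```python
import re

# Mapping taken from the text: modified residues stand for letters that
# are not among the standard one-letter amino acid codes.
PTM_TO_LETTER = {
    "R(+14.02)": "U",  # methylated arginine
    "K(+42.01)": "O",  # acetylated lysine
    "S(+79.97)": "B",  # phosphorylated serine
}


def decode_peptide(peaks_sequence: str) -> str:
    """Turn a PEAKS-style peptide string into plain letters."""
    decoded = peaks_sequence
    for token, letter in PTM_TO_LETTER.items():
        decoded = decoded.replace(token, letter)
    # Assumption: any remaining "(+mass)" annotation is not part of the
    # letter code, so it is dropped.
    decoded = re.sub(r"\([^)]*\)", "", decoded)
    # Leucine and isoleucine have the same mass, so both are reported as I.
    return decoded.replace("L", "I")


# Hypothetical inputs, not taken from the supplementary data.
print(decode_peptide("AMK(+42.01)R(+14.02)NT"))
print(decode_peptide("SMALL"))
```

Because every L becomes I, a word such as "small" would appear as SMAII in the decoded output, which is one reason the final reading of the words was left to manual inspection.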

Fig. 2

Screenshot of the Google search result revealing the quotation from J.J. Thomson. All the words identified by PEAKS are highlighted in blue.

Validation

To validate the quotation found, we created a .fasta file (Supplementary 2) containing a single protein whose sequence is equivalent to the phrase, and searched the spectra against it with SearchGUI [2]. The search identified most of the text fragments. Nevertheless, the words “small amount” and “infinitesimal amount” were not identified.
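A single-protein database of this kind could be built along the following lines. This is a minimal sketch under stated assumptions, not the procedure actually used: the text does not say how the phrase was converted, so the function name phrase_to_fasta, the header text, the 60-character line width, and the reverse mapping of U, O and B to unmodified R, K and S (with the modifications left to the search settings) are all hypothetical.

```python
import re

# Assumption: letters encoded by modified residues are written as the
# unmodified residue, and the modification is set in the search engine.
LETTER_TO_RESIDUE = {"U": "R", "O": "K", "B": "S"}


def phrase_to_fasta(phrase: str, header: str = "decoded_quotation",
                    width: int = 60) -> str:
    """Return a one-entry FASTA record whose sequence spells the phrase."""
    # Keep letters only: spaces and punctuation have no residue equivalent.
    letters = re.sub(r"[^A-Za-z]", "", phrase).upper()
    residues = "".join(LETTER_TO_RESIDUE.get(ch, ch) for ch in letters)
    # Wrap the sequence into fixed-width lines, as is usual for FASTA files.
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Example with two words quoted in the text, not the full quotation.
print(phrase_to_fasta("small amount"))
```

Letters with no standard residue equivalent other than U, O and B (for example J, X or Z) are passed through unchanged here, which a real search engine may not accept.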

Instead of conclusion

We are grateful to the Challenge organizers for the chance to participate in such a scientific riddle. Although we were given quite a lot of hints, such as the modifications and the linguistic meaning of the amino acids, this was not a piece of cake. The formidable task of exploring the highly complex human proteome remains a real challenge for the lion-hearted.

Data

All the experimental data, de novo sequencing results and the fasta file used are available in the Mendeley public repository under the title of this article.

5 in total

1. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry.

Authors: Bin Ma; Kaizhong Zhang; Christopher Hendrie; Chengzhi Liang; Ming Li; Amanda Doherty-Kirby; Gilles Lajoie
Journal: Rapid Commun Mass Spectrom Date: 2003 Impact factor: 2.419

2. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches.

Authors: Marc Vaudel; Harald Barsnes; Frode S Berven; Albert Sickmann; Lennart Martens
Journal: Proteomics Date: 2011-01-31 Impact factor: 3.984

3. Letter position coding across modalities: braille and sighted reading of sentences with jumbled words.

Authors: Manuel Perea; María Jiménez; Miguel Martín-Suesta; Pablo Gómez
Journal: Psychon Bull Rev Date: 2015-04

4. Distinguishing between Leucine and Isoleucine by Integrated LC-MS Analysis Using an Orbitrap Fusion Mass Spectrometer.

Authors: Yongsheng Xiao; Malgorzata M Vecchi; Dingyi Wen
Journal: Anal Chem Date: 2016-10-14 Impact factor: 6.986

5. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification.

Authors: Jing Zhang; Lei Xin; Baozhen Shan; Weiwu Chen; Mingjie Xie; Denis Yuen; Weiming Zhang; Zefeng Zhang; Gilles A Lajoie; Bin Ma
Journal: Mol Cell Proteomics Date: 2011-12-20 Impact factor: 5.911
