
SimPhospho: a software tool enabling confident phosphosite assignment.

Veronika Suni1,2, Tomi Suomi2, Tomoya Tsubosaka3, Susumu Y Imanishi3, Laura L Elo2, Garry L Corthals4.   

Abstract

Motivation: Mass spectrometry combined with enrichment strategies for phosphorylated peptides has been successfully employed for two decades to identify sites of phosphorylation. However, unambiguous phosphosite assignment is considered challenging. Given that site-specific phosphorylation events function as different molecular switches, validation of phosphorylation sites is of utmost importance. In our earlier study we developed a method based on simulated phosphopeptide spectral libraries, which enables highly sensitive and accurate phosphosite assignments. To promote more widespread use of this method, we here introduce a software implementation with improved usability and performance.
Results: We present SimPhospho, a fast and user-friendly tool for accurate simulation of phosphopeptide tandem mass spectra. Simulated phosphopeptide spectral libraries are used to validate and supplement database search results, with the goal of improving reliable phosphoproteome identification and reporting. The presented program can be easily used together with the Trans-Proteomic Pipeline and integrated into a phosphoproteomics data analysis workflow.
Availability and implementation: SimPhospho is open source and available for Windows, Linux and Mac operating systems. The software, its user's manual with a detailed description of data analysis, and test data can be found at https://sourceforge.net/projects/simphospho/.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year:  2018        PMID: 29596608      PMCID: PMC6061695          DOI: 10.1093/bioinformatics/bty151

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Protein phosphorylation is a post-translational modification that plays a vital role in the regulation of many cellular processes, including cell cycle, growth, apoptosis and signal transduction pathways. The leading technology to discover and confirm phosphorylation is tandem mass spectrometry. Owing to ongoing developments in enrichment and separation techniques, faster scanning mass spectrometers and data analysis tools, it is now possible to identify tens of thousands of phosphopeptides. Despite its critical importance, phosphopeptide data analysis poses additional unmet challenges compared with the analysis of unmodified peptides, because the spectra are more complex and harder to interpret (Zhang et al., 2013). In addition to reliable identification of phosphorylated peptides and proteins, information on the exact phosphorylation sites is essential to understand the interaction and regulation of signaling pathways. Given that specific phosphorylation events function as molecular switches (Vaga et al., 2014), accurate assignment of the precise site of phosphorylation is of utmost importance. One approach to identifying phosphorylation sites involves searching a sequence database followed by analysis using designated localization tools (Beausoleil et al., 2006; Olsen et al., 2006; Taus et al., 2011). Another employs searching a spectral library (Bodenmiller et al.; Hummel et al., 2007; Suni et al., 2015). While spectral library searching offers better scoring than sequence database searching (Lam et al.; Suni et al., 2015), its downside has been the lack of readily available and highly accurate reference spectral libraries. Recently we reported a method for phosphosite validation that takes advantage of the sensitivity of spectral library searching by overcoming the lack of phosphopeptide libraries (Suni et al., 2015). More specifically, our strategy builds libraries of simulated phosphopeptide spectra based on spectra of unmodified peptides. These simulated phosphopeptide spectra are then used as a reference in the spectral search of observed phosphopeptides.
However, the prototype implementation of the simulation method was a command line application, available only for Windows, with no configuration options. Here, we present further development of our approach through a new tool called SimPhospho. It is the only publicly available tool for simulation of higher-energy collisional dissociation (HCD) phosphopeptide spectra. Phosphopeptide spectra simulated by SimPhospho, in combination with spectral library searching, enable accurate and confident phosphosite validation in a comprehensive manner. SimPhospho follows our previously described workflow but is superior to the prototype version in terms of usability, configuration options and performance. These improvements allow us to optimize various conditions of phosphopeptide simulation, as presented in this study.

2 Implementation

The SimPhospho software (Fig. 1A) was implemented in C++ using components of the ProteoWizard project (Kessner et al.) and an XML library (Thomason), and includes a user interface based on the Qt framework (The Qt Company). Two XML files (Keller et al.) serve as input to SimPhospho: (i) a .pep.xml file that contains the sequence database search results [e.g. from Mascot (Matrix Science), X! Tandem (Craig and Beavis, 2004) or COMET (Eng et al.)], validated by PeptideProphet (Keller et al., 2002), and (ii) an .mzXML file that contains the mass spectra. First, SimPhospho processes the .pep.xml file. For every peptide identification that contains serine, threonine or tyrosine residues in its sequence, singly phosphorylated peptide isoforms are created, and the theoretical masses of the fragment ions that would carry the phosphate group are calculated. These masses are then searched for in the corresponding spectrum in the .mzXML file, and the masses and intensities of the matching ion peaks are modified to simulate phosphorylation, as described below. The program outputs two files: (i) a .pep.xml file that contains the sequences and modification sites of the phosphopeptides, and (ii) an .mzXML file that contains the simulated spectra of the phosphopeptides.
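The first two steps of this procedure (enumerating singly phosphorylated isoforms and locating the fragment ions that would carry the phosphate) can be illustrated with a minimal Python sketch. It is not the SimPhospho C++ code: the function names, the restriction to singly charged b- and y-ions, and the 0.02 m/z matching tolerance are assumptions made for illustration only.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def phospho_isoforms(sequence):
    """Return the 0-based index of every S, T or Y residue.

    Each index stands for one singly phosphorylated isoform.
    """
    return [i for i, aa in enumerate(sequence) if aa in "STY"]


def ions_carrying_site(sequence, site):
    """List singly charged b- and y-ions of the unmodified peptide that
    contain the candidate site, as (label, m/z) pairs."""
    n = len(sequence)
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    ions = []
    for k in range(1, n):
        if site < k:  # b_k covers residues 0 .. k-1
            ions.append((f"b{k}", sum(masses[:k]) + PROTON))
        if site >= n - k:  # y_k covers residues n-k .. n-1
            ions.append((f"y{k}", sum(masses[n - k:]) + WATER + PROTON))
    return ions


def find_peak(peaks, target_mz, tolerance=0.02):
    """Return the most intense (m/z, intensity) peak within the tolerance
    of target_mz, or None if the spectrum has no such peak.

    The default tolerance is an assumed value, not a SimPhospho default.
    """
    hits = [p for p in peaks if abs(p[0] - target_mz) <= tolerance]
    return max(hits, key=lambda p: p[1]) if hits else None
```

For the hypothetical peptide GSAK, `phospho_isoforms` yields a single candidate site (the serine), and `ions_carrying_site("GSAK", 1)` lists b2, b3 and y3 as the ions whose peaks would need to be modified.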
Fig. 1. (A) Screenshot of SimPhospho displaying the main features of the software: simulation options, including ion intensity values, peak match precision, types of ions used for simulation and a data filtering switch, as well as output statistics and a progress bar. (B) Optimization of intensity values of simulated peaks. To determine the optimal default parameters for SimPhospho, we tested different ion intensity combinations for phosphoric acid neutral loss ions and for intact ions compared to original fragment ion intensities in spectra of nonphosphorylated peptides. We saw the largest number of correctly assigned phosphosites at <1% FLR achieved with a combination of 50 and 50% intensities for intact ions and neutral loss ions for pS and pT, and 50% intensities for intact ions for pY. Data for 100 and 10% (pS, pT) and 100% (pY) are given for reference, as this combination was chosen for the prototype program (Suni et al., 2015). See Supplementary Figure S1 for other intensity combinations.

To demonstrate the performance of the SimPhospho program and to determine the optimal default parameters, we tested different ion intensity combinations for simulating phosphopeptide spectra. In the earlier paper we selected 100% intensity for phosphoric acid neutral loss ions (e.g. y-H3PO4) and 10% for intact ions (e.g. y-ion), relative to the original fragment ions, for pSer and pThr, and 100% intensity for intact ions for pTyr. For more details on the simulation rules, refer to Imanishi et al. (2007) and Suni et al. (2015). When testing the simulation criteria using SpectraST 5.0 on an HCD dataset of synthetic phosphopeptides with known phosphosites (Suni et al., 2015), we now observed that the largest number of correctly assigned phosphosites at 1% false localization rate (FLR) was achieved with 50% intensity for both intact ions and neutral loss ions for pSer and pThr, and 50% intensity for intact ions for pTyr (Fig. 1B). Other tested combinations are shown in Supplementary Figure S1.
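As a rough illustration of how such intensity settings could be applied to one matched fragment peak, the following Python sketch shifts the peak by the mass of HPO3 and, for pSer and pThr, adds a phosphoric acid neutral loss peak. The function name, its parameters and the handling of charge are assumptions made here; the actual simulation rules are those described by Imanishi et al. and Suni et al.

```python
HPO3 = 79.96633   # monoisotopic mass added by phosphorylation (Da)
H3PO4 = 97.97690  # monoisotopic mass of the phosphoric acid neutral loss (Da)


def simulate_peaks(mz, intensity, residue, intact=0.5, loss=0.5, charge=1):
    """Simulate the phosphorylated counterpart(s) of one fragment peak.

    mz, intensity: matched peak in the spectrum of the unmodified peptide.
    residue: phosphorylated residue carried by the fragment ('S', 'T' or 'Y').
    intact, loss: fractions of the original intensity given to the intact
        ion and to the neutral loss ion (0.5 and 0.5 mirror the optimized
        combination reported in the text).
    """
    if residue not in {"S", "T", "Y"}:
        raise ValueError("residue must be 'S', 'T' or 'Y'")
    intact_mz = mz + HPO3 / charge
    peaks = [(intact_mz, intensity * intact)]
    if residue in {"S", "T"}:
        # In this sketch only pSer and pThr get a neutral loss peak.
        peaks.append((intact_mz - H3PO4 / charge, intensity * loss))
    return peaks
```

Calling `simulate_peaks(mz, intensity, "T", intact=0.1, loss=1.0)` would correspond to the pSer/pThr combination chosen for the prototype program.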
Users can select which proteins, peptides or scans are used for simulation by adding a .filter text file that lists the protein names, peptide sequences or scan numbers of interest. For instance, filtering by scan numbers can be useful when applying identification score cutoffs to the input data. Besides activating a filter, users can specify the ion types used for simulation (a-, b- and y-ions, ammonia and water losses) and the intensities of simulated peaks for intact ions and phosphoric acid neutral loss ions (Fig. 1A). SimPhospho runs 50 times faster than our prototype program (Suni et al., 2015): simulating 13 000 phosphopeptides from 4000 peptides takes under one minute on a workstation equipped with an Intel Core i5 CPU (2.30 GHz), 16 GB RAM and 64-bit Windows 7.
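The filtering idea can be sketched in Python as follows. The layout assumed here (one protein name, peptide sequence or scan number per line) and the dictionary keys are hypothetical; the actual .filter format is described in the SimPhospho user's manual.

```python
def load_filter(lines):
    """Collect the non-empty, whitespace-stripped entries of a filter file.

    Assumes one entry per line, which is an illustrative guess.
    """
    return {line.strip() for line in lines if line.strip()}


def passes_filter(identification, entries):
    """Keep an identification if its protein name, peptide sequence or
    scan number appears among the filter entries."""
    keys = (
        identification["protein"],
        identification["peptide"],
        str(identification["scan"]),
    )
    return any(key in entries for key in keys)


# Example: keep only identifications from scan 1502 or with peptide GSAK.
entries = load_filter(["1502\n", "\n", "GSAK\n"])
hits = [
    {"protein": "P1", "peptide": "GSAK", "scan": 10},
    {"protein": "P2", "peptide": "AAAK", "scan": 1502},
    {"protein": "P3", "peptide": "TTTK", "scan": 77},
]
kept = [h for h in hits if passes_filter(h, entries)]
```

In this example the first two identifications are kept and the third is dropped.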

3 Results

SimPhospho is a fast and easy-to-use tool for simulation of phosphopeptide tandem mass spectra. The program output files can be used directly to build a spectral library using SpectraST (Lam et al.), either as a stand-alone version or through the Trans-Proteomic Pipeline (TPP) (Keller et al.). This simulated reference spectral library of phosphosites is suitable for phosphopeptide identification, and for validation of phosphopeptides and phosphosites identified by sequence database search programs. Simulated spectral libraries can be searched by stand-alone SpectraST, in TPP, or in Proteome Discoverer (Thermo Fisher Scientific) using the SpectraST node. Visualization of the simulated spectra as well as the search results is possible via the TPP viewer or in Proteome Discoverer. The updated version of the software has the following advantages compared to the prototype program: (i) a graphical user interface in addition to command line options, (ii) faster data processing, (iii) improved simulation parameters, (iv) optional simulation features, (v) the possibility to select a subset of input spectra or peptides to be used for simulation and (vi) cross-platform support. We anticipate that these improvements will facilitate further adoption of this phosphosite validation method, especially in large-scale studies, ultimately leading to fewer false-positive results in the public domain. SimPhospho is available for Windows, Linux and Mac operating systems at https://sourceforge.net/projects/simphospho/. The next version of the software is expected to support simulation of spectra of multiply phosphorylated peptides.
References

1.  Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

Authors:  Andrew Keller; Alexey I Nesvizhskii; Eugene Kolker; Ruedi Aebersold
Journal:  Anal Chem       Date:  2002-10-15       Impact factor: 6.986

2.  TANDEM: matching proteins with tandem mass spectra.

Authors:  Robertson Craig; Ronald C Beavis
Journal:  Bioinformatics       Date:  2004-02-19       Impact factor: 6.937

3.  Universal and confident phosphorylation site localization using phosphoRS.

Authors:  Thomas Taus; Thomas Köcher; Peter Pichler; Carmen Paschke; Andreas Schmidt; Christoph Henrich; Karl Mechtler
Journal:  J Proteome Res       Date:  2011-11-10       Impact factor: 4.466

4.  A probability-based approach for high-throughput protein phosphorylation analysis and site localization.

Authors:  Sean A Beausoleil; Judit Villén; Scott A Gerber; John Rush; Steven P Gygi
Journal:  Nat Biotechnol       Date:  2006-09-10       Impact factor: 54.908

5.  Confident site localization using a simulated phosphopeptide spectral library.

Authors:  Veronika Suni; Susumu Y Imanishi; Alessio Maiolica; Ruedi Aebersold; Garry L Corthals
Journal:  J Proteome Res       Date:  2015-03-27       Impact factor: 4.466

6.  Protein analysis by shotgun/bottom-up proteomics (review).

Authors:  Yaoyang Zhang; Bryan R Fonslow; Bing Shan; Moon-Chang Baek; John R Yates
Journal:  Chem Rev       Date:  2013-02-26       Impact factor: 60.622

7.  Global, in vivo, and site-specific phosphorylation dynamics in signaling networks.

Authors:  Jesper V Olsen; Blagoy Blagoev; Florian Gnad; Boris Macek; Chanchal Kumar; Peter Mortensen; Matthias Mann
Journal:  Cell       Date:  2006-11-03       Impact factor: 41.582

8.  Reference-facilitated phosphoproteomics: fast and reliable phosphopeptide validation by microLC-ESI-Q-TOF MS/MS.

Authors:  Susumu Y Imanishi; Vitaly Kochin; Saima E Ferraris; Aurélie de Thonel; Hanna-Mari Pallari; Garry L Corthals; John E Eriksson
Journal:  Mol Cell Proteomics       Date:  2007-05-17       Impact factor: 5.911

9.  ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites.

Authors:  Jan Hummel; Michaela Niemann; Stefanie Wienkoop; Waltraud Schulze; Dirk Steinhauser; Joachim Selbig; Dirk Walther; Wolfram Weckwerth
Journal:  BMC Bioinformatics       Date:  2007-06-23       Impact factor: 3.169

10.  Phosphoproteomic analyses reveal novel cross-modulation mechanisms between two signaling pathways in yeast.

Authors:  Stefania Vaga; Marti Bernardo-Faura; Thomas Cokelaer; Alessio Maiolica; Christopher A Barnes; Ludovic C Gillet; Björn Hegemann; Frank van Drogen; Hoda Sharifian; Edda Klipp; Matthias Peter; Julio Saez-Rodriguez; Ruedi Aebersold
Journal:  Mol Syst Biol       Date:  2014-12-09       Impact factor: 11.429

