Dave C H Lee, Andrew R Jones, Simon J Hubbard.
Abstract
Analysis of the phosphoproteome by MS has become a key technology for the characterization of dynamic regulatory processes in the cell, since kinase and phosphatase action underlies many major biological functions. However, the addition of a phosphate group to a suitable side chain often confounds informatic analysis by generating product ion spectra that are more difficult to interpret (and consequently to identify) than those of unmodified peptides. Collectively, these challenges have motivated bioinformaticians to create novel software tools and pipelines to assist in the identification of phosphopeptides in proteomic mixtures, and to help pinpoint or "localize" the most likely site of modification in cases where there is ambiguity. Here we review the challenges to be met and the informatics solutions available to address them for phosphoproteomic analysis, and highlight the difficulties associated with using them and the implications for data standards.
Keywords: Bioinformatics; Data processing and analysis; Phosphoproteomics; Technology
Year: 2015 PMID: 25475148 PMCID: PMC4384807 DOI: 10.1002/pmic.201400372
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. Ambiguity in site assignment of phosphopeptides. The phosphopeptide above generates a product ion spectrum from which it is challenging to unambiguously determine the true site-determining ions. In this particular case, two b ions highlighted in green boxes are consistent with the serine at position 7 in the peptide being modified; alternatively, the threonine at position 9 could be modified, yielding a characteristic y9 ion (green box, lower panel). Experts inspecting the spectrum were divided on which is the most likely interpretation. The possibility that both peptides were present is also not excluded, since they would have the same precursor ion m/z value (figure adapted from ABRF web site, http://www.abrf.org/index.cfm/group.show/ProteomicsInformaticsResearchGroup.53.htm).
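The ambiguity described in the caption can be made concrete with a short sketch. For two candidate sites on the same peptide, only the fragment ions that contain one candidate residue but not the other can discriminate between the isomers. The function name `site_determining_ions` and the peptide length of 16 are assumptions made for illustration; the caption does not give the sequence or its length.

```python
def site_determining_ions(length, site_a, site_b):
    """Return (b_ions, y_ions): fragment indices that differ between two isomers.

    Positions are 1-indexed from the N-terminus.
    b_k covers residues 1..k; y_k covers residues length-k+1..length.
    """
    lo, hi = sorted((site_a, site_b))
    if not (1 <= lo < hi <= length):
        raise ValueError("sites must be two distinct positions within the peptide")
    # b_k contains the N-terminal candidate but not the C-terminal one
    # when lo <= k < hi.
    b_ions = list(range(lo, hi))
    # y_k contains the C-terminal candidate but not the N-terminal one
    # when length - hi + 1 <= k <= length - lo.
    y_ions = list(range(length - hi + 1, length - lo + 1))
    return b_ions, y_ions


# Hypothetical 16-residue peptide with candidate sites at positions 7 and 9.
b_ions, y_ions = site_determining_ions(length=16, site_a=7, site_b=9)
print("discriminating b ions:", b_ions)
print("discriminating y ions:", y_ions)
```

All other b and y ions carry either both candidate residues or neither, so their m/z values are identical for the two isomers and cannot help localize the site.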
Site localization algorithms
| Name | Class of algorithm | Core algorithm | Activation methods supported | Notes and availability | Interface | Prerequisites and/or dependencies | Report alternate sites | References |
|---|---|---|---|---|---|---|---|---|
| Ascore | PBL | Cumulative binomial probability | CID | First to implement site-determining ions | Commandline | Requires a pepXML containing PSM information and corresponding MS/MS in individual .dta formatted files | No | [ |
| SLoMo/TurboSLoMo | PBL | Cumulative Poisson distribution | CID, ETD | Was the first PBL available for ETD-derived data | Commandline | Requires a pepXML containing PSM information and corresponding MS/MS in individual .dta formatted files | Yes | [ |
| PhosphoRS | PBL | Cumulative binomial probability | CID, ETD, HCD | Has been built and tested on CID-, ETD-, and HCD-MS/MS | Commandline (v1.0) | Version 1.0 requires a custom XML format containing both PSM information and corresponding MS/MS. No specific search engine is preferred, as long as the necessary PSM details can be extracted | Yes (All) | [ |
| LuciPhor | PBL | Log odds | CID, HCD | First algorithm to implement FLR estimate | Commandline | Uses Trans-Proteomic Pipeline (TPP) supported search engines (Mascot, X!Tandem, and SEQUEST/COMET), processed via xinteract to a pepXML file. Available under Linux OS | Yes | [ |
| MaxQuant PTM score | PBL | Exact binomial probability | All | Also includes site occupancy when quantification information is available, scored based on equation used in Olsen et al. [ | GUI | None | Yes (All) | [ |
| Mascot Delta | SED | Score difference between first- and second-ranked isomers | All | | Commandline | Mascot .dat files are required. Many groups have written code (including Mascot) to process .dat files | Yes (All) | [ |
| ProteinProspector (SLIP) | SED | Score difference between first- and second-ranked isomers | All | | Webserver | Requires (free) registration on ProteinProspector webserver | Yes (all) | [ |
| PhosphoScore | Node cost delta between best and second-best candidates | Directed acyclic graph using intensity and mass deviations to weight nodes | CID | | GUI or commandline | Specific to SEQUEST search engine and explicitly requires .OUT (SEQUEST results) and .dta (peaklists) | No | [ |
| PhosphoScan | PBL | Cumulative binomial probability | CID | Available upon request to authors | GUI | Standalone (GUI) tool | Yes | [ |
| D-Score | SED | Difference between top- and second-ranked site isomers (posterior error probability (PEP)) | All | Standardized localization metric enabling comparison with other search engines. Proof-of-principle paper suggesting the applicability of universal deltas with PEP. No “tool” is currently available | NA | Computation of posterior error probabilities for first- and second-ranked hits required for delta PEP | Yes (all) | [ |
| PhosSA | DP (delta between first and second best site candidates) | Dynamic programming using sum intensity of matched site-determining ions to find best site candidates | CID, HCD | | GUI | Compatible with SEQUEST, Mascot search engines and ProteomeDiscoverer | Yes (status assigned to all candidate peptides) | [ |
GUI: graphical user interface.
At least the second-ranked candidate site is also reported. In principle, most PBL tools report all candidate sites, usually up to and including a maximum of two sites per peptide.