David M A Martin, Isabelle R E Nett, Franck Vandermoere, Jonathan D Barber, Nicholas A Morrice, Michael A J Ferguson.
Abstract
MOTIVATION: Complex patterns of protein phosphorylation mediate many cellular processes. Tandem mass spectrometry (MS/MS) is a powerful tool for identifying these post-translational modifications. In high-throughput experiments, mass spectrometry database search engines such as MASCOT provide a ranked list of peptide identifications based on hundreds of thousands of MS/MS spectra obtained in a mass spectrometry experiment. These search results are not in themselves sufficient for confident assignment of phosphorylation sites, as identification of characteristic mass differences requires time-consuming manual assessment of the spectra by an experienced analyst. The time required for manual assessment has previously rendered high-throughput confident assignment of phosphorylation sites challenging.
Year: 2010 PMID: 20651112 PMCID: PMC2922888 DOI: 10.1093/bioinformatics/btq341
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. An example of misidentification of the correct phosphorylation site. MASCOT identifies the phosphorylation site as pS16 (peptide score 118), though the expected y5-98 ion is much weaker than the weak y5 ion (blue). The second ranked hit (pY17, score 102) is preferred by the experienced analyst with a strong y5 ion match (red) giving a continuous y-ion ladder. The spectrum was annotated with ProPhosSI and modified. The threshold for ion inclusion is indicated by a blue bar on the y-axis.
Analysis Criteria for automated validation of PSMs
| ID | Criterion | Description |
|---|---|---|
| Prefilter criteria | | |
| P1 | Forward hit | Only hits against forward, non-redundant sequences are selected. |
| P2 | Mass accuracy | Only hits within 0.1 Da (or 0.1 + 1 Da) of the parent ion are selected. |
| P3 | Phospho-PTM | Only hits containing a putative Phospho PTM are selected. |
| P4 | Within 20 points | Only hits which are within 20 MASCOT score points of the top ranked hit for that query are selected. |
| P5 | Over FDR threshold | Only peptides with a MASCOT score over the calculated FDR 1% threshold are selected. |
| Validation criteria: phosphopeptide assignment | | |
| 1 | 4 in a row | At least four sequential y- or b-series ions are present. This indicates good coverage of the peptide. |
| 2 | 5 of 6 | 5 out of 6 sequential b- or y-series ions are present. This indicates good coverage of the peptide. |
| 3 | 3 desphospho ions | At least three y- or b-series ions with a phosphate loss are present. The phosphate ester bond tends to be more labile than the peptide bond. |
| 4 | Proline-directed fragmentation | The imino bond to the N-terminal side of a proline residue is particularly labile. If the sequence contains a proline residue then at least one of the imino bonds should give a fragment ion with at least 50% maximum intensity and much stronger than the relatively weakly cleaved amide bond C-terminal to proline. |
| 5 | 6 of top 10 ions | 6 of the 10 most intense ions should be assigned to y- or b-series ions. |
| Validation criteria: phosphosite assignment | | |
| 6 | Phosphate transitions | To assign the site specifically, at least one ion unique to that peptide species must be observed. This is aided by the high rate of phosphate loss from pSer and pThr residues. |
| 7 | PhosphoTyrosine | Mass differences corresponding to pTyr should be observed between identified peaks. |
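The ion-coverage rules (criteria 1, 2 and 5) can be illustrated with a minimal Python sketch. It is not the ProPhosSI implementation: the function names and the input representation (one boolean per ion position in a single b- or y-series, and `(intensity, assigned)` pairs for peaks) are assumptions made for illustration only.

```python
from __future__ import annotations

from typing import Sequence


def longest_run(present: Sequence[bool]) -> int:
    """Length of the longest unbroken run of observed ions in one series."""
    best = current = 0
    for seen in present:
        current = current + 1 if seen else 0
        best = max(best, current)
    return best


def four_in_a_row(present: Sequence[bool]) -> bool:
    """Criterion 1: at least four sequential ions of the series are present."""
    return longest_run(present) >= 4


def five_of_six(present: Sequence[bool]) -> bool:
    """Criterion 2: some window of six sequential ions contains at least five."""
    return any(sum(present[i:i + 6]) >= 5 for i in range(len(present) - 5))


def six_of_top_ten(peaks: Sequence[tuple[float, bool]]) -> bool:
    """Criterion 5: at least 6 of the 10 most intense peaks are assigned.

    Each peak is (intensity, assigned_to_b_or_y); this layout is hypothetical.
    """
    top = sorted(peaks, key=lambda p: p[0], reverse=True)[:10]
    return sum(assigned for _, assigned in top) >= 6


# Hypothetical y-series: y1, y2 seen, y3 missing, y4-y6 seen, y7-y8 missing.
y_series = [True, True, False, True, True, True, False, False]
print(four_in_a_row(y_series), five_of_six(y_series))
```

In this hypothetical series the longest run is three ions, so criterion 1 is not met, while the first six positions contain five observed ions, so criterion 2 is met.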
Fig. 2. Workflow for automated annotation of phosphosites. Experimental LC–MS/MS data are gathered (1) and processed using platform-specific software (2) to give a generic peak list file (3). This file is used as the input to MASCOT (4), which generates a results file (5) containing all the PSMs. This file is parsed into the MLRV relational database (6) and the FDR for the search is determined (7). A PSM-quality prefilter is applied (8) and suitable PSMs are exported to the TryPP-DB (9), where they are linked to the source peak list file used for the search. The observed MS/MS spectrum is extracted from the peak list file (10), filtered by an intensity threshold (11) and compared with a calculated fragmentation spectrum (12) for the peptide under examination. Observed ions are assigned to series (13), allowing the curation rules to be applied (14).
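Steps 11 to 13 of the workflow (intensity filtering, comparison with a calculated spectrum, and ion assignment) might look roughly like the following Python sketch. The source does not state how the threshold or the matching tolerance is defined, so the relative cutoff (`fraction`), the tolerance (`tol`), the `Peak` type and the example m/z values are all assumptions, not the published method.

```python
from __future__ import annotations

from typing import NamedTuple


class Peak(NamedTuple):
    mz: float
    intensity: float


def filter_by_intensity(peaks: list[Peak], fraction: float = 0.05) -> list[Peak]:
    """Step 11 (assumed form): keep peaks at or above a fraction of the base peak."""
    if not peaks:
        return []
    cutoff = fraction * max(p.intensity for p in peaks)
    return [p for p in peaks if p.intensity >= cutoff]


def assign_ions(
    peaks: list[Peak], theoretical: dict[str, float], tol: float = 0.5
) -> dict[str, Peak]:
    """Steps 12-13 (assumed form): for each calculated ion, pick the most
    intense observed peak within `tol` Da, if there is one."""
    assigned: dict[str, Peak] = {}
    for label, mz in theoretical.items():
        candidates = [p for p in peaks if abs(p.mz - mz) <= tol]
        if candidates:
            assigned[label] = max(candidates, key=lambda p: p.intensity)
    return assigned


# Hypothetical observed spectrum and calculated y-ion m/z values.
observed = [Peak(175.1, 1000.0), Peak(262.2, 30.0), Peak(349.2, 400.0), Peak(349.4, 600.0)]
calculated = {"y1": 175.12, "y2": 262.15, "y3": 349.18}
hits = assign_ions(filter_by_intensity(observed), calculated)
print(sorted(hits))
```

The resulting label-to-peak mapping is the kind of input on which the curation rules of step 14 could then operate.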
Kinase dataset statistics
| | Site Observations | Unique Sites | Unique Proteins | Peptides Observed |
|---|---|---|---|---|
| All peptides | 643 | 259 | 74 | 501 |
| Rank 1 and 2 | 449 | 213 | 72 | 355 |
| Rank 1 | 289 | 159 | 72 | 230 |
The subset of peptides from both Orbitrap and Q-Star experiments that were annotated both manually and by the automated system are described. An observed peptide is a single PSM. A PSM may contain more than one observed site. Each protein may contain many PSMs. Each site may be observed in many peptide observations.
Fig. 3. All manually curated peptide–spectrum matches containing at least one phosphorylated residue were ordered according to their MASCOT (red), SEQUEST (black) or SEQUEST + Peptide Prophet (green) score. Dotted lines indicate the performance of all matches ordered by search engine score. Solid lines indicate the performance of the subset of matches positively curated by ProPhosSI. The increased area under the curve for the solid lines indicates better performance by ProPhosSI.
Automated assignments
| Dataset | Phosphopeptides: Pass | Phosphopeptides: Fail | Phosphosites: Pass | Phosphosites: Fail |
|---|---|---|---|---|
| Orbitrap | 1617 | 1992 | 557 | 252 |
| Q-Star | 2101 | 2521 | 939 | 456 |
Automated assignment to the dataset by the methodology described. A single verified site observation at any peptide rank which met the appropriate quality criteria was considered sufficient to call as a phosphosite.
Automated versus manual peptide and phosphosite assignments
| Manual curation | ProPhosSI phosphopeptides: Pass | ProPhosSI phosphopeptides: Fail | ProPhosSI phosphosites: Pass | ProPhosSI phosphosites: Fail |
|---|---|---|---|---|
| Orbitrap (all PSM) | | | | |
| Pass | 60 | 17 | 41 | 5 |
| Fail | 12 | 79 | 6 | 13 |
| Q-Star (all PSM) | | | | |
| Pass | 101 | 52 | 69 | 17 |
| Fail | 19 | 161 | 17 | 33 |
| Orbitrap (Rank 1 PSM) | | | | |
| Pass | 32 | 5 | 31 | 3 |
| Fail | 6 | 37 | 2 | 2 |
| Q-Star (Rank 1 PSM) | | | | |
| Pass | 53 | 30 | 53 | 15 |
| Fail | 9 | 58 | 2 | 2 |
The results from independent manual curation were compared with results from the automated validation. Each individual PSM and site observation is considered for each experiment and additionally for the subset of data that only includes top ranked MASCOT PSMs.
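Each block of the comparison table is a 2×2 agreement matrix, from which summary rates can be derived. The Python sketch below assumes manual curation is taken as the reference standard; that choice, and the class and field names, are assumptions for illustration and not something the table itself prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """Counts from one 2x2 block; manual curation is the assumed reference."""

    both_pass: int    # manual pass, automated pass
    manual_only: int  # manual pass, automated fail
    auto_only: int    # manual fail, automated pass
    both_fail: int    # manual fail, automated fail

    @property
    def sensitivity(self) -> float:
        """Fraction of manually accepted items that the automated step accepts."""
        return self.both_pass / (self.both_pass + self.manual_only)

    @property
    def specificity(self) -> float:
        """Fraction of manually rejected items that the automated step rejects."""
        return self.both_fail / (self.both_fail + self.auto_only)

    @property
    def precision(self) -> float:
        """Fraction of automated passes that manual curation also accepts."""
        return self.both_pass / (self.both_pass + self.auto_only)


# Phosphopeptide counts for "Orbitrap (all PSM)" taken from the table.
orbitrap_peptides = Agreement(both_pass=60, manual_only=17, auto_only=12, both_fail=79)
print(round(orbitrap_peptides.sensitivity, 2), round(orbitrap_peptides.specificity, 2))
```

For the Orbitrap (all PSM) phosphopeptide counts this gives a sensitivity of roughly 0.78 (60/77) and a specificity of roughly 0.87 (79/91) relative to manual curation.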
Aggregated phosphosite assignments (by site)
| Manual assignment | Automated assignment: Pass | Automated assignment: Fail |
|---|---|---|
| All sites (total 129) | | |
| Pass | 80 | 11 |
| Fail | 14 | 24 |
| Sites with two or more positive automatic assignments (total 60 sites versus 129) | | |
| Pass | 52 | 39 |
| Fail | 8 | 30 |
| Sites with Rank 2 peptides or higher (109 sites) | | |
| Pass | 72 | 13 |
| Fail | 8 | 14 |
| Sites with Rank 1 peptides (80 sites) | | |
| Pass | 62 | 14 |
| Fail | 3 | 1 |
Each unique site is considered, taking all observations of that site. If any observation of a phosphosite is validated in any experiment then that site is called as validated.
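The by-site aggregation rule can be sketched in Python as follows. The site identifiers and the `(site, passed)` tuple layout are hypothetical. The `min_positive` parameter is an assumption added so that the same function also covers the stricter "two or more positive automatic assignments" row of the table.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def aggregate_sites(
    observations: Iterable[tuple[str, bool]], min_positive: int = 1
) -> dict[str, bool]:
    """Call a site validated when at least `min_positive` of its observations passed."""
    positives: defaultdict[str, int] = defaultdict(int)
    for site, passed in observations:
        # Adding 0 still registers the site, so never-validated sites are reported too.
        positives[site] += int(passed)
    return {site: count >= min_positive for site, count in positives.items()}


# Hypothetical site observations pooled across experiments.
observations = [
    ("ProtA:S16", False),
    ("ProtA:S16", True),
    ("ProtA:Y17", False),
    ("ProtB:T5", True),
    ("ProtB:T5", True),
]
print(aggregate_sites(observations))
print(aggregate_sites(observations, min_positive=2))
```

With the default setting a single validated observation is enough to call the site, matching the rule stated above; raising `min_positive` to 2 trades some sensitivity for fewer false calls.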
Fig. 4. A manually verified PSM that ProPhosSI fails to validate. Many ion labels are not shown for clarity. Evidence for phosphorylation at S1 arises from the b2 ion (a). ProPhosSI requires ion transitions over a phosphosite and so requires more than one ion. Evidence for phosphorylation at S4 arises from the uniquely assigned des-phospho y10 [2+] ion. ProPhosSI does not consider 2+ ions as they can in many cases be assigned to more than one fragment.