Alessandra Tiengo, Lorenzo Pasotti, Nicola Barbarini, Paolo Magni.
Abstract
Phosphorylation is a protein posttranslational modification. It is responsible for the activation/inactivation of disease-related pathways, thanks to its role as a "molecular switch." The study of phosphorylated proteins is therefore a key point for proteomic analyses focused on the identification of diagnostic/therapeutic targets. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the most widely used analytical approach. Although unmodified peptides are automatically identified by consolidated algorithms, phosphopeptides still require automated tools to avoid time-consuming manual interpretation. To improve phosphopeptide identification efficiency, a novel procedure was developed and implemented in a Perl/C tool called PhosphoHunter, which is proposed and evaluated here. It includes a preliminary heuristic step for filtering out the MS/MS spectra produced by nonphosphorylated peptides before sequence identification. A method to assess the statistical significance of identified phosphopeptides was also formulated. PhosphoHunter performance was tested on a dataset of 1500 MS/MS spectra and compared with two other tools: Mascot and Inspect. Comparisons demonstrated that a strong point of PhosphoHunter is sensitivity, suggesting that it is able to identify real phosphopeptides with superior performance. Performance indexes depend on a single parameter (intensity threshold) that users can tune according to the study aim. All three tools localized >90% of phosphosites.
Year: 2015 PMID: 25653679 PMCID: PMC4309027 DOI: 10.1155/2015/382869
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1: Summary of PhosphoHunter procedure. Block A (implemented via the create_database.pl script): a database in FASTA format is used to create a target database according to appropriate digestion rules and other parameters provided in an input file, such as the number of allowed consecutive missing cleavages. The database is then used to obtain a decoy database and a single composite database is obtained from target and decoy databases. Block B (implemented via the merge.pl script): individual dta files, corresponding to experimental spectra, are merged into a single dta file. Block C: experimental spectra are normalized and processed by discarding charges higher than 4, low-intensity peaks, and peptides not showing neutral loss. The intensity threshold of neutral loss is specified in the input file. Block D: theoretical and processed spectra are compared according to a scoring function and a list of phosphopeptides with scores is associated with each spectrum. Block E: for each spectrum, a p-value is computed for each element of the list and only the peptides with a p-value below a specific threshold, defined in the input file (relation not shown in the figure), are kept in the final list. Blocks C, D, and E are all implemented via the phosphopeptide_ID.pl script.
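The neutral loss check of Block C can be illustrated with a minimal Python sketch. This is not PhosphoHunter's code (the tool is written in Perl/C), and it assumes that the neutral loss in question is the loss of phosphoric acid (H3PO4, monoisotopic mass 97.9769 Da), which is typical of phosphoserine and phosphothreonine peptides. The `Spectrum` class, the function names, the base-peak normalization to 100, the 0.5 m/z tolerance, and the `>=` comparison against the threshold are all hypothetical choices made for illustration; only the charge limit of 4 comes from the caption. The removal of low-intensity peaks is not covered.

```python
from dataclasses import dataclass

# Monoisotopic mass of H3PO4 (assumed to be the neutral loss of interest).
H3PO4_MASS = 97.9769


@dataclass
class Spectrum:
    """Hypothetical container for one MS/MS spectrum."""
    precursor_mz: float
    charge: int
    peaks: list  # list of (m/z, intensity) tuples


def normalize(peaks):
    """Rescale intensities so that the base peak equals 100 (assumed scheme)."""
    top = max((intensity for _, intensity in peaks), default=0.0)
    if top <= 0:
        return []
    return [(mz, 100.0 * intensity / top) for mz, intensity in peaks]


def passes_neutral_loss_check(spectrum, intensity_threshold,
                              mz_tolerance=0.5, max_charge=4):
    """Return True if a neutral loss peak at or above the threshold is present."""
    # Spectra with charge higher than 4 are discarded, as stated in the caption.
    if spectrum.charge < 1 or spectrum.charge > max_charge:
        return False
    # Expected m/z of the precursor ion after losing H3PO4.
    expected_mz = spectrum.precursor_mz - H3PO4_MASS / spectrum.charge
    for mz, intensity in normalize(spectrum.peaks):
        if abs(mz - expected_mz) <= mz_tolerance and intensity >= intensity_threshold:
            return True
    return False


# Toy spectrum: the peak at 551.01 is close to 600.0 - 97.9769 / 2.
example = Spectrum(precursor_mz=600.0, charge=2,
                   peaks=[(551.01, 80.0), (300.0, 160.0)])
print(passes_neutral_loss_check(example, intensity_threshold=40))
print(passes_neutral_loss_check(example, intensity_threshold=60))
```

In this toy case the candidate neutral loss peak has a normalized intensity of 50, so it passes a threshold of 40 and fails a threshold of 60. This mirrors the trade-off controlled by the intensity threshold: raising it rejects more spectra.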
Number of MS/MS spectra selected through the analysis of the neutral loss peaks.
| Intensity threshold | Phosphorylated | Nonphosphorylated | Total selected | % phosphorylated |
|---|---|---|---|---|
| 0 | 750 | 488 | 1238 | 60.6 |
| 10 | 727 | 239 | 966 | 75.3 |
| 20 | 703 | 147 | 850 | 82.7 |
| 30 | 664 | 98 | 762 | 87.1 |
| 40 | 634 | 70 | 704 | 90.1 |
| 50 | 603 | 49 | 652 | 92.5 |
| 60 | 579 | 34 | 613 | 94.5 |
| 70 | 548 | 24 | 572 | 95.8 |
| 80 | 520 | 19 | 539 | 96.5 |
| 90 | 493 | 14 | 507 | 97.2 |
| 100 | 464 | 11 | 475 | 97.7 |
Intensity threshold values (from 0 to 100) and number of MS/MS spectra that pass the neutral loss check, subdivided into phosphorylated and nonphosphorylated peptides. The last column reports the percentage of phosphorylated peptides over the total number of selected spectra.
Performance of PhosphoHunter in terms of phosphorylated spectra detection (1500 MS/MS spectra analyzed: 750 phosphorylated peptides + 750 nonphosphorylated peptides) without statistical validation.
| Intensity threshold | TP | FP | FN | TN | Sens | Spec | Acc | FDR | Prec | F |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 750 | 488 | 0 | 262 | 1.000 | 0.349 | 0.675 | 0.394 | 0.606 | 0.755 |
| 10 | 727 | 239 | 23 | 511 | 0.969 | 0.681 | 0.825 | 0.247 | 0.753 | 0.847 |
| 20 | 703 | 147 | 47 | 603 | 0.937 | 0.804 | 0.871 | 0.173 | 0.827 | 0.879 |
| 30 | 664 | 98 | 86 | 652 | 0.885 | 0.869 | 0.877 | 0.129 | 0.871 | 0.878 |
| 40 | 634 | 70 | 116 | 680 | 0.845 | 0.907 | 0.876 | 0.099 | 0.901 | 0.872 |
| 50 | 603 | 49 | 147 | 701 | 0.804 | 0.935 | 0.869 | 0.075 | 0.925 | 0.860 |
| 60 | 579 | 34 | 171 | 716 | 0.772 | 0.955 | 0.863 | 0.055 | 0.945 | 0.850 |
| 70 | 548 | 24 | 202 | 726 | 0.731 | 0.968 | 0.849 | 0.042 | 0.958 | 0.829 |
| 80 | 520 | 19 | 230 | 731 | 0.693 | 0.975 | 0.834 | 0.035 | 0.965 | 0.807 |
| 90 | 493 | 14 | 257 | 736 | 0.657 | 0.981 | 0.819 | 0.028 | 0.972 | 0.784 |
| 100 | 464 | 11 | 286 | 739 | 0.619 | 0.985 | 0.802 | 0.023 | 0.977 | 0.758 |
TP: true positive, spectra of phosphorylated peptides passing the neutral loss check and for which at least a hit was found; FP: false positive, spectra of nonphosphorylated peptides passing the neutral loss check and for which at least a hit was found; FN: false negative; TN: true negative; Sens: sensitivity or recall, TP/(TP + FN); Spec: specificity, TN/(TN + FP); Acc: accuracy, (TP + TN)/all spectra; FDR: false discovery rate, FP/(TP + FP); Prec: precision, TP/(TP + FP); F: F-measure, 2 × precision × recall/(precision + recall).
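The definitions above can be checked against the table with a short Python sketch. The function name `detection_metrics` and the dictionary keys are hypothetical; the formulas are the ones given in the footnote.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute the performance indexes defined in the table footnote."""
    sens = tp / (tp + fn)                 # sensitivity (recall)
    spec = tn / (tn + fp)                 # specificity
    acc = (tp + tn) / (tp + fp + fn + tn) # accuracy over all spectra
    fdr = fp / (tp + fp)                  # false discovery rate
    prec = tp / (tp + fp)                 # precision
    f = 2 * prec * sens / (prec + sens)   # F-measure
    return {"Sens": sens, "Spec": spec, "Acc": acc,
            "FDR": fdr, "Prec": prec, "F": f}


# Counts from the row with intensity threshold 10 (no statistical validation).
metrics = detection_metrics(tp=727, fp=239, fn=23, tn=511)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the sketch yields the values reported in that row (0.969, 0.681, 0.825, 0.247, 0.753, 0.847). Note that FDR and precision always sum to 1, since both share the denominator TP + FP.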
Performance of PhosphoHunter in terms of phosphorylated spectra detection (1500 MS/MS spectra analyzed, 750 phosphorylated peptides + 750 nonphosphorylated peptides) with the statistical validation.
| Intensity threshold | TP | FP | FN | TN | Sens | Spec | Acc | FDR | Prec | F |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 743 | 240 | 7 | 510 | 0.991 | 0.680 | 0.835 | 0.244 | 0.756 | 0.857 |
| 10 | 721 | 129 | 29 | 621 | 0.961 | 0.828 | 0.895 | 0.152 | 0.848 | 0.901 |
| 20 | 697 | 77 | 53 | 673 | 0.929 | 0.897 | 0.913 | 0.099 | 0.901 | 0.915 |
| 30 | 659 | 50 | 91 | 700 | 0.879 | 0.933 | 0.906 | 0.071 | 0.929 | 0.903 |
| 40 | 630 | 35 | 120 | 715 | 0.840 | 0.953 | 0.897 | 0.053 | 0.947 | 0.890 |
| 50 | 599 | 23 | 151 | 727 | 0.799 | 0.969 | 0.884 | 0.037 | 0.963 | 0.873 |
| 60 | 575 | 18 | 175 | 732 | 0.767 | 0.976 | 0.871 | 0.030 | 0.970 | 0.856 |
| 70 | 544 | 12 | 206 | 738 | 0.725 | 0.984 | 0.855 | 0.022 | 0.978 | 0.833 |
| 80 | 516 | 9 | 234 | 741 | 0.688 | 0.988 | 0.838 | 0.017 | 0.983 | 0.809 |
| 90 | 489 | 6 | 261 | 744 | 0.652 | 0.992 | 0.822 | 0.012 | 0.988 | 0.786 |
| 100 | 460 | 3 | 290 | 747 | 0.613 | 0.996 | 0.805 | 0.006 | 0.994 | 0.758 |
TP: true positive, spectra of phosphorylated peptides passing the neutral loss check and for which at least a hit was found; FP: false positive, spectra of nonphosphorylated peptides passing the neutral loss check and for which at least a hit was found; FN: false negative; TN: true negative; Sens: sensitivity or recall, TP/(TP + FN); Spec: specificity, TN/(TN + FP); Acc: accuracy, (TP + TN)/all spectra; FDR: false discovery rate, FP/(TP + FP); Prec: precision, TP/(TP + FP); F: F-measure, 2 × precision × recall/(precision + recall).
Performance of PhosphoHunter in terms of sequence detection and phosphorylation sites localization.
| Intensity threshold | Correct sequences (all the hits) | Correct sites (all the hits) | Correct sequences (hits with …) | Correct sites (hits with …) |
|---|---|---|---|---|
| 0 | 750 | 678 | 743 | 671 |
| 10 | 727 | 658 | 721 | 652 |
| 20 | 703 | 636 | 697 | 630 |
| 30 | 664 | 598 | 659 | 593 |
| 40 | 634 | 572 | 630 | 568 |
| 50 | 603 | 544 | 599 | 540 |
| 60 | 579 | 523 | 575 | 519 |
| 70 | 548 | 499 | 544 | 495 |
| 80 | 520 | 477 | 516 | 473 |
| 90 | 493 | 452 | 489 | 448 |
| 100 | 464 | 425 | 460 | 421 |
The 750 MS/MS spectra of phosphorylated peptides were considered and the intensity threshold was tuned from 0 to 100.
Performance of Mascot and Inspect in terms of phosphorylated spectra detection (1500 MS/MS spectra analyzed, 750 phosphorylated peptides + 750 nonphosphorylated peptides).
| Tool | TP | FP | FN | TN | Sens | Spec | Acc | FDR | Prec | F |
|---|---|---|---|---|---|---|---|---|---|---|
| Mascot | 552 | 0 | 198 | 750 | 0.736 | 1 | 0.868 | 0 | 1 | 0.848 |
| Inspect | 650 | 15 | 100 | 735 | 0.867 | 0.98 | 0.923 | 0.023 | 0.977 | 0.919 |
TP: true positive, spectra of phosphorylated peptides in which the first hit (or the majority) of the hit list is phosphorylated; FP: false positive, spectra of nonphosphorylated peptides in which the first hit (or the majority) of the hit list is phosphorylated; FN: false negative; TN: true negative; Sens: sensitivity or recall, TP/(TP + FN); Spec: specificity, TN/(TN + FP); Acc: accuracy, (TP + TN)/all spectra; FDR: false discovery rate, FP/(TP + FP); Prec: precision, TP/(TP + FP); F: F-measure, 2 × precision × recall/(precision + recall).
Performance of Mascot and Inspect in terms of sequence detection and phosphorylation sites localization.
| Tool | Correct sequence | Correct sites |
|---|---|---|
| Mascot | 552 | 513 |
| Inspect | 646 | 615 |
The 750 MS/MS spectra of phosphorylated peptides are considered.
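The site localization counts can be turned into rates with a short Python sketch. It assumes that the localization rate is the ratio of correct sites to correctly identified sequences; this reading of the ">90% of phosphosites" statement in the abstract is an interpretation, not something the tables state explicitly. The function name and the dictionary are hypothetical, and the PhosphoHunter entry uses the threshold 0, all-hits row of the PhosphoHunter table.

```python
def localization_rate(correct_sites, correct_sequences):
    """Fraction of correctly identified sequences whose sites are also correct."""
    return correct_sites / correct_sequences


# (correct sites, correct sequences) taken from the tables.
counts = {
    "PhosphoHunter (threshold 0, all hits)": (678, 750),
    "Mascot": (513, 552),
    "Inspect": (615, 646),
}

for tool, (sites, sequences) in counts.items():
    rate = localization_rate(sites, sequences)
    print(f"{tool}: {100 * rate:.1f}%")
```

Under this assumption the rates are about 90.4% for PhosphoHunter, 92.9% for Mascot, and 95.2% for Inspect.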