Malik N Akhtar, Bruce R Southey, Per E Andrén, Jonathan V Sweedler, Sandra L Rodriguez-Zas.
Abstract
In support of accurate neuropeptide identification in mass spectrometry experiments, a novel Monte Carlo permutation testing approach was used to compute significance values. Testing was based on k-permuted decoy databases, where k denotes the number of permutations. These databases were integrated with a range of peptide identification indicators from three popular open-source database search programs (OMSSA, Crux, and X! Tandem) to assess the statistical significance of neuropeptide spectra matches. Significance p-values were computed as the fraction of the sequences in the database with a match indicator value better than or equal to that of the true target match. When applied to a test-bed of all known manually annotated mouse neuropeptides, permutation tests with k-permuted decoy databases identified up to 100% of the neuropeptides at p-value < 10⁻⁵. The permutation test p-values using the hyperscore (X! Tandem), E-value (OMSSA), and Sp score (Crux) match indicators outperformed all other match indicators. The robust peptide detection performance of the intuitive indicator "number of matched ions between the experimental and theoretical spectra" highlights the importance of considering this indicator when the p-value is borderline significant. Our findings suggest that permutation decoy databases of size 1×10⁵ are adequate to accurately detect neuropeptides, and this can be exploited to increase the speed of the search. The straightforward Monte Carlo permutation testing (comparable to a zero-order Markov model) can be easily combined with existing peptide identification software to enable accurate and effective neuropeptide detection. The source code is available at http://stagbeetle.animal.uiuc.edu/pepshop/MSMSpermutationtesting.
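The permutation p-value described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the function names `permute_peptide` and `permutation_p_value` are hypothetical, the decoys are produced by simple residue shuffling (which preserves amino acid composition), and the scores are assumed to come from whichever search program is in use.

```python
import random


def permute_peptide(sequence, k, seed=0):
    """Return k random residue permutations of `sequence`.

    Assumption: decoys are plain shuffles of the target residues, so each
    decoy keeps the amino acid composition of the target.
    """
    rng = random.Random(seed)
    residues = list(sequence)
    decoys = []
    for _ in range(k):
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys


def permutation_p_value(target_score, decoy_scores, higher_is_better=True):
    """Fraction of decoy scores better than or equal to the target score."""
    if not decoy_scores:
        raise ValueError("need at least one decoy score")
    if higher_is_better:
        hits = sum(1 for s in decoy_scores if s >= target_score)
    else:
        # For indicators such as E-values, smaller means a better match.
        hits = sum(1 for s in decoy_scores if s <= target_score)
    return hits / len(decoy_scores)
```

For example, `permutation_p_value(12, [3, 5, 12, 7])` gives 0.25 because one of the four decoy scores ties the target. With this plain fraction the p-value is 0 whenever no decoy reaches the target score, so the smallest nonzero value that can be reported depends on how many decoys are scored.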
Year: 2014 PMID: 25329667 PMCID: PMC4201571 DOI: 10.1371/journal.pone.0111112
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Distribution of neuropeptide length in target database peptides (peptides less than 60 amino acids in length are shown), the 103 studied peptides, and the 236 peptides that fall within ±12 Da of the 103 peptides.
Figure 2. Frequency (number) of spectra with 1 to 10 homeometric matches for the K10⁶ k-permuted decoy databases across the three database search programs (X! Tandem, OMSSA, and Crux).
Peptide detection significance levels using ideal simulated spectra of the 103 peptides with and without any post-translational modifications (PTMs) and all b- and y-ions including neutral mass losses against a standard target database across database search programs (OMSSA, X! Tandem, and Crux).
| Program | PTMs | t = 0 | t = 1 | t = 2 | t = 3 | t = 4 | t = 5 | t ≥ 6 | Cum. N (P ≤ 10⁻²) |
|---|---|---|---|---|---|---|---|---|---|
| X! Tandem | None | 0 | 0 | 4 | 4 | 2 | 6 | 58 | 74 |
| | Amidation | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 9 |
| | Oxidation | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| | Pyroglutamination | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 4 |
| | Phosphorylation | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 4 |
| | N-terminal acetylation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| OMSSA | None | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 79 |
| | Amidation | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 9 |
| | Oxidation | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| | Pyroglutamination | 0 | 0 | 1 | 2 | 1 | 0 | 0 | 4 |
| | Phosphorylation | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 |
| | N-terminal acetylation | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 5 |
| Crux | None | 2 | 5 | 12 | 52 | 3 | 1 | 2 | 70 |
| | Amidation | 0 | 0 | 2 | 3 | 2 | 0 | 2 | 9 |
| | Oxidation | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| | Pyroglutamination | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 3 |
| | Phosphorylation | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 4 |
| | N-terminal acetylation | 0 | 0 | 0 | 3 | 1 | 0 | 1 | 5 |
Significance threshold (t) for a match to be considered significant at an E-value or p-value < 1×10⁻ᵗ (t = 0 to ≥6).
Cumulative number of peptides with an E-value or p-value < 1×10⁻².
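The significance levels and cumulative counts used in these tables can be reproduced with a short helper. The sketch below assumes that a match is assigned the largest level t (capped at 6) for which its E-value or p-value is below 1×10⁻ᵗ, as the footnotes describe; the function names are illustrative only.

```python
def significance_level(p, cap=6):
    """Largest integer t (at most `cap`) such that p < 10**-t."""
    t = 0
    while t < cap and p < 10.0 ** -(t + 1):
        t += 1
    return t


def tally_levels(p_values, cap=6):
    """Count how many p-values fall at each level 0..cap."""
    counts = [0] * (cap + 1)
    for p in p_values:
        counts[significance_level(p, cap)] += 1
    return counts


def cumulative_at(counts, t):
    """Number of matches at level t or higher, i.e. with p < 10**-t."""
    return sum(counts[t:])
```

Applied to the X! Tandem row without PTMs, `cumulative_at([0, 0, 4, 4, 2, 6, 58], 2)` gives 74, which matches the cumulative column.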
Peptide detection significance levels using experimental spectra of the 103 peptides with and without any post-translational modifications (PTMs) against a standard target database across database search programs (OMSSA, X! Tandem, and Crux).
| Program | PTMs | Miss | Inc | t = 0 | t = 1 | t = 2 | t = 3 | t = 4 | t = 5 | t ≥ 6 | Cum. N (P ≤ 10⁻²) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X! Tandem | None | 0 | 0 | 1 | 8 | 11 | 15 | 16 | 11 | 18 | 71 |
| | Amidation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 9 |
| | Oxidation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| | Pyroglutamination | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 4 |
| | Phosphorylation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 4 |
| | N-terminal acetylation | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| OMSSA | None | 0 | 0 | 0 | 0 | 1 | 2 | 1 | 3 | 73 | 80 |
| | Amidation | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 6 | 7 |
| | Oxidation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| | Pyroglutamination | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 3 | 4 |
| | Phosphorylation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 |
| | N-terminal acetylation | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 |
| Crux | None | 0 | 0 | 9 | 8 | 9 | 44 | 1 | 0 | 9 | 63 |
| | Amidation | 0 | 0 | 0 | 1 | 5 | 1 | 1 | 0 | 1 | 8 |
| | Oxidation | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| | Pyroglutamination | 0 | 0 | 1 | 1 | 0 | 2 | 0 | 0 | 0 | 2 |
| | Phosphorylation | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 2 |
| | N-terminal acetylation | 0 | 0 | 0 | 1 | 1 | 2 | 0 | 1 | 0 | 4 |
Significance threshold (t) for a match to be considered significant at an E-value or p-value < 1×10⁻ᵗ (t = 0 to ≥6).
Cumulative number of peptides with an E-value or p-value < 1×10⁻².
Miss: number of peptides missed by the program.
Inc: number of peptides with an incorrect post-translational modification assignment.
Performance of the target and alternative k-permuted decoy databases used with the X! Tandem database search program using spectra from 80 unmodified neuropeptides.
| Database | Indicator | t = 0 | t = 1 | t = 2 | t = 3 | t = 4 | t = 5 | t ≥ 6 | Cum. N at 10⁻² | Cum. N at 10⁻⁴ |
|---|---|---|---|---|---|---|---|---|---|---|
| Target | | 1 | 8 | 11 | 15 | 16 | 12 | 17 | 71 | 45 |
| K10³ | # ions | 0 | 0 | 76 | 4 | 0 | 0 | 0 | 80 | 0 |
| | Hyperscore | 0 | 0 | 76 | 4 | 0 | 0 | 0 | 80 | 0 |
| | Convolution | 0 | 8 | 70 | 2 | 0 | 0 | 0 | 72 | 0 |
| | | 0 | 0 | 76 | 4 | 0 | 0 | 0 | 80 | 0 |
| K10⁴ | # ions | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 80 | 0 |
| | Hyperscore | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 80 | 0 |
| | Convolution | 0 | 5 | 44 | 31 | 0 | 0 | 0 | 75 | 0 |
| | | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 80 | 0 |
| K10⁵ | # ions | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 80 | 80 |
| | Hyperscore | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 80 | 80 |
| | Convolution | 0 | 3 | 36 | 32 | 9 | 0 | 0 | 77 | 9 |
| | | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 80 | 80 |
| K10⁶ | # ions | 0 | 0 | 0 | 0 | 1 | 79 | 0 | 80 | 80 |
| | Hyperscore | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 80 | 80 |
| | Convolution | 0 | 4 | 30 | 36 | 5 | 5 | 0 | 76 | 10 |
| | | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 80 | 80 |
Target: database of 236 neuropeptide sequences; K10³: k-permuted decoy database size = 236,000 peptides; K10⁴: k-permuted decoy database size = 2,360,000 peptides; K10⁵: k-permuted decoy database size = 23,600,000 peptides; K10⁶: k-permuted decoy database size = 236,000,000 peptides.
Significance threshold (t) for the target spectrum to be considered significant at a threshold < 1×10⁻ᵗ (t = 0 to ≥6).
Cumulative number of peptides at the 1×10⁻² and 1×10⁻⁴ thresholds.
Performance of the target and alternative k-permuted decoy databases used with the Crux database search program using spectra from 80 unmodified neuropeptides.
| Database | Indicator | t = 0 | t = 1 | t = 2 | t = 3 | t = 4 | t = 5 | t ≥ 6 | Cum. N at 10⁻² | Cum. N at 10⁻⁴ |
|---|---|---|---|---|---|---|---|---|---|---|
| Target | | 9 | 8 | 9 | 44 | 1 | 0 | 9 | 63 | 10 |
| K10³ | # ions | 0 | 2 | 78 | 0 | 0 | 0 | 0 | 78 | 0 |
| | XCorr | 3 | 11 | 66 | 0 | 0 | 0 | 0 | 66 | 0 |
| | Sp | 0 | 2 | 78 | 0 | 0 | 0 | 0 | 78 | 0 |
| | ΔCn | 3 | 11 | 66 | 0 | 0 | 0 | 0 | 66 | 0 |
| K10⁴ | # ions | 0 | 0 | 1 | 79 | 0 | 0 | 0 | 80 | 0 |
| | XCorr | 3 | 10 | 14 | 53 | 0 | 0 | 0 | 67 | 0 |
| | Sp | 0 | 0 | 1 | 79 | 0 | 0 | 0 | 80 | 0 |
| | ΔCn | 3 | 10 | 14 | 53 | 0 | 0 | 0 | 67 | 0 |
| K10⁵ | # ions | 0 | 0 | 0 | 1 | 79 | 0 | 0 | 80 | 79 |
| | XCorr | 3 | 10 | 8 | 23 | 36 | 0 | 0 | 67 | 36 |
| | Sp | 0 | 0 | 0 | 1 | 79 | 0 | 0 | 80 | 79 |
| | ΔCn | 3 | 10 | 8 | 23 | 36 | 0 | 0 | 67 | 36 |
| K10⁶ | # ions | 0 | 0 | 0 | 0 | 2 | 78 | 0 | 80 | 80 |
| | XCorr | 3 | 10 | 9 | 19 | 22 | 17 | 0 | 67 | 39 |
| | Sp | 0 | 0 | 0 | 0 | 4 | 76 | 0 | 80 | 80 |
| | ΔCn | 3 | 10 | 9 | 19 | 22 | 17 | 0 | 67 | 39 |
Target: database of 236 neuropeptide sequences; K10³: k-permuted decoy database size = 236,000 peptides; K10⁴: k-permuted decoy database size = 2,360,000 peptides; K10⁵: k-permuted decoy database size = 23,600,000 peptides; K10⁶: k-permuted decoy database size = 236,000,000 peptides.
# ions: permutation p-values computed for the number of matched b- and y-ions. XCorr: permutation p-values computed from the XCorr scores of the matches. Sp: permutation p-values computed from the Sp scores of the matches. ΔCn: permutation p-values computed from the ΔCn scores of the matches.
Significance threshold (t) for a match to be considered significant at p-value < 1×10⁻ᵗ.
Cumulative number of peptides at p-value thresholds of 1×10⁻² and 1×10⁻⁴.
Performance of the target and alternative k-permuted decoy databases used with the OMSSA database search program using spectra from 80 unmodified neuropeptides.
| Database | Indicator | t = 0 | t = 1 | t = 2 | t = 3 | t = 4 | t = 5 | t ≥ 6 | Cum. N at 10⁻² | Cum. N at 10⁻⁴ |
|---|---|---|---|---|---|---|---|---|---|---|
| Target | | 0 | 0 | 1 | 2 | 1 | 3 | 73 | 80 | 77 |
| K10³ | # ions | 0 | 2 | 78 | 0 | 0 | 0 | 0 | 78 | 0 |
| | Lambda | 0 | 9 | 71 | 0 | 0 | 0 | 0 | 71 | 0 |
| | p-value | 0 | 2 | 78 | 0 | 0 | 0 | 0 | 78 | 0 |
| | E-value | 0 | 2 | 78 | 0 | 0 | 0 | 0 | 78 | 0 |
| K10⁴ | # ions | 0 | 0 | 1 | 79 | 0 | 0 | 0 | 80 | 0 |
| | Lambda | 0 | 5 | 11 | 64 | 0 | 0 | 0 | 75 | 0 |
| | p-value | 0 | 0 | 1 | 79 | 0 | 0 | 0 | 80 | 0 |
| | E-value | 0 | 0 | 1 | 79 | 0 | 0 | 0 | 80 | 0 |
| K10⁵ | # ions | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 80 | 80 |
| | Lambda | 0 | 5 | 8 | 24 | 43 | 0 | 0 | 75 | 43 |
| | p-value | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 80 | 80 |
| | E-value | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 80 | 80 |
| K10⁶ | # ions | 0 | 0 | 0 | 0 | 2 | 78 | 0 | 80 | 80 |
| | Lambda | 0 | 5 | 8 | 17 | 18 | 32 | 0 | 75 | 50 |
| | p-value | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 80 | 80 |
| | E-value | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 80 | 80 |
Target: database of 236 neuropeptide sequences; K10³: k-permuted decoy database size = 236,000 peptides; K10⁴: k-permuted decoy database size = 2,360,000 peptides; K10⁵: k-permuted decoy database size = 23,600,000 peptides; K10⁶: k-permuted decoy database size = 236,000,000 peptides.
# ions: permutation p-values computed for the number of matched b- and y-ions. Lambda: permutation p-values computed from the Poisson mean of matches. p-value: permutation p-values computed from the p-value reported by OMSSA for the matches. E-value: permutation p-values computed using OMSSA E-values.
Significance threshold (t) for a match to be considered significant at p-value < 1×10⁻ᵗ.
Incorrect: the program provided an incorrect match.
Cumulative number of peptides with p-value < 1×10⁻².
Cumulative number of peptides with p-value < 1×10⁻⁴.
Computation times in seconds for search of 80 unmodified spectra against different databases using a single process Intel Core i7-3770 CPU @ 3.40 GHz.
| Database | Crux | OMSSA | X! Tandem |
|---|---|---|---|
| Target | 5 | 11 | 1 |
| K10³ | 7 | 56 | 41 |
| K10⁴ | 61 | 915 | 476 |
| K10⁵ | 200 | 1220 | 467 |
| K10⁶ | 2162 | 24475 | 5196 |
Target: database of 236 neuropeptide sequences; K10³: k-permuted decoy database size = 236,000 peptides; K10⁴: k-permuted decoy database size = 2,360,000 peptides; K10⁵: k-permuted decoy database size = 23,600,000 peptides; K10⁶: k-permuted decoy database size = 236,000,000 peptides.
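The database sizes in the footnote follow from multiplying the 236 target sequences by the number of permutations k, and the timing table can be read as a slowdown relative to the target-only search. The snippet below is only a convenience sketch for that arithmetic; the dictionary keys and function names are illustrative, and the timings are copied from the table above.

```python
TARGET_PEPTIDES = 236  # size of the target database, from the footnote

# Search times in seconds for 80 unmodified spectra, copied from the table.
SECONDS = {
    "Target": {"Crux": 5, "OMSSA": 11, "X! Tandem": 1},
    "K10^3": {"Crux": 7, "OMSSA": 56, "X! Tandem": 41},
    "K10^4": {"Crux": 61, "OMSSA": 915, "X! Tandem": 476},
    "K10^5": {"Crux": 200, "OMSSA": 1220, "X! Tandem": 467},
    "K10^6": {"Crux": 2162, "OMSSA": 24475, "X! Tandem": 5196},
}


def decoy_db_size(k):
    """Number of decoy peptides when each target sequence is permuted k times."""
    return TARGET_PEPTIDES * k


def slowdown(program, database):
    """Search time for `database` divided by the target-only search time."""
    return SECONDS[database][program] / SECONDS["Target"][program]


for name, k in [("K10^3", 10**3), ("K10^4", 10**4), ("K10^5", 10**5), ("K10^6", 10**6)]:
    ratios = {prog: round(slowdown(prog, name), 1) for prog in SECONDS[name]}
    print(name, decoy_db_size(k), ratios)
```

Because the target-only searches take only a few seconds, these ratios are sensitive to rounding in the reported times and are best treated as rough indications.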