Jian Liu, Alexander W Bell, John J M Bergeron, Corey M Yanofsky, Brian Carrillo, Christian E H Beaudrie, Robert E Kearney.
Abstract
BACKGROUND: Tandem mass spectrometry followed by database search is currently the predominant technology for peptide sequencing in shotgun proteomics experiments. Most methods compare experimentally observed spectra to the theoretical spectra predicted from the sequences in protein databases. There is growing interest, however, in comparing unknown experimental spectra to a library of previously identified spectra. This approach has the advantage of taking into account instrument-dependent factors and peptide-specific differences in fragmentation probabilities. It is also computationally more efficient for high-throughput proteomics studies.
Year: 2007 PMID: 17227583 PMCID: PMC1783643 DOI: 10.1186/1477-5956-5-3
Source DB: PubMed Journal: Proteome Sci ISSN: 1477-5956 Impact factor: 2.480
Figure 1. The peak intensities in GluFib MS/MS spectra can be approximated by a Poisson distribution. Histograms of peak intensities observed in sequential MS/MS runs are shown. The superimposed fitting curves are those predicted for the Poisson distributions. Each panel illustrates the result for peaks at a different m/z value.
Figure 2. Distributions of scores for different similarity measures for spectra (dataset 1) from the same and different peptides: (a) shared peak ratio, (b) cosine value and (c) correlation coefficient.
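The three similarity measures named in Figure 2 can be illustrated on spectra that have already been binned into equal-length intensity vectors. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the shared peak ratio is assumed here to be the number of bins occupied in both spectra divided by the number occupied in either, which may differ from the definition used in the paper.

```python
import math


def shared_peak_ratio(a, b):
    """Assumed definition: bins occupied in both spectra / bins occupied in either."""
    peaks_a = {i for i, x in enumerate(a) if x > 0}
    peaks_b = {i for i, x in enumerate(b) if x > 0}
    union = peaks_a | peaks_b
    return len(peaks_a & peaks_b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine of the angle between two intensity vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def correlation(a, b):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


# Toy binned spectra (made-up values, for illustration only).
spectrum_x = [0.0, 4.0, 0.0, 9.0, 1.0]
spectrum_y = [0.0, 3.0, 0.0, 8.0, 0.0]  # similar peak pattern to spectrum_x
spectrum_z = [5.0, 0.0, 7.0, 0.0, 0.0]  # no peaks in common with spectrum_x

for name, other in (("y", spectrum_y), ("z", spectrum_z)):
    print(name,
          round(shared_peak_ratio(spectrum_x, other), 3),
          round(cosine(spectrum_x, other), 3),
          round(correlation(spectrum_x, other), 3))
```

The shared peak ratio ignores intensities entirely, whereas the cosine and correlation scores weight each bin by its intensity; the correlation additionally subtracts each spectrum's mean before comparing.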
Specification of eight typical configurations for spectral comparison. (For pairwise comparison in settings 1–5, the experimental spectra with the highest scores were selected as references from the clusters. SMZ in setting 3 indicates that the intensity is weighted by the square of m/z as in [20].)
| Setting ID | Intensity transform | Binning method | Reference spectra | ROC area |
| 1 | logarithm | profiling | individual spectra | 0.993 |
| 2 | no transform | profiling | individual spectra | 0.992 |
| 3 | square root w/SMZ | profiling | individual spectra | 0.993 |
| 4 | square root | direct binning | individual spectra | 0.997 |
| 5 | square root | profiling | individual spectra | 0.998 |
| 6 | square root | profiling | theoretical spectra | 0.987 |
| 7 | square root | profiling | closest neighbors | 0.999 |
| 8 | square root | profiling | average spectra | 0.999 |
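The "square root" and "direct binning" entries in the table can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the paper: the function name `bin_spectrum`, the 1.0 m/z bin width, the 0 to 2000 m/z range, and the choice to sum intensities within a bin before transforming are all assumptions. The "profiling" binning method is not described in this record and is not reproduced here. A general statistical fact that makes the square root a natural choice for Poisson-like intensities (see Figure 1) is that it approximately stabilizes the variance of Poisson counts.

```python
import math


def bin_spectrum(peaks, mz_min=0.0, mz_max=2000.0, bin_width=1.0,
                 transform=math.sqrt):
    """Sum peak intensities into fixed-width m/z bins, then transform each bin.

    peaks: iterable of (mz, intensity) pairs.
    All default parameter values are illustrative assumptions.
    """
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            index = int((mz - mz_min) // bin_width)
            bins[min(index, n_bins - 1)] += intensity  # guard against float rounding
    return [transform(value) for value in bins]


# Toy peak list (made-up values): two peaks fall in the same 1.0 m/z bin.
peaks = [(100.2, 16.0), (100.7, 9.0), (250.5, 4.0), (2500.0, 99.0)]
binned = bin_spectrum(peaks)
print(binned[100], binned[250])
```

To mimic the "no transform" setting, pass `transform=lambda v: v`; for a logarithmic transform, `math.log1p` avoids taking the logarithm of empty bins.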
Figure 3. ROC curves obtained for spectral comparison (dataset 1) using different signal transforms. The square root transform performs significantly better than other methods.
Figure 4. Effects of different models of reference spectra (dataset 1) on the ROC curves. Ensemble average and closest neighbor perform better than other methods.
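Two of the reference models compared in Figure 4 can be illustrated on a cluster of replicate spectra that share the same binning. This is a hedged sketch: `average_spectrum` takes the bin-wise mean of the replicates, and `best_member_score` is one possible reading of the "closest neighbor" model (score the query against every cluster member and keep the best score). Both function names are hypothetical, and the paper's exact definitions may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average_spectrum(replicates):
    """Bin-wise mean of replicate spectra (assumed form of the ensemble average)."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]


def best_member_score(query, replicates):
    """Assumed closest-neighbor model: best score over all cluster members."""
    return max(cosine(query, member) for member in replicates)


# Toy cluster of three replicate spectra for one peptide (made-up values).
cluster = [
    [4.0, 0.0, 2.0, 1.0],
    [2.0, 1.0, 2.0, 0.0],
    [3.0, 0.0, 1.0, 1.0],
]
reference = average_spectrum(cluster)
query = [3.0, 0.0, 2.0, 1.0]

print(round(cosine(query, reference), 3))
print(round(best_member_score(query, cluster), 3))
```

Averaging suppresses peaks that appear in only one replicate, which is one plausible reason a cluster-based reference could outperform a single spectrum.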
Performance comparison using average and individual spectra as references
| # of candidate peptides | ROC area (average spectra) | ROC area (individual spectra) |
| 10 | 0.993 | 0.983 |
| 100 | 0.942 | 0.900 |
| 500 | 0.826 | 0.762 |
| 1,000 | 0.764 | 0.685 |
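For readers unfamiliar with the "ROC area" column, the area under the ROC curve equals the probability that a randomly chosen same-peptide comparison scores higher than a randomly chosen different-peptide comparison, with ties counted as one half. The sketch below computes it directly from two lists of similarity scores. The function name `roc_area` and the example scores are hypothetical; how the authors constructed their score lists for each number of candidate peptides is not specified in this record.

```python
def roc_area(same_scores, different_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for s in same_scores:
        for d in different_scores:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_scores) * len(different_scores))


# Made-up similarity scores, for illustration only.
same = [0.9, 0.8, 0.7]
different = [0.1, 0.2, 0.75]

print(round(roc_area(same, different), 3))
```

A value of 1.0 means the two score distributions are perfectly separated, and 0.5 means the score carries no information about whether two spectra come from the same peptide.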
Figure 5. Cross validation between datasets 1 and 2: cumulative distributions of similarity scores for the same and different peptides in spectrum clusters of different sizes. Increasing the sizes of clusters leads to better separation of spectral similarity scores between the same and different peptides.
Figure 6. MASCOT scores for the individual and average spectra from the 35 peptides in dataset 3, sorted by the MASCOT score of the ensemble average spectrum. MASCOT scores of individual spectra are indicated by blue dots ("0" indicates that the spectrum was not identified by MASCOT); those of average spectra are marked by red crosses.