An-Min Zou, Fang-Xiang Wu, Jia-Rui Ding, Guy G Poirier.
Abstract
BACKGROUND: Tandem mass spectrometry has become particularly useful for the rapid identification and characterization of protein components of complex biological mixtures. Powerful database search methods, such as SEQUEST and MASCOT, have been developed for peptide identification; they compare the mass spectra obtained from unknown proteins or peptides with theoretical spectra predicted from protein databases. However, the majority of spectra generated in a mass spectrometry experiment are of too poor quality to be interpreted, while some high-quality spectra that cannot be interpreted by one method may be interpreted by others. Hence, a filtering algorithm that removes poor-quality spectra prior to the database search is appealing.
Year: 2009 PMID: 19208151 PMCID: PMC2648784 DOI: 10.1186/1471-2105-10-S1-S49
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The distribution of multiply charged spectra in the ISB dataset
| Charge state | H* | P* | Total |
|---|---|---|---|
| Doubly charged | 1242 | 17253 | 18495 |
| Triply charged | 573 | 17471 | 18044 |
| Total | 1815 | 34724 | 36539 |
*H = high quality, P = poor quality. A doubly charged spectrum is of high quality if its SEQUEST Xcorr score is greater than 2.5 and of poor quality otherwise; a triply charged spectrum is of high quality if its SEQUEST Xcorr score is greater than 3.5 and of poor quality otherwise.
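The footnote's labeling rule is a charge-dependent threshold on Xcorr, which can be written as a small function. This is a minimal sketch: `isb_label` and `XCORR_THRESHOLD` are hypothetical names, and rejecting charge states other than 2+ and 3+ is an assumption, since the footnote only defines thresholds for those two.

```python
# Charge state -> Xcorr threshold from the footnote (score must be strictly greater).
XCORR_THRESHOLD = {2: 2.5, 3: 3.5}


def isb_label(charge: int, xcorr: float) -> str:
    """Return 'H' (high quality) or 'P' (poor quality) for one ISB spectrum."""
    if charge not in XCORR_THRESHOLD:
        # Assumption: the footnote gives no rule for other charge states.
        raise ValueError(f"no Xcorr threshold defined for charge {charge}+")
    return "H" if xcorr > XCORR_THRESHOLD[charge] else "P"


if __name__ == "__main__":
    print(isb_label(2, 2.7))  # H: above the 2+ cutoff of 2.5
    print(isb_label(3, 2.7))  # P: below the 3+ cutoff of 3.5
    print(isb_label(2, 2.5))  # P: equal to the cutoff is not "greater than"
```

Because the rule uses a strict inequality, a spectrum scoring exactly at the cutoff is counted as poor quality.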
The distribution of spectra in the TOV dataset
| Charge state | H* | P* | Total |
|---|---|---|---|
| Singly charged | 10 | 917 | 927 |
| Doubly charged | 667 | 5575 | 6242 |
| Triply charged | 189 | 6109 | 6298 |
| Total | 866 | 12601 | 13467 |
*H = high quality, P = poor quality. A spectrum is of high quality if its PeptideProphet score is greater than 0.8 and of poor quality otherwise.
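Building a distribution table like the one above amounts to labeling each spectrum and counting (charge state, label) pairs. The sketch below is illustrative only: `tov_label`, `tally`, and the toy input are hypothetical, and it assumes one PeptideProphet score per spectrum with the 0.8 cutoff applied as a strict inequality, as the footnote states.

```python
from collections import Counter
from typing import Iterable, Tuple

PROPHET_CUTOFF = 0.8  # from the footnote: high quality if score > 0.8


def tov_label(prophet_score: float) -> str:
    """Return 'H' or 'P' for one spectrum given its PeptideProphet score."""
    return "H" if prophet_score > PROPHET_CUTOFF else "P"


def tally(spectra: Iterable[Tuple[int, float]]) -> Counter:
    """Count spectra per (charge state, label) cell of the distribution table."""
    return Counter((charge, tov_label(score)) for charge, score in spectra)


if __name__ == "__main__":
    # Hypothetical toy input: (charge state, PeptideProphet score) pairs.
    toy = [(1, 0.95), (2, 0.81), (2, 0.80), (3, 0.10), (3, 0.99)]
    counts = tally(toy)
    for charge in (1, 2, 3):
        print(charge, counts[(charge, "H")], counts[(charge, "P")])
```

Row and column totals then follow by summing the counter over charge states or labels.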
Numbers of high-quality (H) and poor-quality (P) spectra in the training and test sets, given as H:P
| SVM classifier | Training set (H:P) | Test set (H:P) |
|---|---|---|
| SVM2ISB | 430:430 | 812:16833 |
| SVM3ISB | 300:300 | 273:17171 |
| SVMMISB | 605:605 | 1210:34623 |
| SVM2TOV | 350:350 | 317:5225 |
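Several rows are consistent with drawing n high-quality and n poor-quality spectra for training and testing on all remaining spectra (for SVM2TOV: 667 - 350 = 317 and 5575 - 350 = 5225; likewise for SVM3ISB). The sketch below assumes uniform random sampling without replacement, which the text shown here does not state; note also that the poor-quality test counts for SVM2ISB and SVMMISB do not follow this simple subtraction. `balanced_split` and `_draw` are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _draw(items: Sequence[T], n: int, rng: random.Random) -> Tuple[List[T], List[T]]:
    """Split items into n randomly chosen training items and the remaining test items."""
    chosen = set(rng.sample(range(len(items)), n))  # sample indices, so duplicate values are safe
    train = [x for i, x in enumerate(items) if i in chosen]
    test = [x for i, x in enumerate(items) if i not in chosen]
    return train, test


def balanced_split(high: Sequence[T], poor: Sequence[T], n_train: int,
                   seed: Optional[int] = None):
    """Return (train_high, train_poor, test_high, test_poor) with n_train of each class in training."""
    rng = random.Random(seed)
    train_high, test_high = _draw(high, n_train, rng)
    train_poor, test_poor = _draw(poor, n_train, rng)
    return train_high, train_poor, test_high, test_poor


if __name__ == "__main__":
    # Placeholder IDs sized like the doubly charged TOV spectra (667 H, 5575 P).
    high_ids = list(range(667))
    poor_ids = list(range(667, 667 + 5575))
    tr_h, tr_p, te_h, te_p = balanced_split(high_ids, poor_ids, 350, seed=1)
    print(f"train {len(tr_h)}:{len(tr_p)}, test {len(te_h)}:{len(te_p)}")
    # train 350:350, test 317:5225 (the SVM2TOV row)
```

Balancing the training set keeps the classifier from simply favoring the much larger poor-quality class; repeating the split with different seeds would give one row per run in the results tables, if that is how the 20 runs were produced.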
Results on the ISB test data with the SVM classifier for doubly charged spectra
| Run | FP | FN | TPR (%) | TNR (%) |
|---|---|---|---|---|
| 1 | 1608 | 64 | 92.1 | 90.4 |
| 2 | 1631 | 69 | 91.5 | 90.3 |
| 3 | 1853 | 52 | 93.6 | 89.0 |
| 4 | 1879 | 46 | 94.3 | 88.8 |
| 5 | 1719 | 63 | 92.2 | 89.8 |
| 6 | 1887 | 55 | 93.2 | 88.8 |
| 7 | 1633 | 65 | 92.0 | 90.3 |
| 8 | 1667 | 74 | 90.9 | 90.1 |
| 9 | 1643 | 70 | 91.4 | 90.2 |
| 10 | 1660 | 80 | 90.2 | 90.1 |
| 11 | 2070 | 59 | 92.7 | 87.7 |
| 12 | 1723 | 58 | 92.9 | 89.8 |
| 13 | 1739 | 73 | 91.0 | 89.7 |
| 14 | 1813 | 74 | 90.9 | 89.2 |
| 15 | 1667 | 68 | 91.6 | 90.1 |
| 16 | 1921 | 57 | 93.0 | 88.6 |
| 17 | 1793 | 54 | 93.4 | 89.3 |
| 18 | 1756 | 77 | 90.5 | 89.6 |
| 19 | 1767 | 55 | 93.2 | 89.5 |
| 20 | 1653 | 73 | 91.0 | 90.2 |
| Ave. | 1754 | 64 | 92.1 | 89.6 |
| SD | 120.5 | 9.5 | 1.17 | 0.72 |
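Each TPR and TNR entry can be recovered from the FP and FN counts and the test-set sizes in the training/test table, treating high-quality spectra as positives: FN counts high-quality spectra wrongly discarded, and FP counts poor-quality spectra wrongly kept. A minimal sketch (the function name `rates` is hypothetical):

```python
from typing import Tuple


def rates(fp: int, fn: int, n_high: int, n_poor: int) -> Tuple[float, float]:
    """Return (TPR %, TNR %), treating high-quality spectra as positives.

    fn: high-quality spectra the classifier wrongly labels poor (lost spectra)
    fp: poor-quality spectra the classifier wrongly labels high (kept spectra)
    """
    tpr = 100.0 * (n_high - fn) / n_high  # share of high-quality spectra kept
    tnr = 100.0 * (n_poor - fp) / n_poor  # share of poor-quality spectra filtered out
    return tpr, tnr


if __name__ == "__main__":
    # Run 1 of the doubly charged ISB table; test set sizes 812 H and 16833 P.
    tpr, tnr = rates(fp=1608, fn=64, n_high=812, n_poor=16833)
    print(f"TPR {tpr:.1f}%, TNR {tnr:.1f}%")  # TPR 92.1%, TNR 90.4%
```

The same function with the TOV test-set sizes (317 H, 5225 P) reproduces, for example, run 1 of the TOV table: 90.2% and 84.4%.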
Results on the ISB test data with the SVM classifier for triply charged spectra
| Run | FP | FN | TPR (%) | TNR (%) |
|---|---|---|---|---|
| 1 | 2120 | 24 | 91.2 | 87.7 |
| 2 | 2091 | 15 | 94.5 | 87.8 |
| 3 | 2195 | 12 | 95.6 | 87.2 |
| 4 | 2324 | 19 | 93.0 | 86.5 |
| 5 | 2029 | 26 | 90.5 | 88.2 |
| 6 | 1952 | 23 | 91.6 | 88.6 |
| 7 | 2163 | 12 | 95.6 | 87.4 |
| 8 | 2350 | 19 | 93.0 | 86.3 |
| 9 | 1967 | 20 | 92.7 | 88.5 |
| 10 | 1994 | 21 | 92.3 | 88.4 |
| 11 | 2071 | 17 | 93.7 | 87.9 |
| 12 | 1948 | 21 | 92.3 | 88.7 |
| 13 | 2163 | 26 | 90.5 | 87.4 |
| 14 | 1998 | 26 | 90.5 | 88.4 |
| 15 | 2162 | 23 | 91.6 | 87.4 |
| 16 | 2101 | 16 | 94.1 | 87.8 |
| 17 | 2005 | 21 | 92.3 | 88.3 |
| 18 | 2161 | 20 | 92.7 | 87.4 |
| 19 | 1930 | 28 | 89.7 | 88.7 |
| 20 | 2134 | 17 | 93.8 | 87.6 |
| Ave. | 2093 | 20 | 92.7 | 87.8 |
| SD | 118.3 | 4.6 | 1.67 | 0.69 |
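The Ave. and SD rows summarize the 20 runs. As a worked check on one column, the FN values of the triply charged table give a mean of 20.3 (reported as 20) and an SD of 4.6 only if the SD is the sample standard deviation with an n - 1 denominator; the population form gives 4.5. This suggests, but does not confirm, which convention was used for the other columns.

```python
import statistics

# FN column of the triply charged ISB table, runs 1-20.
fn_triply = [24, 15, 12, 19, 26, 23, 12, 19, 20, 21,
             17, 21, 26, 26, 23, 16, 21, 20, 28, 17]

mean = statistics.mean(fn_triply)
sample_sd = statistics.stdev(fn_triply)       # n - 1 in the denominator
population_sd = statistics.pstdev(fn_triply)  # n in the denominator

print(f"Ave. {mean:.1f}  SD {sample_sd:.1f}  (population SD {population_sd:.1f})")
# Ave. 20.3  SD 4.6  (population SD 4.5)
```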
Results on the ISB test data with the SVM classifier for multiply charged spectra
| Run | FP | FN | TPR (%) | TNR (%) |
|---|---|---|---|---|
| 1 | 4412 | 121 | 90.0 | 87.3 |
| 2 | 4430 | 101 | 91.7 | 87.2 |
| 3 | 4663 | 118 | 90.3 | 86.5 |
| 4 | 4348 | 106 | 91.2 | 87.4 |
| 5 | 4337 | 122 | 89.9 | 87.5 |
| 6 | 4639 | 106 | 91.2 | 86.6 |
| 7 | 3944 | 121 | 90.0 | 88.6 |
| 8 | 4444 | 103 | 91.5 | 87.2 |
| 9 | 4684 | 109 | 91.0 | 86.5 |
| 10 | 4705 | 92 | 92.4 | 86.4 |
| 11 | 4296 | 109 | 91.0 | 87.6 |
| 12 | 4383 | 114 | 90.6 | 87.3 |
| 13 | 4342 | 121 | 90.0 | 87.5 |
| 14 | 4485 | 94 | 92.2 | 87.0 |
| 15 | 4197 | 114 | 90.6 | 87.9 |
| 16 | 4604 | 107 | 91.2 | 86.7 |
| 17 | 4499 | 110 | 90.9 | 87.0 |
| 18 | 4007 | 131 | 89.2 | 88.4 |
| 19 | 4009 | 138 | 88.6 | 88.4 |
| 20 | 4275 | 111 | 90.8 | 87.7 |
| Ave. | 4385 | 112 | 90.7 | 87.3 |
| SD | 223.7 | 11.4 | 0.94 | 0.65 |
Results on the TOV test data with the SVM classifier for doubly charged spectra
| Run | FP | FN | TPR (%) | TNR (%) |
|---|---|---|---|---|
| 1 | 814 | 31 | 90.2 | 84.4 |
| 2 | 925 | 23 | 92.7 | 82.3 |
| 3 | 839 | 29 | 90.9 | 83.9 |
| 4 | 911 | 29 | 90.9 | 82.6 |
| 5 | 856 | 24 | 92.4 | 83.6 |
| 6 | 800 | 22 | 93.1 | 84.7 |
| 7 | 816 | 30 | 90.5 | 84.4 |
| 8 | 920 | 15 | 95.3 | 82.4 |
| 9 | 788 | 27 | 91.5 | 84.9 |
| 10 | 799 | 27 | 91.5 | 84.7 |
| 11 | 790 | 30 | 90.5 | 84.9 |
| 12 | 787 | 15 | 95.3 | 84.9 |
| 13 | 922 | 29 | 90.9 | 82.4 |
| 14 | 766 | 27 | 91.5 | 85.3 |
| 15 | 957 | 18 | 94.3 | 81.7 |
| 16 | 819 | 28 | 91.2 | 84.3 |
| 17 | 871 | 22 | 93.1 | 83.3 |
| 18 | 885 | 25 | 92.1 | 83.1 |
| 19 | 830 | 27 | 91.5 | 84.1 |
| 20 | 823 | 23 | 92.7 | 84.2 |
| Ave. | 846 | 25 | 92.1 | 83.8 |
| SD | 56.4 | 4.8 | 1.51 | 1.08 |
Figure 1. ROC curve for the SVM classifier for multiply charged ISB spectra. Even when only a 2% loss of high-quality multiply charged spectra is allowed, the proposed method filters out about 70% of the poor-quality ones.
Figure 2. ROC curve for the SVM classifier for doubly charged TOV spectra. Even when only a 2% loss of high-quality doubly charged spectra is allowed, the proposed method filters out over 65% of the poor-quality ones.
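Reading a point such as "2% loss, about 70% filtered" off an ROC curve amounts to choosing a score threshold that discards at most 2% of high-quality spectra and measuring how many poor-quality spectra fall below it. A minimal sketch, assuming each spectrum has a classifier score where larger means more likely high quality (for example an SVM decision value); `filtered_at_max_loss` and the toy scores are hypothetical and do not reproduce the figures.

```python
import math
from typing import Sequence, Tuple


def filtered_at_max_loss(high_scores: Sequence[float], poor_scores: Sequence[float],
                         max_loss: float = 0.02) -> Tuple[float, float]:
    """Pick a score threshold that discards at most `max_loss` of high-quality spectra.

    Spectra scoring strictly below the threshold are filtered out. Returns the
    threshold and the fraction of poor-quality spectra it filters out.
    """
    if not 0.0 <= max_loss < 1.0:
        raise ValueError("max_loss must be in [0, 1)")
    ranked = sorted(high_scores)
    k = math.floor(max_loss * len(ranked))  # number of high-quality spectra we may lose
    threshold = ranked[k]  # at most k high scores lie strictly below ranked[k]
    removed = sum(1 for s in poor_scores if s < threshold)
    return threshold, removed / len(poor_scores)


if __name__ == "__main__":
    # Hypothetical toy scores; a real run would use the classifier's outputs.
    high = list(range(100))      # 0..99
    poor = list(range(-50, 50))  # -50..49
    t, frac = filtered_at_max_loss(high, poor, 0.02)
    print(t, frac)  # 2 0.52
```

Sweeping `max_loss` from 0 upward and plotting 1 - `max_loss` against the filtered fraction traces the same trade-off the ROC curves show.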
Correct classification rates (TPR and TNR) of the proposed method and some existing methods
| Method | TPR (%) | TNR (%) |
|---|---|---|
| SVM2ISB | 92.1 | 89.6 |
| SVM3ISB | 92.7 | 87.8 |
| SVMMISB | 90.7 | 87.3 |
| SVM2TOV | 92.1 | 83.8 |
| Flikka | 90.0 | 83.0 |
| Wong | 90.0 | 83.0 |
| Bern | 90.0 | 75.1 |
| Salmi | 90.0 | 75.0 |
| Na | 90.0 | 75.0 |
| Purvine | - | 55.0 |
| Tabb | - | 40.0 |
Figure 3. A false-negative spectrum from the ISB dataset.