Nedim Mujezinovic, Georg Schneider, Michael Wildpaner, Karl Mechtler, Frank Eisenhaber.
Abstract
BACKGROUND: Tandem mass spectrometry (MS/MS) has become a standard method for identifying proteins extracted from biological samples, but the huge number of MS/MS spectra and their noise contamination obstruct swift and reliable computer-aided interpretation. Typically, only a minor fraction of the spectra per sample (most often, a few percent) and about 10% of the peaks per spectrum contribute to the final result, provided that noise does not prevent protein identification altogether.
Year: 2010 PMID: 20158870 PMCID: PMC2822527 DOI: 10.1186/1471-2164-11-S1-S13
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Influence of background removal on the recovery of BSA, ADH and TRF in MS/MS spectra of 100 fmol test samples. The original numbers of MS/MS spectra for the BSA (bovine serum albumin), ADH (yeast alcohol dehydrogenase) and TRF (human transferrin) datasets (recorded on a DecaXP machine) are 2679, 2325 and 2608, respectively. The intensity threshold s (column 3) restricts the search for the sequence ladder (length n in column 2) to the top 15%, 20%, 25% or 30% of peaks by intensity (100% means that all peaks are considered). The following three columns show the MS Cleaner output: the number of spectra with background removal, the number of unselected spectra and the MS Cleaner CPU time on a single-processor Windows XP computer (Pentium IV 2.4 GHz; to get exact measurements of computation time, we did not use the cluster version). The remaining four columns present the MASCOT output: the CPU time on the same machine, the protein score, the number of spectra matching peptides in a MASCOT search and the final sequence coverage. For each dataset, the first line shows the results for the case when MS Cleaner is not used for pre-processing and the MS/MS data are interpreted directly by MASCOT.
| protein | sequence ladder length n | intensity threshold s [%] | cleaned spectra | bad spectra | MS Cleaner time [min] | MASCOT time [min] | MASCOT score | queries matched | sequence coverage [%] |
|---|---|---|---|---|---|---|---|---|---|
| BSA | 0 | 100 | - | - | - | 61 | 586 | 89 | 55 |
| | 3 | 100 | 1664 | 1015 | 3.92 | 44 | 720 | 91 | 57 |
| | 3 | 15 | 390 | 2289 | 1.21 | 17 | 1991 | 84 | 52 |
| | 3 | 20 | 490 | 2189 | 1.40 | 21 | 2108 | 87 | 57 |
| | 3 | 25 | 601 | 2078 | 1.61 | 26 | 2114 | 89 | 57 |
| | 3 | 30 | 688 | 1991 | 1.75 | 29 | 2114 | 90 | 57 |
| | 4 | 100 | 940 | 1739 | 3.80 | 36 | 2108 | 91 | 57 |
| | 4 | 15 | 260 | 2419 | 0.91 | 12 | 1875 | 78 | 47 |
| | 4 | 20 | 321 | 2358 | 1.06 | 14 | 1911 | 80 | 47 |
| | 4 | 25 | 380 | 2299 | 1.25 | 18 | 2114 | 86 | 57 |
| | 4 | 30 | 441 | 2238 | 1.30 | 19 | 2114 | 89 | 57 |
| | 5 | 100 | 593 | 2086 | 3.82 | 26 | 2108 | 91 | 57 |
| | 5 | 15 | 174 | 2505 | 0.60 | 9 | 1579 | 60 | 41 |
| | 5 | 20 | 232 | 2447 | 0.85 | 11 | 1809 | 72 | 44 |
| | 5 | 25 | 281 | 2398 | 1.00 | 13 | 1963 | 81 | 49 |
| | 5 | 30 | 313 | 2366 | 0.85 | 14 | 2058 | 86 | 54 |
| ADH | 0 | 100 | - | - | - | 64 | 242 | 39 | 39 |
| | 3 | 100 | 1446 | 879 | 4.15 | 45 | 327 | 34 | 39 |
| | 3 | 15 | 269 | 2056 | 0.88 | 12 | 673 | 29 | 35 |
| | 3 | 20 | 347 | 1978 | 1.10 | 13 | 696 | 31 | 37 |
| | 3 | 25 | 440 | 1885 | 1.33 | 17 | 697 | 32 | 37 |
| | 3 | 30 | 697 | 1628 | 1.53 | 20 | 697 | 33 | 37 |
| | 4 | 100 | 902 | 1423 | 4.15 | 35 | 733 | 34 | 39 |
| | 4 | 15 | 173 | 2152 | 0.58 | 7 | 562 | 26 | 28 |
| | 4 | 20 | 216 | 2109 | 0.71 | 9 | 673 | 30 | 35 |
| | 4 | 25 | 271 | 2054 | 0.90 | 12 | 607 | 28 | 33 |
| | 4 | 30 | 325 | 2000 | 1.05 | 13 | 697 | 32 | 37 |
| | 5 | 100 | 594 | 1731 | 4.20 | 23 | 712 | 33 | 39 |
| | 5 | 15 | 94 | 2231 | 0.35 | 5 | 311 | 15 | 21 |
| | 5 | 20 | 125 | 2200 | 0.46 | 6 | 366 | 17 | 25 |
| | 5 | 25 | 145 | 2180 | 0.53 | 7 | 434 | 19 | 26 |
| | 5 | 30 | 186 | 2139 | 0.66 | 9 | 589 | 24 | 31 |
| TRF | 0 | 100 | - | - | - | 52 | 588 | 86 | 47 |
| | 3 | 100 | 1587 | 1021 | 3.57 | 42 | 768 | 87 | 49 |
| | 3 | 15 | 373 | 2235 | 1.00 | 17 | 1988 | 86 | 49 |
| | 3 | 20 | 485 | 2123 | 1.23 | 20 | 1988 | 86 | 49 |
| | 3 | 25 | 568 | 2040 | 1.36 | 24 | 1998 | 87 | 49 |
| | 3 | 30 | 639 | 1969 | 0.78 | 27 | 1998 | 87 | 49 |
| | 4 | 100 | 864 | 1744 | 3.62 | 34 | 1973 | 87 | 49 |
| | 4 | 15 | 231 | 2377 | 0.70 | 11 | 1987 | 81 | 49 |
| | 4 | 20 | 298 | 2310 | 0.86 | 13 | 1988 | 84 | 49 |
| | 4 | 25 | 360 | 2248 | 1.00 | 16 | 1988 | 85 | 49 |
| | 4 | 30 | 414 | 2194 | 1.12 | 19 | 1998 | 87 | 49 |
| | 5 | 100 | 540 | 2068 | 3.63 | 23 | 1973 | 87 | 49 |
| | 5 | 15 | 164 | 2444 | 0.55 | 9 | 1785 | 68 | 45 |
| | 5 | 20 | 194 | 2414 | 0.61 | 10 | 1890 | 74 | 47 |
| | 5 | 25 | 245 | 2363 | 0.75 | 12 | 1957 | 80 | 48 |
| | 5 | 30 | 286 | 2322 | 0.86 | 14 | 1968 | 84 | 48 |
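The bookkeeping behind the table above can be checked with a few lines of Python. This is only an illustrative sketch: the four rows are copied from the BSA block, while the function name `summarize` and the derived quantities (share of bad spectra, combined MS Cleaner plus MASCOT time, speed-up relative to the MASCOT-only run) are our own conveniences and are not reported in the source.

```python
# Values copied from the BSA block of the table above.
TOTAL_BSA_SPECTRA = 2679      # original number of MS/MS spectra
BASELINE_MASCOT_MIN = 61      # MASCOT time without MS Cleaner [min]

# (ladder length n, threshold s [%], cleaned, bad, MS Cleaner [min], MASCOT [min])
bsa_rows = [
    (3, 100, 1664, 1015, 3.92, 44),
    (3, 15, 390, 2289, 1.21, 17),
    (4, 25, 380, 2299, 1.25, 18),
    (5, 30, 313, 2366, 0.85, 14),
]


def summarize(row, total, baseline_min):
    """Derive illustrative quantities from one table row (hypothetical helper)."""
    n, s, cleaned, bad, cleaner_min, mascot_min = row
    # Every spectrum is either cleaned or rejected as bad.
    if cleaned + bad != total:
        raise ValueError(f"row n={n}, s={s}: cleaned + bad != total")
    combined = cleaner_min + mascot_min
    return {
        "n": n,
        "s": s,
        "bad_pct": round(100 * bad / total, 1),
        "combined_min": round(combined, 2),
        "speedup": round(baseline_min / combined, 2),
    }


summaries = [summarize(r, TOTAL_BSA_SPECTRA, BASELINE_MASCOT_MIN) for r in bsa_rows]

if __name__ == "__main__":
    for item in summaries:
        print(item)
```

In every row checked here, the cleaned and bad spectra add up to the original 2679, and the combined pre-processing and search time stays below the 61 min of the MASCOT-only run.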
Performance of MS Cleaner version 2.0 over a large test set.
| A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | A14 | A15 | A16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 10108 | 633 | 24 | 11.30 | 667 | 24 | 31.65 | 60.07 | 667 | 24 | 15.13 | 51.09 | 667 | 24 | 18.07 |
| | 10184 | 698 | 35 | 9.82 | 780 | 35 | 34.20 | 50.22 | 780 | 35 | 19.05 | 20.25 | 780 | 35 | 22.76 |
| | 10030 | 736 | 28 | 13.26 | 761 | 28 | 28.40 | 79.24 | 761 | 28 | 8.66 | 73.58 | 761 | 28 | 10.63 |
| | 9870 | 801 | 36 | 13.31 | 860 | 37 | 29.50 | 72.62 | 860 | 37 | 11.70 | 63.95 | 860 | 37 | 14.29 |
| | 10032 | 2606 | 63 | 11.72 | 2814 | 63 | 30.76 | 63.10 | 2814 | 63 | 13.93 | 54.49 | 2814 | 63 | 16.78 |
| | 10090 | 2571 | 60 | 12.13 | 2761 | 60 | 32.95 | 53.12 | 2761 | 60 | 17.53 | 44.32 | 2761 | 60 | 21.03 |
| | 10324 | 1459 | 56 | 7.17 | 1567 | 57 | 34.98 | 48.06 | 1567 | 57 | 22.05 | 40.53 | 1567 | 57 | 24.60 |
| | 10368 | 1309 | 51 | 8.12 | 1508 | 56 | 36.71 | 42.90 | 1454 | 55 | 24.76 | 33.10 | 1454 | 55 | 28.61 |
| | 9946 | 586 | 49 | 12.35 | 616 | 49 | 26.35 | 90.31 | 573 | 49 | 3.65 | 84.94 | 607 | 49 | 5.48 |
| | 9534 | 582 | 52 | 13.40 | 616 | 52 | 26.27 | 86.07 | 616 | 52 | 5.08 | 78.44 | 616 | 52 | 7.66 |
| | 10098 | 1798 | 61 | 11.13 | 1886 | 61 | 30.88 | 67.26 | 1879 | 61 | 13.13 | 57.89 | 1879 | 61 | 16.50 |
| | 10034 | 1567 | 65 | 11.78 | 1693 | 65 | 31.90 | 59.50 | 1693 | 65 | 15.91 | 48.55 | 1693 | 65 | 19.56 |
| | 10118 | 2780 | 59 | 10.30 | 3079 | 61 | 35.13 | 63.49 | 3014 | 60 | 14.26 | 54.46 | 3047 | 61 | 17.25 |
| | 10096 | 2655 | 61 | 10.52 | 3116 | 65 | 32.58 | 53.96 | 3084 | 65 | 17.58 | 44.31 | 3116 | 65 | 21.16 |
| | 10006 | 892 | 36 | 11.29 | 986 | 36 | 27.30 | 79.55 | 986 | 36 | 7.75 | 73.42 | 986 | 36 | 9.71 |
| | 9886 | 850 | 34 | 11.81 | 962 | 34 | 28.73 | 72.51 | 962 | 34 | 10.13 | 62.25 | 962 | 34 | 13.51 |
| | 10022 | 351 | 25 | 10.36 | 389 | 25 | 28.61 | 71.64 | 348 | 25 | 10.25 | 62.78 | 389 | 25 | 14.30 |
| | 10156 | 341 | 33 | 9.18 | 384 | 33 | 31.31 | 61.15 | 384 | 33 | 14.25 | 49.59 | 384 | 33 | 28.11 |
| | 10330 | 506 | 35 | 9.27 | 565 | 35 | 36.20 | 42.30 | 565 | 35 | 24.95 | 34.44 | 565 | 35 | 27.66 |
| | 10334 | 356 | 66 | 8.61 | 500 | 66 | 38.05 | 37.06 | 500 | 66 | 27.31 | 28.47 | 500 | 66 | 30.31 |
| | 10286 | 1549 | 58 | 10.36 | 1694 | 58 | 35.36 | 53.20 | 1694 | 58 | 20.03 | 44.86 | 1694 | 58 | 23.15 |
| | 10250 | 1346 | 54 | 9.07 | 1483 | 54 | 36.48 | 40.16 | 1483 | 54 | 25.60 | 31.67 | 1483 | 54 | 28.31 |
| | 10242 | 1613 | 45 | 13.16 | 1764 | 45 | 34.78 | 62.12 | 1756 | 45 | 15.91 | 52.37 | 1764 | 45 | 19.53 |
| | 10402 | 1679 | 43 | 9.09 | 1890 | 44 | 35.18 | 51.70 | 1890 | 44 | 20.31 | 41.76 | 1890 | 44 | 23.85 |
| | 9958 | 561 | 66 | 11.67 | 594 | 66 | 27.26 | 85.42 | 594 | 66 | 5.46 | 79.25 | 594 | 66 | 7.45 |
| | 9744 | 530 | 66 | 12.15 | 584 | 66 | 28.01 | 80.83 | 584 | 66 | 6.95 | 70.92 | 584 | 66 | 10.35 |
A1 name of test set (.mgf file; see Methods),
A2 total number of spectra (.dta files),
A3 MASCOT score of the top protein hit with the original .mgf file (without application of MS Cleaner),
A4 sequence coverage (in %) without application of MS Cleaner,
A5 fraction of non-interpretable "bad" spectra found with sequence ladder length = 4 among all peaks (intensity threshold = 100%),
A6 MASCOT score of the top protein hit for this search,
A7 sequence coverage (in % of the whole protein length) for this search,
A8 MS Cleaner processing time (in min) on a PC with a single Pentium IV (to achieve exact time consumption values, we did not use the cluster version and stopped the "soft frequency recognition option"),
A9 fraction of non-interpretable "bad" spectra found with sequence ladder length = 4 among the 20% most intense peaks,
A10 MASCOT score of the top protein hit for this search,
A11 sequence coverage (in % of the whole protein length) for this search,
A12 MS Cleaner processing time (in min),
A13 fraction of non-interpretable "bad" spectra found with sequence ladder length = 4 among the 25% most intense peaks (in % of A2; i.e., of all spectra),
A14 MASCOT score of the top protein hit for this search,
A15 sequence coverage (in % of the whole protein length) for this MASCOT search,
A16 MS Cleaner processing time on the same machine as described in the legend of Table 1 (in min).
The sequence ladder criterion (minimal ladder length 4 with varying peak intensity thresholds) and the noise suppression algorithms of MS Cleaner 2.0 have been applied over a large set of tandem MS results. For each of the test proteins (α-amylase, amyloglucosidase, apo-transferrin, β-galactosidase, carbonic anhydrase, catalase, phosphorylase B, glutamic dehydrogenase, glutathione transferase, immunoglobulin γ, lactic dehydrogenase, lactoperoxidase, myoglobin), two independent sample preparations and dataset recordings (marked with appendices _col1 and _col2 in the dataset name) were carried out. For these datasets, the MASCOT interpretation was carried out on a cluster in parallel with other jobs; therefore, no computation time is provided.
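To make the sequence ladder criterion more concrete, the following Python sketch shows one possible way to test a spectrum for a ladder of minimal length among its most intense peaks. It is not the MS Cleaner implementation. Several points are assumptions made purely for illustration: fragments are treated as singly charged, the ladder length is counted as the number of consecutive residue-mass gaps, the mass tolerance of 0.5 Da is arbitrary, and the function names (`top_peaks`, `longest_ladder`, `is_interpretable`) are hypothetical.

```python
import math

# Monoisotopic residue masses [Da] of the standard amino acids
# (Leu and Ile share one mass, hence 19 distinct values).
RESIDUE_MASSES = [
    57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768,
    103.00919, 113.08406, 114.04293, 115.02694, 128.05858, 128.09496,
    129.04259, 131.04049, 137.05891, 147.06841, 156.10111, 163.06333,
    186.07931,
]


def top_peaks(peaks, threshold_pct):
    """Keep the threshold_pct % most intense (m/z, intensity) peaks, sorted by m/z."""
    keep = max(1, math.ceil(len(peaks) * threshold_pct / 100))
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:keep]
    return sorted(strongest)


def is_residue_gap(delta, tol):
    """True if an m/z difference matches one amino acid residue mass within tol."""
    return any(abs(delta - mass) <= tol for mass in RESIDUE_MASSES)


def longest_ladder(peaks, threshold_pct=100, tol=0.5):
    """Longest chain of consecutive residue-mass gaps among the selected peaks."""
    mz = [p[0] for p in top_peaks(peaks, threshold_pct)]
    best = [0] * len(mz)  # best[i]: longest ladder ending at peak i
    for i in range(len(mz)):
        for j in range(i):
            if is_residue_gap(mz[i] - mz[j], tol):
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


def is_interpretable(peaks, min_ladder=4, threshold_pct=100, tol=0.5):
    """Flag a spectrum as interpretable if it contains a long enough ladder."""
    return longest_ladder(peaks, threshold_pct, tol) >= min_ladder


# Synthetic spectrum: a G-A-S-P ladder starting at m/z 200 plus three weak noise peaks.
ladder_mz = [200.0]
for mass in (57.02146, 71.03711, 87.03203, 97.05276):
    ladder_mz.append(ladder_mz[-1] + mass)
example = list(zip(ladder_mz, [100.0, 90.0, 80.0, 70.0, 60.0]))
example += [(150.3, 1.0), (290.0, 1.0), (480.0, 1.0)]
```

With such a helper, lowering `threshold_pct` shrinks the set of peaks that must be searched, which is consistent with the shorter processing times reported for the 15% to 30% thresholds, at the risk of discarding a weak ladder peak and rejecting an otherwise interpretable spectrum.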