Zuo-Fei Yuan, Shu Lin, Rosalynn C Molden, Benjamin A Garcia.
Abstract
Identification of histone post-translational modifications (PTMs) is challenging for proteomics search engines. Including many histone PTMs in one search increases the number of candidate peptides dramatically, which slows the search and reduces the number of identified spectra. To evaluate how well database search engines identify histone PTMs, we present a method in which each search considers only one kind of modification (unmodified, individually modified, or multimodified), each search result is filtered to a false discovery rate below 1%, and the identifications from multiple search engines are combined to obtain confident results. We apply this method to eight search engines on histone data sets. We find that two search engines, pFind and Mascot, identify most of the confident results at a reasonable speed, so we recommend them for identifying histone modifications. During the evaluation, we also note several important aspects of histone modification analysis. We hope this evaluation will help those entering the histone proteomics field. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD001118.
Keywords: data analysis; histone; post-translational modification; proteomics; search engine
Year: 2014 PMID: 25167464 PMCID: PMC4184451 DOI: 10.1021/pr5008015
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Parameters for Database Search
| parameter | search | value |
|---|---|---|
| precursor tolerance | | ±10 ppm |
| fragment tolerance | | HCD: ±0.02 Da; CID: ±0.4 Da |
| fully enzymatic | | trypsin cleaves after arginine |
| max missed cleavages | | 2 |
| fixed modification | | Propionyl[Peptide N-term]/+56.026 |
| variable modifications | first (un) | Propionyl[K]/+56.026 |
| | second (ac) | Propionyl[K]/+56.026; Acetyl[K]/+42.011 |
| | third (me) | Propionyl[K]/+56.026; Methyl_Propionyl[K]/+70.042 |
| | fourth (di) | Propionyl[K]/+56.026; Dimethyl[K]/+28.031 |
| | fifth (tr) | Propionyl[K]/+56.026; Trimethyl[K]/+42.047 |
| | sixth (ph) | Propionyl[K]/+56.026; Phospho[ST]/+79.966 |
| | seventh (co) | Propionyl[K]/+56.026; Acetyl[K]/+42.011; Methyl_Propionyl[K]/+70.042; Dimethyl[K]/+28.031; Trimethyl[K]/+42.047; Phospho[ST]/+79.966 |
| database | | 57 human histone proteins and their reversed forms |
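The effect of splitting the modifications across separate searches can be illustrated with a rough count of the modified forms a search engine would have to consider for one peptide. The sketch below is only an approximation under stated assumptions: every variable modification is treated as independently possible at every matching residue, no cap on modifications per peptide is applied (real engines usually impose one), and `count_modified_forms` is a hypothetical helper, not part of any search engine. The example peptide KSTGGKAPR corresponds to histone H3 residues 9-17.

```python
# Variable modifications for each search, copied from the parameter table.
# Keys are the residues a modification can occupy; values are modification names.
SEARCHES = {
    "un": {"K": ["Propionyl"]},
    "ac": {"K": ["Propionyl", "Acetyl"]},
    "me": {"K": ["Propionyl", "Methyl_Propionyl"]},
    "di": {"K": ["Propionyl", "Dimethyl"]},
    "tr": {"K": ["Propionyl", "Trimethyl"]},
    "ph": {"K": ["Propionyl"], "ST": ["Phospho"]},
    "co": {
        "K": ["Propionyl", "Acetyl", "Methyl_Propionyl", "Dimethyl", "Trimethyl"],
        "ST": ["Phospho"],
    },
}


def count_modified_forms(peptide, variable_mods):
    """Count modified forms of a peptide (hypothetical helper).

    Assumption: each site is either unmodified or carries exactly one of the
    allowed variable modifications, independently of all other sites.
    """
    total = 1
    for residues, mods in variable_mods.items():
        n_sites = sum(peptide.count(residue) for residue in residues)
        total *= (len(mods) + 1) ** n_sites
    return total


peptide = "KSTGGKAPR"  # histone H3 residues 9-17: two K sites, one S, one T
for name, mods in SEARCHES.items():
    print(name, count_modified_forms(peptide, mods))
```

Under these assumptions the combined search (co) yields 144 forms for this peptide, while each single-modification search yields between 4 and 16, and the gap widens quickly for peptides with more lysines.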
Figure 1. Workflow for evaluating database search engines on identifying histone modifications. There are five steps: (1) converting RAW files to MGF, TXT, and mzXML files, (2) searching with separate modifications, (3) filtering with FDR <1%, (4) removing lower-scoring redundant IDs from different searches, and (5) combining the IDs of all search engines.
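Steps (3) and (4) of the workflow can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a target-decoy FDR estimate (number of decoy IDs divided by number of target IDs above a score cutoff, using the reversed database entries as decoys), assumes that higher scores are better, and uses hypothetical names (`PSM`, `filter_by_fdr`, `best_per_spectrum`). Step (5) is left out because the combination rule is not described in this record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    spectrum: str   # spectrum identifier
    peptide: str    # peptide with its modifications
    score: float    # engine score, assumed higher = better
    is_decoy: bool  # True if matched to a reversed (decoy) protein


def filter_by_fdr(psms, max_fdr=0.01):
    """Step 3: keep target PSMs above the lowest cutoff with FDR < max_fdr.

    Assumption: FDR is estimated as decoys / targets among all PSMs
    scoring at or above the cutoff.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff_index = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets < max_fdr:
            cutoff_index = i  # largest prefix that still satisfies the FDR limit
    return [p for p in ranked[:cutoff_index] if not p.is_decoy]


def best_per_spectrum(filtered_searches):
    """Step 4: for each spectrum, keep only the highest-scoring ID
    across the separate searches of one engine."""
    best = {}
    for psms in filtered_searches:
        for psm in psms:
            current = best.get(psm.spectrum)
            if current is None or psm.score > current.score:
                best[psm.spectrum] = psm
    return best
```

In use, each of the separate searches would be passed through `filter_by_fdr` on its own, and the resulting lists would then be merged with `best_per_spectrum`.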
Figure 2. Number of histone IDs before and after combination of search engines. The last green solid bar indicates all confident IDs. Other solid bars indicate IDs before combination. Dashed bars indicate IDs after combination.
Proportions of Confident IDs for Each Engine
| pFind | Mascot | SEQUEST | ProteinPilot | PEAKS | OMSSA | X!Tandem | MaxQuant | ||
|---|---|---|---|---|---|---|---|---|---|
| HCD_Histone | un | 96% (100%, 92%) | 84% (99%, 73%) | 79% (100%, 65%) | 24% (99%, 14%) | 73% (97%, 59%) | |||
| ac | 42% (87%, 28%) | 71% (97%, 55%) | 42% (98%, 27%) | 79% (98%, 67%) | 78% (92%, 67%) | 72% (87%, 62%) | |||
| me | 55% (61%, 51%) | 80% (100%, 67%) | 0% (0%, 0%) | 52% (80%, 38%) | 58% (70%, 50%) | ||||
| di | 87% (79%, 97%) | 83% (97%, 73%) | 5% (100%, 2%) | 68% (97%, 53%) | 73% (92%, 60%) | ||||
| tr | 77% (68%, 88%) | 61% (95%, 45%) | 0% (0%, 0%) | 83% (98%, 72%) | 77% (82%, 72%) | ||||
| ph | 65% (50%, 93%) | 57% (86%, 43%) | 0% (0%, 0%) | 53% (100%, 36%) | 57% (57%, 57%) | ||||
| co | 45% (30%, 87%) | 63% (92%, 49%) | 0% (0%, 0%) | 0% (0%, 0%) | 5% (15%, 3%) | 36% (48%, 29%) | |||
| CID_Histone | un | 96% (100%, 93%) | 90% (100%, 81%) | 72% (100%, 56%) | 12% (95%, 6%) | 71% (99%, 55%) | |||
| ac | 50% (79%, 36%) | 21% (88%, 12%) | 55% (99%, 38%) | 39% (99%, 24%) | 6% (70%, 3%) | 62% (59%, 65%) | |||
| me | 73% (95%, 59%) | 53% (95%, 36%) | 0% (0%, 0%) | 57% (98%, 41%) | 78% (84%, 73%) | ||||
| di | 72% (92%, 59%) | 7% (67%, 4%) | 0% (0%, 0%) | 64% (93%, 49%) | 78% (99%, 65%) | ||||
| tr | 82% (83%, 80%) | 21% (17%, 28%) | 0% (0%, 0%) | 25% (92%, 14%) | 31% (96%, 19%) | ||||
| ph | 40% (25%, 100%) | 33% (20%, 100%) | 0% (0%, 0%) | 0% (0%, 0%) | 0% (0%, 0%) | ||||
| co | 50% (41%, 66%) | 48% (87%, 33%) | 0% (0%, 0%) | 0% (0%, 0%) | 0% (0%, 0%) | 15% (10%, 34%) | |||
Precision = #IDs after combination / #IDs before combination. Recall = #IDs after combination / #all confident IDs. F-score = 2/(1/Precision + 1/Recall).
Largest three F-scores (the third largest ≥80%) in each row are underlined.
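The three quantities defined in the table footnote can be computed from ID counts as in the sketch below. The function name `combination_metrics` and the example counts are hypothetical; the counts were chosen only so that the two ratios come out as 100% and 92%. Returning zero when a denominator or ratio is zero is an assumption made here to mirror the 0% (0%, 0%) cells.

```python
def combination_metrics(n_before, n_after, n_all_confident):
    """Precision, recall, and F-score as defined in the table footnote.

    n_before:         IDs of one engine before combination
    n_after:          IDs of that engine remaining after combination
    n_all_confident:  all confident IDs across engines
    """
    precision = n_after / n_before if n_before else 0.0
    recall = n_after / n_all_confident if n_all_confident else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0  # assumption: report 0 instead of dividing by zero
    f_score = 2 / (1 / precision + 1 / recall)  # harmonic mean of the two ratios
    return precision, recall, f_score


# Hypothetical counts, not taken from the paper.
p, r, f = combination_metrics(n_before=92, n_after=92, n_all_confident=100)
print(f"{f:.0%} ({p:.0%}, {r:.0%})")  # prints "96% (100%, 92%)"
```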
Figure 3. Number of confident IDs between pFind and other engines. Other-pFind indicates IDs only in the other search engine. pFind∩Other indicates IDs in both pFind and the other search engine. pFind-Other indicates IDs only in pFind.
Proportions of Confident IDs between pFind and Other Engines
| harmonic mean (Common/pFind, Common/Other) | Mascot | SEQUEST | ProteinPilot | PEAKS | OMSSA | X!Tandem | MaxQuant | |
|---|---|---|---|---|---|---|---|---|
| HCD_Histone | un | 84% (74%, 99%) | 78% (65%, 98%) | 24% (14%, 97%) | 74% (59%, 99%) | |||
| ac | 44% (28%, 97%) | 74% (58%, 100%) | 43% (27%, 96%) | 76% (63%, 96%) | ||||
| me | 69% (53%, 100%) | 0% (0%, 0%) | 54% (38%, 96%) | 64% (49%, 94%) | ||||
| di | 84% (75%, 96%) | 5% (2%, 100%) | 71% (55%, 98%) | 73% (60%, 93%) | ||||
| tr | 63% (47%, 92%) | 0% (0%, 0%) | 81% (74%, 90%) | 76% (69%, 84%) | ||||
| ph | 53% (38%, 83%) | 0% (0%, 0%) | 44% (31%, 80%) | 67% (54%, 88%) | ||||
| co | 63% (47%, 94%) | 0% (0%, 0%) | 0% (0%, 0%) | 6% (3%, 100%) | 46% (30%, 100%) | |||
| CID_Histone | un | 90% (82%, 100%) | 72% (56%, 99%) | 11% (6%, 98%) | 71% (55%, 99%) | |||
| ac | 55% (38%, 100%) | 22% (13%, 100%) | 57% (39%, 100%) | 39% (24%, 97%) | 6% (3%, 100%) | |||
| me | 77% (63%, 98%) | 56% (39%, 100%) | 0% (0%, 0%) | 61% (44%, 100%) | ||||
| di | 74% (59%, 100%) | 7% (4%, 100%) | 0% (0%, 0%) | 66% (49%, 100%) | 79% (65%, 100%) | |||
| tr | 46% (30%, 99%) | 0% (0%, 0%) | 18% (10%, 68%) | 30% (18%, 89%) | ||||
| ph | 0% (0%, 0%) | 0% (0%, 0%) | 0% (0%, 0%) | |||||
| co | 54% (40%, 85%) | 0% (0%, 0%) | 0% (0%, 0%) | 0% (0%, 0%) | 35% (26%, 54%) | |||
Common means IDs in both pFind and the other engine. Common/pFind = #IDs in common/#IDs in pFind, Common/Other = #IDs in common/#IDs in the other engine. Harmonic mean = 2/(1/(Common/pFind) + 1/(Common/Other)).
Largest three harmonic means (the third largest ≥80%) in each row are underlined.
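The pairwise comparison defined in the footnote, together with the three groups shown in Figure 3, maps directly onto set operations. The sketch below assumes each confident ID can be represented as a hashable key such as a (spectrum, peptide) pair; the function name `overlap_metrics` and the toy IDs are hypothetical.

```python
def overlap_metrics(pfind_ids, other_ids):
    """Compare the confident IDs of pFind with those of another engine.

    Both arguments are sets of hashable ID keys, for example
    (spectrum, peptide) tuples (an assumption about the ID format).
    """
    common = pfind_ids & other_ids
    common_over_pfind = len(common) / len(pfind_ids) if pfind_ids else 0.0
    common_over_other = len(common) / len(other_ids) if other_ids else 0.0
    if common_over_pfind == 0.0 or common_over_other == 0.0:
        harmonic_mean = 0.0  # assumption: report 0 when there is no overlap
    else:
        harmonic_mean = 2 / (1 / common_over_pfind + 1 / common_over_other)
    return {
        "pFind-Other": len(pfind_ids - other_ids),  # IDs only in pFind
        "pFind∩Other": len(common),                 # IDs in both engines
        "Other-pFind": len(other_ids - pfind_ids),  # IDs only in the other engine
        "Common/pFind": common_over_pfind,
        "Common/Other": common_over_other,
        "harmonic mean": harmonic_mean,
    }


# Toy example with made-up IDs.
pfind = {("s1", "A"), ("s2", "B"), ("s3", "C"), ("s4", "D")}
other = {("s3", "C"), ("s4", "D"), ("s5", "E")}
print(overlap_metrics(pfind, other))
```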
Search Time and Result Space
| median search time (s) for six searches | pFind | Mascot | SEQUEST | ProteinPilot | PEAKS | OMSSA | X!Tandem | MaxQuant |
|---|---|---|---|---|---|---|---|---|
| HCD_Histone | 45.25 | 263 | 83.4 | 140 | 24413 | 20 | 20.5 | 963 |
| CID_Histone | 46.15 | 313.5 | 127.2 | 441.5 | 5024.5 | 60 | 16.5 | 950.5 |
Search times were measured on a PC with an Intel Core i7-3770 CPU @ 3.4 GHz (8 cores), 8 GB of RAM, and 64-bit Windows 7 Professional.