Kyowon Jeong, Sangtae Kim, Nuno Bandeira.
Abstract
Automated database search engines are among the fundamental tools of high-throughput proteomics, enabling the daily identification of hundreds of thousands of peptides and proteins from tandem mass spectrometry (MS/MS) data. However, this automation also makes it impossible to manually validate the vast lists of identifications produced by such high-throughput searches. This challenge is usually addressed with the Target-Decoy Approach (TDA), which imposes an empirical False Discovery Rate (FDR) at a predetermined threshold x%, with the expectation that at most x% of the returned identifications are false positives. Despite the fundamental importance of FDR estimates for ensuring the utility of large lists of identifications, there is surprisingly little consensus on exactly how TDA should be applied to minimize the chance of biased FDR estimates. In fact, because less rigorous TDA/FDR estimates tend to yield more identifications (at a higher 'true' FDR), there is often little incentive to enforce strict TDA/FDR procedures in studies where the main metric of success is the size of the identification list and no follow-up studies impose hard cost constraints on the number of reported false positives. Here we address the accuracy of TDA estimates of empirical FDR. Using MS/MS spectra from samples for which we could define a factual FDR estimator of the 'true' FDR, we evaluate several popular variants of the TDA procedure in a variety of database search contexts. We show that the fraction of false identifications can sometimes be more than 10× higher than reported and may be unavoidably high for certain types of searches. We also report that the two-pass search strategy appears to be the most promising database search strategy.
While unavoidably constrained by the particulars of any specific evaluation dataset, our observations support a series of recommendations for maximizing the number of resulting identifications while controlling database searches with robust and reproducible TDA estimation of empirical FDR.
Year: 2012 PMID: 23176207 PMCID: PMC3489529 DOI: 10.1186/1471-2105-13-S16-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Details on experiments performed
| Search# | Spectra | Database | Decoy | PMTol | Note |
|---|---|---|---|---|---|
| I-1 | ISB-02 | ISB | Rev | | |
| I-2 | ISB-02 | ISB | Shfl | | |
| I-3 | ISB-02 | ISB+ | Rev | | |
| I-4 | ISB-02 | ISB+ | Shfl | | |
| I-5 | ISB-02+ | ISB+ | Rev | | |
| I-6 | ISB-02+ | ISB+ | Shfl | | |
| I-7 | ISB-02+ | ISB+ | Rev | | |
| I-8 | ISB-02+ | ISB+ | Shfl | | |
| I-9 | ISB-02 | ISB | Sep.Rev | | |
| I-10 | ISB-02+ | ISB+ | Sep.Rev | | |
| I-11 | ISB-02+ | ISB+ | Sep.Rev | | |
| I-12 | ISB-02 | ISB+ | Rev | | Alt.Formula |
| I-13 | ISB-02+ | ISB+ | Rev | | Alt.Formula |
| I-14 | ISB-02+ | ISB | Rev | | |
| I-15 | ISB-02+ | ISB+ | Rev | | |
| I-16 | ISB-02 | ISB | Rev | 30 ppm | |
| I-17 | ISB-02+ | ISB+ | Rev | 30 ppm | |
| I-18 | ISB-02+ | ISB+ | Rev | 30 ppm | |
| I-19 | ISB-02 | ISB | Rev | | Alt.Score |
| I-20 | ISB-02+ | ISB+ | Rev | | Alt.Score |
| I-21 | ISB-02+ | ISB+ | Rev | | Alt.Score |
| I-22 | ISB-All+ | ISB+ | Rev | | |
| I-23 | ISB-All+ | ISB | Rev | | |
| Y-1 | Y-Small+ | Yeast+ | Rev | 30 ppm | |
| Y-2 | Y-Small+ | Yeast+ | Shfl | 30 ppm | |
| Y-3 | Y-Small+ | Yeast+ | Sep.Rev | 30 ppm | |
| Y-4 | Y-Small+ | Yeast+ | Rev | 30 ppm | Alt.Formula |
| Y-5 | Y-Small+ | Yeast | Rev | 30 ppm | |
| Y-6 | Y-Small+ | Yeast+ | Rev | 30 ppm | |
| Y-7 | Y-Small+ | Yeast+ | Rev | 30 ppm | Alt.Score |
| Y-8 | Y-All+ | Yeast+ | Rev | 30 ppm | |
| Y-9 | Y-All+ | Yeast | Rev | 30 ppm | |
| Y-10 | Y-Small+ | Yeast+ | Rev | 30 ppm | TwoPass(1) |
| Y-11 | Y-Small+ | Yeast+ | Rev | 30 ppm | TwoPass(2) |
| Y-12 | Y-Small+ | Yeast+ | Rev | 30 ppm | TwoPass(3) |
| Y-13 | Y-Small+ | Yeast+ | Rev | 30 ppm | TwoPass(4) |
For each of the searches I-1 to I-23 (Y-1 to Y-13), we counted the number of positive target PSMs (N) at a factual/empirical FDR of 5% (1%) and computed the corresponding factual/empirical FDR of the positive PSMs. The underlined characters represent either dummy spectra, dummy databases, or dummy tolerance. Search#: search identifier; Spectra: MS/MS spectra used; Database: protein database; Decoy: decoy database type (Rev: reversed decoy database; Shfl: shuffled decoy database; Sep.Rev: separate searches against the target and reversed decoy databases); PMTol: parent mass tolerance (Da: Dalton; ppm: parts per million); Note: additional note (Alt.Formula: an alternative formula was used to calculate FDR, see text; Alt.Score: an alternative score was used to calculate FDR, see text; TwoPass: two-pass searches, see text and Table 12).
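Counting positive target PSMs at a fixed empirical FDR can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each spectrum contributes its single best hit from a concatenated target+decoy search, a higher score means a better match (for spectral probabilities, where smaller is better, one would rank by -log10 of the probability instead), and the empirical FDR is the decoy-to-target ratio. The function name `count_positives_at_fdr` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def count_positives_at_fdr(
    psms: Iterable[Tuple[float, bool]], max_fdr: float
) -> Tuple[Optional[float], int, int]:
    """Return (score threshold, #target PSMs, #decoy PSMs) for the largest
    accepted list whose empirical FDR (#decoy / #target) is <= max_fdr.

    psms: (score, is_decoy) pairs, one best hit per spectrum from a
    concatenated target+decoy search; higher scores are better (assumption).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    best: Tuple[Optional[float], int, int] = (None, 0, 0)
    n_target = n_decoy = 0
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # Evaluate only after the last PSM with this score, so tied PSMs
        # are accepted or rejected together.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if n_target > 0 and n_decoy / n_target <= max_fdr:
            best = (score, n_target, n_decoy)
    return best
```

For the ISB searches one would call it with `max_fdr=0.05`, and for the yeast searches with `max_fdr=0.01`; the second element of the result plays the role of N.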
2×2 tables for Fisher's exact test.
| Definitions | Estimator | # positives | # estimated false positives |
|---|---|---|---|
| (a) | FactFDR | | |
| (a) | EmpiricalFDR | | |
| (b) | FactFDR | 2 · | |
| (b) | EmpiricalFDR | 2 · | |
| (c) | FactPepFDR | | |
| (c) | EmpiricalFDR | | |
| (d) | FactPepFDR | | |
| (d) | EmpiricalPepFDR | | |
When the p-value of Fisher's exact test (the Fisher p-value) for a specific experiment was smaller than 5%, we regarded the empirical FDR for that experiment as inaccurate. For most searches the definitions in (a) were used. The definitions in (b) were used only for the searches I-12, I-13, and Y-4 (i.e., searches using the alternative formula; see Table 5 and text). The definitions in (c) were used for the experiments in Table 10 (empirical PSM-level FDR vs. factual peptide-level FDR), and the definitions in (d) for the experiments in Table 11 (empirical peptide-level FDR vs. factual peptide-level FDR). FactFDR: factual FDR; EmpiricalFDR: empirical FDR; FactPepFDR: factual peptide-level FDR; EmpiricalPepFDR: empirical peptide-level FDR; N: the number of positive target PSMs; N: the number of positive dummy PSMs; N: the estimated number of false positive putative PSMs; N: the number of positive decoy PSMs; N: the number of positive target peptides; N: the number of positive dummy peptides; N: the estimated number of false positive putative peptides; N: the number of positive decoy peptides.
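The accuracy check described above can be sketched with a one-sided Fisher's exact test. The cell layout is our assumption, since the cell contents of the 2×2 tables are not reproduced here: each row holds one estimator's estimated false positives and the remaining positives, and the one-sided alternative is that the first row (for example, the factual estimate) has the larger share of false positives. `fisher_one_sided` and `empirical_fdr_is_accurate` are hypothetical names; if SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value.

```python
from math import comb
from typing import Tuple


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the hypergeometric
    null distribution with all row and column totals held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # largest value the top-left cell can take
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1))
    return tail / comb(n, row1)


def empirical_fdr_is_accurate(n_pos: int, fp_factual: int, fp_empirical: int,
                              alpha: float = 0.05) -> Tuple[bool, float]:
    """Hypothetical helper: compare factual and empirical false-positive counts
    among the same n_pos positives; the estimate is 'inaccurate' if p < alpha."""
    p = fisher_one_sided(fp_factual, n_pos - fp_factual,
                         fp_empirical, n_pos - fp_empirical)
    return p >= alpha, p
```

With identical counts in both rows the one-sided p-value is slightly above 0.5, so the empirical estimate is accepted; a factual false-positive count several times the empirical one on a large list drives the p-value below 5%.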
Comparison between searches using reversed or shuffled decoy databases
| Search# | Spectra | Database | PMTol | Decoy | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| I-1 | ISB-02 | ISB | | Rev | 2329/1009 | 5.8/4.4 | 10.9/30.9 | 2279/1024 | 3.9/5.7 |
| I-2 | ISB-02 | ISB | | Shfl | 2339/1023 | 6.0/4.6 | 7.2/38.6 | 2279/1025 | 3.6/5.1 |
| I-3 | ISB-02 | ISB+ | | Rev | 1578/602 | 4.7/5.1 | 40.8/50.3 | 1583/596 | 5.1/3.9 |
| I-4 | ISB-02 | ISB+ | | Shfl | 1597/577 | 5.0/4.2 | 50.2/34.5 | 1589/588 | 4.8/5.6 |
| I-5 | ISB-02+ | ISB+ | | Rev | 1490/569 | 5.0/5.8 | 50.2/31.2 | 1480/553 | 4.5/4.3 |
| I-6 | ISB-02+ | ISB+ | | Shfl | 1488/530 | 5.0/4.0 | 50.2/28.7 | 1478/550 | 4.9/5.5 |
| I-7 | ISB-02+ | ISB+ | | Rev | 1320/441 | 4.6/4.1 | 36.6/38.0 | 1342/464 | 5.8/7.3 |
| I-8 | ISB-02+ | ISB+ | | Shfl | 1287/441 | 3.4/4.1 | 7.5/38.0 | 1342/464 | 6.8/7.3 |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | Rev | 2574/1988 | 1.0/1.3 | 50.1/22.7 | 2588/1759 | 1.0/0.5 |
| Y-2 | Y-Small+ | Yeast+ | 30 ppm | Shfl | 2554/1940 | 0.9/1.2 | 38.7/27.3 | 2620/1758 | 1.1/0.6 |
All searches followed the standard TDA procedure except for step 2 in the shuffled-database searches. The results in the columns labeled "EmpiricalFDR fixed" were obtained at an empirical FDR threshold of 5% (searches I-1 to I-8) or 1% (searches Y-1 and Y-2). The results in the columns labeled "FactFDR fixed" were obtained at a factual FDR threshold of 5% (searches I-1 to I-8) or 1% (searches Y-1 and Y-2). The underlined characters represent either dummy spectra, dummy databases, or dummy tolerance. In the N/FactFDR/EmpiricalFDR/p-value fields, the first number is from MS-GFDB and the second from X!Tandem. Note that we do not aim to compare database search engines (i.e., MS-GFDB vs. X!Tandem); we only evaluate how reliable FDR estimation via TDA is and how the number of positive PSMs (or peptides) changes across search strategies with different parameters or protocols.
Contrary to popular belief, we did not observe more conservative FDR estimates with the shuffled decoy database than with the reversed decoy database.
EmpiricalFDR: the empirical FDR; FactFDR: the factual FDR; N: the number of positive target PSMs; p-value: Fisher p-value (see Table 2); Fisher p-values below 5% were emphasized in bold.
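To make the two decoy types concrete, the sketch below builds a reversed and a shuffled decoy database from a dictionary of protein sequences. It is a simplified assumption: real pipelines may, for example, shuffle at the peptide level or keep protease cleavage residues in place, and the exact shuffling procedure is not described in this section. The `DECOY_` prefix and the function names are hypothetical.

```python
import random
from typing import Dict


def reversed_decoy(seq: str) -> str:
    """Reverse the whole protein sequence."""
    return seq[::-1]


def shuffled_decoy(seq: str, rng: random.Random) -> str:
    """Randomly permute the residues; amino acid composition is preserved."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def build_decoy_db(proteins: Dict[str, str], kind: str = "rev",
                   seed: int = 0) -> Dict[str, str]:
    """Return a decoy database keyed by hypothetical 'DECOY_' names."""
    if kind not in ("rev", "shfl"):
        raise ValueError(f"unknown decoy kind: {kind}")
    rng = random.Random(seed)  # fixed seed keeps shuffled decoys reproducible
    decoys = {}
    for name, seq in proteins.items():
        decoy = reversed_decoy(seq) if kind == "rev" else shuffled_decoy(seq, rng)
        decoys[f"DECOY_{name}"] = decoy
    return decoys
```

Both constructions preserve protein length and amino acid composition; they differ in whether the sequence order of the target is retained in mirrored form (reversal) or randomized (shuffling).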
Comparison between concatenated-decoy searches and separate-decoy searches
| Search# | Spectra | Database | PMTol | Decoy | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| I-1 | ISB-02 | ISB | | Rev | 2329/1009 | 5.8/4.4 | 10.9/30.9 | 2279/1024 | 3.9/5.7 |
| I-9 | ISB-02 | ISB | | Sep.Rev | 2159/941 | 2.1/2.4 | | 2287/1028 | 8.6/12.2 |
| I-5 | ISB-02+ | ISB+ | | Rev | 1490/569 | 5.0/5.8 | 50.2/31.2 | 1480/553 | 4.5/4.3 |
| I-10 | ISB-02+ | ISB+ | | Sep.Rev | 1462/504 | 4.6/3.6 | 40.3/18.7 | 1482/544 | 5.3/7.9 |
| I-7 | ISB-02+ | ISB+ | | Rev | 1320/441 | 4.6/4.1 | 36.6/38.0 | 1342/464 | 5.8/7.3 |
| I-11 | ISB-02+ | ISB+ | | Sep.Rev | 1287/453 | 3.4/4.9 | | 1342/456 | 7.0/5.5 |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | Rev | 2574/1988 | 1.0/1.3 | 50.1/22.7 | 2588/1759 | 1.0/0.5 |
| Y-3 | Y-Small+ | Yeast+ | 30 ppm | Sep.Rev | 2501/1605 | 0.8/0.7 | 33.1/28.7 | 2589/1759 | 1.4/1.5 |
All searches followed the standard TDA procedure except for step 2 in the separate reversed-database searches. The search I-9 demonstrates that separate-decoy searches result in more conservative FDR estimation than concatenated-decoy searches, in particular for small databases.
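The two search modes differ in how target and decoy hits above a score threshold are counted. The sketch below assumes one best target score and one best decoy score per spectrum and higher-is-better scores. In the concatenated mode only the better of the two hits survives (ties go to the target, which is our assumption); in the separate mode both are counted. The function names are hypothetical.

```python
from typing import List, Tuple

Spectrum = Tuple[float, float]  # (best target score, best decoy score)


def counts_concatenated(spectra: List[Spectrum], threshold: float) -> Tuple[int, int]:
    """Target and decoy PSM counts when target and decoy hits compete per spectrum."""
    n_target = n_decoy = 0
    for t, d in spectra:
        if t >= d:  # target hit wins (ties -> target, an assumption)
            if t >= threshold:
                n_target += 1
        elif d >= threshold:  # decoy hit wins
            n_decoy += 1
    return n_target, n_decoy


def counts_separate(spectra: List[Spectrum], threshold: float) -> Tuple[int, int]:
    """Counts when the target and decoy databases are searched independently."""
    n_target = sum(1 for t, _ in spectra if t >= threshold)
    n_decoy = sum(1 for _, d in spectra if d >= threshold)
    return n_target, n_decoy
```

Because the separate mode counts every decoy top hit above the threshold, even when the target hit for the same spectrum scored higher, its decoy count can never be smaller than the concatenated count on the same spectra.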
Comparison between two FDR formulas
| Search# | Spectra | Database | PMTol | Formula | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| I-3 | ISB-02 | ISB+ | | 1 | 1578/602 | 4.7/5.1 | 40.8/50.3 | 1583/596 | 5.1/3.9 |
| I-12 | ISB-02 | ISB+ | | 2 | 1452/550 | 2.3/2.5 | | 1583/596 | 9.6/7.4 |
| I-5 | ISB-02+ | ISB+ | | 1 | 1490/569 | 5.0/5.8 | 50.2/31.2 | 1480/553 | 4.5/4.3 |
| I-13 | ISB-02+ | ISB+ | | 2 | 1387/502 | 2.8/3.2 | | 1480/553 | 8.7/8.3 |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | 1 | 2574/1988 | 1.0/1.3 | 50.1/22.7 | 2588/1759 | 1.0/0.5 |
| Y-4 | Y-Small+ | Yeast+ | 30 ppm | 2 | 2453/1626 | 0.8/0.8 | 27.8/36.2 | 2588/1759 | 2.0/1.0 |
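The Formula field in the fifth column specifies the formula used for the FDR calculation: 1 for $N_{\text{decoy}}/N_{\text{target}}$ and 2 for $2 \cdot N_{\text{decoy}}/(N_{\text{target}} + N_{\text{decoy}})$, where $N_{\text{target}}$ and $N_{\text{decoy}}$ are the numbers of positive target and positive decoy PSMs. For all searches, the standard TDA procedure was followed except for step 4 in the searches using formula 2. The searches I-12 and I-13 show that using formula 2 results in conservative FDR estimation.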
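The two formulas can be compared directly. For N_target > 0, formula 2 is at least as large as formula 1 exactly when N_decoy <= N_target: for N_decoy > 0 the inequality 2 * N_decoy / (N_target + N_decoy) >= N_decoy / N_target reduces to N_target >= N_decoy, and for N_decoy = 0 both are zero. This is consistent with the more conservative estimates observed for I-12 and I-13. The function names below are hypothetical.

```python
def fdr_formula1(n_target: int, n_decoy: int) -> float:
    """Formula 1: N_decoy / N_target (positive decoy PSMs per positive target PSM)."""
    return n_decoy / n_target


def fdr_formula2(n_target: int, n_decoy: int) -> float:
    """Formula 2: 2 * N_decoy / (N_target + N_decoy)."""
    return 2 * n_decoy / (n_target + n_decoy)
```

For example, with 100 positive target PSMs and 5 positive decoy PSMs, formula 1 gives 5.0% while formula 2 gives about 9.5%, so enforcing a 5% cutoff with formula 2 requires a stricter score threshold and yields fewer PSMs.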
Comparison between searches against databases of different sizes
| Search# | Spectra | Database | PMTol | DB size | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| I-1 | ISB-02 | ISB | | 7,440 | 2329/1009 | 5.8/4.4 | 10.9/30.9 | 2279/1024 | 3.9/5.7 |
| I-3 | ISB-02 | ISB+ | | 3,019,432 | 1578/602 | 4.7/5.1 | 40.8/50.3 | 1583/596 | 5.1/3.9 |
| I-14 | ISB-02+ | ISB | | 7,440 | 2262/984 | 5.8/4.5 | 5.7/34.5 | 2221/995 | 4.0/6.0 |
| I-5 | ISB-02+ | ISB+ | | 3,019,432 | 1490/569 | 5.0/5.8 | 50.2/31.2 | 1480/553 | 4.5/4.3 |
| I-7 | ISB-02+ | ISB+ | | 13,475,763 | 1320/441 | 4.6/4.1 | 36.6/38.0 | 1342/464 | 5.8/7.3 |
| Y-5 | Y-Small+ | Yeast | 30 ppm | 3,011,992 | 3340/2734 | 1.2/1.0 | 30.0/50.1 | 3209/2717 | 0.8/1.0 |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | 16,480,315 | 2574/1988 | 1.0/1.3 | 50.1/22.7 | 2588/1759 | 1.0/0.5 |
As expected, searches against smaller databases yielded more PSMs. Fisher p-values were higher than 5% in all cases, which indicates that the empirical FDR estimated via TDA is reliable regardless of database size.
Comparisons between searches with different portions of unidentifiable spectra
| Search# | Spectra | Database | PMTol | # spectra | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| I-1 | ISB-02 | ISB | | 4,966 | 2329/1009 | 5.8/4.4 | 10.9/30.9 | 2279/1024 | 3.9/5.7 |
| I-14 | ISB-02+ | ISB | | 11,285 | 2262/984 | 5.8/4.5 | 5.7/34.5 | 2221/995 | 4.0/6.0 |
| I-3 | ISB-02 | ISB+ | | 4,966 | 1578/602 | 4.7/5.1 | 40.8/50.3 | 1583/596 | 5.1/3.9 |
| I-5 | ISB-02+ | ISB+ | | 11,285 | 1490/569 | 5.0/5.8 | 50.2/31.2 | 1480/553 | 4.5/4.3 |
| I-15 | ISB-02+ | ISB+ | | 24,948 | 1367/531 | 4.2/5.3 | 33.0/34.5 | 1393/518 | 5.5/4.2 |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | 16,077 | 2574/1988 | 1.0/1.3 | 50.1/22.7 | 2588/1759 | 1.0/0.5 |
| Y-6 | Y-Small+ | Yeast+ | 30 ppm | 29,740 | 2238/1913 | 1.1/1.4 | 44.2/14.7 | 2208/1629 | 0.9/0.7 |
Adding unidentifiable spectra reduces the number of positive PSMs but does not significantly change the accuracy of the FDR estimates.
Comparison between searches with strict and loose parent mass tolerance.
| Search# | Spectra | Database | PMTol | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|
| I-1 | ISB-02 | ISB | | 2329/1009 | 5.8/4.4 | 10.9/30.9 | 2279/1024 | 3.9/5.7 |
| I-16 | ISB-02 | ISB | 30 ppm | 2128/1009 | N/A | N/A | N/A | N/A |
| I-5 | ISB-02+ | ISB+ | | 1490/569 | 5.0/5.8 | 50.2/31.2 | 1480/553 | 4.5/4.3 |
| I-17 | ISB-02+ | ISB+ | 30 ppm | 1638/569 | 6.4/5.3 | | 1570/565 | 3.9/4.8 |
| I-7 | ISB-02+ | ISB+ | | 1320/441 | 4.6/4.1 | 36.6/38.0 | 1342/464 | 5.8/7.3 |
| I-18 | ISB-02+ | ISB+ | 30 ppm | 1358/463 | 3.2/5.1 | | 1425/463 | 6.5/4.3 |
As expected, in most cases more PSMs were identified at the same factual FDR threshold when a strict parent mass tolerance was used.
N/A: for the search I-16, the factual FDR is not available because no dummy element was used.
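A parent mass tolerance in ppm scales with the precursor mass, whereas a tolerance in Da is absolute. The sketch below shows the check using the usual definition of mass error in ppm, (observed - theoretical) / theoretical * 10^6. The function name is hypothetical, and search engines may apply the tolerance to m/z rather than to neutral mass.

```python
def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol: float, unit: str = "ppm") -> bool:
    """Return True if the observed parent mass matches the theoretical one."""
    error = observed_mass - theoretical_mass
    if unit == "ppm":
        # relative error in parts per million
        return abs(error) / theoretical_mass * 1e6 <= tol
    if unit == "Da":
        # absolute error in Daltons
        return abs(error) <= tol
    raise ValueError(f"unsupported unit: {unit}")
```

At a precursor mass of 2,000 Da, a 30 ppm window corresponds to ±0.06 Da, which is why a strict ppm tolerance admits far fewer candidate peptides per spectrum than a wide Dalton window.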
Comparison between searches with differently normalized scoring functions
| Search# | Spectra | Database | PMTol | Score | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| I-1 | ISB-02 | ISB | | SpecProb | 2329 | 5.8 | 10.9 | 2279 | 4.4 |
| I-19 | ISB-02 | ISB | | MSGFRaw | 2079 | 4.6 | 36.5 | 2079 | 4.6 |
| I-5 | ISB-02+ | ISB+ | | SpecProb | 1490 | 5.0 | 50.2 | 1480 | 4.5 |
| I-20 | ISB-02+ | ISB+ | | MSGFRaw | 1272 | 5.7 | 25.2 | 1210 | 4.5 |
| I-7 | ISB-02+ | ISB+ | | SpecProb | 1320 | 4.6 | 36.6 | 1342 | 5.8 |
| I-21 | ISB-02+ | ISB+ | | MSGFRaw | 987 | 3.9 | 37.3 | 1064 | 6.1 |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | SpecProb | 2574 | 1.0 | 50.1 | 2588 | 1.0 |
| Y-7 | Y-Small+ | Yeast+ | 30 ppm | MSGFRaw | 1215 | 1.9 | | 861 | 0.3 |
The spectral probability can be considered simply a better-normalized version of the MS-GF score for this experiment [23]. Using the better-normalized score (i.e., the spectral probability) always produced substantially more PSMs, with larger gains for larger databases. Furthermore, as the search Y-7 shows, the TDA-determined empirical FDR tended to be more accurate when the better-normalized score was used.
SpecProb: the spectral probability was used to compute the FDR; MSGFRaw: the MS-GF score was used to compute the FDR.
Comparison between peptide-level factual FDR and PSM-level FDR
| Search# | Spectra | Database | PMTol | EmpiricalFDR fixed: N | EmpiricalFDR fixed: # peptides | EmpiricalFDR fixed: FactPepFDR (%) | EmpiricalFDR fixed: p-value (%) |
|---|---|---|---|---|---|---|---|
| I-5 | ISB-02+ | ISB+ | | 1490/569 | 600/262 | 12.0/11.8 | |
| I-22 | ISB-All+ | ISB+ | | 13441/5086 | 1375/538 | 38.6/38.1 | |
| I-14 | ISB-02+ | ISB | | 2262/984 | 815/361 | 13.1/10.5 | |
| I-23 | ISB-All+ | ISB | | 19501/8497 | 1628/556 | 42.8/39.6 | |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | 2574/1988 | 2355/1841 | 1.1/1.4 | 37.8/15.8 |
| Y-8 | Y-All+ | Yeast+ | 30 ppm | 9005/6269 | 3567/2640 | 2.4/2.0 | |
| Y-5 | Y-Small+ | Yeast | 30 ppm | 3340/2734 | 3033/2515 | 1.2/1.0 | 19.1/54.5 |
| Y-9 | Y-All+ | Yeast | 30 ppm | 11151/8969 | 4341/3582 | 2.3/2.3 | |
Score thresholds were determined using PSM-level empirical FDR thresholds and then used to calculate factual peptide-level FDRs. The results illustrate that the PSM-level empirical FDR substantially underestimates the peptide-level FDR (e.g., the searches I-22 and I-23).
# peptides: number of distinct peptides; FactPepFDR: factual peptide-level FDR.
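One commonly cited reason for this gap is that abundant target peptides are matched by many spectra, while false matches tend to be spread over many distinct peptides, so collapsing PSMs to peptides removes far more target entries than decoy entries. The sketch below collapses accepted PSMs to distinct peptides and computes both ratios as #decoy / #target. The data and names are hypothetical, and it assumes at least one target PSM passes the threshold.

```python
from typing import List, Tuple

PSM = Tuple[str, float, bool]  # (peptide sequence, score, is_decoy)


def psm_and_peptide_fdr(psms: List[PSM], threshold: float) -> Tuple[float, float]:
    """Return (PSM-level FDR, peptide-level FDR) above a score threshold.

    Both use #decoy / #target; assumes at least one target PSM passes.
    """
    accepted = [p for p in psms if p[1] >= threshold]
    psm_target = sum(1 for _, _, is_decoy in accepted if not is_decoy)
    psm_decoy = len(accepted) - psm_target
    # Each distinct peptide is counted once, however many spectra match it.
    peptides = {(pep, is_decoy) for pep, _, is_decoy in accepted}
    pep_target = sum(1 for _, is_decoy in peptides if not is_decoy)
    pep_decoy = len(peptides) - pep_target
    return psm_decoy / psm_target, pep_decoy / pep_target
```

In a toy list where one target peptide is matched by six spectra, another by two, a third by one, and two decoy peptides by one spectrum each, the PSM-level ratio is 2/9 (about 22%) but the peptide-level ratio is 2/3 (about 67%).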
Comparison between peptide-level factual FDR and peptide-level FDR
| Search# | Spectra | Database | PMTol | EmpiricalPepFDR fixed: N | EmpiricalPepFDR fixed: # peptides | EmpiricalPepFDR fixed: FactPepFDR (%) | EmpiricalPepFDR fixed: p-value (%) | FactPepFDR fixed: N | FactPepFDR fixed: # peptides | FactPepFDR fixed: EmpiricalPepFDR (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| I-5 | ISB-02+ | ISB+ | | 1340/498 | 532/234 | 5.3/6.4 | 45.0/22.6 | 1335/479 | 529/226 | 4.9/4.4 |
| I-22 | ISB-All+ | ISB+ | | 10245/3688 | 758/304 | 6.6/6.6 | 7.0/ | 9448/3596 | 696/292 | 2.4/4.1 |
| I-14 | ISB-02+ | ISB | | 2088/907 | 727/333 | 6.7/5.1 | 10.4/50.3 | 1994/900 | 693/332 | 2.9/4.8 |
| I-23 | ISB-All+ | ISB | | 16602/6676 | 1083/416 | 9.2/8.9 | | 15663/6203 | 1015/385 | 1.7/1.3 |
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | 2574/1963 | 2355/1818 | 1.1/1.3 | 38.7/17.6 | 2556/1759 | 2339/1636 | 0.9/0.6 |
| Y-8 | Y-All+ | Yeast+ | 30 ppm | 7867/5068 | 3142/2201 | 1.3/1.0 | 6.1/50.1 | 7121/5068 | 2849/2201 | 0.4/1.0 |
| Y-5 | Y-Small+ | Yeast | 30 ppm | 3209/2666 | 2916/2455 | 1.0/0.9 | 44.7/50.1 | 3209/2734 | 2916/2515 | 0.9/1.0 |
| Y-9 | Y-All+ | Yeast | 30 ppm | 10005/7309 | 3885/2987 | 1.0/0.9 | 50.0/44.9 | 10005/7309 | 3885/2987 | 0.9/0.9 |
Score thresholds were determined using the empirical (factual) peptide-level FDR and then used to calculate the factual (empirical) peptide-level FDRs. For the searches I-5, I-22, I-14, and I-23, the peptide-level FDR thresholds were set to 5%; for the remaining searches they were set to 1%. The search I-23 illustrates the difficulty of enforcing a peptide-level FDR when searching small databases.
EmpiricalPepFDR: empirical peptide-level FDR; # peptides: number of distinct peptides.
Comparison between the single-pass search (the search Y-1) and various two-pass search methods (the searches Y-10 to Y-13)
| Search# | Spectra | Database | PMTol | IsTwoPass | 2nd-pass decoy | MSR | EmpiricalFDR fixed: N | EmpiricalFDR fixed: FactFDR (%) | EmpiricalFDR fixed: p-value (%) | FactFDR fixed: N | FactFDR fixed: EmpiricalFDR (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Y-1 | Y-Small+ | Yeast+ | 30 ppm | No | Rev | N/A | 2574/1988 | 1.0/1.3 | 50.1/22.7 | 2588/1759 | 1.0/0.5 |
| Y-10 | Y-Small+ | Yeast+ | 30 ppm | Yes | Trad | No | 5361/5744 | 15.9/20.1 | | 3260/2655 | 0.6/0.3 |
| Y-11 | Y-Small+ | Yeast+ | 30 ppm | Yes | Trad | Yes | 4114/3925 | 7.3/10.1 | | 3102/2320 | 0.9/0.5 |
| Y-12 | Y-Small+ | Yeast+ | 30 ppm | Yes | BK | No | 3529/3089 | 1.2/1.0 | 24.9/45.0 | 3262/3074 | 0.7/0.9 |
| Y-13 | Y-Small+ | Yeast+ | 30 ppm | Yes | BK | Yes | 3137/2514 | 1.1/1.1 | 40.1/39.3 | 3103/2521 | 1.0/0.8 |
For the searches Y-10 and Y-11, the traditional second pass decoy database was used to estimate FDR (see text). For the searches Y-12 and Y-13, the decoy database proposed by Bern et al. [25] was used. Also, for the searches Y-11 and Y-13, the matched spectrum removal (MSR) step was used.
Low Fisher p-values in Y-10 and Y-11 illustrate that using the traditional second pass decoy database results in significant underestimation of the true FDR.
2nd-pass decoy: the decoy database for the second-pass search; MSR: whether the matched spectrum removal step was used; Trad: the traditional decoy database; BK: the BK decoy database.
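A two-pass search first searches the full database and then re-searches against only the proteins identified in the first pass. The sketch below prepares the second-pass inputs under stated assumptions: the restricted target database contains every protein with at least one first-pass PSM that passed the FDR threshold, the traditional second-pass decoy is taken to be the reversal of that restricted database, and matched spectrum removal (MSR) drops spectra already identified in the first pass. The BK decoy construction [25] is not reproduced here, and all names (including the `DECOY_` prefix and the `passed` flag) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FirstPassPSM:
    spectrum_id: str
    protein: str
    passed: bool  # hypothetical flag: passed the first-pass FDR threshold


def second_pass_inputs(
    spectra: List[str],
    proteins: Dict[str, str],
    first_pass: List[FirstPassPSM],
    use_msr: bool,
) -> Tuple[List[str], Dict[str, str], Dict[str, str]]:
    """Return (spectra to search, restricted target DB, reversed decoy DB)."""
    passed = [psm for psm in first_pass if psm.passed]
    hit_proteins = {psm.protein for psm in passed}
    target_db = {name: seq for name, seq in proteins.items() if name in hit_proteins}
    # Traditional-style decoy (assumption): reverse each restricted target protein.
    decoy_db = {f"DECOY_{name}": seq[::-1] for name, seq in target_db.items()}
    if use_msr:
        # Matched spectrum removal: skip spectra already identified in pass 1.
        matched = {psm.spectrum_id for psm in passed}
        spectra = [s for s in spectra if s not in matched]
    return list(spectra), target_db, decoy_db
```

Because the restricted target database is built from first-pass identifications while its reversed decoy carries no such enrichment, a decoy of this kind can plausibly underrepresent false matches in the second pass, which is one way to read the low Fisher p-values reported for Y-10 and Y-11.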