
Decoy methods for assessing false positives and false discovery rates in shotgun proteomics.

Guanghui Wang, Wells W Wu, Zheng Zhang, Shyama Masilamani, Rong-Fong Shen.

Abstract

The potential of getting a significant number of false positives (FPs) in peptide-spectrum matches (PSMs) obtained by proteomic database search has been well-recognized. Among the attempts to assess FPs, the concomitant use of target and decoy databases is widely practiced. By adjusting filtering criteria, FPs and false discovery rate (FDR) can be controlled at a desired level. Although the target-decoy approach is gaining in popularity, subtle differences in decoy construction (e.g., reversing vs stochastic methods), rate calculation (e.g., total vs unique PSMs), or searching (separate vs composite) do exist among various implementations. In the present study, we evaluated the effects of these differences on FP and FDR estimations using a rat kidney protein sample and the SEQUEST search engine as an example. On the effects of decoy construction, we found that, when a single scoring filter (XCorr) was used, stochastic methods generated a higher estimation of FPs and FDR than sequence reversing methods, likely due to an increase in unique peptides. This higher estimation could largely be attenuated by creating decoy databases similar in effective size but not by a simple normalization with a unique-peptide coefficient. When multiple filters were applied, the differences seen between reversing and stochastic methods significantly diminished, suggesting multiple filterings reduce the dependency on how a decoy is constructed. For a fixed set of filtering criteria, FDR and FPs estimated by using unique PSMs were almost twice those using total PSMs. The higher estimation seemed to be dependent on data acquisition setup. As to the differences between performing separate or composite searches, in general, FDR estimated from the separate search was about three times that from the composite search. The degree of difference gradually decreased as the filtering criteria became more stringent. Paradoxically, the estimated true positives in separate search were higher when multiple filters were used. By analyzing a standard protein mixture, we demonstrated that the higher estimation of FDR and FPs in the separate search likely reflected an overestimation, which could be corrected with a simple merging procedure. Our study illustrates the relative merits of different implementations of the target-decoy strategy, which should be worth contemplating when large-scale proteomic biomarker discovery is to be attempted.
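The abstract does not state the formulas used, so the following is only a minimal sketch of how target-decoy FDR is commonly estimated. It assumes the widely used conventions FDR = D/T for a separate search and FDR = 2D/(T + D) for a composite search, where T and D are the target and decoy hits passing the filter; these conventions, the `PSM` record, the function names, and the toy data are all illustrative assumptions and not taken from the study. The `unique` flag shows the difference between counting total PSMs and counting unique peptide sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical record for one peptide-spectrum match."""
    spectrum_id: str
    peptide: str
    xcorr: float
    is_decoy: bool


def count_hits(psms, xcorr_min, unique=False):
    """Return (target, decoy) counts for PSMs with XCorr >= xcorr_min.

    With unique=True, each distinct peptide sequence is counted once.
    """
    passed = [p for p in psms if p.xcorr >= xcorr_min]
    if unique:
        targets = {p.peptide for p in passed if not p.is_decoy}
        decoys = {p.peptide for p in passed if p.is_decoy}
        return len(targets), len(decoys)
    n_decoy = sum(1 for p in passed if p.is_decoy)
    return len(passed) - n_decoy, n_decoy


def fdr_separate(n_target, n_decoy):
    """Assumed convention for separate target and decoy searches: D / T."""
    return n_decoy / n_target if n_target else 0.0


def fdr_composite(n_target, n_decoy):
    """Assumed convention for a composite (concatenated) search: 2D / (T + D)."""
    total = n_target + n_decoy
    return 2 * n_decoy / total if total else 0.0


# Toy data, invented for illustration only.
psms = [
    PSM("s1", "PEPTIDEK", 3.1, False),
    PSM("s2", "PEPTIDEK", 2.8, False),
    PSM("s3", "LVNELTEFAK", 2.6, False),
    PSM("s4", "KEDITPEP", 2.5, True),
    PSM("s5", "AAAK", 1.2, False),
]

total_counts = count_hits(psms, xcorr_min=2.0)
unique_counts = count_hits(psms, xcorr_min=2.0, unique=True)
print("total PSMs  (T, D):", total_counts, "FDR:", fdr_separate(*total_counts))
print("unique PSMs (T, D):", unique_counts, "FDR:", fdr_separate(*unique_counts))
```

In practice a separate search and a composite search produce different hit lists, so `fdr_separate` and `fdr_composite` would each be applied to counts from their own search rather than to the same pair of numbers. The merging procedure mentioned in the abstract is not described there and is not reproduced here.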

Year: 2009    PMID: 19061407    PMCID: PMC2653784    DOI: 10.1021/ac801664q

Source DB: PubMed    Journal: Anal Chem    ISSN: 0003-2700    Impact factor: 6.986


