Gelio Alves, Yi-Kuo Yu. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Abstract
MOTIVATION: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of the scientific conclusions drawn from them. In mass spectrometry-based proteomics, accurate statistics for peptide identification can now be achieved, but accurate protein-level statistics remain challenging.

RESULTS: We have constructed a protein ID method that combines the peptide evidence for a candidate protein using a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein-level E-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with those of the other methods tested.

AVAILABILITY AND IMPLEMENTATION: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit.

Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
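To illustrate how accurate E-values can feed an estimate of the proportion of false discoveries, the sketch below uses a simplified E-value form of a Sorić-style estimate: the cutoff E-value (the expected number of chance hits at that cutoff) divided by the number of proteins reported at or below it. This form is an assumption made for illustration; the abstract does not state the exact expression the authors use, and the function name `estimate_pfd` and the sample values are hypothetical.

```python
def estimate_pfd(e_values, cutoff):
    """Estimate the proportion of false discoveries at an E-value cutoff.

    Assumption (not taken from the abstract): the estimate is
    cutoff / R, where R is the number of proteins with E-value <= cutoff.
    The result is capped at 1.0 because it is a proportion.
    """
    if cutoff < 0:
        raise ValueError("cutoff must be non-negative")
    # R: proteins reported as identified at this cutoff
    discoveries = sum(1 for e in e_values if e <= cutoff)
    if discoveries == 0:
        # Nothing reported, so there are no false discoveries to count
        return 0.0
    return min(1.0, cutoff / discoveries)


# Hypothetical protein-level E-values, for illustration only
protein_e_values = [0.001, 0.01, 0.05, 0.5, 2.0]
pfd = estimate_pfd(protein_e_values, cutoff=0.1)
```

With these made-up values, three proteins fall at or below the cutoff of 0.1, so the estimate is 0.1 divided by 3. The estimate is only as good as the E-values themselves, which is why accurate protein-level statistics matter for this step.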