
An SVM scorer for more sensitive and reliable peptide identification via tandem mass spectrometry.

Haipeng Wang, Yan Fu, Ruixiang Sun, Simin He, Rong Zeng, Wen Gao.

Abstract

Tandem mass spectrometry (MS/MS) has become increasingly important and indispensable in high-throughput proteomics for identifying complex protein mixtures. Database searching is the standard method to accomplish this purpose. A key sub-routine, peptide identification, is used to generate a list of candidate peptides from a protein database according to an experimental MS/MS spectrum, and then validate these candidate peptides for protein identification. Although currently there are many algorithms for peptide identification, most of them either lack an effective validation module or only validate the first-ranked peptide, thus leading to a low identification reliability or sensitivity. This paper proposes a new algorithm, named pepReap, to overcome the above drawbacks. It consists of a two-layered scoring scheme based on machine learning. The first layer is a rough scoring function which uses some simple and heuristic factors to measure the degree of the matches between an experimental MS/MS spectrum and the candidate peptides; thus a ranked list of candidate peptides is generated at a relatively low computational cost. The second layer is a fine scoring function which re-ranks the candidate peptides generated in the first layer and determines which one among them is the true positive. The fine scoring function was designed based on support vector machines (SVMs) using more comprehensive factors, such as the correlations between ions, the mass matching errors of fragment and peptide ions, etc. Consequently, the SVM classifier serves as not only a scorer but also a validation module. Experimental comparison with the popular SEQUEST algorithm coupled with threshold validation criteria on a reported dataset demonstrates that the pepReap algorithm achieves higher performance in terms of identification sensitivity with comparable precision.
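
The two-layered scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the pepReap implementation: the feature set, the tolerance, the names (`Candidate`, `rough_score`, `fine_score`, `identify`), and the weights are all hypothetical, and a hand-set linear decision function stands in for a trained SVM. It only shows the control flow: a cheap first-layer score shortlists candidates, then a second-layer score re-ranks the shortlist and accepts the top candidate only if its decision value is positive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    peptide: str                 # toy sequence label, not real data
    fragment_mzs: List[float]    # theoretical fragment m/z values (assumed precomputed)
    precursor_error: float       # absolute precursor mass error in Da (assumed precomputed)


def matched_errors(spectrum: List[float], fragment_mzs: List[float], tol: float) -> List[float]:
    """Absolute m/z error of every theoretical fragment that has a peak within tol."""
    errors = []
    for mz in fragment_mzs:
        best = min((abs(mz - peak) for peak in spectrum), default=float("inf"))
        if best <= tol:
            errors.append(best)
    return errors


def rough_score(spectrum: List[float], cand: Candidate, tol: float = 0.5) -> float:
    """Layer 1 (hypothetical heuristic): fraction of theoretical fragments matched."""
    return len(matched_errors(spectrum, cand.fragment_mzs, tol)) / len(cand.fragment_mzs)


def fine_features(spectrum: List[float], cand: Candidate, tol: float = 0.5) -> List[float]:
    """Layer 2 features: matched fraction, mean fragment error, precursor error."""
    errs = matched_errors(spectrum, cand.fragment_mzs, tol)
    mean_err = sum(errs) / len(errs) if errs else tol
    return [len(errs) / len(cand.fragment_mzs), mean_err, cand.precursor_error]


# Hypothetical weights and bias of a linear decision function.
# In a real system these would come from training an SVM on labeled matches.
WEIGHTS = [4.0, -2.0, -1.0]
BIAS = -2.0


def fine_score(features: List[float]) -> float:
    """Signed decision value: positive means 'accept as a true match'."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def identify(spectrum: List[float], candidates: List[Candidate],
             top_n: int = 3) -> Tuple[Optional[str], Optional[float]]:
    """Shortlist by rough score, re-rank by fine score, validate the new top hit."""
    shortlist = sorted(candidates, key=lambda c: rough_score(spectrum, c), reverse=True)[:top_n]
    scored = sorted(((fine_score(fine_features(spectrum, c)), c) for c in shortlist),
                    key=lambda pair: pair[0], reverse=True)
    if not scored:
        return None, None
    best_score, best = scored[0]
    return (best.peptide, best_score) if best_score > 0 else (None, best_score)


# Toy data: "AAAK" matches every fragment loosely, "GGGR" matches fewer but exactly.
spectrum = [100.0, 200.0, 300.0, 400.0]
candidates = [
    Candidate("AAAK", [100.4, 200.4, 300.4, 400.4], precursor_error=1.5),
    Candidate("GGGR", [100.0, 200.0, 300.0, 450.0], precursor_error=0.1),
    Candidate("LLLK", [150.0, 250.0, 350.0, 400.0], precursor_error=0.2),
]
rough_ranking = [c.peptide for c in
                 sorted(candidates, key=lambda c: rough_score(spectrum, c), reverse=True)]
result = identify(spectrum, candidates)
```

In this toy case the first layer ranks "AAAK" first, while the second layer promotes "GGGR" and accepts it, which mirrors the abstract's point that validation need not be restricted to the first-ranked candidate.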

Year:  2006        PMID: 17094248

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  4 in total

Review 1.  Penalized feature selection and classification in bioinformatics.

Authors:  Shuangge Ma; Jian Huang
Journal:  Brief Bioinform       Date:  2008-06-18       Impact factor: 11.622

Review 2.  Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics.

Authors:  Bobbie-Jo M Webb-Robertson; Holli K Wiberg; Melissa M Matzke; Joseph N Brown; Jing Wang; Jason E McDermott; Richard D Smith; Karin D Rodland; Thomas O Metz; Joel G Pounds; Katrina M Waters
Journal:  J Proteome Res       Date:  2015-04-22       Impact factor: 4.466

3.  Sequential projection pursuit principal component analysis--dealing with missing data associated with new -omics technologies.

Authors:  Bobbie-Jo M Webb-Robertson; Melissa M Matzke; Thomas O Metz; Jason E McDermott; Hyunjoo Walker; Karin D Rodland; Joel G Pounds; Katrina M Waters
Journal:  Biotechniques       Date:  2013-03       Impact factor: 1.993

Review 4.  Computational methods for protein identification from mass spectrometry data.

Authors:  Leo McHugh; Jonathan W Arthur
Journal:  PLoS Comput Biol       Date:  2008-02       Impact factor: 4.475
