A suffix tree approach to the interpretation of tandem mass spectra: applications to peptides of non-specific digestion and post-translational modifications.

Bingwen Lu, Ting Chen.

Abstract

MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. Because a mass spectrometer generates thousands of spectra in one hour, the speed of database searching is critical, especially when searching against a large sequence database, when the peptide is generated by an unknown or non-specific enzyme, or when the target peptides have post-translational modifications (PTMs). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of these arise from peptides produced by non-specific digestion by unknown enzymes or from amino acid modifications. In addition, scientists may choose non-specific enzymes such as pepsin or thermolysin for proteolysis in proteomic studies, because not all proteins are amenable to digestion by site-specific enzymes, and many of the digested peptides may not fall within the range of molecular weights suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds costs database search engines a great deal of computation time.
OVERVIEW: The present study was designed to speed up the database searching process for both cases (non-specific digestion and post-translational modifications). Specifically, we employ an approach that combines the suffix tree data structure with the spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm that computes, for each spectrum, a matching threshold at a given statistical significance level, e.g. p = 0.01, and use it to select candidate peptides. We then rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification to a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.
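
The core idea of the abstract, walking an index of database suffixes while checking prefix masses against nodes derived from the spectrum, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a depth-limited suffix trie in place of a compact suffix tree, reduces the spectrum graph to a sorted list of prefix residue masses, and omits the significance threshold and the SEQUEST-like scoring. The function names, the tolerance, and the toy sequences are hypothetical.

```python
from bisect import bisect_left

# Monoisotopic residue masses in Da. Only the residues used in the toy
# example are listed; any other residue raises KeyError during the search.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "K": 128.09496,
}


def build_suffix_trie(proteins, max_len):
    """Index every suffix of every protein, truncated to max_len residues.

    A nested-dict trie stands in for a real suffix tree (assumption: a
    simplification chosen for brevity, not for memory efficiency).
    """
    root = {}
    for protein in proteins:
        for start in range(len(protein)):
            node = root
            for aa in protein[start:start + max_len]:
                node = node.setdefault(aa, {})
    return root


def has_node_near(nodes, mass, tol):
    """Return True if the sorted list `nodes` holds a mass within tol of `mass`."""
    i = bisect_left(nodes, mass - tol)
    return i < len(nodes) and nodes[i] <= mass + tol


def search(trie, prefix_nodes, total_mass, tol=0.02, min_matched=1):
    """Find substrings whose residue mass equals total_mass within tol.

    No cleavage rule is applied, so any substring of the database is a
    candidate (non-specific digestion). Each hit carries the number of its
    proper prefixes whose mass matches a spectrum node.
    """
    nodes = sorted(prefix_nodes)
    hits = []

    def walk(node, peptide, mass, matched):
        for aa, child in node.items():
            m = mass + RESIDUE_MASS[aa]
            if m > total_mass + tol:
                continue  # too heavy: prune this branch
            if abs(m - total_mass) <= tol:
                if matched >= min_matched:
                    hits.append((peptide + aa, matched))
                continue  # any extension would exceed the target mass
            walk(child, peptide + aa, m, matched + has_node_near(nodes, m, tol))

    walk(trie, "", 0.0, 0)
    # Best-supported candidates first, ties broken alphabetically.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


trie = build_suffix_trie(["GASPVTK", "TKGAS"], max_len=4)
# Hypothetical spectrum: prefix masses for S and SP, total residue mass of SPV.
print(search(trie, [87.03203, 184.08479], 283.1532))
```

Because the trie shares common prefixes across all suffixes, a prefix that is too heavy is pruned once rather than once per occurrence in the database, which is the kind of saving the suffix tree is meant to provide.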

Year:  2003        PMID: 14534180     DOI: 10.1093/bioinformatics/btg1068

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Improving gene annotation using peptide mass spectrometry.

Authors:  Stephen Tanner; Zhouxin Shen; Julio Ng; Liliana Florea; Roderic Guigó; Steven P Briggs; Vineet Bafna
Journal:  Genome Res       Date:  2006-12-22       Impact factor: 9.043

2.  Spectral dictionaries: Integrating de novo peptide sequencing with database search of tandem mass spectra.

Authors:  Sangtae Kim; Nitin Gupta; Nuno Bandeira; Pavel A Pevzner
Journal:  Mol Cell Proteomics       Date:  2008-08-14       Impact factor: 5.911

3.  A novel approach for untargeted post-translational modification identification using integer linear optimization and tandem mass spectrometry.

Authors:  Richard C Baliban; Peter A DiMaggio; Mariana D Plazas-Mayorca; Nicolas L Young; Benjamin A Garcia; Christodoulos A Floudas
Journal:  Mol Cell Proteomics       Date:  2010-01-26       Impact factor: 5.911

4.  ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity.

Authors:  T Xu; S K Park; J D Venable; J A Wohlschlegel; J K Diedrich; D Cociorva; B Lu; L Liao; J Hewel; X Han; C C L Wong; B Fonslow; C Delahunty; Y Gao; H Shah; J R Yates
Journal:  J Proteomics       Date:  2015-07-11       Impact factor: 4.044

5.  PILOT_PROTEIN: identification of unmodified and modified proteins via high-resolution mass spectrometry and mixed-integer linear optimization.

Authors:  Richard C Baliban; Peter A DiMaggio; Mariana D Plazas-Mayorca; Benjamin A Garcia; Christodoulos A Floudas
Journal:  J Proteome Res       Date:  2012-07-26       Impact factor: 4.466

6.  Speeding up tandem mass spectrometry-based database searching by longest common prefix.

Authors:  Chen Zhou; Hao Chi; Le-Heng Wang; You Li; Yan-Jie Wu; Yan Fu; Rui-Xiang Sun; Si-Min He
Journal:  BMC Bioinformatics       Date:  2010-11-25       Impact factor: 3.169
