
Incorporating sequence information into the scoring function: a hidden Markov model for improved peptide identification.

Jainab Khatun, Eric Hamlett, Morgan C Giddings.

Abstract

MOTIVATION: The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum.

MODEL: Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence, then a correction factor is added to account for the propensity of the model to favor longer peptides. An expectation value is calculated based on the model score to assess the significance of each peptide/spectrum match.
RESULTS: We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identification results at a 1% false positive rate than the best of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types.

AVAILABILITY: The program is freely available on ProteomeCommons via an OpenSource license. See http://bioinfo.unc.edu/downloads/ for the download link.
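
To illustrate the Viterbi step named in the MODEL paragraph, here is a minimal, generic sketch in Python. It is not the HMM_Score implementation. The two states ("b" and "y" ion types), the "low"/"high" intensity observations and every probability below are invented for illustration, and the sketch leaves out the peptide-length correction and the expectation value, whose exact form the abstract does not give.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for one observation sequence."""
    # best[t][s]: highest log joint probability of any state path ending in s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state to recover the full assignment.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

# Hypothetical toy model: all numbers are assumptions, not trained parameters.
states = ["b", "y"]
log_start = {"b": math.log(0.5), "y": math.log(0.5)}
log_trans = {p: {s: math.log(0.5) for s in states} for p in states}
log_emit = {
    "b": {"low": math.log(0.8), "high": math.log(0.2)},
    "y": {"low": math.log(0.3), "high": math.log(0.7)},
}
peaks = ["low", "high", "high"]  # discretized peak intensities (assumed encoding)

score, path = viterbi(peaks, states, log_start, log_trans, log_emit)
print(path)  # ['b', 'y', 'y']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. The returned score is the log of the maximum joint probability for the toy sequence.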


Year:  2008        PMID: 18187442      PMCID: PMC2699941          DOI: 10.1093/bioinformatics/btn011

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


