Incorporating sequence information into the scoring function: a hidden Markov model for improved peptide identification.

Jainab Khatun, Eric Hamlett, Morgan C Giddings.

Abstract

MOTIVATION: The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum.

MODEL: Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence, then a correction factor is added to account for the propensity of the model to favor longer peptides. An expectation value is calculated based on the model score to assess the significance of each peptide/spectrum match.

RESULTS: We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identification results at a 1% false positive rate than the best of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types.

AVAILABILITY: The program is freely available on ProteomeCommons via an OpenSource license. See http://bioinfo.unc.edu/downloads/ for the download link.
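The Viterbi step described under MODEL can be illustrated with a minimal sketch. This is not the HMM_Score implementation: the state names (b_ion, y_ion, noise), the three intensity bins, and every probability below are hypothetical values chosen only to make the example run. The sketch also omits the amino-acid and mass-dependent emission factors, the length correction, and the expectation value that the abstract mentions.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most probable path.

    Generic log-space Viterbi for a discrete HMM. All probabilities are
    assumed to be strictly positive, because math.log(0) raises ValueError.
    """
    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

# Hypothetical toy model: ion types as states, binned peak intensity as symbols.
states = ["b_ion", "y_ion", "noise"]
start_p = {"b_ion": 0.3, "y_ion": 0.4, "noise": 0.3}
trans_p = {
    "b_ion": {"b_ion": 0.5, "y_ion": 0.3, "noise": 0.2},
    "y_ion": {"b_ion": 0.3, "y_ion": 0.5, "noise": 0.2},
    "noise": {"b_ion": 0.3, "y_ion": 0.3, "noise": 0.4},
}
emit_p = {
    "b_ion": {"low": 0.3, "medium": 0.5, "high": 0.2},
    "y_ion": {"low": 0.1, "medium": 0.3, "high": 0.6},
    "noise": {"low": 0.8, "medium": 0.15, "high": 0.05},
}
peaks = ["high", "low", "medium", "high"]  # made-up spectrum, one symbol per peak

log_prob, best_path = viterbi(peaks, states, start_p, trans_p, emit_p)
print(best_path, log_prob)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters for spectra with many peaks.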

Substances: Peptides

Year: 2008 PMID: 18187442 PMCID: PMC2699941 DOI: 10.1093/bioinformatics/btn011

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

5 in total

1. Context-sensitive markov models for peptide scoring and identification from tandem mass spectrometry.

Authors: Himanshu Grover; Garrick Wallstrom; Christine C Wu; Vanathi Gopalakrishnan
Journal: OMICS Date: 2013-01-05

2. A user's guide to the encyclopedia of DNA elements (ENCODE).

Authors: ENCODE Project Consortium
Journal: PLoS Biol Date: 2011-04-19 Impact factor: 8.029

3. ENCODE whole-genome data in the UCSC genome browser (2011 update).

Authors: Brian J Raney; Melissa S Cline; Kate R Rosenbloom; Timothy R Dreszer; Katrina Learned; Galt P Barber; Laurence R Meyer; Cricket A Sloan; Venkat S Malladi; Krishna M Roskin; Bernard B Suh; Angie S Hinrichs; Hiram Clawson; Ann S Zweig; Vanessa Kirkup; Pauline A Fujita; Brooke Rhead; Kayla E Smith; Andy Pohl; Robert M Kuhn; Donna Karolchik; David Haussler; W James Kent
Journal: Nucleic Acids Res Date: 2010-10-30 Impact factor: 16.971

4. Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence.

Authors: H B Rao; F Zhu; G B Yang; Z R Li; Y Z Chen
Journal: Nucleic Acids Res Date: 2011-05-23 Impact factor: 16.971

5. Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions.

Authors: Jainab Khatun; Yanbao Yu; John A Wrobel; Brian A Risk; Harsha P Gunawardena; Ashley Secrest; Wendy J Spitzer; Ling Xie; Li Wang; Xian Chen; Morgan C Giddings
Journal: BMC Genomics Date: 2013-02-28 Impact factor: 3.969
