An Efficient Algorithm for Clustering of Large-Scale Mass Spectrometry Data.

Fahad Saeed, Trairak Pisitkun, Mark A Knepper, Jason D Hoffert.

Abstract

High-throughput spectrometers can produce data sets containing thousands of spectra for a single biological sample. These data sets contain substantial redundancy, because the same peptide may be selected multiple times in an LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra), for clustering mass spectrometry data that increases both the sensitivity and confidence of spectral assignment. CAMS uses a novel metric, called F-set, that allows accurate identification of similar spectra. A graph-theoretic framework is defined that allows the F-set metric to be used efficiently for accurate cluster identification. The accuracy of the algorithm is tested on real HCD and CID data sets with varying amounts of peptides. Our experiments show that the proposed algorithm clusters spectra with very high accuracy in a reasonable amount of time for large spectral data sets. Thus, the algorithm decreases computational time by compressing the data sets, while increasing the throughput of the data by allowing low signal-to-noise (S/N) spectra to be interpreted.
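
The abstract does not define the F-set metric or the details of the graph framework, so the following is only a rough sketch of the general idea of graph-based spectral clustering, not the CAMS algorithm itself. It assumes a stand-in similarity (Jaccard overlap of the m/z bins of the most intense peaks) and treats clusters as connected components of the resulting similarity graph. All function names, parameters, and thresholds here are hypothetical.

```python
from itertools import combinations


def top_peak_bins(peaks, k=5, bin_width=1.0):
    """Return the set of m/z bins holding the k most intense peaks.

    peaks is a list of (mz, intensity) tuples.
    """
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    return {int(mz // bin_width) for mz, _ in strongest}


def shared_peak_similarity(a, b):
    """Jaccard overlap of two bin sets (a stand-in, NOT the paper's F-set)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_spectra(spectra, threshold=0.6, k=5, bin_width=1.0):
    """Group spectra into connected components of the similarity graph.

    Two spectra share an edge when their similarity reaches the threshold.
    Returns a sorted list of clusters, each a list of spectrum indices.
    """
    bins = [top_peak_bins(s, k, bin_width) for s in spectra]
    parent = list(range(len(spectra)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: all pairs are compared; a real tool would restrict
    # comparisons (for example, by precursor mass) to stay efficient.
    for i, j in combinations(range(len(spectra)), 2):
        if shared_peak_similarity(bins[i], bins[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data: the first two spectra share their peak bins, the third does not.
example = [
    [(100.1, 50.0), (200.2, 80.0), (300.3, 30.0)],
    [(100.4, 45.0), (200.6, 90.0), (300.1, 25.0)],
    [(150.0, 70.0), (250.0, 60.0), (350.0, 10.0)],
]
print(cluster_spectra(example))
```

Each resulting cluster could then be replaced by a single representative spectrum, which is the kind of data-set compression the abstract refers to. How CAMS builds its representatives is not stated here.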

Keywords:  Clustering; Efficient Algorithms; Graph Theory; Mass spectrometry

Year:  2012        PMID: 23471471      PMCID: PMC3588597          DOI: 10.1109/BIBM.2012.6392738

Source DB:  PubMed          Journal:  Proceedings (IEEE Int Conf Bioinformatics Biomed)        ISSN: 2156-1125


  12 in total

1.  Similarity among tandem mass spectra from proteomic experiments: detection, significance, and utility.

Authors:  David L Tabb; Michael J MacCoss; Christine C Wu; Scott D Anderson; John R Yates
Journal:  Anal Chem       Date:  2003-05-15       Impact factor: 6.986

2.  Improving large-scale proteomics by clustering of mass spectrometry data.

Authors:  Ilan Beer; Eilon Barnea; Tamar Ziv; Arie Admon
Journal:  Proteomics       Date:  2004-04       Impact factor: 3.984

3.  Dynamics of the G protein-coupled vasopressin V2 receptor signaling network revealed by quantitative phosphoproteomics.

Authors:  Jason D Hoffert; Trairak Pisitkun; Fahad Saeed; Jae H Song; Chung-Lin Chou; Mark A Knepper
Journal:  Mol Cell Proteomics       Date:  2011-11-21       Impact factor: 5.911

4.  Tracing cancer networks with phosphoproteomics.

Authors:  David B Solit; Ingo K Mellinghoff
Journal:  Nat Biotechnol       Date:  2010-10       Impact factor: 54.908

5.  MS2Grouper: group assessment and synthetic replacement of duplicate proteomic tandem mass spectra.

Authors:  David L Tabb; Melissa R Thompson; Gurusahai Khalsa-Moyers; Nathan C VerBerkmoes; W Hayes McDonald
Journal:  J Am Soc Mass Spectrom       Date:  2005-08       Impact factor: 3.109

6.  A fast coarse filtering method for peptide identification by mass spectrometry.

Authors:  Smriti R Ramakrishnan; Rui Mao; Aleksey A Nakorchevskiy; John T Prince; Willard S Willard; Weijia Xu; Edward M Marcotte; Daniel P Miranker
Journal:  Bioinformatics       Date:  2006-04-03       Impact factor: 6.937

7.  Clustering millions of tandem mass spectra.

Authors:  Ari M Frank; Nuno Bandeira; Zhouxin Shen; Stephen Tanner; Steven P Briggs; Richard D Smith; Pavel A Pevzner
Journal:  J Proteome Res       Date:  2007-12-08       Impact factor: 4.466

8.  Global proteomic profiling of phosphopeptides using electron transfer dissociation tandem mass spectrometry.

Authors:  Henrik Molina; David M Horn; Ning Tang; Suresh Mathivanan; Akhilesh Pandey
Journal:  Proc Natl Acad Sci U S A       Date:  2007-02-07       Impact factor: 11.205

9.  Glycoprotein capture and quantitative phosphoproteomics indicate coordinated regulation of cell migration upon lysophosphatidic acid stimulation.

Authors:  Nina Mäusbacher; Thiemo B Schreiber; Henrik Daub
Journal:  Mol Cell Proteomics       Date:  2010-07-16       Impact factor: 5.911

10.  PhosSA: Fast and accurate phosphorylation site assignment algorithm for mass spectrometry data.

Authors:  Fahad Saeed; Trairak Pisitkun; Jason D Hoffert; Sara Rashidian; Guanghui Wang; Marjan Gucek; Mark A Knepper
Journal:  Proteome Sci       Date:  2013-11-07       Impact factor: 2.480

  4 in total

1.  Exploiting Thread-Level and Instruction-Level Parallelism to Cluster Mass Spectrometry Data using Multicore Architectures.

Authors:  Fahad Saeed; Jason D Hoffert; Trairak Pisitkun; Mark A Knepper
Journal:  Netw Model Anal Health Inform Bioinform       Date:  2014-04

2.  CAMS-RS: Clustering Algorithm for Large-Scale Mass Spectrometry Data Using Restricted Search Space and Intelligent Random Sampling.

Authors:  Fahad Saeed; Jason D Hoffert; Mark A Knepper
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2014 Jan-Feb       Impact factor: 3.710

3.  An Efficient Dynamic Programming Algorithm for Phosphorylation Site Assignment of Large-Scale Mass Spectrometry Data.

Authors:  Fahad Saeed; Trairak Pisitkun; Jason D Hoffert; Guanghui Wang; Marjan Gucek; Mark A Knepper
Journal:  Proceedings (IEEE Int Conf Bioinformatics Biomed)       Date:  2012-10-04

4.  PhosSA: Fast and accurate phosphorylation site assignment algorithm for mass spectrometry data.

Authors:  Fahad Saeed; Trairak Pisitkun; Jason D Hoffert; Sara Rashidian; Guanghui Wang; Marjan Gucek; Mark A Knepper
Journal:  Proteome Sci       Date:  2013-11-07       Impact factor: 2.480
