
Comparison of Programmatic Approaches for Efficient Accessing to mzML Files.

Miroslaw J Gilski, Rovshan G Sadygov.

Abstract

The Human Proteome Organization (HUPO) Proteomics Standards Initiative has been tasked with developing file formats for storing the raw data (mzML) and the results of spectral processing, such as protein identification and quantification (mzIdentML), from proteomics experiments. Special data types have been designed to fully characterize complex experiments. Standardized file formats will promote visualization, validation and dissemination of data independently of vendor-specific binary storage files. Innovative programmatic solutions for robust and efficient access to these standardized formats will help the proteomics community adopt them more rapidly and widely.

In this work, we compare algorithms for accessing spectral data in the mzML file format. Because mzML is an XML format, its data structures can be parsed efficiently with XML-specific class types. These classes, however, provide only sequential access to files, whereas many algorithms for processing proteomics datasets require random access to spectral data. Here, we demonstrate the use of memory streams to convert sequential access into random access while preserving the elegant XML parsing capabilities. Benchmarks of file access times in sequential and random access modes show that random access is more time-efficient when retrieving a small number of spectra, whereas sequential access becomes more efficient when retrieving a large number of spectra. We also provide comparisons to other file access methods from academia and industry.
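The abstract describes turning sequential XML access into random access with memory streams. The following is a minimal Python sketch of that idea, not the authors' implementation (which the abstract does not show). It assumes an mzML file in which every `<spectrum>` start tag fits on one line with a double-quoted `id`, builds a byte-offset index with one sequential scan, then seeks to a requested spectrum, copies its XML into an in-memory `io.BytesIO` stream, and hands that fragment to the standard `xml.etree.ElementTree` parser. The function names `build_offset_index`, `read_spectrum` and `iter_spectra_sequential` are hypothetical. Files written as indexedmzML already carry an `<indexList>` of spectrum offsets that a production reader would use instead of rescanning.

```python
import io
import re
import xml.etree.ElementTree as ET

# Start tag of an mzML <spectrum> element, capturing its id attribute.
# Assumption (illustrative): each start tag fits on one line and the id is
# double-quoted. "<spectrumList" is not matched because "\s" must follow.
SPECTRUM_START = re.compile(rb'<spectrum\s[^>]*?\bid="([^"]*)"')
SPECTRUM_END = b"</spectrum>"


def _local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def build_offset_index(path):
    """One sequential pass: map each spectrum id to the byte offset of its start tag."""
    offsets = {}
    position = 0
    with open(path, "rb") as fh:
        for line in fh:
            match = SPECTRUM_START.search(line)
            if match:
                offsets[match.group(1).decode("utf-8")] = position + match.start()
            position += len(line)
    return offsets


def read_spectrum(path, offsets, spectrum_id):
    """Random access: seek to one spectrum, copy its XML into a memory stream, parse it."""
    buf = bytearray()
    with open(path, "rb") as fh:
        fh.seek(offsets[spectrum_id])
        for line in fh:
            # The closing tag may straddle the previous line boundary.
            search_from = max(0, len(buf) - len(SPECTRUM_END))
            buf += line
            end = buf.find(SPECTRUM_END, search_from)
            if end != -1:
                del buf[end + len(SPECTRUM_END):]  # drop bytes after </spectrum>
                break
        else:
            raise ValueError(f"no closing </spectrum> for {spectrum_id!r}")
    # The in-memory stream lets the ordinary XML parser treat the fragment as a
    # complete document. The root <mzML> namespace declaration is not part of
    # the fragment, so tags come back unqualified (e.g. "spectrum", "cvParam").
    return ET.parse(io.BytesIO(bytes(buf))).getroot()


def iter_spectra_sequential(path):
    """Sequential access: stream the whole file and yield <spectrum> elements in order."""
    for _event, elem in ET.iterparse(path, events=("end",)):
        if _local(elem.tag) == "spectrum":
            yield elem
            elem.clear()  # free the subtree (and attributes) once the caller moves on
```

For a handful of spectra, `read_spectrum` touches only the bytes it needs. To read most of a file, however, each call reopens the file, seeks and starts a fresh parser, while `iter_spectra_sequential` pays for a single pass. That per-call overhead is one way the trade-off described in the abstract can arise, although this sketch does not reproduce the authors' benchmarks. Note that `elem.clear()` also erases attributes, so callers should read what they need before advancing the generator.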


Year:  2011        PMID: 21766049      PMCID: PMC3135311          DOI: 10.4172/2153-0602.1000109

Source DB:  PubMed          Journal:  J Data Mining Genomics Proteomics
