mzDB: a file format using multiple indexing strategies for the efficient analysis of large LC-MS/MS and SWATH-MS data sets.

David Bouyssié, Marc Dubois, Sara Nasso, Anne Gonzalez de Peredo, Odile Burlet-Schiltz, Ruedi Aebersold, Bernard Monsarrat.

Abstract

The analysis and management of MS data, especially those generated by data-independent MS acquisition, exemplified by SWATH-MS, pose significant challenges for proteomics bioinformatics. These large, information-rich data sets need to be properly structured to enable efficient and straightforward extraction of the signals used to identify specific target peptides. Standard XML-based formats are not well suited to large MS data files, such as those generated by SWATH-MS, and compromise high-throughput data processing and storage. We developed mzDB, an efficient file format for large MS data sets. It relies on the SQLite software library and consists of a standardized, portable, server-less single-file database. An optimized 3D indexing approach is adopted, in which the LC-MS coordinates (retention time and m/z), along with the precursor m/z for SWATH-MS data, are used to query the database for data extraction. In comparison with XML formats, mzDB saves ∼25% of storage space and improves access times by a factor of 2 to as much as 2000, depending on the type of data access. Similarly, mzDB also shows slightly to significantly lower access times than other formats such as mz5. Both C++ and Java implementations, which convert raw or XML formats to mzDB and provide access methods, will be released under a permissive license. mzDB can be easily accessed through the SQLite C library and its drivers for all major languages, and browsed with existing dedicated GUIs. The mzDB format described here can boost existing mass spectrometry data analysis pipelines, offering unprecedented performance in terms of efficiency, portability, compactness, and flexibility.
© 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
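
The indexing idea in the abstract can be illustrated with a small sketch. It is not the mzDB schema or API: the table names (`block`, `block_index`), the peak layout, and the helper functions are hypothetical, chosen only to show how a single-file SQLite database might be queried by retention time, m/z, and precursor m/z at once. The sketch uses SQLite's R*Tree module as one possible multi-dimensional index and falls back to an ordinary table with the same columns if that module is not compiled in.

```python
import sqlite3
import struct

# Hypothetical peak layout: retention time (s), m/z, intensity as float64.
PEAK = struct.Struct("<ddd")


def create_db(path=":memory:"):
    """Create a toy single-file database with a 3D bounding-box index."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE block (id INTEGER PRIMARY KEY, peaks BLOB NOT NULL)")
    cols = "id, min_rt, max_rt, min_mz, max_mz, min_prec, max_prec"
    try:
        # R*Tree virtual table: one id column plus a min/max pair per dimension.
        conn.execute(f"CREATE VIRTUAL TABLE block_index USING rtree({cols})")
    except sqlite3.OperationalError:
        # Assumption: R*Tree support may be missing; a plain table accepts the same SQL.
        conn.execute(f"CREATE TABLE block_index ({cols})")
    return conn


def add_block(conn, peaks, prec_window):
    """Store a group of (rt, mz, intensity) peaks and index its bounding box."""
    rts = [p[0] for p in peaks]
    mzs = [p[1] for p in peaks]
    blob = b"".join(PEAK.pack(*p) for p in peaks)
    cur = conn.execute("INSERT INTO block (peaks) VALUES (?)", (blob,))
    conn.execute(
        "INSERT INTO block_index VALUES (?, ?, ?, ?, ?, ?, ?)",
        (cur.lastrowid, min(rts), max(rts), min(mzs), max(mzs), *prec_window),
    )


def extract_peaks(conn, rt_range, mz_range, prec_mz):
    """Return peaks inside the RT and m/z ranges for the given precursor m/z."""
    rows = conn.execute(
        """SELECT b.peaks
           FROM block_index AS i JOIN block AS b ON b.id = i.id
           WHERE i.max_rt >= ? AND i.min_rt <= ?
             AND i.max_mz >= ? AND i.min_mz <= ?
             AND i.min_prec <= ? AND i.max_prec >= ?""",
        (rt_range[0], rt_range[1], mz_range[0], mz_range[1], prec_mz, prec_mz),
    )
    hits = []
    for (blob,) in rows:
        # The index only selects candidate blocks; filter individual peaks exactly.
        for rt, mz, intensity in PEAK.iter_unpack(blob):
            if rt_range[0] <= rt <= rt_range[1] and mz_range[0] <= mz <= mz_range[1]:
                hits.append((rt, mz, intensity))
    return sorted(hits)


# Toy data: two SWATH-like precursor isolation windows (values are made up).
conn = create_db()
add_block(conn, [(10.0, 500.25, 100.0), (12.0, 500.26, 250.0), (12.0, 650.5, 40.0)], (400.0, 425.0))
add_block(conn, [(11.0, 500.25, 999.0)], (425.0, 450.0))
add_block(conn, [(300.0, 500.25, 5.0)], (400.0, 425.0))

for peak in extract_peaks(conn, (9.0, 13.0), (500.2, 500.3), 410.0):
    print(peak)
```

The bounding-box index narrows the search to a few candidate blocks, and only those blobs are decoded and filtered peak by peak. Whether mzDB groups and encodes peaks this way is not stated in the abstract; the grouping here is purely illustrative.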

Year:  2014        PMID: 25505153      PMCID: PMC4349994          DOI: 10.1074/mcp.O114.039115

Source DB:  PubMed          Journal:  Mol Cell Proteomics        ISSN: 1535-9476            Impact factor:   5.911


References (38 in total, first 10 shown)

1.  Automated statistical analysis of protein abundance ratios from data generated by stable-isotope dilution and tandem mass spectrometry.

Authors:  Xiao-Jun Li; Hui Zhang; Jeffrey A Ranish; Ruedi Aebersold
Journal:  Anal Chem       Date:  2003-12-01       Impact factor: 6.986

Review 2.  What is mzXML good for?

Authors:  Simon M Lin; Lihua Zhu; Andrew Q Winter; Maciek Sasinowski; Warren A Kibbe
Journal:  Expert Rev Proteomics       Date:  2005-12       Impact factor: 3.940

3.  A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS.

Authors:  Matthew Bellew; Marc Coram; Matthew Fitzgibbon; Mark Igra; Tim Randolph; Pei Wang; Damon May; Jimmy Eng; Ruihua Fang; Chenwei Lin; Jinzhi Chen; David Goodlett; Jeffrey Whiteaker; Amanda Paulovich; Martin McIntosh
Journal:  Bioinformatics       Date:  2006-06-09       Impact factor: 6.937

4.  PEPPeR, a platform for experimental proteomic pattern recognition.

Authors:  Jacob D Jaffe; D R Mani; Kyriacos C Leptos; George M Church; Michael A Gillette; Steven A Carr
Journal:  Mol Cell Proteomics       Date:  2006-07-19       Impact factor: 5.911

5.  XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification.

Authors:  Colin A Smith; Elizabeth J Want; Grace O'Maille; Ruben Abagyan; Gary Siuzdak
Journal:  Anal Chem       Date:  2006-02-01       Impact factor: 6.986

6.  TOPP--the OpenMS proteomics pipeline.

Authors:  Oliver Kohlbacher; Knut Reinert; Clemens Gröpl; Eva Lange; Nico Pfeifer; Ole Schulz-Trieglaff; Marc Sturm
Journal:  Bioinformatics       Date:  2007-01-15       Impact factor: 6.937

Review 7.  Quantitative mass spectrometry in proteomics: a critical review.

Authors:  Marcus Bantscheff; Markus Schirle; Gavain Sweetman; Jens Rick; Bernhard Kuster
Journal:  Anal Bioanal Chem       Date:  2007-08-01       Impact factor: 4.142

8.  Mascot file parsing and quantification (MFPaQ), a new software to parse, validate, and quantify proteomics data generated by ICAT and SILAC mass spectrometric analyses: application to the proteomics study of membrane proteins from primary human endothelial cells.

Authors:  David Bouyssié; Anne Gonzalez de Peredo; Emmanuelle Mouton; Renaud Albigot; Lucie Roussel; Nathalie Ortega; Corinne Cayrol; Odile Burlet-Schiltz; Jean-Philippe Girard; Bernard Monsarrat
Journal:  Mol Cell Proteomics       Date:  2007-05-28       Impact factor: 5.911

9.  Processing methods for differential analysis of LC/MS profile data.

Authors:  Mikko Katajamaa; Matej Oresic
Journal:  BMC Bioinformatics       Date:  2005-07-18       Impact factor: 3.169

10.  A common open representation of mass spectrometry data and its application to proteomics research.

Authors:  Patrick G A Pedrioli; Jimmy K Eng; Robert Hubley; Mathijs Vogelzang; Eric W Deutsch; Brian Raught; Brian Pratt; Erik Nilsson; Ruth H Angeletti; Rolf Apweiler; Kei Cheung; Catherine E Costello; Henning Hermjakob; Sequin Huang; Randall K Julian; Eugene Kapp; Mark E McComb; Stephen G Oliver; Gilbert Omenn; Norman W Paton; Richard Simpson; Richard Smith; Chris F Taylor; Weimin Zhu; Ruedi Aebersold
Journal:  Nat Biotechnol       Date:  2004-11       Impact factor: 54.908

Cited by (8 in total)

1.  StackZDPD: a novel encoding scheme for mass spectrometry data optimized for speed and compression ratio.

Authors:  Jinyin Wang; Miaoshan Lu; Ruimin Wang; Shaowei An; Cong Xie; Changbin Yu
Journal:  Sci Rep       Date:  2022-03-30       Impact factor: 4.996

2.  A Web Service Framework for Interactive Analysis of Metabolomics Data.

Authors:  Yaroslav Lyutvinskiy; Jeramie D Watrous; Mohit Jain; Roland Nilsson
Journal:  Anal Chem       Date:  2017-05-17       Impact factor: 6.986

Review 3.  Expanding the Use of Spectral Libraries in Proteomics.

Authors:  Eric W Deutsch; Yasset Perez-Riverol; Robert J Chalkley; Mathias Wilhelm; Stephen Tate; Timo Sachsenberg; Mathias Walzer; Lukas Käll; Bernard Delanghe; Sebastian Böcker; Emma L Schymanski; Paul Wilmes; Viktoria Dorfer; Bernhard Kuster; Pieter-Jan Volders; Nico Jehmlich; Johannes P C Vissers; Dennis W Wolan; Ana Y Wang; Luis Mendoza; Jim Shofstahl; Andrew W Dowsey; Johannes Griss; Reza M Salek; Steffen Neumann; Pierre-Alain Binz; Henry Lam; Juan Antonio Vizcaíno; Nuno Bandeira; Hannes Röst
Journal:  J Proteome Res       Date:  2018-10-11       Impact factor: 4.466

4.  Fast, axis-agnostic, dynamically summarized storage and retrieval for mass spectrometry data.

Authors:  Kyle Handy; Jebediah Rosen; André Gillan; Rob Smith
Journal:  PLoS One       Date:  2017-11-15       Impact factor: 3.240

5.  Isoginkgetin derivative IP2 enhances the adaptive immune response against tumor antigens.

Authors:  Romain Darrigrand; Alison Pierson; Marine Rouillon; Dolor Renko; Mathilde Boulpicante; David Bouyssié; Emmanuelle Mouton-Barbosa; Julien Marcoux; Camille Garcia; Michael Ghosh; Mouad Alami; Sébastien Apcher
Journal:  Commun Biol       Date:  2021-03-01

6.  Aird: a computation-oriented mass spectrometry data format enables a higher compression ratio and less decoding time.

Authors:  Miaoshan Lu; Shaowei An; Ruimin Wang; Jinyin Wang; Changbin Yu
Journal:  BMC Bioinformatics       Date:  2022-01-12       Impact factor: 3.169

7.  mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements.

Authors:  Ranjeet S Bhamber; Andris Jankevics; Eric W Deutsch; Andrew R Jones; Andrew W Dowsey
Journal:  J Proteome Res       Date:  2020-10-29       Impact factor: 4.466

8.  Nuclear HMGB1 protects from nonalcoholic fatty liver disease through negative regulation of liver X receptor.

Authors:  Jean Personnaz; Enzo Piccolo; Alizée Dortignac; Jason S Iacovoni; Jérôme Mariette; Vincent Rocher; Arnaud Polizzi; Aurélie Batut; Simon Deleruyelle; Lucas Bourdens; Océane Delos; Lucie Combes-Soia; Romain Paccoud; Elsa Moreau; Frédéric Martins; Thomas Clouaire; Fadila Benhamed; Alexandra Montagner; Walter Wahli; Robert F Schwabe; Armelle Yart; Isabelle Castan-Laurell; Justine Bertrand-Michel; Odile Burlet-Schiltz; Catherine Postic; Pierre-Damien Denechaud; Cédric Moro; Gaelle Legube; Chih-Hao Lee; Hervé Guillou; Philippe Valet; Cédric Dray; Jean-Philippe Pradère
Journal:  Sci Adv       Date:  2022-03-25       Impact factor: 14.136

