Toffee - a highly efficient, lossless file format for DIA-MS.

Brett Tully

Abstract

The closed nature of vendor file formats in mass spectrometry is a significant barrier to progress in developing robust bioinformatics software. In response, the community has developed the open mzML format, implemented in XML and based on controlled vocabularies. Widely adopted, mzML is an important step forward; however, it suffers from two challenges that are particularly apparent as the field moves to high-throughput proteomics: a large increase in file size, and a largely sequential I/O access pattern. Described here is 'toffee', an open, random I/O format backed by HDF5, with lossless compression that gives file sizes similar to the original vendor format and that can be converted back to mzML without penalty. It is shown that mzML and toffee are equivalent when processing data using OpenSWATH algorithms, and that the new data access patterns enable novel applications. For instance, a peptide-centric deep-learning pipeline for peptide identification is proposed. Documentation and examples are available at https://toffee.readthedocs.io, and all code is MIT licensed at https://bitbucket.org/cmriprocan/toffee.
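The contrast the abstract draws between sequential and random I/O can be illustrated with a minimal sketch of chunked lossless compression. This is not toffee's file layout and it does not use the HDF5 API; it relies only on the Python standard library, and the names `write_chunks`, `read_chunk`, and `read_value`, the chunk size, and the synthetic intensity values are all hypothetical. The idea shown is that when each chunk is compressed independently and its byte offset is recorded, a single value can be read by decompressing one chunk rather than scanning the whole file.

```python
import struct
import zlib


def write_chunks(intensities, chunk_size):
    """Compress fixed-size chunks independently; return (blob, index)."""
    blob = bytearray()
    index = []  # one (byte offset, compressed length, value count) per chunk
    for start in range(0, len(intensities), chunk_size):
        chunk = intensities[start:start + chunk_size]
        # Pack as little-endian unsigned 32-bit integers (assumed value range).
        raw = struct.pack(f"<{len(chunk)}I", *chunk)
        packed = zlib.compress(raw)  # lossless compression
        index.append((len(blob), len(packed), len(chunk)))
        blob.extend(packed)
    return bytes(blob), index


def read_chunk(blob, index, chunk_number):
    """Decompress a single chunk using its recorded offset and length."""
    offset, length, count = index[chunk_number]
    raw = zlib.decompress(blob[offset:offset + length])
    return list(struct.unpack(f"<{count}I", raw))


def read_value(blob, index, chunk_size, position):
    """Random access: only the chunk containing `position` is decompressed."""
    chunk = read_chunk(blob, index, position // chunk_size)
    return chunk[position % chunk_size]


# Synthetic data standing in for integer intensities (illustrative only).
intensities = [(i * 37) % 1000 for i in range(10_000)]
blob, index = write_chunks(intensities, chunk_size=1024)
```

Calling `read_value(blob, index, 1024, 5000)` touches one compressed chunk and returns the same value as `intensities[5000]`, whereas a purely sequential format would have to be parsed from the start to reach the same position.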

Year:  2020        PMID: 32488104      PMCID: PMC7265431          DOI: 10.1038/s41598-020-65015-y

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


