Manor Askenazi, Hisham Ben Hamidane, Johannes Graumann.
Abstract
The evolution of data exchange in Mass Spectrometry spans decades and has ranged from human-readable text files representing individual scans or collections thereof (McDonald et al., 2004), through the official XML-based (Harold, Means, & Udemadu, 2005) data interchange standard (Deutsch, 2012), to increasingly compressed (Teleman et al., 2014) variants of this standard, some of which require purely binary adjunct files (Römpp et al., 2011). While the desire to maintain even partial human readability is understandable, the inherent mismatch between XML's textual and irregular format and the numeric, highly regular nature of actual spectral data, together with the explosive growth in dataset scales and the resulting need for efficient (binary and indexed) access, has led to a phenomenon referred to as "technical drift" (Davis, 2013). While the drift is being continuously corrected using adjunct formats, compression schemes, and programs (Röst et al., 2015), we propose that the future of Mass Spectrometry Exchange Formats lies in continued reliance on, and development of, the PSI-MS (Mayer et al., 2014) controlled vocabulary, along with an expedited shift to an alternative, thriving, and well-supported ecosystem for scientific data exchange, storage, and access in binary form, namely that of HDF5 (Koranne, 2011). Indeed, pioneering efforts to leverage this universal, binary, and hierarchical data format have already been published (Wilhelm et al., 2012; Rübel et al., 2013), though they have under-utilized self-description, a key property shared by HDF5 and XML. We demonstrate that a straightforward usage of plain ("vanilla") HDF5 yields immediate returns including, but not limited to, highly efficient data access, platform-independent data viewers, a variety of libraries (Collette, 2014) for data retrieval and manipulation in many programming languages, and remote data access through comprehensive RESTful data servers.
Keywords: HDF5; data exchange formats; mass spectrometry
Year: 2016 PMID: 27741559 PMCID: PMC6088231 DOI: 10.1002/mas.21522
Source DB: PubMed Journal: Mass Spectrom Rev ISSN: 0277-7037 Impact factor: 10.946
Figure 1. HDFView enables cross-platform access to a Shaduf-generated HDF5 file.
Figure 2. Accessing the same HDF5 file using both the Python and R scripting languages to produce a base peak chromatogram and an individual spectrum, respectively. The spectrum in question is an MS/MS spectrum of a doubly charged precursor ion at m/z 858.92, which corresponds to the peptide DALSSVQESQVAQQAR from bovine Apolipoprotein C-III.
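The precursor m/z quoted in the Figure 2 caption can be sanity-checked from the peptide sequence alone. The following is a minimal sketch and not part of the original work: it assumes an unmodified peptide, uses commonly tabulated monoisotopic residue masses, and the function name `precursor_mz` is purely illustrative.

```python
# Monoisotopic residue masses (Da) for the amino acids occurring in the peptide.
# Assumption: commonly tabulated values, no post-translational modifications.
RESIDUE_MASS = {
    "A": 71.03711,
    "D": 115.02694,
    "E": 129.04259,
    "L": 113.08406,
    "Q": 128.05858,
    "R": 156.10111,
    "S": 87.03203,
    "V": 99.06841,
}
WATER = 18.010565   # mass of H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of one charge-carrying proton


def precursor_mz(sequence: str, charge: int) -> float:
    """Return the monoisotopic m/z of a peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


mz = precursor_mz("DALSSVQESQVAQQAR", 2)
print(f"{mz:.2f}")
```

Under these assumptions the computed value for the doubly charged ion lies within about 0.01 of the 858.92 reported in the caption, which is consistent with the stated peptide assignment.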
Figure 3. The Lorikeet spectral viewer connected to HDF5-JSON data provided by h5serv. The MS/MS spectrum being visualized is identical to the one accessed through an R script in the previous figure.