John T Prince, Edward M Marcotte.
Abstract
Mass spectrometry-based proteomics stands to gain from additional analysis of its data, but its large, complex datasets place demands on speed and memory usage that require special consideration in scripting languages. The software library 'mspire', developed in the Ruby programming language, offers quick and memory-efficient readers for standard XML proteomics formats, converters for intermediate file types in typical proteomics spectral-identification workflows (including the Bioworks .srf format), and modules for the calculation of peptide false identification rates. AVAILABILITY: Freely available at http://mspire.rubyforge.org. Additional data models, usage information, and methods are available at http://bioinformatics.icmb.utexas.edu/mspire.
Year: 2008 PMID: 18930952 PMCID: PMC2639276 DOI: 10.1093/bioinformatics/btn513
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) Overview of mspire functionality. Black arrows and gray boxes depict mspire functionality. From left to right: mspire creates randomized databases (DBs) for false identification rate (FIR) determination. MS::MSRun is a unified model for working with LC-MS/MS datasets. The Bioworks search engine produces peptide spectral matches (PSMs) in a binary .srf file or in XML format. mspire extracts PSMs and presents them via a simple interface, SpecID, while preserving access to the underlying data structures. FIRs can be determined with various downstream software tools and reread into SpecID objects. SBV, sample bias validation. (B) mspire uses Arrayclass objects for efficient memory usage. GC, garbage collection; AC, Arrayclass; AF, Arrayfields; class, a traditional Ruby object; SStruct, SuperStruct. (C) Lazy evaluation of spectra allows very large files to be read quickly. Shown are the times to read all 7830 well-formed mzXML files from PeptideAtlas and access two spectra using the ‘io’ and ‘string’ lazy evaluation methods. A total of 181 files >350 MB in size were not read with the ‘string’ option. (D) Object model for capturing MS runs. (E) Example code, described by line number. 3: an MSRun object can be instantiated with several lazy evaluation schemes. 4: typical instantiation. 6–8: the total number of scans, the number of MS scans, and the number of MS/MS scans. 9: retrieves the start and end m/z values for all MS/MS scans. 11: a Ruby block that selects only MS/MS scans. 13–16: the scans are mapped to intensities; the block (delimited by ‘do’ and ‘end’) receives the scan object and returns the value of its last line, and the returned values are collected into an array (list_of_intensities). 14–15: chained method calls (equivalent to calling prc.intensity).
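The block-based selection and mapping described for panel E can be sketched in plain Ruby. The figure's code is not reproduced in this text, so the sketch below uses hypothetical stand-ins (Scan and Precursor structs with assumed field names such as ms_level, start_mz, and end_mz) rather than the actual mspire classes or method names. Only list_of_intensities and prc.intensity are taken from the caption.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT the mspire classes.
Precursor = Struct.new(:mz, :intensity)
Scan = Struct.new(:num, :ms_level, :start_mz, :end_mz, :precursor)

# A tiny made-up run: one MS scan followed by two MS/MS scans.
scans = [
  Scan.new(1, 1, 300.0, 1500.0, nil),
  Scan.new(2, 2, 100.0, 900.0, Precursor.new(450.2, 1.2e5)),
  Scan.new(3, 2, 120.0, 1100.0, Precursor.new(560.7, 8.0e4)),
]

# Counts: all scans, MS scans, and MS/MS scans.
total_scans = scans.size
ms_scans    = scans.count { |scan| scan.ms_level == 1 }
msms_scans  = scans.count { |scan| scan.ms_level == 2 }

# A block that selects only MS/MS scans.
msms = scans.select { |scan| scan.ms_level == 2 }

# Start and end m/z values for all MS/MS scans.
mz_ranges = msms.map { |scan| [scan.start_mz, scan.end_mz] }

# Map scans to intensities: the do...end block receives each scan, and the
# value of its last line is collected into the resulting array.
list_of_intensities = msms.map do |scan|
  prc = scan.precursor
  prc.intensity
end

# The same result written as a single chained method call.
chained = msms.map { |scan| scan.precursor.intensity }
```

Because map collects whatever the block's last expression evaluates to, the two-step form and the chained form produce the same array here.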