MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing.

Muaaz Gul Awan, Fahad Saeed.

Abstract

MOTIVATION: Modern proteomics studies utilize high-throughput mass spectrometers, which can produce data at an astonishing rate. These big mass spectrometry (MS) datasets can easily reach peta-scale, creating storage and analytic problems for large-scale systems biology studies. Each spectrum consists of thousands of peaks, which have to be processed to deduce the peptide. However, only a small percentage of the peaks in a spectrum are useful for peptide deduction; most are either noise or otherwise uninformative for that spectrum. This redundant processing of non-useful peaks is a bottleneck for streaming high-throughput processing of big MS data. One way to reduce the amount of computation required in a high-throughput environment is to eliminate non-useful peaks. Existing noise-removal algorithms are limited in their data-reduction capability and are compute-intensive, making them unsuitable for big data and high-throughput environments. In this paper we introduce a novel low-complexity technique based on classification, quantization and sampling of MS peaks.
RESULTS: We present a novel data-reductive strategy for analysis of big MS data. Our algorithm, called MS-REDUCE, is capable of eliminating noisy peaks, as well as peaks that do not contribute to peptide deduction, before any peptide deduction is attempted. Our experiments have shown up to 100× speed-up over existing state-of-the-art noise elimination algorithms while maintaining comparably high-quality matches. Using our approach, we were able to process a million spectra in just under an hour on a moderate server.
AVAILABILITY AND IMPLEMENTATION: The developed tool and strategy have been made available to the wider proteomics and parallel computing community, and the code can be found at https://github.com/pcdslab/MSREDUCE
CONTACT: fahad.saeed@wmich.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
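
The abstract names three stages (classification, quantization and sampling of peaks) without giving their details. The sketch below is only a minimal illustration of how such a pipeline could be arranged, not the MS-REDUCE implementation: the function names, the spread-based classification rule, the number of quantization levels and the `keep_fraction` parameter are all assumptions made for this example.

```python
import random
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def classify_spectrum(intensities: List[float]) -> int:
    """Assumed rule: choose a number of quantization levels from the
    relative intensity spread of the spectrum (hypothetical thresholds)."""
    hi, lo = max(intensities), min(intensities)
    if hi == lo:
        return 1  # flat spectrum: a single level is enough
    return 2 if (hi - lo) / hi < 0.5 else 4


def reduce_spectrum(peaks: List[Peak], keep_fraction: float = 0.3,
                    seed: int = 0) -> List[Peak]:
    """Keep roughly keep_fraction of the peaks, favouring intense ones."""
    if not peaks:
        return []
    rng = random.Random(seed)
    intensities = [inten for _, inten in peaks]
    lo, hi = min(intensities), max(intensities)
    spread = hi - lo

    # Stage 1: classification (assumed to select the number of levels).
    levels = classify_spectrum(intensities)

    # Stage 2: quantization into equal-width intensity bins.
    bins: List[List[Peak]] = [[] for _ in range(levels)]
    for mz, inten in peaks:
        idx = 0 if spread == 0 else min(int((inten - lo) / spread * levels),
                                        levels - 1)
        bins[idx].append((mz, inten))

    # Stage 3: sampling. Fill the peak budget from the most intense bin
    # downward; the bin that would overflow the budget is sampled randomly.
    budget = max(1, round(keep_fraction * len(peaks)))
    kept: List[Peak] = []
    for level in reversed(bins):
        if budget <= 0:
            break
        if len(level) <= budget:
            kept.extend(level)
            budget -= len(level)
        else:
            kept.extend(rng.sample(level, budget))
            budget = 0
    return sorted(kept)  # return peaks in m/z order
```

Because every stage is a single pass over the peak list, a reduction of this shape stays cheap per spectrum, which is the property the abstract emphasizes for high-throughput use.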

Year:  2016        PMID: 26801958     DOI: 10.1093/bioinformatics/btw023

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Bolt: a New Age Peptide Search Engine for Comprehensive MS/MS Sequencing Through Vast Protein Databases in Minutes.

Authors:  Amol Prakash; Shadab Ahmad; Swetaketu Majumder; Conor Jenkins; Ben Orsburn
Journal:  J Am Soc Mass Spectrom       Date:  2019-08-26       Impact factor: 3.109

2.  MaSS-Simulator: A Highly Configurable Simulator for Generating MS/MS Datasets for Benchmarking of Proteomics Algorithms.

Authors:  Muaaz Gul Awan; Fahad Saeed
Journal:  Proteomics       Date:  2018-09-28       Impact factor: 3.984

3.  An Out-of-Core GPU based dimensionality reduction algorithm for Big Mass Spectrometry Data and its application in bottom-up Proteomics.

Authors:  Muaaz Gul Awan; Fahad Saeed
Journal:  ACM BCB       Date:  2017-08

4.  GPU-DAEMON: GPU algorithm design, data management & optimization template for array based big omics data.

Authors:  Muaaz Gul Awan; Taban Eslami; Fahad Saeed
Journal:  Comput Biol Med       Date:  2018-08-16       Impact factor: 4.589

5.  Improved identification and quantification of peptides in mass spectrometry data via chemical and random additive noise elimination (CRANE).

Authors:  Akila J Seneviratne; Sean Peters; David Clarke; Michael Dausmann; Michael Hecker; Brett Tully; Peter G Hains; Qing Zhong
Journal:  Bioinformatics       Date:  2021-07-29       Impact factor: 6.937
