
Communication Lower-Bounds for Distributed-Memory Computations for Mass Spectrometry based Omics Data.

Fahad Saeed, Muhammad Haseeb, S S Iyengar.

Abstract

Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound reached by the existing parallel algorithms is $\Omega\left(mn + \frac{2rq}{p}\right)$, where $m$ and $n$ are the dimensions of the theoretical database matrix, $q$ and $r$ are the dimensions of the spectra, and $p$ is the number of processors. We further prove that a communication-optimal strategy with fast memory $M = mn + \frac{2qr}{p}$ can achieve $\Omega\left(\frac{2mnq}{p}\right)$, but that this bound is not achieved by any existing parallel proteomics algorithm to date. To validate our claim, we performed a meta-analysis of published parallel algorithms and their performance results. We show that sub-optimal speedups with an increasing number of processors are a direct consequence of not achieving the communication lower bounds. We further validate our claim by performing experiments that demonstrate the communication bounds proved in this paper. Consequently, we assert that a next generation of provably and demonstrably superior parallel algorithms is urgently needed for MS-based large systems-biology studies, especially for meta-proteomics, proteogenomics, microbiome, and proteomics for non-model organisms. Our hope is that this paper will excite the parallel computing community to further investigate parallel algorithms for highly influential MS-based omics problems.

Year:  2021        PMID: 34898836      PMCID: PMC8658624          DOI: 10.1016/j.jpdc.2021.11.001

Source DB:  PubMed          Journal:  J Parallel Distrib Comput        ISSN: 0743-7315            Impact factor:   3.734


