Mehrab Ghanat Bari (1), Nelson Ramirez (2), Zhiwei Wang (2), Jianqiu Michelle Zhang (1)
(1) Department of Electrical and Computer Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX, 78249, USA.
(2) Computational Biology Initiative, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX, 78249, USA.
Abstract
RATIONALE: Without accurate peak linking/alignment, the expression levels of only a small percentage of proteins can be compared across multiple samples in Liquid Chromatography/Mass Spectrometry/Tandem Mass Spectrometry (LC/MS/MS), because tandem MS identifies peptides selectively. This greatly hampers biomedical research that aims to find biomarkers for disease diagnosis and treatment and to understand disease mechanisms. A recent algorithm, PeakLink, accurately links LC/MS peaks without tandem MS identifications to their identified counterparts across multiple samples collected from different instruments, tissues and labs, which greatly enhances the ability to compare proteins. However, PeakLink cannot be implemented practically for large numbers of samples on existing software architectures, because it requires simultaneous access to peak elution profiles from multiple LC/MS/MS samples.
METHODS: We propose a new architecture based on parallel processing, which extracts LC/MS peak features and saves them in database files to enable the implementation of PeakLink for multiple samples. The software has been deployed in High-Performance Computing (HPC) environments. The core part of the software, MZDASoft Parallel Peak Extractor (PPE), can be downloaded with a user and developer guide and can be run directly at HPC centers. The quantification applications, MZDASoft TandemQuant and MZDASoft PeakLink, are written in Matlab and compiled with a Matlab runtime compiler. A sample script that incorporates all necessary processing steps of MZDASoft for LC/MS/MS quantification in a parallel processing environment is available. The project webpage is http://compgenomics.utsa.edu/zgroup/MZDASoft.
RESULTS: The proposed architecture enables the implementation of PeakLink for multiple samples. In test cases, significantly more proteins (100%-500% more) can be compared over multiple samples, with better quantification accuracy.
CONCLUSION: MZDASoft enables large-scale comparison of protein expression levels over multiple samples with much larger protein comparison coverage and better quantification accuracy. It is an efficient implementation based on parallel processing that can be used to process large amounts of data.
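The architectural idea in METHODS (extract peak features from each sample in parallel, store them in per-sample database files, then read several samples' peaks at once) can be illustrated with a minimal Python sketch. This is not MZDASoft code. The table layout, the threshold-based `extract_peaks` stand-in, and all function names are hypothetical, SQLite is assumed as the database format purely for illustration, and a thread pool stands in for the HPC worker processes.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_peaks(points, min_intensity=100.0):
    """Toy stand-in for peak extraction: keep points above a threshold.

    Each point is a (mz, retention_time, intensity) tuple (hypothetical layout).
    """
    return [p for p in points if p[2] >= min_intensity]


def process_sample(name, points, out_dir):
    """Extract one sample's peaks and save them in that sample's own database file."""
    db_path = Path(out_dir) / f"{name}.sqlite"
    con = sqlite3.connect(str(db_path))
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS peaks (mz REAL, rt REAL, intensity REAL)"
        )
        con.executemany("INSERT INTO peaks VALUES (?, ?, ?)", extract_peaks(points))
        con.commit()
    finally:
        con.close()
    return db_path


def extract_all(samples, out_dir, workers=4):
    """Process every sample concurrently; samples maps a name to its raw points."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_sample, name, pts, out_dir)
            for name, pts in samples.items()
        ]
        return [f.result() for f in futures]


def query_peaks(db_paths, mz_lo, mz_hi):
    """Read peaks in an m/z window from several sample databases at once."""
    hits = {}
    for path in db_paths:
        con = sqlite3.connect(str(path))
        try:
            rows = con.execute(
                "SELECT mz, rt, intensity FROM peaks "
                "WHERE mz BETWEEN ? AND ? ORDER BY rt",
                (mz_lo, mz_hi),
            ).fetchall()
        finally:
            con.close()
        hits[Path(path).stem] = rows
    return hits


if __name__ == "__main__":
    # Made-up toy data, only to exercise the functions above.
    demo = {
        "sample_a": [(500.25, 10.0, 50.0), (500.26, 12.0, 300.0)],
        "sample_b": [(500.27, 12.5, 250.0), (700.00, 30.0, 900.0)],
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = extract_all(demo, tmp)
        print(query_peaks(paths, 500.0, 501.0))
```

Because each sample writes only to its own file, the extraction step needs no coordination between workers, and a later linking step can open just the database files it needs rather than holding every raw sample in memory.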