Himanshu Grover, Vanathi Gopalakrishnan.
Abstract
Mass spectrometry (MS)-based proteomics has become a key enabling technology for the systems approach to biology, providing insights into the protein complement of an organism. Bioinformatics analyses play a critical role in the interpretation of large, and often replicated, MS datasets generated across laboratories and institutions. A significant share of the computational effort in the workflow is spent on identifying the protein and peptide components of complex biological samples, a task that consists of a series of steps relying on large database searches and intricate scoring algorithms. In this work, we share our efforts and experience in handling these large MS datasets efficiently through database indexing and parallelization on multiprocessor architectures. We also identify important challenges and opportunities that are relevant specifically to peptide and protein identification, and more generally to similar multi-step problems that are inherently parallelizable.
Keywords: Bioinformatics; High-throughput Proteomics; Indexing; Multiprocessing; Parallelization
Year: 2012 PMID: 25309967 PMCID: PMC4190677 DOI: 10.4108/icst.collaboratecom.2012.250716
Source DB: PubMed Journal: Int Conf Collab Comput
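
The abstract names two techniques, database indexing and parallelization across processors, without giving implementation details. The following is only a minimal sketch of how the two ideas can combine for candidate-peptide lookup, not the authors' method. It assumes a mass-sorted in-memory index searched by binary search and a pool of worker processes from Python's standard library. The function names (`peptide_mass`, `build_index`, `candidates`, `search_parallel`), the tolerance value, and the small residue-mass table are all hypothetical choices made for illustration.

```python
from bisect import bisect_left, bisect_right
from functools import partial
from multiprocessing import Pool

# Monoisotopic residue masses in Da for a handful of amino acids
# (illustrative subset only; a real search needs the full table).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def build_index(peptides):
    """Return (masses, seqs), both ordered by mass, for binary search."""
    pairs = sorted((peptide_mass(p), p) for p in peptides)
    return [m for m, _ in pairs], [p for _, p in pairs]


def candidates(index, precursor_mass, tol=0.01):
    """Peptides whose mass lies within +/- tol Da of the precursor mass."""
    masses, seqs = index
    lo = bisect_left(masses, precursor_mass - tol)
    hi = bisect_right(masses, precursor_mass + tol)
    return seqs[lo:hi]


def search_parallel(index, precursor_masses, tol=0.01, workers=2):
    """Look up each precursor mass independently across worker processes.

    The index is pickled and shipped to the workers, which is acceptable
    for a sketch but would need shared or on-disk storage at real scale.
    """
    with Pool(processes=workers) as pool:
        return pool.map(partial(candidates, index, tol=tol), precursor_masses)
```

Each spectrum's lookup is independent of the others, which is what makes this step straightforward to distribute. On platforms that start workers by spawning a fresh interpreter, a call to `search_parallel` should sit under an `if __name__ == "__main__":` guard.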