Christopher Leela Biji, Manu K Madhu, Vineetha Vishnu, Satheesh Kumar K, Achuthsankar S Nair.
Abstract
Big data storage is a challenge in the post-genome era, which creates a need for high-performance computing solutions to manage large genomic datasets. We therefore describe a parallel computing approach that uses a message-passing library to distribute the different compression stages across the nodes of a cluster. Genomic compression reduces the on-disk footprint of large volumes of sequence data and so supports more efficient archiving on existing computational infrastructure. In this report, the approach was evaluated on 21 eukaryotic genomes selected by stratified sampling. It achieves an average 6-fold reduction in disk space and compresses three times faster than COMRAD.
AVAILABILITY: The source code is written in C using message-passing libraries and is available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/.
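The abstract does not specify how the sequence data is divided among cluster nodes. As an illustration only, the sketch below assumes a simple contiguous block decomposition: a sequence of length n is split into nearly equal half-open ranges, one per worker. The helper names `chunk_start` and `chunk_end` are hypothetical. In an MPI program, `rank` and `nprocs` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: first index of the chunk owned by worker `rank`
 * when a sequence of length n is split among `nprocs` workers.
 * The first (n % nprocs) workers receive one extra symbol, so chunk
 * sizes differ by at most one. rank == nprocs yields n (end of data). */
size_t chunk_start(size_t n, int rank, int nprocs)
{
    assert(nprocs > 0 && rank >= 0 && rank <= nprocs);
    size_t base = n / (size_t)nprocs;
    size_t extra = n % (size_t)nprocs;
    size_t r = (size_t)rank;
    return r * base + (r < extra ? r : extra);
}

/* Hypothetical helper: one past the last index of worker `rank`'s chunk,
 * i.e. the start of the next worker's chunk. */
size_t chunk_end(size_t n, int rank, int nprocs)
{
    return chunk_start(n, rank + 1, nprocs);
}
```

For example, 10 symbols over 3 workers give the ranges [0, 4), [4, 7) and [7, 10). One trade-off of this assumed scheme is that a repeat straddling a chunk boundary is invisible to a compressor that sees only its own chunk; distributing whole genomes or files per node would avoid that, at the cost of less even load balancing.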
Keywords: Big data storage; Genome Analysis; Genome compression; Parallel Computing; Sequence analysis
Year: 2015 PMID: 26124572 PMCID: PMC4464544 DOI: 10.6026/97320630011267
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1: Flow chart of the compression of a large genome dataset using COMRAD on a parallel computing platform.
Figure 2: An example showing code book generation for an input string with string length L = 2 and threshold frequency F = 2.
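The code book example in Figure 2 can be read as: collect every distinct substring of length L that occurs at least F times in the input. The C sketch below implements that reading under stated assumptions. It counts overlapping occurrences, and it records each entry by the offset of its first occurrence. Neither detail is confirmed by the caption, and this is not the COMRAD implementation. The function `build_codebook` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count occurrences (overlapping allowed) of the length-L window that
 * starts at s[pos], over all length-L windows of s (length n). */
static size_t count_window(const char *s, size_t n, size_t pos, size_t L)
{
    size_t c = 0;
    for (size_t j = 0; j + L <= n; j++)
        if (memcmp(s + j, s + pos, L) == 0)
            c++;
    return c;
}

/* Hypothetical sketch of code book generation: find every distinct
 * substring of length L that occurs at least F times in s.
 * The first-occurrence offset of each entry is written to entries[]
 * (at most max_entries are written; entries may be NULL if max_entries
 * is 0). Returns the total number of code book entries found. */
size_t build_codebook(const char *s, size_t L, size_t F,
                      size_t *entries, size_t max_entries)
{
    assert(s != NULL && L > 0);
    size_t n = strlen(s), k = 0;
    for (size_t i = 0; i + L <= n; i++) {
        /* Consider each distinct substring only at its first occurrence. */
        int seen = 0;
        for (size_t p = 0; p < i && !seen; p++)
            if (memcmp(s + p, s + i, L) == 0)
                seen = 1;
        if (!seen && count_window(s, n, i, L) >= F) {
            if (k < max_entries)
                entries[k] = i;
            k++;
        }
    }
    return k;
}
```

With L = 2 and F = 2, the string "ACGTACGA" yields two entries, "AC" and "CG", because each occurs twice. The brute-force scan is O(n^2 L) and is intended only to make the definition concrete. A practical compressor would count windows with a hash table or suffix structure instead.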