Simon P Sadedin, Alicia Oshlack.
Abstract
The vast quantities of short-read sequencing data being generated are often exchanged and stored as aligned reads. However, aligned data becomes outdated as new reference genomes and alignment methods become available. Here we describe Bazam, a tool that efficiently extracts the original paired FASTQ from alignment files (BAM or CRAM format) in a format that directly allows efficient realignment. Bazam facilitates up to a 90% reduction in the time for realignment compared to standard methods. Bazam can support selective extraction of read pairs from focused genomic regions for applications such as targeted region analyses, quality control, structural variant calling, and alignment comparisons.
Year: 2019 PMID: 30999943 PMCID: PMC6472072 DOI: 10.1186/s13059-019-1688-1
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Different configurations for using Bazam. (a) Simple realignment from one reference genome to another without intermediate storage or steps. (b) Extraction of filtered reads such as those overlapping a specific locus. Reads can be streamed to downstream tools directly, or stored in FASTQ format for further processing. (c) Sharded realignment allows for many copies of the aligner to run on different subsets of the data, greatly speeding up realignment.
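The sharded configuration in panel (c) depends on splitting read pairs into disjoint subsets, with both mates of a pair landing in the same subset. The sketch below is a hypothetical illustration of one way to do that, by hashing the read name. It is not taken from Bazam's implementation, and the function names `shard_of` and `split_pairs` are invented for this example.

```python
import zlib

def shard_of(read_name: str, n_shards: int) -> int:
    """Map a read name to a shard index in [0, n_shards).

    Both mates of a pair share one read name, so they get the same index.
    CRC32 is used because it is deterministic across processes, unlike
    Python's built-in hash() for strings. (Assumption: illustrative only.)
    """
    return zlib.crc32(read_name.encode("utf-8")) % n_shards

def split_pairs(read_names, n_shards):
    """Partition read names (one per pair) into n_shards disjoint lists."""
    shards = [[] for _ in range(n_shards)]
    for name in read_names:
        shards[shard_of(name, n_shards)].append(name)
    return shards

# Hypothetical read names standing in for the pairs in an alignment file.
names = [f"read{i}" for i in range(1000)]
shards = split_pairs(names, 10)
```

Under this scheme each aligner instance would process only the pairs assigned to its own shard index, and the union of all shards covers every pair exactly once.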
Comparison of run time, memory, and storage space between Bazam and alternative processes for realignment. Timings cover the end-to-end process, from the start of read extraction to the completion of realigned and sorted BAM files.
| Tool | Storage used | Memory | Effective Cores | Time |
|---|---|---|---|---|
| Sort-Extract-Realign | 282 GB | 20 GB | 16 | 13 h, 15 min |
| Picard SamToFastq | 148 GB | 78 GB | 16 | 16 h, 14 min |
| Biobambam bamtofastq | 149 GB | 30 GB | 16 | 15 h, 30 min |
| Bazam (no sharding) | 68 GB | 28 GB | 16 | 14 h, 55 min |
| Bazam 10-way sharding | 102 GB | 20 GB | 160 | 1 h, 11 min |
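The reduction in realignment time can be checked against the table with a little arithmetic. The following sketch uses only the times and core counts listed above. The variable names are illustrative, and the core-hours figure is a derived quantity that the source does not report.

```python
# (hours, minutes, effective cores) copied from the table above.
runs = {
    "Sort-Extract-Realign": (13, 15, 16),
    "Picard SamToFastq": (16, 14, 16),
    "Biobambam bamtofastq": (15, 30, 16),
    "Bazam (no sharding)": (14, 55, 16),
    "Bazam 10-way sharding": (1, 11, 160),
}

def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'h, min' table entry to total minutes."""
    return hours * 60 + minutes

sharded_min = to_minutes(*runs["Bazam 10-way sharding"][:2])

# Fractional wall-clock reduction of the sharded run against each 16-core run.
reduction = {
    tool: 1 - sharded_min / to_minutes(h, m)
    for tool, (h, m, _cores) in runs.items()
    if tool != "Bazam 10-way sharding"
}

# Derived core-hours (wall-clock hours multiplied by effective cores).
core_hours = {
    tool: to_minutes(h, m) / 60 * cores
    for tool, (h, m, cores) in runs.items()
}

for tool, frac in reduction.items():
    print(f"{tool}: {frac:.1%} less wall-clock time with 10-way sharding")
```

On these figures, 10-way sharding cuts wall-clock time by roughly 91% to 93% relative to each 16-core process. That gain comes from running on ten times as many effective cores.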