Jacob R Heldenbrand, Saurabh Baheti, Matthew A Bockol, Travis M Drucker, Steven N Hart, Matthew E Hudson, Ravishankar K Iyer, Michael T Kalmbach, Katherine I Kendig, Eric W Klee, Nathan R Mattson, Eric D Wieben, Mathieu Wiepert, Derek E Wildman, Liudmila S Mainzer.
Abstract
BACKGROUND: Use of the Genome Analysis Toolkit (GATK) continues to be the standard practice in genomic variant calling in both research and the clinic. Recently the toolkit has been rapidly evolving: significant computational performance improvements were introduced in GATK3.8 through a collaboration with Intel in 2017, and the first release of GATK4 in early 2018 brought rewrites in the code base as a stepping stone toward a Spark implementation. As the software remains a moving target for optimal deployment in highly productive environments, we present a detailed analysis of these improvements to help the community stay abreast of changes in performance.
Keywords: Best practices; Cluster computing; Computational performance; GATK; Genomic variant calling; Parallelization
Year: 2019 PMID: 31703611 PMCID: PMC6842142 DOI: 10.1186/s12859-019-3169-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 GATK3.8 thread scalability. a Scalability of BaseRecalibrator, PrintReads and HaplotypeCaller. Sample: NA12878 WGS. Fold change refers to the fold difference in walltime between the new measurement and the single-thread baseline, computed as (newtime − baselinetime)/baselinetime. b Scalability of PrintReads, in more detail. Normally walltime should decrease with thread count, as the computation is performed in parallel by multiple threads. In the case of PrintReads, however, the opposite is observed: walltime increases with thread count, signifying poor scalability and explaining the decreasing trend of the PrintReads line in panel (a). Sample: NA12878 chr21. Error bars denote 1 SD around the mean of three replicates
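The fold-change metric used in panel (a) can be computed directly from walltimes; a minimal sketch (the walltime values below are illustrative, not taken from the paper):

```python
def fold_change(new_time, baseline_time):
    """Fold difference in walltime relative to the single-thread baseline:
    (new - baseline) / baseline. Negative values indicate a speed-up."""
    return (new_time - baseline_time) / baseline_time

# Illustrative example: a tool that ran in 10 h on 1 thread and 4 h on 8 threads.
print(fold_change(4.0, 10.0))  # -0.6, i.e. a 60% reduction in walltime
```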
Fig. 2 GATK4 thread scalability for Java parallel garbage collection. Sample: NA12878 WGS. The measurements at 1 PGC thread represent the default, meaning that PGC is not enabled. Error bars denote 1 SD around the mean of three replicates. a MarkDuplicates. b BaseRecalibrator
Effects of asynchronous I/O settings on walltime (hours) in GATK4
| Tool name | No async I/O | Async I/O for all | Async I/O for samtools only |
|---|---|---|---|
| BaseRecalibrator | 4.07 | 2.95 | 2.88 |
| ApplyBQSR | 2.38 | 2.07 | 2.08 |
| HaplotypeCaller | 17.25 | 17.31 | 17.08 |
Sample: NA12878 WGS.
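The relative walltime reduction from enabling asynchronous I/O follows directly from the table; a small sketch using the BaseRecalibrator row:

```python
# Walltimes (hours) from the table above for BaseRecalibrator.
no_async = 4.07
samtools_only = 2.88

reduction = (no_async - samtools_only) / no_async
print(f"{reduction:.1%}")  # roughly 29% shorter walltime with async samtools I/O
```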
Fig. 3 GATK4 thread scalability in HaplotypeCaller. Sample: NA12878 chr21. Error bars denote 1 SD around the mean of three replicates
Splitting the genome by chromosomes
Horizontal lines segregate the chunks. Numbers indicate the total number of nucleotides in each resultant chunk of data.
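One way to produce chunks with balanced nucleotide totals is to greedily assign chromosomes, longest first, to the chunk with the smallest running total. This is a hypothetical sketch, not the paper's actual splitting procedure; the contig names and lengths below are illustrative:

```python
def split_by_length(chrom_lengths, n_chunks):
    """Greedy longest-first bin packing: assign each chromosome to the
    chunk whose running nucleotide total is currently smallest."""
    chunks = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for name, length in sorted(chrom_lengths.items(), key=lambda kv: -kv[1]):
        i = totals.index(min(totals))  # emptiest chunk so far
        chunks[i].append(name)
        totals[i] += length
    return chunks, totals

# Illustrative lengths (Mb) for four contigs.
lengths = {"chr1": 249, "chr2": 242, "chr3": 198, "chr4": 190}
chunks, totals = split_by_length(lengths, 2)
print(chunks, totals)  # two chunks with near-equal totals
```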
Fig. 4 Effects of data-level parallelization in GATK3.8. Sample: NA12878 WGS. The “Baseline” was a naive approach where we gave each tool 40 threads (1 thread per core). The “Baseline Optimized” gave each tool 40 threads, except for PrintReads, which utilized 3 threads. MarkDuplicates and BaseRecalibrator were given 2 and 20 parallel garbage collection threads, respectively. “Split 2,” “Split 3,” etc. means that the aligned sorted BAM was split into 2, 3, etc. chunks, as shown in Table 2. Panel (a) shows experiments with chunks computing on the same node. In panel (b) computation was spread across nodes in groups of 3 chunks per node
Fig. 5 Effects of data-level parallelization in GATK4. All compute was kept within the same node. Sample: NA12878 WGS. “Split 2,” “Split 3,” etc. means that the aligned sorted BAM was split into 2, 3, etc. chunks, as shown in Table 2
Fig. 6 GATK4 throughput testing. Total walltime was benchmarked while running multiple samples simultaneously on the same node. As more samples were placed on the node, the number of threads given to HaplotypeCaller was reduced accordingly. Sample: NA12878 WGS. a Total walltime for running a batch of many samples on the same node. b Number of samples effectively processed per hour
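Throughput in panel (b) follows from batch size and total batch walltime; a minimal sketch:

```python
def samples_per_hour(n_samples, batch_walltime_hours):
    """Effective throughput when n_samples are processed concurrently on one node."""
    return n_samples / batch_walltime_hours

# Illustrative: a batch of 40 samples finishing in 34.1 h (cf. the cost table below).
print(round(samples_per_hour(40, 34.1), 2))  # about 1.17 samples per hour
```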
Summary of optimized parameter values
| Tool name | GATK3.8: PGC | GATK3.8: tool threads | GATK4: PGC | GATK4: async I/O | GATK4: AVX threads |
|---|---|---|---|---|---|
| MarkDuplicates | 2 threads | 1 | 2 threads | N/A | N/A |
| BaseRecalibrator | 20 threads | -nct 40 | 20 threads | Yes for Samtools, No for Tribble | N/A |
| ApplyBQSR | off | -nct 3 | off | N/A | |
| HaplotypeCaller | off | -nt 1 -nct 39 | off | | 8 |
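The GATK3.8 settings in the table map onto JVM and engine flags: `-XX:ParallelGCThreads` for PGC and `-nct`/`-nt` for tool threads. A sketch of assembling such an invocation, assuming a local `GenomeAnalysisTK.jar`; the jar path and helper function are placeholders, not part of the paper:

```python
def gatk3_cmd(tool, pgc_threads=None, engine_args=(), jar="GenomeAnalysisTK.jar"):
    """Assemble a GATK3.8 command line from the optimized settings above.
    The jar location is a placeholder; input/output files are omitted."""
    cmd = ["java"]
    if pgc_threads:
        # Parallel garbage collection thread count (JVM flag).
        cmd.append(f"-XX:ParallelGCThreads={pgc_threads}")
    cmd += ["-jar", jar, "-T", tool, *engine_args]
    return cmd

# BaseRecalibrator per the table: 20 PGC threads, -nct 40.
cmd = gatk3_cmd("BaseRecalibrator", pgc_threads=20, engine_args=["-nct", "40"])
print(" ".join(cmd))
```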
Financial costs per sample when running an optimized pipeline, based on AWS on-demand pricing as of August 2019: c5.9xlarge at $1.53 per hour and c5.18xlarge at $3.06 per hour
| GATK version | Splitting | Samples | Nodes | Walltime, hrs | c5.9xlarge | c5.18xlarge |
|---|---|---|---|---|---|---|
| GATK 4.0.1.2 | no splitting | 1 | 1 | 20.7 | $31.7 | $63.3 |
| GATK 3.8 | no splitting | 1 | 1 | 15.3 | $23.4 | $46.8 |
| GATK 3.8 | 12 chunks | 1 | 4 | 3.4 | $20.8 | $41.6 |
| GATK 3.8 | 6 chunks | 1 | 2 | 4.7 | $14.4 | $28.8 |
| GATK 3.8 | 16 chunks | 1 | 1 | 5.0 | $7.7 | $15.3 |
| GATK 4.0.1.2 | 16 chunks | 1 | 1 | 3.6 | $5.5 | $11.0 |
| GATK 4.0.1.2 | no splitting | 40 | 1 | 34.1 | $1.3 | $2.6 |
Configurations are sorted by cost.
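The per-sample figures in the cost table follow from walltime × hourly rate × node count ÷ batch size; a sketch reproducing two rows:

```python
def cost_per_sample(walltime_h, hourly_rate, nodes=1, samples=1):
    """On-demand cost per sample: walltime * rate * nodes / samples."""
    return walltime_h * hourly_rate * nodes / samples

C5_9XLARGE = 1.53  # $/hour, August 2019 on-demand pricing (from the table caption)

# GATK 4.0.1.2, no splitting, single sample: 20.7 h on one c5.9xlarge node.
print(round(cost_per_sample(20.7, C5_9XLARGE), 1))              # 31.7
# GATK 4.0.1.2, 40 samples batched on one node for 34.1 h.
print(round(cost_per_sample(34.1, C5_9XLARGE, samples=40), 1))  # 1.3
```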