| Literature DB >> 28831090 |
Nagarajan Kathiresan, Ramzi Temanni, Hakeem Almabrazi, Najeeb Syed, Puthen V Jithesh, Rashid Al-Ali.
Abstract
Next-generation sequencing (NGS) data analysis is highly compute-intensive. In-memory computing, vectorization, bulk data transfer and CPU frequency scaling are some of the hardware features available in modern computing architectures. To achieve the best execution time and exploit these hardware features, it is necessary to tune the system-level parameters before running the application. We studied GATK-HaplotypeCaller, a step common to many NGS workflows that consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked, and the execution time of HaplotypeCaller was optimized through several system-level parameters: (i) tuning parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) enabling Java 1.8 features through GATK source-code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) replacing the default 'on-demand' CPU frequency mode with 'performance' mode to accelerate the Java multi-threads. As a result, HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced by 70.60% and 34.14% for GATK 3.3 and GATK 3.7, respectively.
Year: 2017 PMID: 28831090 PMCID: PMC5567265 DOI: 10.1038/s41598-017-09089-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1: Next Generation Sequencing analysis workflow for the discovery of new functional variants.
Computational steps, dependency conditions and their execution time in the NGS workflow.
| Step ID | Job step name | Application name | Application module | Input file name | Application parameters | Output file name | Recommended no. of cores | Job dependency condition | % of execution time |
|---|---|---|---|---|---|---|---|---|---|
| S1 | Map to Reference | BWA KIT | Seqtk, trimadap, SamTools, bwa mem, samblaster | *.fastq.gz | Default | *.bam | N/M | — | 6.5% |
| S2 | Build a standard BAM INDEX | sambamba | Index | *.bam | Default | *.bam.bai | 1 | S1 | 0.5% |
| S3 | Realigner TargetCreator | GATK | Target creator | *.aln.bam | − | *.realigner.intervals | 4 or 8 | S2 | 3% |
| S4 | Indel Realigner | GATK | INDEL | *.aln.bam, *.realigner.intervals | − | *.realigned.bam | 1 | S3 | 2% |
| S5 | Base Recalibrator | GATK | Base Recalibration | *.realigned.bam | − | *.recal.table | N/M | S4 | 13% |
| S6 | Print Reads | GATK | Analyse the Reads | *.realigned.bam, *.recal.table | − | *.realigned.recal.bam | 2 or 4 | S5 | 25% |
| S7 | Haplotype Caller | GATK | Haplotype | *.realigned.recal.bam | − | *.raw.snps.indels.g.vcf | 4 or 8 | S6 | 43% |
| S8 | Variant Recalibrator | GATK | Variant recalibration | *.realigned.bam, *.recal.table | -T BaseRecalibrator, -R hs37d5.fa, -known Mills_and_1000G_gold_standard.indels.vcf.gz, -BQSR | *.after_recal.table | N | S5 | 6% |
| S9 | Analyze Covariates | GATK | Analyse the variant | *.recal.table, *.after_recal.table | -T AnalyzeCovariates -before -after | *.recal_plots.pdf | 1 | S8 | 1% |
Here, N is the total number of cores and M is the number of CPUs.
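As a rough illustration, step S7 (HaplotypeCaller) from the table corresponds to a GATK 3.x command along these lines. This is a sketch, not the authors' exact invocation: the file names follow the table's patterns, the reference is the hs37d5.fa listed under S8, and `--emitRefConfidence GVCF` is assumed from the `*.g.vcf` output name.

```shell
# Hypothetical invocation of workflow step S7 (HaplotypeCaller, GATK 3.x).
# Input/output names follow the table; other options are illustrative.
java -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R hs37d5.fa \
    -I sample.realigned.recal.bam \
    -o sample.raw.snps.indels.g.vcf \
    --emitRefConfidence GVCF \
    -nct 8    # 4 or 8 cores, per the table's recommendation
```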
Performance effect of Java, parallel garbage collection and temporary I/O directory tuning parameters.
| GATK version | Java version | Parallel GC option | Java temporary I/O directory | HaplotypeCaller execution time (in Hours) |
|---|---|---|---|---|
| GATK 3.3_src | Java 1.7 | N/A | /tmp | 5.26 |
| GATK 3.3_src | Java 1.7 | Parallel GC = 32 | /tmp | 3.81 |
| GATK 3.3_src | Java 1.7 | Parallel GC = 32 | /gpfs | 5.03 |
| GATK 3.3_src | Java 1.8 | Parallel GC = 32 | /gpfs | 4.78 |
| GATK 3.3_src | Java 1.8 | Parallel GC = 32 | — | 2.51 |
| GATK 3.3_src | Java 1.8 | CMS GC | — | 2.52 |
| GATK 3.7_jar | Java 1.8 | Parallel GC = 32 | — | 1.17 |
| GATK 3.7_src | Java 1.8 | Parallel GC = 32 | — | 1.14 |
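The Java-side settings in the table map onto standard HotSpot JVM flags. A minimal sketch, assuming Java 1.8 and the GATK 3.x jar; the heap size and sample file names are illustrative, not from the paper:

```shell
# Parallel GC with 32 threads (the "Parallel GC = 32" rows):
java -Xmx64g -XX:+UseParallelGC -XX:ParallelGCThreads=32 \
    -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R hs37d5.fa -I sample.realigned.recal.bam -o sample.g.vcf

# CMS collector variant (the "CMS GC" row):
java -Xmx64g -XX:+UseConcMarkSweepGC \
    -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R hs37d5.fa -I sample.realigned.recal.bam -o sample.g.vcf

# The temporary I/O directory (/tmp vs. /gpfs rows) is steered with:
#   -Djava.io.tmpdir=/tmp
```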
Performance effect of CPU frequency scaling in the HaplotypeCaller.
| GATK version | Type of CPU frequency scaling | HaplotypeCaller execution time (in Hours) | Power (in kWh) |
|---|---|---|---|
| GATK 3.3_src | Performance | 2.51 | 1.34 |
| GATK 3.3_src | On-demand | 3.72 | 0.47 |
| GATK 3.7_src | Performance | 1.14 | 0.59 |
| GATK 3.7_src | On-demand | 1.49 | 0.19 |
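On Linux, the scaling modes in the table correspond to cpufreq governors. A sketch of how the switch is typically made (requires root; the exact tooling depends on the distribution):

```shell
# Inspect the current governor; "ondemand" matches the table's
# default on-demand rows:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Pin all cores to the "performance" governor (note the power cost
# in the kWh column):
sudo cpupower frequency-set -g performance
```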
Performance effect of kernel shared memory.
| Kernel shared memory size | HaplotypeCaller execution time for GATK 3.3 (in Hours) | HaplotypeCaller execution time for GATK 3.7 (in Hours) |
|---|---|---|
| 4 GB | 4.36 | 1.39 |
| 8 GB | 3.01 | 1.39 |
| 64 GB | 2.84 | 1.18 |
| 128 GB | 2.51 | 1.14 |
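The kernel shared memory sizes in the table can be approximated on Linux by resizing the tmpfs mounted at /dev/shm, which keeps temporary files in RAM. A sketch, assuming a node with at least 128 GB of memory:

```shell
# Grow the tmpfs-backed shared memory to 128 GB (requires root;
# the size must fit within physical RAM):
sudo mount -o remount,size=128G /dev/shm

# Verify the new limit:
df -h /dev/shm
```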
Figure 2: Performance impact of PairHMM vectorization in the GATK 3.3 HaplotypeCaller using an architecture-aware implementation.
Performance impact of PairHMM library with heap memory implementation in the GATK 3.3 HaplotypeCaller.
| Benchmarking case study name | PairHMM library | Heap memory prerequisite | Committed heap memory | CPU utilization | HaplotypeCaller execution time (in hours) |
|---|---|---|---|---|---|
| Without PairHMM and no heap memory | No | 0 GB | 14.3 GB | up to 40% | 2.51 |
| Without PairHMM and with heap memory | No | 128 GB | 48.5 GB | up to 73% | 2.34 |
| With PairHMM and no heap memory | Yes | 0 GB | 12 GB | up to 58% | 1.18 |
| With PairHMM and with heap memory | Yes | 128 GB | 50.3 GB | up to 72% | 0.912 |
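The best case in the table (vectorized PairHMM plus a large committed heap) can be sketched as follows. The `-pairHMM` option and `-Xms`/`-Xmx` flags are standard GATK 3.x / JVM options; the file names are illustrative, and the 128 GB heap mirrors the benchmark's prerequisite:

```shell
# Commit a large heap up front (-Xms) and enable the vectorized
# PairHMM implementation:
java -Xms128g -Xmx128g \
    -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R hs37d5.fa -I sample.realigned.recal.bam \
    -o sample.raw.snps.indels.g.vcf \
    -pairHMM VECTOR_LOGLESS_CACHING
```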
Performance effect of PairHMM library implementation in GATK 3.7 HaplotypeCaller.
| PairHMM library | Execution time of HaplotypeCaller using GCAT genome data (in Hours) | Execution time of HaplotypeCaller using Platinum genome data (in Hours) |
|---|---|---|
| No PairHMM library | 1.14 | 30.43 |
| VectorPairHMM library | 0.902 | 17.46 |
General recommendation for the generic HPC system.
| Parameter name | Recommended value | Expected performance improvement (in %) | Remarks |
|---|---|---|---|
| Java heap size | up to 50% of main memory | N/A | GATK execution may fail due to insufficient memory when the heap size is small |
| Parallel garbage collection | Parallel GC = 32 | up to 28% | N/A |
| CMS garbage collection | CMS GC | up to 28% | Useful in modern HPC architecture |
| Java 1.8 | N/A | up to 52% | Don’t use java.io.tmpdir |
| CPU frequency scaling | Performance mode | up to 45% | By default, modern HPC architecture uses on-demand mode |
| Kernel shared memory | up to 50% of main memory | up to 48% | N/A |
| PairHMM library with heap memory | — | up to 145% | Use architecture-specific libraries for GATK HaplotypeCaller |
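Taken together, the recommendations above could be combined into a single launch sketch. The flag values echo the tuned settings benchmarked in the tables; the paths, heap size and file names are illustrative assumptions for a node with ample memory, not the authors' exact configuration:

```shell
# One-time node setup (root): performance governor and large shared memory.
sudo cpupower frequency-set -g performance
sudo mount -o remount,size=128G /dev/shm

# Tuned HaplotypeCaller launch under Java 1.8.
java -Xms128g -Xmx128g \
    -XX:+UseParallelGC -XX:ParallelGCThreads=32 \
    -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R hs37d5.fa -I sample.realigned.recal.bam \
    -o sample.raw.snps.indels.g.vcf \
    -pairHMM VECTOR_LOGLESS_CACHING -nct 8
```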