Francesco Lescai, Elena Marasco, Chiara Bacchelli, Philip Stanier, Vilma Mantovani, Philip Beales.
Abstract
The choice of an appropriate variant calling pipeline for exome sequencing data is becoming increasingly important in translational medicine projects and clinical contexts. Within GOSgene, which facilitates genetic analysis as part of a joint effort of University College London and Great Ormond Street Hospital, we aimed to optimize a variant calling pipeline suitable for our clinical context. We implemented the GATK/Queue framework and evaluated the performance of its two callers: the classical UnifiedGenotyper and the new variant discovery tool HaplotypeCaller. We performed an experimental validation of the loss-of-function (LoF) variants called by the two methods using Sequenom technology. UnifiedGenotyper showed a total validation rate of 97.6% for LoF single-nucleotide polymorphisms (SNPs) and 92.0% for insertions or deletions (INDELs), whereas HaplotypeCaller showed 91.7% for SNPs and 55.9% for INDELs. We confirm that GATK/Queue is a reliable pipeline in translational medicine and clinical contexts. We conclude that in our working environment, UnifiedGenotyper is the caller of choice, being an accurate method with a high validation rate for error-prone calls such as LoF variants. Finally, we highlight the importance of experimental validation, especially for INDELs, as part of a standard pipeline in clinical environments.
Keywords: GATK; pipelines; sequencing; variant calling
Year: 2013 PMID: 24498629 PMCID: PMC3907911 DOI: 10.1002/mgg3.42
Source DB: PubMed Journal: Mol Genet Genomic Med ISSN: 2324-9269 Impact factor: 2.183
Figure 1. Comparison of the numbers of variants called by UnifiedGenotyper and HaplotypeCaller. The variants identified by each caller are shown as numbers within each circle. Variants common to both methods are in purple, whereas those unique to UnifiedGenotyper are in green and those unique to HaplotypeCaller are in yellow. The panels show (A) the total number of novel SNPs, (B) novel INDELs, (C) novel LoF SNPs, and (D) novel LoF INDELs.
Validation of variants by caller comparison
| Outcome | Intersection | UnifiedGenotyper only | HaplotypeCaller only |
|---|---|---|---|
| SNPs | | | |
| Validated | 97 (98.0%) | 27 (96.4%) | 2 (22.2%) |
| Not validated | 2 (2.0%) | 1 (3.6%) | 7 (77.8%) |
| Fail | 3 | 0 | 1 |
| Total number of assays | 102 | 28 | 10 |
| Total number of working assays | 99 | 28 | 9 |
| INDELs | | | |
| Validated | 35 (92.1%) | 11 (91.7%) | 3 (10.0%) |
| Not validated | 3 (7.9%) | 1 (8.3%) | 27 (90.0%) |
| Fail | 4 | 0 | 12 |
| Total number of assays | 42 | 12 | 42 |
| Total number of working assays | 38 | 12 | 30 |
Validation rates of LoF SNP and INDEL calls made by both methods (intersection) or uniquely by UnifiedGenotyper or HaplotypeCaller. Percentages are relative to the number of working assays. The number of validation assays that failed on the genotyping chip is given in the "Fail" row.
Figure 2. Validation rates by caller comparison. The pie charts show the overall validation rate for LoF SNPs and INDELs called by UnifiedGenotyper and by HaplotypeCaller. HaplotypeCaller INDELs showed the lowest validation rate, 55.9% of the called variants (D).
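The overall validation rates quoted in the abstract appear to follow from the table by pooling, for each caller, the intersection column with that caller's unique column and dividing validated calls by working assays. The sketch below reproduces that arithmetic. The pooling rule is an assumption inferred from the numbers rather than a method stated in the record, and the names `COUNTS` and `overall_rate` are hypothetical.

```python
# (validated, working assays) pairs transcribed from the validation table.
COUNTS = {
    "SNPs": {
        "intersection": (97, 99),
        "ug_only": (27, 28),
        "hc_only": (2, 9),
    },
    "INDELs": {
        "intersection": (35, 38),
        "ug_only": (11, 12),
        "hc_only": (3, 30),
    },
}

# Which "unique" column belongs to which caller.
UNIQUE_COLUMN = {
    "UnifiedGenotyper": "ug_only",
    "HaplotypeCaller": "hc_only",
}


def overall_rate(variant_type, caller):
    """Percentage of working assays validated for one caller.

    Assumption: a caller's overall rate pools the calls shared by both
    callers (intersection) with the calls unique to that caller.
    """
    groups = COUNTS[variant_type]
    shared_validated, shared_working = groups["intersection"]
    unique_validated, unique_working = groups[UNIQUE_COLUMN[caller]]
    validated = shared_validated + unique_validated
    working = shared_working + unique_working
    return 100.0 * validated / working


if __name__ == "__main__":
    for variant_type in COUNTS:
        for caller in UNIQUE_COLUMN:
            rate = overall_rate(variant_type, caller)
            print(f"{caller} {variant_type}: {rate:.1f}%")
```

Under this assumption the pooled fractions are 124/127 and 99/108 for SNPs and 46/50 and 38/68 for INDELs, which match the 97.6%, 91.7%, 92.0%, and 55.9% reported in the abstract. Failed assays are excluded from the denominators, consistent with the per-column percentages in the table.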