Vandhana Krishnan 1,2; Sowmithri Utiramerur 3,4,5; Zena Ng 6; Somalee Datta 2,7; Michael P Snyder 1,2; Euan A Ashley 8,9,10
1. Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA.
2. Stanford Center for Genomics and Personalized Medicine, Stanford University, Palo Alto, CA, USA.
3. Stanford Center for Genomics and Personalized Medicine, Stanford University, Palo Alto, CA, USA. sowmiu@gmail.com.
4. Clinical Genomics Program, Stanford Health Care, Stanford, CA, USA. sowmiu@gmail.com.
5. Roche Diagnostics Solutions, Research and Early Development, Pleasanton, CA, USA. sowmiu@gmail.com.
6. Clinical Genomics Program, Stanford Health Care, Stanford, CA, USA.
7. School of Medicine, Research IT - Technology and Digital Solutions, Stanford University, Redwood City, CA, USA.
8. Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA. euan@stanford.edu.
9. Department of Cardiovascular Medicine, Stanford University, Stanford, CA, USA. euan@stanford.edu.
10. Department of Biomedical Data Science, Stanford University, Stanford, CA, USA. euan@stanford.edu.
Abstract
BACKGROUND: Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDTs). Reference samples and benchmark calls published by the Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods, but the performance of such methods is not uniform across genomic regions of interest and variant types. Several benchmarking methods, such as hap.py, vcfeval, and vcflib, are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such methods, along with experienced bioinformaticians to interpret the results. In addition, these methods depend on the hardware, operating system, and other software libraries, making it impossible to reliably repeat the analytical assessment when any of the assay's underlying dependencies changes. Here we present a scalable, reproducible, cloud-based benchmarking workflow that is independent of the laboratory, the technician executing the workflow, and the underlying compute hardware, and that rapidly and continually assesses the performance of LDT assays across their regions of interest and reportable range using a broad set of benchmarking samples.
RESULTS: The benchmarking workflow was used to evaluate the performance characteristics of secondary analysis pipelines commonly used by clinical genomics laboratories in their LDT assays, such as GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. Five reference-sample truth sets generated by the GIAB consortium, six samples from the Personal Genome Project (PGP), and several samples with validated, clinically relevant variants from the Centers for Disease Control and Prevention were used in this work. The performance characteristics were evaluated and compared across multiple reportable ranges, such as the whole exome and the clinical exome.
CONCLUSIONS: We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision, and sensitivity for germline SNPs and InDels within a reportable range using whole-exome or whole-genome sequencing data. Combining these benchmarking results with validation against known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting.
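As a concrete illustration of the metrics named in the conclusions, the sketch below derives precision, sensitivity (recall), and F1 from true-positive, false-positive, and false-negative counts of the kind emitted by comparison engines such as hap.py or vcfeval. This is a minimal illustration under stated assumptions, not part of the published workflow; the SNP and InDel counts are hypothetical placeholders.

# Minimal, illustrative sketch (not the published workflow): compute the
# summary metrics reported in the abstract from hypothetical TP/FP/FN counts
# of the kind produced by comparison engines such as hap.py or vcfeval.

def benchmark_metrics(tp, fp, fn):
    """Precision (PPV), sensitivity (recall), and F1 for one variant type."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # TP / (TP + FP)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # TP / (TP + FN)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return precision, sensitivity, f1

if __name__ == "__main__":
    # Hypothetical counts for a query callset vs. a GIAB truth set,
    # restricted to a reportable-range BED file.
    counts = {"SNP": (38_000, 120, 250), "InDel": (4_200, 90, 310)}
    for variant_type, (tp, fp, fn) in counts.items():
        p, s, f1 = benchmark_metrics(tp, fp, fn)
        print(f"{variant_type}: precision={p:.4f} sensitivity={s:.4f} F1={f1:.4f}")

Note that specificity additionally requires a true-negative count, which for variant calling is typically defined over the positions of the reportable range rather than over called sites.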
Keywords:
Benchmarking; Docker; GIAB reference genomes; Germline variants; Lab developed tests; Precision; Recall; Truth set; Workflow